
Market Based Solutions to Resource Discovery

Andrew Jennings, Simon Cleary, Chong Wai & Senthil Kumar

Computer Systems Engineering
RMIT, Melbourne
Australia

ajennings@rmit.edu.au
http://www.cse.rmit.edu.au/~rdsajj/web95/market.html

Abstract
Resource discovery is generally framed in a similar fashion to searching for information in a library catalogue, essentially a utility view of resource discovery. We consider an alternative view, of resource discovery as information trading on a network. This allows for new roles for users, system developers and advertisers. Through network modelling we show that the introduction of intermediaries in the network can produce significant economies, and we therefore explore new ways of encouraging the growth of intermediaries. We demonstrate three new services for trading information of differing value, from high end services such as stock quotation to low end services such as gossip. The simulations and service demonstrations show that a market based approach is both scalable and feasible.
Keywords
resource discovery, advertising, network efficiency, information retrieval, user modelling

1. Introduction

The prevailing view of resource discovery is based on the utility model. It is assumed that resource discovery tools will be provided as central utilities for searching Web space. The publishing model is that individuals both author and promote their Web pages. In many ways the existing tools adopt the model of the library catalogue, which is perhaps not surprising given the research model adopted for the early development of the Internet.

We argue that this model is now of limited use. The sheer volume of information on the Web militates against it. Most significantly, it does not differentiate information according to its value to users. For highly timely stock information a user may be prepared to pay high charges, but for gossip only a very low fee. Instead we adopt a trading approach where we encourage the growth of intermediaries. Bowman [1994] provides a comprehensive view of the research problems associated with resource discovery, and provides a good setting for our work.

There is an enormous variety of Web crawlers and information retrieval tools available. The Web robots page [Robot 1995] gives links to most of the current systems. Closely associated is the work on information filtering that is surveyed in [Oard 1995]. Most prominent amongst resource discovery systems is the Harvest system [Harvest 1995].

It is also clear that much information traverses the network many times at the request of users [Schwartz 1993]. One possible solution is to employ extensive network caching, but this raises the question of who purchases and maintains the caches. Our approach instead adopts the view that information brokering and distribution is an emerging industry. Within that industry we should foster a framework that encourages participation in the market, allows for worldwide specialisation and delivers higher quality systems to users. This work is based extensively on the pioneering work of Cocchi [1993] and Mackie-Mason [1994] in forming a better understanding of network economics.

The structure of the paper is as follows. In the next section we present the results of a study of the value of intermediaries in network commerce [Chong Wai 1995]. These results form the basis for development of new services in the following section. Finally we discuss the implications of this work for publishing on the Web.

2. Efficiency

Here we present some results of network modelling. The model estimates average flows on the network with typical media content: the details of the model are presented in Appendix 1, together with references for further details of the methodology used to develop the model. We take a network of buyers and sellers who communicate directly, and then compare the overall network cost with the introduction of intermediaries. Why should we be concerned with network efficiency? In the current environment bandwidth is "free", so there is only a limited attempt to consider the cost. Perhaps this view is based on a likely future where bandwidth is trivially cheap. However, as Cocchi [1993] argues, this is essentially trading cost for congestion. At present we tolerate high congestion in avoiding supply and demand based pricing for bandwidth. It is unlikely that this situation can scale by several orders of magnitude without some sort of usage based charging.

We consider two types of intermediary. Brokers act on behalf of sellers, and they seek out customers for the sale of information. Agents act on behalf of buyers and seek to satisfy information queries for users. In the next section we describe in more detail the role of these intermediaries and contrast their behaviour with the brokers for the Harvest system [Harvest 1995].

The motivation for the simulation is to attempt to determine efficiencies that can be gained by introducing market based intermediaries. If there are economies then this justifies pursuing the development of services based on this approach.

From detailed simulation we arrive at the following results which show the relative network cost with intermediaries:

The modelling results show clearly that by fostering intermediaries there are dramatic economies. Since all participants in the market share the overhead cost of communicating, this represents an opportunity to develop a new industry based on the trading of connectivity information. We are already familiar with traditional publishing, but this represents a new commodity: the identification of semantic connectivity between Web pages. At present this commodity is only traded by centralised searching machines, but there is no reason why it cannot be traded by a large number of market participants.
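One intuition for why intermediaries can reduce network cost is a simple counting argument. The sketch below is not the flow model of Appendix 1: it only counts transfers, under the assumptions (ours, for illustration) that without intermediaries every seller contacts every buyer, and that with intermediaries each seller feeds one broker, every broker exchanges with every agent, and each buyer is served by one agent. The function names and the example population sizes are hypothetical.

```python
def direct_transfers(sellers: int, buyers: int) -> int:
    """Transfers when every seller contacts every buyer directly."""
    return sellers * buyers


def brokered_transfers(sellers: int, buyers: int, brokers: int, agents: int) -> int:
    """Transfers when sellers feed brokers, brokers talk to agents,
    and agents serve buyers (one broker per seller, one agent per buyer)."""
    seller_to_broker = sellers
    broker_to_agent = brokers * agents
    agent_to_buyer = buyers
    return seller_to_broker + broker_to_agent + agent_to_buyer


# Hypothetical population sizes, chosen only to show the scaling.
print(direct_transfers(1000, 10000))                          # sellers x buyers
print(brokered_transfers(1000, 10000, brokers=20, agents=100))
```

The direct count grows with the product of the two populations, while the intermediated count grows with their sum plus the broker-agent traffic, which is why the overhead between brokers and agents matters once there are many of them.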

So far we have not differentiated between the value of information to the user: in the next section we argue that new services should be developed on the basis of information value. In market based solutions the network efficiency gains are available to all participants, and are not exclusively the province of the network operator.

3. Three Services

Here we consider three examples of the possibilities for market based mechanisms. We highlight the variation in mechanisms based on the value of the information being traded. In the current Web there are very few value adding links, only some general guides and page collections. If we wish to foster a second layer of open, responsive and commercially viable information organisers then these are the sort of services that we believe should be considered.

High Value Services: Brokers and Agents

As the detailed network efficiency study above shows, there is ample scope for efficiency improvements in the use of the current network. In cases where high value information is at issue, brokers and agents should be considered. Schwartz [1993] outlined the concepts of brokers and agents for the Internet prior to the widespread use of the Web. In our study above we have adopted these concepts with some minor modification. In particular we do not assume that brokers create pointers to Web pages; we assume that they cache the page, or go further and clone the page. We advocate that page cloning should include all pages from the originating host in addition to local links: the intent is to avoid network transfer as far as possible.

Why do we advocate cloning pages? This arises from our discussion above concerning network efficiency. If the original page is copied, together with immediately linked pages, then there is no need to refer back to the original. In a regime where bandwidth is "free" there is no incentive for users to be economical in its use, but we anticipate that this regime cannot persist. In an environment where bandwidth is charged back to users, there is a strong incentive to avoid transporting pages, especially when that transfer is across international links. In our view of the Web as a marketplace, the use of cloning gives dramatic gains in efficiency, so there should be substantial financial advantage to the creation of broker sites that specialise in high value information.

In a demonstration service we present a page provider with the option of directly cloning their pages onto a broker machine. Once the page URL is known, this can be accomplished directly by a CGI Perl script. In implementing this process we have to address the depth of relative link that we wish to include in the cloning process: we should certainly clone all relative links, and perhaps all pages from the originating site, but generally we will not clone pages from linked sites or applications. Once the pages are cloned we are faced with a distributed update problem to ensure that cloned pages remain current. A charging regime related to the volume of information cloned multiplied by the frequency of updates can address these issues, in that an information provider with very large pages and a large number of links, updated often, will pay a higher charge for cloning services. Brokers should charge a percentage of each sale to their participating vendors, with competition between brokers acting to keep these percentages within reasonable bounds.
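The cloning scope and the charging rule described above can be sketched as follows. This is a minimal illustration in Python rather than the CGI Perl script of the demonstration service, which is not shown in this paper. The function names, the `whole_site` switch, the example host names and the `rate` parameter are all assumptions made for the example.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def links_to_clone(page_url: str, hrefs: list[str], whole_site: bool = True) -> list[str]:
    """Return absolute URLs of the links that fall inside the cloning scope.

    Relative links are always cloned. Absolute links are cloned only if
    whole_site is set and they point at the originating host. Links to
    other sites and non-HTTP links (mailto: and so on) are skipped.
    """
    origin = urlparse(page_url).netloc.lower()
    page_itself = urldefrag(page_url).url
    selected: list[str] = []
    for href in hrefs:
        parsed = urlparse(href)
        is_relative = not parsed.scheme and not parsed.netloc
        # Resolve against the page URL and drop any "#fragment" part.
        absolute = urldefrag(urljoin(page_url, href)).url
        target = urlparse(absolute)
        if target.scheme not in ("http", "https"):
            continue
        if absolute == page_itself or absolute in selected:
            continue
        same_host = target.netloc.lower() == origin
        if is_relative or (whole_site and same_host):
            selected.append(absolute)
    return selected


def cloning_charge(megabytes: float, updates_per_month: float, rate: float) -> float:
    """Charge = volume cloned x update frequency x a hypothetical unit rate."""
    return megabytes * updates_per_month * rate


hrefs = ["prices.html", "/about.html", "http://vendor.example/news.html",
         "http://other.example/x.html", "mailto:sales@vendor.example", "#top"]
print(links_to_clone("http://vendor.example/stocks/index.html", hrefs))
```

Setting `whole_site=False` restricts the clone to relative links only, which corresponds to the narrower of the two scopes discussed above.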

This service of brokers that provide page cloning services is a direct attempt to address the popularity of high value information such as stock price quotation and highly timely news information. We argue that this approach is both scalable and feasible, in that enough brokers and agents will arise as market demands grow, and if too many of these sites are provided on the network then some will cease business through the natural process of attrition. This is in direct contrast to the approaches based on caching, where the central problem is who pays for the caching resources: the user or the provider? Since providers are typically offering commercial services it seems more logical for providers to meet this cost. If we view the network as a utility then services such as caching are to be provided by the network operator (who benefits through improved operating efficiency also), but if we view the network as a marketplace then the cost of caches must be carried by the market participants.

In order to provide a full service, brokers also need functions such as directory creation, special guides and feature highlights. Our development is aimed at providing these facilities at a low cost; for example, directory creation is largely automated. We propose that for privacy reasons brokers should be prohibited from interacting directly with customers. Instead they should deal with agents who act on behalf of users.

Our detailed modelling outlined in the previous section shows that for optimum network efficiency there should be a large number of Agents acting on behalf of users. Agents take user requests and search amongst the brokers to best satisfy these requests. They are designed to perform the Web crawling tasks required to search the information space ([Harvest 1995] [Oard 1995] [Robot 1995]). The value they add for the user is in reducing the number of Web pages the user has to examine in order to meet the request. There are two possible charging modes that can be employed: a subscription service for low value requests or a direct percentage fee for high value requests.

Of course the problem of filtering information to satisfy a request is not new. Publishers and librarians have been tackling this problem since the beginning of printed information. There is now a new aspect to this task in that a large proportion of content is available directly to be searched, allowing for full text searching of articles. Whilst at first glance this might appear to provide dramatic improvements, there are still fundamental obstacles to easily matching user requests. The variety and variability of language is not overcome simply by allowing full text searching. Intensive experimental research shows that this intuition is quite at odds with reality. See [Furnas 1987] and [Gomez 1990] for a profoundly revealing demonstration of these difficulties. At the present stage of Web development we are perhaps seeing a worldwide demonstration of these experiments, in the difficulty users face with large scale Web crawlers.

Typically there are two modes of searching to be provided: browsing and direct searching. Browsing is best guided by an overall impression of user interests. In our case we use a declared user model, but it is also possible to use an implicit model based on cached Web pages. The greater challenge, however, is to provide direct searching mechanisms that allow rapid association between pages, allowing a user to quickly navigate between related pages. Our experience in developing such systems [Jennings 1993] is that computationally intensive user models are not justified; instead we employ a simpler approach based on fuzzy association. The model is summarised in Appendix 2.

How should brokers and agents communicate? In the interests of an open market we should prefer a standardised scheme of messages. Although we propose differing roles for brokers and agents, this issue is almost identical to those faced by the Harvest system [Harvest 1995]. Solutions based on the news protocol or multicasting can be considered. As we have shown in the analysis of the previous section, it is important to keep these overheads to a minimum when we wish to allow a large number of brokers and agents.

In the proposed high value service we do not allow direct communication between brokers and buyers. This has advantages in protecting user privacy. At present many information providers are keeping extensive logs of user behaviour. Although such logs may seem innocuous, they represent a considerable surrender of autonomy by users. There are many situations where the interests of brokers and users may be in conflict; we avoid this by ensuring that brokers act exclusively in the interests of sellers and agents act exclusively in the interests of buyers.

Our direction of development of high value information sites shares much in common with the Harvest system as proposed by Schwartz [1993] and detailed further in [Harvest 1995]. This work provided the inspiration for our efforts, but there are some important differences. Our emphasis is on direct economies by cloning pages rather than by establishing pointers, and our approach to user navigation incorporates our experiences in experimental trials of filtering systems [Jennings 1993] to provide highly flexible navigation.

Association Services: Magazines

Most Web magazines do not attempt to extend the medium to include new possibilities, except for the extensive inclusion of links to related material. It is natural to attempt to stretch the medium further and to attempt to extend the notion of what a magazine can be on the Web. Magazines typically provide information of variable value for a very large audience: they can serve to highlight sources of information that the reader may not encounter in other reading.

If we take a magazine as a collection of articles by featured writers, then we can delineate several important roles that may be played by either network applications or by individuals. An editor is responsible for matching articles to an audience and ensuring that quality is maintained. Within the magazine itself, some authors serve other roles of importance. Navigators show readers pointers to other parts of the Web that are of direct relevance. Some applications such as "What's New" and "What's Hot" serve this purpose directly. Advisors answer reader queries on particular questions.

How can editors, navigators and advisors derive a suitable fee for their services? At present they must join a particular magazine, but in the future there may be specialist agencies that hire navigators and advisors and collectively advertise their services.

How should copyright issues be dealt with? It is often assumed that once a document is published on the Web then all copyright is lost, and this is taken as an argument to hide publications behind Mastercard or Visa walls, where the reader must deposit a payment before opening the page. However in many ways the Web is more secure against copyright violations than a photocopier. How can this be so? If a person copies a Web page and then attempts to publish it directly then it can be detected immediately by Web crawlers and searching programs. The author of the article can then directly take legal action against the violation. It is not difficult to imagine legal firms specialising in attacking these violations: it is much less difficult than searching for photocopies.

As the Web grows we can expect that navigation information will acquire higher value. The growth of isolated "islands" behind an online service is an attempt to deal with the confusion of many tens of millions of Web pages. There are alternative solutions based on a careful mixture of network services combined with skilled human assistance. It is to be hoped that skilled navigators can add their value to the network by being able to make a decent living from providing this service. Using the broker and agent framework we have described above, it is possible to construct network applications that serve the purpose of navigators and advisors. It is interesting to note that in the context of highly variable information value there will perhaps be the greatest role for dedicated professionals.

Gossip: Web Interactive Talk

Web Interactive Talk [WIT 94] is a development of the multiple user talk systems such as the MUD and like systems. A topic is proposed for discussion and posted to the system. Users can then respond with further comments and propose new topics. It is in essence a multiple party chat line with no editorial control over the content.

How can we add value to such a channel? In a similar manner to Internet news there is a large volume of discussion, very little of which is of interest to a particular user. One means to add value to such an information channel is to hunt for brief discussion sequences that are of interest. If a user volunteers a text description of his interests, in the form of a structured user model, then this can be used with a matching engine to highlight recent conversation streams that match to the topic.

Here the user offers a user model as a basis for filtering the talk sessions. It seems most sensible to provide this as a labelled Web page linked to the user's home page. The user model is a structured textual description that conforms to a set of prescriptions concerning topics and threads of interest. A matching service aims to provide a high degree of commonality between the conversational thread and the user's interests. The matching process works more effectively if the structure is correctly followed by the user, but it is also possible simply to use a free text user model.
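To make the matching step concrete, the following is a minimal sketch of filtering talk threads against a declared user model. The paper does not specify the structure of the user model, so the "topic: keyword, keyword" line format, the overlap score, the threshold and all function names here are assumptions for illustration only. Keywords are treated as single words.

```python
import re


def parse_user_model(text: str) -> dict[str, set[str]]:
    """Parse lines of the assumed form 'topic: keyword, keyword, ...'."""
    model: dict[str, set[str]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # free text lines are ignored in this sketch
        topic, _, terms = line.partition(":")
        keywords = {t.strip().lower() for t in terms.split(",") if t.strip()}
        if keywords:
            model[topic.strip().lower()] = keywords
    return model


def thread_score(model: dict[str, set[str]], thread_text: str) -> float:
    """Best fraction of any one topic's keywords found in the thread."""
    words = set(re.findall(r"[a-z0-9]+", thread_text.lower()))
    best = 0.0
    for keywords in model.values():
        best = max(best, len(keywords & words) / len(keywords))
    return best


def matching_threads(model: dict[str, set[str]], threads: dict[str, str],
                     threshold: float = 0.5) -> list[str]:
    """Titles of threads whose score reaches the (hypothetical) threshold."""
    return [title for title, text in threads.items()
            if thread_score(model, text) >= threshold]


model = parse_user_model("cycling: touring, wheels, gears\nfinance: stocks, dividends")
threads = {
    "Trip report": "New wheels made touring the coast much easier",
    "Lunch": "Anyone want pizza today",
}
print(matching_threads(model, threads))
```

A plain keyword overlap of this kind does not address the vocabulary problem discussed earlier; the association mechanism of Appendix 2 is one way to relax the exact-word requirement.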

Given the low level of value in most conversational services, such a service will be of interest only if the charging can be kept at a very low level. Perhaps the only feasible approach here is to offer the mail service as a subscription service. For a low fee over a time period, email messages are sent to the user when the profile indicates a thread or topic of value. If a user wishes to change his or her profile it is simply a matter of editing the Web page that describes the user model. Experimental news servers based on user models are available [NW 95] and this service is in a similar vein to information filtering of news. The use of a Web page as a user model makes this process much more transparent and avoids the need for expensive implicit user modelling [Jennings 1992]. An experimental version of this service is under development.

4. Discussion

In many ways the Web culture represents an idealistic view where bandwidth is plentiful and network congestion is banished. We argue that this view militates against the development of viable network services, and that when users access the "free" bandwidth and experience congestion they will be easily persuaded to join a proprietary network. This is in essence a question of how you value your time. If you can wait 3 minutes for a page to download then perhaps this is not an issue for you, but it will certainly be an issue for a high proportion of users. Only by developing viable services that address the reality of finite resources can an industry prosper.

The decision between utility based solutions and market based approaches is perhaps a choice between economies of scale and solutions that are highly adapted to specialised markets. We have already seen the emergence of powerful searching engines that allow full text searching: are they scalable and feasible? Our approach advocates a proliferation of specialised brokers and magazines that represent a fragmentation of this centralised approach. Is this an appeal for a cottage-industry approach to information trading? Our argument in favour of a specialised approach rests primarily on the research and experience in development of information filtering systems. As we have emphasised, the common sense argument is that extensive searching combined with full text capability will satisfy the vast majority of user needs. However the research and experience both strongly contradict this common sense view. Perhaps a glance at the proliferation of paper magazines on the average newsstand may be enough to illustrate the fundamental laws of information entropy.

At present the Web publishing arena is dominated by the traditional publishers at one extreme, and an enormous number of individual publishers at the other extreme. The solutions we have presented aim to fill the void between these two extremes.

5. References

[Bowman 1994]
Bowman, C.M., Danzig, P., Manber, U. & Schwartz, M. "Scalable Internet Resource Discovery: Research Problems and Approaches" Communications of the ACM, Vol. 37, No. 8, pp. 98-107, August 1994

[Chong Wai 1995]
Wai, C. & Jennings, A. "The Value of Intermediaries in Network Commerce" ftp://www.cse.rmit.edu.au/~rdsajj/int_nc.ps
[Cocchi 1993]
Cocchi, R., Shenker, S., Estrin, D. & Zhang, L. "Pricing in Computer Networks: Motivation, Formulation and Example" IEEE/ACM Transactions on Networking, Vol. 1, No. 6, pp. 614-627, 1993
[Elwalid 1993]
Elwalid, A.I., Mitra, D "Effective Bandwidth of General Markovian Traffic Sources and Admission Control of High Speed Networks" IEEE/ACM Transactions on Networking. Vol.1, No.3, pp.329-343, 1993
[Furnas 1987]
Furnas, G.W., Landauer, T.K. et al "The Vocabulary Problem in Human-System Communication" Communications of the ACM, Vol. 30, No. 11, pp. 964-971, 1987
[Gomez 1990]
Gomez, L.M. & Lochbaum, C.C. "All the Right Words: Finding what you want as a function of Richness of Indexing Vocabulary" Journal of the American Society for Information Science, Vol. 41, pp. 547-559, 1990
[Guerin 1991]
Guerin, R., Ahmadi, H. & Naghshineh, M. "Equivalent Capacity and Its Application to Bandwidth Allocation in High-Speed Networks" IEEE Journal on Selected Areas in Communications, Vol. 9, No. 7, pp. 968-981, 1991
[Harvest 1995]
Harvest System Home Page http://harvest.cs.colorado.edu
[Jennings 1992]
Jennings, A & Flower, M. "A Multimedia Shop" Proceedings of the First International Interactive Multimedia Symposium, Perth pp. 573-583, 1992
[Jennings 1992]
Jennings, A & Higuchi, H. "A Personal News Service based on a User Model Neural Network" IEICE Transactions on Information and Systems, Vol. E75-D, No. 2, pp. 198-209, 1992 ftp://www.cse.rmit.edu.au/~rdsajj/brow.ps
[Jennings 1993]
Jennings, A. & Higuchi, H. "A User Model Neural Network for a Personal News Service" User Modeling and User Adapted Interaction, Vol. 3, pp. 1-25, 1993
[Johnson 95]
Johnson, M.J & Mamdani, E.H. "Feedback In Internet Resource Discovery Systems: SIMON" http://www.elec.qmw.ac.uk/simon/irdpaper/ird-paper_1.html
[Larsen 1993]
Larsen, H.L. & Yager, R.R "The Use of Fuzzy Relational Thesauri for Classificatory Problem Solving in Information Retrieval and Expert Systems" IEEE Transactions on Systems, Man and Cybernetics, Vol. 23, No. 1, pp. 31- 41, 1993
[Mackie-Mason 1994]
Mackie-Mason, J.K. & Varian, H.R. "Some Economics of the Internet" Tenth Michigan Public Utility Conference, Western Michigan University, March 25-27, 1994
[Mackie-Mason 1994a]
Mackie-Mason, J.K. & Varian, H.R. "Pricing Congestible Network Resources" http://gopher.econ.isa.umich.edu/EconInternet/Pricing.html
[NW 95]
News Weeder http://anther.learning.cs.cmu.edu/ifhome.html
[Oard 1995]
Oard, D. Text Filtering http://www.ee.umd.edu/medlab/filter/filter_project.html
[Povey 1995]
Povey, D. "Distributed Internet Cache" ftp://www.psy.uq.edu.au/~dean/project/protocol.html
[RDW95]
Resource Discovery Workshop 1995, CSIRO, 723 Swanston St, Carlton, Melbourne Tuesday 11th July 1995 http://www.dstc.edu.au/RDU/rdw95/
[Robot 1995]
List of Robots http://web.nexor.co.uk/mak/doc/robots/active.html
[Schwartz 1993]
Schwartz, M. "Internet Resource Discovery at the University of Colorado" IEEE Computer, Vol. 26, No. 9, pp. 25-35, 1993
[WIT 94]
Web Interactive Talk http://www.w3.org/hypertext/WWW/WIT/User/Overview.html

Appendix 1 Network Flow Estimation

There is an extensive research literature devoted to estimating the required bandwidth of a stochastic traffic source in the network. Guerin et al. proposed a simple approximation for the equivalent capacity or bandwidth requirement of a single or multiplexed connection on the basis of its statistical characteristics [Guerin 1991]. Their computation is the combination of two different approaches: one based on a fluid-flow model and the other on an approximation of the stationary bit rate distribution. Elwalid and Mitra offer the computation of the effective bandwidth of a general Markovian traffic source [Elwalid 1993]. In this paper, we use this general Markovian source to estimate the bandwidth requirement for each traffic source.

In order to estimate the required bandwidth for each seller, we propose an example of a seller distributing multimedia advertisements to potential buyers. The advertisement consists of 5 seconds of video with CD-audio quality of sound, 10 seconds of voice annotated high resolution image and 20 seconds of voice annotated text. This is considered representative of advertisements on a commerce oriented network.


State  Type of information                    Mean bit rate (Mbps)  Mean holding time (s)

1      video/audio                            1.484                 5
2      voice annotated high-resolution image  0.512 (*)             10
3      voice annotated text                   0.0323                20

(*) 2000x2000 resolution, 12 bits/pixel

In the long run average, we assume that each seller is a generic Markovian source with multiple states corresponding to different bit rates.
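As a worked illustration of the long run average, the sketch below computes the time-weighted mean bit rate of one advertisement from the three states tabulated above. It assumes (our assumption, not stated in the model) that the states are visited once each per advertisement, so each state is weighted by its mean holding time. The variable and function names are hypothetical.

```python
# (type of information, mean bit rate in Mbps, mean holding time in s)
STATES = [
    ("video/audio", 1.484, 5.0),
    ("voice annotated high-resolution image", 0.512, 10.0),
    ("voice annotated text", 0.0323, 20.0),
]


def mean_rate(states) -> float:
    """Time-weighted average bit rate in Mbps over one pass through the states."""
    total_time = sum(hold for _, _, hold in states)
    total_mbit = sum(rate * hold for _, rate, hold in states)
    return total_mbit / total_time


def peak_rate(states) -> float:
    """Highest state bit rate in Mbps."""
    return max(rate for _, rate, _ in states)


print(round(mean_rate(STATES), 3), peak_rate(STATES))
```

Under these assumptions one advertisement carries 13.186 Mbit over 35 s, a mean rate of roughly 0.377 Mbps against a peak of 1.484 Mbps. The gap between the two is what makes an equivalent bandwidth estimate worthwhile.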

Each traffic source is considered as a fluid source of constant rate when in state s. For a given buffer size B and overflow probability p, we can derive the equivalent bandwidth e of a Markovian source, which is approximately equal to the maximal real eigenvalue of an essentially non-negative matrix.
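The eigenvalue computation can be sketched as follows. This is an illustration under stated assumptions, not the computation used for the results in this paper: it takes the commonly quoted form e = MRE(L + M/d), where L is the diagonal matrix of state bit rates, M is the generator of the Markov chain and d = ln(1/p)/B, and it assumes a cyclic chain that leaves state i at rate 1/(holding time of i). The buffer size and overflow probability in the usage line are hypothetical. The eigenvalue is found by shifted power iteration so that no external library is needed.

```python
import math


def effective_bandwidth(rates, holding_times, buffer_mbit, overflow_prob,
                        tol=1e-12, max_iter=100000):
    """Maximal real eigenvalue of L + M/d for an assumed cyclic fluid source.

    rates in Mbps, holding_times in s, buffer_mbit in Mbit.
    """
    n = len(rates)
    d = math.log(1.0 / overflow_prob) / buffer_mbit  # decay rate per Mbit
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        leave = (1.0 / holding_times[i]) / d  # scaled rate of leaving state i
        a[i][i] += rates[i] - leave
        a[i][(i + 1) % n] += leave            # assumed cycle: i -> i+1 -> ... -> 0
    # Shift the diagonal so the matrix is non-negative with a positive
    # diagonal; power iteration then converges to the Perron root.
    shift = max(0.0, -min(a[i][i] for i in range(n))) + 1.0
    for i in range(n):
        a[i][i] += shift
    v = [1.0 / n] * n
    eig = 0.0
    for _ in range(max_iter):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_eig = sum(w)  # v sums to 1, so this estimates the eigenvalue
        v = [x / new_eig for x in w]
        converged = abs(new_eig - eig) <= tol * abs(new_eig)
        eig = new_eig
        if converged:
            break
    return eig - shift


# Hypothetical buffer of 1 Mbit and overflow probability of 1e-6.
print(effective_bandwidth([1.484, 0.512, 0.0323], [5.0, 10.0, 20.0], 1.0, 1e-6))
```

The result always lies between the mean and peak rates of the source, and it moves towards the mean as the buffer grows or the tolerated overflow probability rises.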

Appendix 2 Matching Mechanisms

The variability of language requires that we make extensive use of Thesauri, or semantic association between words, when attempting to resolve a user query. Within a document collection we can model this as a set of implications [Larsen 1993] where an association gives an alternative meaning or interpretation.

Here we have shown the two Thesauri: the user's associations, and the document collection associations. To satisfy a user query we need to consider a range of associations from both models. These are constructed on the basis of statistical information gathered from the user and the document collection. Larsen's model allows us to directly calculate these associations by calculating in advance the strongest association paths between all words in both models. This is based on a max-star partition of the graph using the Floyd-Warshall algorithm. To satisfy a user query we look for the strength of connection between the user's terms and the indexing terms that appear in the document. As illustrated above, the use of a term "wheels" is related to a term "touring".

It is important to be able to rank documents without extensive computation to allow users to quickly search through large collections. With these paths stored we can compare different terms for searching directly through table lookup. Our experience in field trials of filtering systems [Jennings 1993] argues strongly that a rapid search capability is highly valued by users.
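The precomputation of strongest association paths can be sketched with the Floyd-Warshall recurrence. This is a minimal illustration, assuming that the "star" operation is the product of association strengths in [0, 1] (minimum is another common choice) and that the strength of a path is the star of its links. The intermediate term "bicycle" and all the strengths are invented for the example; only "wheels" and "touring" come from the text above.

```python
def strongest_paths(terms, direct):
    """All-pairs strongest association strengths (max-product closure).

    terms  : list of words
    direct : dict mapping (from_term, to_term) to a strength in [0, 1]
    Returns a nested dict usable as a lookup table: table[a][b].
    """
    table = {a: {b: (1.0 if a == b else direct.get((a, b), 0.0)) for b in terms}
             for a in terms}
    # Floyd-Warshall: allow each term k in turn as an intermediate step.
    for k in terms:
        for i in terms:
            for j in terms:
                via_k = table[i][k] * table[k][j]
                if via_k > table[i][j]:
                    table[i][j] = via_k
    return table


terms = ["wheels", "bicycle", "touring"]
direct = {
    ("wheels", "bicycle"): 0.8,
    ("bicycle", "touring"): 0.5,
    ("wheels", "touring"): 0.2,
}
table = strongest_paths(terms, direct)
print(table["wheels"]["touring"])
```

Here the indirect path through "bicycle" (0.8 x 0.5) is stronger than the direct association (0.2), so the stored strength for "wheels" to "touring" is the indirect one. Once the table is built, ranking a document against a query term is a lookup rather than a graph search.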


COPYRIGHT © 1995 by AUUG95 and APWWW95 Charles Sturt University. ALL RIGHTS RESERVED. ISBN 1 875781 43 9