AntWorld
Vladimir Meñkov - David J. Neu - Qin Shi
SCILS, Rutgers University
4 Huntington St, New Brunswick, NJ 08903
vmenkov@cs.indiana.edu, djneu@acm.org
http://aplab.rutgers.edu/ant/
The amount of information currently available on the World Wide Web or in specialized full-text databases (e.g., patents, legal documents) often makes it difficult to find useful documents using only purely automatic techniques, such as the keyword search. Finding information on a particular specialized topic--e.g., ``How does the Canadian tax law treat a Roth IRA?'' or ``How to use unsigned JavaScript in a layer within a signed HTML document?'' may require the human user to apply his or her intelligence in various ways: trying several different queries with various search engines, reading a number of documents found by a search engine to determine if they are relevant, following sequences of links, learning new terms to refine one's queries, etc. Thus, if a number of people with common interests and needs often perform web searches on related topics, much effort could be saved with a system that makes use of the intelligence of all its previous users to guide the current user to documents that are likely to be useful for his goals.
AntWorld [1,7], developed at Rutgers University and publicly available at our web site http://aplab.rutgers.edu/ant/, is the result of our attempt to design a tool which harnesses the expertise of the members of a common-interest group, as displayed by their evaluation of documents encountered while searching. It stores users' judgments about the documents they find, and uses this information to guide other users to pages they may find useful.
Our design criteria included the following:
In accordance with Criterion 1, the main concept behind AntWorld is that of an information quest: a persistent information-searching process [6], identified by a numerical ID. Each quest is owned by a user, and a user can switch between his quests at will. When the user starts a quest, he provides AntWorld with an explicit description of his goals. This description can be as short as a typical search engine query (just a few words), or it can contain several sentences. As the user keeps browsing the web, he is encouraged to tell AntWorld how well the documents he finds answer his question.
Since the user has to explicitly judge a page in order for his opinion to be recorded, our technique cannot collect as much data as systems that infer the user's opinion based on indirect parameters, such as the time spent viewing a document (e.g. [4]). Motivating users to make judgments, and not merely to use judgments made by other people, may be an important issue [8]. On the other hand, we hope that the data collected via explicit judgments are more reliable.
AntWorld uses a client-server architecture (Fig. 1). On the client side, as described in Sec. 3, each participating user's web browser collects the user's judgments and provides a variety of user aids: ranked lists of suggested web pages, marked links to suggested pages from the the page the user is currently viewing, etc. Each client talks to the Organizational AntWorld Server (OAWS), which is shared by the members of the user community (``organization'') the user belongs to. The OAWS stores in its database profiles of all AntWorld quests ever run by the members of the community. Every time a new page judgment arrives from a client, the OAWS updates the quest profile, performs quest matching and link scoring (Sec. 5), and finally sends the updated suggestion list back to the client. The OAWS is implemented using Java servlets and a Sybase SQL server database; relevant technical details and installation instructions can be found at the project web site [1].
![]() |
Although as of Spring 2000 there is just one Organizational AntWorld Server in active use, we expect that an organization or user group whose members desire to share their web searches will usually want to set up an OAWS of its own, to run on a computer within the organization's computer network (and within the organization's firewall, when applicable).
In a world with multiple organizations using AntWorld, we envisage three-level architecture for the entire system. The administrators of the participating Organizational AntWorld Servers may choose to periodically exchange their quest profile data via one specially designated OAWS: the AntWorld Central, using tools included into the AntWorld server distribution. For privacy reasons, exported quest data will normally be anonymized, so that they won't be identifiable with a particular person or e-mail address.
To satisfy Criterion 3, AntWorld's user interface is designed as a ``browser assistant''. The user can navigate as usual in one browser window (the document window, seen in the lower part of the screen shot in Fig. 2), while another, smaller, window (the console window, in the upper part of Fig. 2) opens when the user starts (or resumes) a quest, and stays on the screen as the user works, providing two-way communication with the Organizational AntWorld Server. On the one hand, the console allows the user to access the suggestion list, generated by the OAWS based on the information AntWorld has about the user's current quest and all other quests. On the other hand, it lets the user judge (and, optionally, annotate) the page he is viewing in the document window. The user can also use the console to submit the quest description to a search engine as a query, switch to a different quest, start a new quest, view the quest summary (the complete list of judgments and annotations made during the quest), view the list of similar quests known to AntWorld, edit the description of the current quest, or tune the method AntWorld uses for finding similar quests. The front-end console functionality has been implemented using HTML and JavaScript, while most of the actual computation is done on the server side.
![]() |
Similar ``browser assistant'' programs have been developed for other purposes elsewhere, for example by Chakrabarti et al [3], where the console is implemented as an applet rather than a JavaScript program.
Implementing the client as a Java applet would have offered us the advantage of a more powerful, compilable language. However, we did not pursue the applet approach beyond our 1998 version, primarily because at the time no mechanism for putting a standard Java Virtual Machine in a user's browser was available.
Web browser designers have long recognized that it is useful to mark (usually, by a changed color) links pointing to documents that the user have already accessed. AntWorld goes one step further, and allows an option for ``ant marking'': marking all links to ``relevant'' documents (i.e., those that appear on the current quest's suggestion list), by inserting a small ant-shaped icon next to them. One such icon is seen in Fig. 2, near Alta Vista search result No. 20. In our experience this feature helps a user not only to collaborate with other people but also to ``collaborate with oneself'', because it lets him quickly identify those of the previously useful links that he actually found useful for this quest or a related one.
To mark links, the system must parse all HTML files requested by the user, and modify them when needed. Doing this at the OAWS is not scalable, and intervening network delays can negatively impact system usability.
Our solution is to supply each user who wants the ant marking functionality with a small client-side application (``Antscape''), whose main component is the Local Ant Proxy Server (LAPS). Antscape works like a wrapper around Netscape. It can be used to modify the Netscape preference file, for the duration of the session, so that the browser will forward all HTTP and FTP request to the LAPS, as shown in Fig. 1. The LAPS analyzes the content type of each document received, and if it is an HTML document, parses it for links.
Since all pages requested by the browser pass through LAPS, using LAPS also allows us log on the organizational server each page the user views, and not only those he explicitly judges. Although these ``user itinerary'' data are not as important for AntWorld as page judgments, they are used in some quest matching methods (See ``Using LAD'' in Sec. 5) and in usability studies.
LAPS needs to have a significant amount of knowledge about the general AntWorld setup (where the OAWS is) and the current quest (the quest ID to log the pages under; the suggested URL list). It receives this information from the OAWS. Fixed-length data are transmitted via additional headers that the OAWS includes in HTTP responses it sends to the browser when the user starts a quest, switches to another quest, or judges a page. Putting the suggestion list into HTTP headers would be impractical because of its size; instead, when the OAWS sends the suggestion list to the browser (as a JavaScript array in an HTML document), it simply indicates that fact in an HTTP header, and the LAPS parses the document body to extract the suggestion list.
Although Antscape is written in Java, which is a highly portable language, it needs to have some idea of how Netscape is installed on particular platform in order to be able to interact with it. The currently available version of Antscape works with UNIX and Microsoft operating systems. It needs no special configuring on systems where Netscape is installed in the most standard fashion, but on other machines some amount of configuring is required, depending on how the local Netscape browser has been installed and configured. In certain cases (e.g., where a local site administrator uses Netscape Mission Control to ``prohibit'' browsers from ever using a proxy server), LAPS cannot be used at all.
In accordance with Criterion 4, using the Antscape client is optional: we want most web users to be able to ``test-drive'' AntWorld without installing additional software on their desktop.
If AntWorld is to be a useful tool for collaboration among members of a common-interest group, it needs an effective mechanism for making the user aware of the quests most similar to his current quest and guiding him to the pages that the owners of those quests found useful. To accomplish these goals, AntWorld uses a framework that consists of two stages: quest matching (QM) and link scoring.
Quest matching means finding quests in the AntWorld database
which are ``similar'' to a given quest, and ranking them according to
this similarity. This operation requires some quest similarity
measure,
,
which reflects how relevant quest
Q2 may be for a person whose goals and judgment history are
encapsulated in quest Q1. Most quest matching methods employed in
AntWorld are dynamic: they update the list of similar quests
list each time the owner of Q1 makes a judgment, thereby providing
AntWorld with additional information about his goals and preferences
in this quest.
Although an AntWorld user has direct access to the lists of similar quests, it is also important to provide him with a list of suggested URLs (``links''). This ranked list is available to the user both directly in his AntWorld console and, if the Antscape client is being used, as ant marks (Sec. 4). The suggestion list consists of URLs that were judged as useful in some of the quests that AntWorld believes relevant to the user's current quest. We refer to the process of compiling this ranked list as link scoring. A link scoring method similar to the one currently used by AntWorld has been presented in our earlier report [6].
For the purposes of a vector model, any query q or documents d can
be represented as a vector of term frequencies
or
,
,
where n is
the number of distinct terms in the document collection, and fi is
the term frequency, i.e. the number of occurrences, of term ti in
the document or query in question. These vectors are usually very
sparse, since every single document contains only a fraction of all
n terms from the collection. In a document-ranking method using a
vector model with a cosine similarity measure, the relevance of
document d for query q is computed as
AntWorld quest matching does not distinguish
between ``queries'' and ``documents'', since it measures the
similarity between entities of the same nature: the current quest
Q1 and some other quest Q2. Thus a symmetric similarity measure
seems more intuitive.
Such a symmetric measure can be obtained if the vectors
and
in (1) are computed by the same
formula, resulting in
An important vector for several AntWorld's quest matching methods is
.
This vector has a non-zero component for every term
t that appears in any of the documents in the profile of Q (the
set of terms we will refer to as
). The value of the
component
component is based on the frequencies of t in
the underlying documents and the importance of those documents for the
quest. All quest matching methods use the same term weighting
function g(t). Both
and g are fully described in our
earlier report [6], where we presented a quest matching
methods extending the document matching method proposed by Singhal
[9].
Some of AntWorld's quest matching methods, identified by their internal names, will be briefly described below.
The only difference between this method and the one in [6] is the denominator, which was absent in the earlier report. As one might expect, introducing this normalization did improve the quality of quest matching.
This quest matching method is dynamic:
and the
list of quests similar to Q1 are updated every time the owner of Q1makes a judgment.
Some of AntWorld's more interesting quest matching method are loosely
based on Logical Analysis of Data (LAD) [2]. LAD computes,
for two given sets of documents (the positive and negative training
sets), the cut set
:
a short set of terms that
best discriminate between the documents from the two sets. In the
AntWorld context, the positive training set for a quest Q consists
of the quest's explicit description and the documents that have been
judged useful, while the negative set includes those that have been
judged useless, or visited but not judged at all. Our current use of
LAD is rather simple: we use the quest's cut set
,
which is typically much smaller than the quest profile term set
(10-30 terms vs. 102-103), to reduce the cost
of computing dot products (1) or (2).
For example, the method Relevance13 uses a symmetric dot product
quest similarity measure (2) with
:
the vector
restricted to the terms that are in
the cut set
.
While this greatly reduces the scope of
the summation (2)--from
to
,
it also reduces
the quest matching recall, i.e. it results in some relevant quests not
being selected. Another disadvantage of this approach is that making
this quest matching method dynamic would require recomputing
after every judgment--a rather expensive
proposition.
A compromise approach, implemented as the method Relevance07,
restricts only
to the cut set
,
resulting in
Although the similarity measure used in this method is not truly
symmetric, we make it obey (3) by the expedient of always
computing
using full
instead of
.
Currently we are also experimenting with quest matching methods which use the LAD-produced cut sets in more complex ways, e.g. by taking into account the ranks of terms in the cut sets.
As already mentioned in Sec. 1, the main goal of AntWorld is to provide a finding-sharing tool for members of common-interest user communities. We hope it to be particularly useful for group of several people whose search tasks are both complex and overlapping, and who have a real need to find answers to their questions. We are interested in collaboration with academic or industrial groups who feel that the system may be of interest to them. So far, the main users of the system have been the AntWorld project staff and the students using AntWorld as paid testers or as an assignment for a class [8]. These two categories of users accounted for 4,200 judgments and 1,500 judgments, respectively, out of the 6,000 judgment recorded in AntWorld database from June 1998 to February 2000.
While current AntWorld users do benefit from discovering one another's quests, we have also observed the system's suitability for a number of other uses:
This work is supported by the Defense Advanced Research Projects Agency (DARPA) Contract Number N66001-97-C-8537.
Most of the ideas behind AntWorld were developed by the project's Principal Investigators: Paul B. Kantor, Endre Boros, and Benjamin Melamed. Bracha Shapira and Insuk Oh conducted user studies.
This document was generated using the LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)
Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -short_extn -split +0 -local_icons -address vmenkov@cs.indiana.edu dcw.tex.
The translation was initiated by Vladimir Menkov on 2000-04-04