Paul B. Kantor, Endre Boros, Benjamin Melamed, Vladimir Meñkov
Rutgers University
4 Huntington St, New Brunswick, NJ 08903
kantor@scils.rutgers.edu
http://aplab.rutgers.edu/ant/
In networked information environments, using server-browser architectures, nearly all information finding episodes become extended interactions between the user and the system. In this setting the system needs some way to ``understand" what the user is seeking, as this goal adapts and is modified during a session or a series of sessions. We describe a formal model, in which the model of the user's quest is represented as a generalized abstract ``response function" representing the user's response to the information delivered by the system. Representing this response as u(n) = Q( S(n-1)) shows that the user's utterance u(n) at a time step n is determined according to the user's ``response function" Q by the materials S(n-1) that had been presented up through the previous time step n-1.
The entire history of materials presented thus plays a role in determining the user's response, providing a very rich probe into the precise nature of the user's information quest, here represented by the rule Q. We show how this gives rise naturally to a new model for assimilating relevance feedback information, and to the concept of itineraries in the information network. Finally the concept of an information quest Q, provides a natural framework for considering the time dependence of information about the user's needs, and for various models of information aging. The use and effectiveness of this concept are illustrated with data collected in the Ant World Project at Rutgers.
The Ant World Project at Rutgers [Ant World, 1999,Kantor et al., 1999], simulates the laying of information-bearing traces in the World Wide Web. Since the links of the World Wide Web are incorporeal, these traces are actually stored in a database maintained at the Ant World server site (aplab.rutgers.edu [Ant World, 1999]). The contents of that database include a quest identification number, representing the quest of the user who left the mark, a pair of URLs indicating where the link comes from and where it goes to, and a user judgment, which varies according to the particular judgment scheme being tested at the moment. Unlike popularity-ranking systems such as Direct Hit [Direct Hit, 1999], which try to infer the user's opinion based on indirect feedback, such as the time he spends viewing a document, the Ant World asks the user to explicitly provide his judgment of the usefulness of the document. In early implementations the judgment is drawn on a 5 point scale from ``extremely useful'' to ``worthless''.
The ID of the quest Q is linked in the database to the quest's descriptions, provided by the user at the beginning of the quest: the short description, or SD(Q), and an optional long description, or LD(Q). The SD is a very short text, similar to a search engine query, for example ``data fusion information retrieval''. The LD is an extended natural language text, resembling a TREC query, e.g. ``I am interested in learning more about data fusion and information retrieval, and in particular on non-parametric methods for estimating the most effective rule for combining several IR systems''.
These short and long quest descriptions provide an initial representation of
what this particular individual user wants on this particular occasion. If
we think of them in terms of the quest response function, Q(S), which
represents the way in which the user with this particular quest responds to
what the system S has revealed so far, this is in fact the response to the
empty set:
.
Even in this raw form, obviously it contains
more information than that on which most web search
engines base their retrieval. In actual use the Ant World system in fact
does a preliminary retrieval from the base of stored quests, using this
information as a query. Before examining the details of that retrieval
mechanism, we consider what happens as individual pages are viewed and
judged.
As a new page is seen and judged, its contents are noted in a database, which is in effect an inverted index to all the pages that have been seen so far. In addition, the judgment is attached to that information together with the user's quest ID.
When the quest is created, the pairs (
,
SD(Q)) and (
,
LD(Q)) stored in the database constitute the quest profile. (The
and
are special symbols indicating that the texts
marked by them are an SD and an LD). As time progresses and the user visits
more pages, the quest profile is elaborated by the addition of pairs of the
form (J, d). In each such pair J is a tuple representing the user's
judgment, and d is the full text of the document that the user is seeing
when he or she makes this judgment. In a sense,
and
play the role of special judgments.
The judgment J is a tuple because not only the ``judgment label''
(the label on the button in the console window [Kantor et al., 1999]
the user clicked when he judged the
document), but also the configuration which determines the rule in
effect at the time that judgment was posted (such as what is the set
of available judgment buttons). In addition, for time
sensitive information, the tuple must also contain the time
at
which the judgment was posted. Thus the judgment J is
This information forms the basis for our investigations in quest description and quest matching. As noted, each such judgment can be represented as the response of the user to the system's presentation of the document represented by D, given everything that the user has experienced and can remember during the course of this quest, and from her entire outside life.
As judgments are applied to individual pages, we could, if we had a suitable differencing technology, infer that the judgment applies to that which is new and different about the given page. One such trivial technology is to subtract, from the vector representing the given page, a normalized vector representing the sum of previously seen pages. However, to represent the user's mental processes, each earlier page ought to be weighted according to the attention that the user gave to it, a quantity which is unknown to us. Hence, at present, we do not attempt differencing.
For matching of quests to each other, which is needed in order to
recall judgments posted by users whose quests most nearly match the
present quest, we adopt a hybrid vector retrieval system. Note that
the quest Q is defined as a mapping from the space of system
responses (pages delivered) to the space of judgments. This mapping
can be represented by its graph, as a set of ordered pairs. The set
of ordered pairs is what we store and manipulate in our database. The
representation of the present quest denoted by Q, plays the role of
a query in conventional information retrieval, and the collection of
stored quests Q' represents the corpus or document collection. In
general then we are concerned with measuring the quest similarity
score Sim(Q, Q') between the present quest and any particular
stored quest Q'. Given any specific choice of the similarity
function Sim
the system compiles the relevant
quest list: the list of stored quests sorted by decreasing similarity
score, down to some cutoff level.
For every document that has been judged by a user during any of the relevant quests, the system computes a putative usefulness score, based on the judgments about this document made during the quests on the relevant quest list. If the combined score exceeds a certain threshold (which is a variable parameter determined by the present operating configuration of the Ant World system), the document's URL is included into the suggestion list for the present quest Q: the list of URLs of potential interest to the user running quest Q.
During quest Q, whenever the page currently viewed by the user contains a link to any of the documents on the suggestion list, a small red ant appears on the user's screen next to this link.
To reiterate, the system must make two decisions. The first is the determination of similarity between the present quest and stored quests. The second is the setting of a threshold to determine whether ants should appear on the screen. The same threshold is also used to determine whether pages appear in a list of suggested pages, and the judgment of similarity is used to determine the order in which these pages appear.
As stated, the system uses a variant of the vector model. Each document dis represented by a vector over a base consisting of all the unstemmed non-stop-word terms that have appeared in the collection C, which consists of N documents: all documents viewed during all the recorded quests, plus the SD and LD of all quests. For every document d in the collection and for any non-stop-word term toccurring in it, we store f(t,d): the (raw) term frequency, i.e. the number of occurrences of term t in document d.
In our current computational model (as of May 1999), the similarity of
quests is based upon the similarity of the labeled documents
associated with the quests, i.e. the quest's SD and LD, and the set of
documents judged during the quest). The similarity of documents
d1 and d2 is computed using Singhal's formula [Singhal, 1997]:
The term prevalence factor g(t) in equation (2) is defined
so that it is high for rare terms and low for common words:
| (9) |
In general, one can define the similarity measure of two quests Q1 and
Q2 via the similarity of the underlying documents in the following
manner:
The coefficients
w(J1,J2) should be defined to indicate the
importance of the documents involved for the quests. A reasonable
model would probably have
as a large positive
number, and
In practice, performing fast real-time computations with an arbitrary coefficient matrix w may produce further bottlenecks. For this reason we have so far limited our experiments to a simpler model in which w has a product form (rank w=1):
The currently used
is defined as follows: For short
and long descriptions,
and
.
For a ``page evaluation'' judgment
,
is in the range [0;1], depending on the
judgment label s and configuration v, but not (yet) on the
judgment time
.
The actual conversion table (from s to J) is
created for each configuration v when the configuration is created,
along with the names of the judgment buttons, as in Table 2.2.
Note that at present we are making no use of the conceptual distinction between a page that seems useful in a navigational way, and one that is useful in terms of providing ``information'' per se. We have not yet conducted the human factors experiments needed to tell us whether these should be teated differently. If the are to be treated differently we will most likely have to give up the simple product form for w.
Factorization (11) allows to factorize the quest similarity
formula (10) as
Our quest matching is dynamic. This means that we update the tables
that store the quest profile data (
)
and term prevalence factor (g)
in real time. As the user makes more judgments during his quest Q, the
relevant quest list and the suggestion list are continuously updated.
A disadvantage of document and quest similarity formulas
(2, 12), as compared to
cosine models [Salton, 1971],
is that there is no natural upper bound, such as 1.0, on the value
of this similarity measure. (Since all
's are non-negative,
the Sim(Q,Q') can only increase as new judgments are added
to Q or Q'!)
When compiling the list of
relevant quests Q' for the present quest Q, ordering
the quests by the degree of relevance, or scoring
a link, we normalize all similarity contributions by
Sim(Q,Q) (``the similarity of the quest to itself'').
Thus the list of quests relevant to Q is
This normalization is still far from perfect solution, however. If
Q' is a much larger quest than Q (i.e. it includes a much greater
number of judged documents), Sim(Q,Q')/Sim(Q,Q) may be quite
high -- theoretically, even greater than 1 - even if the similarity
Sim(d,d') of each particular pair
of
underlying documents is low. A possible solution, which we have not
tried but which is very much in line with the traditional cosine
model, is to normalize Sim(Q,Q') by dividing it by
.
Choosing the values of the coefficients
's in (11) is
a tricky issue. How important should the SD and LD be? In earlier
experiments, when
and
were
much higher than the current values of 6 and 3 (on the order of
10-20), the influence of the quests' extended descriptions (judged
pages) on the determination of quest similarity was insignificant:
basically, we just matched the short and long descriptions of one
quest to those of the other. With the current values,
and
,
extended
descriptions are important--but the chance of spurious quest
similarity are quite high too. The latter phenomenon is due to the
fact that even when the topics of two documents d1 and d2 are
quite different, their similarity value Sim(d1,d2) can be quite
high because of numerous not-very-common words that they share. With
large quests, this may add up quickly.
|
|
We illustrate some of the issues surrounding quest matching with the following example. (In this example, all numbers include stopwords.) Document d1= http://coombs.anu.edu.au/SpecialProj/APM/TXT/low-m-02-96.html (document ID=-8117), with nDU(d1) = 1236, comes from a quest on Japanese history (quest ID=7856); document d2= http://www.geocities.com/CapitolHill/2078/npa7.htm (document ID=-958), with nDU(d2) = 568, comes from a quest on Philippine guerilla movements (quest ID=1312). The self-similarity of quests, using Singhal's formula, Sim (d1,d1)=1730.6, and Sim (d2,d2)=1821.7, comes to a large extent from words that are indeed specific to the topic. But Sim (d1,d2)=231.7 mainly comes from words that can be found in any paper on a political or historic topic. The top 20 terms contributing to each of the three similarity values are shown in Table 2.3. From the first two lists we get a rough idea of what the documents d1 and d2 are about; but the third list (that of the top-contributing words the two documents have in common) seem to convey little idea about either document.
We have alleviated the spurious similarity problem by omitting stopwords from all computations, using the popular 429-word stopword list [Fox, 1992]. Excluding stopwords helps because their contributions to Sim(d1,d2) for unrelated documents d1and d2 is relatively more significant than their contribution to either document's similarity to itself. For the above example, the effect of stopword exclusion is illustrated in Table 2.3. No doubt the situation would be improved much more if we represented texts in terms of two-word and three-word ``phrases'' based on statistical analysis. This technique has been used (with the initial exclusion of single terms) in the work of Schatz and Chen [Chen et al., 1996].
Another precision problem we observe in an Ant World search arises because the formula (15) is additive. A document need not appear in a highly relevant quests to get a high score; it is enough to appear in a large number of marginally relevant (just above the cutoff for the inclusion into the relevant quest lists) quests.
The Ant World quest database is stored using a relational database
management system (Sybase SQL Server 11). At present (May 21, 1999)
the database contains informations on 1,000 non-trivial quests (quests
with at least 2 documents viewed), which have been run by the project
staff and a few student testers over the previous 11 months. During
these quests 8,000 unique URLs have been viewed, and N=1650 of them
have been judged. There are 23,000 itinerary and judgment entries
(one entry per each document viewed and per each judgment). There are
entries in the Inst table, which contains the
entire text of all N judged documents split into words. (We don't
currently use this table, but it is maintained against the time when
we consider pairs and triples of consecutive words.) The table Freq, which contains the term frequency data f(t,d) contains
462,000 entries, and the table Psi, which contains the quest
profile vectors
has 305,000 entries. The judged documents
contain 65,000 different words; for each word, there is an entry in
the Prev table, which stores the term prevalence values g(t).
The algorithm we use to compute quest relevance and link scores according to
the formulas (3, 4, 8, 12)
presented in
the
"Quest Matching" Section
does not scale well as the number of users
and quests grows, since every time a user makes a judgment about document
d, a large number (
)
of entries need to be added to the table
describing the current quest profile.
We are running the database server on a Sun Sparc Ultra-1 workstation (aplab.rutgers.edu), with a 167 MHz processor (according to psrinfo) that performs at about 15 Mflops at simple timing tests, and 246 megabytes of RAM (according to top). The web server runs on the same computer.
![]() |
![]() |
![]() |
Depending on the quest and the current load of the computer, it takes
1-10 seconds (wall clock time) to incrementally update the
vector (equation (13) after a new document is added, as
shown by the histogram in Figure 1. To compile the
relevant quest list as per equation (14) takes 1-5
seconds. It takes 0.1-2 sec to compile the suggested documents
list (the list of all d whose scores (15) are above
the threshold 0.1; Figure 3).
This work is supported by the Defense Advanced Research Projects Agency (DARPA) Contract Number N66001-97-C-8537.
This document was generated using the LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)
Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -short_extn -split +0 -local_icons -address kantor@scils.rutgers.edu quest.tex.
The translation was initiated by Vladimir Menkov on 1999-05-28