Findings
Excite consistently returned the larger total number of hits in every search, which indicates that Excite has a more exhaustive index of the WWW than WebCrawler. For a thorough, "leaving no stone unturned" search on a topic, a retrieval set that includes every document that even mentions the search string may be desirable. Most searches, however, are in my view "quick and dirty" searches: the searcher does not want every document returned, only those most likely to be relevant. Thus, precision and relevancy ranking are usually what matter most to the average searcher.
Table 1: Total Returns
| Search | Excite | WebCrawler |
| 1 | 6,751 | 333 |
| 2 | 23,042 | 674 |
| 3 | 469 | 4 |
| 4 | 92,563 | 9,569 |
| 5 | 2,572 | 120 |
| 6 | 5,632 | 202 |
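To make the size difference in Table 1 easier to see, the totals can be turned into ratios. The following is only a small sketch: the figures are copied from Table 1, while the names `total_returns` and `return_ratios` are my own and not part of either search engine.

```python
# Total hits per search, copied from Table 1 as (Excite, WebCrawler).
total_returns = {
    1: (6751, 333),
    2: (23042, 674),
    3: (469, 4),
    4: (92563, 9569),
    5: (2572, 120),
    6: (5632, 202),
}


def return_ratios(totals):
    """Map each search number to Excite hits divided by WebCrawler hits."""
    return {
        search: excite / webcrawler
        for search, (excite, webcrawler) in totals.items()
    }


ratios = return_ratios(total_returns)
for search, ratio in sorted(ratios.items()):
    print(f"Search {search}: Excite returned {ratio:.1f}x as many hits as WebCrawler")
```

The ratio is only a rough indicator of relative index size, since it depends on how each engine interprets the query.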
At first it would appear that, because Excite returns far more hits than WebCrawler for each search, precision and the relevancy ranking of the documents in the retrieval set would be more important in searches done with Excite than in searches done with WebCrawler. Precision and relevancy ranking go hand in hand for large retrieval sets. For any search engine with exhaustive indexing, precision may be low if it is judged on the entire retrieval set, but it should be high if it is judged on the topmost returned items. This combines a "quick and dirty" search with a "no stone unturned" search: those needing only the most relevant documents will not have to look far into the list of returned hits, while those wanting a thorough search have every record containing their search terms returned. Excite's precision scores were equal to or better than WebCrawler's in each of the six searches performed. However, even though WebCrawler returns smaller retrieval sets, and thus is not a recommended search engine for "no stone unturned" searches, precision and ranking are still very important for it. First, some of its retrieval sets were still fairly sizeable (9,569 hits for "memorial AND day"). Second, since WebCrawler seems to be geared more toward "quick and dirty" searching, relevancy ranking is essential to precision: it helps ensure that the most relevant documents are included in the portion of the retrieval set shown to the searcher.
Table 2: Precision
| Search | Excite | WebCrawler |
| 1 | 0.04 | 0.04 |
| 2 | 0.04 | 0.02 |
| 3 | 0.02 | 0.00 |
| 4 | 0.04 | 0.00 |
| 5 | 0.72 | 0.06 |
| 6 | 0.06 | 0.02 |
At first glance, comparing the precision of each search suggests that Excite does a better job of ranking the hits it returns. This could be heavily influenced, however, by the size difference between Excite's and WebCrawler's indexes. Since the Excite index is much larger, Excite is more likely than WebCrawler to find relevant documents to rank and place among the top fifty results of its retrieval sets.
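The precision figures can be read as the fraction of the topmost results judged relevant. The sketch below shows that calculation under two assumptions of mine: that the cutoff is the top fifty results, and that a list shorter than the cutoff is divided by its own length. The function name `precision_at_k` and the sample judgments are illustrative only.

```python
def precision_at_k(relevance_judgments, k=50):
    """Fraction of the first k ranked results that were judged relevant.

    relevance_judgments: booleans in rank order (True = judged relevant).
    Assumption: if fewer than k results were returned, divide by the
    number actually returned rather than by k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = relevance_judgments[:k]
    if not top_k:
        return 0.0  # nothing returned, so nothing relevant was shown
    return sum(top_k) / len(top_k)


# Illustrative judgments: 2 relevant documents among 50 returned.
# The positions of the relevant documents are made up.
judgments = [True, False, False, True] + [False] * 46
print(f"precision = {precision_at_k(judgments):.2f}")
```

Because only the top of the list is judged, the measure rewards an engine for ranking relevant documents early, regardless of how many thousands of hits follow.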
One way to determine which engine does a better job of ranking the hits it returns is to look at documents returned by both engines and to compare how each engine ranked them. It is interesting to note, however, that in half of the searches there was no overlap in the documents returned, and in the remaining half only one or two documents overlapped. Every overlapping document that one search engine ranked first was likewise ranked first by the other. For search 1, document B, WebCrawler listed the document second on its return list, while Excite listed it fourth. However, Excite's total returned hits for that search was twice that of WebCrawler; with twice the "noise," it placed the document twice as far down. Still, since Excite claims to use a proprietary "intelligent" search engine (ICE) that is supposed to help it return the best relevancy, twice the noise should matter little. In this search (dancing AND dragon), both search engines returned the same two documents. One, "Dancing Dragon Designs," was ranked first by both. WebCrawler listed the second document, "Dancing Dragon" (see note 1), as second, while Excite listed two documents from the "Dancing Bear - PLS Dragon King Home" pages ahead of it. I have difficulty reconciling that with Excite's assertion that its ICE search engine ranks documents not only on keywords but on correlation of concepts as well. The "Dancing Bear - PLS Dragon King Home" pages contained many references to the keywords "dancing" and "dragon," even though "dancing" was directly associated with "bear" and not with "dragon." This points to a statistical analysis of the repetition of keywords within a document more than to any correlation of concepts. The same appeared to be true for search 5, document B. WebCrawler consistently gave higher relevancy scores than Excite for the same document, even for documents listed first in both engines' retrieval sets.
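The comparison of overlapping documents described above can be sketched as a small routine that finds the documents common to two ranked lists and reports the list position each engine gave them. The function name `compare_rankings` is hypothetical, and the two lists below are a simplified reconstruction of the top of the "dancing AND dragon" results; the "(page 1)" and "(page 2)" labels are my own, added only to tell the two "Dancing Bear" pages apart.

```python
def compare_rankings(list_a, list_b):
    """Return {document: (position_in_a, position_in_b)} for shared documents.

    Positions are 1-based list positions; if a document appears more than
    once in a list, its first (highest) position is used.
    """
    first_pos_b = {}
    for position, doc in enumerate(list_b, start=1):
        first_pos_b.setdefault(doc, position)

    overlap = {}
    for position, doc in enumerate(list_a, start=1):
        if doc in first_pos_b and doc not in overlap:
            overlap[doc] = (position, first_pos_b[doc])
    return overlap


# Simplified top-of-list results for search 1 (dancing AND dragon).
excite_results = [
    "Dancing Dragon Designs",
    "Dancing Bear - PLS Dragon King Home (page 1)",  # label is illustrative
    "Dancing Bear - PLS Dragon King Home (page 2)",  # label is illustrative
    "Dancing Dragon",
]
webcrawler_results = [
    "Dancing Dragon Designs",
    "Dancing Dragon",
]

overlap = compare_rankings(excite_results, webcrawler_results)
for doc, (excite_pos, webcrawler_pos) in overlap.items():
    print(f"{doc}: Excite #{excite_pos}, WebCrawler #{webcrawler_pos}")
```

Matching documents by title is itself an assumption; in practice the same page may appear under different addresses or titles, as note 1 shows.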
Table 3a: No. of overlapping documents
| Search | Overlap |
| 1 | 2 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| 5 | 2 |
| 6 | 1 |
Table 3b: Search engine ranking of the overlapped documents
| Search / Document | Engine | List position | Relevancy Score |
| Search 1: Document A | Excite | 1 | 85% |
| Search 1: Document A | WebCrawler | 1 | 94% |
| Search 1: Document B | Excite | 4 | 80% |
| Search 1: Document B | WebCrawler | 2 | 93% |
| Search 5: Document A | Excite | 1 | 93% |
| Search 5: Document A | WebCrawler | 1 | 95% |
| Search 5: Document B | Excite | 31 | 90% |
| Search 5: Document B | WebCrawler | 6 | 92% |
| Search 6: Document A | Excite | 1 | 90% |
| Search 6: Document A | WebCrawler | 1 | 94% |
Excite returned at least as many unique hits as WebCrawler in every search; in only one search (search 1) did it return no unique hit at all. WebCrawler returned a unique hit in only two searches, and just one hit in each. Again, this may very well be due to Excite having a much larger index.
Table 4: No. of unique documents
| Search | Excite | WebCrawler |
| 1 | 0 | 0 |
| 2 | 2 | 1 |
| 3 | 1 | 0 |
| 4 | 2 | 0 |
| 5 | 35 | 1 |
| 6 | 3 | 0 |
1. Dancing Dragon had, to my surprise, two official websites at two different domain addresses, with slightly different page titles: "Dancing Dragon" and "Dancing Dragon Designs." After comparing them, I found that the two sites belonged to the same mail-order catalog company.