| Given the chaotic nature of the Internet and the almost boundless amount of information available on it, a search can be daunting and time-consuming. This study compares Excite and WebCrawler to see how their indexing practices affect search retrieval results. A list of six proper noun phrases was used as search strings in the comparison. Results were examined for relevance and uniqueness. The retrieved item sets were also compared with each other to test the effectiveness of using a Boolean operator in a proper noun phrase search on the two search engines. |
|---|
Individuals often need to search, or surf, the Internet to keep up with the constant addition of new items in the areas that interest them. One survey, Find/SVP's American Internet User Survey (1996), determined that 44% of Internet users search weekly and 24% search daily. An ever-growing number of search engines is available to fill a surfer's needs, and the abilities of the engines themselves continue to grow. Thus, Internet surfers need to keep up not only with changes in the Internet but also with changes in the search engines.
Because of this constant flux, individuals tend to get used to one or two search engines. They rarely re-evaluate an engine they tried earlier, especially one they judged poor at first and never used again, and they tend to be reluctant to try out yet another new engine. For this reason I chose to compare a relative newcomer to the search engine field, "Excite," with one of the first search engines, "WebCrawler."
WebCrawler is one of the oldest and most popular web search engines, and it has changed much since its beginnings. I used to use WebCrawler only when I knew the site I was looking for was popular or well established, as it returned such sites quickly and toward the top of its list of returned hits. For the most part, however, my searches were for more esoteric information, and I used other search engines. Excite is one of the newest search engines and makes strong claims about its precision and WWW coverage.
The Search Engines:
1. Excite
Excite is one of the newcomers in the search engine field. First called Architext, it was developed by six Stanford University graduates in September 1993. It uses the Intelligent Concept Extraction (ICE) search engine, a proprietary concept-based technology whose statistical algorithms rank documents by both keywords and correlation of concepts. Excite claims to have the largest index and to return the most relevant results.
Search Features/Aids:
Size: 50 million pages actually retrieved and indexed.
Stop Words: Use not indicated.
Updated: weekly.
2. WebCrawler
WebCrawler was one of the pioneer full-text search engines on the Net and is said to be one of the most popular. It began as a project by Brian Pinkerton at the University of Washington in Seattle. It went public in April 1994 and was showcased on Netscape's Net Search page until it was bought by America OnLine in 1995. The content is now produced by both America OnLine in San Francisco, CA and the Global Network Navigator, Inc. editorial team at Berkeley, CA. It uses Personal Library Software on Silicon Graphics Challenge servers (housed in leftover hardened missile silos). It ranks retrieved hits by the frequency with which the search term(s) occur in a document and by the uniqueness of the search term(s) to a document (if one of the search terms occurs in only a few documents, the documents in which it does occur rank high in relevancy). It claims to be one of the fastest search engines. It also claims that adding search operators will not "fundamentally change your top [25] results" but just "limit the total number of results you get back" (WebCrawler Help: Search Features).
Search Features/Aids:
Size: Over 500,000 pages.
Stop Words: The stop words used are not listed; WebCrawler says only that it "aggressively" does not index such common words as "web" or "WWW" or alphanumeric strings.
Updated: weekly.
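
The ranking rule attributed to WebCrawler above (hits ranked by how often the search terms occur in a document, with extra weight for terms that occur in only a few documents) can be illustrated with a small sketch. This is not WebCrawler's actual algorithm, which is not documented here. The function name `rank`, the sample documents, the logarithmic rarity weight, and the two-word stop list (taken from the "web" and "WWW" examples above) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Illustrative stop list only: the two example words quoted above.
STOP_WORDS = {"web", "www"}


def rank(query, docs):
    """Return document names ordered by a frequency-times-rarity score.

    Hypothetical sketch: score = sum over query terms of
    (occurrences in the document) * (log(N / document frequency) + 1).
    """
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    tokenized = {
        name: [w for w in text.lower().split() if w not in STOP_WORDS]
        for name, text in docs.items()
    }
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each query term occur?
    doc_freq = {
        t: sum(1 for words in tokenized.values() if t in words) for t in terms
    }

    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for t in terms:
            if doc_freq[t] == 0:
                continue  # term occurs nowhere in the collection
            rarity = math.log(n_docs / doc_freq[t]) + 1.0
            score += counts[t] * rarity
        if score > 0:
            scores[name] = score

    # Highest score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Made-up sample collection for the usage example.
SAMPLE_DOCS = {
    "carver": "george washington carver taught at tuskegee",
    "travel": "washington state and washington dc travel guide",
    "county": "the history of washington county",
    "apples": "apple orchards",
}

ranking = rank("washington carver", SAMPLE_DOCS)
print(ranking)
```

In this sample, "carver" occurs in only one document, so that document outranks the one that merely repeats the more common term "washington". This is the behavior the parenthetical remark above describes.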