Tribune



Monday, January 29, 2001
Lead Article

Hunting on the Wild Wild Web
By Naveen S. Garewal

THE growth of the Internet has led to a paradoxical situation. On the one hand, a huge amount of information is available on the Internet; on the other, the sheer volume of that information makes it hard to find what is relevant in a fast, efficient and reliable manner. Every additional bit of information makes it harder to find the right information quickly. To tackle the issue, computer programmers came up with a piece of software called a "search engine", which looks through the information chaff to find the grain that you want.

A search engine looks through a billion pages on the World Wide Web, all loosely tied together, for documents containing keywords or phrases of interest. It acts like an information robot, or "info-bot": a sort of obedient genie that digs up thousands of documents quickly. To make it all the more surreal, search engines let you examine each retrieved file with just a click of the mouse.

 

What does a search engine do that cannot be done manually? It relies on yet another piece of software, called a "spider", which follows every link on the Web pages it scans and grabs the keywords on each page. The links that the search engine throws up depend on the query. A carefully worded query can hit the nail on the head in the first go, while a loosely framed search can throw up thousands of unwanted results, making life more miserable.
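To make the idea concrete, the following minimal Python sketch shows the kind of keyword index a spider's harvest can be turned into: each word points to the pages that contain it, so a query becomes a quick lookup instead of a scan of every page. The pages and their text are invented, and treating every query word as required is an assumption made for simplicity; real engines add ranking and much more.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of page URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages standing in for what a spider has collected.
pages = {
    "a.html": "Computer memory and hard disk prices",
    "b.html": "Memory tricks for exams",
    "c.html": "Buying a computer: memory, CD-ROM and more",
}
index = build_index(pages)
print(sorted(search(index, "computer memory")))  # ['a.html', 'c.html']
```

Because the index is built once and consulted many times, answering a query costs a few dictionary lookups rather than a fresh read of every page.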

With many search engines available, each claiming to be more efficient than the others, Net surfers are bound to be confused. Which is the best search engine? The answer is not simple: it is genuinely hard to tell which engine will work best for a given search. A little insight into the way search engines work can prove very useful.

All search engines can broadly be categorised into five types: robotic Internet search engines, mega-indexes, simultaneous (parallel) mega-indexes, subject directories and the robotic specialized search engines.

Robotic Internet search engines traverse the Web’s hypertext structure by retrieving a document and then recursively retrieving all the documents it refers to. Such programs are also known as "spiders", "Web wanderers" or "Web worms". Robotic search engines attempt to cover, at random, a significant portion of the World Wide Web. They examine the part of the Internet whose Uniform Resource Locator (URL) addresses start with http:// or www, as well as parts of the Internet reached through HyperText Markup Language (HTML) links. Popular search engines in this category include AltaVista, Excite, HotBot, InfoSeek, Lycos, Open, Ultra, WebCrawler, etc.
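The recursive retrieval described above amounts to walking a graph of links. The sketch below crawls a small, made-up link graph breadth-first, keeping a set of visited URLs so that no page is fetched twice and loops do not trap the spider, and stopping at a depth limit. The in-memory `WEB` dictionary and the `fetch_links` helper are hypothetical stand-ins for real HTTP downloads and HTML link extraction.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
WEB = {
    "http://home": ["http://news", "http://sport"],
    "http://news": ["http://home", "http://weather"],
    "http://sport": ["http://cricket"],
    "http://weather": [],
    "http://cricket": ["http://sport"],
}

def fetch_links(url: str) -> list[str]:
    """Stand-in for downloading a page and extracting the links on it."""
    return WEB.get(url, [])

def crawl(start: str, max_depth: int) -> list[str]:
    """Breadth-first crawl from `start`, following links up to `max_depth` hops."""
    visited = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)  # "retrieve" the document
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in fetch_links(url):
            if link not in visited:  # skip pages already seen, which also breaks loops
                visited.add(link)
                queue.append((link, depth + 1))
    return order

print(crawl("http://home", max_depth=1))
# ['http://home', 'http://news', 'http://sport']
```

Breadth-first order is one reasonable choice: it gathers pages near the starting point first, so a crawl cut short by a depth or time limit still covers the neighbourhood evenly.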

Meta-indexes or mega-indexes, the second category, have no databases of their own. Instead, they link to robotic search engines. There are thousands of such mega-indexes, and many are simply personal Web pages with links to search engines. Examples of meta-indexes include @Once!, All in One, Galaxy, Internet Sleuth, Magellan, Net Search, etc.

A variant of these, known as multi-threaded meta-indexes or simultaneous (parallel) mega-indexes, queries several robotic Internet search engines at the same time and presents the combined results as a single package. The two best-known simultaneous mega-indexes are MetaCrawler and Savvy Search.
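The parallel querying described above can be sketched with Python's standard thread pool: the same query goes to every engine at once, and the answers are merged into one list with duplicates dropped. `engine_a` and `engine_b` are hypothetical stand-ins that return canned URLs; a real meta-search tool would call each engine over the network and would also need some way to re-rank the merged list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical engines: each returns a ranked list of URLs for a query.
def engine_a(query: str) -> list[str]:
    return ["http://x.com", "http://y.com"]

def engine_b(query: str) -> list[str]:
    return ["http://y.com", "http://z.com"]

def meta_search(query: str, engines) -> list[str]:
    """Query all engines in parallel and merge their results, dropping duplicates."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # map() runs the calls concurrently but yields results in engine order
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:  # keep only the first occurrence of each URL
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("cricket", [engine_a, engine_b]))
# ['http://x.com', 'http://y.com', 'http://z.com']
```

Because `pool.map` returns results in the order the engines were listed, the merged list is deterministic even though the calls run concurrently; in practice a time-out per engine would be sensible so that one slow engine does not hold up the whole package.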

Subject directories, the fourth category, are often maintained by hand; they can be browsed and are frequently searchable with robotic search engines too. Yahoo! is the most famous in this category and files its listings under several subject headings. Once a query is submitted, Yahoo! automatically connects to AltaVista to search the Web at large. In a sense, Yahoo! is also a mega-index, since its hypertext links will take you to other robotic search engines besides AltaVista.
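A subject directory is, at heart, a hand-built tree of categories. The sketch below (invented categories and URLs) matches the query against category names and, if nothing matches, passes it on to a robotic engine. Falling back only when the directory comes up empty is an assumption made for illustration, and `fallback_engine` is a hypothetical placeholder, not a real AltaVista interface.

```python
# Hypothetical hand-maintained directory: categories nest, and each leaf lists sites.
DIRECTORY = {
    "Sports": {
        "Cricket": ["http://cricket.example"],
        "Hockey": ["http://hockey.example"],
    },
    "Computers": {
        "Hardware": ["http://hardware.example"],
    },
}

def fallback_engine(query: str) -> list[str]:
    """Hypothetical stand-in for a robotic engine that searches the Web at large."""
    return [f"web-search:{query}"]

def directory_search(tree: dict, query: str, path: str = "") -> list[str]:
    """Collect the sites filed under every category path that contains the query."""
    hits: list[str] = []
    for name, child in tree.items():
        full_path = f"{path}/{name}"
        if isinstance(child, dict):
            hits += directory_search(child, query, full_path)  # browse one level deeper
        elif query.lower() in full_path.lower():
            hits += child
    return hits

def lookup(query: str) -> list[str]:
    """Try the directory first and fall back to the robotic engine if it finds nothing."""
    return directory_search(DIRECTORY, query) or fallback_engine(query)

print(lookup("sports"))   # ['http://cricket.example', 'http://hockey.example']
print(lookup("cooking"))  # ['web-search:cooking']
```

Matching on the whole category path means a broad word such as "sports" returns everything filed beneath that heading, much as browsing down from it would.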

The last category is the robotic specialized search engines. Each of these focuses on one portion of the Internet: the World Wide Web; newsgroups and discussion lists; files available by File Transfer Protocol (FTP); people (white pages); companies (yellow pages); or software. Pages that gather links to such specialized engines, for example yellow pages and white pages, serve as a convenient one-stop location.

Searching the Internet has itself become an industry, with big players like Netscape investing in search pages and bringing out books on how to search the Web. Search engines such as AltaVista and Lycos attempt to index most of the pages available on the Net, and most spider search engines allow advanced Boolean logic, truncation and field searching. Unless a query is carefully worded, however, the search results can be quite illogical.
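Field searching can be illustrated with a toy example in which each page is split into named fields. The `field:word` syntax below is an assumption, modelled loosely on the title: and url: prefixes that engines such as AltaVista accepted; a bare word is searched in every field, and matching is a simple case-insensitive substring test. The pages are invented.

```python
# Hypothetical pages, each split into named fields.
PAGES = {
    "p1": {"title": "Cricket scores", "url": "sports.example/cricket", "body": "Live updates"},
    "p2": {"title": "Insects of India", "url": "nature.example/bugs", "body": "The cricket chirps at night"},
}

def field_search(pages: dict, query: str) -> list[str]:
    """Match 'field:word' against one field, or a bare word against every field."""
    if ":" in query:
        field, word = query.split(":", 1)
        fields = [field.lower()]
    else:
        word, fields = query, None  # None means "look in every field"
    word = word.lower()
    hits = []
    for page_id, page in pages.items():
        texts = page.values() if fields is None else [page.get(f, "") for f in fields]
        if any(word in text.lower() for text in texts):  # case-insensitive substring test
            hits.append(page_id)
    return hits

print(field_search(PAGES, "cricket"))        # ['p1', 'p2']
print(field_search(PAGES, "title:cricket"))  # ['p1']
```

Restricting "cricket" to the title drops the page about insects, which mentions the word only in passing.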

Internet search has evolved rapidly since 1994, when WebCrawler became the first widely used search engine. Among the two-dozen-odd search engines prevalent today, Yahoo! is undoubtedly the most popular, but Google has recently emerged as a very powerful search engine. Developed by Lawrence Page and Sergey Brin at Stanford University in the USA, it claims: "Google uses sophisticated text-matching techniques to find pages that are both important and relevant to your search. For instance, when Google analyses a page, it looks at what those pages linking to that page have to say about it. Google also prefers pages in which your query terms are near each other."
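The preference for pages whose query terms sit close together can be illustrated with a simple proximity measure: the length, in words, of the shortest stretch of text that contains every query term. This is an illustrative assumption, not Google's actual scoring method, and the example pages are invented.

```python
import re
from collections import Counter

def shortest_span(text: str, terms: list[str]) -> float:
    """Words in the shortest window containing every query term (inf if a term is missing)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    needed = {t.lower() for t in terms}
    if not needed:
        return float("inf")
    counts: Counter = Counter()
    covered = 0  # how many distinct terms the current window contains
    best = float("inf")
    left = 0
    for right, word in enumerate(words):
        if word in needed:
            counts[word] += 1
            if counts[word] == 1:
                covered += 1
        # shrink the window from the left while it still holds every term
        while covered == len(needed):
            best = min(best, right - left + 1)
            dropped = words[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best

def rank_by_proximity(pages: dict[str, str], terms: list[str]) -> list[str]:
    """Pages containing all terms, ordered so that the closest terms come first."""
    scored = sorted((shortest_span(text, terms), url) for url, text in pages.items())
    return [url for span, url in scored if span != float("inf")]

demo_pages = {
    "far": "cake shops in town and some recipes",
    "near": "fresh cake recipes for beginners",
    "missing": "chocolate cake",
}
print(rank_by_proximity(demo_pages, ["cake", "recipes"]))  # ['near', 'far']
```

The sliding window visits each word at most twice, so the measure stays cheap even for long pages.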

For ease of use, programs such as Copernic have been developed that consult many search engines simultaneously and bring back relevant results with summaries. They also remove duplicate information and dead links, making Internet searching easy. Programs such as Internet Detective provide scores of links grouped by topic, such as locating people, investigative resources, information and government resources, and newsgroup search, making it easy to locate specific information on the Net.
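Removing duplicates and dead links from a merged result list can be sketched as follows. This is not Copernic's actual method: the URL normalisation rules here (lower-casing the host name, ignoring a leading www. and a trailing slash) are assumptions, and `is_alive` is a hypothetical callback that a real tool would implement with an HTTP request.

```python
from urllib.parse import urlsplit

def normalise(url: str) -> str:
    """Reduce trivially different spellings of the same URL to one comparison key."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: www.example.com and example.com are the same site
    path = parts.path.rstrip("/") or "/"
    key = host + path
    return f"{key}?{parts.query}" if parts.query else key

def clean_results(urls: list[str], is_alive) -> list[str]:
    """Drop duplicates (keeping the first copy) and links that is_alive reports as dead."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        key = normalise(url)
        if key in seen:
            continue
        seen.add(key)
        if is_alive(url):
            kept.append(url)
    return kept

# Hypothetical merged results and a stand-in liveness check.
DEAD = {"http://old.example/gone"}
results = [
    "http://www.Example.com/news/",
    "http://example.com/news",
    "http://old.example/gone",
    "http://other.example/",
]
print(clean_results(results, lambda u: u not in DEAD))
# ['http://www.Example.com/news/', 'http://other.example/']
```

Checking for duplicates before checking liveness means each distinct page is tested only once, which matters when every test is a network round trip.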

Different engines have different strengths, so use the engine and features that best suit your requirements. One thing is clear: the engine that brings up the most results is certainly not the best. Choose the search engine that gives you a few, but specific, answers. The best results come from choosing the words of the query very carefully. Research is under way to classify information into categories, which should greatly improve searching. The pursuit of the best way to navigate the world of electronic information goes on.

Major search engines
(in alphabetical order):

AllTheWeb uses "fast" search technology, but its results are not very encouraging.

AltaVista is a powerful search engine with a lot of extras, and it returns search results very quickly.

DirectHit aims for relevance by using a popularity factor.

Excite is rated as one of the best search engines and is known for its ease of use. WebCrawler, one of the first known search engines, is owned by Excite.

Google claims to hold over a billion pages and gives extra weight to frequently cited pages.

GoTo ranks sites according to what they pay, so its results have a commercial bent.

InfoSeek is part of the Go network, owned by Disney.

Lycos takes its name from the Latin name for the wolf spider. It owns HotBot, which uses parallel, scalable searching and lets you restrict a search to domains such as .edu.

NorthernLight has a special collection of documents available for a fee.

Snap offers a clean design and is a fast-growing heavyweight.

Yahoo!, a Web index pioneer, maintains its own huge directories.

 

Getting more accurate results

Use AND, OR and NOT to control what a search returns. For example, "cake OR pastry NOT recipes" finds pages that mention cake or pastry but not recipes.
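Behind the scenes, AND, OR and NOT map neatly onto set operations over an index that records which pages contain each word. The sketch below evaluates a query strictly from left to right, so "cake OR pastry NOT recipes" means (cake OR pastry) NOT recipes; the index and page numbers are invented, and real engines had their own precedence rules and syntax.

```python
# Hypothetical inverted index: word -> IDs of the pages that contain it.
INDEX = {
    "cake": {1, 2, 3},
    "pastry": {3, 4},
    "recipes": {2, 4},
}

def boolean_search(index: dict[str, set[int]], query: str) -> set[int]:
    """Evaluate 'word (AND|OR|NOT word)...' strictly left to right, with no precedence."""
    tokens = query.split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0].lower(), set()))  # copy so the index is not modified
    # the remaining tokens come in (operator, word) pairs
    for op, word in zip(tokens[1::2], tokens[2::2]):
        pages = index.get(word.lower(), set())
        op = op.upper()
        if op == "AND":
            result &= pages
        elif op == "OR":
            result |= pages
        elif op == "NOT":
            result -= pages  # NOT acts as AND NOT: drop pages containing the word
        else:
            raise ValueError(f"unknown operator: {op}")
    return result

print(sorted(boolean_search(INDEX, "cake OR pastry NOT recipes")))  # [1, 3]
```

OR widens the set of pages, while AND and NOT narrow it, which is why a sprinkling of NOTs is usually the quickest way to trim an unwieldy result list.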

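Use the "+" and "-" symbols: a word prefixed with "+" must appear in every result, and a word prefixed with "-" must not appear. For example, when searching for "computer memory", add +harddisk +CDROM to narrow the results. Otherwise the search will bring up all links containing the words "computer" and "memory".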

Similarly, searching for cake -recipes will leave out recipe pages and bring up more relevant results.
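The "+" and "-" convention can be sketched as a filter over pages. Words without a sign are treated as optional here, and when no "+" word is given a page must contain at least one of them; that rule is an assumption, since engines differed in how they handled unsigned words. The pages are invented.

```python
# Hypothetical pages: URL -> page text.
PAGES = {
    "p1": "computer memory harddisk cdrom guide",
    "p2": "computer memory upgrade",
    "p3": "memory of a birthday cake",
    "p4": "cake recipes",
}

def plus_minus_search(pages: dict[str, str], query: str) -> list[str]:
    """+word must appear, -word must not; unsigned words are optional (an assumed rule)."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    hits = []
    for url, text in pages.items():
        words = set(text.lower().split())
        if not required <= words or excluded & words:
            continue  # a required word is missing or an excluded word is present
        if not required and not optional & words:
            continue  # with no + words, at least one unsigned word must match
        hits.append(url)
    return hits

print(plus_minus_search(PAGES, "computer memory +harddisk +CDROM"))  # ['p1']
print(plus_minus_search(PAGES, "cake -recipes"))                      # ['p3']
```

Without the signs, "computer memory" alone would return p1, p2 and p3, since each of them mentions at least one of the two words.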

With the wildcard *, you can find chaos, chat, channel, etc. by searching for cha*.
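An engine can answer a truncated query such as cha* cheaply if it keeps its vocabulary sorted: a binary search finds the first word at or after the prefix, and all the matches follow one after another. The vocabulary below is invented, and this sketch supports only a trailing *.

```python
from bisect import bisect_left

# Hypothetical vocabulary from an index, kept in sorted order.
VOCAB = sorted(["cake", "chaos", "channel", "chat", "cheese", "chart", "cricket"])

def expand_wildcard(vocab: list[str], pattern: str) -> list[str]:
    """Expand a trailing-* pattern such as 'cha*' into every word with that prefix."""
    pattern = pattern.lower()
    if not pattern.endswith("*"):
        return [pattern] if pattern in vocab else []  # no wildcard: exact match only
    prefix = pattern[:-1]
    start = bisect_left(vocab, prefix)  # binary search for the first word >= prefix
    matches = []
    for word in vocab[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can share the prefix
        matches.append(word)
    return matches

print(expand_wildcard(VOCAB, "cha*"))  # ['channel', 'chaos', 'chart', 'chat']
```

The expanded words would then be searched as though they had been joined by OR.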

 

7Search
Acclaim Search
AOL
All The Web
AltaVista
Amnesi
Ask Jeeves
DejaNews
Deoji
Dewa
DevSearch
DirectHit
Excite
Findit2000
FindWhat
Frequent Finders
Funkycat
Google
Go2Net
GoTo
HotBot
iBound
Info Hiway
Infomak
InfoSeek
IXQuick
Jump City
Kanoodle
Link Centre
Link Master
Links2Go
Look Up
Lost Link/Web Links
Lycos
MSN
Mamma.com
TheNet1
NexorAliweb
NorthernLight
Pathfinder/ Time-Warner
Reference.Com
Rocket Links
Scrub The Web
Search4Info
Search.Com
Search Hound
Search King
Snap
Splat Search
Subjex
Super Cyber Search
ToggleBot
TopClick
WebCrawler
Web Direct
WebSearch2K
WebVentureHotlist
What-U-Seek
Where2Go
WWWHunter
Yahoo!
ZenSearch
Z Search

 
