Tribune

Monday, January 26, 2004
Feature

Future search engines will be personalised
Amardeep Gupta

Current search engines seem unable to clear the next big barrier in search: the trillions of bytes of dynamically generated data created by individual Websites around the world, which some researchers call the "deep Web." You cannot look up the status of a Federal Express package without going to the Federal Express site, or the details of an eBay item without checking the eBay site. Such dynamically generated data cannot be spidered, because a crawler that only follows links never submits the query that produces it.

The first generation of Web search tools used on-the-page relevancy ranking, with algorithms based on the location and frequency of keywords. These engines also gave weight to Meta tags and to keywords in the domain name, with a few bonus points for keywords in the URL. Basic spam filters emerged to weed out keyword stuffing and text set in the same colour as its background. The portals also made their appearance, and engines started looking like giant billboards and overstuffed yellow pages. But do Meta tags hold as much importance as they once did? No. Does using keywords in various tags help as much? Generally not.
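As a rough illustration of on-the-page ranking, the sketch below scores a page from keyword frequency in the body plus bonuses for keywords in the title and URL. The function name and the bonus weights are assumptions made for this example; no engine's actual formula is being reproduced.

```python
import re

# Hypothetical weights, chosen only for illustration.
TITLE_BONUS = 0.5
URL_BONUS = 0.25


def on_page_score(query, title, url, body):
    """Score a page using only signals found on the page itself."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    body_words = re.findall(r"[a-z0-9]+", body.lower())
    title_words = set(re.findall(r"[a-z0-9]+", title.lower()))
    url_text = url.lower()

    score = 0.0
    for term in terms:
        # Frequency: share of body words that match the query term.
        if body_words:
            score += body_words.count(term) / len(body_words)
        # Location: bonus when the term appears in the title or URL.
        if term in title_words:
            score += TITLE_BONUS
        if term in url_text:
            score += URL_BONUS
    return score


relevant = on_page_score(
    "search engines",
    "Search engines explained",
    "http://example.com/search-engines",
    "Search engines rank pages by keywords.",
)
unrelated = on_page_score(
    "search engines",
    "Cooking tips",
    "http://example.com/recipes",
    "Some search tools exist.",
)
print(relevant > unrelated)
```

Because every signal here is under the page author's control, repeating a keyword inflates the score, which is why keyword stuffing worked against engines of this kind.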

Instead, the engines took it a step further in their quest for relevant results by bringing in second-generation engines. The second generation, now in full swing with theme-based ranking, added off-the-page relevancy, using hyperlinks and visit-duration data to rank results. A few of the major components these engines employ are click tracking, page reputation, link popularity, temporal tracking, and link quality. They then added term vectors, statistical analysis, cache data, and context, where two-word keyword pairs are extracted from a page to categorise it better.
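Link popularity and page reputation can be pictured with a small link graph. The article does not say which formula any engine uses, so the sketch below substitutes a simplified PageRank-style iteration; the function name, the damping value, and the sample graph are all assumptions.

```python
def link_reputation(links, damping=0.85, iterations=50):
    """Estimate page reputation from who links to whom.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its reputation on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its share evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web: three pages link to "c".
sample = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_reputation(sample)
print(max(ranks, key=ranks.get))
```

Unlike on-the-page signals, this score depends on what other pages do, which makes it harder for a single page author to manipulate.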

A few examples of second-generation search engines are www.ask.com, www.google.com, www.northernlight.com, www.surfwax.com and www.directhit.com.

Meta search engines query many search engines at once and return the combined results. A few examples are www.metacrawler.com, www.dogpile.com and www.profusion.com.
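One way to picture the merging step of a meta search engine is shown below. It is only a sketch: the reciprocal-rank sum is a technique chosen for this example, not the known method of any engine named above, and the result lists are made up.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Merge ranked URL lists from several engines into one ranking.

    Each URL earns 1/position from every engine that returned it, so a
    page ranked highly by several engines rises to the top.
    """
    scores = defaultdict(float)
    for results in result_lists:
        seen = set()
        for position, url in enumerate(results, start=1):
            if url in seen:
                continue  # count each URL once per engine
            seen.add(url)
            scores[url] += 1.0 / position
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results from three engines for the same query.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "x.example"]
engine_c = ["y.example", "w.example"]
print(merge_results([engine_a, engine_b, engine_c]))
```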

Third generation engine

The third generation is already underway. It adds word stemming and a thesaurus on top of the term vector database to help keep a search in context. Automatic extraction of keyword pairs also helps to categorise a page, so that searches beginning with ‘shop for’ or ‘find’ trigger totally different results depending on the context or intent of the person doing the search. The third generation also adds Web maps which, although not searchable, are a useful filtering tool for getting rid of duplicate sites and the many stand-alone pages that drive traffic to only a few destinations.
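Stemming and thesaurus lookup can be sketched as a query-expansion step. The suffix list and the thesaurus entries below are hypothetical, and the stemmer is a crude suffix stripper; production stemmers such as the Porter stemmer apply far more careful rules.

```python
# Hypothetical data, for illustration only.
SUFFIXES = ("ing", "es", "s")
THESAURUS = {
    "buy": {"purchase", "shop"},
    "car": {"automobile"},
}


def stem(word):
    """Strip one common suffix, keeping at least three letters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query):
    """Return the stemmed query terms plus any thesaurus synonyms."""
    expanded = set()
    for word in query.lower().split():
        root = stem(word)
        expanded.add(root)
        expanded.update(THESAURUS.get(root, set()))
    return expanded


print(sorted(expand_query("buying cars")))
```

With this expansion, a page that talks about purchasing an automobile can still match a query for "buying cars".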
They will also be extracting as much data as possible about individual searching habits. All major engines plan on building personal profiles: little robots that ‘come to know you’ over a period of time, based on your past searching habits.
It is just another way of saying that they are implementing ‘second generation’ search engine strategies. Using a term vector database, they weigh page keyword density to calculate the page vector, which is compared and stored relative to the term vector. They then compute a Web page's reputation by graphing interconnectivity and link relevancy, making sure that the reputation of the page and the content on the page actually match. The closest matches get the highest search engine positioning. Today all search engines are moving toward being theme-based.
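The comparison of a page vector with a term vector can be sketched as follows. This assumes the page vector is plain keyword density and that closeness is measured by cosine similarity; both are illustrative choices, and the theme weights are invented.

```python
import math
import re
from collections import Counter


def page_vector(text):
    """Keyword density: each word's share of all words on the page."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term vector describing a "search engines" theme.
theme = {"search": 1.0, "engine": 1.0, "ranking": 0.5}

on_topic = page_vector("search engine ranking and search engine links")
off_topic = page_vector("fresh bread recipes and baking tips")

print(cosine_similarity(on_topic, theme) > cosine_similarity(off_topic, theme))
```

A page whose keyword mix points in the same direction as the theme scores close to 1, while a page sharing no terms with the theme scores 0.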

Emerging tools

In the future, you might be able to load the engine with lists of keywords. Your interests, likes and dislikes, geographical information, and favourite Websites can be entered, from which the engine can create a context engine just for you. Just think, they’ll know what your next search is likely to be, even before you do.
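A personal context engine of this kind might, in its simplest form, re-rank ordinary results against a stored profile. The sketch below is speculative: the profile fields, the boost values, and the sample results are all assumptions made for illustration.

```python
# Hypothetical weights for the profile signals.
INTEREST_BOOST = 0.3
DISLIKE_PENALTY = 0.5
FAVOURITE_BOOST = 0.2


def personalise(results, profile):
    """Re-rank (url, snippet, base_score) results using a user profile."""

    def boosted(item):
        url, snippet, base_score = item
        words = set(snippet.lower().split())
        score = base_score
        # Reward snippets that mention the user's interests.
        score += INTEREST_BOOST * len(words & profile["interests"])
        # Penalise snippets that mention the user's dislikes.
        score -= DISLIKE_PENALTY * len(words & profile["dislikes"])
        # Reward results from the user's favourite sites.
        if any(site in url for site in profile["favourite_sites"]):
            score += FAVOURITE_BOOST
        return score

    return sorted(results, key=boosted, reverse=True)


profile = {
    "interests": {"cricket"},
    "dislikes": {"golf"},
    "favourite_sites": {"tribune.example"},
}
results = [
    ("news.example/golf", "golf scores today", 1.0),
    ("tribune.example/sport", "cricket scores today", 0.8),
]
print(personalise(results, profile)[0][0])
```

The same query thus returns a different ordering for each user, which is the sense in which the engine "comes to know you".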

The future of searching will not only be about text, but will increasingly rely on visual models to help users understand the distribution of meaning and the relationships between information sources. Perhaps the most promising visual meta-search engine for educators is Kartoo, one of the most student-friendly and stable of the new visual search engines. If you are attached to Google, you may want to check out the TouchGraph and Anacubis visual browsers for Google, as well as the Google Set Vista for visualising Google sets. Instructional applications of the Google browsers are not as self-evident as with Kartoo, but advanced searchers should enjoy using the tools to play with their favourite searches. If your budget allows, you might be interested in the comprehensive (and visually stunning) Grokker, currently available as a preview release.