Future search engines will be personalised
Amardeep Gupta
Current search engines seem unable to clear the next big barrier in
search: the trillions of bytes of dynamically generated data created
by individual Websites around the world, or what some researchers
call the "deep Web." You can’t look up the status of a Federal
Express package without going to the Federal Express site, or the
details of an eBay item without checking the eBay site. Dynamically
generated data of this kind cannot be spidered.
The first generation of Web search tools used on-the-page relevancy
ranking, with algorithms based on the location and frequency of
keywords. These engines later added relevancy for Meta tags, keywords
in the domain name, and a few bonus points for having keywords in the
URL. Basic spam filters emerged that got rid of keyword stuffing and
text set in the same colour as the page background. The portals also
made their appearance, and engines started looking like giant
billboards and overstuffed yellow pages. But do Meta tags hold as
much importance as they once did? No. Does using keywords in various
tags help as much? Generally not.
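
On-the-page ranking of this kind can be illustrated with a few lines
of Python. This is a minimal sketch, not any real engine’s formula:
the field names, the weights in FIELD_WEIGHTS and the sample page are
all assumptions made for illustration.

```python
import re

# Hypothetical weights: a keyword in the title or URL counts for more
# than the same keyword in the body text.
FIELD_WEIGHTS = {"title": 3.0, "url": 2.0, "meta": 1.5, "body": 1.0}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def on_page_score(page, query):
    """Sum the weighted keyword frequencies over the fields of one page."""
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(page.get(field, ""))
        hits = sum(1 for word in words if word in terms)
        score += weight * hits
    return score


# Made-up page used only to exercise the function.
page = {
    "title": "Cheap flights to Delhi",
    "url": "http://example.com/cheap-flights",
    "meta": "flights, airfare, travel",
    "body": "Compare cheap flights and book airfare online.",
}
score = on_page_score(page, "cheap flights")
```

Because the score rises with every repetition of a keyword, a scheme
like this is easy to game by keyword stuffing, which is why the spam
filters mentioned above became necessary.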
Instead, the engines took it a step further in their quest for
relevant results by bringing in the second generation. Second
generation engines, now in full swing with theme-based ranking, added
off-the-page relevancy, using hyperlinks and visit duration data for
results ranking. A few of the major components they employ are click
tracking, page reputation, link popularity, temporal tracking, and
link quality. They then started adding term vectors, statistical
analysis, cache data, and context, in which two-word keyword pairs
are extracted from a page to categorise it better.
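
Link popularity, one of the off-the-page signals, can be illustrated
by counting inbound links. The sketch below assumes a tiny hand-made
link graph; the domain names and the function name link_popularity
are hypothetical, and it measures only the quantity of links, not
their quality.

```python
from collections import Counter


def link_popularity(links):
    """Count distinct inbound links per page; self-links are ignored."""
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                inbound[target] += 1
    return inbound


# Made-up link graph: each key links out to the pages in its list.
links = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "c.example"],
    "d.example": ["c.example"],
}
popularity = link_popularity(links)

# Most linked-to page first; ties are broken alphabetically.
ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
```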
A few examples of second-generation search engines are www.ask.com,
www.google.com, www.northernlight.com, www.surfwax.com and
www.directhit.com.
Meta search engines query many search engines at once and return the
combined results. A few examples are www.metacrawler.com,
www.dogpile.com and www.profusion.com.
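
How a meta search engine might combine several result lists can be
sketched as follows. The merging rule used here (summing 1/rank
across engines) is an assumption chosen for simplicity, not the
documented method of any of the sites named above, and the result
lists are made up.

```python
def merge_results(result_lists):
    """Merge ranked URL lists; each URL scores the sum of 1/rank."""
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up result lists standing in for three underlying engines.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "x.example"]
engine_c = ["y.example", "w.example"]

merged = merge_results([engine_a, engine_b, engine_c])
```

A URL that several engines rank highly rises to the top, and each URL
appears only once in the merged list.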
Third generation engines
The third generation is already underway. It adds word stemming and a
thesaurus on top of the term vector database to help keep a search in
context. Automatic extraction of keyword pairs also helps to
categorise a page, so that searches containing ‘shop for’ or ‘find’
trigger totally different results based on the context or intent of
the person doing the search. Third generation engines also add Web
maps which, although not searchable, are a useful filtering tool for
getting rid of duplicate sites and the many stand-alone pages that
drive traffic to only a few destinations.
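
Word stemming and thesaurus lookup can be sketched in a few lines.
Everything here is an assumption made for illustration: the suffix
list is deliberately naive, and THESAURUS is a hypothetical two-entry
dictionary standing in for a real thesaurus.

```python
# Hypothetical thesaurus; a real engine would use a far larger one.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase", "shop"},
}

# Suffixes tried in order; the first one that matches is stripped.
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Very naive suffix stripping that keeps at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query):
    """Return the stemmed query terms plus any thesaurus synonyms."""
    expanded = set()
    for word in query.lower().split():
        root = stem(word)
        expanded.add(root)
        expanded |= THESAURUS.get(root, set())
    return expanded


terms = expand_query("buying cars")
```

With this sketch, a search for "buying cars" would also match pages
that mention "purchase" or "automobile".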
They will also be extracting as much data as possible about
individual searching habits. All major engines plan on building
personal profiles: little robots that ‘come to know you’ over a
period of time, based on your past searches.
This is just another way of saying that the engines are implementing
‘second generation’ search engine strategies. Using a term vector
database, they weigh page keyword density to calculate the page
vector, which is stored and compared against the term vector.
They then compute a Web page’s reputation by graphing
interconnectivity and link relevancy, making sure the reputation of
the page and the content on the page actually match. The closest
matches get the highest search engine positioning. Today all search
engines are moving toward being theme-based.
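
The comparison of a page vector with a term vector can be sketched as
follows. Cosine similarity is used here as one common way of
comparing two vectors; the article does not say which measure the
engines use, so treat it as an assumption, along with the sample
texts and the hypothetical theme vector.

```python
import math
from collections import Counter


def keyword_density(text):
    """Map each word to its share of all the words on the page."""
    words = text.lower().split()
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term vector describing the theme "search engine".
theme = {"search": 0.5, "engine": 0.5}

on_topic = keyword_density("search engine ranking search engine")
off_topic = keyword_density("garden flowers and search tips")

on_topic_match = cosine(on_topic, theme)
off_topic_match = cosine(off_topic, theme)
```

The page whose keyword densities point in nearly the same direction
as the theme vector gets the higher similarity, and so would be
positioned higher for that theme.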
Emerging tools
In the future, you might be able to load the engine with lists of
keywords. You could enter your interests, likes and dislikes,
geographical information, and favourite Websites, from which the
engine can create a context engine just for you. Just think, they’ll
know what your next search is likely to be, even before you do.
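
A personal context engine of this sort might re-rank ordinary results
against a stored profile. The sketch below is purely illustrative:
the profile fields, the boost values and the sample results are all
hypothetical.

```python
def personalise(results, profile):
    """Re-rank (url, base_score, keywords) results using a user profile."""

    def boosted(result):
        url, base_score, keywords = result
        score = base_score
        # Reward keywords the user likes and penalise disliked ones.
        score += 0.5 * len(set(keywords) & profile["interests"])
        score -= 0.5 * len(set(keywords) & profile["dislikes"])
        # Extra boost for results from one of the user's favourite sites.
        if any(site in url for site in profile["favourite_sites"]):
            score += 1.0
        return score

    return sorted(results, key=boosted, reverse=True)


# Made-up profile and results used only to exercise the function.
profile = {
    "interests": {"cricket"},
    "dislikes": {"gossip"},
    "favourite_sites": {"news.example"},
}
results = [
    ("gossip.example/jaguar", 2.0, ["gossip", "jaguar"]),
    ("news.example/cricket", 1.0, ["cricket", "scores"]),
    ("shop.example/bats", 1.5, ["cricket", "shop"]),
]
ranked_for_user = [url for url, _, _ in personalise(results, profile)]
```

The result with the lowest base score ends up first for this user,
because it matches both an interest and a favourite site.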
The future of
searching will not only be about text, but will increasingly rely on
visual models to help users understand the distribution of meaning
and relationships between information sources. Perhaps the most
promising visual meta-search engine for educators is Kartoo.
Kartoo is one of the most student-friendly and
stable members of the new visual search engines. If you are attached
to Google, you may want to check out the TouchGraph and Anacubis
visual browsers for Google, as well as the Google Set Vista for
visualising Google sets. Instructional applications of the Google
browsers are not as self-evident as with Kartoo, but advanced
searchers should enjoy using the tools to play with their favourite
searches. If your budget allows, then you might be interested in the
comprehensive (and visually stunning) Grokker, currently available
as a preview release.