Tuesday, September 26, 2006

Search enhancement - take 2

Yesterday I suggested a way to leverage the properties of blogs to improve search. That got me thinking about further enhancements that become possible once you take the type of content at hand into account.

Let me explain:

Current search engines rely on three basic sources of information to retrieve the most relevant results:

  1. The actual text on the web page.
  2. The structure of the text (e.g. headlines vs. body text, various HTML tags, etc.).
  3. The link structure of the web - we all know about PageRank.
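For a concrete picture of the third source, here is a minimal sketch of the textbook PageRank computation: power iteration with the commonly cited damping factor of 0.85. It illustrates the idea under simplifying assumptions and is not how any real search engine computes its ranking; the function name `pagerank` and the dict-of-lists link format are my own choices.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['a', 'b', 'c']
```

Note that nothing in this computation looks at what kind of page it is scoring; only links count.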

What all three sources have in common is that they don't really differentiate between types of web pages (corporate vs. personal homepages, blogs, newsgroups, news sites, e-commerce, etc.). That isn't entirely accurate, since you can restrict a search to specific sources of information (newsgroups, blogs, etc.), but that's not the point.

What I feel is missing is an intelligent use of the structure specific to each type of web page.

Some examples:

  1. Blogs, news sites - why not implement a voting mechanism (similar to PageRank or some other scheme) that takes into account the number of talkbacks, the number of RSS subscribers, etc.? Even if a web page has a low PageRank, a large number of commenters or RSS subscribers coming from many different places may indicate that the site is much more important than it seems.
  2. Newsgroups - number of threads, number of users, etc. I don't see many links to newsgroups on the web in general, yet some newsgroups are very active. So why not use additional, newsgroup-specific parameters to measure a newsgroup's relevance?
  3. E-commerce - Many price-comparison sites let their users rate products and comment on them. I'm sure that the more products are rated and the more raters there are, the more likely it is that the price-comparison site is a good one. Moreover, products and vendors with many high ratings across price-comparison sites should be promoted.
  4. Professional magazines - Many let users rate and comment on products (like CNET); others let users give feedback on the quality of articles (like MSDN). Why not use that feedback as part of the retrieval process?
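To make these examples a bit more concrete, here is a minimal sketch of how type-specific signals might be blended with a link-based score such as PageRank. Everything in it is hypothetical: the page-type labels, the signal names (`comments`, `rss_subscribers`, `distinct_sources`, `threads`, `active_users`, `num_ratings`, `rating_sum`), the prior values, and the weight are assumptions chosen for illustration. Counts are log-damped so a single huge number can't dominate, and ratings use a Bayesian (weighted) average so that many high ratings count for more than a couple of perfect ones.

```python
import math

# Hypothetical signal functions, one per page type. All signal names are assumptions.

def blog_signal(s):
    # Talkbacks, RSS subscribers, and how many distinct places they come from.
    # log1p damps large counts so no single number dominates.
    return (math.log1p(s.get("comments", 0))
            + math.log1p(s.get("rss_subscribers", 0))
            + math.log1p(s.get("distinct_sources", 0)))

def newsgroup_signal(s):
    # Activity measures that don't depend on inbound web links.
    return math.log1p(s.get("threads", 0)) + math.log1p(s.get("active_users", 0))

def rating_signal(s, prior_mean=3.0, prior_count=20):
    # Bayesian average of 1-5 star ratings, shifted so that no ratings gives 0.
    # A few ratings barely move it; many high ratings push it up.
    n = s.get("num_ratings", 0)
    total = s.get("rating_sum", 0.0)
    return (prior_mean * prior_count + total) / (prior_count + n) - prior_mean

SIGNALS = {
    "blog": blog_signal,
    "news": blog_signal,
    "newsgroup": newsgroup_signal,
    "ecommerce": rating_signal,
}

def combined_score(link_score, page_type, signals, weight=0.1):
    # Add a type-specific bonus to a link-based score such as PageRank.
    # Unknown page types get no bonus and keep their link score unchanged.
    signal_fn = SIGNALS.get(page_type)
    bonus = signal_fn(signals) if signal_fn else 0.0
    return link_score + weight * bonus


busy_blog = combined_score(0.02, "blog",
                           {"comments": 400, "rss_subscribers": 1500, "distinct_sources": 120})
static_page = combined_score(0.05, "homepage", {})
print(busy_blog > static_page)  # True
```

With these made-up numbers, a blog whose link score is only 0.02 but which has many talkbacks and subscribers outranks a static page scoring 0.05, which is the effect described in item 1. The bonus is added rather than multiplied on purpose: a multiplier would keep a page with a very low link score low no matter how active its readers are.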

What do you think?
