January 30, 2009

Statement Of Content

Off a tip from a co-worker, I pointed my browser at the CS Distinguished Lecture Series 08/09 presentations at the University of Toronto's Knowledge Media Design Institute. If you have the chance, check out Dr. Raghavan's talk on Web Search. Coles Notes version: semantic web.

What does that mean? To the average computer user today, web search is a done deal: we have Google (or Yahoo, or (ugh) Microsoft Live Search). Most users scarcely notice, for example, Google's push for universal search. Yet that push shows that, even at current levels of quality, traditional page/click-driven search falls short of the goal: to understand and enable the user's intent.

Yes, Google will return flight results, weather, cinema listings, and maps. Yes, it serves up definitions, exchange rates, and simple computations. The problem lies in implementation: all of these are special cases, extra branches in a behemoth decision tree that is roughly equivalent to this:

Does this look like a location query? No? Does it look like a request for movie times? No? Does it...

This doesn't generalize well, for obvious reasons: the search engine knows nothing whatsoever about what I'm trying to accomplish. If it has a model for user intent, it's stunningly rudimentary. Sure, there's room for optimizations, like promoting results that look like results I've clicked on before - but these are optimizations on top of a fundamentally limited model.
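
To make that chain of questions concrete, here's a toy sketch in Python. To be clear, this is my own illustration, not how Google or anyone else actually routes queries: the function names, the regex patterns, and the result labels are all made up for the example.

```python
import re

# Hypothetical hand-written detectors; every new query type needs another one.
def looks_like_location(query: str) -> bool:
    return bool(re.search(r"\b(map of|directions to|near)\b", query, re.IGNORECASE))

def looks_like_movie_times(query: str) -> bool:
    return bool(re.search(r"\b(showtimes|movie times)\b", query, re.IGNORECASE))

def looks_like_calculation(query: str) -> bool:
    # Only digits, whitespace, and arithmetic symbols, with at least one digit.
    has_digit = any(ch.isdigit() for ch in query)
    return has_digit and bool(re.fullmatch(r"[\d\s+\-*/().]+", query))

def route(query: str) -> str:
    """Walk the special cases in order; fall back to plain web results."""
    if looks_like_location(query):
        return "maps"
    if looks_like_movie_times(query):
        return "movie_listings"
    if looks_like_calculation(query):
        return "calculator"
    return "web_results"

if __name__ == "__main__":
    for q in ["directions to Union Station", "showtimes toronto",
              "2 + 2", "cheap dinner before a film tonight"]:
        print(q, "->", route(q))
```

Note what happens to the last query: it's about dinner and a movie, but since it matches none of the hard-coded patterns it drops straight through to plain web results. Nothing in this structure represents what I'm actually trying to do; it only checks what my query happens to look like.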

So how do we form such a semantic web or, as Dr. Raghavan puts it, a web of objects? That remains an open question. On the other hand, thanks to the success of Google et al., we now have massive datasets of user click patterns, queries, times spent on various pages - the list goes on. Perhaps we can harness that data to build this new semantic, intent-driven, user-centered web on top of the content-driven web we have now. In fact, I'd be truly surprised if there's a single big-name search engine out there that hasn't been actively researching this for years. If there is, it is certainly doomed to obsolescence.

