Monday, June 1, 2009

TO BE NOTED: From the FT:

Wolfram Alpha asks some searching questions of the web

By John Gapper

Published: May 22 2009 19:36 | Last updated: May 22 2009 19:36

For years, there has been little competition in the business of enabling people to find out things on the web: Google led and a bunch of its would-be rivals lagged behind. Suddenly, however, internet search is becoming lively again.

Next week, Microsoft will launch its latest effort to catch up with Google – a refreshed search engine codenamed Kumo. Meanwhile Yahoo has just shown off its own efforts to help people extract data from the internet’s millions of web pages, rather than wade through them link by link.

All this might be yawn-inducing – Microsoft and Yahoo have tried and failed to catch up with Google before – but for two things.

One is that Google, despite holding a 64 per cent share of search according to the comScore research group, knows there is a gulf between what it provides and what many people want, and is experimenting with making its search engine perform better.

The second is that Google faces a new challenge from an Illinois-based software group founded by Stephen Wolfram, a British scientist. This week, Wolfram Research launched Wolfram Alpha, a web application that resembles a search engine but aspires to be a digital oracle.

Wolfram Alpha will never rival Google as an entry point to the web because it serves up information from a private database, rather than the internet as a whole. But it is an intellectual slap in the face to Google because it approaches the quest for knowledge in another way.

Wolfram Alpha’s launch this week garnered a lot of hype and many people were disappointed. After receiving blank responses to queries that its software could not recognise – “Wolfram Alpha isn’t sure what to do with your input” – some gave up.

Even when it knows what to do with a query, the software is very curt on some subjects that would return thousands of web pages, videos and images on Google, and a detailed entry on Wikipedia.

Type “Barack Obama”, for example, and you are told his full name, his birth date and place, and that he is a head of state (although not of which country). The timeline of his life has no entries apart from his birth.

But it is a different story when it comes to scientific and mathematical data, or the sort of information held routinely on public databases such as the Central Intelligence Agency’s World Factbook or by the International Monetary Fund. Then Wolfram Alpha comes to life.

Enter “Halley’s Comet” and you get scientific details and a map of where it is at any moment. Enter “GDP per capita UK/GDP per capita US” and it builds a graph showing that Britons were half as rich as Americans in 1970 but approached parity by 2005.

The data are not drawn from the web but from a database that is “curated” by Wolfram Research, a company that makes most of its money by selling licences for Mathematica, a software package used in colleges. That makes it much more limited than the internet, but clean, precise and easily malleable. While search engines are a starting point in a quest to find things out, Wolfram Alpha provides complete answers.

Unlike Wikipedia, it is also tightly controlled. Its data are drawn only from sources that are edited and checked so that, at least in theory, all the information is trustworthy.

“Search engines are like traffic directions to everything, systematic and random, that is on the web. We are collecting knowledge accumulated by civilisations and making that data computable,” says Mr Wolfram.

Search engines are now trying to do something similar with the internet as a whole, but it is very difficult. “One of the hardest problems in computer science is data extraction. Can we look at the unstructured web and extract values and facts in a meaningful way?” asked Marissa Mayer, a Google executive, at a presentation last week.

Ms Mayer showed off Google Squared, an experimental new feature that would allow Google’s users quickly to assemble data about, for example, various breeds of small dogs in a form like a spreadsheet.

It would be a lot easier to achieve if data were written into web pages in a structured way. Tim Berners-Lee, the inventor of the world wide web, has been working on a project called the “semantic web”, which encourages this approach, but progress has been slow.

That is what makes Wolfram Alpha so radical – it is a challenge not just to Google but to the internet as a whole. Instead of grappling with all the data that are theoretically discoverable on the web, Mr Wolfram has got around the difficulties by building his own black box.

Similar struggles for dominance between private databases and open information systems are common. In financial services, stock exchanges contend with “dark pools” of liquidity – private networks of banks and institutional investors that allow them to trade with each other.

So far in the history of the internet, the public has soundly defeated the private. Private networks such as the original AOL and Compuserve gave way to the internet as a whole, made comprehensible by Google.

Now that victory faces a challenge. If all the data on the internet are simply too messy to be analysed and structured, Google will be unable to produce a service rivalling Wolfram Alpha in clarity and reliability.

This would not spell the end for Google and other search engines. But it would mean that search itself – on which we rely to map the internet – had bumped up against its natural limits. Let the battle begin.

