Monday, June 4, 2007

To stay on top, Google has to improve search results

These days, Google seems to be doing everything, everywhere. It takes pictures of your house from outer space, copies rare Sanskrit books in India, charms its way onto Madison Avenue, picks fights with Hollywood and tries to undercut Microsoft's software dominance.

But at its core, Google remains a search engine. And its search pages, blue hyperlinks set against a white background, have made it the most visited, most profitable and arguably the most powerful company on the Internet.

Yet the site is also among the world's biggest teases. Millions of times a day, users click away from Google, disappointed they couldn't find the hotel, the recipe or the background of that hot guy. Google often finds what users want, but not always.

That's why Amit Singhal and hundreds of other Google engineers constantly tweak the company's search engine in an elusive quest to close the gap between often and always.

Singhal is the master of what Google calls its "ranking algorithm" -- the formulas that decide which Web pages best answer a user's question. It is a key part of Google's inner sanctum, a department called "search quality" that the company treats like a state secret.

Google values Singhal and his team so highly for the most basic of competitive reasons: It believes that its ability to decrease the number of times it leaves searchers disappointed is crucial to fending off the likes of Yahoo and Microsoft.

"The fundamental value created by Google is the ranking," says John Battelle, the chief executive of Federated Media, a blog ad network, and author of "The Search," a book about Google.

The search-quality team makes about a half-dozen major and minor changes a week to the mathematical formulas that power the search engine.

These formulas have grown better at reading the minds of users to interpret a very short query. Are the users looking for a job, a purchase or a fact? The formulas can tell that people who type "apples" are likely to be thinking about fruit, while those who type "Apple" are mulling computers or iPods. They can even compensate for vaguely worded queries or outright mistakes.

"Search over the last few years has moved from 'Give me what I typed' to 'Give me what I want,' " says Singhal, 39, a native of India who joined Google in 2000 and is now a Google Fellow, the designation the company reserves for its elite engineers.

As Google constantly fine-tunes its search engine, one challenge it faces is sheer scale. It is now the most popular Web site in the world, offering its services in 112 languages, indexing tens of billions of Web pages and handling hundreds of millions of queries a day. At the same time, users expect Google to sift through all that data and find what they are seeking, with just a few words as clues.

"Expectations are higher now," says Udi Manber, who oversees Google's entire search-quality group.

The search-quality group operates in small teams of engineers. Some, like Singhal's, focus on systems that process queries after users type them in. Others work on features that improve the display of results, like extracting snippets -- the text that hints at a site's content.

Other members of Manber's team work on what happens before users even start a search: maintaining a giant index of all the world's Web pages.

Google makes a copy of the entire Internet in each of its huge customized data centers so it can comb through the information faster. Google recently developed a new system that can hold far more data and search through it far faster.

As Google compiles its index, it calculates a "PageRank" for each page it finds. That was the key invention of Google's founders, Larry Page and Sergey Brin. PageRank tallies how many times other sites link to a given page. Sites that are more popular, especially with sites that have high PageRanks themselves, are considered likely to be of higher quality.
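
The idea that a link counts for more when it comes from a page that is itself highly ranked can be illustrated with a short sketch. This is the textbook power-iteration form of PageRank, not Google's production code; the damping factor of 0.85, the iteration count and the four-page link graph are assumptions chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank to all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page,
# and nothing links to "d".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page "c" ends up on top because every other page links to it, and "a" comes second even though it has only one inbound link, because that link comes from the highly ranked "c."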

Singhal has developed a far more elaborate system for ranking pages that involves more than 200 types of information, or what Google calls "signals." PageRank is but one signal.

Once Google corrals its myriad signals, it feeds them into formulas it calls classifiers that try to infer useful information about the type of search, in order to send that user to the most helpful pages.

These signals and classifiers calculate several key measures of a page's relevance, including one it calls topicality -- a measure of how the topic of a page relates to the broad category of the user's query.
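
How many signals might be folded into one relevance score can be sketched with a simple weighted sum. Google's actual formulas are secret, so everything here is an assumption made for illustration: the weighted-sum approach itself, the signal names other than PageRank and topicality, the weights and the page values.

```python
def score_page(signals, weights):
    """Combine named signals into one score with a weighted sum.

    A signal missing from a page counts as 0.0.
    """
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


# Hypothetical weights; "freshness" is an invented signal name.
weights = {"pagerank": 0.5, "topicality": 0.3, "freshness": 0.2}

# Hypothetical signal values for two candidate pages, each in [0, 1].
pages = {
    "page_a": {"pagerank": 0.9, "topicality": 0.2, "freshness": 0.1},
    "page_b": {"pagerank": 0.4, "topicality": 0.9, "freshness": 0.8},
}

# Order the candidates from highest score to lowest.
ranked = sorted(pages, key=lambda p: score_page(pages[p], weights), reverse=True)
print(ranked)
```

Here the page with the lower PageRank still comes first because it is stronger on the other two signals, which is the sense in which PageRank is "but one signal."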

If all of that weren't excruciating enough, Google's engineers must compensate for users who are vague, often typing ambiguous phrases or misspelled words.

So Google built a system that understands variations of words. The model is elegant and powerful enough to find pages when only an abbreviation or synonym is typed in.

In the end, it's hard to gauge exactly how advanced Google's techniques are, because so much of what it and its search rivals do is veiled in secrecy.

"People still think that Google is the gold standard of search," Battelle says. "Their secret sauce is how these guys are doing it all in aggregate. There are 1,000 little tunings they do."
Source: http://seattlepi.nwsource.com
