Comparing Google and MSN Search

(Apr. 7, 2005) Update: Some of the deficiencies mentioned here have been fixed by MSN; I will mark those out as and when I confirm the fixes.

(Feb. 23, 2005) Google is the undisputed heavyweight champion of search engines. Even though it was a latecomer to the search arena, superior technology and elegant design have made it numero uno. As expected, Microsoft has responded to this phenomenon by unleashing their own Mothra to Google's Godzilla (you didn't think the wings on MSN Search were a coincidence, did you?).

Seeing that MSN Search (hereafter referred to as MSN) has Google squarely in its sights, it is interesting to see how far MSN has come, and how close it is to catching up to (or even surpassing) Google.

Having suddenly found myself with a lot of spare time, I decided to put it to good use by objectively comparing the two search engines using the following criteria:

  • Response time
  • Response size
  • Number of hits
  • Relevance
  • Coverage
  • Enhancements
  • Censorship

These terms will be explained in the relevant sections.

Methodology: I used a P4 2.8GHz machine with 256MB memory, running WinXP and the Firefox browser. I used the same browser for both sites, to avoid any browser bias. The WinXP system was patched with the latest patches from Microsoft. This machine was connected to a Linux box (my main workstation) which is also the firewall, and I used tcpdump on the Linux box to capture the packets flying about, to get accurate size and timing results. There was nothing else of significance running on the WinXP machine; it had been freshly rebooted. Windows firewall was active, but Firefox was allowed unfettered access to the 'net. Also, a few queries were run on both the engines to "prime" them, so to speak (resolve DNS queries, cache images, etc).

Before going any further, please keep in mind the following disclaimers: I have no relationship with Google and/or Microsoft (or MSN); while it is possible to nitpick problems with this little "study", this is not a PhD dissertation and should be treated just like any other such study you would find on the Internet.

Response time

We have all seen the smarmy "this search took 0.0042 seconds" on top of the search results. But how much of that is true? Does it matter much if the SE (Search Engine) can whip up the results in nanoseconds, if they take minutes to appear on your screen? With this in mind, I decided to look at the network traffic itself to see how accurate these timings are, and how much time it really takes to get the full results page.

But before we jump into monitoring the network traffic, we must remove the network delay from our calculations; after all, it isn't the SE's fault if I'm sitting in Hicksville USA on the wrong end of a dialup line (with due apologies to those of you who are on the wrong end of a dialup line).

Ping times:
C:\>ping -n 10 search.msn.com
Pinging a134.g.akamai.net [209.18.34.71] with 32 bytes of data:
Approximate round trip times in milli-seconds:
    Minimum = 46ms, Maximum = 107ms, Average = 53ms

C:\>ping -n 10 www.google.com
Pinging www.google.akadns.net [64.233.161.99] with 32 bytes of data:
Approximate round trip times in milli-seconds:
    Minimum = 47ms, Maximum = 89ms, Average = 55ms
Since the average network delay is in the same ballpark for both, I decided to forgo the added step of removing the network delay from the time calculations. (Sidetrack: Linux fans can rejoice that Linux is behind both Google and MSN, via Akamai.)

Actual tests: First, I queried both the engines with randomly generated pairs of words. The idea was to see how long the engines take for some really rare search terms: usually these resulted in few or no matches. The time taken (the time difference between the sending of the first packet and the sending of the last ACK on delivery of the search results) is shown below, along with the terms.

Search term               Google results   MSN results   Google time (s)   MSN time (s)   MSN+ time (s)
fantasizes Lucienne       3                4             0.585509          0.458916       2.969367
drosophilae nonhieratic   0                0             0.439381          0.494420       0.494420
cognateness mimographer   1                1             0.383132          0.290492       0.510973
beggarliness satiably     3                2             0.244643          0.264325       0.473614
untense nauntle           5                0             0.326356          0.343544       0.559968
Verdict? Too close to call. In 3 instances Google was quicker; in 2 MSN. But on average MSN was faster, if you ignore the requests that get sent to switch.atdmt.com for every web search. (The column "MSN+" includes everything). Without including this MSN web-bug, MSN's the winner by a whisker on average; if you include this web-bug's presence, Google handily beats MSN. I feel this web-bug's timing should be incorporated, because when the web-bug's site is slow, it slows down the whole browsing experience. From now on, I will include the full time taken to get the results as well as data from other sites that the results page requests.
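The verdict above can be re-derived from the table with a few lines of Python. This is only a sketch: the numbers are copied from the table, the timings are assumed to be in seconds, and the variable names are mine.

```python
# Timings copied from the rare-terms table above, assumed to be seconds:
# (Google, MSN, MSN+), where MSN+ includes the switch.atdmt.com web-bug.
timings = {
    "fantasizes Lucienne":     (0.585509, 0.458916, 2.969367),
    "drosophilae nonhieratic": (0.439381, 0.494420, 0.494420),
    "cognateness mimographer": (0.383132, 0.290492, 0.510973),
    "beggarliness satiably":   (0.244643, 0.264325, 0.473614),
    "untense nauntle":         (0.326356, 0.343544, 0.559968),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


google_avg = mean(t[0] for t in timings.values())
msn_avg = mean(t[1] for t in timings.values())
msn_plus_avg = mean(t[2] for t in timings.values())

# Head-to-head wins, ignoring the web-bug (Google vs. plain MSN).
google_wins = sum(1 for g, m, _ in timings.values() if g < m)
msn_wins = sum(1 for g, m, _ in timings.values() if m < g)

print(f"Google average: {google_avg:.3f}")
print(f"MSN average:    {msn_avg:.3f}")
print(f"MSN+ average:   {msn_plus_avg:.3f}")
print(f"Wins (Google vs. MSN): {google_wins} to {msn_wins}")
```

Without the web-bug, MSN's average comes out slightly below Google's; with it, the single 2.97 result pulls MSN's average well above Google's.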

Next, we move to the other end of the spectrum and query the engines with really popular terms. Since Ms. Hilton seems to be quite popular these days, her name was the first term used. The others were chosen for similar reasons ("Apache" is a popular server and many sites have the server's name at the bottom; "contact us" is also in many pages, hence "contact" was chosen).

Search term    Google results   MSN results    Google time (s)   MSN+ time (s)
Paris Hilton   7,350,000        950,044        0.279638          0.628478
Windows        290,000,000      152,355,317    0.298586          0.731100
Linux          221,000,000      98,953,706     0.343402          0.604388
Apache         50,900,000       26,910,715     0.200375          0.697605
contact        893,000,000      966,654,120    1.377154[*]       0.563293
Verdict? Google's the winner. The only time when Google took longer ([*]) was when it decided to offer up images from the movie "Contact". This involved a DNS query and JPEG files. In keeping with the rules, all of these counted against Google. But it still came out ahead, both on average as well as in the number of wins.

Response Size

The other question of interest is: how big are the resulting pages? How bloated are they with ads, etc., filling up my cache and wasting bandwidth? For the 10 searches done above, I calculated the total bytes sent and received by the client: including SYN/FIN/ACK packets, retransmissions, etc. (if any). In addition, I did 5 more searches which were sure to be popular with the advertisers: class action lawsuit, life insurance, cell phones, stocks, and mortgage.
Search                  Google        MSN+
The 10 searches above   60898 bytes   79416 bytes
5 ad-laden searches     37390 bytes   58656 bytes
Verdict? Google's the clear winner. Even though both use on-the-fly gzip compression, Google's barebones pages make the Olsen twins look flabby. I'm sure people on dialup lines will appreciate this.
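To put those totals in perspective, here is a rough sketch that turns them into bytes per search and the percentage by which MSN's traffic exceeds Google's. The byte counts are the ones from the table; the dictionary layout and names are just for illustration.

```python
# Total bytes sent and received, copied from the table above.
totals = {
    "10 searches above":   {"count": 10, "google": 60898, "msn_plus": 79416},
    "5 ad-laden searches": {"count": 5,  "google": 37390, "msn_plus": 58656},
}

extra_pct = {}
for label, row in totals.items():
    google_per_search = row["google"] / row["count"]
    msn_per_search = row["msn_plus"] / row["count"]
    # How much more traffic MSN+ generated than Google, in percent.
    extra_pct[label] = (row["msn_plus"] / row["google"] - 1) * 100
    print(f"{label}: Google {google_per_search:.0f} B/search, "
          f"MSN+ {msn_per_search:.0f} B/search, "
          f"MSN+ is {extra_pct[label]:.1f}% larger")
```

By this measure the gap widens on the ad-laden searches, which fits the suspicion that ads account for much of the extra weight.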

Number of hits

This time, we are looking at how many matches the SE reported. Of course, there's no way to verify this number; the SE can report whatever number it pleases, but I'm hoping the SEers won't resort to such shenanigans.

For this category, I will just refer to the "Paris Hilton" table above (sidetrack: it is quite interesting to see how Google vanquishes MSN in the number of matches for "Paris Hilton". Since she's a recent "pop" on the scene, my guess is that Google's robots are quicker at trawling for information than MSN's).
Verdict? Looking at that table, it is clear that the winner is Google.
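For the curious, the hit counts from the "Paris Hilton" table can be compared directly. A small sketch, using only the reported counts from that table (the names are mine):

```python
# Reported match counts copied from the popular-terms table: (Google, MSN).
hits = {
    "Paris Hilton": (7_350_000, 950_044),
    "Windows":      (290_000_000, 152_355_317),
    "Linux":        (221_000_000, 98_953_706),
    "Apache":       (50_900_000, 26_910_715),
    "contact":      (893_000_000, 966_654_120),
}

# Ratio of Google's reported matches to MSN's, per search term.
ratios = {term: google / msn for term, (google, msn) in hits.items()}
google_more = sum(1 for r in ratios.values() if r > 1)

for term, ratio in ratios.items():
    print(f"{term}: Google reports {ratio:.2f}x MSN's count")
print(f"Google reports more matches in {google_more} of {len(hits)} searches")
```

Note that "contact" is the one term where MSN reports the larger number, so the win is by count of searches rather than a clean sweep.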

Relevance

This can be a bit subjective. After all, relevance is defined by the person who's doing the searching: if the results satisfy his/her need, then they are relevant; otherwise, no.

Not to be discouraged by this amazing bit of insight, I did several queries with terms derived from emails I have received seeking out information. The idea was to see if the same terms, when given to an SE, would result in hits that might have answered the original question.

Most often, the sites returned by both the engines were similar and ranked similarly. But there were a few exceptions, and these are given below.

Query: how to scan pictures
Google's result has these matches as the first 5:

Scanning Basics 101 - All about digital images
A few scanning tips. by Wayne Fulton. The purpose is to offer some scanning
tips and to explain the basics for photos and documents. ...
Scanning for Beginners or Basic Scanning Techniques
... I'd even seen it done. Crisp, clean scans that looked as good as the original photos. ...
Unless otherwise noted, all photos on this site are displayed as scanned. ...

HOW DO I SCAN PICTURES TAKEN WITH FILM CAMERA WITH MY HOME... ...
...SCAN PICTURES TAKEN WITH FILM CAMERA WITH MY HOME... ... how do i scan pictures with
a printer/scanner. Your answer will be published for anyone to see and rate. ...
Scanning Pictures
... and then select Adobe Photoshop 6.0, as shown in the picture below ... photos, black
and white vs ... The preview button is used to scan a preview of the document in the ...

Windows XP and Digital Photography: Printing and Scanning Pictures
... How to insert digital photos into Word to customize their printing layout. ... like home
printing and online photo sharing to pictures you shot ... Scanning Line Art ...

and for MSN, these are the first 5:
Trend Micro - Free online virus Scan
... Ease your mind and scan your PC for viruses. Scan Now. It's Free! Trend Micro. Mobile Security: The integrated solution provides automatic, real-time scanning to protect wireless devices against ...
Computed Tomography (CT) Scan
... scan . The dye makes blood vessels and other structures or organs more visible on the CT scan pictures . The dye may be used to evaluate blood flow, detect tumors, and locate areas of ...
BinaryPhotography.com - powered by vBulletin
... What do you scan pictures with? Never 2 6 Printers Discuss regular, color laser, and photo printers Never 1 6 Etc. Tripods, Camera bags, Memory cards, Filters, Batteries, Etc. Never 3 9 Software for manipulating ...
Big Lottery Fund
Access key list, click here to skip accessibility, alt+0. .home, alt+1, The big lottery fund, alt+l. Newsroom, alt+r. Consultation forum, alt+c. Funding Programmes.alt+p Click here to read access keys ...
How To Scan Pictures and Prepare Them for the Web
How To Scan Pictures and Prepare Them for the Web by Sheryl Cormicle Knox with adaptations for use at Lapeer County Library by Victor P. Illian Note: This document describes the process of ...
I was indeed quite surprised to see links to an anti-virus site, a CATscan site and a Lottery site (???) in the top 5 for this search.

Another search that returned less-than-stellar results was the question, "do I need more memory" (without the quotes). First, Google reports 25,900,000 matches to MSN's 1,180,365 (a 22x difference). But 4 of the 9 results returned by MSN (yes, it returns 9 on the first page) all point to the same site, kingston.com. While I agree that Kingston are purveyors of some fine memory, I am sure there are other sites which could have answered the question better.

The query for "sony vaio laptop" would seem fairly straightforward; but this was not to be. On Google, the first result is to Sony's site. However, on MSN, the first two results are to your run-of-the-mill "get free laptop" pyramid scheme sites.

Verdict? Google is better; while most of the searches will fare equally well on both engines, Google's results seem more relevant in some cases.

Coverage

If you thought "relevance" was subjective, wait till you read about this. By "coverage", we define how well diverse topics are represented in the first page of results. Sometimes, there are 2 or more distinct meanings of a search term, each with their own web of interlinked pages. A typical example that is often used is the search term "jaguar". It could mean the car; the big cat; or the new OS from Apple. If you're searching for information about the OS, you would hate to wade through pages upon pages of links to cars.

In other words, we are interested in seeing how well the clusters of information are represented in the search results. For a better view of this clustering phenomenon, I would recommend a quick excursion to Clusty the clustering engine.

Query: jaguar. The first page of results on Google has representatives from all of the three distinct sets mentioned above. On the other hand, MSN's results are dominated by links to the car, with a couple of links to companies with "Jaguar" in the name thrown in. Surprisingly, there's no mention of Apple's Jaguar on the first page of results, and the big cat is nowhere to be found.

Query: football. Now, what the North Americans call "football" is known as something else in the rest of the world; and what the rest of the world calls "football" is known as "soccer" in North America. On Google, a search for "football" brings up NFL at #1, FIFA (the world body of soccer) at #2, as well as UEFA (the European governing body) among the top 10. On MSN, there's no mention of FIFA or UEFA in the first page of results.

Query: polo. A search on Google returns links to Ralph Lauren's outfit, as well as to US Polo Association and Water Polo sites in the first page. On MSN, 7 of the first 10 hits are to Ralph Lauren's company, an unwarranted domination.

Verdict? Google looks to be the winner. Its results are well-balanced, with no single site hogging the limelight, and diverse subsets are well-represented.

Enhancements

Gone are the days when search engines were purely text search engines. Google has taken their usability to a whole new level. (It is rumored that a "George Foreman" model of Google is in the works, but don't quote me on that). Google has an ever-growing list of search features which clearly shows that the brainiacs at Google are busier than a rooster in a henhouse.

MSN has responded with its own list of features, and here I'll take a quick look at how they compare, without exploring all possibilities exhaustively.

Calculator

Both offer a calculator; you can type in expressions and the SE will return the answer to you. For example, 987*432 returns 426,384 as the answer. However, there are some differences:
  • MSN's calculator is a bit weaker; whereas 987*432 returns the answer (as above), the expression 987*432/151 returns nothing on MSN (and the correct answer, 2823.7351 on Google). Limiting expressions to just 1 operator seems a bit cheesy. Update: MSN has fixed this; now it returns the correct answer.
  • MSN's calculator reports the answer to log(32) as Answer: lg (32) = 1.50515. This is incorrect; lg() is conventionally used (in computer science, at least) to denote the logarithm to base 2, just like ln() is the logarithm to base e (the so-called "natural logarithm"). Google correctly reports lg(32) = 5.
  • MSN's precision is limited to about 34 decimal places; for example, try ln(0.00000000000000000000000000000000001). Google seems to be happy even with ln(0.0 <128 zeroes> 1). On the other side, MSN reports 2^40 is 1.099512x10^12, whereas 2^39 is 549,755,813,888. This shows that MSN switches to scientific notation after 12 digits; Google goes up to 2^42 before switching to scientific notation. Update: MSN now accepts floating point numbers up to 37 decimal places, up from 34.
  • Hex and Octal arithmetic is not supported by MSN. I guess this will only matter to hardcore geeks like myself, but I just thought I'd throw it in there.
  • Google has a small bug; it reports 1/0 = 0, evidently a new kind of math invented at the Googleplex. MSN chooses to ignore your question altogether.
  • Both offer unit conversion tools, just in case you wanted to know how many furlongs in a parsec. Note the number of digits in the answer to this query, and compare with the "precision" comment above. MSN does appear to fall short in a few cases; feet^3 in m^3 returns nothing; feet in nanometers returns 0 (instead of 3.2808399 x 10^-9); hectares in acre returns nothing. Update: "feet in nanometers" on MSN returns the answer 1 foot = 304,800,000.001219 nanometers. D'uh! I didn't ask for "nanometers in feet"; I asked for "feet in nanometers"! Clearly they tried fixing it, but I can't believe they'd make such a simple mistake. Additionally, the query feet in a nanometer returns Answer: 1 nanometer = 0 feet, which is also incorrect.
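The arithmetic in the list above is easy to double-check outside either search engine. The following is just a sanity-check sketch in Python, not a reproduction of how either calculator works; the variable names are mine, and the only outside fact assumed is that 1 foot is exactly 0.3048 meters.

```python
import math

# Expressions quoted in the calculator comparison.
product = 987 * 432              # 426384
quotient = 987 * 432 / 151       # roughly 2823.7351
log10_32 = math.log10(32)        # what MSN printed as "lg (32)"
log2_32 = math.log2(32)          # base-2 logarithm: 5.0

# Digit counts around the point where MSN switches to scientific notation.
digits_2_39 = len(str(2 ** 39))  # 12 digits
digits_2_40 = len(str(2 ** 40))  # 13 digits

# Hex and octal literals mix freely with decimal arithmetic.
mixed_base_sum = 0x1F + 0o17     # 31 + 15

# Division by zero is an error, not 0.
try:
    1 / 0
    division_result = "no error"
except ZeroDivisionError:
    division_result = "undefined"

# Unit conversion in both directions (1 foot = 0.3048 m by definition).
METERS_PER_FOOT = 0.3048
nm_per_foot = METERS_PER_FOOT * 1e9   # nanometers in one foot
feet_per_nm = 1e-9 / METERS_PER_FOOT  # feet in one nanometer

print(product, quotient, log10_32, log2_32)
print(digits_2_39, digits_2_40, mixed_base_sum, division_result)
print(nm_per_foot, feet_per_nm)
```

The last two values show the two readings of the conversion query: about 304.8 million nanometers in a foot, and about 3.28 x 10^-9 feet in a nanometer.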

Other Convenience Features

Google offers other convenience features like FedEx and UPS parcel tracking; Vehicle ID Number (VIN) lookups; patent numbers, etc. Complete list is here. MSN does not have the full repertoire yet, but these features are trivial to implement, so expect MSN to have them soon (should they choose to do so).

And finally: Google returns up to 1000 matches to a search request, while MSN returns only a maximum of 250. While I pity the person who has to click through that many result pages, it may occasionally be useful.

Verdict? This round also goes to Google. But it should be easy for MSN to catch up to (and surpass) Google in this department.

Censorship

This is the most contentious of subjects. With the forces of free market, government regulation and potential lawsuits pulling at them from different directions, the SEs must feel like Elastigirl in The Incredibles.

One of my biggest fears is that these SEs will become the de-facto portals to the information universe, and hence become the single point of failure; if someone can force them to not show some content, then that content will effectively become blacklisted. These SEs could make censorship easier.

Google and Yahoo have been reported to be censoring content in China, as per that government's wishes. Some could say it is the price of business: follow the law of the land. Some would disagree: information yearns to be free. This issue is too complicated for me to wrap my nerdy little brain around.

What we can see is whether these SEs are practicing any "self-censorship". What better way to test this than by looking at the mother of all lightning rods: Scientology. A quick peek over at MSN reveals that the results from MSN are all pro-Scientology, without any mention of sites offering opposing viewpoints (and we do know there are quite a few of them). Google's results are much more balanced; the first result is for the Church of Scientology (as expected), but the second is for "operation clambake". About half of Google's results are to sites offering opposing viewpoints to Scientology.

I am sure there are others out there who can offer more such examples; I would be more than happy to list them out here if you send me some.

Verdict? MSN may find it hard to balance their other interests with the role of an unbiased search results provider. Even though Google is a public company, expect them to offer more resistance to the control freaks than MSN ("don't be evil"?).

Final Word

The verdict, for now, is: Google's still much better than MSN Search. However, MSN seems to be making rapid progress, and the year 2005 is shaping up to be quite a year for search engines.

The end result, of course, is that we the consumers win. It's a great time to be alive!

Comments, criticisms and large sums of money welcome. :)


Ajay Shekhawat (ajay AT cse.Buffalo.EDU)