pan-users

Re: [Pan-users] Re: OT: freedomware vs... Was: Building Pan on Windows?


From: Alan Meyer
Subject: Re: [Pan-users] Re: OT: freedomware vs... Was: Building Pan on Windows?
Date: Thu, 11 Mar 2010 05:07:03 -0800 (PST)

Steven D'Aprano <address@hidden> wrote:
...
> You think that search engine software is hard? It's not hard. Yahoo has
> one, Microsoft has one, Alta Vista had one, Ask Jeeves had one, search
> engines were everywhere, and there still exist a couple of dozen. In
> 2008 alone, TEN new search engines were launched to the public.
> Google's competitive advantage isn't their software, but their data,
> their market share, and name recognition.

I couldn't let that pass.  Having written several search engines
myself, I have to say that building search engines is VERY hard.

Anyone can write a simple search engine that can run through a
few megabytes of data on behalf of a few users.  But writing an
engine that can process terabytes or petabytes of data, can
service millions of users, can keep data in sync on thousands of
servers, can handle proximity matching, relevance ranking,
significant phrase extraction, search word stemming and
highlighting in multiple languages, result set size estimation,
and do it all at blinding speed and with practically sized,
real-time updatable stored data structures - that's a job that
only the best, world-class programmers can do.

I once worked with professors of computer science who had studied
search technology for years and written their own award-winning
system (the Inquery search engine), and they couldn't figure out
how the then-top search engine (Infoseek) was able to do what it
did as fast as it did.

Go to any computer science library and try to find published
algorithms for optimizing search engines.  Try to find out, for
example, how to build a postings list that can tell you when
three words appear in a particular order in a document, and still
be small, compressed, updatable, and yet fast to search.  Maybe
there's something now, but there wasn't when I looked.  It was a
black art based on closely guarded, proprietary secrets.
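
For readers unfamiliar with the term, here is a rough sketch of what a
positional postings list does.  It is a toy, not how Infoseek or any
production engine worked: the function names and sample documents are
made up for illustration, it only handles words that are directly
adjacent, and it is neither compressed nor efficiently updatable, which
is exactly the hard part described above.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} (a toy positional postings list)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted doc ids where the words of `phrase` appear consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    # Only documents that contain every word can possibly match.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in candidates:
        later = [set(index[w][doc_id]) for w in words[1:]]
        # A match starts at position p if the k-th later word sits at p + k + 1.
        if any(all(p + k + 1 in s for k, s in enumerate(later))
               for p in index[words[0]][doc_id]):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "the quick brown fox",
    2: "brown quick the fox",
    3: "a quick brown dog and a quick brown fox",
}
index = build_positional_index(docs)
print(phrase_search(index, "quick brown fox"))
```

Everything here lives in plain in-memory dictionaries.  Storing every
position of every word for billions of documents, keeping it small, and
still answering in milliseconds is where the real difficulty begins.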

> Yes, Google probably had a competitive advantage due to their software
> algorithm ten years ago, and maybe, just maybe, they wouldn't have had
> one if they had open sourced it then. But even back on day one, Google
> didn't mind telling people what their algorithm was.

Google explained that relevance ranking was partly based on
determining how many links existed to a web page.  Revealing that
secret is like revealing that an internal combustion engine mixes
air and gasoline in a carburetor (well, it used to).  Yes, it
gives you important information.  Yes, it was a great innovation
in its time.  But it doesn't tell you anything about how to make
it work, and it's only a tiny sliver of the technology that Google
had to build.
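
To show how little the published idea gives you, here is a minimal
sketch of link-based ranking along the lines of the simplified PageRank
formulation.  It is an assumption-laden toy, not Google's actual
system: the function name, the damping value of 0.85, the fixed
iteration count, and the four-page sample web are all chosen for
illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by inbound links using a simple power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new[t] += share
        rank = new
    return rank


# Hypothetical four-page web: a, b, and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_rank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

A couple of dozen lines cover the idea.  They say nothing about
crawling, storing the link graph, running this over billions of pages,
or combining the score with everything else a query needs.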

--
Alan Meyer
address@hidden



      



