
List: General Discussion
From: Steve Rapaport
Date: February 10 2002 12:04pm
Subject: Re: Distributed Fulltext?



On Friday 08 February 2002 06:14 pm, James Montebello wrote:
> Distribution is how Google gets its speed.  You say clustering won't
> solve the problem, but distributing the indices across many processors
> *is* going to gain you a huge speed increase through sheer parallelism.

True, but not enough.  The BEST that parallelism can do in a compute-bound
application is divide the time by the number of processors.  That's assuming
a PERFECT routing system.  (Correct me if I'm wrong here.)

So to make the routing system + parallelism add up to 
a MILLION times better performance, you would need at
least a MILLION processors.  I doubt that even Google is
doing that.  
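
To put numbers on that bound, here is a minimal Python sketch.  The
function name and the efficiency parameter are my own illustrative
assumptions, not anything taken from MySQL or Google.  It only encodes
the claim above: with perfect routing, N processors buy at most an
N-fold speedup.

```python
import math


def min_processors(target_speedup: float, efficiency: float = 1.0) -> int:
    """Lower bound on processors needed for a compute-bound job.

    efficiency = 1.0 models the PERFECT routing case from the text;
    anything below 1.0 is a hypothetical routing/coordination loss.
    """
    if target_speedup <= 0:
        raise ValueError("target_speedup must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Ideal speedup is processors * efficiency, so invert and round up.
    return math.ceil(target_speedup / efficiency)


if __name__ == "__main__":
    # A million-fold speedup with perfect routing.
    print(min_processors(1_000_000))
    # The same target if half the work were lost to routing (assumed figure).
    print(min_processors(1_000_000, efficiency=0.5))
```

Any routing loss only pushes the processor count higher, which is why
a million processors is the floor rather than an estimate.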

> Google uses thousands of processors to handle its index, and any given
> search is going to be spread over 100s of processors.  

Right, so we can expect Google to do, say, 10,000 times (10^4) better
than MySQL at a fulltext search.  But in fact we're seeing a factor of
a million (10^6), being generous.  It's that extra factor of a hundred
(more likely a thousand, since my estimates were very generous) that
I'm getting all fussy about.

> Asking
> a general purpose RDBMS to be really good at 10 different things is asking
> a bit much.  Fulltext searches are well down the list.

Here we get out of the realm of hard numbers and into opinions.  But here's
mine:  If MySQL bothers to support fulltext searches, it's presumably because
there's some demand for them in some circumstances.  The level of scaling and
optimization that could reasonably be expected is whatever it takes to make
the feature useful, i.e. to perform comparably to other MySQL features in
similar cases.  With a regular index, I can do a hash lookup on 23 million
records in under a second.  With a fulltext index (on a small field, only 40
characters) the time slips to between 3 and 180 seconds.  That extra little
factor of 100 is my problem.  Distributing over 4 processors wouldn't really
help much.  And because people don't always type a company name the exact
same way, FULLTEXT really is the best way to do this.
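
As a rough check on the 4-processor remark, here is a small Python
sketch.  It assumes the ideal case where the work divides evenly with
no routing overhead, and the function name is hypothetical.  It takes
the 3 and 180 second timings quoted above and divides them by the
processor count.

```python
def ideal_parallel_seconds(serial_seconds: float, processors: int) -> float:
    """Best-case wall time if the work splits perfectly across processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return serial_seconds / processors


if __name__ == "__main__":
    # Fulltext timings quoted in the message, in seconds.
    for serial in (3.0, 180.0):
        best = ideal_parallel_seconds(serial, 4)
        print(f"{serial:.0f}s serial -> {best:.2f}s on 4 processors (ideal)")
```

Even in this best case the slow end of the range stays at 45 seconds,
nowhere near a subsecond lookup.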

So Monty et al, my request is this:  please put on the wish list an
enhancement of FULLTEXT search so that it approximately matches the
performance of an indexed search on 25 million records, on the same
hardware and with other things held equal.

Steve Rapaport

