List: General Discussion
From: Mike Wexler
Date: February 12 2002 9:56pm
Subject: Re: Distributed Fulltext?

Steve Rapaport wrote:
> On Friday 08 February 2002 06:14 pm, James Montebello wrote:
> 
>>Distribution is how Google gets its speed.  You say clustering won't
>>solve the problem, but distributing the indices across many processors
>>*is* going to gain you a huge speed increase through sheer parallelism.
>>
> 
> True, but not enough.  The BEST parallelism can do in a compute-bound
> application is divide the time by the number of processors.  That's assuming
> a PERFECT routing system.  (Correct me if I'm wrong here)

There are actually some exceptions. Specifically, if you have a very large 
problem set and parallelism allows you to move the problem set into a faster 
storage medium, you can sometimes see a greater-than-linear performance 
increase. Let's say you are doing a full text search and you have a 20 GB full 
text index. It may not be feasible to build a machine with 20 GB of RAM to hold 
the index in RAM, but it might be more feasible to store 1/20th of the index on 
each of 20 machines with 1 GB of RAM. And with RAM being >1000 times as fast as 
hard disk, you could get a huge win.
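
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: the per-access costs, the number of index accesses per query, and the function name search_time_ns are made up, and only the 1000x RAM-to-disk ratio and the 20-machine split come from the example above. It also ignores routing and result-merging overhead entirely.

```python
# Toy model: a query costs (index accesses) x (cost per access), and
# splitting the index evenly over N machines divides the accesses by N.
# All figures are hypothetical; only the 1000x ratio and the 20-way
# split are taken from the example in the text.

RAM_ACCESS_NS = 100                     # assumed cost of one in-RAM access
DISK_ACCESS_NS = RAM_ACCESS_NS * 1000   # "RAM being >1000 times as fast"
ACCESSES_PER_QUERY = 1_000_000          # hypothetical work for one search


def search_time_ns(accesses, ns_per_access, machines=1):
    """Query time when the index is split evenly across `machines`."""
    return (accesses / machines) * ns_per_access


# One machine, index too big for RAM, so every access goes to disk.
single_disk = search_time_ns(ACCESSES_PER_QUERY, DISK_ACCESS_NS)

# Twenty machines, each holding 1/20th of the index entirely in RAM.
sharded_ram = search_time_ns(ACCESSES_PER_QUERY, RAM_ACCESS_NS, machines=20)

speedup = single_disk / sharded_ram
print(f"speedup from 20 machines: {speedup:.0f}x")
```

Under these assumptions the speedup is the product of the two effects (20 from parallelism times 1000 from the faster medium), not just the factor of 20 that the processor count alone would suggest. Real numbers would be lower once routing and merging are counted.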


> 
> So to make the routing system + parallelism add up to 
> a MILLION times better performance, you would need at
> least a MILLION processors.  I doubt that even Google is
> doing that.  
> 
> 
>>Google uses thousands of processors to handle its index, and any given
>>search is going to be spread over 100s of processors.  
>>
> 
> Right, so we can expect Google to do, say, 10,000 times (10^4) better
> than MySQL at a Fulltext search.  But in fact we're seeing 
> a million, 10^6, being generous.  It's that extra factor
> of a hundred (more likely a thousand, my estimates
> were very generous) that I'm getting all fussy about.
> 
> 
>>Asking
>>a general purpose RDBMS to be really good at 10 different things is asking
>>a bit much.  Fulltext searches are well down the list.
>>
> 
> Here we get out of the realm of hard numbers and into opinions.  But here's
> mine:  If MySQL bothers to support fulltext searches, it's presumably because 
> there's some demand for them in some circumstances.  The level of scaling and
> optimization that can reasonably be expected is whatever it takes to make the
> feature useful, i.e. to perform similarly to other MySQL features in similar
> cases.  With a regular index, I can do a hash lookup on 23 million records
> subsecond.  With a fulltext index (on a small field, only 40 characters) my
> time slips to 3 to 180 seconds.  That extra little factor of 100 is my 
> problem.  Distributing over 4 processors wouldn't really help much.  And 
> because people don't always type a company name the exact same way,
> FULLTEXT really is the best way to do this.
> 
> So Monty et al, my request is this:  please put on the wish list, enhancement 
> of FULLTEXT search  to approximately match the performance of an indexed
> search on 25 million records, on the same hardware and with other things
> held equal.
> 
> Steve Rapaport
> 
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
> 
> To request this thread, e-mail <mysql-thread99064@stripped>
> To unsubscribe, e-mail <mysql-unsubscribe-cyon=bestweb.net@stripped>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

