List: General Discussion
From: James Montebello  Date: February 8 2002 6:52pm
Subject: Re: Distributed Fulltext?
For the slice servers, you simply assume that if one is lost, you lose X%
of the data until it is revived, which is usually not even noticeable to
the end user.  For the aggregators, we had four behind a load-balancer.
In practice, we had nearly zero downtime over a roughly 18 month period.

james montebello
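
The design described in this thread (split the records across slice
servers by a random bucket, query every slice, merge the sorted pages,
and carry on if a slice is down) can be sketched roughly as below. This
is only an illustration in Python, not the code from the system
described above. The names shard_for and search_all, the (score, item_id)
hit format, and the use of ConnectionError to signal a dead slice are all
assumptions made for the example, and the slices are queried one after
another here for simplicity, where a real aggregator would presumably
query them in parallel.

```python
import heapq
from itertools import islice


def shard_for(bucket, num_shards=10, bucket_max=100):
    """Map a random bucket value in 1..bucket_max to a 0-based slice index.

    With the defaults, buckets 1-10 go to slice 0, 11-20 to slice 1, etc.
    """
    width = bucket_max // num_shards
    return (bucket - 1) // width


def search_all(shards, query, page_size):
    """Ask every slice for one sorted page and merge the pages.

    `shards` is a list of callables (hypothetical stand-ins for the slice
    servers). Each takes (query, page_size) and returns a list of
    (score, item_id) tuples sorted by descending score.
    A slice that raises ConnectionError is skipped, so its share of the
    data is simply missing from the result until it comes back.
    """
    pages = []
    failed = 0
    for search in shards:
        try:
            pages.append(search(query, page_size))
        except ConnectionError:
            failed += 1
    # Each page is already sorted, so a k-way merge is enough.
    # Negating the score turns "descending score" into the ascending
    # order that heapq.merge expects.
    merged = heapq.merge(*pages, key=lambda hit: -hit[0])
    return list(islice(merged, page_size)), failed
```

Because each slice returns only one page, the merge touches at most
(number of slices) x (page size) hits regardless of how large the full
dataset is.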

On 7 Feb 2002, Amir Aliabadi wrote:

> How do you make something like this fault tolerant?
> The answer is probably what I suspect: two of everything.
> How does the aggregator handle this or are these machines in a cluster?
> 
> We are thinking of how to rebuild our fulltext search.  Currently it is
> in MS SQL 7.0.  MySQL 4.0 seems to blow the doors off the cataloging
> time as compared to MS SQL 7.0 or even 8.0.
> 
> 
> On Thu, 2002-02-07 at 15:19, James Montebello wrote:
> > 
> > I did this at a previous job, and we split the data up more or less
> > this way (we used a pre-existing item number for the split which was
> > essentially random in relation to the text data), with an aggregator that
> > did the query X ways, each to a separate box holding 1/X of the data.
> > The results from each unit were paged and sorted, so all the aggregator
> > did was do a simple merge sort on a "page" of the set, which was fast.
> > On a 6M record dataset, it produced millisecond-range search results.
> > Not exactly Google-class, but pretty good for 12 Linux boxes, two
> > programmers, and about six weeks of effort.
> > 
> > james montebello
> > 
> > On Thu, 7 Feb 2002, Brian Bray wrote:
> > 
> > > 
> > > It seems to me like the best solution that could be implemented as-is 
> > > would be to keep a random int column in your table (with a range of, say,
> > > 1-100) and then have fulltext server 1 pseudo-replicate records with a
> > > random number in the range of 1-10, server 2 11-20, server 3
> > > 21-30, and so on.
> > > 
> > > Then run your query on all 10 servers and merge the result sets and 
> > > possibly re-sort them if you use the score column.
> > > 
> > > The problem with splitting the index up by word is that it messes up all
> > > your scoring and ranking.  For example, what if you search using 5
> > > keywords, all starting with letters from different groups?  You're going
> > > to get pretty bad score for each match, and it could totally break 
> > > boolean searches.
> > > 
> > > --
> > > Brian Bray
> > > 
> > > 
> > > 
> > > 
> > > Brian DeFeyter wrote:
> > > > On Thu, 2002-02-07 at 15:40, Tod Harter wrote:
> > > > [snip]
> > > > 
> > > > >>Wouldn't be too tough to write a little query routing system if you
> > > > >>are using perl. Use DBD::Proxy on the web server side, and just hack
> > > > >>the perl proxy server so it routes the query to several places and
> > > > >>returns a single result set. Ordering could be achieved as well. I'm
> > > > >>sure there are commercial packages out there as well. I don't see why
> > > > >>the individual database servers would need to do anything special.
> > > >>
> > > > [snip]
> > > > 
> > > > > If I'm understanding you correctly, I think you're referring to routing
> > > > > based on the first character of the word. That would work for cases
> > > > > where the query is searching for a word that begins with a certain
> > > > > character. However, fulltext searches also return results with the term
> > > > > in the middle.
> > > > 
> > > > ie: a search for 'foo' could return:
> > > > foo.txt
> > > > foobar
> > > > 
> > > > but also could return:
> > > > thisisfoo
> > > > that_is_foolish
> > > > 
> > > > > I could be wrong, but it's my understanding that MySQL stores its
> > > > > fulltext index based on all the 'unique words' found. For such a system
> > > > > as you mentioned above, you'd probably have to create your own fulltext
> > > > > indexing system to determine: a) where to store the data 'segments' and
> > > > > b) how to route queries.  It seems like this could probably be done much
> > > > > more efficiently inside of the server.
> > > > 
> > > >  - Brian
> > > > 
> > > > 
> > > > 
> > > > ---------------------------------------------------------------------
> > > > Before posting, please check:
> > > >    http://www.mysql.com/manual.php   (the manual)
> > > >    http://lists.mysql.com/           (the list archive)
> > > > 
> > > > > To request this thread, e-mail <mysql-thread98886@stripped>
> > > > > To unsubscribe, e-mail <mysql-unsubscribe-bbray=xmission.com@stripped>
> > > > > Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> 
> 
> 
> 


