List: General Discussion
From: ryc
Date: September 13 2001 4:14pm
Subject:Re: Fulltext indexing libraries (perl/C/C++)
I think what you are looking for is called mifluz and is the indexing
library that htdig uses. The link is http://www.gnu.org/software/mifluz/ .

If you develop any kind of bindings to use mifluz to index a mysql database,
let me know; I would definitely be interested.

ryan

----- Original Message -----
From: "Christian Jaeger" <christian.jaeger@stripped>
To: <mysql@stripped>; <dbi-users@stripped>
Sent: Wednesday, September 12, 2001 9:42 PM
Subject: Fulltext indexing libraries (perl/C/C++)


> Hello
>
> [ I'm crossposting this to dbi-users because it might be of interest
> there too. It's probably better not to reply to both lists, thanks. ]
>
> While programming a journal in perl/axkit I realized that the problems
> of both creating useful indexes for searching content efficiently and
> parsing user input and creating the right SQL queries from it are so
> common that there *must* be some good library already. :-) So I
> headed over to CPAN, but didn't really find what I was looking for.
>
> It should create indexes that are efficiently searchable in mysql,
> i.e. only <select ... where .. like "abcd%"> queries, not "%abc%".
> It should allow searching for word parts (i.e. find "fulltext" when
> entering "text"). It should allow for multiple form fields (i.e. one
> field for title words, one for author names, etc.) at once. Preferably
> it should allow for some sort of query rules (AND/NOT/OR or something).
> Preferably it should do some relevance sorting, and allow hooking some
> numbers (link or access counts etc.) into the relevance sorting.
>
> I think there are 3 tough parts which are needed:
> 1. creation of sophisticated index structures (inverted indexes)
> 2. somehow recognize sub-word boundaries to split words on. Maybe use
> some form of thesaurus? Or syllables? (I suspect it should be the
> same rules as for splitting words on line boundaries)
> 3. user input parser / query creator
>
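
One possible way to get the "find fulltext when entering text" behaviour while
still using only prefix queries is to store every suffix of each word in the
index table, so a search for a word part becomes a like "text%" lookup. This is
only a rough sketch under assumptions, not what the message above proposes (it
suggests splitting on sub-word boundaries instead): the table name word_index,
the helper names and the min_len cutoff are all made up for illustration, and
SQLite stands in for mysql so the example is self-contained. Python is used
here purely for brevity.

```python
import sqlite3

def suffixes(word, min_len=3):
    """Return every suffix of `word` that is at least `min_len` characters long."""
    word = word.lower()
    return [word[i:] for i in range(len(word) - min_len + 1)]

def build_index(conn, docs):
    """Store one (fragment, doc_id) row per word suffix; `docs` maps doc_id -> text."""
    # Hypothetical schema: one row per indexed word fragment.
    conn.execute("CREATE TABLE word_index (fragment TEXT, doc_id INTEGER)")
    conn.execute("CREATE INDEX idx_fragment ON word_index (fragment)")
    for doc_id, text in docs.items():
        rows = {(frag, doc_id) for word in text.split() for frag in suffixes(word)}
        conn.executemany("INSERT INTO word_index VALUES (?, ?)", sorted(rows))

def search(conn, term):
    """Find documents containing `term` inside any word, using only a prefix match."""
    cur = conn.execute(
        "SELECT DISTINCT doc_id FROM word_index "
        "WHERE fragment LIKE ? ORDER BY doc_id",
        (term.lower() + "%",),  # pattern is "term%", never "%term%"
    )
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
build_index(conn, {1: "Fulltext indexing libraries", 2: "Relevance sorting"})
print(search(conn, "text"))  # matches the "text" suffix stored for "fulltext"
```

The obvious cost is index size: a word of n letters produces up to n - 2 rows
with min_len=3, which is the trade-off against splitting only on "real"
sub-word boundaries. User input containing % or _ would also need escaping
before being put into the pattern.
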
> Why not:
>
> - use mysql's fulltext indexes? Because I think that currently they
> are too limited (see the user comments about them at
> www.mysql.com/doc/) (should be better in mysql-4, I read, but we need
> it in a few weeks already...). And they are also not supported in
> InnoDB, which we want to use.
>
> - use indexing robots? Because we work with XML documents, and would
> like to both keep the index up to date immediately, as well as split
> the XML contents into several parts (i.e. there's a title, byline,
> etc., which should be searchable or weighted differently). We want a
> *library*, not a finished product.
>
> There's Lucene (www.lucene.com) in Java that I think does exactly
> what I want. Anyone want to help me port that to perl or
> C(++)/perl-bindings (-; ? (It should be ready in a few weeks, and
> it's about 500k of source code :-().
>
> (Something in C/C++ that would be loaded as a UDF or so would be nice
> too, but as I understand (from recent discussion about embedded
> procedural languages) it's not possible, since these UDFs would have
> to start other queries (i.e. to insert each word fragment into an
> index table).)
>
> What are my current options? What do you use?
> More info about mysql-4?
>
> Thx
> Christian.
