List: General Discussion
From: Alec.Cawley  Date: November 24 2005 6:11pm
Subject: Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
AmirBehzad Eslami <startbtn@stripped> wrote on 24/11/2005 17:48:29:

> Dear list,
> 
>   I'm considering programming a simple "Search Engine" for a website,
>   to find Arabic/Persian data within a MySQL database.
>   This database contains a huge amount of data, encoded with
>   Unicode (UTF-8).
> 
> 
>   The big deal is to ** reduce the response time ** to end-users.
> 
>   My first idea is to create an index and use the "FULL-TEXT
>   Searching" method.
>
>   Luckily, MySQL provides FULL-TEXT indexing support for MyISAM tables.
>   Unfortunately, it doesn't support multi-byte charsets (e.g.
>   Unicode). [1]
>   Technically, MySQL builds its indexes over words.
>   A "word" is any sequence of characters consisting of letters and
>   numbers [2].
> 
>   Given that, I tried storing the records as numeric character
>   references (&#xxxx;), but the search failed again :-(
>
>   Any suggestions?
>   I would appreciate any solution to this problem.
> 
>   Thanks in Advance,
>   Behzad
> 
> 
>   [1] MySQL Manual -> 6.8.3 Full-text Search TODO
>   [2] MySQL Manual -> 6.8 MySQL Full-text Search
> 
> 
>   P.S.

*********************** 
>   I use MySQL 4.0
***********************

I think this is your problem: MySQL does not properly support Unicode 
until version 4.1. I am successfully using FullText with MySQL 4.1 to sort 
UTF-8 encoded Japanese text. I see no reason why it should not work for 
Arabic - if you upgrade.

        Alec
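
A note on the point quoted above, that a "word" is any sequence of letters and numbers: the Python sketch below illustrates why storing text as numeric character references (&#xxxx;) is unlikely to help. It is only an illustration under stated assumptions, not MySQL's actual full-text parser. `ascii_words` is a hypothetical splitter that treats only ASCII letters and digits as word characters, `unicode_words` is a hypothetical character-aware splitter, and `to_ncr` is a made-up helper that rewrites non-ASCII characters as decimal references.

```python
import re

# Assumption: a splitter that only recognises ASCII letters and digits,
# standing in for an indexer with no multi-byte character set support.
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")

# Assumption: a character-aware splitter. [^\W_] matches any Unicode
# letter or digit (the \w class minus the underscore).
UNICODE_WORD = re.compile(r"[^\W_]+")

# Two Persian words ("salam donya"), written with escapes to avoid
# bidirectional-text rendering issues in editors.
SAMPLE = "\u0633\u0644\u0627\u0645 \u062f\u0646\u06cc\u0627"


def ascii_words(text):
    """Split text into words, counting only ASCII letters and digits."""
    return ASCII_WORD.findall(text)


def unicode_words(text):
    """Split text into words, counting any Unicode letter or digit."""
    return UNICODE_WORD.findall(text)


def to_ncr(text):
    """Rewrite every non-ASCII character as a decimal reference (&#NNNN;)."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 127 else ch for ch in text)


if __name__ == "__main__":
    print("UTF-8 bytes per sample:", len(SAMPLE.encode("utf-8")))
    print("ASCII-only split of raw text:", ascii_words(SAMPLE))
    print("ASCII-only split of NCR text:", ascii_words(to_ncr(SAMPLE)))
    print("Unicode-aware split:", unicode_words(SAMPLE))
```

Under these assumptions the ASCII-only splitter finds no words in the raw text, and after the &#xxxx; rewrite every character becomes its own numeric token, so the original word never reaches the index as a unit. The character-aware splitter keeps both words intact, which is consistent with the advice to move to a version with proper Unicode support.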

