List: General Discussion
From: Peter Gulutzan
Date: October 30 2006 12:15pm
Subject: RE: Hungarian collation

Hi,

On Thu, 2006-10-19 at 18:02 +0300, imre@stripped wrote:
> > From: Peter Gulutzan <peterg@stripped>
> > 
> > MySQL is looking for an authoritative, official statement which states 
> > all the current Hungarian collation rules.
> 
> According to the Reference Level Description of the Hungarian language (ISBN
> 9634206441, or the Hungarian version online:
> http://bme-tk.bme.hu/other/kuszob/hangok.htm), the rules are the following:
> 

Apparently http://bme-tk.bme.hu/other/kuszob/hangok.htm is an
educational site (something to do with the Council of Europe)
rather than an official standards site, if I understand
correctly.

> - The basic order of the alphabet is a á b c cs d dz dzs e é f g gy h i í j
>   k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs
> - For the short-long vowel pairs (a á, e é, i í, o ó, ö ő, u ú, ü ű),
>   long = short usually, but long > short if all else is equal.
>   E.g., kád < kar < kár < kard

So far, this seems to be the majority opinion, although not
everyone describes the rule the same way. If MySQL adopts this rule,
SELECT * FROM t WHERE column1 = 'kár';
will not return rows where column1 = 'kar'. But perhaps
SELECT * FROM t WHERE column1 LIKE 'ká%';
will return rows where column1 = 'kar'.
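
To make the two-level idea concrete, here is a minimal Python sketch (not MySQL code, and not an official statement of the rules). It folds long vowels onto short ones for the primary comparison and uses vowel length only as a tiebreak. The names LETTERS, tokenize, primary and sort_key are hypothetical, and comparing the length flags left to right is an assumption; the quoted rule only says "long > short if all else is equal".

```python
# Illustrative sketch only: hypothetical names, not MySQL's implementation.
LETTERS = ["a", "b", "c", "cs", "d", "dz", "dzs", "e", "f", "g", "gy", "h",
           "i", "j", "k", "l", "ly", "m", "n", "ny", "o", "ö", "p", "q", "r",
           "s", "sz", "t", "ty", "u", "ü", "v", "w", "x", "y", "z", "zs"]
RANK = {letter: i for i, letter in enumerate(LETTERS)}
LONG_TO_SHORT = {"á": "a", "é": "e", "í": "i", "ó": "o",
                 "ő": "ö", "ú": "u", "ű": "ü"}


def tokenize(word):
    """Split a word into Hungarian letters, preferring the longest match."""
    word = word.lower()
    tokens, i = [], 0
    while i < len(word):
        for size in (3, 2, 1):
            chunk = word[i:i + size]
            if chunk in RANK or chunk in LONG_TO_SHORT:
                tokens.append(chunk)
                i += len(chunk)
                break
        else:
            raise ValueError(f"unexpected character in {word!r}")
    return tokens


def primary(word):
    """Primary weights: long vowels count the same as their short forms."""
    return tuple(RANK[LONG_TO_SHORT.get(t, t)] for t in tokenize(word))


def sort_key(word):
    """Primary weights first; vowel length (0 short, 1 long) as tiebreak."""
    length_flags = tuple(int(t in LONG_TO_SHORT) for t in tokenize(word))
    return (primary(word), length_flags)


words = ["kard", "kár", "kar", "kád"]
print(sorted(words, key=sort_key))
# Equality uses the full key, so 'kár' and 'kar' differ ...
print(sort_key("kár") == sort_key("kar"))
# ... but a primary-level prefix test would let 'ká%' match 'kar'.
print(primary("kar")[:len(primary("ká"))] == primary("ká"))
```

Whether LIKE should compare at the primary level only, as the last line does, is exactly the open question above; the sketch just shows what that choice would imply.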

> - The long double consonants sort as if they had been
> expanded.  I.e., ggy as gygy, nny as nyny

So 'ccs sorts with cscs' is true, i.e. ccs > cds, because the letter cs
sorts after c.

I expect that there is no rule which could apply for all LIKE searches.
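
The expansion rule can be sketched in Python as follows. This is a rough illustration, not MySQL code: the names DOUBLED, tokenize and primary_key are hypothetical, and the table of doubled forms is an assumption generalized from the examples in this thread (ggy, nny, ccs), namely that a long digraph or trigraph is written by doubling only its first character.

```python
# Illustrative sketch only: hypothetical names, not MySQL's implementation.
LETTERS = ["a", "b", "c", "cs", "d", "dz", "dzs", "e", "f", "g", "gy", "h",
           "i", "j", "k", "l", "ly", "m", "n", "ny", "o", "ö", "p", "q", "r",
           "s", "sz", "t", "ty", "u", "ü", "v", "w", "x", "y", "z", "zs"]
RANK = {letter: i for i, letter in enumerate(LETTERS)}
FOLD = {"á": "a", "é": "e", "í": "i", "ó": "o", "ő": "ö", "ú": "u", "ű": "ü"}

# Assumption: "ccs" stands for cs+cs, "ggy" for gy+gy, "ddzs" for dzs+dzs, etc.
DOUBLED = {letter[0] + letter: [letter, letter]
           for letter in LETTERS if len(letter) > 1}


def tokenize(word):
    """Split into letters, expanding long double consonants (ggy -> gy gy)."""
    word = word.lower()
    tokens, i = [], 0
    while i < len(word):
        for size in (4, 3, 2, 1):
            chunk = word[i:i + size]
            if chunk in DOUBLED:
                tokens.extend(DOUBLED[chunk])
            elif FOLD.get(chunk, chunk) in RANK:
                tokens.append(FOLD.get(chunk, chunk))
            else:
                continue  # no match at this size, try a shorter chunk
            i += len(chunk)
            break
        else:
            raise ValueError(f"unexpected character in {word!r}")
    return tokens


def primary_key(word):
    """Primary weights only; vowel length is ignored in this sketch."""
    return tuple(RANK[t] for t in tokenize(word))


print(tokenize("ccs"))
print(primary_key("cds") < primary_key("ccs"))
```

After expansion the first letter of 'ccs' is cs rather than c, which is one reason a prefix pattern such as 'c%' has no obviously right answer.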

> - Composite words are sorted according to word parts. I.e.,
> meggyújt < meglát < megy < meggy
> 

I don't see a way to determine whether a word is a composite word. So
MySQL would return meglát < megy < meggy < meggyújt.
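
The difference between the two orderings can be reproduced with a small Python sketch. It is only an illustration: the names are hypothetical, and the "|" boundary marker is an invented device standing in for the word-part knowledge that a server would not have. Without the marker, the greedy letter matching reads "ggy" in meggyújt as gy+gy; with it, the parts meg and gyújt are tokenized separately.

```python
# Illustrative sketch only: hypothetical names, not MySQL's implementation.
LETTERS = ["a", "b", "c", "cs", "d", "dz", "dzs", "e", "f", "g", "gy", "h",
           "i", "j", "k", "l", "ly", "m", "n", "ny", "o", "ö", "p", "q", "r",
           "s", "sz", "t", "ty", "u", "ü", "v", "w", "x", "y", "z", "zs"]
RANK = {letter: i for i, letter in enumerate(LETTERS)}
FOLD = {"á": "a", "é": "e", "í": "i", "ó": "o", "ő": "ö", "ú": "u", "ű": "ü"}
# Assumption: a long digraph is written by doubling its first character.
DOUBLED = {letter[0] + letter: [letter, letter]
           for letter in LETTERS if len(letter) > 1}


def tokenize(part):
    """Split one word part into letters, expanding ggy -> gy gy and so on."""
    part = part.lower()
    tokens, i = [], 0
    while i < len(part):
        for size in (4, 3, 2, 1):
            chunk = part[i:i + size]
            if chunk in DOUBLED:
                tokens.extend(DOUBLED[chunk])
            elif FOLD.get(chunk, chunk) in RANK:
                tokens.append(FOLD.get(chunk, chunk))
            else:
                continue  # no match at this size, try a shorter chunk
            i += len(chunk)
            break
        else:
            raise ValueError(f"unexpected character in {part!r}")
    return tokens


def sort_key(word):
    """Tokenize each '|'-separated part on its own, then join the weights."""
    tokens = [t for part in word.split("|") for t in tokenize(part)]
    return tuple(RANK[t] for t in tokens)


# No boundary information: "ggy" is read as gy+gy.
print(sorted(["meggy", "meggyújt", "megy", "meglát"], key=sort_key))
# Hypothetical boundary hint: meg + gyújt.
print(sorted(["meggy", "meg|gyújt", "megy", "meglát"], key=sort_key))
```

The first call gives the order MySQL would return, the second gives the order quoted above; the only input that differs is the boundary hint, which stored column values do not carry.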

> An alternative collation sometimes used (in libraries, and some dictionaries
> and lexica) is according to the basic Latin alphabet, with the accented
> letters having the same value as the unaccented ones.  Or anything in between.
> E.g., honoring the digraphs and the trigraph, but leaving the accents out of
> the business.
>  
> I hope this helps.
> 

Yes, and thank you. I'm grateful for the help MySQL is getting on this
question. We are still hoping for more responses.

> ImRe
> 
> 
-- 
Peter Gulutzan, Senior Software Architect
MySQL AB, www.mysql.com
Office: +1 780 472-6838
Mobile: +1 780 904-0297
VoIP:   +1 408 213-6654

