List: General Discussion
From: Peter Gulutzan
Date: October 30 2006 12:15pm
Subject: RE: Hungarian collation

On Thu, 2006-10-19 at 18:02 +0300, imre@stripped wrote:
> > From: Peter Gulutzan <peterg@stripped>
> > 
> > MySQL is looking for an authoritative, official statement which states 
> > all the current Hungarian collation rules.
> According to the Reference Level Description of the Hungarian language (ISBN
> 9634206441 or the Hungarian version online:
> ) the rules are the following:

Apparently is an
educational site (something to do with the council of Europe)
as opposed to an official standards site, if I'm understanding

> - The basic order of the alphabet is a á b c cs d dz dzs e é f g gy h i í j
> k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs
> - For the short-long vowel pairs (a á, e é, i í, o ó, ö ő, u ú, ü ű), long =
> short usually, but long > short if all else is equal.
> E.g., kád < kar < kár < kard

So far, this seems to be the opinion of a majority, although not
everyone describes the rule the same way. If MySQL adopts this rule,
SELECT * FROM t WHERE column1 = 'kár';
will not return rows where column1 = 'kar'. But perhaps
SELECT * FROM t WHERE column1 LIKE 'ká%';
will return rows where column1 = 'kar'.
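
To make the vowel rule concrete, here is a minimal Python sketch of a
two-level sort key: the first level ignores vowel length, the second
level only breaks ties. It is an illustration of the quoted rule, not a
description of how MySQL implements collations. The names vowel_key,
LONG_TO_SHORT and BASE_ORDER are made up for this example, and digraphs
such as cs or gy are deliberately ignored here.

```python
# Map each long vowel to its short partner (pairs taken from the quoted rule).
LONG_TO_SHORT = {"á": "a", "é": "e", "í": "i", "ó": "o",
                 "ő": "ö", "ú": "u", "ű": "ü"}

# Simplified single-letter order (assumption: digraphs are not handled here).
BASE_ORDER = "abcdefghijklmnoöpqrstuüvwxyz"
RANK = {ch: i for i, ch in enumerate(BASE_ORDER)}


def vowel_key(word):
    """Return (primary, secondary) for a lowercase-able Hungarian word.

    primary   ignores vowel length (long = short)
    secondary marks long vowels, so long > short only when primary ties
    Characters outside BASE_ORDER raise KeyError in this sketch.
    """
    word = word.lower()
    primary = tuple(RANK[LONG_TO_SHORT.get(ch, ch)] for ch in word)
    secondary = tuple(1 if ch in LONG_TO_SHORT else 0 for ch in word)
    return (primary, secondary)


words = ["kard", "kár", "kar", "kád"]
print(sorted(words, key=vowel_key))
```

With such a key, 'kár' and 'kar' have the same first level but different
full keys, which matches the point that an equality test would keep them
apart while a comparison on the first level alone would not.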

> - Long double consonants sort as if they had been expanded,
> i.e., ggy as gygy, nny as nyny

So 'ccs sorts with cscs' is true, i.e. ccs > cds

I expect that there is no rule which could apply for all LIKE searches.
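
The expansion rule can be sketched in Python as a splitter that cuts a
word into Hungarian letters and treats a doubled form such as ccs as
cs + cs. This is only an illustration under simplifying assumptions:
long vowels are folded to short ones with no tie-break, the splitter is
purely mechanical, and the names letters and consonant_key are invented
for this example.

```python
# Base letters in the quoted order; long vowels are left out and folded below.
ALPHABET = ("a b c cs d dz dzs e f g gy h i j k l ly m n ny o ö "
            "p q r s sz t ty u ü v w x y z zs").split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
FOLD = str.maketrans("áéíóőúű", "aeioöuü")

# Multi-character letters, longest first so that "dzs" is tried before "dz".
MULTI = sorted((l for l in ALPHABET if len(l) > 1), key=len, reverse=True)


def letters(word):
    """Split a word into letters, expanding doubled forms (ccs -> cs, cs)."""
    word = word.lower().translate(FOLD)
    out, i = [], 0
    while i < len(word):
        for m in MULTI:
            # Doubled form: first character written twice, e.g. "ggy" for gy+gy.
            if word.startswith(m[0] + m, i):
                out += [m, m]
                i += len(m) + 1
                break
            # Plain digraph or trigraph.
            if word.startswith(m, i):
                out.append(m)
                i += len(m)
                break
        else:
            # No multi-character letter starts here: take one character.
            out.append(word[i])
            i += 1
    return out


def consonant_key(word):
    """Sort key: the rank of each letter; unknown characters raise KeyError."""
    return [RANK[l] for l in letters(word)]


print(letters("ccs"), letters("cds"))
print(consonant_key("ccs") > consonant_key("cds"))
```

Under these assumptions ccs splits exactly like cscs, and it sorts after
cds because the letter cs comes after the letter c.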

> - Composite words are sorted according to word parts, i.e., meggyújt <
> meglát < megy < meggy

I don't see a way to determine what is a composite word. So MySQL would
return meglát < megy < meggy < meggyújt
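
The difference between the two orders can be shown with a small Python
sketch. The letters are split by hand here, and the word-part boundaries
in PARTS are supplied by hand as well, which is exactly the information a
collation does not have. Comparing part by part is one possible reading
of the quoted rule, not an established algorithm. All names below are
invented for the example, and long vowels are folded to short ones.

```python
# Base letters in the quoted order (long vowels folded away for simplicity).
ALPHABET = ("a b c cs d dz dzs e f g gy h i j k l ly m n ny o ö "
            "p q r s sz t ty u ü v w x y z zs").split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def ranks(letter_list):
    """Turn a list of letters into a list of alphabet positions."""
    return [RANK[l] for l in letter_list]


# FLAT: what a splitter with no dictionary sees ("ggy" always read as gy + gy).
FLAT = {
    "meglát":   ["m", "e", "g", "l", "a", "t"],
    "megy":     ["m", "e", "gy"],
    "meggy":    ["m", "e", "gy", "gy"],
    "meggyújt": ["m", "e", "gy", "gy", "u", "j", "t"],
}

# PARTS: the same words with word-part boundaries added by hand (assumption).
PARTS = {
    "meglát":   [["m", "e", "g"], ["l", "a", "t"]],
    "megy":     [["m", "e", "gy"]],
    "meggy":    [["m", "e", "gy", "gy"]],
    "meggyújt": [["m", "e", "g"], ["gy", "u", "j", "t"]],
}


def flat_key(word):
    """Compare the whole word as one run of letters."""
    return ranks(FLAT[word])


def part_key(word):
    """Compare word part by word part."""
    return [ranks(part) for part in PARTS[word]]


print(sorted(FLAT, key=flat_key))  # order without composite-word knowledge
print(sorted(FLAT, key=part_key))  # order with hand-supplied boundaries
```

The first line printed follows the order MySQL would return, the second
follows the order given in the quoted rule.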

> An alternative collation sometimes used (in libraries, and some dictionaries
> and lexica) is according to the basic Latin alphabet, with the accented
> letters having the same value as the unaccented ones.  Or anything in between.
> E.g., honoring the digraphs and the trigraph, but leaving the accents out of
> the business.
> I hope this helps.

Yes, and thank you. I'm grateful for the help MySQL is getting on this
question. We are still hoping for more responses.
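
For comparison, the alternative collation mentioned in the quoted message
(basic Latin order, accents ignored) can be sketched in Python with the
standard unicodedata module. The function name latin_key is invented for
this example, and the sketch does not honor digraphs.

```python
import unicodedata


def latin_key(word):
    """Fold a word to unaccented letters for comparison.

    NFD splits each accented letter into a base letter plus combining
    marks; dropping the marks leaves the plain Latin letter.
    """
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(latin_key("kár") == latin_key("kar"))
print(sorted(["zár", "ökör", "alma"], key=latin_key))
```

With this key, accented and unaccented spellings compare as equal, so
their relative order in a sorted list depends only on input order.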

> ImRe

Peter Gulutzan, Senior Software Architect
Office: +1 780 472-6838
Mobile: +1 780 904-0297
VoIP:   +1 408 213-6654
