List: Internals
From: Hagen Höpfner
Date: June 10 2005 5:56pm
Subject: Re: Text Encoding
Sergei Golubchik wrote:

>Hi!
>
>On Jun 10, Hagen Höpfner wrote:
>  
>
>>
>>Ok, the next question ;-)
>>
>>If I have a table with a utf8-coded CHAR(1) attribute and insert the
>>value "a", it is stored as 0x61 0x20 0x20. The output of "SHOW CHARACTER
>>SET" mentions a maximal length. However, it seems that this maximal
>>length is used even if one byte (as here) would be enough. I think
>>this is based on the idea of handling possible updates (e.g., replacing
>>"a" with a multi-byte character) more efficiently, but why do you call
>>the character length maximal if it is used all the time?
>>    
>>
>
>Check the manual for the difference between CHAR and VARCHAR (in the
>Column Types chapter) and static vs. dynamic row format (in the MyISAM section).
>  
>
That's not the question ;-) I know about VARCHAR/CHAR columns. What I
am wondering about is the length of an attribute value, not of a row. A
Latin1-coded "a" requires one byte, 0x61 ... if I use utf8 instead, it
requires 3 bytes, 0x61 0x20 0x20. I know that UTF-8 itself is a
variable-length encoding that uses different numbers of bytes for
different characters. An "Ä", for example, requires 2 bytes in UTF-8.
The output of SHOW CHARACTER SET shows the length of ONE character in
each encoding. In theory a character in MySQL's utf8 needs at most
3 bytes, that's correct. But the "a" actually stored in the utf8 column
uses 3 bytes. So why do you call 3 bytes the maximal length if it is
always 3 bytes?
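The byte counts discussed above can be checked with a short Python sketch. The helpers `utf8_len` and `pad_fixed_char` are hypothetical, and `pad_fixed_char` rests on an assumption: it models a fixed-width CHAR(n) column as reserving n times the character set's maximum bytes per character (3 for MySQL's utf8, 1 for latin1) and padding the encoded value with spaces. That model reproduces the 0x61 0x20 0x20 observed above, but it is a simplification, not MySQL's actual storage code.

```python
# Simplified model (assumption, not MySQL source code): a fixed-width CHAR(n)
# column reserves n * maxlen bytes, where maxlen is the widest character in
# the charset, and pads the encoded value with spaces (0x20).


def utf8_len(ch: str) -> int:
    """Return how many bytes a single character takes in UTF-8."""
    return len(ch.encode("utf-8"))


def pad_fixed_char(value: str, n: int, maxlen: int = 3,
                   encoding: str = "utf-8") -> bytes:
    """Encode `value` and space-pad it to the n * maxlen bytes of CHAR(n)."""
    if len(value) > n:
        raise ValueError("value has more characters than the column allows")
    data = value.encode(encoding)
    width = n * maxlen  # bytes reserved regardless of the actual characters
    return data + b" " * (width - len(data))


if __name__ == "__main__":
    # Variable length of UTF-8 itself: 1, 2 and 3 bytes.
    for ch in ("a", "Ä", "€"):
        print(ch, utf8_len(ch), ch.encode("utf-8").hex(" "))
    print(pad_fixed_char("a", 1).hex(" "))  # 61 20 20
    print(pad_fixed_char("Ä", 1).hex(" "))  # c3 84 20
    print(pad_fixed_char("a", 1, maxlen=1, encoding="latin-1").hex(" "))  # 61
```

Under this model, "maximal" describes the widest character (3 bytes for "€"), while the fixed-width reservation is what makes even a 1-byte "a" occupy 3 bytes.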

Hagen