List: Internals
From: Sergei Golubchik
Date: June 10 2005 7:35pm
Subject: Re: Text Encoding
Hi!

On Jun 10, Hagen Höpfner wrote:
> Sergei Golubchik wrote:
> >>
> >>If I have a table with a utf8-encoded char(1) attribute and insert
> >>the value "a", this is stored as 0x61 20 20. The output of "SHOW
> >>CHARACTER TYPES" mentions a maximal length. However, it seems that
> >>this maximal length is used even if one byte (like here) would be
> >>enough? I think this is based on the idea of handling possible
> >>updates (e.g. a->?) more efficiently, but why do you call the
> >>character length maximal if it is used all the time?
> >
> >Check the manual for the difference between CHAR and VARCHAR (in
> >Column types), and static vs. dynamic row format (in MyISAM section)
> >
> That's not the question ;-) I know about varchar / char columns. What I
> am wondering about is the length of an attribute value, not of a row. A
> Latin1-encoded "a" requires one byte, 0x61 ... if I use UTF-8 instead it
> requires 3 bytes, 0x61 20 20. I know that the original UTF-8 encoding
> allows various byte counts for various kinds of symbols. An "Ä", for
> example, requires 2 bytes to be represented in UTF-8. In the
> SHOW-CHARACTER-TYPES output the length for ONE character in different
> encodings is shown. In fact, UTF-8 requires (in theory) up to max 3 bytes,
> that's correct. But the actually stored UTF-8 "a" uses 3 bytes. So, why do
> you call the length maximal 3 bytes if it is always 3 bytes?

CHAR() columns always have the same length - in bytes.
As "a" takes only one byte it's space-padded to 3 bytes.
Try, e.g. VARCHAR(1)  (be sure to force the table into dynamic row
format), and you'll see that "a" will take only one byte (length
excluded), it won't be space-padded.

If you had CHAR(2) and inserted "aa", you'd see "aa    " (six bytes),
and not "a  a  ".
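
To make the padding rule concrete, here is a rough sketch in Python. It is
only a model of the behaviour described above, not MySQL's actual storage
code: the names char_storage, varchar_payload and MAX_BYTES_PER_CHAR are
made up for illustration, and the 3-byte maximum is the utf8 figure from
this thread.

```python
# Hypothetical model of fixed-width CHAR(n) storage; not MySQL's real code.
MAX_BYTES_PER_CHAR = 3  # utf8 maximum length per character, as discussed above


def char_storage(value: str, n: int) -> bytes:
    """Encode value as UTF-8, then right-pad with spaces (0x20)
    to the fixed width of n * MAX_BYTES_PER_CHAR bytes."""
    if len(value) > n:
        raise ValueError("value is longer than CHAR(%d)" % n)
    encoded = value.encode("utf-8")
    width = n * MAX_BYTES_PER_CHAR
    if len(encoded) > width:
        raise ValueError("value needs more than %d bytes" % width)
    # Padding is appended once at the end of the whole value,
    # not after each individual character.
    return encoded.ljust(width, b" ")


def varchar_payload(value: str) -> bytes:
    """Variable-length payload: only the encoded bytes (length prefix excluded)."""
    return value.encode("utf-8")


print(char_storage("a", 1).hex(" "))    # 61 20 20
print(char_storage("aa", 2).hex(" "))   # 61 61 20 20 20 20
print(varchar_payload("a").hex(" "))    # 61
```

The second line of output is the "aa" case: both characters come first and
the four padding bytes follow, rather than each character being padded to
three bytes on its own.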
 
Regards,
Sergei

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <serg@stripped>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
       <___/  www.mysql.com