Hi!
>>>>> "Jan" == Jan Dvorak <jan.dvorak@stripped> writes:
Jan> Hi,
Jan> Michael Widenius wrote:
>>
>> Hi!
>>
>> >>>>> "Jan" == Jan Dvorak <jan.dvorak@stripped> writes:
>>
Jan> Hi,
Jan> Michael Widenius wrote:
>>
>> <cut>
>>
Jan> I think it would be smaller - one would get rid
Jan> of both the keys and the row pointers!
Jan> If I have 10M rows in a dynamic format,
Jan> that's 40MB of row pointers!
>>
>> Only of there is a lot of NULL entries. If the index is a compressed
>> index, each NULL will only take about 2 bytes, in which case it's only
>> 20M of pointers. As long as you are not searching after NULL's these
>> entires are not touched and takes only place on disk; They will not
>> normally make the access to the other rows notably slower.
Jan> I suppose I should sort the index blocks often,
Jan> or the not-NULL blocks will be spread all over the index file?
I agree that B-trees gets a bit fragmented over time; OPTIMIZE TABLE
would ensure that the NULL index is in the left part off the tree
(away from the other data)
>> >> Another problem is also that the checking of index corruption and
>> >> repairing gets easily much more complicated.
>>
Jan> Yes.
Jan> After all myisamchk functionality is merged into mysqld,
Jan> will it continue to exist as a stand-alone utility?
>>
>> The plan is that it will continue to exist as a standalone utility;
>> This gives you the benefit to be able to repair most tables off-line
>> which is safer is there by some strange reason would be a bug in the
>> repair code.
Jan> This looks like a wise decision.
Jan> I know Informix a little bit.
Jan> Their server is basically in two possible modes:
Jan> * online: serving clients.
Jan> * quiescent: checking and repairing dbspaces.
Jan> When the server is brought up, it enters the quiescent mode
Jan> and "mounts" all the dbspaces it should work with.
Jan> Only when it's ready, it enters the online mode.
Jan> If you pass an option to the server,
Jan> it only does the quiescent part and then exits.
Jan> This way, they could get rid of standalone utilities for the check & repair.
Jan> I'm not saying MySQL should go this direction.
Jan> After all, MySQL's data files are very different in nature from Informix's
> dbspaces
Jan> (those can be raw devices or cooked files, but the structure is page-oriented,
Jan> just like Oracle).
With --myisam-recover you get a similar feature as the above for
MyISAM files; The main difference is that you get the 'check' on open,
not when the servers starts. Both methods has it good and bad
points...
<cut>
>> The goods news is however that the solution I proposed is portable :)
Jan> O.k., it looks more portable than what I proposed. :)
Jan> But hey, it looks like Oracle would not let you do it!
Jan> Oracle8i SQL Reference, rel. 8.1.5,
Jan> SQL Statements -> constraint_clause -> Keywords and Parameters -> UNIQUE
Jan> (page 7-221 in my copy) has the following to say:
Jan> <quotation>
Jan> A <b>composite unique key</b> is made of a combination of columns.
Jan> To define a composite unique key, you must use
> <i>table_constraint</i>
Jan> syntax rather than <i>column_constraint</i> syntax.
Jan> <emphasis made-by="Jan">
Jan> Any row that contains nulls in all key columns automatically
Jan> satisfies the constraint.
Jan> However, two rows that contain nulls for one or more key columns
Jan> and the same combination of values for the other key columns
Jan> violate the constraint.
Jan> </emphasis>
Jan> </quotation>
Jan> And hey, I just checked Informix does the same as Oracle!
Jan> I don't have the SQL standard handy.
Jan> There definitely is a difference between
Jan> how the the two big names interpret the uniqueness constraint,
Jan> and how MySQL copes with it!
From the book 'ANSI SQL 99, really':
"A UNIQUE constraint defines one or more columns of a table as unique
columns; It's satisfied if no two rows in the table have the same
non-null values in the unique columns"
In the next chapter we have a note that PRIMARY KEY is defined as a
UNIQUE where all columns has to be defined as NOT NULL.
They even very clearly say (on page 415):
'Multiple NULL values are allowed because the NULL value is never
equal to anuthing, even another NULL value'
They have even a nice example, where they insert into a UNIQUE
constraint over (a,b): (1,'hello'),(NULL,'hello'),(1,NULL),
(NULL,NULL)
but they don't insert another (NULL,'hello) :(
I have to ask a friend that knows a lot more than I about the SQL
standard about this.
Jan> Actually, it's not that dramatic with respect to my examples.
Jan> The difference is that while MySQL can have several erased rows
Jan> with the same code, Oracle and Informix will allow for just one
Jan> (that is, one erased and one active are possible under the two big names).
Jan> But the incompatibility is interesting, isn't it?
Jan> As far as I understand it,
Jan> MySQL uses the "=" operator to check the uniqueness,
Jan> while the others use the "<=>" operator (as if they had it ;-).
It really looks that way. I shall add a test to crash-me to check how
other databases interpret the standard.
>> I have put archived this mail so that we can look at doing this when
>> 4.0 is stable.
Jan> Thanks.
Jan> That's what I ment with the subject line.
Regards,
Monty