List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:November 23 2000 9:06pm
Subject:Re: Index with condition? [A distant idea]
View as plain text  

>>>>> "Jan" == Jan Dvorak <jan.dvorak@stripped> writes:

Jan> Hi,
Jan> Michael Widenius wrote:
>> Hi!
>> >>>>> "Jan" == Jan Dvorak <jan.dvorak@stripped> writes:
Jan> Hi,
Jan> Michael Widenius wrote:
>> <cut>
Jan> I think it would be smaller - one would get rid
Jan> of both the keys and the row pointers!
Jan> If I have 10M rows in a dynamic format,
Jan> that's 40MB of row pointers!
>> Only of there is a lot of NULL entries.  If the index is a compressed
>> index, each NULL will only take about 2 bytes, in which case it's only
>> 20M of pointers. As long as you are not searching after NULL's these
>> entires are not touched and takes only place on disk; They will not
>> normally make the access to the other rows notably slower.

Jan> I suppose I should sort the index blocks often,
Jan> or the not-NULL blocks will be spread all over the index file?

I agree that B-trees gets a bit fragmented over time;  OPTIMIZE TABLE
would ensure that the NULL index is in the left part off the tree
(away from the other data)

>> >> Another problem is also that the checking of index corruption and
>> >> repairing gets easily much more complicated.
Jan> Yes.
Jan> After all myisamchk functionality is merged into mysqld,
Jan> will it continue to exist as a stand-alone utility?
>> The plan is that it will continue to exist as a standalone utility;
>> This gives you the benefit to be able to repair most tables off-line
>> which is safer is there by some strange reason would be a bug in the
>> repair code.

Jan> This looks like a wise decision.

Jan> I know Informix a little bit.
Jan> Their server is basically in two possible modes:
Jan>  * online: serving clients.
Jan>  * quiescent: checking and repairing dbspaces.
Jan> When the server is brought up, it enters the quiescent mode
Jan> and "mounts" all the dbspaces it should work with.
Jan> Only when it's ready, it enters the online mode.
Jan> If you pass an option to the server,
Jan> it only does the quiescent part and then exits.
Jan> This way, they could get rid of standalone utilities for the check & repair.

Jan> I'm not saying MySQL should go this direction.
Jan> After all, MySQL's data files are very different in nature from Informix's
> dbspaces
Jan> (those can be raw devices or cooked files, but the structure is page-oriented,
Jan> just like Oracle).

With --myisam-recover you get a similar feature as the above for
MyISAM files; The main difference is that you get the 'check' on open,
not when the servers starts.  Both methods has it good and bad


>> The goods news is however that the solution I proposed is portable :)

Jan> O.k., it looks more portable than what I proposed. :)

Jan> But hey, it looks like Oracle would not let you do it!
Jan> Oracle8i SQL Reference, rel. 8.1.5,
Jan> SQL Statements -> constraint_clause -> Keywords and Parameters -> UNIQUE
Jan> (page 7-221 in my copy) has the following to say:

Jan> <quotation>
Jan> A <b>composite unique key</b> is made of a combination of columns.
Jan> To define a composite unique key, you must use
> <i>table_constraint</i>
Jan> syntax rather than <i>column_constraint</i> syntax.
Jan> <emphasis made-by="Jan">
Jan> Any row that contains nulls in all key columns automatically
Jan> satisfies the constraint.
Jan> However, two rows that contain nulls for one or more key columns
Jan> and the same combination of values for the other key columns
Jan> violate the constraint.
Jan> </emphasis>
Jan> </quotation>

Jan> And hey, I just checked Informix does the same as Oracle!

Jan> I don't have the SQL standard handy.
Jan> There definitely is a difference between
Jan> how the the two big names interpret the uniqueness constraint,
Jan> and how MySQL copes with it!

From the book 'ANSI SQL 99, really':

"A UNIQUE constraint defines one or more columns of a table as unique
 columns; It's satisfied if no two rows in the table have the same
 non-null values in the unique columns"

In the next chapter we have a note that PRIMARY KEY is defined as a
UNIQUE where all columns has to be defined as NOT NULL.

They even very clearly say (on page 415):

'Multiple NULL values are allowed because the NULL value is never
equal to anuthing, even another NULL value'

They have even a nice example, where they insert into a UNIQUE
constraint over (a,b):  (1,'hello'),(NULL,'hello'),(1,NULL),

but they don't insert another (NULL,'hello) :(

I have to ask a friend that knows a lot more than I about the SQL
standard about this.

Jan> Actually, it's not that dramatic with respect to my examples.
Jan> The difference is that while MySQL can have several erased rows
Jan> with the same code, Oracle and Informix will allow for just one
Jan> (that is, one erased and one active are possible under the two big names).

Jan> But the incompatibility is interesting, isn't it?
Jan> As far as I understand it,
Jan> MySQL uses the "=" operator to check the uniqueness,
Jan> while the others use the "<=>" operator (as if they had it ;-).

It really looks that way.  I shall add a test to crash-me to check how
other databases interpret the standard.

>> I have put archived this mail so that we can look at doing this when
>> 4.0 is stable.

Jan> Thanks.
Jan> That's what I ment with the subject line.