List:Internals« Previous MessageNext Message »
From:Jan Dvorak Date:November 23 2000 5:22pm
Subject:Re: Index with condition? [A distant idea]
View as plain text  

Michael Widenius wrote:
> Hi!
> >>>>> "Jan" == Jan Dvorak <jan.dvorak@stripped> writes:
> Jan> Hi,
> Jan> Michael Widenius wrote:
> <cut>
> Jan> I think it would be smaller - one would get rid
> Jan> of both the keys and the row pointers!
> Jan> If I have 10M rows in a dynamic format,
> Jan> that's 40MB of row pointers!
> Only of there is a lot of NULL entries.  If the index is a compressed
> index, each NULL will only take about 2 bytes, in which case it's only
> 20M of pointers. As long as you are not searching after NULL's these
> entires are not touched and takes only place on disk; They will not
> normally make the access to the other rows notably slower.

I suppose I should sort the index blocks often,
or the not-NULL blocks will be spread all over the index file?

> >> Another problem is also that the checking of index corruption and
> >> repairing gets easily much more complicated.
> Jan> Yes.
> Jan> After all myisamchk functionality is merged into mysqld,
> Jan> will it continue to exist as a stand-alone utility?
> The plan is that it will continue to exist as a standalone utility;
> This gives you the benefit to be able to repair most tables off-line
> which is safer is there by some strange reason would be a bug in the
> repair code.

This looks like a wise decision.

I know Informix a little bit.
Their server is basically in two possible modes:
 * online: serving clients.
 * quiescent: checking and repairing dbspaces.
When the server is brought up, it enters the quiescent mode
and "mounts" all the dbspaces it should work with.
Only when it's ready, it enters the online mode.
If you pass an option to the server,
it only does the quiescent part and then exits.
This way, they could get rid of standalone utilities for the check & repair.

I'm not saying MySQL should go this direction.
After all, MySQL's data files are very different in nature from Informix's dbspaces
(those can be raw devices or cooked files, but the structure is page-oriented,
just like Oracle).

> Jan> 2. Sometimes I don't trust my users and don't allow them
> Jan> to do a real delete.  Instead, I put an "erased" field
> Jan> in the table and add an additional condition to my selects,
> Jan> like "... and not erased".  Now, if I have a uniqueness constraint,
> Jan> I want it to work only on the "not erased" records.
> Jan> I wish I could say
> >>
> Jan> CREATE TABLE lookup1 (
> Jan> code CHAR(5) NOT NULL,
> Jan> description VARCHAR(255),
> Jan> erased BIT NOT NULL DEFAULT 0,
> Jan> UNIQUE ( code ) WHERE NOT erased
> Jan> );
> >>
> >> Why not do it the SQL way:
> >>
> >> CREATE TABLE lookup1 (
> >> code CHAR(5),
> >> description VARCHAR(255),
> >> erased BIT NOT NULL DEFAULT 0,
> >> UNIQUE ( code )
> >> );
> >>
> >> This gives you exactly the same benefits!
> Jan> Does it?
> Jan> Assume a user sets the erased bit.
> Jan> In your approach, one should also update code with NULL,
> Jan> otherwise no new record of that code value can be inserted;
> Jan> with conditional index, the previous value can stay there,
> Jan> as the constraint is effective only on the non-erased rows.
> Jan> If an admin wants to restore the previous record
> Jan> (it turns out the user has screwed it up),
> Jan> the old code is lost under your set-up.
> I assumed you didn't mind changing code to NULL.
> If you do mind, the solution is to do the unique constrains as:
> UNIQUE ( code, erased )
> And have erased defined as:
> 'erased CHAR(0) default ""'
> To erase something you now only have to set erased to "NULL".

Yes, nice trick.

> Jan> O.k., I can see how you'd go around the issue:
> Jan> A uniqueness constraint doesn't apply to rows
> Jan> where a column of the unique index is NULL.
> Jan> Clever.
> Jan> Still, you do have the rows indexed.
> Yes, but you already had the row indexed in the first place.  They
> only thing we added was 1 byte / index entry.
> Jan> As far as I know, allowing for a conditional index
> Jan> would not violate any of the principles of relational databases.
> >> No, but it would be a bit troublesome to maintain;  Especially the
> >> UNIQUE constraint is very hard to do generally!
> Jan> I can see the following programming issues here:
> Jan>  1. Modify the virtual table interface to accept this extension.
> Jan>  2. Provide a table driver that actually can have the conditional index.
> Jan>  3. Tweak the optimizer to take the condition into account.
> Jan> To begin with, the conditions could be limited to simple
> Jan> expressions containing only the =, <=>, <>, <, >, <=,
> >=,
> Jan> IS NULL, IS NOT NULL, AND, OR, NOT operators, literals,
> Jan> and the columns of the indexed table.
> Jan> No functions (not to say UDFs), no subquery expressions.
> Jan> I think this is something other DBMS'es usually don't have,
> Jan> but what would be a very handy extension over the standards.
> >>
> Jan> The advantages are in a more efficient space usage,
> Jan> faster index maintenance, faster database operations.
> >>
> >> Only in some cases (mainly a fixed size index on one column)
> Jan> Actually, in pretty much every situation
> Jan> where you don't need to index all rows,
> Jan> just those that satisfy a highly filtering condition.
> Jan> The only disadvantage I can see is that it's not implemented yet. :-)
> >>
> >> But as you can see, its very easy to simulate...
> Jan> O.k., while it's not impossible to simulate,
> Jan> it definitely impacts the way one designs the database.
> The goods news is however that the solution I proposed is portable :)

O.k., it looks more portable than what I proposed. :)

But hey, it looks like Oracle would not let you do it!
Oracle8i SQL Reference, rel. 8.1.5,
SQL Statements -> constraint_clause -> Keywords and Parameters -> UNIQUE
(page 7-221 in my copy) has the following to say:

A <b>composite unique key</b> is made of a combination of columns.
To define a composite unique key, you must use <i>table_constraint</i>
syntax rather than <i>column_constraint</i> syntax.
<emphasis made-by="Jan">
Any row that contains nulls in all key columns automatically
satisfies the constraint.
However, two rows that contain nulls for one or more key columns
and the same combination of values for the other key columns
violate the constraint.

And hey, I just checked Informix does the same as Oracle!

I don't have the SQL standard handy.
There definitely is a difference between
how the the two big names interpret the uniqueness constraint,
and how MySQL copes with it!

Actually, it's not that dramatic with respect to my examples.
The difference is that while MySQL can have several erased rows
with the same code, Oracle and Informix will allow for just one
(that is, one erased and one active are possible under the two big names).

But the incompatibility is interesting, isn't it?
As far as I understand it,
MySQL uses the "=" operator to check the uniqueness,
while the others use the "<=>" operator (as if they had it ;-).

> I have put archived this mail so that we can look at doing this when
> 4.0 is stable.

That's what I ment with the subject line.

> Regards,
> Monty

Yours Faithfully