List:Internals« Previous MessageNext Message »
From:Jan Dvorak Date:November 21 2000 12:34pm
Subject:Index with condition? [A distant idea]
View as plain text  
Hi all,

I'm having the following idea and
I'm looking for a DBMS that would satisfy me.
I wish it was MySQL, one day. :-)

A conventional database index always spans
all rows of the table it's constructed on.
Sometimes, though, I'd prefer a finer resolution was possible:
I'd like to restrict the rows to be indexed.

Several examples:

1. A foreign key that realizes an optional relationship
   (a "0,1" on the master side) that has only a low fraction
   of non-NULLs.
   But you have all the NULLs in the index,
   although they won't ever be used:
    * If I do joins, the NULLs won't match
      (right, if I use the normal "=", not the "<=>",
       but that's another story).
    * If I select the "fk IS NULL" rows,
      any sensible optimizer will decide to scan the table
      instead of going through the index.
   If I could construct the index with the condition
   "fk IS NOT NULL":

    CREATE INDEX idx_name
      ON tab_name ( col1_name ) WHERE col1 IS NOT NULL;

   I could get rid of the dead part of the complete index.
   The index could be much smaller and the db operations faster.

2. Sometimes I don't trust my users and don't allow them
   to do a real delete.  Instead, I put an "erased" field
   in the table and add an additional condition to my selects,
   like "... and not erased".  Now, if I have a uniqueness constraint,
   I want it to work only on the "not erased" records.
   I wish I could say

    CREATE TABLE lookup1 (
      id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      code CHAR(5) NOT NULL,
      description VARCHAR(255),
      erased BIT NOT NULL DEFAULT 0,
      UNIQUE ( code ) WHERE NOT erased
    );

3. A similar situation arises when I want to track history
   and add the "valid_from" and "valid_to" fields.
   The "valid_from" can't be null, but "valid_to" can:
   a null value there signals the up-to-date status,
   whereas a non-null means the validity of the record
   has been closed.

   Suppose I track identification documents of a person.
   The data is precious enough for me that I don't want to delete
   previous data, neither do I want to keep the non-up-to-date data
   in a separate table.
   At any given time, I'll honor just one identification of a type.
   I wish I could do

     CREATE TABLE Person_Ident_Document (
       id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       person_id INTEGER UNSIGNED NOT NULL REFERENCES Person,
       valid_from DATETIME NOT NULL,
       valid_to DATETIME,
       ident_doc_type VARCHAR(255) NOT NULL,
       ident_doc_number VARCHAR(255) NOT NULL,
       ident_doc_issued_when DATE,
       ident_doc_issued_where VARCHAR(255),
       ident_doc_issued_by VARCHAR(255),
       UNIQUE ( person_id, ident_doc_type ) WHERE valid_to IS NULL
     );

[End of examples.]

As far as I know, allowing for a conditional index
would not violate any of the principles of relational databases.

I think this is something other DBMS'es usually don't have,
but what would be a very handy extension over the standards.

The advantages are in a more efficient space usage,
faster index maintenance, faster database operations.
The only disadvantage I can see is that it's not implemented yet. :-)


Your comments?


Jan Dvorak
Thread
Index with condition? [A distant idea]Jan Dvorak21 Nov