From: Jan Dvorak Date: November 21 2000 12:34pm Subject: Index with condition? [A distant idea] List-Archive: http://lists.mysql.com/internals/131 Message-Id: <3A1A6BE9.A0F680FE@veda.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Hi all, I'm having the following idea and I'm looking for a DBMS that would satisfy me. I wish it was MySQL, one day. :-) A conventional database index always spans all rows of the table it's constructed on. Sometimes, though, I'd prefer a finer resolution was possible: I'd like to restrict the rows to be indexed. Several examples: 1. A foreign key that realizes an optional relationship (a "0,1" on the master side) that has only a low fraction of non-NULLs. But you have all the NULLs in the index, although they won't ever be used: * If I do joins, the NULLs won't match (right, if I use the normal "=", not the "<=>", but that's another story). * If I select the "fk IS NULL" rows, any sensible optimizer will decide to scan the table instead of going through the index. If I could construct the index with the condition "fk IS NOT NULL": CREATE INDEX idx_name ON tab_name ( col1_name ) WHERE col1 IS NOT NULL; I could get rid of the dead part of the complete index. The index could be much smaller and the db operations faster. 2. Sometimes I don't trust my users and don't allow them to do a real delete. Instead, I put an "erased" field in the table and add an additional condition to my selects, like "... and not erased". Now, if I have a uniqueness constraint, I want it to work only on the "not erased" records. I wish I could say CREATE TABLE lookup1 ( id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, code CHAR(5) NOT NULL, description VARCHAR(255), erased BIT NOT NULL DEFAULT 0, UNIQUE ( code ) WHERE NOT erased ); 3. A similar situation arises when I want to track history and add the "valid_from" and "valid_to" fields. The "valid_from" can't be null, but "valid_to" can: a null value there signals the up-to-date status, whereas a non-null means the validity of the record has been closed. Suppose I track identification documents of a person. The data is precious enough for me that I don't want to delete previous data, neither do I want to keep the non-up-to-date data in a separate table. At any given time, I'll honor just one identification of a type. I wish I could do CREATE TABLE Person_Ident_Document ( id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, person_id INTEGER UNSIGNED NOT NULL REFERENCES Person, valid_from DATETIME NOT NULL, valid_to DATETIME, ident_doc_type VARCHAR(255) NOT NULL, ident_doc_number VARCHAR(255) NOT NULL, ident_doc_issued_when DATE, ident_doc_issued_where VARCHAR(255), ident_doc_issued_by VARCHAR(255), UNIQUE ( person_id, ident_doc_type ) WHERE valid_to IS NULL ); [End of examples.] As far as I know, allowing for a conditional index would not violate any of the principles of relational databases. I think this is something other DBMS'es usually don't have, but what would be a very handy extension over the standards. The advantages are in a more efficient space usage, faster index maintenance, faster database operations. The only disadvantage I can see is that it's not implemented yet. :-) Your comments? Jan Dvorak