From: Michael Widenius Date: November 23 2000 9:06pm Subject: Re: Index with condition? [A distant idea] List-Archive: http://lists.mysql.com/internals/152 Message-Id: <14877.34506.891773.580858@narttu.mysql.fi> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi! >>>>> "Jan" == Jan Dvorak writes: Jan> Hi, Jan> Michael Widenius wrote: >> >> Hi! >> >> >>>>> "Jan" == Jan Dvorak writes: >> Jan> Hi, Jan> Michael Widenius wrote: >> >> >> Jan> I think it would be smaller - one would get rid Jan> of both the keys and the row pointers! Jan> If I have 10M rows in a dynamic format, Jan> that's 40MB of row pointers! >> >> Only of there is a lot of NULL entries. If the index is a compressed >> index, each NULL will only take about 2 bytes, in which case it's only >> 20M of pointers. As long as you are not searching after NULL's these >> entires are not touched and takes only place on disk; They will not >> normally make the access to the other rows notably slower. Jan> I suppose I should sort the index blocks often, Jan> or the not-NULL blocks will be spread all over the index file? I agree that B-trees gets a bit fragmented over time; OPTIMIZE TABLE would ensure that the NULL index is in the left part off the tree (away from the other data) >> >> Another problem is also that the checking of index corruption and >> >> repairing gets easily much more complicated. >> Jan> Yes. Jan> After all myisamchk functionality is merged into mysqld, Jan> will it continue to exist as a stand-alone utility? >> >> The plan is that it will continue to exist as a standalone utility; >> This gives you the benefit to be able to repair most tables off-line >> which is safer is there by some strange reason would be a bug in the >> repair code. Jan> This looks like a wise decision. Jan> I know Informix a little bit. Jan> Their server is basically in two possible modes: Jan> * online: serving clients. Jan> * quiescent: checking and repairing dbspaces. Jan> When the server is brought up, it enters the quiescent mode Jan> and "mounts" all the dbspaces it should work with. Jan> Only when it's ready, it enters the online mode. Jan> If you pass an option to the server, Jan> it only does the quiescent part and then exits. Jan> This way, they could get rid of standalone utilities for the check & repair. Jan> I'm not saying MySQL should go this direction. Jan> After all, MySQL's data files are very different in nature from Informix's dbspaces Jan> (those can be raw devices or cooked files, but the structure is page-oriented, Jan> just like Oracle). With --myisam-recover you get a similar feature as the above for MyISAM files; The main difference is that you get the 'check' on open, not when the servers starts. Both methods has it good and bad points... >> The goods news is however that the solution I proposed is portable :) Jan> O.k., it looks more portable than what I proposed. :) Jan> But hey, it looks like Oracle would not let you do it! Jan> Oracle8i SQL Reference, rel. 8.1.5, Jan> SQL Statements -> constraint_clause -> Keywords and Parameters -> UNIQUE Jan> (page 7-221 in my copy) has the following to say: Jan> Jan> A composite unique key is made of a combination of columns. Jan> To define a composite unique key, you must use table_constraint Jan> syntax rather than column_constraint syntax. Jan> Jan> Any row that contains nulls in all key columns automatically Jan> satisfies the constraint. Jan> However, two rows that contain nulls for one or more key columns Jan> and the same combination of values for the other key columns Jan> violate the constraint. Jan> Jan> Jan> And hey, I just checked Informix does the same as Oracle! Jan> I don't have the SQL standard handy. Jan> There definitely is a difference between Jan> how the the two big names interpret the uniqueness constraint, Jan> and how MySQL copes with it! From the book 'ANSI SQL 99, really': "A UNIQUE constraint defines one or more columns of a table as unique columns; It's satisfied if no two rows in the table have the same non-null values in the unique columns" In the next chapter we have a note that PRIMARY KEY is defined as a UNIQUE where all columns has to be defined as NOT NULL. They even very clearly say (on page 415): 'Multiple NULL values are allowed because the NULL value is never equal to anuthing, even another NULL value' They have even a nice example, where they insert into a UNIQUE constraint over (a,b): (1,'hello'),(NULL,'hello'),(1,NULL), (NULL,NULL) but they don't insert another (NULL,'hello) :( I have to ask a friend that knows a lot more than I about the SQL standard about this. Jan> Actually, it's not that dramatic with respect to my examples. Jan> The difference is that while MySQL can have several erased rows Jan> with the same code, Oracle and Informix will allow for just one Jan> (that is, one erased and one active are possible under the two big names). Jan> But the incompatibility is interesting, isn't it? Jan> As far as I understand it, Jan> MySQL uses the "=" operator to check the uniqueness, Jan> while the others use the "<=>" operator (as if they had it ;-). It really looks that way. I shall add a test to crash-me to check how other databases interpret the standard. >> I have put archived this mail so that we can look at doing this when >> 4.0 is stable. Jan> Thanks. Jan> That's what I ment with the subject line. Regards, Monty