List: General Discussion
From: Michael Widenius
Date: March 12 2000 11:01am
Subject: Re: A technical question.
Hi!

>>>>> "Peter" == Peter Zaitsev <pz@stripped> writes:

<cut>

>> MySQL scans the index in sorted order; for each row it goes to the row
>> file and reads the data. If the row is OK, it is sent to the client (or
>> put in a temporary table, depending on context).  Then MySQL does a
>> 'read_next' on the index; doing this, it will continue to scan from
>> the last used index block (which is cached by the thread).  In effect
>> MySQL gets a block of keys at a time and reads all matching rows for
>> these blocks before continuing with the next block of keys.
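
Roughly, as a toy sketch in Python (not the real C code; here the index is
just a list of (key, row_position) pairs in key order and the data file is
a list of rows addressed by position):

def index_scan(index, data_file, condition):
    for key, row_position in index:    # scan the index in sorted order,
                                       # one cached block of keys at a time
        row = data_file[row_position]  # seek into the row file, read the row
        if condition(row):
            yield row                  # send to the client (or a temp table)

# Example: rows stored out of key order, read back in key order.
data_file = [("c", 30), ("a", 10), ("b", 20)]
index = [("a", 1), ("b", 2), ("c", 0)]
print(list(index_scan(index, data_file, lambda row: row[1] >= 20)))
# -> [('b', 20), ('c', 30)]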


Peter> This will work well for small selects. If the key block size is 1024
Peter> bytes, it holds about 100 links to data when the key used is an int.
Peter> If you need to select about 100,000 records, this could, if I'm not
Peter> mistaken, lead to really bad performance. I found that in such cases
Peter> (especially when the key is longer) selecting 30 percent of the data
Peter> from the table takes about 5 times longer than copying the whole
Peter> table with cp. This only happens on a highly loaded system, when the
Peter> disk head has to move around a lot.
Peter> I'm wondering about this design choice (beyond the fact that it is
Peter> simple). If we're sure we'll need all the rows, it would be better to
Peter> read the index first, build the list of records to read, and then
Peter> read them in sorted order. In my situation, for example, many of the
Peter> rows would even be placed sequentially. Of course this takes a bit
Peter> more memory, but I think not much: even for 1,000,000 rows (scanned
Peter> via the index) you need about 10MB of memory. I found that in many
Peter> cases, when MySQL works with a database that is much larger than
Peter> system memory, the worst problem is not CPU usage but disk I/O.

We already do the above optimization when reading sorted rows; it
would be quite easy to add this also when reading things through an
index.  (There are at least 100 different optimizations we know of and
plan to do; the problem is finding time to do them all...)
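
A rough Python sketch of that optimization (same toy index/data-file
structures as above; the real code is C): collect the row positions from
the index first, sort them, and only then read the data file, so the reads
happen in file order instead of key order.

def fetch_via_sorted_positions(index, data_file, low, high):
    # Pass 1: scan only the index for the key range, remember row positions.
    positions = [pos for key, pos in index if low <= key <= high]
    # Sorting keeps the disk head moving forward; at ~10 bytes per entry
    # this is on the order of 10MB for 1,000,000 rows, as Peter estimates.
    positions.sort()
    # Pass 2: read the rows in (mostly sequential) file order.
    return [data_file[pos] for pos in positions]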

>> 
Peter> 3) How does MySQL perform a join (natural and star)?  Does it select
Peter> the table to start with as the one with the smallest number of rows
Peter> (or in some other way), then scan it somehow (full or index scan)
Peter> and, for every row, look up the needed components by index in the
Peter> other tables?  Or does it build a list of the records it has to get
Peter> from every table and then fetch them all?
>> 
>> This depends totally on the join type. I have tried to describe this
>> in the explain section of the manual.
>> 
>> - If you don't have any keys, then MySQL will read # rows from the
>> larger file and then do a full scan of the smaller table, joining
>> all rows to each other.  Then it will read the next # rows from the
>> first file and do another full scan of the smaller table (this is
>> done recursively if you have many tables).

Peter> Well, so this is a sort of nested-loop implementation?

Yes; sort of nested loops, with caching in a couple of different places.
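
For the keyless case above, a toy Python model of the idea (BLOCK_SIZE
stands in for the "# rows" read from the larger table at a time; the real
server code is C and caches more than this shows):

BLOCK_SIZE = 128

def block_nested_loop_join(larger, smaller, match):
    for start in range(0, len(larger), BLOCK_SIZE):
        block = larger[start:start + BLOCK_SIZE]   # read # rows into memory
        for small_row in smaller:                  # full scan of the smaller
                                                   # table, once per block
            for big_row in block:                  # join every cached row
                if match(big_row, small_row):
                    yield big_row, small_row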

>> 
>> - In the normal case MySQL will first find one matching row from the
>> first table and then find, based on keys, the first matching row
>> from the second table and so on.
>> 
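
A toy Python model of this keyed case (each later table is represented as a
dict from join key to matching rows, standing in for its index; for
simplicity the join key is always taken from the previously joined row):

def index_nested_loop_join(first_table, indexed_tables, key_of):
    def extend(prefix, remaining):
        if not remaining:
            yield prefix                               # a complete joined row
            return
        index = remaining[0]
        for row in index.get(key_of(prefix[-1]), []):  # index lookup, no scan
            yield from extend(prefix + (row,), remaining[1:])

    for row in first_table:                            # rows of the first table
        yield from extend((row,), indexed_tables)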

Peter> Well, here is the same problem as with scanning via an index (that's
Peter> why MySQL lost against Oracle in the star-join test).

What do you mean by the star-join test?
Can you give me an example of a query?

Peter> I have a query that joins 7 tables which are not really big, about
Peter> 200MB in size.  Indexes are used everywhere (even to scan the root
Peter> table), and it took the server 20 minutes to produce the result
Peter> set!!!  The problem, again, is a loaded I/O subsystem, so the disk
Peter> head has to move around a lot and the cache contents are overwritten
Peter> often.

Peter> If I'm right, building the list of rows you have to get from each
Peter> file will help compared to finding all the data row by row,
Peter> especially with a small join index cardinality.  The only drawback of
Peter> this, as I understand it, may be LIMIT, since in this case everything
Peter> is done in parallel and you can't stop that simply, but of course
Peter> this has a nice workaround.

Peter> I implemented this algorithm once in my application.  The problem is
Peter> similar: I have a record, and I have to insert into the table a row
Peter> containing, for each record field, the index under which its content
Peter> is stored in another table.  When not using the application cache,
Peter> doing this normalization not for one record per loop but for 100-500
Peter> of them in parallel brought me about a 3 times performance increase.

Could you explain this a bit further?  (If you do this in detail, you
can always post it directly to me :)

Regards,
Monty