>>>>> "Peter" == Peter Zaitsev <pz@stripped> writes:
>> MySQL scans the index in sorted order; for each row it goes to the row
>> file and reads the data. If the row matches, it is sent to the client (or
>> put in a temporary table, depending on context). Then MySQL does a
>> 'read_next' on the index; doing this, it will continue to scan from
>> the last used index block (which is cached by the thread). In effect
>> MySQL gets a block of keys at a time and reads all matching rows for
>> these blocks before continuing with the next block of keys.
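The access pattern Monty describes can be sketched roughly as follows. This is a toy model, not MySQL's actual code; all names here are invented for illustration:

```python
# Toy sketch of the described access pattern: scan the index in key
# order, and for each index entry seek to the row's position in the
# data file. All names are invented for illustration.

def index_ordered_scan(index_blocks, data_file, predicate):
    """index_blocks: iterable of lists of (key, row_position) pairs."""
    results = []
    for block in index_blocks:          # one cached index block at a time
        for key, pos in block:          # every key in the block...
            row = data_file[pos]        # ...causes a seek into the row file
            if predicate(row):
                results.append(row)
    return results

# With rows stored in an order unrelated to the key, the positions
# visited are effectively random, which is what makes the disk head fly.
data = {10: "row-a", 20: "row-b", 30: "row-c"}
blocks = [[("k1", 30), ("k2", 10)], [("k3", 20)]]
print(index_ordered_scan(blocks, data, lambda r: True))
```

Note that the rows come back in key order, not file order, which is the seek cost Peter complains about below.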
Peter> This works well for small selects. If the key block size is 1024 bytes,
Peter> it holds about 100 links to data when the key used is an int. If you
Peter> need to select about 100,000 records, this could, if I'm not mistaken,
Peter> lead to really bad performance. I found that in such cases (especially
Peter> when the key is longer) selecting 30 percent of the data from the table
Peter> takes about 5 times longer than copying the whole table with cp.
Peter> This only happens on a highly loaded system, when the hard drive head has to fly a lot.
Peter> I'm wondering about this design choice (besides it being simple). If
Peter> we're sure we'll need all the rows, it would be better to read the index
Peter> first, build the list of records to read, and then read them in sorted
Peter> order. In my situation, for example, many of the rows will even be placed
Peter> sequentially. Of course this takes a bit more memory, but I think not much:
Peter> even for 1,000,000 rows (scanned via the index) you need only about 10MB
Peter> of memory. I found that in many cases, when MySQL works with a database
Peter> much larger than the system memory, the worst problem is not CPU usage but disk I/O.
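Peter's proposal can be sketched in a few lines. Again a toy model with invented names, not anything from the MySQL sources:

```python
# Sketch of the optimization Peter proposes: gather the matching row
# positions from the index first, sort them, then read the data file
# in position order so the disk head sweeps instead of seeking.
# All names are invented for illustration.

def sorted_position_scan(index_blocks, data_file):
    positions = [pos for block in index_blocks for _key, pos in block]
    positions.sort()                    # turn random seeks into one sweep
    return [data_file[pos] for pos in positions]

# A million row pointers of ~10 bytes each is on the order of 10MB,
# matching Peter's estimate, in exchange for near-sequential I/O.
data = {10: "row-a", 20: "row-b", 30: "row-c"}
blocks = [[("k1", 30), ("k2", 10)], [("k3", 20)]]
print(sorted_position_scan(blocks, data))  # rows come back in file order
```

The trade-off is that the result is no longer in key order, so a sort step is needed if the query asks for one.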
We already do the above optimization when reading sorted rows; it
would be quite easy to add this also when reading things through an
index. (There are at least 100 different optimizations we know of and
plan to do; the problem is finding time to do them all...)
Peter> 3) How does MySQL perform joins (natural and star)? Does it select the
Peter> table with the smallest number of rows (or chosen in some other way) to
Peter> start with, then scan it somehow (full or by index) and, for every row,
Peter> search for the matching components by index in the other tables?
Peter> Or does it build a list of the records it has to get from every table
Peter> and then fetch them all?
>> This depends totally on the join type. I have tried to describe this
>> in the explain section of the manual.
>> - If you don't have any keys, then MySQL will read # rows from the
>> larger file and then do a full scan of the smaller table, joining
>> all rows to each other. Then it will read the next # rows from the
>> first file and do another full scan of the smaller table (this is
>> done recursively if you have many tables).
Peter> Well. So this is a sort of nested-loops implementation?
Yes; Sort of nested-loops with caching at a couple of different places.
>> - In the normal case MySQL will first find one matching row from the
>> first table and then find, based on keys, the first matching row
>> from the second table and so on.
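The "normal case" with keys is an index nested-loop join: one index lookup per outer row instead of a scan. A toy sketch, with a dict standing in for the index and all names invented:

```python
# Sketch of the keyed case: for each row of the first table, use a key
# (here a dict standing in for an index) to find the matching rows in
# the next table; with more tables this repeats level by level.

def index_nested_loop_join(first, index_by_key):
    out = []
    for row_id, key in first:
        for matched in index_by_key.get(key, []):   # index lookup, not a scan
            out.append((row_id, matched))
    return out

first = [(1, "red"), (2, "blue")]
index = {"red": ["apple"], "blue": ["sky", "sea"]}
print(index_nested_loop_join(first, index))
```

Each lookup still lands on an essentially random page of the inner table, which is the row-by-row seek cost Peter objects to next.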
Peter> Well, here is the same problem as with scanning on an index (that's why
Peter> MySQL lost against Oracle in the star-join test).
What do you mean with the star-join test ?
Can you give me an example of a query ?
Peter> I have a query which joins 7 tables that are not really big,
Peter> about 200MB in size. Indexes are used everywhere (even to scan the root
Peter> table), and it took the server 20 minutes to produce the result set !!!
Peter> The problem, again, is a loaded I/O subsystem: the disk head has to fly
Peter> a lot and the cache contents are overwritten often.
Peter> If I'm right, building a list of the rows you have to get from each file
Peter> will help compared to finding all the data row by row, especially with a
Peter> small join index cardinality. The only drawback of this, as I understand
Peter> it, may be LIMIT: in this case everything is done in parallel, so you
Peter> can't stop that simply, but this has a nice workaround of course.
Peter> I once implemented this algorithm in an application of mine. The problem
Peter> is similar: I have a record, and I have to insert into the table a row
Peter> containing the IDs of the record's fields, as their content is stored in
Peter> another table. When the application cache could not be used, normalizing
Peter> not one record per loop but 100-500 of them in parallel brought me a
Peter> 3-times performance increase.
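My reading of the batching Peter describes, sketched as a toy (one bulk lookup per batch of records instead of one lookup per field; the names and the dict standing in for the lookup table are invented):

```python
# Sketch of batched normalization: instead of resolving the ID for
# each field value one record at a time, collect the values from a
# whole batch of records and resolve them in one pass.
# All names are invented for illustration.

def normalize_batch(records, value_to_id):
    # One "query" per batch: fetch IDs for all distinct values at once.
    wanted = {v for rec in records for v in rec}
    ids = {v: value_to_id[v] for v in wanted}   # stands in for one bulk SELECT
    return [[ids[v] for v in rec] for rec in records]

lookup = {"red": 1, "blue": 2, "green": 3}
print(normalize_batch([["red", "blue"], ["green", "red"]], lookup))
```

With 100-500 records per batch, duplicate values are resolved once and the round trips per record drop sharply, which is consistent with the 3x speedup Peter reports.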
Could you explain this a bit further? (If you do this in detail, you
can always post it directly to me :)