Thanks for the explanation, Sergei.
For joins, what does the optimizer do if these values are not present?
Just guess?
On Wed, Oct 10, 2012 at 3:07 PM, Sergei Golubchik <serg@stripped> wrote:
> Hi, Zardosht!
>
> On Oct 10, Zardosht Kasheff wrote:
>>
>> In a the structure KEY, I see a field named rec_per_key that has the
>> following comment:
>>
>> /*
>> Array of AVG(#records with the same field value) for 1st ... Nth key part.
>> 0 means 'not known'.
>> For temporary heap tables this member is NULL.
>> */
>>
>> Our storage engine does not ever fill these values. We set the values
>> to 0. We recently saw that not setting this for the primary key
>> resulted in a bad query plan. We can set it to 1 for unique keys and
>> the primary key. Are there other known bad side effects to not setting
>> this?
>
> Other than bad query plans? No.
>
> These values are used by the optimizer to prefer an index with the best
> selectivity. E.g, in SELECT ... WHERE a=5 AND b=6, if the optimizer has
> to choose between an index on 'a', with rec_per_key=2, and an index on
> 'b' with rec_per_key=20, it would prefer an index on 'a'. Because it
> would expect that a=5 will match only 2 rows, while b=6 will match 20
> rows.
>
> This was very simplified explanation. Really optimizer would probably
> use records_in_range in that case. But if you have a join,
> SELECT ... t1, t2 WHERE t1.a=t2.b AND ... then optimizer would use
> rec_per_key.
>
> These rec_per_key values don't have to be exact, MyISAM updates them
> only on ANALYZE TABLE statement. But practically they change very
> slowly, even if the table is constantly updated, so it's ok to update
> them rarely.
>
> Regards,
> Sergei
>