From: shawn l.green
Date: August 10 2017 1:39pm
Subject: =?UTF8?Q?Re:_How_was_this_records/fanout_logic_derived_for_the_?=
=?UTF8?Q?=e2=80=9cno_statistics=e2=80=9d_case_in_MySQL's_Query_Planner=3f?=
ListArchive: http://lists.mysql.com/internals/38884
MessageId: <598C621A.1000304@oracle.com>
MIMEVersion: 1.0
ContentType: text/plain; charset=windows1252; format=flowed
ContentTransferEncoding: 7bit
On 8/9/2017 7:51 AM, Sergei Golubchik wrote:
> Hi, Jacky!
>
> On Aug 07, YuFeng Shen wrote:
>> Hi Experts,
>>
>> In the MySQL Server 5.7 source code, the formula records = (x * (ba) + a*cb)/(c1) is used in the query planner to calculate the number of records when key distribution statistics are not available.
>> Where is this formula coming from, how was it derived, or why is this
>> specific formula the formula that's being used? Does it have an
>> established theoretical underpinning, and if so, what is its basis?
>> https://github.com/mysql/mysqlserver/blob/5.7/sql/sql_planner.cc#L529
>>
>> Assume that the first key part matches 1% of the file
>> and that the whole key matches 10 (duplicates) or 1
>> (unique) records.
>> Assume also that more key matches proportionally more
>> records
>> This gives the formula:
>> records = (x * (ba) + a*cb)/(c1)
>> b = records matched by whole key
>> a = records matched by first key part (1% of all records?)
>> c = number of key parts in key
>> x = used key parts (1 <= x <= c)
>
> According to git history, this was present even in the very first
> revision, when MySQL moved from CVS to BitKeeper in July 2000.
>
> There's only one person in Oracle that was in MySQL back than, and I
> don't think he's reading this mailing list.
>
> So, I'm afraid, only Monty can know that.
>
I can see why there is a formula at all in lines 551565 in that same file.
if (tab>records() == 0)
tmp_fanout= 0.0;
else if (rec_per_key / tab>records() >= 0.01)
tmp_fanout= rec_per_key;
else
{
const double a= tab>records() * 0.01;
if (keyinfo>user_defined_key_parts > 1)
tmp_fanout=
(cur_used_keyparts * (rec_per_key  a) +
a * keyinfo>user_defined_key_parts  rec_per_key) /
(keyinfo>user_defined_key_parts  1);
else
tmp_fanout= a;
set_if_bigger(tmp_fanout, 1.0);
*in pseudo logic*
If
then tmp_fanout=0
If
then tmp_fanout= # of records per key (the Cardinality)
otherwise
create a nonzero value of tmp_fanout based on a different estimate
That estimate is based on a some initial conditions
a) there are rows in the table
b) Each indexed tuple is likely to match less than 1% of the entire table
c) At least one of the leftmost columns of an index will be used for
searching the table.
I don't have time (right now) to reverse engineer the entire derivation
but try to imagine a multicolumn index as if it were a value tree. Let's
say you have a 3column index (a,b,c). For each unique value of a (a1,
a2, a3,...) , there will be a subset of the b values that are associated
with it (for a1, that might be the set {b65, b90, b205}) and that for
each {a,b} tuple there is a set of c values associated with those. At
the end of that tree is a set of row identifiers that identify which
rows had those combinations of values. Using some crude ASCII art, it
might look like this... (only diagramming the first few values)
a1+ b65 + c1 > (row104, row107, )
 + c2 > (row5, row305, row400)
 + c4 > (row799)

+ b90 + c2 > (row4,row6,row15)
 + c3 > (row19)

+ b205 + c2 > (row75)
a2 ...
Does that help you to visualize how the formula can provide a nonzero
estimate of what a "fanout factor" could be? As Sergei said, the
originator of the formula in our code (and their notes) is not available
for direct comment.
I also wasn't around when it was created so I could not have any emails
containing a discussion about its derivation.
If you think you may have a better formula for the estimation, we would
be happy to discuss it.
Yours,

Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc.  Integrated Cloud Applications & Platform Services
Office: Blountville, TN
Become certified in MySQL! Visit https://www.mysql.com/certification/
for details.