From: YuFeng Shen Date: August 7 2017 2:10am Subject: =?Windows-1252?Q?How_was_this_records/fanout_logic_derived_for_the_=93no_?= =?Windows-1252?Q?statistics=94_case_in_MySQL's_Query_Planner=3F?= List-Archive: http://lists.mysql.com/internals/38882 Message-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable Hi Experts, =A0 In the MySQL Server 5.7 source code, the formula records =3D (x * (b-a) + a= *c-b)/(c-1) is used in the query planner to calculate the number of records= when key distribution statistics are not available. =A0Where is this formula coming from, how was it derived, or why is this sp= ecific formula the formula that's being used? Does it have an established t= heoretical underpinning, and if so, what is its basis? =A0https://github.com/mysql/mysql-server/blob/5.7/sql/sql_planner.cc#L529 =A0=A0=A0=A0=A0=A0=A0=A0=A0 Assume that the first key part matches 1% of th= e file =A0=A0=A0=A0=A0=A0=A0=A0=A0 and that the whole key matches 10 (duplicates) = or 1 =A0=A0=A0=A0=A0=A0=A0=A0=A0 (unique) records. =A0=A0=A0=A0=A0=A0=A0=A0=A0 Assume also that more key matches proportionall= y more =A0=A0=A0=A0=A0=A0=A0=A0=A0 records =A0=A0=A0=A0=A0=A0=A0=A0=A0 This gives the formula: =A0=A0=A0=A0=A0=A0=A0=A0=A0 records =3D (x * (b-a) + a*c-b)/(c-1) =A0=A0=A0=A0=A0=A0=A0=A0=A0 b =3D records matched by whole key =A0=A0=A0=A0=A0=A0=A0=A0=A0 a =3D records matched by first key part (1% of = all records?) =A0=A0=A0=A0=A0=A0=A0=A0=A0 c =3D number of key parts in key =A0=A0=A0=A0=A0=A0=A0=A0=A0 x =3D used key parts (1 <=3D x <=3D c) =A0 =A0 Jacky