List: Internals
From: Michael Widenius
Date: June 17 2001 12:01am
Subject: Re: MySQL internals
Hi!

>>>>> "Dip" == Dip  <dds98@stripped> writes:

Dip> Michael,

Dip> I don't have individual timings for the queries. The data is generated
Dip> from a PHP script which I have attached (createinserts.php).

Dip> I'm using MySQL 3.23.32, which came with the NuSphere distribution
Dip> (www.nusphere.com). I start it up from the command line by issuing this
Dip> statement:
Dip> mysql.server start

Dip> I'm using PostgreSQL 7.0.3.

Dip> I'd really like to know why inserts are so fast in MySQL as opposed to
Dip> PostgreSQL and InterBase (or just why they are fast in MySQL), and why
Dip> the somewhat complex queries such as those I sent you previously take
Dip> longer.

I took a quick look at your script.

The reason MySQL is slow in reading in your case is that you haven't
specified any indexes.  MySQL has a lot of optimizations that depend on
indexes, which PostgreSQL doesn't have, but apparently PostgreSQL is a
little better when you don't have any indexes.  I assume they have code
that generates a temporary index for the cases when you don't have one.
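
For example, the missing keys can be added afterwards with ALTER TABLE;
which columns to index depends on your WHERE and join clauses, but for a
schema like yours something like this is a reasonable start:

mysql> ALTER TABLE employees ADD PRIMARY KEY (employeeno);
mysql> ALTER TABLE orders ADD PRIMARY KEY (orderno), ADD INDEX idx_productno (productno);

In the benchmark below the keys are instead declared directly in the
CREATE TABLE statements.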

The reason MySQL is faster on inserts is that we have more optimized
code, and we can also buffer more during the actual inserts because the
structure of the MyISAM tables was designed for fast inserts from the
start.  That's the main benefit of having one file for the data and
another for the index.

Different algorithms, different tradeoffs.
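
To make that concrete: a MyISAM table is stored as three files in the
database directory, for example

shell> ls orders.*
orders.frm  orders.MYD  orders.MYI

where the .MYD file holds the data rows, the .MYI file holds the indexes
and the .frm file holds the table definition.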

I rewrote your benchmark in Perl, added the proper indexes and added
timing so that one can see the time for the individual queries.

Here are the results:

First the MySQL server:
----------------
shell> perl test-dip.sh --comment="Intel Xeon, 2x550 Mhz, 512M, key_buffer=16M"

Testing server 'MySQL 3.23.39' at 2001-06-16 15:16:28

A small benchmarks based on data from Dipesh Sanghrajka
Creating tables
Inserting employees
Inserting product
Inserting customers
Inserting into supplier
Inserting into orders
Time for insert (50000): 19 wallclock secs ( 3.44 usr  1.70 sys +  0.00 cusr  0.00 csys =  5.14 CPU)

Retrieving data from the tables
Time for select_simple (1000:9000):  1 wallclock secs ( 0.44 usr  0.09 sys +  0.00 cusr  0.00 csys =  0.53 CPU)
Time for select_join (1000:1000):  1 wallclock secs ( 0.33 usr  0.02 sys +  0.00 cusr  0.00 csys =  0.35 CPU)
Time for select_distinct (1000:910000): 220 wallclock secs ( 8.50 usr  3.21 sys +  0.00 cusr  0.00 csys = 11.71 CPU)
Time for select_distinct (1000:1000): 154 wallclock secs ( 0.45 usr  0.06 sys +  0.00 cusr  0.00 csys =  0.51 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
4500900 queries in 900 loops of 1000 loops took 653 seconds
Estimated time for select_distinct (1000:5001000): 725 wallclock secs (46.01 usr 18.33 sys +  0.00 cusr  0.00 csys = 64.34 CPU)
Time for select_group (1000:4999000): 308 wallclock secs (40.10 usr 17.46 sys +  0.00 cusr  0.00 csys = 57.56 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
4500900 queries in 900 loops of 1000 loops took 660 seconds
Estimated time for select_group (1000:5001000): 733 wallclock secs (47.20 usr 19.34 sys +  0.00 cusr  0.00 csys = 66.54 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
3999200 queries in 800 loops of 1000 loops took 672 seconds
Estimated time for outer_join (1000:4999000): 840 wallclock secs (39.57 usr 16.93 sys +  0.00 cusr  0.00 csys = 56.50 CPU)
Time for order_join (1000:1000):  1 wallclock secs ( 0.34 usr  0.05 sys +  0.00 cusr  0.00 csys =  0.39 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
8000000 queries in 800 loops of 1000 loops took 672 seconds
Estimated time for order_join (1000:10000000): 840 wallclock secs (94.75 usr 38.30 sys +  0.00 cusr  0.00 csys = 133.05 CPU)
Time for drop_table (5):  0 wallclock secs ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)

Estimated total time: 3842 wallclock secs (281.14 usr 115.49 sys +  0.00 cusr  0.00 csys = 396.63 CPU)

-------------------

Then PostgreSQL 7.1.2:

shell> perl test-dip.sh --comment="Intel Xeon, 2x550 Mhz, 512M, key_buffer=16M" --server=pg --user=postgres

Testing server 'PostgreSQL version 7.1.1' at 2001-06-17  2:53:35

A benchmarks based on an email from Dipesh Sanghrajka
Creating tables
Inserting employees
Inserting product
Inserting customers
Inserting into supplier
Inserting into orders
Time for insert (50000): 408 wallclock secs (15.93 usr  2.32 sys +  0.00 cusr  0.00 csys = 18.25 CPU)

Retrieving data from the tables
Time for select_simple (1000:9000): 33 wallclock secs ( 0.42 usr  0.07 sys +  0.00 cusr  0.00 csys =  0.49 CPU)
Time for select_join (1000:1000):  1 wallclock secs ( 0.33 usr  0.05 sys +  0.00 cusr  0.00 csys =  0.38 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
455000 queries in 500 loops of 1000 loops took 664 seconds
Estimated time for select_distinct (1000:910000): 1328 wallclock secs (12.20 usr  1.72 sys +  0.00 cusr  0.00 csys = 13.92 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
500 queries in 500 loops of 1000 loops took 701 seconds
Estimated time for select_distinct (1000:1000): 1402 wallclock secs ( 0.64 usr  0.04 sys +  0.00 cusr  0.00 csys =  0.68 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
1500300 queries in 300 loops of 1000 loops took 797 seconds
Estimated time for select_distinct (1000:5001000): 2656 wallclock secs (67.40 usr 12.60 sys +  0.00 cusr  0.00 csys = 80.00 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
2499500 queries in 500 loops of 1000 loops took 719 seconds
Estimated time for select_group (1000:4999000): 1438 wallclock secs (47.88 usr  4.58 sys +  0.00 cusr  0.00 csys = 52.46 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
1000200 queries in 200 loops of 1000 loops took 869 seconds
Estimated time for select_group (1000:5001000): 4345 wallclock secs (67.70 usr 10.30 sys +  0.00 cusr  0.00 csys = 78.00 CPU)
Time for outer_join (1000:4999000): 379 wallclock secs (47.11 usr  4.42 sys +  0.00 cusr  0.00 csys = 51.53 CPU)
Time for order_join (1000:1000):  2 wallclock secs ( 0.36 usr  0.03 sys +  0.00 cusr  0.00 csys =  0.39 CPU)
Note: Query took longer then time-limit: 600
Estimating end time based on:
2000000 queries in 200 loops of 1000 loops took 710 seconds
Estimated time for order_join (1000:10000000): 3550 wallclock secs (149.30 usr 27.50 sys +  0.00 cusr  0.00 csys = 176.80 CPU)
Time for drop_table (5):  1 wallclock secs ( 0.00 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.00 CPU)

Estimated total time: 15543 wallclock secs (409.28 usr 63.63 sys +  0.00 cusr  0.00 csys = 472.91 CPU)


A quick comparison of the above shows that the only case where
PostgreSQL was faster than MySQL was the outer-join test
(840 versus 379 seconds); I assume that MySQL somehow used the wrong
index in this test.
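
One way to check this would be to look at the query plan with EXPLAIN,
for example:

mysql> EXPLAIN SELECT product.productno, product.productname
    ->        FROM product LEFT JOIN orders ON product.productno = orders.productno
    ->        WHERE orders.productno IS NULL;

which shows, per table, which index (if any) the optimizer decided to use.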

In almost all other tests MySQL was significantly faster.


Here is the actual benchmark. It will work if you just drop this into
our benchmark suite (in the sql-bench directory).
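
Once the file is in place you run it like any other test in the suite,
for example:

shell> cd sql-bench
shell> perl test-dip.sh --server=mysql
shell> perl test-dip.sh --server=pg --user=postgres

The usual options handled by bench-init.pl (--comment, --user, --small-test
and so on) work the same way as for the other tests.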


#!/usr/bin/perl
# Copyright (C) 2001 MySQL AB
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Library General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Library General Public License for more details.
#
# You should have received a copy of the GNU Library General Public
# License along with this library; if not, write to the Free
# Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
# MA 02111-1307, USA
#
# Test based on data sent in by Dipesh Sanghrajka

##################### Standard benchmark inits ##############################

use DBI;
use Benchmark;

$opt_loop_count=10000;		# rows per table; the test inserts 5*$opt_loop_count rows in total
$opt_small_loop_count=$opt_loop_count/10;

chomp($pwd = `pwd`); $pwd = "." if ($pwd eq '');
require "$pwd/bench-init.pl" || die "Can't read Configuration file: $!\n";

if ($opt_small_test)
{
  $opt_loop_count/=100;
  $opt_small_loop_count=$opt_loop_count/10;
}
elsif ($opt_small_tables)
{
  $opt_loop_count/=1000;
  $opt_small_loop_count=$opt_loop_count/10;
}

print "A benchmarks based on an email from Dipesh Sanghrajka\n";

####
####  Connect and start timing
####
@table_names= ("employees","product", "customers","supplier","orders");
$dbh = $server->connect();
$start_time=new Benchmark;

####
#### Create needed tables
####

goto select_test if ($opt_skip_create);

print "Creating tables\n";

# cleanup from old runs
for ($i = 0; $i <= $#table_names; $i++)
{
  my $table_name = $table_names[$i];
  $sth = $dbh->do("drop table $table_name" . $server->{'drop_attr'});
}

$dbh->do("drop table orders" . $server->{'drop_attr'});

do_many($dbh,$server->create("employees",
			     ["employeeno mediumint not null",
			      "lastname char(50)",
			      "firstname char(50)",
			      "salary float"],
			     ["primary key (employeeno)"]));

do_many($dbh,$server->create("product",
			     ["productno mediumint not null",
			      "productname char(50)",
			      "supplierno mediumint",
			      "price float",
			      "serialno double"],
			     ["primary key (productno)"]));

do_many($dbh,$server->create("customers",
			     ["customerno mediumint not null",
			      "customername char(50)",
			      "customeraddress char(50)",
			      "telephone int"],
			     ["primary key (customerno)"]));

do_many($dbh,$server->create("supplier",
			     ["supplierno mediumint NOT NULL",
			      "suppliername char(50)",
			      "supplieraddress char(50)",
			      "supplierserial float"],
			     ["primary key (supplierno)"]));

do_many($dbh,$server->create("orders",
			     ["orderno mediumint NOT NULL",
			      "customerno mediumint",
			      "employeeno mediumint",
			      "productno mediumint",
			      "quantity tinyint"],
			     ["primary key (orderno)",
			      "index idx_productno (productno)"]));

if ($opt_lock_tables)
{
  $sth = $dbh->do("LOCK TABLES employees,product, customers,supplier,orders WRITE") ||
die $DBI::errstr;
}

####
#### Insert data into the tables
####

$loop_time=new Benchmark;

print "Inserting employees\n";
$query="INSERT INTO employees(employeeno, lastname, firstname, salary) VALUES (";
for ($i=1 ; $i <= $opt_loop_count ; $i++)
{
  my ($f1,$f2,$f3,$f4);
  $f1 = $i;
  $f2 = 'xyz';
  $f3 = 'abc';
  $f4 = ($i * 10.25);
  $sth = $dbh->do($query . "$f1,'$f2','$f3',$f4)") or die $DBI::errstr;
}

print "Inserting product\n";
$query="INSERT INTO product(productno, productname, supplierno, price, serialno) VALUES
(";
{
  my ($f1,$f2,$f3,$f4,$f5);
  $f3 = 1;
  $f5 = 0.5;
  for ($i=1 ; $i <= $opt_loop_count ; $i++)
  {
    $f1 = $i;
    $f2 = 'producta';
    if ($i % 11 == 0)
    {
      $f3 = ($f3+1);
    }
    $f4 = ($i * 0.75);
    $f5 = ($f5 + 1);
    $sth = $dbh->do($query . "$f1,'$f2',$f3,$f4,$f5)") or die $DBI::errstr;
  }
}

print "Inserting customers\n";
$query="INSERT INTO customers(customerno, customername, customeraddress, telephone) VALUES
(";
{
  my ($f1,$f2,$f3,$f4);
  $f4 = 1000000000;
  for ($i=1 ; $i <= $opt_loop_count ; $i++)
  {
    $f1 = $i;
    $f2 = 'customera';
    $f3 = 'addressa';
    $f4 = ($f4 + 1);
    $sth = $dbh->do($query . "$f1,'$f2','$f3',$f4)") or die $DBI::errstr;
  }
}


print "Inserting into supplier\n";
$query="INSERT INTO supplier(supplierno, suppliername, supplieraddress, supplierserial)
VALUES (";
{
  my ($f1,$f2,$f3,$f4);
  $f4 = 0.52524524;
  for ($i=1 ; $i <= $opt_loop_count ; $i++)
  {
    $f1 = $i;
    $f2 = 'suppliera';
    $f3 = 'supplieraddressa';
    $f4 = ($f4 + 1);
    $sth = $dbh->do($query . "$f1,'$f2','$f3',$f4)") or die $DBI::errstr;
  }
}

print "Inserting into orders\n";
$dbh->do("INSERT INTO orders(orderno, customerno, employeeno, productno, quantity)
VALUES(10000, 10000, 10000, 10000, 1)") or die $DBI::errstr;
$query="INSERT INTO orders(orderno, customerno, employeeno, productno, quantity) VALUES
(";
{
  my ($f1,$f2,$f3,$f4,$f5);
  $f2 = 1;
  $f3 = 1;
  $f4 = 1;
  for ($i=1 ; $i < $opt_loop_count ; $i++)
  {
    $f1 = $i;
    $f2 = ($f2 + 2)  if ($i % 2 == 0);
    $f3 = ($f3 + 1) if ($i % 10 == 0);
    $f4 = ($f4 + 2) if ($i % 2 == 0);
    $f5 = ($i % 10) + 1;
    $sth = $dbh->do($query . "$f1,$f2,$f3,$f4,$f5)") or die $DBI::errstr;
  }
}

$end_time=new Benchmark;
print "Time for insert (" . ($opt_loop_count*5) . "): " .
  timestr(timediff($end_time, $loop_time),"all") . "\n\n";

####
#### Do some selects on the tables
####

select_test:

print "Retrieving data from the tables\n";
$loop_time=new Benchmark;
$error=0;

@Q=(
["SELECT * FROM employees WHERE employees.employeeno < 10",
 "select_simple"],
["SELECT customers.customername, customers.telephone FROM customers, orders
WHERE ((orders.productno=1) And (customers.customerno=orders.customerno))",
 "select_join"],
["SELECT DISTINCT supplier.supplierno, supplier.suppliername,
supplier.supplieraddress, supplier.supplierserial FROM supplier, product WHERE
(supplier.supplierno=product.supplierno)",
"select_distinct"],
["SELECT DISTINCT employees.firstname, employees.lastname FROM employees INNER
JOIN orders ON employees.employeeno = orders.employeeno",
"select_distinct"],
["SELECT DISTINCT orders.customerno, customers.customername,
customers.customeraddress, product.productname FROM (customers INNER JOIN
orders ON customers.customerno=orders.customerno) INNER JOIN product ON
orders.productno=product.productno order by customerno",
"select_distinct"],
["SELECT orders.customerno, customers.customername FROM orders INNER JOIN
customers ON orders.customerno=customers.customerno GROUP BY orders.customerno,
customers.customername HAVING Count(orders.customerno)>1",
"select_group"],
["SELECT product.productname, supplier.suppliername, product.price,
product.serialno, Sum(orders.quantity) AS SumOfquantity FROM (orders INNER JOIN
product ON orders.productno = product.productno) INNER JOIN supplier ON
product.supplierno = supplier.supplierno GROUP BY product.productname,
supplier.suppliername, product.price, product.serialno, orders.productno",
 "select_group"],
["SELECT product.productno, product.productname FROM product LEFT JOIN orders ON
product.productno = orders.productno WHERE ((orders.productno) Is Null)",
"outer_join"],
["SELECT employees.employeeno, employees.lastname, employees.salary FROM
employees INNER JOIN orders ON employees.employeeno = orders.employeeno WHERE
(orders.productno=1) ORDER BY employees.salary DESC",
"order_join"],
["SELECT orders.orderno, customers.customername, product.productname,
employees.firstname, (orders.quantity)*(product.price) AS TotalPrice FROM
orders, customers, product, employees WHERE
((orders.customerno=customers.customerno)
And(orders.productno=product.productno)
And(employees.employeeno=orders.employeeno)) order by orderno ASC",
"order_join"]);

foreach $arg (@Q)
{
  my ($total_rows,$query,$type,$estimated);
  $total_rows=$estimated=0;
  $query= $arg->[0];
  $type=  $arg->[1];

  $loop_time= new Benchmark;
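  # Run the query $opt_small_loop_count times.  Every 100 iterations
  # predict_query_time() checks how long we have been running; if the test
  # has passed the suite's time limit it extrapolates the total instead of
  # finishing the loop, which is what produces the "Estimated time" rows
  # in the results above.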
  for ($i=1 ; $i <= $opt_small_loop_count ; $i++)
  {
    $total_rows+=fetch_all_rows($dbh,$query);
    $end_time=new Benchmark;
    if (($i % 100) == 0)
    {
      last if ($estimated=predict_query_time($loop_time,$end_time,
					     \$total_rows,$i,
					     $opt_small_loop_count));
    }
  }
  if ($estimated)
  { print "Estimated time"; }
  else
  { print "Time"; }
  print " for $type ($opt_small_loop_count:$total_rows): " .
    timestr(timediff($end_time, $loop_time),"all") . "\n";
}

#
# Drop tables
#
if (!$opt_skip_delete)
{
  $loop_time= new Benchmark;
  for ($i = 0; $i <= $#table_names; $i++)
  {
    my $table_name = $table_names[$i];
    $sth = $dbh->do("drop table $table_name" . $server->{'drop_attr'});
  }
  $end_time=new Benchmark;
  $count= $#table_names +1;
  print "Time for drop_table ($count): " .
    timestr(timediff($end_time, $loop_time),"all") . "\n\n";
}

if ($opt_fast && defined($server->{vacuum}))
{
  $server->vacuum(0,\$dbh);
}


####
#### End of benchmark
####

$dbh->disconnect;				# close connection

end_benchmark($start_time);


Regards,
Monty