List:General Discussion« Previous MessageNext Message »
From:Jim Crumpler Date:March 15 1999 2:38pm
Subject:scaling over a few hundred million rows.
View as plain text  
I have some scaling questions for anyone who would like to assist.

Consider the following requirements for a data repository.

* a continuous input stream of data at a maximum rate of 500 rows per second.
(43 million rows per day).
* each row contains two timestamps, a key and a value.  An index would be
required for one timestamp and the key.
* a number of clients must read from this data in various ways.

I thought MySQL could be handy for this task, since its relatively lightweight,
its multithreaded and offers the flexibility of SQL.  I built a small scale
test server (Solaris 2.6, PII-400, 256MB RAM, a few 9GB drives striped onto two
ultra-wide controllers) and I've been working on test database for a few months
now and I'm almost ready to go to the next level and build a bigger test
machine.

If I had one BIG table that absorbed 43 million rows of data per day I'd bump
into alot of locking problems.  Given every write locks the entire table from
readers or writers, the table would spend most of its time locked causing
problems for clients wanting to read the data..  Multiple threads on other CPUs
would also get locked out while trying a parallel insert.   I thought about
spreading the data over many smaller tables, since it would effectively create
a finer grain of locking, allowing the input process to only lock one "small"
table at a time, while clients could hapily read from the other 99 or so.   It
would also make smaller tables and smaller indexes - but more of them.  I
tested with 100 tables and cranked up the table cache size to hold all the
files open (boy, the penalty of closing and opening the files is big).
Unfortunately it was about 5 times slower (the disks thrashed themselves to
death).  I'm guessing that the larger working set of blocks required to access
the indexes for all the tables didn't fit too well in memory, and the OS paged
it all in and out.   If this was the case, then I'm assuming that throwing a
few GB of silicon at the problem would solve it.  Can anyone think of other
reasons why having lots of tables would be worse than having one mega table?
Has anyone done any working page set analysis on MySQL?

The other thought was CPUs.  If I have lots of smaller tables I'm assuming that
I can write to many tables at once using a parallel set of processes to write.
The real trick of would striping the disk enough to prevent it being IO bound
and making the memory large enough to prevent page thrashing.  Can anyone think
of a reason why this wouldn't speed it up.?

I found it quite interesting to watch a truss/trace/ptrace of the mysqld server
while writing to these tables.  It reads 1k chunks from the index a few times
(traversing the B-tree I assume) and then writes a small about of data into a
selected location within the data file.   It later writes the index data back
(in 1k chunks again).

Anyway, any comments or help on optimising the management of a few hundred
million rows would be handy..

Thanks for any help

Jim Crumpler.












Thread
scaling over a few hundred million rows.Jim Crumpler15 Mar
  • scaling over a few hundred million rows.Michael Widenius15 Mar