From: Jim Crumpler Date: March 15 1999 2:38pm Subject: scaling over a few hundred million rows. List-Archive: http://lists.mysql.com/mysql/261 Message-Id: <005f01be6ef1$8c71e240$730b09d2@unicity.com.au> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit I have some scaling questions for anyone who would like to assist. Consider the following requirements for a data repository. * a continuous input stream of data at a maximum rate of 500 rows per second. (43 million rows per day). * each row contains two timestamps, a key and a value. An index would be required for one timestamp and the key. * a number of clients must read from this data in various ways. I thought MySQL could be handy for this task, since its relatively lightweight, its multithreaded and offers the flexibility of SQL. I built a small scale test server (Solaris 2.6, PII-400, 256MB RAM, a few 9GB drives striped onto two ultra-wide controllers) and I've been working on test database for a few months now and I'm almost ready to go to the next level and build a bigger test machine. If I had one BIG table that absorbed 43 million rows of data per day I'd bump into alot of locking problems. Given every write locks the entire table from readers or writers, the table would spend most of its time locked causing problems for clients wanting to read the data.. Multiple threads on other CPUs would also get locked out while trying a parallel insert. I thought about spreading the data over many smaller tables, since it would effectively create a finer grain of locking, allowing the input process to only lock one "small" table at a time, while clients could hapily read from the other 99 or so. It would also make smaller tables and smaller indexes - but more of them. I tested with 100 tables and cranked up the table cache size to hold all the files open (boy, the penalty of closing and opening the files is big). Unfortunately it was about 5 times slower (the disks thrashed themselves to death). I'm guessing that the larger working set of blocks required to access the indexes for all the tables didn't fit too well in memory, and the OS paged it all in and out. If this was the case, then I'm assuming that throwing a few GB of silicon at the problem would solve it. Can anyone think of other reasons why having lots of tables would be worse than having one mega table? Has anyone done any working page set analysis on MySQL? The other thought was CPUs. If I have lots of smaller tables I'm assuming that I can write to many tables at once using a parallel set of processes to write. The real trick of would striping the disk enough to prevent it being IO bound and making the memory large enough to prevent page thrashing. Can anyone think of a reason why this wouldn't speed it up.? I found it quite interesting to watch a truss/trace/ptrace of the mysqld server while writing to these tables. It reads 1k chunks from the index a few times (traversing the B-tree I assume) and then writes a small about of data into a selected location within the data file. It later writes the index data back (in 1k chunks again). Anyway, any comments or help on optimising the management of a few hundred million rows would be handy.. Thanks for any help Jim Crumpler.