List: Internals
From: Michael Widenius
Date: January 27 2001 4:48pm
Subject: MySQL parallel server [long mail]
Hi!

>>>>> "Denis" == Denis Pithon <denis.pithon@stripped> writes:

Denis> Hi all,
Denis> Yes, it's another mail about MySQL parallel server, and this one is
Denis> quite long... I have been working for two months to enable MySQL as a
Denis> parallel server on a Linux cluster. As you can guess, I encountered a
Denis> bunch of problems!

Denis> ** Context **

Denis> At Lineo HA, we provide Linux-based software (Availix) which powers
Denis> CompactPCI hardware. Roughly, this hardware features one disk,
Denis> shared between a couple of active nodes (up to 5).  Each node can be
Denis> seen as a diskless Linux PC, and all nodes run the same service (one
Denis> of httpd, ftpd... or why not... mysqld). IPVS runs on a particular
Denis> node (the controller) and dispatches IP queries to the others. I hope
Denis> you understand that for such a hardware solution, replication is
Denis> unfortunately not suitable.

Denis> First of all, we use GFS, which is really better than NFS (for Linux,
Denis> at least). I had to adapt sql/my_lock.c because GFS doesn't support
Denis> fcntl() locking (so I use flock). I tested it, and it seems to work
Denis> fine (with one server). I compiled the MySQL server with debug mode
Denis> on, and I use the following options for the servers:

Denis> safe_mysqld --enable-locking --one-thread --flush --safe-mode

You don't need --one-thread or --safe-mode 
and I don't think you will need --flush
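
For readers following along, the fcntl()-to-flock() substitution Denis describes could look roughly like the sketch below. This is not the actual my_lock.c change: the helper names lock_whole_file, unlock_whole_file and second_locker_is_refused are hypothetical, and the sketch assumes a filesystem where flock() works. Note that flock() always locks the whole file, whereas fcntl() can lock byte ranges.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper, not the real my_lock.c interface.
   Takes a shared or exclusive whole-file lock with flock(). */
int lock_whole_file(int fd, int exclusive, int wait)
{
    int op = exclusive ? LOCK_EX : LOCK_SH;

    assert(fd >= 0);
    if (!wait)
        op |= LOCK_NB;          /* fail at once instead of blocking */
    return flock(fd, op);
}

/* Hypothetical helper: release whatever flock() lock fd holds. */
int unlock_whole_file(int fd)
{
    assert(fd >= 0);
    return flock(fd, LOCK_UN);
}

/* Demonstration: open the same file twice (as two servers would) and
   check that the second opener cannot take an exclusive lock while the
   first one holds it.  Returns 1 if refused, 0 if granted, -1 if the
   setup itself failed. */
int second_locker_is_refused(const char *path)
{
    int first = open(path, O_CREAT | O_RDWR, 0600);
    int second = open(path, O_RDWR);
    int refused = -1;

    if (first >= 0 && second >= 0 && lock_whole_file(first, 1, 0) == 0) {
        refused = (lock_whole_file(second, 1, 0) == -1);
        unlock_whole_file(first);
    }
    if (first >= 0)
        close(first);
    if (second >= 0)
        close(second);
    unlink(path);
    return refused;
}
```

Running second_locker_is_refused() on a scratch file on the shared filesystem is a quick way to check that the lock is really honoured between two openers before involving mysqld at all.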

Denis> Certainly very slow, but we don't want speed, we first want
Denis> availability...

Denis> The first query in my C test programs is always 'LOCK TABLES' and the
Denis> last 'UNLOCK TABLES'. Moreover, these programs run in a connect - lock
Denis> - process - unlock - disconnect loop. Each new connection is
Denis> established with another server (the goal is to simulate a real
Denis> running context).

Denis> ** Tests **

Denis> I don't want to flood you with all the details, but updates seem to
Denis> work quite well: an update done by one server is seen by the others. I
Denis> wrote a C program which fills/empties a stack and I launched several
Denis> instances of it. It ran for more than a day! Great! But this test is
Denis> quite simple and doesn't use insert queries.

Denis> Unfortunately there are many more problems with insert queries. In
Denis> fact, if you create a table with one server and add a row with
Denis> another, that row may be invisible to the other servers (and to the
Denis> table creator too). I mean that you could have a result like:
Denis>                    1  'one'       on the server that did the insert
Denis>                    0  (NULL)      on another server

Could there be some caching in GFS that causes this?

When you do an insert, MySQL will write the data to the data and index
file and update the status information in the beginning of the key
block before the query returns.

You should be able to verify this by doing an insert and then:

myisamchk -dv table_name

at once.

Denis> Wait a couple of seconds, do a new select and you can obtain the
Denis> correct result on both!  If you do a new insert with another server,
Denis> it doesn't overwrite the last inserted row and you see both rows with
Denis> a select (even if you had 0 (NULL) before)...  And if you create and
Denis> insert with the same server, the other nodes see the new row!  Strange,
Denis> isn't it? Moreover, the table check often complains after these
Denis> inserts. It tells me things like this for the servers which didn't do
Denis> the insert:

Denis>    Size of datafile is: 0         Should be: 25

Denis> Which is wrong: index file, data file, both or neither?

The above just means that when mysqld did a stat of the file, it got
back 0.  The header in the .index file however tells us there should
be 25 bytes in the file.

Can you try to use 'ls' to see where the problem could be?
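
The comparison described above can also be reproduced outside mysqld with a few lines of C. The following is only a sketch under stated assumptions: datafile_size, append_record and size_matches are hypothetical helpers, and the expected length is passed in by hand (25 in the output quoted above) rather than read from the real index header, whose layout is not shown here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size of the file as stat() reports it on this
   node, or -1 if stat() fails. */
long long datafile_size(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Hypothetical helper: append len bytes and fsync() before returning,
   so the data has been handed to the filesystem when the call ends. */
int append_record(const char *path, const void *buf, size_t len)
{
    int fd, ok;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;
    ok = write(fd, buf, len) == (ssize_t)len && fsync(fd) == 0;
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Returns 1 if the size stat() reports equals the expected length. */
int size_matches(const char *path, long long expected)
{
    return datafile_size(path) == expected;
}
```

Calling append_record() on one node and size_matches() on another, right after the write and again a few seconds later, would show whether the second node's stat() lags behind the writer.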

Denis> But if I wait a few seconds and re-check the table, all may be clean...
Denis> The situation isn't very clear. And repeating the same actions
Denis> doesn't give the same results, which is quite annoying!

Looks like disk caching problems...

Denis> I tried this test with one PC which runs several servers. The problems
Denis> are the same. GFS seems to be ok.

Denis> ** What I want to do **

Denis> I'm currently looking for a way to force mysqld servers to flush the
Denis> index file after any query and / or force them to re-read the index
Denis> file before any query... I know, that's a terrible slowdown for the
Denis> server, but I think I have no other choice than to test it.

MySQL does already do the above by default.  The big question is
if this doesn't work with the version you are using because of some
unknown problem or if MySQL doesn't get the information that the disk
block has changed.
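
To make the "flush after the query, re-read before the next one" pattern concrete, here is a minimal sketch. It is not MySQL's code: struct table_header is a made-up two-field header, not the real index file layout, and flush_header and reload_header are hypothetical names. On a local filesystem the reader always sees the flushed header. On a cluster filesystem that additionally depends on the filesystem telling the reading node that the block has changed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical on-disk header: NOT the real index file layout. */
struct table_header {
    uint32_t row_count;
    uint32_t data_length;
};

/* Writer side: store the header at offset 0 and fsync() it before the
   "query" returns. */
int flush_header(int fd, const struct table_header *h)
{
    assert(fd >= 0 && h != NULL);
    if (pwrite(fd, h, sizeof *h, 0) != (ssize_t)sizeof *h)
        return -1;
    return fsync(fd);
}

/* Reader side: fetch the header from the file before each "query"
   instead of trusting a copy kept in process memory. */
int reload_header(int fd, struct table_header *h)
{
    assert(fd >= 0 && h != NULL);
    return pread(fd, h, sizeof *h, 0) == (ssize_t)sizeof *h ? 0 : -1;
}
```

If reload_header() on a second node returns stale values right after flush_header() has succeeded on the first, the problem is below the database, in the filesystem or disk cache.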

Denis> I have to explore the source deeper. I have seen many IO caches (bad
Denis> news for me) and even a mmap (sql_mmap.cc), ouch! To check if we could
Denis> use MySQL as a parallel database server I want to disable the use of
Denis> the cache. Is it possible? And is it really useful?

The memmap is only used for compressed tables (not a problem for you).
The IO caches are always flushed after every query.

Denis> OK, I hope I hurt nobody in the MySQL development team! I know that
Denis> I'm trying to slow down a Formula 1 car to a snail's pace :-) But the
Denis> results I have with MySQL are far better than those of mSQL,
Denis> PostgreSQL and current commercial products (DB2, Informix, Sybase...)
Denis> which are designed for distributed databases only.

The problem is that your current setup should work, without any
modifications to the MySQL code.

Which MySQL version are you using and what kind of tables do you have?

We fixed a bug in MySQL 3.23.28 for MyISAM tables that could explain
some of the above problems. If you are using MyISAM with a MySQL
version older than 3.23.28, please upgrade and check if this fixes
your problem!

Regards,
Monty
