
List: Cluster
From: Stewart Smith
Date: July 12 2005 3:45am
Subject: Re: can mysql use the shared-storage .
On Tue, 2005-07-12 at 09:09 +0800, huang mingyou wrote:
> hello ,all
>         I want let two mysqld server use a shared-storage,the
> shared-storage haved been format to
> the ocfs2 filesystem. the oracle cluster fitle system.
>          Can mysql use it ?
>                                                     hmy.

OCFS was a very 'only for oracle data files' file system. OCFS2 seems to
be an attempt at a more general shared storage filesystem.

From the OCFS2 website:
> This is BETA software. It should absolutely NOT be run on production
> systems. If you are looking to run OCFS on a production system, check
> out OCFS version 1.

So, after we've been suitably warned, let's look at it.

There are a couple of neat things about it:
- it uses the existing JBD interface in the (Linux) kernel (this is what
ext3 uses)
- each node has its own journal.
	- this presumably will give better metadata performance. However,
you're still going to be doing network-based locks for the structures,
so come to think of it - I don't know if this is much of a gain.
	- this is in contrast to a system such as SGI's CXFS, which has one
journal for the filesystem and only one node updates it.
	- Since with multiple journals you're going to have to get some locks
before modifying data structures anyway, it would be interesting to see
if there is any real performance difference. It's also useful to
consider that clustered filesystems are generally never good performers
when it comes to metadata updates.
- There are a lot of improvements over OCFS. This means there is also a
lot of new code. It's unclear how well tested this all is.
- It's all free software and they're working on getting it integrated
with the main kernel tree.

And there's a bunch of things that aren't that good:
- small allocation sizes. 4k seems to be a maximum block size. However,
they are also using a concept of a cluster, which seems to get around
this by adding another layer of abstraction.
- the use of a bitmap for block allocation is worrying as you then rely
on a 'hunt and peck' method of finding blocks. In contrast, XFS uses two
B+trees - one ordered by extent location, the other by extent size. This
means it's easy to search for size or location of a free extent (as you
are nearly always allocating space that you want to have some locality
with another inode - or you're doing something of a certain size).
- it looks like they've copied the ext2/3 way of doing directories. This
sucks for any directory with any decent number of files in it (it's an
unsorted list, so searching is horrible). They seem to have htree
support, but that still doesn't benchmark well (I think the namesys guys
have some benchmarks of it vs reiser3 and reiser4).
- no shared writable mmap.
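
To make the bitmap vs. two-index point above a bit more concrete, here is a rough Python sketch. It is not how OCFS2 or XFS actually store anything: the names (bitmap_find_run, ExtentIndex) are made up for illustration, and plain sorted lists with bisect stand in for the two B+trees. The only point is that the bitmap search walks blocks one by one, while the indexed version can jump straight to a free extent by size or by location.

```python
import bisect


def bitmap_find_run(bitmap, length):
    """Linear scan of an allocation bitmap (True = used, False = free).

    Returns the start of the first run of `length` free blocks, or None.
    """
    run_start, run_len = None, 0
    for i, used in enumerate(bitmap):
        if used:
            run_start, run_len = None, 0
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == length:
            return run_start
    return None


class ExtentIndex:
    """Free extents indexed twice: once by location, once by size.

    Sorted lists are a stand-in for the two B+trees (an assumption made
    for brevity; a real filesystem keeps these on disk).
    """

    def __init__(self, extents):
        extents = list(extents)  # (start, length) pairs of free space
        self.by_start = sorted(extents)
        self.by_size = sorted((length, start) for start, length in extents)

    def find_by_size(self, length):
        """Smallest free extent holding at least `length` blocks."""
        i = bisect.bisect_left(self.by_size, (length, -1))
        if i == len(self.by_size):
            return None
        size, start = self.by_size[i]
        return (start, size)

    def find_near(self, block):
        """First free extent starting at or after `block`."""
        i = bisect.bisect_left(self.by_start, (block, -1))
        return self.by_start[i] if i < len(self.by_start) else None
```

With free extents at (1, 2), (4, 3) and (20, 10), asking the index for 4 blocks goes directly to the extent at block 20, whereas the bitmap scan has to look at every block on the way there.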

So what about using it with MySQL?

You can share MyISAM tables (by enabling external locking and using some
foo) with several MySQL servers accessing the same files.

Note that if there are bugs in the locking code of your filesystem, you
will not have much fun at all.
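
As a rough illustration of what you're trusting the filesystem with here: external locking relies on file-level advisory locks (fcntl()-style on Unix, as far as I know), which every process touching the file has to take and which the filesystem has to honour across nodes. The Python sketch below is not MySQL code. locked_append is a made-up helper that shows the pattern on a single Unix machine.

```python
import fcntl
import os


def locked_append(path, line):
    """Append `line` to `path` while holding an exclusive advisory lock.

    Other cooperating processes calling this function block in lockf()
    until the lock is released. A process that skips the lock is not
    stopped from writing, which is why it is called advisory.
    """
    with open(path, "a") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # wait for exclusive access
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the data out before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

On a local filesystem the kernel arbitrates these locks. On a cluster filesystem the same call has to be coordinated between machines, and that is exactly the code path where bugs would hurt.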

You can use shared storage with MySQL Cluster, but each node is just
going to write its own files - there isn't any real sharing going on
(although it makes it easier to get at all the backup files).

It all really comes down to a fundamental difference between the way
Oracle does clustering and the way MySQL does clustering.

We take the 'shared nothing' approach and Oracle has taken the 'shared
storage' approach. Shared storage costs big bucks. You're going to be
paying $1k for an HBA, about another $1k per port on a fibre switch. You
now get to buy disk.
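
Back-of-the-envelope, using only the two rough figures above and assuming one HBA and one switch port per node (no redundant paths, which you'd probably want and which would cost more). The function name is just for illustration:

```python
HBA_COST = 1000          # USD per node, rough figure from above
SWITCH_PORT_COST = 1000  # USD per fibre switch port, rough figure from above


def san_connectivity_cost(nodes):
    """Cost of attaching `nodes` machines to shared storage, before any disk."""
    return nodes * (HBA_COST + SWITCH_PORT_COST)
```

So a two-node setup is around $4k and a four-node setup around $8k just to get connected.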

You also now have your shared-storage as a single point of failure.
Think about what happens to your cluster if the filesystem becomes
corrupted. In a shared-nothing architecture, only the one node would be
affected, not the entire cluster. Software is buggy, so the less of it
you require to not go wrong, the better.

This isn't saying shared storage is useless - it's great if you need
high bandwidth to the same big files on multiple machines (think movie
effects studios).

The shared nothing approach means that everything is redundant and
a failure such as a disk dying only affects that node. It also means
you're buying commodity hardware at commodity hardware prices. MySQL
Cluster lets you work out your disk bandwidth requirements per-node for
the checkpoints. It is extremely unlikely that you're going to need high
bandwidth for this, so you save a heap on storage.

Why have you decided that you're going to have OCFS2?
Is this part of another system?
What kind of data are you going to be storing?
How much?
What type of queries are you going to be running?
How? Where from?

These questions will help us work out your situation and talk
specifically about it.

Hope this helps,
and thanks for the opportunity to read up on OCFS2 :)
Stewart Smith, Software Engineer
Office: +14082136540 Ext: 6616
VoIP: 6616@stripped
Mobile: +61 4 3 8844 332

