Hello everybody,
I hope this is the right mailing list to discuss this. I'm trying to
evaluate MySQL Cluster as the persistence layer for software we are
developing. I'd like to hear your opinions on the scalability of the
MySQL Cluster solution, and to discuss the much-debated "shared nothing"
vs. "shared everything" question.
Oracle is pushing its cluster architecture for Oracle 9i (and 10g) very
hard, claiming the obvious (is it?) superiority of the "shared
everything" approach to clustering over the "shared nothing" approach;
the latter is used by MySQL Cluster and considered the better approach
by MySQL AB.
In my opinion both approaches have their pros and cons, depending on
your needs. Below I try to summarize them as I understand them, and I'd
like to hear your experiences and opinions on overcoming some of the
limitations (or just corrections of my possible misunderstandings :-) ).
1) Cost. The "shared nothing" approach is definitely cost-effective, as
it can be built on "low end" servers that fully (?) replicate the
database data on their own storage (even cheap storage), with no need
for a costly shared storage solution. On the other hand, the cost (and
the manageability problems) may increase as the required storage space
grows (see point 4).
2) High availability. The shared storage required by the "shared
everything" solution is a single point of failure: either you buy a
very (very, very) expensive storage system that implements some high
(high, high) availability solution for storage, or this can become a
real problem. And by the way: isn't a "high, high, high" availability
solution for shared storage a case of "vertical scaling" (getting high
availability by means of a better single device...)?
In my opinion a cluster made of REALLY INDEPENDENT servers will always
be more highly available than a cluster with shared storage.
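A back-of-the-envelope way to check this argument, as a simplified model
only: assume each server is up independently with the same probability
and, as in point 1, holds a full copy of the data, so any single
surviving server can answer. Then the independent cluster is down only
when every server is down, while the shared-storage cluster also needs
the one storage device to be up. The availability figures are
hypothetical, and the model ignores failover time, network, software and
operator errors.

```python
def replicated_availability(node_availability: float, nodes: int) -> float:
    """Availability of `nodes` independent servers, each with a full copy.

    Assumes independent failures and that any single surviving server can
    serve all requests, so the cluster is down only if every server is down.
    """
    return 1.0 - (1.0 - node_availability) ** nodes


def shared_storage_availability(
    node_availability: float, nodes: int, storage_availability: float
) -> float:
    """Same servers, but all of them depend on one shared storage device.

    The cluster is up only if the storage is up AND at least one server is
    up, so the result can never exceed `storage_availability`.
    """
    return storage_availability * replicated_availability(node_availability, nodes)


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    a_node, a_storage = 0.99, 0.999
    for n in (1, 2, 3, 4):
        print(
            f"{n} servers: independent={replicated_availability(a_node, n):.6f} "
            f"shared storage={shared_storage_availability(a_node, n, a_storage):.6f}"
        )
```

In this model adding servers pushes the first figure towards 1, while the
second stays capped at the storage availability, which is exactly why the
shared device has to be made so (very, very) reliable.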
3) Performance. Oracle's cluster parallelization seems really superior
here, as it can automatically parallelize a query across different
servers sharing resources, while a query hitting a server in a MySQL
cluster is entirely processed by that same server.
I wonder whether the shared storage (disks, Fibre Channel, concurrent
access, etc.) can become a bottleneck as you add more servers to the
cluster; at the very least it could become a very expensive part of the
cluster (administratively and economically: adding clustered storage,
etc.).
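The bottleneck question can be framed with a deliberately idealized
model (an assumption, not a measurement): without shared storage each
server brings its own disks, so aggregate I/O capacity grows with the
number of servers; with shared storage all servers draw on one device, so
the total is capped at that device's rate. The rates below are
hypothetical and the model ignores caching, contention overhead and
network limits.

```python
from typing import Optional


def cluster_throughput(
    nodes: int, per_node_rate: float, shared_storage_rate: Optional[float] = None
) -> float:
    """Idealized aggregate I/O throughput of a cluster.

    shared_storage_rate=None models local disks per server (linear growth);
    otherwise every server competes for one device, capping the total.
    """
    demand = nodes * per_node_rate
    if shared_storage_rate is None:
        return demand
    return min(demand, shared_storage_rate)


if __name__ == "__main__":
    # Hypothetical rates in MB/s, for illustration only.
    for n in (1, 2, 4, 8):
        print(n, cluster_throughput(n, 100.0), cluster_throughput(n, 100.0, 300.0))
```

Past the point where demand exceeds the device rate, adding servers buys
nothing in this model unless the shared storage itself is upgraded, which
is the "very expensive part of the cluster" mentioned above.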
4) Storage space scalability. Full replication on a "shared nothing"
cluster limits the available storage space, as every new node added must
have at least as much space as the other nodes. This might not be a
problem for a database of a few tens of GB, but it can become very
expensive for one of several hundred GB (many disks for every node) and
really difficult for a TB-sized database: whether you add shared storage
or RAID disks, you have to fully duplicate the entire DB storage for
every single server.
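The arithmetic behind this point, as a sketch: with full replication each
node needs the whole database, so raw disk grows as database size times
number of nodes. For comparison the sketch also models a partitioned
layout in which each row is stored a fixed number of times and rows are
split evenly across nodes (the kind of physical partitioning the Oracle
text below attributes to shared nothing systems). Which of the two
layouts MySQL Cluster actually uses is an assumption to verify against
its documentation, not something this sketch claims; sizes are
hypothetical.

```python
from typing import Optional


def storage_per_node_gb(
    db_size_gb: float, nodes: int, replicas: Optional[int] = None
) -> float:
    """Disk space each node needs.

    replicas=None models full replication (every node holds the whole DB);
    otherwise rows are split evenly and each row is stored `replicas` times.
    """
    if replicas is None:
        return db_size_gb
    return db_size_gb * replicas / nodes


def total_storage_gb(
    db_size_gb: float, nodes: int, replicas: Optional[int] = None
) -> float:
    """Raw disk space summed over all nodes."""
    return storage_per_node_gb(db_size_gb, nodes, replicas) * nodes


if __name__ == "__main__":
    # Hypothetical 1 TB database on 4 nodes.
    print("full replication:", total_storage_gb(1000.0, 4), "GB")
    print("2 copies, partitioned:", total_storage_gb(1000.0, 4, 2), "GB")
```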
Below I quote MySQL AB's and Oracle's statements about each other's
products (the Oracle statement is generically about "old-fashioned"
shared nothing systems). I think the second point of the MySQL AB
statement is false (or I just can't understand it), while points 2 and
3 of the Oracle statement are not completely correct if applied to
MySQL Cluster.
What are your thoughts and experiences on this?
MySQL AB statement about Oracle RAC:
> Oracle RAC is a complex product which requires significant investment
> in hardware and software as well as development and administration
> skills. Oracle RAC relies on a "shared storage" architecture that
> requires an additional investment in SAN (Storage Area Network)
> infrastructure. The requirement for a SAN results in:
>
> * An additional expense for customers since they have to turn to a
> 3rd party for a networked storage solution. A shared disk can
> cost $15k-20k in addition to the database license even for a
> small implementation.
> * Recovery from a failed node requires access to the shared disk
> which increases time to failover to minutes vs. the sub-second
> failover time of MySQL Cluster.
> * A single point of failure in the cluster.
>
> MySQL Cluster provides a high-availability (99.999%) database for the
> mass market. MySQL Cluster does not require specialized hardware or
> skills. MySQL Cluster is a "shared nothing" architecture which does
> not require any additional infrastructure investments.
>
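To put the 99.999% figure in perspective, availability can be converted
into expected downtime per year. The sketch below assumes an average
year of 365.25 days; 99.999% leaves roughly five minutes a year, so a
single failover that takes "minutes" (as the statement says of
shared-disk recovery) could use up most or all of that budget on its own.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for a in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.5%}: {downtime_minutes_per_year(a):.1f} min/year")
```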
From the Oracle RAC overview document:
> - First, the shared nothing approach is not optimal for use on shared
> everything SMP hardware. The requirement to physically partition data in
> order to derive the benefits of parallelism is clearly an artificial and
> outdated requirement in a shared everything SMP system, where every
> processor has direct, equal access to all the data.
> - Second, the rigid partitioning-based parallel execution strategy employed
> in the shared nothing approach often leads to skewed resource utilization,
> e.g. when it is not necessary to access all partitions of a table, or when
> larger non-partitioned tables, owned by a single node, are part of an
> operation. In such situations, the tight ownership model that prevents
> intra-partition parallel execution fails to utilize all available processing
> power, delivering sub-optimal use of available processing power.
> - Third, due to the fact of having a physical data partition to node
> relationship, shared nothing systems are not flexible at all to adapt to
> changing business requirements. When the business grows, you cannot
> easily enlarge your system incrementally to address your growing
> business needs. You can upgrade all existing nodes, keeping them symmetrical
> and avoiding data repartitioning. In most cases upgrading all nodes is too
> expensive; you have to add new nodes and to reorganize - to physically
> repartition - the existing database. Having no need for reorganization is
> always better than the most sophisticated reorganization facility.
> - Finally, shared nothing systems, due to their use of a rigid restricted
> access scheme, fail to fully exploit the potential for high fault-tolerance
> available in clustered systems.
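Oracle's third point is about physical repartitioning when nodes are
added. A minimal sketch of why that can be costly, assuming the simplest
possible placement rule, node = hash(key) mod N, with integer keys used
as their own hash (a hypothetical scheme, not necessarily what MySQL
Cluster or any other product does): when N changes, most rows end up
owned by a different node and must be moved.

```python
def fraction_moved(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    """Share of rows that change owner when placement goes from
    key % old_nodes to key % new_nodes (integer keys as their own hash)."""
    moved = sum(1 for k in range(num_keys) if k % old_nodes != k % new_nodes)
    return moved / num_keys


if __name__ == "__main__":
    for old, new in ((2, 3), (4, 5), (8, 9)):
        share = fraction_moved(100_000, old, new)
        print(f"{old} -> {new} nodes: {share:.0%} of rows move")
```

Under this naive rule, going from 4 to 5 nodes relocates 80% of the rows,
which is the kind of reorganization the Oracle text is warning about; how
much of this applies to MySQL Cluster depends on how it actually places
data.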