List:Cluster« Previous MessageNext Message »
From:Robby Dermody Date:September 27 2008 3:30am
Subject:RE: SQL node questions
View as plain text  
Thanks Massimo, I appreciate the response. We have an amount budgeted to do
a cluster jumpstart with the appropriate time comes. I also forgot to
mention the management nodes. For those, we're planning on having two
lower-end servers per cluster for redundancy. We could run LVS/heartbeat on
them if we needed to. Is this usually a better idea than running them on the
first two SQL node servers instead?

The application we're planning to use cluster with is a large
content/communications platform. Cluster 5.0 was out of the question, and
5.1 was close but no cigar.... really it has become much more attractive
with the features in 6.2 and 6.3. We are well aware of the inherent
performance issues around incorrect cluster use (e.g. cluster is not a
cure-all, no FK support, indexes in memory, less-than-stellar JOIN
performance, must be tuned properly, etc) and our queries are being designed
accordingly: smaller, simpler, very little to any JOINS or subselects. We
have also looked at other approaches such as sharding (which as I understand
it cluster does internally in one form or another anyhow).

Beyond just running with Mysql cluster (which we hope to have as
distribution-aware with our schema as well), we are exercising functional
partitioning...splitting our schema up into 8+ separate database structures
which we can run across separate cluster hardware as necessary. At first
most of this will be on the same physical cluster setup, but as things grow
we can migrate DBs off onto their own separate clusters. We have an approach
in place where all DB access will be done through servers running mysql
proxy, which will maintain persistent backends to the multiple clusters we
have and route queries accordingly (so that the application layer does not
need to know about DB partitioning specifics, and we can more easily adjust
things on the fly), etc etc...
 
One additional thing I had questions on however is Master-master
geo-replication. As I understand it now, it's in 6.3 and pretty solid, but
only the telcos are really using it currently. We were looking into it for
two things:

1. As a form of off-site DR: The obvious one...have two clusters replicating
to each other. Writes could go to one or both. As they both have master
status, failover could be much quicker. I have some concerns about split
brain scenarios here... DR could obviously be handled with a slave cluster
(which we could manually switch to be master in case of extended site
failure).... or even as a traditional mysql setup, just for off-site
archival purposes.

2. To provide extra capacity: However, say we 1 4 node cluster at 60% load,
and we put a 2nd cluster in and do a master-master setup. Now are they both
at 60% load (since they're replicating to each-other)? Does master-master
normally buy us any extra performance here?

Also interested in the bandwidth and latency observations with
master-master.... e.g. for larger clusters are we normally looking at
20mbps, a DS3, more? If we have the cluster sites sitting across the US
(40-60ms or so), are we still fine to have reads/writes going against each
cluster? *Especially*, how good are the automatic conflict resolution
features? Unlike some of our telco friends, we're not just writing CDR
records here....

Any light you could shed on any of this would be excellent.

Robby


-----Original Message-----
From: Massimo [mailto:massimo@stripped] 
Sent: Thursday, September 25, 2008 4:27 PM
To: Robby Dermody
Cc: cluster@stripped
Subject: Re: SQL node questions

Hi Robby,

interesting setup, I'd like to understand better what kind of 
applications you are planning to run on cluster 6.3.

Regarding your questions:

1. SCI will give benefits also in the communication between sql and data 
nodes, there is a lot of data transmitted here. Usually the sql nodes 
are more cpu bound than i/o bound so your 2xquad cores are really enough.

2, Licensing: the license cost is per cpu in both sql and data nodes. 
There is no license for the management. In your set up you haven't 
mentioned the management. Where do you want to run it?

3. This ratio was before the introduction of the multiconnect between 
mysqlds and ndbds. Now you can set in the my.cnf of the sql nodes a 
variable called ndb-cluster-connection-pool that enables a 
multi-connection. Each connection is seen as 1 sql nodes. So if you have 
2 sql nodes and ndb-cluster-connection-pool is set to 3, then you need 
to have 6 sql node free in your config.ini. Regarding the data nodes, in 
the 6.3 you can lock the 2 threads (execution and maintenance thread) to 
different cores, so in your setup 3 ndbd daemons for each physical 
machine are enough (leave a couple of cores to the OS and other stuff).


If you are planning to use this set up for a critical application, I 
recommend also to use some days of our consultants to setup and tune 
everything.

Good luck
Massimo

Robby Dermody ha scritto:
> Hi guys,
>
>  
>
> Looking at provisioning a CGE 6.3 cluster in a few months here. In-line
with
> best practices, I will have data nodes and SQL nodes running on separate
> physical servers. The data node machines will be 2x Quad core Xeons (12MB
> cache per chip), 64GB RAM, 8X 147GB 15K SAS, Dolphin SCI interconnects
(fast
> things). I plan on using a 2D torus topology with the SCI.
>
>  
>
> With the SQL node machines, I have a few questions:
>
>  
>
> -Hardware requirements: I would think only CPU and network B/W is
important
> here. Could I get away with 2x Quad core Xeons, 2 SATA drives (RAID 1 of
> course), 4GB RAM? I assume I'd need SCI interconnects as well, or are
these
> only important between the data nodes?
>
>  
>
> -Licensing: For cost estimates, I'm assuming per CPU commercial licensing
at
> 5K. Do I pay this price only on the CPUs of the data node servers, or must
I
> pay it on the CPUs of the SQL node servers as well?
>
>  
>
> -Ratio: I was planning on running 1 ndbd instance per data node server
core
> (so in this case, 8 data nodes per physical data node server).
>
> If I'm not mistaken, best practices I've seen by Mikael state to use 2x
the
> number of mysqld processes per ndbd processes. Should this mean I should
> also run a mysqld on each core? Can I get a benefit from running 2 mysqlds
> on each core, or should I have twice as many physical SQL node servers per
> physical data node servers (e.g. so I'd have 2 SQL servers, 16 mysqlds
each,
> per 1 data server with 8 ndbds each).
>
>  
>
> Thanks,
>
>  
>
> Robby
>
>
>   

No virus found in this incoming message.
Checked by AVG - http://www.avg.com 
Version: 8.0.169 / Virus Database: 270.7.0/1683 - Release Date: 9/25/2008
7:05 AM

Thread
SQL node questionsRobby Dermody25 Sep
  • Re: SQL node questionsMassimo25 Sep
    • RE: SQL node questionsRobby Dermody27 Sep
      • RE: SQL node questionsRobby Dermody27 Sep