From: Robby Dermody Date: September 27 2008 3:30am Subject: RE: SQL node questions List-Archive: http://lists.mysql.com/cluster/6255 Message-Id: <000e01c92051$60a389e0$21ea9da0$@com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Thanks Massimo, I appreciate the response. We have an amount budgeted to do a cluster jumpstart with the appropriate time comes. I also forgot to mention the management nodes. For those, we're planning on having two lower-end servers per cluster for redundancy. We could run LVS/heartbeat on them if we needed to. Is this usually a better idea than running them on the first two SQL node servers instead? The application we're planning to use cluster with is a large content/communications platform. Cluster 5.0 was out of the question, and 5.1 was close but no cigar.... really it has become much more attractive with the features in 6.2 and 6.3. We are well aware of the inherent performance issues around incorrect cluster use (e.g. cluster is not a cure-all, no FK support, indexes in memory, less-than-stellar JOIN performance, must be tuned properly, etc) and our queries are being designed accordingly: smaller, simpler, very little to any JOINS or subselects. We have also looked at other approaches such as sharding (which as I understand it cluster does internally in one form or another anyhow). Beyond just running with Mysql cluster (which we hope to have as distribution-aware with our schema as well), we are exercising functional partitioning...splitting our schema up into 8+ separate database structures which we can run across separate cluster hardware as necessary. At first most of this will be on the same physical cluster setup, but as things grow we can migrate DBs off onto their own separate clusters. We have an approach in place where all DB access will be done through servers running mysql proxy, which will maintain persistent backends to the multiple clusters we have and route queries accordingly (so that the application layer does not need to know about DB partitioning specifics, and we can more easily adjust things on the fly), etc etc... One additional thing I had questions on however is Master-master geo-replication. As I understand it now, it's in 6.3 and pretty solid, but only the telcos are really using it currently. We were looking into it for two things: 1. As a form of off-site DR: The obvious one...have two clusters replicating to each other. Writes could go to one or both. As they both have master status, failover could be much quicker. I have some concerns about split brain scenarios here... DR could obviously be handled with a slave cluster (which we could manually switch to be master in case of extended site failure).... or even as a traditional mysql setup, just for off-site archival purposes. 2. To provide extra capacity: However, say we 1 4 node cluster at 60% load, and we put a 2nd cluster in and do a master-master setup. Now are they both at 60% load (since they're replicating to each-other)? Does master-master normally buy us any extra performance here? Also interested in the bandwidth and latency observations with master-master.... e.g. for larger clusters are we normally looking at 20mbps, a DS3, more? If we have the cluster sites sitting across the US (40-60ms or so), are we still fine to have reads/writes going against each cluster? *Especially*, how good are the automatic conflict resolution features? Unlike some of our telco friends, we're not just writing CDR records here.... Any light you could shed on any of this would be excellent. Robby -----Original Message----- From: Massimo [mailto:massimo@stripped] Sent: Thursday, September 25, 2008 4:27 PM To: Robby Dermody Cc: cluster@stripped Subject: Re: SQL node questions Hi Robby, interesting setup, I'd like to understand better what kind of applications you are planning to run on cluster 6.3. Regarding your questions: 1. SCI will give benefits also in the communication between sql and data nodes, there is a lot of data transmitted here. Usually the sql nodes are more cpu bound than i/o bound so your 2xquad cores are really enough. 2, Licensing: the license cost is per cpu in both sql and data nodes. There is no license for the management. In your set up you haven't mentioned the management. Where do you want to run it? 3. This ratio was before the introduction of the multiconnect between mysqlds and ndbds. Now you can set in the my.cnf of the sql nodes a variable called ndb-cluster-connection-pool that enables a multi-connection. Each connection is seen as 1 sql nodes. So if you have 2 sql nodes and ndb-cluster-connection-pool is set to 3, then you need to have 6 sql node free in your config.ini. Regarding the data nodes, in the 6.3 you can lock the 2 threads (execution and maintenance thread) to different cores, so in your setup 3 ndbd daemons for each physical machine are enough (leave a couple of cores to the OS and other stuff). If you are planning to use this set up for a critical application, I recommend also to use some days of our consultants to setup and tune everything. Good luck Massimo Robby Dermody ha scritto: > Hi guys, > > > > Looking at provisioning a CGE 6.3 cluster in a few months here. In-line with > best practices, I will have data nodes and SQL nodes running on separate > physical servers. The data node machines will be 2x Quad core Xeons (12MB > cache per chip), 64GB RAM, 8X 147GB 15K SAS, Dolphin SCI interconnects (fast > things). I plan on using a 2D torus topology with the SCI. > > > > With the SQL node machines, I have a few questions: > > > > -Hardware requirements: I would think only CPU and network B/W is important > here. Could I get away with 2x Quad core Xeons, 2 SATA drives (RAID 1 of > course), 4GB RAM? I assume I'd need SCI interconnects as well, or are these > only important between the data nodes? > > > > -Licensing: For cost estimates, I'm assuming per CPU commercial licensing at > 5K. Do I pay this price only on the CPUs of the data node servers, or must I > pay it on the CPUs of the SQL node servers as well? > > > > -Ratio: I was planning on running 1 ndbd instance per data node server core > (so in this case, 8 data nodes per physical data node server). > > If I'm not mistaken, best practices I've seen by Mikael state to use 2x the > number of mysqld processes per ndbd processes. Should this mean I should > also run a mysqld on each core? Can I get a benefit from running 2 mysqlds > on each core, or should I have twice as many physical SQL node servers per > physical data node servers (e.g. so I'd have 2 SQL servers, 16 mysqlds each, > per 1 data server with 8 ndbds each). > > > > Thanks, > > > > Robby > > > No virus found in this incoming message. Checked by AVG - http://www.avg.com Version: 8.0.169 / Virus Database: 270.7.0/1683 - Release Date: 9/25/2008 7:05 AM