List: Cluster
From: Guy Davis
Date: September 13 2004 9:25pm
Subject: Re: table size, and developed tools
Devananda wrote:
> Instead, management is thinking of splitting the table across a number 
> of conventional mysql servers with a hashing algorithm to ensure that 
> all queries regarding any given key always get directed to the same 
> server within the group. Hopefully we would then use the cluster to 
> house that central 'hashing' server, since it will be a much smaller 
> table (smaller row size, that is) that will still receive a ton of queries.
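A minimal sketch of the key-to-server hashing described in the quoted plan, assuming a fixed list of conventional MySQL servers. The server names and the function name are hypothetical. It uses an MD5 digest rather than Python's built-in hash(), because hash() of a string is randomized per process and would send the same key to different servers from different application processes.

```python
import hashlib

# Hypothetical group of conventional MySQL servers.
SERVERS = ["db0.example", "db1.example", "db2.example", "db3.example"]

def server_for_key(key: str, servers=SERVERS) -> str:
    """Map a key to one server, always the same server for the same key."""
    # Stable digest: identical in every process and on every machine.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce it modulo the server count.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

With plain modulo hashing, adding a server changes len(servers) and remaps most keys to different servers. Storing key-to-server assignments in a central table, as the quoted plan suggests, avoids that at the cost of one extra lookup per query.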

Hi Devananda and others,

I was wondering how much headway you've made on your "segregated data" 
idea.  We just completed a Cluster Jumpstart with MySQL and came to much 
the same conclusion: Cluster isn't for us yet.  Maybe it will be once
MySQL 5 provides disk-based storage and lifts the very low limits on
tables, columns, etc., but I don't think we can wait that long.

So instead I'm pursuing a "segregated masters" approach to give us
horizontal scalability while retaining write access to all servers.
Our SQL load has an atypically high write-to-read ratio.  We can
segregate our data across logical databases, as we have a very large
number of them.
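A minimal sketch of "segregated masters" routing, assuming the application keeps a directory that pins each logical database to exactly one master pair, so every write for that database goes to the same masters. MasterPair, Directory, the example addresses, and the least-loaded placement rule are hypothetical illustrations, not part of any MySQL tool.

```python
from dataclasses import dataclass, field

@dataclass
class MasterPair:
    """One HA pair of masters, reached through a single virtual IP."""
    name: str
    virtual_ip: str

@dataclass
class Directory:
    """Hypothetical directory: each logical database lives on one pair."""
    pairs: list
    assignments: dict = field(default_factory=dict)  # database -> pair name

    def pair_for(self, database: str) -> MasterPair:
        if database not in self.assignments:
            # Place a new database on the pair holding the fewest databases
            # (ties go to the earliest pair in the list).
            counts = {p.name: 0 for p in self.pairs}
            for name in self.assignments.values():
                counts[name] += 1
            target = min(self.pairs, key=lambda p: counts[p.name])
            self.assignments[database] = target.name
        # Existing databases always resolve to the same pair.
        name = self.assignments[database]
        return next(p for p in self.pairs if p.name == name)
```

In practice the assignments would need to live somewhere shared (for example a small table every application server reads) rather than in one process's memory.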

One of the two approaches ("shared-nothing") I'm considering involves 
using Linux-HA (or similar) within server pairs.  Each pair would be 
running bi-directional replication and, via HA, holding a shared virtual
IP.  If one server fails, the other takes over.  Applications would
simply need reconnect logic to retry their queries.  To bring a
replacement server into a failed pair, I'm considering using an LVM
snapshot of the surviving server as its starting copy.  However, I'm
worried about relying on MySQL's asynchronous replication.
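The reconnect-and-retry logic the applications would need might look like the sketch below. connect and query are hypothetical callables standing in for whatever MySQL client library the application uses; only ConnectionError is retried, so the driver's own connection-lost errors would have to be translated into it.

```python
import time

def run_with_retry(connect, query, attempts=3, delay=0.5):
    """Open a connection and run query(conn), reconnecting on failure.

    connect: hypothetical zero-argument callable that opens a new
             connection to the pair's virtual IP.
    query:   callable that takes a connection and runs the statement(s).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()
            return query(conn)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential back-off gives HA time to move the virtual IP
                # to the surviving server before the next attempt.
                time.sleep(delay * (2 ** attempt))
    raise last_error
```

Retrying is only safe for statements that can be repeated: a write that committed just before the failover may be applied twice, and with asynchronous replication the surviving server may be missing the last few writes from the failed one.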

Another approach I found online is to use a "shared-disk" for all MySQL
servers with something like DRBD.  This would give me "synchronous
replication" for MySQL.  However, I'm unsure whether I'd be able to
replicate such a cluster off-site.  Also, to use memory efficiently, the
application tier would still need to direct queries for certain
databases to certain servers.  Otherwise, with a random query pattern,
each server would end up trying to cache the entire data set.
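The memory argument can be checked with a small simulation. It assumes uniformly random queries over the databases and treats the number of distinct databases a server sees as a rough proxy for the data that server would try to cache; the function name and parameters are hypothetical, and d % num_servers stands in for any fixed database-to-server assignment.

```python
import random

def databases_touched(num_servers, num_databases, num_queries, pinned, seed=0):
    """Return, per server, how many distinct databases it received queries for."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    seen = [set() for _ in range(num_servers)]
    for _ in range(num_queries):
        db = rng.randrange(num_databases)
        if pinned:
            server = db % num_servers            # fixed assignment per database
        else:
            server = rng.randrange(num_servers)  # any server may get any query
        seen[server].add(db)
    return [len(s) for s in seen]
```

With 4 servers and 1,000 databases, pinned routing keeps each server to at most 250 databases, while random routing over 100,000 queries lets each server see close to all 1,000.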

Anyway, I'm just wondering what people think of these approaches.
Anything else I should be considering?  Any serious drawbacks to these
two approaches?

Thanks,
Guy
Thread:
  • table size, and developed tools (Devananda, 25 Aug)
  • Re: table size, and developed tools (Guy Davis, 13 Sep)