Devananda wrote:
> Instead, management is thinking of splitting the table across a number
> of conventional mysql servers with a hashing algorithm to ensure that
> all queries regarding any given key always get directed to the same
> server within the group. Hopefully we would then use the cluster to
> house that central 'hashing' server, since it will be a much smaller
> table (smaller row size, that is) that will still receive a ton of queries.
Hi Devananda and others,
I was wondering how much headway you've made on your "segregated data"
idea. We just completed a Cluster Jumpstart with MySQL and came to much
the same conclusion: that Cluster isn't for us yet. That might change
if MySQL 5 provides disk-based storage and removes the very low limits
on tables, columns, etc., but I don't think we can wait that long.
So instead I'm pursuing a "segregated masters" approach to give us
horizontal scalability while retaining write access to all servers.
Our SQL load has an atypically high write-to-read ratio. We can
segregate our data across logical databases, as we have a very large
number of them.
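As a rough illustration of segregating by logical database, here is a minimal sketch that maps a database name to one master using a stable hash. The host names and the function name master_for are made up for the example, and the modulo scheme is only one possible choice, not something we have settled on.

```python
import hashlib

# Hypothetical host names: one entry per master (or per pair's virtual IP).
MASTERS = [
    "db-pair-0.example.com",
    "db-pair-1.example.com",
    "db-pair-2.example.com",
]


def master_for(database_name, masters=MASTERS):
    """Return the master that owns the given logical database.

    hashlib is used instead of the built-in hash() because hash() on
    strings is randomized per process, so two application servers could
    otherwise disagree about where a database lives.
    """
    digest = hashlib.md5(database_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(masters)
    return masters[index]
```

Every application server computes the same answer for the same database name, so all queries for that database land on one master. The drawback of plain modulo hashing is that changing the number of masters remaps most databases; a small lookup table of database name to master, like the central "hashing" server Devananda describes, avoids that at the cost of an extra lookup.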
One of the two approaches ("shared-nothing") I'm considering involves
using Linux-HA (or similar) within server pairs. Each pair would be
doing bi-directional replication and, via HA, cover a virtual IP. If
one server fails, the other takes over. Applications would simply need
to have reconnect logic and retry queries. To put a new server into
place, I'm considering using an LVM snapshot to get a copy of the
working server in a failed pair. I'm worried about relying on MySQL's
asynchronous replication, however.
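The reconnect-and-retry logic on the application side could look roughly like the sketch below. It is driver-agnostic on purpose: connect and query are hypothetical callables supplied by the application (connect opens a connection to the pair's virtual IP, query runs a statement on it), and the exception types to retry on are an assumption that would need to match whatever the real MySQL driver raises.

```python
import time


def run_with_retry(connect, query, attempts=3, delay_seconds=1.0,
                   retry_on=(ConnectionError, OSError)):
    """Run query(conn), reconnecting through connect() after a failure.

    connect  -- hypothetical callable returning a fresh connection
    query    -- hypothetical callable taking that connection
    retry_on -- exception types treated as "connection lost" (assumed)
    """
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()      # re-resolve the virtual IP on every try
            return query(conn)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # give HA time to move the IP
    raise last_error
```

One caveat that ties into the asynchronous replication worry: a write that committed on the failed server but had not yet replicated may be missing on the survivor, and a write whose acknowledgement was lost may be applied twice on retry. Blind retries are only safe for statements that are idempotent.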
Another approach I found online is to use a "shared-disk" for all MySQL
servers with something like DRBD. This would give me "synchronous
replication" for MySQL. I'm unsure whether I'd be able to replicate
such a cluster off-site, however. Also, to achieve efficient memory
usage, the application tier would still need to direct queries for
certain databases to certain servers. Otherwise, a random query pattern
could leave each server trying to cache the entire data set.
Anyway, I'm just wondering what people think of these approaches.
Anything else I should be considering? Serious drawbacks to these two
approaches?
Thanks,
Guy