Hi All,
After emailing back and forth with Mikael Ronström last week, we were
able to determine approx. how much total RAM will have to be available
to the cluster for it to hold our primary table. Seems like it is simply
too large for this to work for us - it would require machines that are
not going to be cost-effective, and we are beginning to look for an
alternate solution for this table, as well as alternate ways to use the
Cluster. Basically, for this table, the cluster would use 631 bytes per
record, which means we would need ~50GB of total ram simply to hold our
current table, and double that to plan for expansion. -- FYI, the main
reason that the rows are so large, is this table must contain 180
characters (varchar - 4x20 + 100) in UTF8, which takes 3x the space of
Latin1, and an index on the varchar(20) field. All in all, if we were
able to use Latin1 instead, this would reduce the row size to 303 bytes. --
That means, at a minimum (allowing for us to double our user base before
we need to expand again) we would need 8 DB nodes @ 26GB ram each or 16
@ 14GB ram each (~ 100GB ram for the cluster). I am still vying for us
to use the cluster for this table, but it is not up to me to decide.
Instead, management is thinking of splitting the table across a number
of conventional mysql servers with a hashing algorithm to ensure that
all queries regarding any given key always get directed to the same
server within the group. Hopefully we would then use the cluster to
house that central 'hashing' server, since it will be a much smaller
table (smaller row size, that is) that will still receive a ton of queries.
Since I may be tabling my research on the cluster for a little while, I
thought the least I could do to contribute to the community is to make
available some of the utilities that I have written for managing the
cluster. Essentially, it's a collection of scripts that: compile the
latest bk source and automate the distribution of the binaries to all
the servers w/in the cluster; automate the task of cycling (restarting)
all the NDB nodes (when necessary, ie after new binaries are installed)
one at a time, so that the stability of the cluster is not compromised;
automate the task of running scripts or queries on all, or a subset of,
the DB or API servers (for example, starting concurrent mysqlimports
across all API nodes by issuing one command on a central server); a
utility using syslog to watch all the log output available from the MGM
node (which as far as I know is not available through the mgm console)
while being able to send commands to both the MGM console and the system
shell, oh, and add pretty colors too ;)
If anyone would be interested in these tools, please email me.
Devananda
Neopets, Inc