List:Cluster« Previous MessageNext Message »
From:Devananda Date:August 25 2004 7:14pm
Subject:table size, and developed tools
View as plain text  
Hi All,

After emailing back and forth with Mikael Ronström last week, we were 
able to determine approx. how much total RAM will have to be available 
to the cluster for it to hold our primary table. Seems like it is simply 
too large for this to work for us - it would require machines that are 
not going to be cost-effective, and we are beginning to look for an 
alternate solution for this table, as well as alternate ways to use the 
Cluster. Basically, for this table, the cluster would use 631 bytes per 
record, which means we would need ~50GB of total ram simply to hold our 
current table, and double that to plan for expansion. -- FYI, the main 
reason that the rows are so large, is this table must contain 180 
characters (varchar - 4x20 + 100) in UTF8, which takes 3x the space of 
Latin1, and an index on the varchar(20) field. All in all, if we were 
able to use Latin1 instead, this would reduce the row size to 303 bytes. --

That means, at a minimum (allowing for us to double our user base before 
we need to expand again) we would need 8 DB nodes @ 26GB ram each or 16 
@ 14GB ram each (~ 100GB ram for the cluster). I am still vying for us 
to use the cluster for this table, but it is not up to me to decide. 
Instead, management is thinking of splitting the table across a number 
of conventional mysql servers with a hashing algorithm to ensure that 
all queries regarding any given key always get directed to the same 
server within the group. Hopefully we would then use the cluster to 
house that central 'hashing' server, since it will be a much smaller 
table (smaller row size, that is) that will still receive a ton of queries.

Since I may be tabling my research on the cluster for a little while, I 
thought the least I could do to contribute to the community is to make 
available some of the utilities that I have written for managing the 
cluster. Essentially, it's a collection of scripts that: compile the 
latest bk source and automate the distribution of the binaries to all 
the servers w/in the cluster; automate the task of cycling (restarting) 
all the NDB nodes (when necessary, ie after new binaries are installed) 
one at a time, so that the stability of the cluster is not compromised; 
automate the task of running scripts or queries on all, or a subset of, 
the DB or API servers (for example, starting concurrent mysqlimports 
across all API nodes by issuing one command on a central server); a 
utility using syslog to watch all the log output available from the MGM 
node (which as far as I know is not available through the mgm console) 
while being able to send commands to both the MGM console and the system 
shell, oh, and add pretty colors too ;)

If anyone would be interested in these tools, please email me.

Devananda
Neopets, Inc



Thread
table size, and developed toolsDevananda25 Aug
  • Re: table size, and developed toolsGuy Davis13 Sep