Thanks for your answers.
Just out of curiosity, may I ask the size of your largest table in the
cluster, and how much you reserved for DataMemory and IndexMemory?
And yeah, I've tried partitioning and it works fine. But then you are still
on a single machine, so memory and CPU remain the bottleneck.
But the thing is that I'm interested in a more rigorous measurement, so I'm
comparing two scenarios: storing the data centralized (both with and without
partitioning) and distributed (with 2, 3, 4, ... data nodes).
But I still cannot make the cluster work, that is, import the 4GB of data.
I read all the documentation, created the tablespaces, set the table to
store on disk, ran it many times, and it's still not working :(
To be honest, I do not know what else I could do to make it work.
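For reference, the disk-data setup I followed was roughly the following
(file names, sizes and the table definition here are illustrative, adapted
from the manual's examples):

CREATE LOGFILE GROUP lg1
    ADD UNDOFILE 'undo1.log'
    INITIAL_SIZE 128M
    ENGINE NDBCLUSTER;

CREATE TABLESPACE ts1
    ADD DATAFILE 'data1.dat'
    USE LOGFILE GROUP lg1
    INITIAL_SIZE 2G
    ENGINE NDBCLUSTER;
-- more data files can be added later with
-- ALTER TABLESPACE ts1 ADD DATAFILE ... to cover the full 4GB

CREATE TABLE flows (
    start_time INT UNSIGNED NOT NULL,
    end_time   INT UNSIGNED NOT NULL,
    flowid     INT UNSIGNED NOT NULL,
    PRIMARY KEY (start_time, flowid)
) TABLESPACE ts1 STORAGE DISK
  ENGINE NDBCLUSTER;

(If I understood the docs right, indexed columns, here the whole primary
key, are still kept in DataMemory even with STORAGE DISK; only the
non-indexed columns actually go to disk.)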
Thanks and regards,
On Thu, Jun 25, 2009 at 11:10 AM, David Ennis <david@stripped> wrote:
> Hi Giovane.
> What's described makes sense to me - and looks similar to simple trials I've
> done before with millions of lines inserted and various fields in disk-based
> storage - all starting from the simple tutorials.
> However, a few (non-cluster) items I thought I'd share.
> Sorry.. They don't answer your questions below, but for what you describe,
> if you aren't looking for fault tolerance, perhaps a non-NDB solution will
> serve you better.
> Regarding cluster speed (someone please correct me here, if needed):
> - Disk-based storage in cluster is not particularly fast
> - To use disk-based storage, those columns cannot be part of any index - so
> if those columns are an integral part of your queries, the lack of indexes
> on them will itself cause poor query performance
> - NDB is not very good at all with queries involving joins
> If you are purely looking for performance increase on reads only, depending
> on how you use the data, I'd suggest using partitioning/maybe
> subpartitioning your table.
> Just use your existing table create statement, add the partition-specific
> additions to it, and then select into the (new) partitioned table - I believe
> this would be faster than altering the existing table and adding partitions
> under it.
> - Perhaps a range partition on your 'end_time' field or something (see the
> sketch below)
> - Partitioning works with NDB, MyISAM and InnoDB, but is still fastest on
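> Something along these lines, as a sketch (the column list and partition
> boundaries are assumptions; note that MySQL requires the partitioning
> column to appear in every unique key, so with your primary key
> (start_time, flowid) you would range on start_time rather than end_time):
>
> -- placeholder types and boundaries; adapt to the real schema
> CREATE TABLE flows_part (
>     start_time INT UNSIGNED NOT NULL,
>     end_time   INT UNSIGNED NOT NULL,
>     flowid     INT UNSIGNED NOT NULL,
>     PRIMARY KEY (start_time, flowid)
> ) ENGINE=MyISAM
> PARTITION BY RANGE (start_time) (
>     PARTITION p0   VALUES LESS THAN (1104537600),  -- 2005-01-01
>     PARTITION p1   VALUES LESS THAN (1136073600),  -- 2006-01-01
>     PARTITION pmax VALUES LESS THAN MAXVALUE
> );
>
> -- then bulk-copy rather than ALTER the original table:
> INSERT INTO flows_part SELECT * FROM flows;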
> Giovane wrote:
>> Hi Andrew and Augusto,
>> Thanks for your replies.
>> Firstly I've included a primary key: PRIMARY KEY (`start_time`,`flowid`).
>> Then, I've played a bit with DataMemory and IndexMemory and tried to
>> insert the 4GB data.
>> My input file has 60M lines, and what I got for each case was:
>> Case 1: DataMemory = 400M,  IndexMemory = 1200M -> 12M lines inserted before the table-full error
>> Case 2: DataMemory = 800M,  IndexMemory = 800M  -> 25M lines inserted before the table-full error
>> Case 3: DataMemory = 1400M, IndexMemory = 300M  -> 46M lines inserted before the table-full error
>> The conclusion I draw from this is that the DataMemory parameter matters
>> more than IndexMemory when inserting data.
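>> (For reference, I set these in the [ndbd default] section of the cluster's
>> config.ini, e.g. for case 3:
>>
>> [ndbd default]
>> DataMemory=1400M
>> IndexMemory=300M
>>
>> and the data nodes have to be restarted before a change takes effect.)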
>> I'll explain what I'm trying to do, and I'd like to hear from you whether
>> you think it is feasible (the MySQL documentation says it is, but so far
>> I've only been failing).
>> I installed MySQL Server on a single machine, no cluster. Then I created
>> a table using the MyISAM engine and finally imported the 4GB input file.
>> No problem at all (and the machine has 3GB RAM).
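>> (The import itself was a plain bulk load along these lines; the file name,
>> table name and delimiters here are just placeholders:)
>>
>> LOAD DATA INFILE '/tmp/flows.csv'
>>     INTO TABLE flows
>>     FIELDS TERMINATED BY ','
>>     LINES TERMINATED BY '\n';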
>> Then I thought: how would MySQL Cluster improve my response time? Can I
>> store the 4GB of data on 2 machines (the same model as the one used in the
>> "centralized" case) and use the cluster to parallelize a single query,
>> improving performance? Again, the documentation says yes.
>> So I tried to set up all the tablespace machinery to make this work
>> (because my data > RAM), but it's not working.
>> Does anyone know if there is something wrong with this approach?
>> Thanks and regards,