What's described makes sense to me - it looks similar to simple trials
I've done before, with millions of rows inserted and various fields in
disk-based storage - all starting from the simple tutorials.
However, there are a few (non-cluster) items I thought I'd share.
Sorry - they don't answer your questions below, but for what you
describe, if you aren't looking for fault tolerance, perhaps a non-NDB
solution will work.
Regarding cluster speed (someone please correct me here, if needed):
- Disk-based storage in cluster is not particularly fast.
- To use disk-based storage, those fields cannot be part of any index -
so if those fields are an integral part of any query, the lack of
indexes will itself cause poor query performance.
- NDB is not very good at all with queries involving joins.
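To illustrate the index restriction above, here is a rough sketch of how disk-based NDB tables are set up (the names lg1, ts1, flows_disk and the non-key column are just examples, not from your schema - adjust sizes and paths to your setup):

```sql
-- Sketch only: illustrative names/sizes, assuming MySQL Cluster disk data support.
CREATE LOGFILE GROUP lg1
    ADD UNDOFILE 'undo1.log'
    INITIAL_SIZE 128M
    ENGINE NDBCLUSTER;

CREATE TABLESPACE ts1
    ADD DATAFILE 'data1.dat'
    USE LOGFILE GROUP lg1
    INITIAL_SIZE 512M
    ENGINE NDBCLUSTER;

CREATE TABLE flows_disk (
    start_time DATETIME NOT NULL,
    flowid     INT UNSIGNED NOT NULL,
    bytes      BIGINT UNSIGNED,            -- non-indexed, so it can live on disk
    PRIMARY KEY (start_time, flowid)       -- indexed columns stay in DataMemory/IndexMemory
) TABLESPACE ts1 STORAGE DISK
  ENGINE NDBCLUSTER;
```

The key point: only the non-indexed columns (here `bytes`) actually go to disk; anything in an index still consumes memory, which is why queries filtering on disk-stored fields can't use an index.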
If you are purely looking for a performance increase on reads, then
depending on how you use the data, I'd suggest partitioning (and maybe
subpartitioning) your table.
Just take your existing CREATE TABLE statement, add the
partition-specific clauses to it, and then select the data into the
(new) partitioned table - I believe this would be faster than altering
the existing table and adding partitions under it.
- Perhaps a range partition on your 'end_time' field, or something similar.
- Partitioning works with NDB, MyISAM and InnoDB, but is still fastest
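A minimal sketch of what I mean, assuming a table named `flows` with the fields mentioned in the thread (names and date ranges are illustrative - note that MySQL requires the partitioning column to appear in every unique key, so `end_time` has to be added to the primary key):

```sql
-- Sketch: create the new partitioned table with the same columns,
-- then load it from the existing table instead of ALTERing in place.
CREATE TABLE flows_part (
    start_time DATETIME NOT NULL,
    end_time   DATETIME NOT NULL,
    flowid     INT UNSIGNED NOT NULL,
    PRIMARY KEY (start_time, flowid, end_time)
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(end_time)) (
    PARTITION p_jan VALUES LESS THAN (TO_DAYS('2011-02-01')),
    PARTITION p_feb VALUES LESS THAN (TO_DAYS('2011-03-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

INSERT INTO flows_part SELECT start_time, end_time, flowid FROM flows;
```

With this layout, a query with a range condition on `end_time` can prune to a single partition instead of scanning the whole table.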
> Hi Andrew and Augusto,
> Thanks for your replies.
> Firstly I've included a primary key: PRIMARY KEY (`start_time`,`flowid`).
> Then, I've played a bit with DataMemory and IndexMemory and tried to
> insert the 4GB of data.
> My input file has 60M lines, and what I got for each case was:
> Case 1:
> DataMemory: 400M
> IndexMemory: 1200M
> # of lines inserted before full table error: 12M
> Case 2:
> DataMemory: 800M
> IndexMemory: 800M
> # of lines inserted before full table error: 25M
> Case 3:
> DataMemory: 1400M
> IndexMemory: 300M
> # of lines inserted before full table error: 46M
> The conclusion from this is that the DataMemory parameter matters more
> than IndexMemory while inserting data.
> I'll explain what I'm trying to do, and I would like to hear from you
> whether you think it is feasible (I've checked the MySQL docs and they
> say it is, but so far I've only been failing).
> I installed MySQL server on a single machine, not a cluster. Then I
> created a table using the MyISAM engine, and finally I imported the
> 4GB input file. No problem at all (and the machine has 3GB RAM).
> Then I thought: how would MySQL Cluster improve my response time? Can I
> store the 4GB of data on 2 machines (the same model as the one used in
> the "centralized" case) and use the cluster to parallelize a single
> query, improving performance? Again, the documentation says yes.
> So I tried to do all the tablespace stuff to get this working (because
> my data > RAM), but it's not working.
> Does anyone know if there is something wrong with this approach?
> Thanks and regards,