> First of all, I heard that Mysql does not perform very well when
> tablesize goes above 1 Gb. Is this a myth?
I don't have much experience with individual tables being that large,
though I have used MySQL databases with >100G total data with no
problems at all. In my experience, it's not the table size that degrades
performance but the index size: if the index is too large or clumsy,
MySQL cannot search quickly enough. If the index is well-structured, the
data is found quickly and overall performance is great regardless of the
quantity of data.
> Image table is not a big
> deal, since I can partition/distribute it to ~ 100-200 tables, i.e. by
> table_(id % 100).
This is a good way to help keep your indexes small and thus keep
search/seek time to a minimum :)
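A minimal sketch of that scheme, in case it helps (table and column
names here are placeholders, not anything from your setup): all 100
tables share one schema, and the application picks the table by
id modulo 100.

```sql
-- Hypothetical schema: one of ~100 identical image tables.
-- The application computes the table name as image_(id % 100).
CREATE TABLE image_00 (
    id       INT UNSIGNED NOT NULL,
    img_data MEDIUMBLOB   NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- An image with id = 4217 lands in image_17, since 4217 % 100 = 17:
-- INSERT INTO image_17 (id, img_data) VALUES (4217, <blob>);
```

Each table's primary-key index then covers only ~1% of the images,
which is what keeps the seeks fast.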
> However, text information needs to stay in a single
> table (since I need to do queries on it for information) and there
> will be multiple indexes over this information.
A suggestion to help keep your indexes on this table small: if you have
large text fields that you need to index, you may want to use partial
indexes. (For example, if you have a column "filename varchar(200)",
create an index on filename(50), or whatever is appropriate.) This will
cut the size of the index file a lot, and even though it may result in
some duplicate entries in the index, it will probably increase overall
performance.
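To illustrate the partial-index idea (the table and column names are
hypothetical):

```sql
CREATE TABLE files (
    id       INT UNSIGNED NOT NULL PRIMARY KEY,
    filename VARCHAR(200) NOT NULL
) ENGINE=InnoDB;

-- Index only the first 50 characters of filename; the index entries
-- are a quarter of the full-column size, at the cost of MySQL having
-- to check the row itself when two filenames share a 50-char prefix.
CREATE INDEX idx_filename ON files (filename(50));
```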
> And, as you can imagine, I am not sure if mysql can handle something
> like this, and was wondering if you can provide some feedback.
> So my questions are:
> 1. The main question is, do you guys have any experience with this
> much binary and regular data? Do you think Mysql can handle this much
> data in a reliable manner (without corrupting data and/or
> degrading/terrible performance) ?
Based on my experience with >100 GB InnoDB databases, with the right
indexes MySQL can easily handle 3,000 reads/sec on text/numerical data.
(This was on a dual Xeon 3 GHz machine with 4 GB RAM and SCSI RAID-5
disks.) I've never had any problem with data corruption, but I use
primarily the InnoDB engine, which is not prone to corruption the way
MyISAM is.
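Since MyISAM is the default engine, you have to ask for InnoDB
explicitly when creating tables (table name below is just an example):

```sql
CREATE TABLE documents (
    id   INT UNSIGNED NOT NULL PRIMARY KEY,
    body TEXT NOT NULL
) ENGINE=InnoDB;
```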
I don't have experience storing large amounts of binary data, so I can't
say anything about MySQL's performance in that area specifically.
> 2. Can I implement this using regular SCSI disks with regular mysql?
> Or do I have need advanced solutions such as clustered, replicated,
No need for clustered or distributed databases, from what you've
described. MySQL Cluster would be useful if you need 100% availability
even in the event of hardware failures. Replication, such as a single
master -> multiple slaves, is useful if you have massive reads and
minimal writes, and _may_ be something you will need.
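A rough sketch of the single-master setup, should you go that route
(server IDs and log names here are arbitrary; the slave also needs a
CHANGE MASTER TO statement pointing it at the master):

```
# master's my.cnf: enable the binary log that slaves replay
[mysqld]
server-id = 1
log-bin   = mysql-bin

# each slave's my.cnf: just a unique server-id
[mysqld]
server-id = 2
```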
I would recommend using the command SHOW PROCESSLIST, or a tool like
MyTop, to see which state the client connections spend the most time in:
searching the index, or sending the data over the network? If it's the
latter, then you would benefit from distributing the read load across
multiple slave servers.
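For example:

```sql
-- One row per connection; the State column tells you what each
-- thread is doing (e.g. "Sending data", "Copying to tmp table").
-- FULL keeps the Info column (the running query) from being truncated.
SHOW FULL PROCESSLIST;
```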