> Dear all,
>
> I have some doubts regarding the configuration of Innodb files structure.
> As we know all myisam tables are stored in database directory and innodb
> use tablespaces ( ibdata1 ).
> In My application , databases are the mixture of mysql & innodb storage
> engines.
> In my database ,size of innodb tables may be between 100 & 200 Gb & may
> be some tables >300Gb.I have 4 Hds.
> /dev/sdb1 274G 225G 36G 87% /hdd2-1
> /dev/sdc5 274G 225G 36G 87% /hdd3-1
> /dev/sdd5 274G 219G 42G 85% /hdd4-1
> /dev/sde1 266G 184G 69G 73% /hdd5-1
> /hdd5-1/innodb_data5/ibdata100:8G;/hdd5-1/innodb_data5/ibdata101:8G:autoex
> tend
[...snip...]
> But I don't think this is the correct way to store data for better
> performance.
> Do i need to set partitioning for bigger tables or some standard ways to
> configure table spaces in mysql.
If you really want to partition it, I'd use table-level paritioning
(http://dev.mysql.com/doc/refman/5.5/en/partitioning-overview.html), with a
custom data-directory for different partitions. Keeps lookups on keys (assuming
there is 1 major key by which records are fetched, otherwise, this might even
degrade performance).
Pros:
- Per table config
- Easily portable
- Easy dropping & recreating partitions
- Can result in great performance gains due to partition pruning.
Cons:
- Could result in a degrade of performance when keys to partition by are
chosen incorrectly, or the table needs a lot of lookups on different keys
- Partitioning is done per table, so new tables need to have their config
explicitly set.
- Harder to control which percentages are on which disk (as it is data-
related)
However, the last part of that page should not be taken lightly:
> Other benefits usually associated with partitioning include those in the
> following list. These features are *not currently implemented in MySQL*
> Partitioning, but are high on our list of priorities.
> - Queries involving aggregate functions such as SUM() and COUNT() can easily
> be parallelized. A simple example of such a query might be SELECT
> salesperson_id, COUNT(orders) as order_total FROM sales GROUP BY
> salesperson_id;. By “parallelized,” we mean that the query can be run
> simultaneously on each partition, and the final result obtained merely by
> summing the results obtained for all partitions.
> - Achieving greater query throughput in virtue of spreading data seeks over
> multiple disks.
Not implemented yet....... I personally have those 2 high on my wishlist ;)
--
Rik Wasmus