Thanks for the responses, and I do concur. I was taking a stab in the
dark, so to speak.
We are working with our hosting providers currently and will be
introducing a number of small iSCSI SANs to split the storage
structure over many more disks... This is something that needs to be
addressed from a systems perspective rather than an application one.
SSDs (or FusionIO and the like) are unfortunately still way too
expensive for the capacity that we require (a good couple of TBs), so
mechanical disks it will need to be. However, with the use of SANs we
hope to go from 4 to over 64 spindles whilst still being able to
share the storage and have redundancy.
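Just to make that concrete, below is a rough sketch (Python) of how we
picture mapping files onto the SAN-backed mounts, assuming the 4096
MD5-prefix directories from my original mail below - the mount points,
prefix length and helper name are all made up:

    import hashlib

    # Hypothetical mount points, one per small iSCSI SAN / LUN.
    MOUNTS = ["/mnt/san%02d" % i for i in range(16)]

    def shard_path(filename):
        # MD5 of the file name, as in our current on-disk layout.
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
        # First 3 hex chars -> 16^3 = 4096 directories (our assumption).
        prefix = digest[:3]
        # Spread those 4096 directories evenly over the available mounts.
        mount = MOUNTS[int(prefix, 16) % len(MOUNTS)]
        return "%s/%s/%s" % (mount, prefix, digest)

Growing MOUNTS with a plain modulo like this would remap most prefixes,
so a fixed prefix-to-mount table is probably the better long-term
choice.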
Many thanks for the input and feedback...
On Fri, Jul 26, 2013 at 9:23 AM, Johan De Meersman <vegivamp@stripped> wrote:
> Hey Chris,
> I'm afraid that this is not what databases are for, and the first thing you'll likely
> run into is the number of concurrent connections.
> This is typically something you should really tackle from a systems perspective. Seek
> times are dramatically improved on SSD or similar storage - think FusionIO cards, but
> there's also a couple of vendors (Violin comes to mind) who provide full-blown SSD SANs.
> If you prefer staying with spinning disks, you could still improve the seeks by
> short-stroking (confining the data to a narrow band of cylinders), and potentially by
> using variable sector formatting. Again, there are SANs that do this for you.
> Another minor trick is to turn off access timestamp updates when you mount the
> filesystem (noatime).
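> For example, a single line in /etc/fstab does it (device, mount point
> and filesystem below are placeholders only):
>
>     /dev/sdb1  /srv/files  xfs  defaults,noatime  0  2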
> Also benchmark different filesystems; there are major differences between them. I've
> heard XFS recommended, but I've never needed to benchmark for seek times myself.
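> Something quick and dirty along these lines (Python; the path is a
> placeholder) is usually enough to compare filesystems for random-read
> behaviour:
>
>     import os, random, time
>
>     def seek_bench(path, reads=1000, size=4096):
>         # Random 4K reads over one big file; a rough proxy for seek cost.
>         # Drop the page cache first, or you're measuring RAM, not disk.
>         fsize = os.path.getsize(path)
>         with open(path, "rb") as f:
>             start = time.time()
>             for _ in range(reads):
>                 f.seek(random.randrange(0, fsize - size))
>                 f.read(size)
>         return (time.time() - start) / reads
>
>     print(seek_bench("/srv/files/bigtestfile"))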
> We're using IBM's commercial GPFS here, which is good with enormous numbers of huge
> files (media farm here); not sure how it'd fare with smaller files.
> Hope that helps,
> ----- Original Message -----
>> From: "Chris Knipe" <savage@stripped>
>> To: mysql@stripped
>> Sent: Thursday, 25 July, 2013 11:53:53 PM
>> Subject: hypothetical question about data storage
>> Hi all,
>> We run a VERY I/O-intensive file application service. Currently, our
>> problem is that our disk spindles are being completely killed by
>> excessive SEEK times on the hard drives (NOT physical read/write
>> speed).
>> We have a directory structure where the files are stored based on
>> the MD5 checksum of the file name, i.e.
>> The majority of these files are between 256K and 800K, with the ODD
>> exception (say less than 15%) being more than 1M but no more than 5M
>> in size. The content of the files is pure text (MIME encoded).
>> We believe that storing these files in an InnoDB table may actually
>> give us better performance:
>> - There is one large file that is being read/written, instead of
>> BILLIONS of small files
>> - We can split the structure so that each directory (4096 in total)
>> sits in its own database
>> - We can move the databases as load increases, which means we could
>> potentially run 2 physical database servers, each with 2048 databases
>> - It's easy to move / migrate the data thanks to MySQL replication -
>> the same can be said for redundancy of the data
>> We are more than likely looking at BLOB columns of course, and we
>> need to read/write from the DB in excess of 100 Mbit/s.
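>> Purely as illustration, a rough sketch (Python) of the blob-per-file
>> idea, assuming mysql-connector-python and a made-up shard database -
>> table name, credentials and schema are all hypothetical:
>>
>>     import hashlib
>>     import mysql.connector
>>
>>     conn = mysql.connector.connect(host="localhost", user="app",
>>                                    password="secret",
>>                                    database="shard0000")
>>     cur = conn.cursor()
>>
>>     # One row per file: MD5 hex digest as the key, contents in a blob.
>>     # MEDIUMBLOB tops out at 16M, comfortably above our 5M ceiling.
>>     cur.execute("""CREATE TABLE IF NOT EXISTS files (
>>                        id   CHAR(32) PRIMARY KEY,
>>                        body MEDIUMBLOB NOT NULL
>>                    ) ENGINE=InnoDB""")
>>
>>     def store(filename, data):
>>         digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
>>         cur.execute("REPLACE INTO files (id, body) VALUES (%s, %s)",
>>                     (digest, data))
>>         conn.commit()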
>> Would the experts consider something like this to be feasible? Is it
>> worth it to go down this avenue, or are we just going to run into
>> problems? If we are facing different problems, what can we possibly
>> expect to go wrong here?
>> Many thanks, and I look forward to any input.
> Unhappiness is discouraged and will be corrected with kitten pictures.