List: General Discussion
From: Johan De Meersman
Date: July 26 2013 7:23am
Subject: Re: hypothetical question about data storage
Hey Chris,

I'm afraid that this is not what databases are for, and the first thing you'll likely run
into is the number of concurrent connections.

This is typically something you should really tackle from a systems perspective. Seek
times are dramatically better on SSDs or similar storage (think FusionIO cards), but
there are also a couple of vendors (Violin comes to mind) who provide full-blown SSD SANs.

If you prefer to stay with spinning disks, you can still improve seek times by confining
data to a narrow band of cylinders (short-stroking, usually on the faster outer tracks) and
potentially by using variable sector formatting. Again, there are SANs that do this for you.

Another minor trick is to turn off access timestamp updates when you mount the filesystem
(noatime).
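If you want to check that the option actually took effect, on Linux the active mounts are
listed in /proc/mounts, one per line as "device mount-point fs-type options dump pass".
Below is a minimal Python sketch that looks at the options field for one mount point; the
function names are my own, and it assumes the Linux /proc/mounts format. (Linux kernels
since 2.6.30 default to relatime, which already skips most atime writes; noatime turns
them off entirely.)

```python
from typing import Optional, Set


def mount_options(mounts_text: str, mount_point: str) -> Optional[Set[str]]:
    """Return the mount options for mount_point from /proc/mounts-style text.

    Returns None if the mount point is not listed. /proc/mounts escapes
    spaces in paths as \\040; this sketch compares paths verbatim.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: a later mount on the same path hides earlier ones
            found = set(fields[3].split(","))
    return found


def has_noatime(mounts_text: str, mount_point: str) -> bool:
    """True if mount_point is currently mounted with the noatime option."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "noatime" in opts
```

Usage would be something like has_noatime(open("/proc/mounts").read(), "/data"), where
"/data" stands in for wherever your file store is mounted.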

Also benchmark different filesystems; there are major differences between them. I've heard
XFS recommended, but I've never needed to benchmark for seek times myself. We're
using IBM's commercial GPFS here, which handles enormous numbers of huge files well
(we run a media farm), but I'm not sure how it would fare with smaller files.
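A rough way to compare filesystems for your workload is to copy the same sample of real
files onto each candidate and read them back in random order, counting files per second.
This is a minimal Python sketch under that assumption, not a standard tool (fio does this
kind of testing far more rigorously). The numbers only mean something with a cold page
cache, i.e. more data than RAM or caches dropped between runs; otherwise you are
benchmarking memory.

```python
import random
import time
from typing import Iterable, Tuple


def small_file_read_rate(paths: Iterable[str], seed: int = 0) -> Tuple[float, int]:
    """Read every file in `paths` once, in shuffled order.

    Returns (files per second, total bytes read).
    """
    order = list(paths)
    # Shuffle so consecutive reads hit unrelated files, like random client requests
    random.Random(seed).shuffle(order)
    total_bytes = 0
    start = time.perf_counter()
    for path in order:
        with open(path, "rb") as f:
            total_bytes += len(f.read())
    # Guard against a zero interval on tiny or fully cached samples
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(order) / elapsed, total_bytes
```

The paths could be collected with os.walk over a copy of part of the /0/00/000/... tree,
so that directory and inode lookups are exercised along with the data reads.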

Hope that helps,
Johan

----- Original Message -----
> From: "Chris Knipe" <savage@stripped>
> To: mysql@stripped
> Sent: Thursday, 25 July, 2013 11:53:53 PM
> Subject: hypothetical question about data storage
> 
> Hi all,
> 
> We run a VERY I/O-intensive file application service. Currently, our
> problem is that our disk spindles are being completely killed by SEEK
> times on the hard drives (NOT by physical read/write speeds).
> 
> We have a directory structure where the files are stored based on the
> MD5 checksum of the file name, e.g.
> /0/00/000/000044533779fce5cf3497f87de1d060
> The majority of these files are between 256K and 800K, with the ODD
> exception (say less than 15%) being more than 1M but no more than 5M in
> size. The content of the files is pure text (MIME encoded).
> 
> We believe that storing these files in an InnoDB table may actually give
> us better performance:
> - There is one large file being read/written, instead of BILLIONS of
>   small files
> - We can split the structure so that each directory (4096 in total)
>   sits in its own database
> - We can move the databases as load increases, which means that we can
>   potentially run 2 physical database servers with 2048 databases each
> - It's easy to move / migrate the data thanks to MySQL and replication;
>   the same can be said for redundancy of the data
> 
> We are more than likely looking at BLOB columns, of course, and we need
> to read/write from the DB in excess of 100 Mbit/s.
> 
> Would the experts consider something like this as being feasible? Is it
> worth it to go down this avenue, or are we just going to run into
> different problems? If we are facing different problems, what can we
> possibly expect to go wrong here?
> 
> Many thanks, and I look forward to any input.
> 
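
The layout described in the quoted message (MD5 of the file name, three directory levels
from the first 1, 2 and 3 hex digits, 4096 leaf directories, one database per leaf, two
servers with 2048 databases each) can be sketched in Python as below. The UTF-8 encoding
of the file name, the "files_<prefix>" database names and the split-by-halves server rule
are assumptions for illustration, not details from the message.

```python
import hashlib
from typing import Tuple

PREFIX_DIRS = 16 ** 3   # 4096 third-level directories, one per 3-hex-digit prefix
SERVERS = 2             # hypothetical: two servers with 2048 databases each


def path_for_digest(digest: str) -> str:
    """Build /x/xx/xxx/<digest> from a 32-character MD5 hex digest."""
    return f"/{digest[0]}/{digest[:2]}/{digest[:3]}/{digest}"


def path_for_name(file_name: str) -> str:
    """Hash the file name (assumed UTF-8) with MD5 and build its storage path."""
    return path_for_digest(hashlib.md5(file_name.encode("utf-8")).hexdigest())


def shard_for(digest: str) -> Tuple[int, str]:
    """Map a digest to (server index, database name) via its 3-hex-digit prefix."""
    bucket = int(digest[:3], 16)                  # 0..4095
    server = bucket // (PREFIX_DIRS // SERVERS)   # 0..2047 -> 0, 2048..4095 -> 1
    return server, f"files_{digest[:3]}"          # hypothetical naming scheme
```

For the example in the message, path_for_digest("000044533779fce5cf3497f87de1d060")
returns "/0/00/000/000044533779fce5cf3497f87de1d060" and shard_for on the same digest
returns (0, "files_000"). Computing the server by division keeps the mapping trivial, but
moving individual databases as load grows would need a prefix-to-server lookup table instead.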

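To put the 100 Mbit/s target from the quoted message in terms of files, here is the
arithmetic as a small Python helper. It assumes 1 Mbit = 10^6 bits and reads "K" and "M"
in the message as KiB and MiB, which the message doesn't specify.

```python
def files_per_second(mbit_per_s: float, avg_file_bytes: int) -> float:
    """Files per second needed to sustain mbit_per_s (1 Mbit = 10**6 bits)."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return bytes_per_s / avg_file_bytes


# "K" and "M" from the message are read as KiB and MiB here (an assumption)
for label, size in [("256K", 256 * 1024), ("800K", 800 * 1024), ("5M", 5 * 1024 * 1024)]:
    print(f"{label}: {files_per_second(100, size):.1f} files/s")
```

This prints 47.7, 15.3 and 2.4 files/s respectively. Each file read from a cold cache
costs roughly at least one seek, plus any directory and inode lookups that miss the cache.
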
-- 
Unhappiness is discouraged and will be corrected with kitten pictures.