List:General Discussion« Previous MessageNext Message »
From:Alex Arul Lurthu Date:June 25 2007 12:32pm
Subject:Re: Blob data
View as plain text  
Ratheesh,

If you still want to place you blob data in the database, seperate out the
blob storage to a seperate table. This will help you  alleviate few of your
performance and maintenance problems.

~Alex

On 6/22/07, Steve Edberg <sbedberg@stripped> wrote:
>
> At 12:11 PM +0530 6/22/07, Ratheesh K J wrote:
> >Hello All,
> >
> >I want a clarification. Whe run a forum wherein people send messages
> >with/without attachments. Attachments may contain images, documents
> >etc.. We are actually storing the attachment in a blob column.
> >Sometimes the attachments are big. And today the table size has
> >grown to 40 GB. This has created a headache for any maintanance
> >task, backup, restoration. etc.
> >
> >I want to know whether this is the right approach. Or should we
> >actually store the attachments in directories and just stiore the
> >attachment path in the database.
> >
> >Kindly suggest the best approach so that I can reduce the database size.
> >
> >Thanks in advance
>
>
> Yes, storing files - especially non-textual files - in the file
> system instead of the database is generally considered the best
> practice.
>
> At one point I had created a document management system that stored
> everything in the database as you are doing; my rationale was that it
> allowed me to manage permissions using the existing database
> permissions, and to back up the whole database using mysqldump, vs
> mysqldump + doing a tar of the files. However, I abandoned this
> approach for the following reasons:
>
> (1) Storing non-plaintext items (eg; pictures) in the database makes
> it bigger and slower without added value - you can't (at least not
> yet, or in the foreseeable future) do a meaningful search on a blob.
>
> (2) It becomes more difficult to split storage out onto multiple
> filesystems; eg, leaving the database files in /var/database, putting
> the documents themselves into /home/docmanager, etc.
>
> (3) It makes queries on the commandline unwieldy; if you have a blob
> field, doing a select * to check a record's contents can dump a lot
> of garbage on the screen.
>
> (4) It can make doing incremental backups more difficult; if the
> documents themselves are relatively static, but the document metadata
> stored in the database is very dynamic, it becomes simple to do a
> compact daily database dump + a weekly document directory backup (for
> example) if the files are not in the database.
>
> What I do is create a unique SHA1 hash when a file is uploaded (eg;
> sha1(rand()). The original filename and the 40-character hash are
> stored in the database, and the document is stored in the filesystem
> using the hash as the filename. I can optionally compress and encrypt
> the document as well, storing the encryption key in the database.
> This gives (for me) adequate document security. An additional
> advantage is that you can take advantage of the filesystem tree if
> you have a large number of documents. For example, if a document hash
> is 'f0118e9bd2c4fb29c64ee03abce698b8', you can store the file in the
> directory tree f0/11/8e/f0118e9bd2c4fb29c64ee03abce698b8 (extending
> to as many levels as you feel necessary). By keeping the number of
> files per directory fairly small, file retrieval becomes relatively
> fast. As the hashes approximate a random distribution, you should
> always have a close-to-balanced tree.
>
> Lastly, I store a hash of the document itself in the database as
> well. This allows me to detect if duplicate files are uploaded, and
> to determine if a previously-uploaded file has been corrupted in some
> way.
>
>         steve
>
> --
> +--------------- my people are the people of the dessert, ---------------+
> | Steve Edberg                                http://pgfsun.ucdavis.edu/ |
> | UC Davis Genome Center                            sbedberg@stripped |
> | Bioinformatics programming/database/sysadmin             (530)754-9127 |
> +---------------- said t e lawrence, picking up his fork ----------------+
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/mysql?unsub=1
>
>

Thread
Blob dataRatheesh K J22 Jun
  • Re: Blob dataSteve Edberg22 Jun
    • Re: Blob dataPaul McCullagh26 Jun
  • Re: Blob dataWarren Young26 Jun
    • Re: Blob dataKevin Waterson26 Jun
      • Re: Blob dataWarren Young27 Jun
Re: Blob dataAlex Arul Lurthu25 Jun
Re: Blob dataPaul McCullagh27 Jun