List:General Discussion« Previous MessageNext Message »
From:Paul McCullagh Date:June 26 2007 7:17am
Subject:Re: Blob data
View as plain text  
Hi Steve,

Some very good points.

We are working on a solution for BLOBs in the a MySQL database at  
www.blobstreaming.org.

Any good solution will have to take your comments into account...

On Jun 22, 2007, at 9:24 AM, Steve Edberg wrote:

> At 12:11 PM +0530 6/22/07, Ratheesh K J wrote:
>> Hello All,
>>
>> I want a clarification. Whe run a forum wherein people send  
>> messages with/without attachments. Attachments may contain images,  
>> documents etc.. We are actually storing the attachment in a blob  
>> column. Sometimes the attachments are big. And today the table  
>> size has grown to 40 GB. This has created a headache for any  
>> maintanance task, backup, restoration. etc.
>>
>> I want to know whether this is the right approach. Or should we  
>> actually store the attachments in directories and just stiore the  
>> attachment path in the database.
>>
>> Kindly suggest the best approach so that I can reduce the database  
>> size.
>>
>> Thanks in advance
>
>
> Yes, storing files - especially non-textual files - in the file  
> system instead of the database is generally considered the best  
> practice.
>
> At one point I had created a document management system that stored  
> everything in the database as you are doing; my rationale was that  
> it allowed me to manage permissions using the existing database  
> permissions, and to back up the whole database using mysqldump, vs  
> mysqldump + doing a tar of the files. However, I abandoned this  
> approach for the following reasons:
>
> (1) Storing non-plaintext items (eg; pictures) in the database  
> makes it bigger and slower without added value - you can't (at  
> least not yet, or in the foreseeable future) do a meaningful search  
> on a blob.
>
> (2) It becomes more difficult to split storage out onto multiple  
> filesystems; eg, leaving the database files in /var/database,  
> putting the documents themselves into /home/docmanager, etc.
>
> (3) It makes queries on the commandline unwieldy; if you have a  
> blob field, doing a select * to check a record's contents can dump  
> a lot of garbage on the screen.
>
> (4) It can make doing incremental backups more difficult; if the  
> documents themselves are relatively static, but the document  
> metadata stored in the database is very dynamic, it becomes simple  
> to do a compact daily database dump + a weekly document directory  
> backup (for example) if the files are not in the database.
>
> What I do is create a unique SHA1 hash when a file is uploaded (eg;  
> sha1(rand()). The original filename and the 40-character hash are  
> stored in the database, and the document is stored in the  
> filesystem using the hash as the filename. I can optionally  
> compress and encrypt the document as well, storing the encryption  
> key in the database. This gives (for me) adequate document  
> security. An additional advantage is that you can take advantage of  
> the filesystem tree if you have a large number of documents. For  
> example, if a document hash is 'f0118e9bd2c4fb29c64ee03abce698b8',  
> you can store the file in the directory tree f0/11/8e/ 
> f0118e9bd2c4fb29c64ee03abce698b8 (extending to as many levels as  
> you feel necessary). By keeping the number of files per directory  
> fairly small, file retrieval becomes relatively fast. As the hashes  
> approximate a random distribution, you should always have a close- 
> to-balanced tree.
>
> Lastly, I store a hash of the document itself in the database as  
> well. This allows me to detect if duplicate files are uploaded, and  
> to determine if a previously-uploaded file has been corrupted in  
> some way.
>
> 	steve
>
> -- 
> +--------------- my people are the people of the dessert,  
> ---------------+
> | Steve Edberg                                http:// 
> pgfsun.ucdavis.edu/ |
> | UC Davis Genome Center                             
> sbedberg@stripped |
> | Bioinformatics programming/database/sysadmin             (530) 
> 754-9127 |
> +---------------- said t e lawrence, picking up his fork  
> ----------------+
>
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql? 
> unsub=paul.mccullagh@stripped
>

Thread
Blob dataRatheesh K J22 Jun
  • Re: Blob dataSteve Edberg22 Jun
    • Re: Blob dataPaul McCullagh26 Jun
  • Re: Blob dataWarren Young26 Jun
    • Re: Blob dataKevin Waterson26 Jun
      • Re: Blob dataWarren Young27 Jun
Re: Blob dataAlex Arul Lurthu25 Jun
Re: Blob dataPaul McCullagh27 Jun