Hi Steve,
Some very good points.
We are working on a solution for BLOBs in the a MySQL database at
www.blobstreaming.org.
Any good solution will have to take your comments into account...
On Jun 22, 2007, at 9:24 AM, Steve Edberg wrote:
> At 12:11 PM +0530 6/22/07, Ratheesh K J wrote:
>> Hello All,
>>
>> I want a clarification. Whe run a forum wherein people send
>> messages with/without attachments. Attachments may contain images,
>> documents etc.. We are actually storing the attachment in a blob
>> column. Sometimes the attachments are big. And today the table
>> size has grown to 40 GB. This has created a headache for any
>> maintanance task, backup, restoration. etc.
>>
>> I want to know whether this is the right approach. Or should we
>> actually store the attachments in directories and just stiore the
>> attachment path in the database.
>>
>> Kindly suggest the best approach so that I can reduce the database
>> size.
>>
>> Thanks in advance
>
>
> Yes, storing files - especially non-textual files - in the file
> system instead of the database is generally considered the best
> practice.
>
> At one point I had created a document management system that stored
> everything in the database as you are doing; my rationale was that
> it allowed me to manage permissions using the existing database
> permissions, and to back up the whole database using mysqldump, vs
> mysqldump + doing a tar of the files. However, I abandoned this
> approach for the following reasons:
>
> (1) Storing non-plaintext items (eg; pictures) in the database
> makes it bigger and slower without added value - you can't (at
> least not yet, or in the foreseeable future) do a meaningful search
> on a blob.
>
> (2) It becomes more difficult to split storage out onto multiple
> filesystems; eg, leaving the database files in /var/database,
> putting the documents themselves into /home/docmanager, etc.
>
> (3) It makes queries on the commandline unwieldy; if you have a
> blob field, doing a select * to check a record's contents can dump
> a lot of garbage on the screen.
>
> (4) It can make doing incremental backups more difficult; if the
> documents themselves are relatively static, but the document
> metadata stored in the database is very dynamic, it becomes simple
> to do a compact daily database dump + a weekly document directory
> backup (for example) if the files are not in the database.
>
> What I do is create a unique SHA1 hash when a file is uploaded (eg;
> sha1(rand()). The original filename and the 40-character hash are
> stored in the database, and the document is stored in the
> filesystem using the hash as the filename. I can optionally
> compress and encrypt the document as well, storing the encryption
> key in the database. This gives (for me) adequate document
> security. An additional advantage is that you can take advantage of
> the filesystem tree if you have a large number of documents. For
> example, if a document hash is 'f0118e9bd2c4fb29c64ee03abce698b8',
> you can store the file in the directory tree f0/11/8e/
> f0118e9bd2c4fb29c64ee03abce698b8 (extending to as many levels as
> you feel necessary). By keeping the number of files per directory
> fairly small, file retrieval becomes relatively fast. As the hashes
> approximate a random distribution, you should always have a close-
> to-balanced tree.
>
> Lastly, I store a hash of the document itself in the database as
> well. This allows me to detect if duplicate files are uploaded, and
> to determine if a previously-uploaded file has been corrupted in
> some way.
>
> steve
>
> --
> +--------------- my people are the people of the dessert,
> ---------------+
> | Steve Edberg http://
> pgfsun.ucdavis.edu/ |
> | UC Davis Genome Center
> sbedberg@stripped |
> | Bioinformatics programming/database/sysadmin (530)
> 754-9127 |
> +---------------- said t e lawrence, picking up his fork
> ----------------+
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?
> unsub=paul.mccullagh@stripped
>