From: Paul McCullagh Date: June 26 2007 7:17am Subject: Re: Blob data List-Archive: http://lists.mysql.com/mysql/207714 Message-Id: <72B4D685-6998-450C-9EFE-A2437834B989@primebase.com> MIME-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Content-Transfer-Encoding: 7bit Hi Steve, Some very good points. We are working on a solution for BLOBs in the a MySQL database at www.blobstreaming.org. Any good solution will have to take your comments into account... On Jun 22, 2007, at 9:24 AM, Steve Edberg wrote: > At 12:11 PM +0530 6/22/07, Ratheesh K J wrote: >> Hello All, >> >> I want a clarification. Whe run a forum wherein people send >> messages with/without attachments. Attachments may contain images, >> documents etc.. We are actually storing the attachment in a blob >> column. Sometimes the attachments are big. And today the table >> size has grown to 40 GB. This has created a headache for any >> maintanance task, backup, restoration. etc. >> >> I want to know whether this is the right approach. Or should we >> actually store the attachments in directories and just stiore the >> attachment path in the database. >> >> Kindly suggest the best approach so that I can reduce the database >> size. >> >> Thanks in advance > > > Yes, storing files - especially non-textual files - in the file > system instead of the database is generally considered the best > practice. > > At one point I had created a document management system that stored > everything in the database as you are doing; my rationale was that > it allowed me to manage permissions using the existing database > permissions, and to back up the whole database using mysqldump, vs > mysqldump + doing a tar of the files. However, I abandoned this > approach for the following reasons: > > (1) Storing non-plaintext items (eg; pictures) in the database > makes it bigger and slower without added value - you can't (at > least not yet, or in the foreseeable future) do a meaningful search > on a blob. > > (2) It becomes more difficult to split storage out onto multiple > filesystems; eg, leaving the database files in /var/database, > putting the documents themselves into /home/docmanager, etc. > > (3) It makes queries on the commandline unwieldy; if you have a > blob field, doing a select * to check a record's contents can dump > a lot of garbage on the screen. > > (4) It can make doing incremental backups more difficult; if the > documents themselves are relatively static, but the document > metadata stored in the database is very dynamic, it becomes simple > to do a compact daily database dump + a weekly document directory > backup (for example) if the files are not in the database. > > What I do is create a unique SHA1 hash when a file is uploaded (eg; > sha1(rand()). The original filename and the 40-character hash are > stored in the database, and the document is stored in the > filesystem using the hash as the filename. I can optionally > compress and encrypt the document as well, storing the encryption > key in the database. This gives (for me) adequate document > security. An additional advantage is that you can take advantage of > the filesystem tree if you have a large number of documents. For > example, if a document hash is 'f0118e9bd2c4fb29c64ee03abce698b8', > you can store the file in the directory tree f0/11/8e/ > f0118e9bd2c4fb29c64ee03abce698b8 (extending to as many levels as > you feel necessary). By keeping the number of files per directory > fairly small, file retrieval becomes relatively fast. As the hashes > approximate a random distribution, you should always have a close- > to-balanced tree. > > Lastly, I store a hash of the document itself in the database as > well. This allows me to detect if duplicate files are uploaded, and > to determine if a previously-uploaded file has been corrupted in > some way. > > steve > > -- > +--------------- my people are the people of the dessert, > ---------------+ > | Steve Edberg http:// > pgfsun.ucdavis.edu/ | > | UC Davis Genome Center > sbedberg@stripped | > | Bioinformatics programming/database/sysadmin (530) > 754-9127 | > +---------------- said t e lawrence, picking up his fork > ----------------+ > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql? > unsub=paul.mccullagh@stripped >