At 12:11 PM +0530 6/22/07, Ratheesh K J wrote:
>I want a clarification. Whe run a forum wherein people send messages
>with/without attachments. Attachments may contain images, documents
>etc.. We are actually storing the attachment in a blob column.
>Sometimes the attachments are big. And today the table size has
>grown to 40 GB. This has created a headache for any maintanance
>task, backup, restoration. etc.
>I want to know whether this is the right approach. Or should we
>actually store the attachments in directories and just stiore the
>attachment path in the database.
>Kindly suggest the best approach so that I can reduce the database size.
>Thanks in advance
Yes, storing files - especially non-textual files - in the file
system instead of the database is generally considered the best
At one point I had created a document management system that stored
everything in the database as you are doing; my rationale was that it
allowed me to manage permissions using the existing database
permissions, and to back up the whole database using mysqldump, vs
mysqldump + doing a tar of the files. However, I abandoned this
approach for the following reasons:
(1) Storing non-plaintext items (eg; pictures) in the database makes
it bigger and slower without added value - you can't (at least not
yet, or in the foreseeable future) do a meaningful search on a blob.
(2) It becomes more difficult to split storage out onto multiple
filesystems; eg, leaving the database files in /var/database, putting
the documents themselves into /home/docmanager, etc.
(3) It makes queries on the commandline unwieldy; if you have a blob
field, doing a select * to check a record's contents can dump a lot
of garbage on the screen.
(4) It can make doing incremental backups more difficult; if the
documents themselves are relatively static, but the document metadata
stored in the database is very dynamic, it becomes simple to do a
compact daily database dump + a weekly document directory backup (for
example) if the files are not in the database.
What I do is create a unique SHA1 hash when a file is uploaded (eg;
sha1(rand()). The original filename and the 40-character hash are
stored in the database, and the document is stored in the filesystem
using the hash as the filename. I can optionally compress and encrypt
the document as well, storing the encryption key in the database.
This gives (for me) adequate document security. An additional
advantage is that you can take advantage of the filesystem tree if
you have a large number of documents. For example, if a document hash
is 'f0118e9bd2c4fb29c64ee03abce698b8', you can store the file in the
directory tree f0/11/8e/f0118e9bd2c4fb29c64ee03abce698b8 (extending
to as many levels as you feel necessary). By keeping the number of
files per directory fairly small, file retrieval becomes relatively
fast. As the hashes approximate a random distribution, you should
always have a close-to-balanced tree.
Lastly, I store a hash of the document itself in the database as
well. This allows me to detect if duplicate files are uploaded, and
to determine if a previously-uploaded file has been corrupted in some
+--------------- my people are the people of the dessert, ---------------+
| Steve Edberg http://pgfsun.ucdavis.edu/ |
| UC Davis Genome Center sbedberg@stripped |
| Bioinformatics programming/database/sysadmin (530)754-9127 |
+---------------- said t e lawrence, picking up his fork ----------------+