Would be great!
I am about to release the first version of the BLOB Streaming engine.
We are not that far yet (obviously), but our aim is to create an open
platform which will be extendable. So adding various image and
fulltext searching should be possible. Also the inclusion of
compression and encryption as you mentioned below.
On Jun 26, 2007, at 10:27 AM, Steve Edberg wrote:
> Hi -
>
> Interesting project; I hadn't heard of blobstreaming.org before.
> I'll have to keep an eye on it. I've been thinking it would be nice
> to have an image-optimized storage engine that could be used to do
> image arithmetic, autoscaling and format conversion as native
> operations, and maybe query by image content a la QBIC -
>
> http://wwwqbic.almaden.ibm.com/
>
> - as kind of a graphical analog of fulltext searching.
>
> steve
>
>
> At 9:17 AM +0200 6/26/07, you wrote:
>> Hi Steve,
>>
>> Some very good points.
>>
>> We are working on a solution for BLOBs in the a MySQL database at
>> www.blobstreaming.org.
>>
>> Any good solution will have to take your comments into account...
>>
>> On Jun 22, 2007, at 9:24 AM, Steve Edberg wrote:
>>
>>> At 12:11 PM +0530 6/22/07, Ratheesh K J wrote:
>>>> Hello All,
>>>>
>>>> I want a clarification. Whe run a forum wherein people send
>>>> messages with/without attachments. Attachments may contain
>>>> images, documents etc.. We are actually storing the attachment
>>>> in a blob column. Sometimes the attachments are big. And today
>>>> the table size has grown to 40 GB. This has created a headache
>>>> for any maintanance task, backup, restoration. etc.
>>>>
>>>> I want to know whether this is the right approach. Or should we
>>>> actually store the attachments in directories and just stiore
>>>> the attachment path in the database.
>>>>
>>>> Kindly suggest the best approach so that I can reduce the
>>>> database size.
>>>>
>>>> Thanks in advance
>>>
>>>
>>> Yes, storing files - especially non-textual files - in the file
>>> system instead of the database is generally considered the best
>>> practice.
>>>
>>> At one point I had created a document management system that
>>> stored everything in the database as you are doing; my rationale
>>> was that it allowed me to manage permissions using the existing
>>> database permissions, and to back up the whole database using
>>> mysqldump, vs mysqldump + doing a tar of the files. However, I
>>> abandoned this approach for the following reasons:
>>>
>>> (1) Storing non-plaintext items (eg; pictures) in the database
>>> makes it bigger and slower without added value - you can't (at
>>> least not yet, or in the foreseeable future) do a meaningful
>>> search on a blob.
>>>
>>> (2) It becomes more difficult to split storage out onto multiple
>>> filesystems; eg, leaving the database files in /var/database,
>>> putting the documents themselves into /home/docmanager, etc.
>>>
>>> (3) It makes queries on the commandline unwieldy; if you have a
>>> blob field, doing a select * to check a record's contents can
>>> dump a lot of garbage on the screen.
>>>
>>> (4) It can make doing incremental backups more difficult; if the
>>> documents themselves are relatively static, but the document
>>> metadata stored in the database is very dynamic, it becomes
>>> simple to do a compact daily database dump + a weekly document
>>> directory backup (for example) if the files are not in the database.
>>>
>>> What I do is create a unique SHA1 hash when a file is uploaded
>>> (eg; sha1(rand()). The original filename and the 40-character
>>> hash are stored in the database, and the document is stored in
>>> the filesystem using the hash as the filename. I can optionally
>>> compress and encrypt the document as well, storing the encryption
>>> key in the database. This gives (for me) adequate document
>>> security. An additional advantage is that you can take advantage
>>> of the filesystem tree if you have a large number of documents.
>>> For example, if a document hash is
>>> 'f0118e9bd2c4fb29c64ee03abce698b8', you can store the file in the
>>> directory tree f0/11/8e/f0118e9bd2c4fb29c64ee03abce698b8
>>> (extending to as many levels as you feel necessary). By keeping
>>> the number of files per directory fairly small, file retrieval
>>> becomes relatively fast. As the hashes approximate a random
>>> distribution, you should always have a close-to-balanced tree.
>>>
>>> Lastly, I store a hash of the document itself in the database as
>>> well. This allows me to detect if duplicate files are uploaded,
>>> and to determine if a previously-uploaded file has been corrupted
>>> in some way.
>>>
>>> steve
>
>
> --
> +--------------- my people are the people of the dessert,
> ---------------+
> | Steve Edberg http://
> pgfsun.ucdavis.edu/ |
> | UC Davis Genome Center
> sbedberg@stripped |
> | Bioinformatics programming/database/sysadmin (530)
> 754-9127 |
> +---------------- said t e lawrence, picking up his fork
> ----------------+