On Thu, 18 Mar 1999 09:55:44 -0600, Ed Carp wrote:
>Can someone explain to me why this will not *always* be slower than storing
>the actual message in the database? Most filesystems are not optimized for
>accessing large numbers of files in a directory, but databases are. This
>isn't noticeable until you get a few hundred files in one directory, then it
>gets painful. Why go through all sorts of pain writing code to deal with it
>when it's just as easy to stick the actual message in the database?
0. You may not need a database other than the file system. Fast,
1. The file system is optimized for accessing reasonable numbers of
files in a directory. Storing data in the database involves the
transfer and parsing + overhead from interactions between the database
and the file system (to make the db larger, etc). Not much code to
write. All you need is a hash function. I have a hard time imagining
that it would be faster to get 2 MB from a 20 GB database directly,
than to get a file name from a 20 MB database, and then the 2MB file
from the file system.
2. The original problem: the limitation on the database file size is
the same as the limit on individual files.
3. Usually, all you want the message for is as stdin for some program.
This is quite efficient for files.
4. The file system deals very efficiently with additions and especially
removal of files
5. The file system is [usually] more resistant to corruption, i.e. an
error is less likely to affect more than on file. With all respect to
MySQL: it relies on the file system, and thus can't be expected to be
6. You usually (e.g. linux) have a good system to dynamically use
currently unused system memory for file system caching. I would assume
that it works better for small files than for parts of a very large
file (e.g. db). For MySQL you dedicate memory, but this is almost
always a small subset of available memory [and better used for keys
If you have many small messages, you don't ever want to access them
outside of the db-dependent application, and crash resistance is not
one of your primary goals, then storing the data in a db is probably
If you are dealing with e.g. E-mail and you want to sync() before "250
OK", and the db is more of a way to find your way among the messages I
would store them as files. Pull the power cord after the 250 OK: your
file is on disk, you database entry in bit heaven. To serve file, all
you do is pass a file[handle/name] to a daemon or small process. With
the db, you need to maintain the db connection or buffer the file. If
you concurrently deliver a few hundred messages, the increased size of
the process serving the file becomes an important factor.
(Frederik Lindberg, Infectious Diseases, WashU, St. Louis, MO, USA)