From: Fred Lindberg Date: March 18 1999 5:06pm Subject: Re: Storing large files in the database List-Archive: http://lists.mysql.com/mysql/506 Message-Id: <19990318170907.17869.qmail@id.wustl.edu> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit [long] On Thu, 18 Mar 1999 09:55:44 -0600, Ed Carp wrote: >Can someone explain to me why this will not *always* be slower than storing >the actual message in the database? Most filesystems are not optimized for >accessing large numbers of files in a directory, but databases are. This >isn't noticeable until you get a few hundred files in one directory, then it >gets painful. Why go through all sorts of pain writing code to deal with it >when it's just as easy to stick the actual message in the database? 0. You may not need a database other than the file system. Fast, portable, resource-effcient. 1. The file system is optimized for accessing reasonable numbers of files in a directory. Storing data in the database involves the transfer and parsing + overhead from interactions between the database and the file system (to make the db larger, etc). Not much code to write. All you need is a hash function. I have a hard time imagining that it would be faster to get 2 MB from a 20 GB database directly, than to get a file name from a 20 MB database, and then the 2MB file from the file system. 2. The original problem: the limitation on the database file size is the same as the limit on individual files. 3. Usually, all you want the message for is as stdin for some program. This is quite efficient for files. 4. The file system deals very efficiently with additions and especially removal of files 5. The file system is [usually] more resistant to corruption, i.e. an error is less likely to affect more than on file. With all respect to MySQL: it relies on the file system, and thus can't be expected to be more reliable. 6. You usually (e.g. linux) have a good system to dynamically use currently unused system memory for file system caching. I would assume that it works better for small files than for parts of a very large file (e.g. db). For MySQL you dedicate memory, but this is almost always a small subset of available memory [and better used for keys anyway]. If you have many small messages, you don't ever want to access them outside of the db-dependent application, and crash resistance is not one of your primary goals, then storing the data in a db is probably better. If you are dealing with e.g. E-mail and you want to sync() before "250 OK", and the db is more of a way to find your way among the messages I would store them as files. Pull the power cord after the 250 OK: your file is on disk, you database entry in bit heaven. To serve file, all you do is pass a file[handle/name] to a daemon or small process. With the db, you need to maintain the db connection or buffer the file. If you concurrently deliver a few hundred messages, the increased size of the process serving the file becomes an important factor. -Sincerely, Fred (Frederik Lindberg, Infectious Diseases, WashU, St. Louis, MO, USA)