From: Alaric Snell-Pym
Date: May 14 2009 11:11pm
Subject: Re: MySQL Reengineering Project
List-Archive: http://lists.mysql.com/internals/36694
Message-Id: <44AF1A0B-0206-4570-8B17-0E8BACDB250B@geniedb.com>
MIME-Version: 1.0 (Apple Message framework v930.3)
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit

On 14 May 2009, at 12:39 am, Michael Widenius wrote:

> Alaric> That, combined with the fact that in my recent foray into UDFs I found
> Alaric> that string UDFs are passed a buffer with room for 255 characters to
> Alaric> put their result into, and then have the choice of using it and
> Alaric> returning a pointer to it or mallocing their own buffer and returning
> Alaric> a pointer to that (meaning that the caller, somewhere, must have code
> Alaric> to the effect of "if (returned_ptr != allocated_buffer)
> Alaric> free(returned_ptr)" to clear up, I guess?),
>
> Actually the idea was that you would in your object remember the
> malloc you did and then free it.
>
> So the code would be in the destructor:
>
> x_free(malloc_pointer);

What object? In a UDF, I just declare a function, as per http://dev.mysql.com/doc/refman/5.1/en/udf-return-values.html - I surrender the pointer to mysql when I return, and have no chance to free it myself?

>
> Alaric> led me to suspect that
> Alaric> there was an ingrained culture of functions plopping their results
> Alaric> into spaces left for them somewhere.
>
> The idea was to help the UDF to avoid doing mallocs during query
> execution for most common queries. The assumption was that for most
> UDF's 255 byte would be enough for their result.

It's not 255 bytes, it's 255 characters "that may be multi-byte" - the manual didn't say how many bytes that might be, but I guess the charset encoding functions will cover that for me (I didn't need to research this any further, as I'm putting thirty or so hex digits and colons in!)

>
> No, the design was there from the start.
> One of the basic design of
> the MySQL code base is to try to avoid calls to malloc during query
> execution; We should instead try to do as much as possible when
> at query execution startup.
>

Avoiding malloc is a noble goal indeed! For request-based systems, though, would it not be a good idea to have an Apache-style request memory pool system, which could allocate objects below a certain size limit sequentially from pages of a suitably tuned size, thus amortizing mallocs behind an abstraction layer rather than having lots of little tricks here and there?

E.g., if you try to malloc 50MB for a BLOB then it's worth going through proper malloc, but requests for a few hundred bytes can just be done by advancing a pointer in a single malloced page of several KB, and when a request comes along that's under the separate-block threshold but too big for the space left in the page, pop that page into a linked list of things to free when the request ends, and start a fresh one.

As well as reducing your malloc traffic, this improves cache behaviour on SMP machines by keeping more of the objects allocated by a thread together (most small objects allocated by a thread are likely to remain local, I'd expect), helps prevent memory leaks by "garbage collecting" everything at the end of a request (at the cost of needing special handling for objects that need to survive beyond the end of the request, e.g. ones that go into global data structures!), and lets you gather interesting profile statistics by counting bytes allocated per request or setting quotas, etc.
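
To make the scheme above concrete, here is a minimal sketch in C of a per-request pool. It is only an illustration under stated assumptions: every name (pool, pool_alloc, pool_destroy and so on) and both size constants are hypothetical, and none of it is taken from the MySQL or Apache sources. Small requests advance a pointer within the current page, large ones get their own malloc, and every block is linked into one list as soon as it is allocated (rather than when the page fills up) so that a single call frees the lot when the request ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tuning constants, chosen only for illustration. */
#define POOL_PAGE_SIZE ((size_t)8192)  /* "a single malloced page of several KB" */
#define POOL_BIG_LIMIT ((size_t)2048)  /* at or above this: separate malloc block */
#define POOL_ALIGN     (_Alignof(max_align_t))

/* Header placed in front of every malloc'ed block. The union pads the
   header so the payload that follows is suitably aligned for any type. */
typedef union pool_block {
    union pool_block *next;  /* list of things to free at request end */
    max_align_t align;
} pool_block;

typedef struct {
    pool_block *blocks;  /* every block owned by this request */
    char *cur;           /* next free byte in the current page */
    char *end;           /* one past the end of the current page */
    size_t bytes;        /* bytes handed out (after rounding), for statistics */
} pool;

static void pool_init(pool *p)
{
    p->blocks = NULL;
    p->cur = p->end = NULL;
    p->bytes = 0;
}

/* malloc one block, link it into the free-at-end list, return its payload. */
static void *pool_new_block(pool *p, size_t payload)
{
    pool_block *b = malloc(sizeof(pool_block) + payload);
    if (b == NULL)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    return b + 1;
}

static void *pool_alloc(pool *p, size_t n)
{
    void *out;

    assert(p != NULL);
    if (n == 0)
        n = 1;
    if (n > (size_t)-1 / 2)
        return NULL;  /* refuse sizes that would overflow the rounding below */
    n = (n + POOL_ALIGN - 1) & ~(POOL_ALIGN - 1);  /* keep results aligned */

    if (n >= POOL_BIG_LIMIT) {
        /* Big object: go through proper malloc, but still free it at the end. */
        out = pool_new_block(p, n);
    } else {
        if (p->cur == NULL || (size_t)(p->end - p->cur) < n) {
            /* Does not fit in what is left of the page: start a fresh one.
               The old page stays on the list and is freed with the rest. */
            char *page = pool_new_block(p, POOL_PAGE_SIZE);
            if (page == NULL)
                return NULL;
            p->cur = page;
            p->end = page + POOL_PAGE_SIZE;
        }
        out = p->cur;
        p->cur += n;  /* the common case is just advancing a pointer */
    }
    if (out != NULL)
        p->bytes += n;
    return out;
}

/* End of request: "garbage collect" everything in one pass. */
static void pool_destroy(pool *p)
{
    pool_block *b = p->blocks;
    while (b != NULL) {
        pool_block *next = b->next;
        free(b);
        b = next;
    }
    pool_init(p);
}
```

A request handler would call pool_init at the start, pool_alloc wherever it would otherwise malloc, and pool_destroy once at the end; anything that must outlive the request still needs an ordinary malloc. The bytes counter is where per-request statistics or a quota check could hang.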

The biggest downside is having to pass a memory pool argument through everywhere - although you can invoke Greenspun's Tenth Law and implement fluid-let by putting it into a pthread_specific key ;-)

(As an interesting aside, a few weeks ago I dug up some old notes I'd made on this sort of thing, albeit from the perspective of garbage collected pure functional languages: http://www.snell-pym.org.uk/archives/2009/05/08/memory-management/ )

> Regards,
> Monty

ABS

--
Alaric Snell-Pym ACGI MIAP MBCS
Chief software engineer, GenieDB
alaric@stripped