On 14 May 2009, at 12:39 am, Michael Widenius wrote:
> Alaric> That, combined with the fact that in my recent foray into
> Alaric> UDFs I found that string UDFs are passed a buffer with room
> Alaric> for 255 characters to put their result into, and then have
> Alaric> the choice of using it and returning a pointer to it, or
> Alaric> mallocing their own buffer and returning a pointer to that
> Alaric> (meaning that the caller, somewhere, must have code to the
> Alaric> effect of "if (returned_ptr != allocated_buffer)
> Alaric> free(returned_ptr)" to clear up, I guess?),
> Actually the idea was that you would remember, in your object, the
> malloc you did and then free it.
> So the code would be in the destructor:
What object? In a UDF, I just declare a function, as per the manual -
I surrender the pointer to mysql when I return, and have no chance to
free it myself?
> Alaric> led me to suspect that there was an ingrained culture of
> Alaric> functions plopping their results into spaces left for them
> Alaric> somewhere.
> The idea was to help the UDF avoid doing mallocs during query
> execution for most common queries. The assumption was that for most
> UDFs 255 bytes would be enough for their result.
It's not 255 bytes, it's 255 characters "that may be multi-byte" - the
manual didn't say how many bytes that might be, but I guess the
charset encoding functions will cover that for me (I didn't need to
research this any further, as I'm putting thirty or so hex digits and
> No, the design was there from the start. One of the basic design
> principles of the MySQL code base is to try to avoid calls to malloc
> during query execution; we should instead try to do as much as
> possible at query execution startup.
Avoiding malloc is a noble goal indeed!
For request-based systems, though, would it not be a good idea to
have an Apache-style request memory pool system, which could allocate
objects below a certain size limit sequentially from pages of a
suitably tuned size, thus amortizing mallocs behind an abstraction
layer rather than scattering little tricks here and there?
E.g., if you try to malloc 50MB for a BLOB then it's worth going
through proper malloc, but requests for a few hundred bytes can just
be done by advancing a pointer in a single malloced page of several
KB; when a request comes along that's under the separate-block
threshold but too big for the page, pop that page into a linked list
of things to free when the request ends, and start a fresh one.
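To make the scheme concrete, here's a minimal sketch: small allocations bump a pointer within a page, oversized requests fall through to plain malloc (and are tracked so they can be released too), full pages are chained so everything goes away at once at request end. The names (pool_alloc, pool_free_all) and the tuning constants are illustrative, not MySQL's or Apache's actual API:

```c
/* Minimal request-pool sketch: bump-pointer allocation from pages,
   with big requests going through proper malloc.  Everything is
   released in one sweep at the end of the request. */
#include <stdlib.h>

#define POOL_PAGE_SIZE  8192   /* "pages of a suitably tuned size" */
#define POOL_BIG_LIMIT  1024   /* above this, go through proper malloc */

struct pool_page {
    struct pool_page *next;
    size_t used;
    char data[POOL_PAGE_SIZE];
};

struct pool {
    struct pool_page *pages;   /* current page at the head of the list */
    void **big_blocks;         /* separately malloced oversized blocks */
    size_t n_big, cap_big;
};

static struct pool_page *page_new(struct pool_page *next)
{
    struct pool_page *p = malloc(sizeof *p);
    p->next = next;
    p->used = 0;
    return p;
}

void pool_init(struct pool *pl)
{
    pl->pages = page_new(NULL);
    pl->big_blocks = NULL;
    pl->n_big = pl->cap_big = 0;
}

void *pool_alloc(struct pool *pl, size_t n)
{
    n = (n + 7) & ~(size_t)7;          /* keep allocations aligned */
    if (n > POOL_BIG_LIMIT) {          /* e.g. that 50MB BLOB */
        if (pl->n_big == pl->cap_big) {
            pl->cap_big = pl->cap_big ? pl->cap_big * 2 : 8;
            pl->big_blocks = realloc(pl->big_blocks,
                                     pl->cap_big * sizeof *pl->big_blocks);
        }
        return pl->big_blocks[pl->n_big++] = malloc(n);
    }
    if (pl->pages->used + n > POOL_PAGE_SIZE)
        pl->pages = page_new(pl->pages);   /* retire page, start fresh */
    void *out = pl->pages->data + pl->pages->used;
    pl->pages->used += n;
    return out;
}

/* "Garbage collect" everything at the end of the request. */
void pool_free_all(struct pool *pl)
{
    while (pl->pages) {
        struct pool_page *next = pl->pages->next;
        free(pl->pages);
        pl->pages = next;
    }
    for (size_t i = 0; i < pl->n_big; i++)
        free(pl->big_blocks[i]);
    free(pl->big_blocks);
    pl->big_blocks = NULL;
    pl->n_big = pl->cap_big = 0;
}
```

In the common case a pool_alloc is just an add and a compare, and the per-request free is O(pages) rather than O(allocations).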
As well as reducing your malloc traffic, this improves SMP cache
coherency by keeping more of the objects allocated by a thread
together (most small objects allocated by a thread are likely to
remain local, I'd expect), helps prevent memory leaks by "garbage
collecting" everything at the end of a request (at the cost of needing
special handling for objects that need to survive beyond the end of
the request, eg that go into global data structures!), and lets you
gather interesting profile statistics by counting bytes allocated per
request or setting quotas, etc. The biggest downside is having to pass
a memory pool argument through everywhere - although you can invoke
Greenspun's Tenth Rule and implement fluid-let by putting it into a
thread-specific data key (pthread_setspecific) ;-)
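That trick, sketched with POSIX thread-specific keys: each thread stashes its current request pool under a pthread key, so deep call chains can reach it without a pool parameter threaded through every signature. current_pool and set_current_pool are illustrative names I've made up, not an existing API:

```c
/* Fluid-let via POSIX thread-specific data: a per-thread "current
   request pool" slot, reachable from anywhere without a parameter. */
#include <pthread.h>
#include <stddef.h>

static pthread_key_t pool_key;
static pthread_once_t pool_key_once = PTHREAD_ONCE_INIT;

static void make_key(void) { pthread_key_create(&pool_key, NULL); }

void set_current_pool(void *pool)
{
    pthread_once(&pool_key_once, make_key);
    pthread_setspecific(pool_key, pool);
}

void *current_pool(void)
{
    pthread_once(&pool_key_once, make_key);
    return pthread_getspecific(pool_key);  /* NULL if none installed */
}
```

The request handler does set_current_pool(&pool) on entry and set_current_pool(NULL) on exit, and everything in between allocates from current_pool().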
(As an interesting aside, a few weeks ago I dug up some old notes I'd
made on this sort of thing, albeit from the perspective of
garbage-collected pure functional languages:
Alaric Snell-Pym ACGI MIAP MBCS
Chief software engineer, GenieDB