On 12/17/2012 01:05 AM, Stewart Smith wrote:
> Ruedi Steinmann <ruedi.steinmann@stripped> writes:
>> We are planning to build an asynchronous data-warehouse (we hope to
>> achieve 0.1-1s delay). We use a MySQL database for our production system
>> and would like to use the binary log of MySQL to keep the warehouse
>> Are there any substantial changes to the BinLog envisaged which could
>> render the binlog useless for our purpose?
> While Oracle is unlikely to comment on future directions, I can provide
> a pretty well educated guess.
> 5.6 introduces some decent changes, and it's likely that the same kind
> of features may be implemented slightly differently in MariaDB (maybe
> Monty or Kristian can comment more).
> The binlog itself is notoriously hard to parse, although the Oracle guys
> did get out an API (although not marked as stable, it should be rather
> usable). The binlog was never really meant to be a point for other
> software to interface with MySQL and so like any software, when you
> delve into things that are meant to be internal, you may hit problems.
> My advice would be: you're safe if you don't upgrade, and it's a much
> better plan to use binlog as a trigger to run normal SQL queries to
> populate your data warehouse rather than using the binlog itself to
> populate it.
I cannot comment on future directions, but here is a rough outline of
how changes to the binary log are usually handled. Note that this is not
a promise that it will always be done that way, in either the short or
the long term. It's just information about what kinds of changes the
code supports. You can find more information about it in our book
MySQL High Availability
(http://shop.oreilly.com/product/9780596807290.do). If you see anything
missing, please send me a note and I'll see if I can add it to the next
revision of the book.
Each binary log has a version number that is expected to match.
Currently the version number is 4 and it has been 4 since 5.0 even
though the binary log has been changed many times. The format of
different versions of the binary log *can* be completely different and
can potentially require completely different decoders.
In version 4, events can be added and a slave that receives an unknown
event is expected to stop. There is one exception to this, and that is
/informational events/. The informational events are intended to provide
useful information that is not critical to replication, so if the slave
receives an informational event, it should be able to discard it and
continue processing events. The informational events can contain
"comments", or information that can improve performance of slaves that
Other events, with the exception of two (Format_description_log_event
and Rotate_event), can change in two ways:
1. Fields can be added to the common header, and are then added at the
end, but the events are expected to execute the same way even if only
the original fields are read.
2. Fields can be added to the event-specific header, and are then added
at the end, but the event is expected to execute the same way even if
only the original fields are read.
The format description log event is the information carrier for this.
The length of the common header as well as the length of the
event-specific headers (called the post-header) is given here.
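As a rough illustration of how a reader might pick up those lengths,
here is a minimal Python sketch. The field layout it assumes for the
body of the format description event (2-byte binlog version, 50-byte
server version string, 4-byte creation timestamp, 1-byte common header
length, then one post-header length byte per event type) is my
assumption about the version 4 layout and should be checked against the
server source. The function names are made up for the example, and the
sketch does not strip any trailing checksum bytes that newer servers
may append.

```python
import struct

SUPPORTED_BINLOG_VERSION = 4  # the only format version this sketch understands

# Assumed fixed part of the event body, little-endian:
# binlog version, server version string, creation timestamp, common header length.
FIXED_PART = struct.Struct("<H50sIB")


def parse_format_description(body):
    """Parse the body (the bytes after the common header) of a format description event."""
    if len(body) < FIXED_PART.size:
        raise ValueError("format description event is truncated")
    version, server, created, header_len = FIXED_PART.unpack_from(body, 0)
    if version != SUPPORTED_BINLOG_VERSION:
        # Other versions may need a completely different decoder, so refuse.
        raise ValueError("unsupported binlog version %d" % version)
    return {
        "binlog_version": version,
        "server_version": server.rstrip(b"\0").decode("ascii"),
        "created": created,
        "common_header_len": header_len,
        # Remaining bytes: one post-header length per event type (assumed).
        "post_header_lens": list(body[FIXED_PART.size:]),
    }


def post_header_len(info, type_code):
    """Look up the post-header length for an event type (type codes start at 1)."""
    return info["post_header_lens"][type_code - 1]
```

A decoder would call parse_format_description() once per binary log
file and keep the result around for every event that follows.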
So, when you write code to parse the binary log, you will probably have
code to read the initial N bytes of either the common header or the
post-header. If new fields are added, the initial N bytes will still
contain the same information, and it is expected that the event will be
executed the same way even if the additional bytes are ignored.
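To make that concrete, here is a minimal Python sketch of reading only
the fields you know about and skipping the rest. It assumes the version
4 common header starts with the usual 19 bytes (timestamp, type code,
server id, event length, next position, flags), and that header_len is
the common header length taken from the format description event. The
function name is made up for the example.

```python
import struct

# Fields this reader knows about, little-endian, 19 bytes in total:
# timestamp, type code, server id, event length, next position, flags.
KNOWN_HEADER = struct.Struct("<IBIIIH")


def parse_common_header(buf, header_len):
    """Decode the known leading fields and skip any bytes added after them."""
    if header_len < KNOWN_HEADER.size:
        raise ValueError("common header is shorter than the fields we know")
    if len(buf) < header_len:
        raise ValueError("buffer is shorter than the announced header length")
    ts, type_code, server_id, event_len, next_pos, flags = KNOWN_HEADER.unpack_from(buf, 0)
    fields = {
        "timestamp": ts,
        "type_code": type_code,
        "server_id": server_id,
        "event_length": event_len,
        "next_position": next_pos,
        "flags": flags,
    }
    # Bytes between KNOWN_HEADER.size and header_len are newer fields
    # that this reader does not understand; they are simply ignored.
    return fields, buf[header_len:]
```

The same pattern applies to the post-header: decode the first N bytes
you understand, then jump ahead by the announced length rather than by
the length you happened to hard-code.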
As Stewart pointed out, there is a binary log API at
https://launchpad.net/mysql-replication-listener, but it is very rough.
It does work, though it cannot parse all events.
I haven't looked at the library that Jeremy Cole wrote, so I cannot
comment on this.
Just my few cents,
Senior Principal Software Developer
Oracle, MySQL Department