I've been trying to build a message-routing framework on top of NDB. The
basic idea is to have queues of messages, which get picked up by worker
processes that transform and route each message, inserting the results
into new queues.
My design has been to have several workers for the same queue for
reliability and HA. Each worker would do something like:
* scan queue for items (with no locks)
* attempt to lock some of the items
* for each successful lock:
* process item (includes reading and writing some other data in NDB)
* delete from queue
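The loop above can be sketched as follows. This is a minimal, self-contained
simulation using an in-memory map with per-item lock owners and timestamps in
place of the NDB queue table; all names (`Item`, `worker_pass`, `kLockTimeout`)
are hypothetical, and the comments note where a real implementation would take
an exclusive row lock in NDB instead.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Simulated queue item: payload plus a lock owner and lock-acquisition time.
// In the real design these would be columns in an NDB table; an in-memory
// map stands in here so the scan/lock/process/delete flow is visible.
struct Item {
    std::string payload;
    int lock_owner = -1;                          // -1 means unlocked
    std::chrono::steady_clock::time_point locked_at{};
};

using Queue = std::map<int, Item>;                // key -> item

constexpr auto kLockTimeout = std::chrono::seconds(30);

// One pass of the worker loop: scan without locks, try to lock each
// candidate, process and delete the ones we won. Returns the number of
// items this worker processed.
int worker_pass(Queue& q, int worker_id, std::vector<std::string>& routed) {
    auto now = std::chrono::steady_clock::now();
    int processed = 0;

    // 1. Scan the queue for candidates (no locks taken during the scan).
    //    Items whose lock has timed out are candidates too -- that is the
    //    automatic-failover path.
    std::vector<int> candidates;
    for (auto& [key, item] : q) {
        bool expired = item.lock_owner != -1 &&
                       now - item.locked_at > kLockTimeout;
        if (item.lock_owner == -1 || expired)
            candidates.push_back(key);
    }

    // 2. Attempt to lock each candidate; another worker may win the race.
    for (int key : candidates) {
        auto it = q.find(key);
        if (it == q.end()) continue;              // already deleted elsewhere
        Item& item = it->second;
        if (item.lock_owner != -1 && now - item.locked_at <= kLockTimeout)
            continue;                             // lost the race
        item.lock_owner = worker_id;              // NDB: exclusive row lock
        item.locked_at = now;

        // 3. Process (transform/route), then delete from the queue.
        routed.push_back("routed:" + item.payload);
        q.erase(it);
        ++processed;
    }
    return processed;
}
```

In the real thing, steps 1-3 map onto a scan with `LM_Read`/`LM_CommittedRead`,
a per-row exclusive lock, and a delete in the same transaction as the routing
writes.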
Now each worker can proceed in parallel, excluding others from the same
message through a lock in NDB, with automatic failover after a lock
timeout. All transactions are async.
I'm only getting something like 500 messages/second of throughput
(similar results whether ndbd and the worker run on one box or each on
its own box over 100 Mbit Ethernet). 500/sec is not cutting it for our
needs.
The problem is that I'm a bit stuck: I can't find a good way to figure
out where the bottlenecks are or whether I can do anything about them. I
can't find references on the relative costs of different operations in
NDB, or profiling tools for the data node or for NDB API application
programs.
I've done basic tuning of the batch parameters to sendPollNdb and the
number of items handled in parallel, and simplified the processing task;
that got me from an initial 100 req/sec up to the 500, but I'm making
very little further progress.
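For intuition on why the batch size matters so much here: if each poll
costs one fixed network round trip plus a per-transaction cost, throughput
only approaches the per-transaction limit as batches grow. A toy model
(the constants in the test are illustrative assumptions, not NDB
measurements):

```cpp
// Toy throughput model: each sendPollNdb round trip carries a fixed
// latency overhead, amortized across all transactions prepared in the
// batch. Returns transactions per second for a given batch size.
double throughput_per_sec(int batch_size, double rtt_ms, double per_op_ms) {
    double batch_time_ms = rtt_ms + batch_size * per_op_ms;
    return batch_size * 1000.0 / batch_time_ms;
}
```

With, say, a 1 ms round trip and 0.1 ms per transaction, a batch of 1
yields ~900 tx/sec while larger batches approach (but never reach) the
10,000/sec per-op ceiling; past a point, bigger batches buy little, which
matches hitting a plateau after the initial tuning gains.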
I'm CPU bound on a single box, split pretty evenly between my
application program and the two ndbd processes. With several boxes I see
quite low CPU usage on ndbd.
Any pointers (e.g., profiling NDB internal operations)?