Hello,
Just for the record, I found the problem and overcame the contention.
The key was: innodb_thread_concurrency
Quoting the documentation:
"innodb_thread_concurrency is the variable that limits the number of
operating system threads that can run concurrently inside the InnoDB
engine. Rest of the threads have to wait in a FIFO queue for execution.
Also, threads waiting for locks are not counted in the number of
concurrently executing threads."
When the tests started, we had it set to the recommended value: 2 x number
of cores.
I also tried setting it to a very high value...
Only when I disabled the limit (set global innodb_thread_concurrency = 0;)
did the reads take off, and I was able to do 1.1M reads per second :-)
The CPU is now the bottleneck (which makes sense): it reaches around
93-94% utilization, and at that point I am not able to go over 1.1M r/s.
mutex_delay never showed up in the profile again after disabling the
thread concurrency limit.
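For anyone who wants to try the same change, here is a minimal sketch of the statements involved. It assumes a 5.5-series server (where innodb_thread_concurrency can be changed at runtime), the mysql command-line client, and an account allowed to set global variables. The queue counters mentioned in the comments are the ones printed in the ROW OPERATIONS section of the InnoDB status output.

```sql
-- Check the current limit; 0 means no limit on threads inside InnoDB.
SHOW GLOBAL VARIABLES LIKE 'innodb_thread_concurrency';

-- With a non-zero limit, the status output reports how many queries are
-- inside InnoDB and how many are waiting in the queue.
SHOW ENGINE INNODB STATUS\G

-- Remove the limit at runtime (this is the change described above).
SET GLOBAL innodb_thread_concurrency = 0;
```

SET GLOBAL does not survive a restart, so to keep the setting it would also need to go into the [mysqld] section of my.cnf as innodb_thread_concurrency = 0.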
Manuel.
2012/12/7 Manuel Arostegui <manuel@stripped>
> Hello all,
>
> I am testing HandlerSocket performance on a 5.5.28-29.1-log (Percona)
> server.
> These are the enabled options:
>
> loose_handlersocket_port = 9998
> loose_handlersocket_port_wr = 9999
> loose_handlersocket_threads = 48
> loose_handlersocket_threads_wr = 1
> innodb_spin_wait_delay=0
>
> The machine has 24 (Xeon - 2.00GHz) cores and 64GB RAM. We are using
> bonding to make sure the ethernets aren't limiting us here (we get around
> 90Mbps)
>
> We are able to handle around 500K requests per second using the
> HandlerSocket plugin. Even though that looks like a pretty impressive
> number, it's still not close to the 750K that Yoshinori is able to get
> (http://yoshinorimatsunobu.blogspot.com.es/2010/10/using-mysql-as-nosql-story-for.html)
> The machine is acting as a normal slave in a cluster - receiving normal
> traffic from our site (we do this on purpose to see how many requests we
> can handle in a normal workload environment)
>
> Obviously we're not expecting to get similar numbers as our tests aren't
> the same.
> However, while doing a bit of profiling to try to determine what the
> bottleneck is here, we've seen this:
>
> CPU: Intel Architectural Perfmon, speed 2000.26 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        image name  symbol name
> 2163868  24.2065  mysqld      mutex_delay
> 694540   7.7696   mysqld      build_template(row_prebuilt_struct*, THD*, TABLE*, unsigned int)
> 626405   7.0074   mysqld      buf_page_get_gen
> 607664   6.7978   mysqld      rec_get_offsets_func
> 480937   5.3801   mysqld      cmp_dtuple_rec_with_match
> 412081   4.6098   mysqld      btr_cur_search_to_nth_level
> 365402   4.0876   mysqld      rec_init_offsets
> 356064   3.9832   mysqld      page_cur_search_with_match
> 310819   3.4770   mysqld      row_search_for_mysql
> 274248   3.0679   mysqld      row_sel_store_mysql_rec
> 208466   2.3320   mysqld      my_pthread_fastmutex_lock
> 185853   2.0791   mysqld      ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)
> 182939   2.0465   mysqld      pfs_mutex_enter_func
> 154369   1.7269   mysqld      mtr_memo_slot_release
> 138558   1.5500   mysqld      page_check_dir
> 131622   1.4724   mysqld      dict_index_copy_types
> 101162   1.1317   mysqld      srv_conc_force_exit_innodb
> 72754    0.8139   mysqld      ha_innobase::change_active_index(unsigned int)
> 65191    0.7293   mysqld      my_long10_to_str_8bit
> 62574    0.7000   mysqld      btr_pcur_store_position
> 61900    0.6925   mysqld      pfs_mutex_exit_func
> 51889    0.5805   mysqld      Field_long::pack_length() const
> 49073    0.5490   mysqld      srv_conc_enter_innodb
> 44079    0.4931   mysqld      ha_innobase::init_table_handle_for_HANDLER()
> 38998    0.4363   mysqld      Field_tiny::pack_length() const
> 38868    0.4348   mysqld      rec_copy_prefix_to_buf
> 36292    0.4060   mysqld      ha_innobase::innobase_get_index(unsigned int)
>
> That mutex_delay is taking a large share of the time (about 24% of the
> samples).
> I have not been able to find out what it is related to. Does anyone have
> a clue what it is, and whether there is a way to reduce or avoid it?
>
> Cheers
> Manuel.
>