List: Cluster
From: Raymond Peachey
Date: October 21 2013 9:54pm
Subject: Re: Data retrieval causes query times to spike
I wanted to let you all know that adding that 3rd node helped only
slightly during the job runs; it was not actually the underlying cause of
the "MySQL has gone away" errors.

The issues were:

1) A higher number of calls were hitting the cluster than intended (i.e.,
we were not making good use of memcache). Still, I'd like to be able to
handle more load properly. Once this was fixed, there was a notable drop
in queries hitting the cluster.
2) One of the two primary SQL nodes had a faulty ethernet cable and was
running at 100 Mb/Full instead of 1000 Mb/Full. This was a primary cause of
the issue on that box (it was also the box running the job before the job
was offloaded). Now that this is fixed, the load on the two primary nodes
is much more even, and the previously faulty node can handle much more than
before. Looking at the graphs, that node was in fact bottlenecked every
time we saw "MySQL has gone away", until this fix was put into place.
3) The query most often run against the cluster was using a
WHERE IN (subquery). I had this rewritten as a JOIN, and the cluster was
able to push the join down to the data nodes, which produced another
dramatic drop in bytes transferred, as well as in API operations
performed.
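
For reference, the rewrite in point 3 looks roughly like this. The table
and column names below are hypothetical (modeled on the shapes quoted later
in the thread); the real names are not shown, and the equivalence assumes
pk_col is unique in the trigger table (otherwise the JOIN can return
duplicate rows where the IN form would not):

-- Before: IN (subquery). In MySQL Cluster 7.2 this shape is not pushed
-- down, so matching happens on the SQL node and more rows have to cross
-- the wire from the data nodes.
SELECT <column_list>
FROM db.master_table
WHERE pk_col IN (SELECT pk_col
                 FROM db.trigger_table
                 WHERE timestamp_column > 'datetime');

-- After: an equivalent JOIN. NDB 7.2's pushed-down joins can execute
-- this on the data nodes, cutting bytes transferred and API operations.
SELECT <column_list>
FROM db.master_table AS P
JOIN db.trigger_table AS CT ON P.pk_col = CT.pk_col
WHERE CT.timestamp_column > 'datetime';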

Thanks again for your assistance everyone!


On Wed, Oct 16, 2013 at 12:02 PM, Raymond Peachey <raypeachey@stripped> wrote:

> Thanks all. Unfortunately, the tuning I had done did not help, but adding
> a 3rd node to offload the change tracking jobs to did. So the two primary
> MYSQLD nodes are now serving traffic, while a weaker MYSQLD node is
> running the CT job.
>
> I am still open to any tuning suggestions (or things to look at) in order
> to combat higher load on a MySQL Cluster node. I know 7.3 has significant
> improvements in the API connections to the data nodes, so our equivalent 30
> connections per primary MYSQLD node would be better utilized. Based on the
> load testing of MySQL Cluster we did previously, this single aspect seems
> to be the change with the most impact.
>
> Anyway, thank you all for your assistance!
>
>
> On Tue, Oct 15, 2013 at 10:36 AM, Johan Andersson
> <johan@stripped> wrote:
>
>> If the tables are temporary  (if you lose the cluster the table + data
>> will be lost) you can do:
>>
>> set ndb_table_temporary=1;
>> CREATE TABLE...
>> set ndb_table_temporary=0;
>> REPLACE ...
>>
>>
>> Not sure it would gain you anything, but temporary tables are a bit
>> less expensive to create.
>>
>> Best regards
>> Johan
>> Severalnines AB
>>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 4:25 PM, Raymond Peachey
>> <raypeachey@stripped> wrote:
>>
>>> I agree; unfortunately it is not something that we can change at this
>>> point in time. One thing that's strange is that if we take the particular
>>> node that the job runs against out of the load balancer (two SQL nodes,
>>> load balanced), the job runs fine against the other node, until the load
>>> becomes a bit unbearable, at which point that second node starts
>>> experiencing "mysql has gone away" as well.
>>>
>>> Perhaps it's got something to do with the global schema lock you mention?
>>> But if that's the case, wouldn't both nodes exhibit the same exact
>>> behavior
>>> at the same times? The problem is severely lessened when all the load is
>>> pushed on to the 2nd node.
>>>
>>> Unfortunately, increasing MaxNoOfExecutionThreads did not help. I
>>> increased it from 8 to 30, to no avail. What I will be doing today is
>>> introducing a 3rd MySQLD node that is dedicated to this job, leaving the
>>> original two (much more powerful machines) dedicated to serving traffic.
>>>
>>>
>>> On Tue, Oct 15, 2013 at 1:06 AM, Hendrik Woltersdorf
>>> <hendrik.woltersdorf@stripped> wrote:
>>>
>>> > Creating a table on a cluster SQL node is an expensive operation, even
>>> > if the storage engine is not ndbcluster. The process creating the
>>> > table(s) needs to acquire the global cluster schema lock, and that
>>> > takes some time. In my experience it is better to avoid creating
>>> > (temporary) tables in time-critical processes.
>>> >
>>> >
>>> > regards
>>> > Hendrik Woltersdorf
>>> >
>>> > XCOM AG
>>> > Softwareentwicklung
>>> > Banksystem Front- und Middleoffice
>>> > Peter-Breuer-Str. 13
>>> > 08056 Zwickau
>>> > Germany
>>> >
>>> > Telefon +49 375 27008-580
>>> > Telefax +49 375 27008-25
>>> > Mobil   +49 152 22999 584
>>> > hendrik.woltersdorf@stripped
>>> > http://www.xcom.de
>>> >
>>> > Raymond Peachey wrote:
>>> >
>>> >> No, there was no ALTER TABLE done recently. The job that runs,
>>> >> however, does create tables, and does indeed do an INSERT ... SELECT
>>> >> from the change tracking table and the master tables into the result
>>> >> tables. However, NDB is READ COMMITTED with row-based logging, so all
>>> >> reads should be consistent, non-locking reads (correct me if I'm
>>> >> wrong; perhaps it's different with NDB?).
>>> >>
>>> >> We are currently running version: 5.5.29-ndb-7.2.10.
>>> >>
>>> >> There are a lot of queries that query against INFORMATION_SCHEMA to
>>> >> determine primary keys etc. for the change tracking tables. The main
>>> >> culprit here is probably the query that joins the trigger tables to
>>> >> the master tables per job run:
>>> >>
>>> >> REPLACE INTO result_table
>>> >> SELECT <column_list>
>>> >> FROM db.master_table AS P LEFT OUTER JOIN db.trigger_table AS CT
>>> >>   ON P.pk_col = CT.pk_col
>>> >> WHERE timestamp_column > 'datetime';
>>> >>
>>> >> The number of records varies, but we have these jobs running every 5
>>> >> minutes, so it is typically only ~500 records per table, in this case
>>> >> a total of about ~5000 per job run. There is also a little parallelism
>>> >> going on, in that there are actually two jobs running simultaneously
>>> >> (against two separate databases) that make up the 5000 records
>>> >> processed.
>>> >>
>>> >> I could try removing the .frm files and see if that helps.
>>> >> Additionally, I will look into increasing the execution threads today
>>> >> to see if that helps.
>>> >>
>>> >>
>>> >> On Thu, Oct 10, 2013 at 1:32 PM, Johan Andersson
>>> >> <johan@stripped> wrote:
>>> >>
>>> >>> Hi,
>>> >>> Did you recently do an ALTER TABLE on TABLE_XYZ, and is it the same
>>> >>> table you run the query on?
>>> >>>
>>> >>> Not sure what that error message means: "NDB online alter table
>>> >>> commit: distributing TABLE_XYZ timed out. Ignoring..."
>>> >>> But usually when I have problems like this, I stop the mysql server,
>>> >>> remove all the .frm files belonging to NDB tables, and start the
>>> >>> server again. This way the server gets fresh table metadata from the
>>> >>> dictionary in the data nodes.
>>> >>>
>>> >>> Run the query again, does it work?
>>> >>>
>>> >>> What does the query look like? Is the query trying to acquire locks?
>>> >>>
>>> >>> If you are not locking any records, then the 233 error is super
>>> >>> strange (have you upgraded to the latest 7.2.12?).
>>> >>> How many records are you trying to read?
>>> >>>
>>> >>> Best regards
>>> >>> johan
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Thu, Oct 10, 2013 at 6:47 PM, Raymond Peachey
>>> >>> <raypeachey@stripped> wrote:
>>> >>>
>>> >>>> In the error log, I see this:
>>> >>>> 131009 20:02:53 [ERROR] NDB online alter table commit: distributing
>>> >>>> TABLE_XYZ timed out. Ignoring...
>>> >>>> 131009 20:03:52 [ERROR] Got error 233 when reading table
>>> >>>>
>>> >>>> I see the first error a lot -- the second error I see specifically
>>> >>>> during the trouble periods. Error 233, I think, has to do with
>>> >>>> reaching a maximum limit of concurrent operations.
>>> >>>>
>>> >>>> We do not have TransactionDeadlockDetectionTimeout defined, which
>>> >>>> means it should be at its default of 1200 ms.
>>> >>>>
>>> >>>> SHOW GLOBAL VARIABLES LIKE '%timeout%' ;
>>> >>>> # Variable_name, Value
>>> >>>> connect_timeout, 10
>>> >>>> delayed_insert_timeout, 300
>>> >>>> innodb_lock_wait_timeout, 50
>>> >>>> innodb_rollback_on_timeout, OFF
>>> >>>> interactive_timeout, 28800
>>> >>>> lock_wait_timeout, 31536000
>>> >>>> net_read_timeout, 30
>>> >>>> net_write_timeout, 60
>>> >>>> slave_net_timeout, 3600
>>> >>>> wait_timeout, 28800
>>> >>>>
>>> >>>> Additionally, here are other settings we have configured that I
>>> >>>> think may help:
>>> >>>>
>>> >>>> MaxNoOfConcurrentOperations=1000000
>>> >>>> MaxNoOfConcurrentTransactions=16384
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Thu, Oct 10, 2013 at 11:38 AM, Johan Andersson
>>> >>>> <johan@stripped> wrote:
>>> >>>> --
>>> >>>> MySQL Cluster Mailing List
>>> >>>> For list archives: http://lists.mysql.com/cluster
>>> >>>> To unsubscribe:    http://lists.mysql.com/cluster
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>>
>>> >
>>> >
>>> >
>>>
>>
>>
>
