Hi,
Not sure i got all of that last answer, but i don't believe it was aimed
at me ;)
On a side note, we have a "direct tcp connection" between the two data nodes
[TCP]
NodeId1=3
NodeId2=4
HostName1=192.168.168.1
HostName2=192.168.168.2
Would it be possible to "add" the two api nodes to this direct tcp
connection? Something like
[TCP]
NodeId1=3
NodeId2=4
NodeId3=7
NodeId4=8
HostName1=192.168.168.1
HostName2=192.168.168.2
HostName3=192.168.168.1
HostName4=192.168.168.2
Would that work? It would then cut down the latency times i guess
BR
Mathieu
Mikael Ronstrom wrote:
> Hi,
>
> Vivid Design wrote:
>> Hi guys,
>> Thanks Frazer for the clarification, the distinction between primary and
>> backup fragmentation replicas was something I was unaware of.
>>
>> Perhaps a nice solution for NDBD node clusters with co-located
>> application
>> nodes would be to have a register of primary fragments on each data
>> node and
>> then have a proxy, such as MySQL Proxy, decide which node in the
>> cluster the
>> request should be sent to in order to get the best performance - which
>> presumably would be the application node with the most relevant number
>> of co-located NDBD primary fragments for the particular request.
>>
>
> The data node all have a register of where each primary and backup
> replica resides. The problem is to avoid deadlocks we always read the
> primary replica in transactions. However most reads from SQL uses
> READ COMMITTED where no locks are involved so in this case this isn't
> really necessary, so this optimisation I think can be done from the
> API level. The idea would be to start a new transaction but using the
> same transaction identity, in this case it is possible to send the
> read to any node and since the transaction identity is the same it
> will read the latest committed value or any updated value of the
> ongoing transaction. To do this would require some minor changes to
> the NDB API and to use it from the SQL level would in addition require
> some changes in the handler to use this new interface.
>
> It is also possible to improve inside transactions but this would
> require the commit phase to coordinate between several transaction
> coordinators.
>
> The API already have the knowledge to "guestimate" where it is most
> efficient to send the query and it also does this if told to do so,
> however it only does this per transaction.
>
> Rgrds Mikael
>
>> Is this feasible and/or practical?
>>
>> Cheers,
>>
>>
>>
>> 2009/10/6 Frazer Clement <Frazer.Clement@stripped>
>>
>>> Mathieu Agopian wrote:
>>>
>>>> Actually, to be more precise (having a "mytop" on both mysqld
>>>> servers):
>>>> As soon as the mysqld2 is started, the mytop on the mysqld2 starts
>>>> showing
>>>> very slow SELECTs, but not the mysqld1!
>>>>
>>>> This is really puzzling me
>>>>
>>> Hi,
>>> Hope I can clarify :
>>> The MySQL server generally processes SQL queries which join tables by
>>> using values from one table to lookup rows in another. With Ndb
>>> cluster,
>>> this can require many separate requests to the NDBD data nodes. As the
>>> requests are dependent on each other, in some cases the total time
>>> taken is
>>> proportional to number_of_requests * latency_between_MySQLD_and_NDBD.
>>>
>>> MySQL Cluster splits tables into fragments, each fragment contains
>>> some of
>>> the rows from the table. Between them, the fragments of a table
>>> contain all
>>> of it's rows.
>>> With NoOfReplicas = 2, there will be 2 copies (or replicas) of each
>>> table
>>> fragment. Each fragment replica is stored on a different NDBD node to
>>> tolerate NDBD node failures.
>>> Not all fragment replicas are equal, NDBD considers one replica to be
>>> 'primary' and the other(s) to be 'backup'. *To ensure consistency
>>> in the
>>> presence of other transactions, often the primary fragment replica
>>> must be
>>> read rather than a backup fragment replica.*.
>>> The choice of which fragment replicas are made primary, and which
>>> backup,
>>> is *made internally by MySQL Cluster when the table is defined*.
>>> Generally,
>>> it tries to balance the system so that each NDBD node has a similar
>>> number
>>> of primary fragments. This avoids system hotspots, under the
>>> assumption
>>> that no particular NDBD node is 'closer' to the API node(s) than
>>> another.
>>> When one NDBD node in a nodegroup fails, as part of failover, the
>>> other
>>> node(s) in the nodegroup switch their some (all with NoOfReplicas=2) of
>>> their backup fragment replicas to become primary.
>>>
>>> With your setup, you have 2 hosts, each with 1 MySQLD and 1 NDBD
>>> node and
>>> NoOfReplicas=2.
>>> When both NDBD nodes are running, roughly half of your table's
>>> fragment
>>> replicas will be Primary on one NDBD and half will be Primary on the
>>> other.
>>>
>>> The data which is primary on NDBD1 is 'close' to MySQLD1 as they
>>> share the
>>> same host and can communicate with high bandwidth and low latency
>>> through
>>> memory. The data which is primary on NDBD2 is not so 'close' to
>>> MySQLD1 as
>>> the network must be used. The same argument applies to MySQLD2
>>> accessing
>>> NDBD1.
>>>
>>> If you cause NDBD2 to fail, then all fragment replicas on NDBD1 will
>>> become primary, and they will all be 'close' to MySQLD1. This may
>>> result in
>>> the speedup you observe. However, now MySQLD2 will be at a
>>> disadvantage as
>>> no data is 'close' to it. It will see slower performance than when
>>> some of
>>> the primary fragments were 'close'.
>>>
>>> Summarising :
>>> For joins, MySQL Cluster is sensitive to latency between the executing
>>> MySQLD and NDBD nodes holding the relevant data.
>>> All fragment replicas primary on NDBD1 -> MySQLD1 is local, MySQLD2 is
>>> remote
>>> All fragment replicas primary on NDBD2 -> MySQLD1 is remote,
>>> MySQLD2 is
>>> local
>>> Fragment replicas primary on both NDBDs -> MySQLD1 + MySQLD2 are mixed
>>> local/remote
>>>
>>> Perhaps we should support a special case for two NDBD node clusters
>>> with
>>> co-located application nodes like yours, where we can 'bias' the
>>> system to
>>> have primary fragments on one or other NDBD node, so that at least
>>> one API
>>> gets the best performance possible. However, this would make an
>>> asymmetric
>>> system, and your application would have to deal with choosing the
>>> 'correct'
>>> MySQLD to get best performance. Also, if the favoured MySQLD failed
>>> for
>>> some reason, you'd get worst-case performance.
>>>
>>> Does this explain what you are observing?
>>> Frazer
>>>
>>>
>>>> Mathieu
>>>>
>>>> Mathieu Agopian wrote:
>>>>
>>>>> Hi Andrew again,
>>>>>
>>>>> I understand better now, even though it feels strange to have 2
>>>>> partitions when i'm only asking for 2 replicas (i thought that
>>>>> meant "mirror
>>>>> the data on the two data nodes"). Do you mean that it actually
>>>>> splits the
>>>>> data on the two nodes?
>>>>>
>>>>> Anyway, after some further "playing" with the cluster, here's some
>>>>> new
>>>>> data:
>>>>>
>>>>> node 1 and 2 are up (ndb_mgmd)
>>>>> node 3 (ndbd1) is up
>>>>> node 7 (mysqld1) is up
>>>>>
>>>>> => everything is ok
>>>>>
>>>>> Start node 8 (mysqld2) => very very slow queries again
>>>>>
>>>>> So it seems the issue is not coming from the data nodes themselves
>>>>> (or
>>>>> their replication) as there's only one up. It's really as soon as
>>>>> i start
>>>>> the second mysqld that everything goes wrong.
>>>>>
>>>>> Any further clues?
>>>>>
>>>>> Thanks again for your time and answers
>>>>>
>>>>> BR
>>>>>
>>>>> Mathieu
>>>>>
>>>>> Andrew Morgan wrote:
>>>>>
>>>>>> Hi Mathieu,
>>>>>>
>>>>>> there are others more qualified to explain exactly why your
>>>>>> specific
>>>>>> query
>>>>>> is having performance issues and how you might be able to
>>>>>> optimise it
>>>>>> for
>>>>>> Cluster.
>>>>>>
>>>>>> I'll comment on why you only see the performance hit when both
> data
>>>>>> nodes
>>>>>> are up and running.
>>>>>>
>>>>>> When you only have a single data node, both of the data
>>>>>> partitions will
>>>>>> be
>>>>>> active on that node - the query can therefore run within that
>>>>>> data node
>>>>>> and
>>>>>> it can be quick as all of the data is local (especially if
> you've
>>>>>> configured
>>>>>> it to have all data in memory). When the second data node is up
> and
>>>>>> running,
>>>>>> each data node in the node group is active for one of the 2
>>>>>> partitions
>>>>>> which
>>>>>> means that the data needed to satisfy your query is split across
>
>>>>>> both
>>>>>> data
>>>>>> nodes and so more network hops are needed to satisfy it and so it
>
>>>>>> takes
>>>>>> longer.
>>>>>>
>>>>>>
>>>>>> Regards, Andrew.
>>>>>>
>>>>>>
>>>>>> P.S. yes, reply-all is the best way to follow up; it’s good
> that the
>>>>>> whole
>>>>>> thread is visible to everyone on the receiver list.
>>>>>>
>>>>>> On 10/06/2009 11:12 AM, Mathieu Agopian wrote:
>>>>>> Hi, and thanks for the very fast replies!
>>>>>> We do acknowledge that the database isn't build for mysql
>>>>>> cluster. And
>>>>>> in
>>>>>> fact, we did get used to rather slow response times (sometimes up
>
>>>>>> to 5
>>>>>> seconds on some of those very slow queries).
>>>>>> However, in this particular case, it's working "as expected"
> when
>>>>>> there's
>>>>>> only one data node up, but not when there's the two nodes up. Is
>
>>>>>> that
>>>>>> "normal"?
>>>>>> We've been using a "one leg" ndb cluster for over a year now
> (other
>>>>>> issues
>>>>>> prevented the second node from coming up... partially because we
>
>>>>>> were
>>>>>> using
>>>>>> a beta version at that time). Now that we have the cluster
> properly
>>>>>> setup on
>>>>>> a 7.0.7 GA version, we'd rather have it working fully with ndb
> if
>>>>>> possible. So if it's not "normal" to have those _very_ slow
>>>>>> response times
>>>>>> only when
>>>>>> the two data nodes are up, is there anything i could check?
>>>>>> configuration
>>>>>> parameters, specific log messages...
>>>>>> I'll also be checking the links Andrew gave me, thanks a lot ;)
>>>>>> Mathieu
>>>>>> P.S: how does this list work, should i hit the "reply to all"
>>>>>> button on
>>>>>> my
>>>>>> mailer each time?
>>>>>> Andrew Morgan wrote:
>>>>>>
>>>>>>
>>>>>> Hi Mathieu,
>>>>>> A couple of other 'compromise' approaches:
>>>>>> - You can mix and match MySQL Cluster with other storage
>>>>>> engines; for
>>>>>> some applications it may make sense to use Cluster for some
>>>>>> tables and
>>>>>> MyISAM or InnoDB for others. - If it does make sense for most
>>>>>> of your
>>>>>> application to use Cluster for these tables but there are
>>>>>> particular use
>>>>>> cases where you need to perform
>>>>>>
>>>>>> the type of query where the distributed nature of Cluster makes
> it
>>>>>> less than ideal then you can choose to replicate the Cluster
>>>>>> data to
>>>>>> another MySQL Server where these tables are held using a
>>>>>> different
>>>>>> storage engine - a couple of links on this topic:
>>>>>>
> http://www.clusterdb.com/mysql-cluster/mysql-cluster-flexibility-of-replicat
>>>>>>
>>>>>>
>>>>>> ion/
>>>>>>
> http://johanandersson.blogspot.com/2009/05/ha-mysql-write-scaling-using-clus
>>>>>>
>>>>>>
>>>>>> ter-to.html Regards, Andrew.
>>>>>> -----Original Message----- From: Geert.Vanderkelen@stripped
> [mailto:
>>>>>> Geert.Vanderkelen@stripped] Sent: 06 October 2009 10:44 To:
> Mathieu
>>>>>> Agopian Cc: cluster@stripped Subject: Re: very slow
> select
>>>>>> queries only when havint both data nodes up
>>>>>> Mathieu,
>>>>>> On Oct 6, 2009, at 11:37 , Mathieu Agopian wrote: .. mysql>
>
>>>>>> SELECT
>>>>>> Hotspots.id, Hotspots.`Name`, Hotspots.`SSID`,
> Hotspots.`Address`,
>>>>>> Hotspots.`City`, Country.`Name` AS `Country`,
> Hotspots.`PostalCode`,
>>>>>> LocationTypes.`MarketingName` AS `LocationType`,
> Hotspots.`GeoLat`,
>>>>>> Hotspots.`GeoLng`, Hotspots.`Description`,
>>>>>> InboundPartner.`PricePerMinuteRetail` AS `PricePerMinute` FROM
>>>>>> Hotspots,
>>>>>> InboundPartner, LocationTypes, Country WHERE
>>>>>> (Hotspots.InboundPartnerID =
>>>>>> InboundPartner.id) AND Hotspots.`LocationType` =
>>>>>> `LocationTypes`.`id` AND
>>>>>> Hotspots.`Country` = Country.`id` AND (Hotspots.`Country` =
>>>>>> '276') AND
>>>>>> (Hotspots.`City` = 'Gersthofen') AND (Hotspots.`Show` = 'yes')
> AND
>>>>>> (Hotspots.Deleted = 'no') ORDER BY Hotspots.Country,
> Hotspots.City,
>>>>>> Hotspots.`PostalCode`; // One pointer we have at the
>>>>>> moment: the
>>>>>> "Description" field is a "text" field with no length (is that a
>>>>>> blob? a
>>>>>> memo?), and if we take it out of the query, it seems to be
>>>>>> executing much
>>>>>> faster. Yes, that's a 'blob', and blob is best not used in
>>>>>> Cluster,
>>>>>> especially with joins (which is sometimes the case for other
>>>>>> storage engines
>>>>>> too). Maybe the best thing to start is to get the join reduced to
>
>>>>>> minimum
>>>>>> data needed.
>>>>>> That application wasn't build for MySQL Cluster, right? Just
>>>>>> ALTER ..
>>>>>> ENGINE=NDB will in most cases not work out well.
>>>>>> Cheers,
>>>>>> Geert
>>>>>>
>>>>>>
>>>> --
>>>> MySQL Cluster Mailing List
>>>> For list archives: http://lists.mysql.com/cluster
>>>> To unsubscribe:
>>>> http://lists.mysql.com/cluster?unsub=1
>>>>
>>>>
>>> --
>>> Frazer Clement, Software Engineer, MySQL Cluster Sun Microsystems -
>>> www.mysql.com
>>> Office: Edinburgh, UK
>>>
>>> Are you MySQL certified? www.mysql.com/certification
>>>
>>>
>>>
>>> --
>>> MySQL Cluster Mailing List
>>> For list archives: http://lists.mysql.com/cluster
>>> To unsubscribe:
>>> http://lists.mysql.com/cluster?unsub=1
>>>