From: Craig L Russell Date: November 27 2012 6:48pm Subject: Re: Problem with multithreading and ClusterJ List-Archive: http://lists.mysql.com/cluster/8442 Message-Id: <40C86A8C-6654-4A8D-86FE-A84EF068F0F4@oracle.com> MIME-Version: 1.0 (Apple Message framework v936) Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes Content-Transfer-Encoding: quoted-printable Hi Graeme, I'm curious to know whether you are guaranteed to create unique =20 primary keys with your test. Are you sure that the lock wait timeout =20 is actually an error? I don't have much to go on here, but you might be thinking that in =20 case of a duplicate, there will be no conflict because the second =20 "write" to arrive will just overwrite the first. But in fact, the =20 first session to write the key will hold a lock on the row (whether it =20= exists or not) until the transaction commits. The second session to =20 write the key will wait until the first transaction commits. If both transactions start, and the first transaction writes key 1, =20 and the second transaction writes key 2, and then the first =20 transaction tries to write key 2, and the second transaction tries to =20= write key 1, you have a real deadlock. If the internal queue from which threads pull requests is generating =20 unique keys, then we have some other problem that isn't obvious. Another thing that occurs to me is that for better performance, you =20 might want to try to restrict a session to insert only rows that =20 belong to a specific partition. That is, calculate the partition key =20 for the row and then delegate the write operation to the thread =20 (session) that is already operating in that partition. Craig On Nov 27, 2012, at 6:37 AM, Graeme Wallace wrote: > Magnus, > > Here's the schema - > > CREATE TABLE `linkedavailableroute2` ( > `routeKey1` bigint(20) NOT NULL, > `routeKey2` bigint(20) NOT NULL, > `routeKey3` bigint(20) NOT NULL, > `pointOfSaleCountryCode` char(3) DEFAULT NULL, > `possibleDupeJourney` char(1) DEFAULT NULL, > `pointOfCommencementCityCode` char(3) DEFAULT NULL, > `destinationCityCode` char(3) DEFAULT NULL, > `airportCodes` varchar(24) DEFAULT NULL, > `lastModifiedDate` int(11) DEFAULT '0', > `lastModifiedTime` int(11) DEFAULT '0', > `availability` char(26) NOT NULL, > `travelDateOffsets` smallint(6) DEFAULT '0', > PRIMARY KEY (`routeKey1`,`routeKey2`,`routeKey3`), > KEY `lastModifiedDateIndex` (`lastModifiedDate`) > ) ENGINE=3Dndbcluster DEFAULT CHARSET=3Dlatin1 > > and the Java Interface > > @PersistenceCapable(table=3D"linkedavailableroute2") > public interface Availability { > > @Column(name=3D"routeKey1") > long getRouteKey1(); > void setRouteKey1(long routeKey1); > > @Column(name=3D"routeKey2") > long getRouteKey2(); > void setRouteKey2(long routeKey2); > > @Column(name=3D"routeKey3") > long getRouteKey3(); > void setRouteKey3(long routeKey3); > > @Column(name=3D"pointOfSaleCountryCode") > String getPointOfSaleCountryCode(); > void setPointOfSaleCountryCode(String pointOfSaleCountryCode); > > @Column(name=3D"possibleDupeJourney") > String getPossibleDupeJourney(); > void setPossibleDupeJourney(String possibleDupeJourney); > > @Column(name=3D"pointOfCommencementCityCode") > String getPointOfCommencementCityCode(); > void setPointOfCommencementCityCode(String =20 > pointOfCommencementCityCode); > > @Column(name=3D"destinationCityCode") > String getDestinationCityCode(); > void setDestinationCityCode(String destinationCityCode); > > @Column(name=3D"airportCodes") > String getAirportCodes(); > void setAirportCodes(String airportCodes); > > @Column(name=3D"lastModifiedDate") > int getLastModifiedDate(); > void setLastModifiedDate(int lastModifiedDate); > > @Column(name=3D"lastModifiedTime") > int getLastModifiedTime(); > void setLastModifiedTime(int lastModifiedTime); > > @Column(name=3D"availability") > String getAvailability(); > void setAvailability(String availability); > > @Column(name=3D"travelDateOffsets") > short getTravelDateOffsets(); > void setTravelDateOffsets(short travelDateOffsets); > > } > > And the run() method from my Runnable implementation - (i have a > ExecutorService with a configurable number of threads that pulls =20 > these off > its internal queue) > > public void run() { > > Session session =3D > ThreadContext.getInstance().getSession().get(); > Availability availability =3D > ThreadContext.getInstance().getAvailability().get(); > Integer persistedCount =3D > ThreadContext.getInstance().getPersistedCount().get(); > > if (session =3D=3D null) { > System.out.println("Thread " + > Thread.currentThread().getName() + " creating session"); > session =3D > ThreadContext.getInstance().getSessionFactory().getSession(); > session.currentTransaction().begin(); > > ThreadContext.getInstance().getSession().set(session); > > availability =3D = session.newInstance(Availability.class); > > > ThreadContext.getInstance().getAvailability().set(availability); > > persistedCount =3D 0; > } > > > availability=20 > .setRouteKey1(AvailabilityHelper.generateRouteKey(_originCityCode, > _destinationCityCode, _carrierCode, _travelDate)); > availability.setRouteKey2(_flightNumber); > availability.setRouteKey3(0); > > availability.setAirportCodes(_airportCodes); > availability.setAvailability(_availability); > availability.setDestinationCityCode(_destinationCityCode); > availability.setLastModifiedDate(_lastModifiedDate); > availability.setLastModifiedTime(_lastModifiedTime); > =20 > availability.setPointOfSaleCountryCode(_pointOfSaleCountryCode); > availability.setPossibleDupeJourney(_possibleDupeJourney); > > availability=20 > .setPointOfCommencementCityCode(_pointOfCommencementCityCode); > availability.setTravelDateOffsets(_travelDateOffsets); > > session.savePersistent(availability); > > > ThreadContext.getInstance().getPersistedCount().set(++persistedCount); > > if (persistedCount % TRANSACTION_LIMIT =3D=3D 0) { > System.out.println("Thread " + > Thread.currentThread().getName() + " committing " + persistedCount); > session.currentTransaction().commit(); > session.currentTransaction().begin(); > } > > } > > The ThreadContext is just a singleton wrapper around some =20 > ThreadLocal's to > store Session and Availability objects. SessionFactory is an instance > variable. > > > > On Tue, Nov 27, 2012 at 6:12 AM, Magnus Bl=E5udd = >wrote: > >> Hi Graeme, >> >> Normally MySQL Cluster is able to cope with very high read and =20 >> write load, >> but unfortunately something is not right. Do you think you could =20 >> show us >> schema and the code you use? >> >> Quickly looked in SaveTest.java which shows how to use =20 >> savePersistent() >> and savePersistentAll(), I assume you use the latter? Could serve =20 >> as a >> starting point for a reproducable public test case. >> >> >> / Magnus >> >> >> On 11/26/2012 06:49 PM, Graeme Wallace wrote: >> >>> Ok tried reducing transaction size down to 4K rows. Works for 2 =20 >>> threads, >>> but its slower than doing one thread with larger transactions. >>> >>> If i up the threads to 4, i get the same error as before >>> >>> Nov 26, 2012 12:44:24 PM com.mysql.clusterj.tie.Utility throwError >>> SEVERE: Error in NdbJTie: returnCode -1, code 266, mysqlCode 146, =20= >>> status >>> 1, classification 10, message Time-out in NDB, probably caused by >>> deadlock . >>> >>> >>> Am I just going about this wrong ? I assumed that as i have a =20 >>> cluster of >>> 10 datanodes i would be able to have lots of threads and therefore =20= >>> lots >>> of writes going on at the same time. (Caveat - i read the marketing >>> material for 1 billion reads and writes a second :) ) >>> >>> regards, >>> >>> >>> >>> Graeme >>> >>> >>> >>> On Mon, Nov 26, 2012 at 8:44 AM, Magnus Bl=E5udd = >> >> =20 >>> wrote: >>> >>> On Mon 26 Nov 2012 04:33:57 PM CET, Graeme Wallace wrote: >>> >>> I've tried playing around with transaction size, but the =20 >>> error >>> still >>> remains even if I make the transaction small ie 32K rows =20 >>> AND i >>> make all the >>> primary keys unique - so that each transaction should have a >>> unique set of >>> keys. >>> >>> G. >>> >>> >>> On Mon, Nov 26, 2012 at 8:30 AM, Magnus Bl=E5udd >>> >> >>>>> __wrote: >>> >>> On Mon 26 Nov 2012 04:09:42 PM CET, Graeme Wallace wrote: >>> >>> I'm trying to make my app multi-threaded and =20 >>> running into >>> problems. >>> >>> I have a Session local to each thread that is =20 >>> attempting >>> to write to the >>> db. I'm batching up many savePersistent() calls in a >>> transaction - but >>> when >>> the transaction commits I invariably end up with >>> >>> Nov 21, 2012 7:22:29 PM =20 >>> com.mysql.clusterj.tie.Utility >>> throwError >>> SEVERE: Error in NdbJTie: returnCode -1, code 266, >>> mysqlCode 146, status >>> 1, >>> classification 10, message Time-out in NDB, probably >>> caused by deadlock . >>> >>> For reading the docs, it looks like there shouldn't =20= >>> be >>> table locking going >>> on - so i dont understand what resource is being =20 >>> held by >>> one thread that >>> stops the others from being able to write at the same >>> time. >>> >>> Any clues would be most helpful, >>> >>> regards, >>> >>> >>> Graeme >>> >>> >>> Check out Matthew's reply in in this forum thread >>> = http://forums.mysql.com/read.***__*php?25,505712,505757>> > >>> = >> > >>>> = >>> > >>> = >> > >>>>> >>> >>> "Due to the distributed nature of NDBCLUSTER there is no >>> automatic >>> deadlock detection mechanism within the NDBCLUSTER =20 >>> engine. >>> The only >>> indication that the cluster gives that there is a =20 >>> *possible* >>> deadlock is >>> the "Lock wait timeout exceeded". This error occurs =20 >>> when a >>> given lock >>> operation takes longer than >>> TransactionDeadlockDetectionTi**__**meout >>> (default 1200 ms.). If you have large transactions that =20= >>> lock >>> many rows at >>> once or if you have long running transactions these can >>> prevent this or >>> transactions from completing quickly. Try to avoid =20 >>> large or >>> long running >>> transactions by breaking them into smaller chunks. The >>> TransactionDeadlockDetectionTi**__**meout can also be =20 >>> reached >>> when a node is >>> overloaded but has not yet been ejected from the =20 >>> cluster for >>> missing >>> heartbeats longer than 4xHeartbeatIntervalDbDb (default =20= >>> 1500 >>> ms. each). " >>> >>> / Magnus >>> >>> >>> >>> >>> >>> Give it a try with even smaller transactions if possible. =20 >>> Normally >>> better to use more threads with smaller transactions. >>> >>> / Magnus >>> >>> >>> >>> >>> -- >>> Graeme Wallace >>> CTO >>> FareCompare.com >>> O: 972 588 1414 >>> M: 214 681 9018 >>> >>> >>> >> > > > --=20 > Graeme Wallace > CTO > FareCompare.com > O: 972 588 1414 > M: 214 681 9018 Craig L Russell Architect, Oracle http://db.apache.org/jdo 408 276-5638 mailto:Craig.Russell@stripped P.S. A good JDO? O, Gasp!