List:General Discussion« Previous MessageNext Message »
From:Baron Schwartz Date:June 28 2010 8:20pm
Subject:Re: selecting data for computation
View as plain text  
Tim,

What you have described is really a queue, although you are not seeing
it from that angle.  This is a very common pattern in this kind of
usage.  It's also tricky to make it correct (as you have discovered),
and both performance and scalability are problems in most
implementations.

A general-case good design for this is to have status fields, as
suggested, and who-owns-this-row fields.  Set the rows initially to
un-owned (NULL), and status 'new' or similar.  Then use a query such
as

update <table> set status = 'claimed', owner=connection_id() where
status = 'new' and owner is null limit 1;

Do that with autocommit enabled, so it does not hold locks longer than
needed.  (You will certainly need to use InnoDB for this to work.)  If
the statement affected any rows, then you just claimed a row, and you
can go query for the row and do the work, then mark it done.  As
previously suggested, add a timestamp column and periodically look for
rows that got claimed but not processed within some amount of time,
due to crashes or bugs or what have you.  Set those back to
new/unclaimed state.

After completing the jobs, move them to another table or just delete
them.  Do not let this table grow full of historic data.  It will
become a big performance problem if you do.

- Baron
Thread
selecting data for computationTim Gustafson24 Jun
  • Re: selecting data for computationDan Nelson24 Jun
    • Re: selecting data for computationTim Gustafson25 Jun
      • Re: selecting data for computationTim Gustafson25 Jun
        • Re: selecting data for computationDan Nelson25 Jun
          • Re: selecting data for computationTim Gustafson26 Jun
            • Re: selecting data for computationBaron Schwartz28 Jun
              • Re: selecting data for computationTim Gustafson1 Jul
      • Re: selecting data for computationJoerg Bruehe25 Jun