List:General Discussion« Previous MessageNext Message »
From:Rick James Date:May 24 2013 4:17pm
Subject:RE: Bug in BETWEEN same DATETIME
View as plain text  
For years (even decades), I have stayed out of trouble by assuming a 'date' represents the
instant in time corresponding to midnight at the start of that day.  In MySQL (until 5.6),
that is equivalent to a 1-second DATETIME.  I also assume midnight belongs to the day that
it is the _start_ of.

> There is also a strong desire to make a database server try to do exactly what the
> user tells it to do.
That is difficult, because of definitions and representation.
A common problem is comparing a FLOAT value to a 'decimal' value like 1.23.  MySQL does a
good job of covering some cases, but there are still cases between DECIMAL, FLOAT,
DOUBLE, and literals that will register as inequality, to the surprise of the user.
I see the DATE problem as another thing where the user needs to understand the computer's
algorithm, which, as Shawn points out is:

"We do one thing (make dates represent midnight on that date when they need to be compared
to datetime values) and allow the users to decide how to handle the rest of the comparison
according to their specific needs."

>>    WHERE datetime_col >= '2013-01-01'
>>      AND datetime_col <  '2013-01-01' + INTERVAL 1 DAY
>but for predictability and reliability, this is one rewrite that may not always be
> true.
So, to be safe, one should perhaps say:
      WHERE datetime_col >= '2013-01-01 00:00:00'
        AND datetime_col <  '2013-01-01 00:00:00' + INTERVAL 1 DAY

IN_DATE (or maybe "ON_DAY") is an interesting idea.  I assume it would be transliterated
by the parser into something like the expression above, then optimized based on which
part(s) are columns and which are literals.

>         '2013-05-14 17:00:00' >= '2013-01-01'
>     AND '2013-05-14 17:00:00' <= '2013-05-14' + INTERVAL 12 HOUR
There's an extra second in that!  (I call it the "midnight bug".)

I perceive (rightly or wrongly) that comparing a TIMESTAMP to something first converts the
TIMESTAMP value to a string ('2013-...').  Shawn, perhaps this statement belongs as part
of the 'algorithm' explanation?

Yes, you might get in trouble if the same SELECT were run in two different timezones at
the same time.  Or, TIMESTAMP might help you get the "right" answer.

There are something like 5 different "datetime" concepts.  MySQL covers 2 of them.
    DATETIME is a picture of _your_ clock.
    TIMESTAMP is an instant in the _universe_.
For these, and others, think of a recurring event on a calendar, a sporting event, an
appointment (potentially in a diff timezone), train schedule, etc.

> -----Original Message-----
> From: shawn green [mailto:shawn.l.green@stripped]
> Sent: Friday, May 24, 2013 6:50 AM
> To: mysql@stripped
> Subject: Re: Bug in BETWEEN same DATETIME
> 
> Hello Rick,
> 
> On 5/23/2013 7:08 PM, Rick James wrote:
> > Watch out for CAST(), DATE(), and any other function.  In a WHERE
> clause, if you hide an indexed column inside a function, the index
> cannot be used for optimization.
> >
> >    INDEX(datetime_col)
> >    ...
> >    WHERE DATE(datetime_col) = '2013-01-01'
> > will not use the index!
> >
> > The workaround is messy, but worth it (for performance):
> >    WHERE datetime_col >= '2013-01-01'
> >      AND datetime_col <  '2013-01-01' + INTERVAL 1 DAY (or any of a
> > zillion variants)
> >
> > (Yeah, it seems like the optimizer could do the obvious
> transformation
> > for you.  Hint, hint, Shawn.)
> >
> 
> Or we could coerce datetime values back to their date values when both
> are being used. The trick now becomes choosing between rounding the
> datetime value (times past noon round to the next date) or do we use
> the
> floor() function all the time.
> 
> This has been discussed and the consensus was that the most predictable
> and performant behavior was to extend a date value to become a datetime
> value by associating it with midnight (0000).
> 
> Let's look at some examples:
> a)   '2013-05-14 07:00:00' = '2013-05-14'
> This is true as the datetime value is 7 hours after midnight.
> 
> b)    '2013-05-14 07:00:00' >= '2013-05-14'
>    AND '2013-05-14 07:00:00' < '2013-05-15'
> This is true as the time value is somewhen between both midnights.
> 
> c)    '2013-05-14 07:00:00' > '2013-05-14' + INTERVAL 8 HOURS
> This is false. The offset applied to the date term means the time
> portion of the resulting datetime value is 0800, not 0000. (0700 >
> 0800) is false.
> 
> d) And what if instead of comparing against the FLOOR() of each date we
> rounded datetime values up or down to their nearest dates?
>         '2013-05-14 17:00:00' = '2013-05-14'
> This would be false because the datetime value would have rounded up to
> '2013-05-15'.
> 
> 
> There is also a strong desire to make a database server try to do
> exactly what the user tells it to do. If the user wants to compare a
> value to another value with an equality check, we should do that.  It
> would be very odd behavior if an equality check suddenly turns into a
> ranged check.  I realize how much time it would save people to not need
> to include both ends of the range:
>  >    WHERE datetime_col >= '2013-01-01'
>  >      AND datetime_col <  '2013-01-01' + INTERVAL 1 DAY
> but for predictability and reliability, this is one rewrite that may
> not always be true.
> 
> Perhaps a new operator like IN_DATE is what you are looking for?
> (example)
>         '2013-05-14 17:00:00' IN_DATE '2013-05-14'
> This would be true.
> 
> But what if the date field were a due date and all submissions needed
> to be in by noon on those dates?  This IN_DATE operator would return
> true for submissions beyond the deadline, too (a wrong result).  But a
> direct comparison would be true:
>         '2013-05-14 17:00:00' >= '2013-01-01'
>     AND '2013-05-14 17:00:00' <= '2013-05-14' + INTERVAL 12 HOUR
> 
> Of course you could always write that as a stored function, too.
> (pseudo declaration)
> CREATE FUNCTION IN_DATE(dateval, datetimeval) (...)
> 
> (example usage)
>      IN_DATE('2013-05-14','2013-05-14 17:00:00') This could be true,
> too.  But a stored function would interfere with the use of indexes to
> resolve the query.
> 
> Then we get into the problems of how would this operator or function
> would handle comparisons to TIMESTAMP columns and many other variations
> like how to see if a datetime is within a 2 day span instead of one.
> It's a major ball of wax to rewrite these queries in the optimizer just
> to avoid one extra line of SQL code per comparison. To us "Keep It
> Simple, Stupid!" (KISS) seemed like the better approach to the problem.
> We do one thing (make dates represent midnight on that date when they
> need to be compared to datetime values) and allow the users to decide
> how to handle the rest of the comparison according to their specific
> needs.
> 
> Respectfully,
> --
> Shawn Green
> MySQL Principal Technical Support Engineer Oracle USA, Inc. - Hardware
> and Software, Engineered to Work Together.
> Office: Blountville, TN
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql

Thread
Bug in BETWEEN same DATETIMEDaevid Vincent23 May
  • RE: Bug in BETWEEN same DATETIMETimothy R Peterson23 May
    • Re: Bug in BETWEEN same DATETIMEMichael Dykman23 May
      • RE: Bug in BETWEEN same DATETIMERick James23 May
  • Re: Bug in BETWEEN same DATETIMEAndrew Moore23 May
    • Re: Bug in BETWEEN same DATETIMEAndrew Moore23 May
  • Re: Bug in BETWEEN same DATETIMEshawn green23 May
    • RE: Bug in BETWEEN same DATETIMERick James23 May
      • Re: Bug in BETWEEN same DATETIMEshawn green24 May
        • RE: Bug in BETWEEN same DATETIMERick James24 May
          • Re: Bug in BETWEEN same DATETIMEshawn green24 May
        • Re: Bug in BETWEEN same DATETIMEhsv24 May
          • RE: Bug in BETWEEN same DATETIMERick James29 May