Date ranges: inclusive vs strict boundaries - sql

When allowing a user to select a date range, let's say:
Show me entries from [August 1] to [September 1]
As a user, I would generally expect this to include the results for September 1. Especially when you consider that when I select the same date for both ends, I obviously mean "from start of day to end of day":
Show me entries from [September 1] to [September 1]
As a programmer, I think of date boundaries as "zero-hour", i.e. "start-of-day"; logically, the entries for September 1 are actually after "2010-09-01 00:00:00" (hence outside of the range).
For example, in SQL the following condition would exclude everything:
SELECT * FROM entries
WHERE created_at >= DATE('2010-09-01') AND created_at <= DATE('2010-09-01')
Obviously, an adjustment needs to be made from the user input to the SQL to advance the end date by 24 hours.
However, that only applies to timestamp or datetime columns. When the column is a date then a direct comparison works and this adjustment should not be added.
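The adjustment described above can be sketched as a small helper that turns the user's inclusive pair of days into a start-inclusive, stop-exclusive pair of datetimes. This is only an illustration: the function name `half_open_bounds` and the `:start` / `:stop` placeholders are hypothetical, not part of any framework.

```python
from datetime import date, datetime, time, timedelta

def half_open_bounds(first_day: date, last_day: date) -> tuple[datetime, datetime]:
    """Turn an inclusive range of calendar days into [start, stop) datetimes."""
    # 00:00:00 on the first selected day (inclusive lower bound).
    start = datetime.combine(first_day, time.min)
    # 00:00:00 on the day AFTER the last selected day (exclusive upper bound).
    stop = datetime.combine(last_day + timedelta(days=1), time.min)
    return start, stop

# The user picked [September 1] to [September 1].
start, stop = half_open_bounds(date(2010, 9, 1), date(2010, 9, 1))
# Intended use: WHERE created_at >= :start AND created_at < :stop
print(start, stop)
```

Because the upper bound is compared with `<` rather than `<=`, the whole of September 1 is covered without having to know the precision of the column.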
In an MVC framework such as Rails, where do you handle the logic for this input mismatch before sending a query? If it's in the controller, it seems that relies too much on knowing the internal fields of the model (date vs. datetime), and if it's in the model, would a "find_in_date_range" method be understood as inclusive, or does that just invite off-by-day errors?
Finally, is my assumption correct for having the user interface represent inclusive ranges? Is this always the case or are there situations where a strict (exclusive) date boundary is more appropriate? For example, in my rake scripts I use the parameter END_DATE=2010-09-01 to capture up to this date, which is inconsistent with the UI, but makes sense to me: where do you draw this line?

My Personal Bias:
Program it however you think the users will like it, but ALWAYS make the inclusive vs exclusive choice explicit in the GUI with labeling. Rather than saying
Between _____________ and ______________
I always label as
On or after _______________ but before _______________
or
On or after ______________ through ______________
(If you are writing software that a few users use every day and you can carefully train them, then they aren't going to be reading the GUI labels anyway, so don't bother.)

Related

SQL equals does not work for timestamps?

My table has a category 'timestamp' where the timestamps are formatted 2015-06-22 18:59:59
However, using DBVisualizer Free 9.2.8 and Vertica, when I try to pull up rows by timestamp with a
SELECT * FROM table WHERE timestamp = '2015-06-22 18:59:59';
(directly copy-pasting the stamp), nothing comes up. Why is this happening and is there a way around it?
FYI, saying "the timestamps are formatted 2015-06-22 18:59:59" is incorrect if you are indeed using a TIMESTAMP type. Such types have their own internal representation of a date-time value, almost always a count since epoch. In your case with Vertica, 8 bytes are used for such storage. The formatting of the date-time value happens when a string representation is generated. Never confuse the string representation with the date-time value. Conflating the two may well be related to your problem/confusion.
A few different thoughts about possible problems…
String Literals
Are you sure Vertica takes strings as timestamp literals? That format you used is common SQL format. But given that Vertica seems to be a specialized database, I would double-check that.
If strings are not allowed, you may need to call some kind of function to transform the string into a date-time value.
Fractional Second
As the comment by Martin Smith points out, the doc for Timestamp-related data types in Vertica 7.1 says those types can have a fractional second to resolution of microseconds. That means up to 6 decimal places of a fraction.
So if you are searching for "2015-06-22 18:59:59" but the stored value is "2015-06-22 18:59:59.012345", no match on the query.
Half-Open
The fractional seconds issue described above is often the cause of problems people have when handling a span of time. If you naïvely try to pinpoint the ending time, you are likely to have problems. Seeing the "59:59" in your example string makes me think this applies to you.
The better approach to spans of time is "Half-Open" (or Half-Closed, whatever) where the beginning is inclusive while the ending is exclusive. Common notation for this is [). In comparison logic this means: value >= start AND value < stop. Notice the lack of EQUALS SIGN in the stop comparison. In English we would say "look for an hour's worth of invoices starting at 2:00 PM and going up to, but not including, 3:00 PM".
Half-Open for a week means Monday-Monday, for a month the first of one month to the first of the next month, and for a year the January 1 of one year to January 1 of the following year.
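As a rough sketch of those boundaries, the helpers below compute the start (inclusive) and stop (exclusive) dates for the week and month containing a given day. The function names are made up for illustration, and weeks are assumed to start on Monday.

```python
from datetime import date, timedelta

def week_bounds(day: date) -> tuple[date, date]:
    """Monday of the week containing `day`, and the following Monday."""
    monday = day - timedelta(days=day.weekday())  # weekday() is 0 for Monday
    return monday, monday + timedelta(days=7)

def month_bounds(year: int, month: int) -> tuple[date, date]:
    """First of the month, and first of the next month."""
    start = date(year, month, 1)
    # December rolls over into January of the following year.
    stop = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, stop

print(week_bounds(date(2015, 6, 24)))
print(month_bounds(2015, 12))
```

Either pair plugs straight into a `value >= start AND value < stop` comparison.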
Half-Open means not using BETWEEN in SQL. SQL's BETWEEN has often been criticized. Instead do something like the following to look for an hour's worth of invoices. Notice the Z on the end of string literal which means "UTC time zone" ("Z" for "Zulu"). (But verify, as my SQL syntax may need fixing.)
SELECT *
FROM some_table_
WHERE invoice_received_ >= '2015-06-22 18:00:00Z'
AND invoice_received_ < '2015-06-22 19:00:00Z'
;
This query will catch any values such as '2015-06-22 18:59:59.654321', which seem to be eluding you.
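To see the three behaviours side by side, here is a small experiment using Python's built-in sqlite3 module. Note the assumptions: SQLite is only a stand-in for Vertica, it stores these values as text (so the comparisons are string comparisons on a fixed ISO-like format), the trailing Z is left out for simplicity, and the table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice_ (received_ TEXT)")
conn.executemany(
    "INSERT INTO invoice_ VALUES (?)",
    [
        ("2015-06-22 18:00:00.000000",),
        ("2015-06-22 18:59:59.654321",),  # has a fractional second
        ("2015-06-22 19:00:00.000000",),  # belongs to the NEXT hour
    ],
)

def count(where: str) -> int:
    """Count rows matching a WHERE clause (fixed literals only, for the demo)."""
    return conn.execute(f"SELECT COUNT(*) FROM invoice_ WHERE {where}").fetchone()[0]

# Exact match on the whole second misses the fractional value.
equal = count("received_ = '2015-06-22 18:59:59'")
# Closed range ending at 18:59:59 also misses it.
closed = count("received_ BETWEEN '2015-06-22 18:00:00' AND '2015-06-22 18:59:59'")
# Half-open range catches it and still excludes 19:00:00.
half_open = count("received_ >= '2015-06-22 18:00:00' AND received_ < '2015-06-22 19:00:00'")

print(equal, closed, half_open)
```

Only the half-open comparison returns both rows that fall inside the 18:00 hour.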
Reserved Word
I hope you have not really named your table 'table' and your column 'timestamp'. Such use of keywords and reserved words can cause explicit errors or more subtle weird problems.
Tip: The easy way to avoid any of the over a thousand reserved words in various databases is to append a trailing underscore. The SQL standard explicitly promises to never use a trailing underscore in its reserved words. So use "timestamp_" rather than "timestamp". Another example: "invoice_" table and "received_" column. I recommend doing that as a habit on everything you name in SQL: columns, tables, constraints, indexes, and so on.
Time Zone
You are using the TIMESTAMP which is short for TIMESTAMP WITHOUT TIME ZONE. Or so I presume; the Vertica doc is vague but that is the common usage as seen in the Postgres doc, and may even be standard SQL.
Anyways, TIMESTAMP WITHOUT TIME ZONE is usually the wrong type for most business purposes. The WITH time zone is misnamed and often misunderstood as a consequence: It means "with respect for time zone" where data inputs that include an offset or other time zone information from UTC are adjusted to UTC during the INSERT/UPDATE operations. The WITHOUT type simply ignores any such offset or time zone information.
The WITHOUT type should only be used for the concept of a date-time generally without being tied to any one locality. For example, saying "Christmas this year starts at beginning of December 25, 2015". That means in any time zone rather than a specific time zone. Obviously Christmas starts earlier in Paris, for example, than in Montréal.
If you are timestamping legal documents such as invoices, or booking appointments with people across time zones, or scheduling shipments in various localities, you should be using WITH time zone type.
So back to your possible problem: Test how Vertica or your client app or your database driver is handling your input string. It may be adjusting time zones as part of the parsing of the string using your client machine’s current default time zone. When sent to the database, that value will not match the stored value if during storage no adjustment to UTC was made.
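A minimal sketch of that mismatch, in Python rather than in Vertica or any particular driver: the same input string denotes different instants depending on which zone the parser assumes. The UTC-4 client offset is purely hypothetical.

```python
from datetime import datetime, timedelta, timezone

text = "2015-06-22 18:59:59"
naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")  # no zone attached yet

# Interpretation 1: the string is already in UTC.
as_utc = naive.replace(tzinfo=timezone.utc)

# Interpretation 2: the string is in the client's local zone (assumed UTC-4 here).
client_zone = timezone(timedelta(hours=-4))
as_client = naive.replace(tzinfo=client_zone)

print(as_utc == as_client)                 # the two instants differ
print(as_client.astimezone(timezone.utc))  # the client reading, expressed in UTC
```

If the stored value was written under one interpretation and the query literal is parsed under the other, an equality test cannot match.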
Tip: Generally best practice is to do all your storage and business logic in UTC, adjusting to local time zones only where expected by user.

Can this sql-statement be shortened?

I have this SQL query to a PostgreSQL database. Can it be shortened? I am thinking about the where part.
SELECT *
FROM reservations
WHERE (starts_at BETWEEN ? AND ?) OR (ends_at BETWEEN ? AND ?)
The values for the question marks are:
The beginning of the current date in datetime format
The end of the current date in datetime format
Same as point one
Same as point two
The code is meant to return all the reservations that begins or ends on a certain date. And works as it is supposed to. But I have to supply the same information multiple times into the query.
I do not actually use this exact SQL, so there might be an obvious error somewhere, but please focus on the where part.
I'm not a huge fan of BETWEEN in this context, because timestamp or datetime can be fractional. In particular, specifying the last possible value on a given date is much more complicated than specifying the first possible value (midnight) because you have to specify the time as 23:59:59.999... out to whatever precision your RDBMS uses. PostgreSQL's timestamp is supposed to be accurate to the microsecond (1e-6 seconds), for example, so it's easy to specify a range that either includes times you don't want, or misses times that you do.
On the other hand, if you use BETWEEN with midnight of the following day so you don't have to know the precision of the time, you're including a time that doesn't exist in the date you're interested in. If your application is only precise to the second, or the minute, or to 5 minutes, then you may mis-categorize data or, worse, count it twice since it suddenly counts as being in two dates.
I would prefer:
WHERE (starts_at >= ? AND starts_at < ?)
OR (ends_at >= ? AND ends_at < ?)
Where the ? map to:
Midnight of the target date.
Midnight of the date after the target date.
Midnight of the target date.
Midnight of the date after the target date.
It's not as short, but it's decidedly safer unless you really want to specify your intervals that precisely.
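On the complaint about supplying the same values several times: many database drivers accept named parameters, which lets each bound be passed once. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, so the text storage and the `:day_start` / `:day_stop` names are assumptions made for the demo.

```python
import sqlite3
from datetime import date, datetime, time, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (id INTEGER, starts_at TEXT, ends_at TEXT)")
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?)",
    [
        (1, "2010-09-01 09:00:00", "2010-09-01 10:00:00"),  # starts and ends on the target date
        (2, "2010-08-31 23:00:00", "2010-09-01 00:30:00"),  # only ends on the target date
        (3, "2010-09-02 00:00:00", "2010-09-02 01:00:00"),  # starts at midnight of the NEXT day
    ],
)

target = date(2010, 9, 1)
params = {
    # Midnight of the target date (inclusive).
    "day_start": datetime.combine(target, time.min).isoformat(sep=" "),
    # Midnight of the date after the target date (exclusive).
    "day_stop": datetime.combine(target + timedelta(days=1), time.min).isoformat(sep=" "),
}

rows = conn.execute(
    """
    SELECT id FROM reservations
    WHERE (starts_at >= :day_start AND starts_at < :day_stop)
       OR (ends_at   >= :day_start AND ends_at   < :day_stop)
    ORDER BY id
    """,
    params,
).fetchall()
ids = [row[0] for row in rows]
print(ids)
```

Reservation 3 starts exactly at the following midnight, so BETWEEN with that midnight as its upper bound would have counted it; the strict `<` leaves it out.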
However, you should not do the following, even though it's shorter:
WHERE DATE(starts_at) = ?
OR DATE(ends_at) = ?
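You don't want to do that because it's not SARGable: wrapping the column in a function generally prevents the database from using an ordinary index on that column.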
This is also an example of why shortness or brevity is a poor measure of code quality. Generally, I'd order my preference like so:
Accuracy.
Performance.
Readability/maintainability.
Brevity.
No, you can't improve upon this.
WHERE (starts_at BETWEEN ? AND ?) OR (ends_at BETWEEN ? AND ?)

What data type to use for opening hours in a database

I want to keep track of opening hours of various shops, but I can't figure out what is the best way to store that.
An intuitive solution would be to have starting and ending time for each day. But that means two attributes per day, which doesn't look nice.
Another approach would be to have a starting time and a day to second interval for each day. But still, that means two attributes.
What is the most common and easiest way to represent this? I'm working with Oracle.
The lowest granularity you need is probably minutes (actually, probably 15 minute intervals, but call it minutes).
Possibly you also want to consider day of the week.
If you use a table such as:
create table day_of_week_opening_hours(
id integer primary key,
day_of_week integer not null,
store_id integer not null,
opening_minutes_past_midnight integer default 0 not null,
closing_minutes_past_midnight integer default (24*60) not null)
Pop a unique constraint on store_id and day_of_week, and for a given store and day of the week you can find the opening time with:
the_date + (opening_minutes_past_midnight * interval '1 minute')
or ...
the_date + (opening_minutes_past_midnight / (24*60))
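The same arithmetic, sketched in Python instead of Oracle SQL, with an "is the shop open now" check added. The helper names are hypothetical; the closing bound is treated as exclusive so that the default of 24*60 minutes cleanly means "until midnight".

```python
from datetime import date, datetime, time, timedelta

def opening_moment(the_date: date, minutes_past_midnight: int) -> datetime:
    """Midnight of the_date plus the stored number of minutes."""
    return datetime.combine(the_date, time.min) + timedelta(minutes=minutes_past_midnight)

def is_open(now: datetime, opening_minutes: int, closing_minutes: int) -> bool:
    """True if now falls in [opening, closing) on its own calendar day."""
    opens = opening_moment(now.date(), opening_minutes)
    closes = opening_moment(now.date(), closing_minutes)
    return opens <= now < closes

print(opening_moment(date(2015, 6, 22), 9 * 60 + 15))
print(is_open(datetime(2015, 6, 22, 23, 59, 59), 0, 24 * 60))
```

A shop stored as 07:00 to 18:00 counts as closed at exactly 18:00 under this convention.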
Shops that open 24 hours a day could be represented with a special code instead of opening times, in a separate table instead of with special opening and closing times, or maybe you could just leave the opening/closing times null.
create table day_of_week_24_hour_opening(
id integer primary key,
day_of_week integer not null,
store_id integer not null)
Think about shops that do not open at all on a given day as well, and how to represent that.
Probably you could do with a date-based override also, to indicate different (or no) opening hours on certain dates (Xmas, etc).
Quite an interesting problem, this.
I did something similar once and I used NUMBERs:
START_DATE END_DATE
0700 1800
0915 1745
0600 2300
1115 2215
It was easy to use with plain SQL, i.e. with a BETWEEN clause.
What if a store has a break?
I think it makes total sense to have two columns, one for the open datetime and one for the close datetime, since once a shop is open it will have to be closed at some point.
My Suggestion
I would create a separate table for shop opening/closing times. Since every time a shop is opened it will have a close time value as well, you won't have any unwanted nulls in your second column.
Since Oracle does not have a date-only or time-only datatype, you are going to need to use the date datatype, which is both date and time. The intuitive solution you mentioned looks pretty good to me.
Your requirements have two data points: an opening time and a closing time. There are different ways of representing the information, but you will need two attributes.
I understand why you think having two attributes will lead to "a longer condition to check" whether "a given time is in the interval". But Oracle provides many ways to manipulate its DATE datatype.
One useful mask is 'SSSSS', which is the number of seconds past midnight. So, if you hold opening and closing times as seconds you can check the interval like this:
select * from shop
where sysdate between trunc(sysdate) + (opening_time_sssss/86400)
and trunc(sysdate) + (closing_time_sssss/86400)
I'm not saying it's elegant but it is effective.
Actually, it would be possible to store the opening hours in a single attribute: by representing them as a bit mask in a RAW column. That would be harder to understand and even less elegant to check.

When to use separate date and time instead of a single datetime

If I want to store date and time, is it better to store them in a separate date and time or use a single datetime?
When should we use date and time instead of a single datetime?
I want to filter my queries either using date or time.
When you are talking about a moment in time, whether a universal moment, or a specific date and time on someone's local calendar, you use a datetime. If you want to be sure that you are talking about an exact moment in time, regardless of the observer, then you use a datetimeoffset.
If you are storing just a date then you mean a date without a time component, meaning "any time on this date".
If you are storing just a time then you mean a time without a date component, meaning "this time on any date", or "this time on a date determined by some other means".
There is no practical purpose to having both a date and a time that are about the same thing, sitting on the same row. Just use a datetime for that.
In SQL Server 2008 you have date and time data types so this becomes a non issue.
Whether it is a good choice really depends on your business and how you will query your data.
If, for example, you want to know all the orders placed between 1 and 2 PM on any day, using separate date and time columns will make it quicker.
If you intentionally do not care about the time, it's more efficient to store this data as a date datatype. Think a customer birthday column, there's not too many cases I can think of that would use this time. If there happens to be a time attached to it (often a bug), this needs to be removed via a convert statement in order to do a compare. It also consumes additional space if you don't need these values (3 bytes compared to 8).
I think it's similar to having a status code table with the id as a bigint instead of a tinyint or the like (depending on how many status codes you would plan to have).
It's just a matter of what you're using the data for: if you think there's a good chance you'll ever need that data, then use datetime, otherwise use date.
There is nothing brilliant about separating date and time; it is better to save date and time in the same column.
The same issue is discussed here: are-there-any-good-reasons-for-keeping-date-and-time-in-separate-columns
You can also get the date and time separately with a query:
SELECT
CONVERT(VARCHAR(10),GETDATE(),111) as DatePart,
convert(varchar(15), getdate(), 108) TimePart

Converting string with US date and time format to UK format

I have an application that stores date and time in a string field in an SQL Server 2008 table.
The application stores the date and time according to the regional settings of the PC that is running and we can’t change this behavior.
The problem is that some PCs have to be in UK date format with 12h time (eg. 22/10/2011 1:22:35 pm) some with UK date format with 24h time (eg. 22/10/2011 13:22:25) and some have to be US date format (eg. 10/22/2011 1:22:35 pm) and (eg. 10/22/2011 13:22:25).
Is there any automatic way to change the string to UK 24h format every time it is changed in or added to the table, so that it will always be in the same format in the database?
Can it be done using some trigger on update or insert? Is there any built-in function that already does that?
Even a script to run from time to time might do the job...
I’m thinking to break apart the string to day, month , year, hour, minute, second , AM/PM and then put the day and month part in dd/mm order and somehow change the hour part to 24h if PM, get rid of the “am” and “pm” and then put the modified date/time back to the table.
For example the table has
id  datestring              value           Location
1   15/10/2011 11:55:01 pm  BLAHBLAH        UK
2   15/10/2011 13:12:20     BLAKBLAK        GR
3   10/15/2011 6:00:01 pm   SOMESTUFF       US
4   10/15/2011 20:16:43     SOMEOTHERSTUFF  US
and we want it to be
id  datestring           value           Location
1   15/10/2011 23:55:01  BLAHBLAH        UK
2   15/10/2011 13:12:20  BLAKBLAK        GR
3   15/10/2011 18:00:01  SOMESTUFF       US
4   15/10/2011 20:16:43  SOMEOTHERSTUFF  US
We can display the date parts (day,month,year) correctly using the datepart function but with the time part we have problems because it changes too many ways.
Edited to explain some more
mr. p.campbell thanks for the edit .. i didn't know how to beautify it :)
and mr. Matthew, thank you for your quick reply..
We can tell if it is UK date or US date because we have another field i didn't mention with the text "US", "UK", "GR", "IT" according to where the PLC machine is located.
I'm sorry I didn't explain it too well. My English is not so good.
There are two different and independent applications. And they don't have direct relation with the sql server.
The application that only writes data to the database ..lets call it "the writer" for short.. and a different application that reads the data .. lets call it "the reader".
"The writer" is an internal application of a PLC machine that stores values every 1 min to the database that's why we can't change its behavior. It uses the string data type to store the date and the time at the same field according to the regional settings of the pc that a daemon application runs and does the communication between the pc and the PLC machine.
Now "the reader" expects the date and time to be in the format "dd/mm/yyyy 23:23:01" or "yyyy/mm/dd 23:23:01" and the only thing it does for now is doing some calculations with the data in the value field between given dates. eg. from 10/09/2011 10:00:00 to 15/09/2011 14:00:00.
we just need to do something like this ...
select * from table1 where datestring between '10/09/2011 10:00:00' and '15/09/2011 14:00:00'
I could post some of the code but it will be very long post.
At first, I agreed with Matthew, but then I realized that, given the information presented, this actually was possible (well, sorta).
However, some caveats;
You are doing nobody any favors by storing and maintaining the database this way. Your best bet is to change the application to have it give an actual Datetime value, not this mangled string.
This data CANNOT be meaningfully sorted by date or time (not without performing expensive string manipulation).
You appear to be storing all times as local times, but do not appear to be storing a TimeZone or related information. Without this information, you will NOT be able to (completely) correctly translate times 'globally'. For instance, which is later - 4PM in London, or 11AM in New York (for, say, an international conference call)? The answer is that you don't know: it depends on the time of year.
You are storing local times, period. This only works so long as local time is correct. What happens when somebody sets their clock to 1900? You should be storing time based off of the SERVER'S clock.
Your stored timestamp is based on a formatted string. If the user changes how their time is displayed, your data correctness (potentially) goes out the window. For instance, what if somebody removes the am/pm symbols, thinking "I'll look out the window - if the sun is out, it's 'am'"?
Please keep all of that in mind.
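To make the sorting caveat concrete, here is a short Python sketch (the sample values are invented) comparing dd/mm/yyyy strings as text versus as real date-time values.

```python
from datetime import datetime

FORMAT = "%d/%m/%Y %H:%M:%S"
stamps = ["15/09/2011 14:00:00", "10/10/2011 10:00:00", "11/09/2011 09:30:00"]

# Sorting the raw strings orders them by day-of-month first.
as_text = sorted(stamps)
# Sorting on the parsed values gives true chronological order.
as_time = sorted(stamps, key=lambda s: datetime.strptime(s, FORMAT))

# A December value falls "between" two September strings when compared as text.
low, high = "10/09/2011 10:00:00", "15/09/2011 14:00:00"
december = "12/12/2011 08:00:00"
text_says_in_range = low <= december <= high
time_says_in_range = (
    datetime.strptime(low, FORMAT)
    <= datetime.strptime(december, FORMAT)
    <= datetime.strptime(high, FORMAT)
)

print(as_text)
print(as_time)
print(text_says_in_range, time_says_in_range)
```

The same effect applies to a BETWEEN on the string column: it compares characters, not moments in time.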
As to how to do this....
I'm not going to actually write out the SQL statement for this. Mostly because storing the information this way is pretty terrible. But also because it's going to take a lot of work I'd rather not do. I really recommend stressing to whomever has the keys at your place to get that application changed.
So instead, I'm going to give you a really big clue - and this will only work for so long as your timestamp format remains the same; You should be able to tell what format the date and time are in based on the presence and absence of 'am' and 'pm' in the string (if you don't have both, you're flat-out toast). As Matthew has pointed out, the formatting is also likely different for the date, as well as the time - you will need to translate both. However, this will immediately give you problems due to comparative timestamps (please see point 4, above); any attempt to run scheduling or auditing with this data is pretty much doomed to failure ("When did that happen?" "Well, it's in the UK date format, so..." "But that makes it 1AM here, and he was dead then!").
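For what it's worth, the clue can be sketched outside the database in a few lines of Python, using the location field from the question to pick the date order. Everything here rests on assumptions: that only "US" rows are month-first, that an am/pm suffix reliably marks a 12-hour clock, and that the parser runs under an English locale. The function name is hypothetical, and none of this addresses the time zone or clock-trust caveats above.

```python
from datetime import datetime

# Assumption: only "US" rows are month-first; every other location is day-first.
MONTH_FIRST_LOCATIONS = {"US"}

def to_uk_24h(datestring: str, location: str) -> str:
    """Rewrite a stored date/time string as dd/mm/yyyy with a 24-hour clock."""
    text = datestring.strip()
    date_format = "%m/%d/%Y" if location in MONTH_FIRST_LOCATIONS else "%d/%m/%Y"
    # An am/pm suffix signals a 12-hour clock; otherwise assume 24-hour.
    has_meridiem = text.lower().endswith(("am", "pm"))
    time_format = "%I:%M:%S %p" if has_meridiem else "%H:%M:%S"
    parsed = datetime.strptime(text, f"{date_format} {time_format}")
    return parsed.strftime("%d/%m/%Y %H:%M:%S")

print(to_uk_24h("10/15/2011 6:00:01 pm", "US"))
print(to_uk_24h("15/10/2011 11:55:01 pm", "UK"))
```

A string that does not fit the expected pattern raises ValueError, which is probably preferable to silently storing a wrong date.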
Most beneficial answer: Change how the information is stored in the database
EDIT:
And then it hits me (especially in light of the new edits) - there are potentially other possibilities that could actually make this work....
First, change your database to actually store some sort of 'globalized' timestamp, based off of the server's clock.
This will of course break your existing application code - it would get a data-type mismatch error. To fix that, rename the table, then create a view, named the same as the original table, that will return the string formatted as indicated in the 'source' column. You'll need to create instead-of triggers for the view, to translate the formatted string to an actual datetime value. The best part is, the application code should never notice the difference. You seem to have indicated that you have sufficient control over the database to allow this to happen; this should allow you to 'fix' the data transparently.
This of course works best if the incoming datetime values are absolute (not local). Hopefully, the values are actually supposed to be 'insert time' - these could likely be safely ignored, in favor of using a special register (like NOW or CURRENT DATE or whatever).
Can't believe this didn't hit me earlier...
You stated that you cannot change the application behavior, thus this is not possible.
Your problem is that your database doesn't know the culture / timezone settings of the client and your client doesn't report it.
You will need to report this data or think of clever ways to infer this information before you can act on it.
EDIT: For example, without knowledge of the client's details how could you tell the difference between the strings:
10/1/2011 12:00:00 (October First, noon, US)
10/1/2011 12:00:00 (January Tenth, noon, UK)
?