Distinguishing between columns with UTC and local datetime - sql

I work on an application that stores datetimes in a SQL Server database. Some of these are a point in time stored in UTC (such as log item datetimes), while others are a literal date/time (such as "take medication X at 4pm on 20 July, irrespective of your timezone).
Problem is that these both have a date and time component, so using a datetime2 column type makes sense for both. We're now in a situation where it is often unclear in our app whether a date/time column is a UTC point in time or a literal date/time.
What is the most common practice to distinguish between these 2 cases? I can think of these options:
1) End all UTC columns in ...Utc, while literal date/time columns have no special ending.
2) End all literal columns in ...Literal, while UTC date/time columns have no special ending.
3) Give UTC columns the data type datetime2 and literal date/time columns datetimeoffset.

Always try to use the appropriate type first and then good naming. If datetime2(0) is a good fit, use it.
In my system I add a suffix to the column name, for example: PlaybackStartedLocal datetime2(0), PlaybackStartedUTC datetime2(0). In my case I have to store both local and UTC values for the same event, because some reports need local value, some UTC and it is very difficult to convert between them later.
In general it is a good practice to include units of measurement into the column/variable name.
What do you prefer to see:
PlaybackDurationMSec or PlaybackDuration
LengthMeters / LengthMiles or Length
A well-known example when two teams of programmers didn't notice that they were interpreting metric values as imperial and visa versa: A disaster investigation board reports that NASA’s Mars Climate Orbiter burned up in the Martian atmosphere because engineers failed to convert units from English to metric.
The software calculated the force the thrusters needed to exert in
pounds of force. A separate piece of software took in the data
assuming it was in the metric unit: newtons.

Related

Storing DATETIME with and without TIMEZONE

Databases often store a datetime without timezone as a separate type as a datetime with a timezone. As an example, I'll use BigQuery (though most databases store this the same):
DATETIME is a date/time that does not store timezone.
TIMESTAMP is a date/time that does store timezone.
I understand abstractly that "Dec 2 at 2:45 PM" is a different time in Japan as it is in New York, but I'm wondering why that even matters if all the dates are stored in UTC by the application. For example, if the value to be inserted is:
2021-12-02 14:45:00
Wouldn't that value be inserted as 2021-12-02 14:45:00 UTC in both data types? Or, would the 2:45PM be stored as "2:45 PM UTC" in the DATETIME type but would be stored as (if using EST) 2:45 PM EST --> 6:45 PM UTC in the TIMESTAMP type?
And if the value was:
2021-12-02 14:45:00 EST
Wouldn't that value also be inserted as 2021-12-02 18:45:00 UTC in both data types, and stored the same way? It seems to only 'timezone' is on the query-side, and can't that act either as a cursor-variable or some sort of metadata on the field (similar to a NULL check)? I guess I'm not following why the timezone-aware and no-timezone need to be stored as two different types if all date/times get stored as UTC anyways.
The SQL standard defines two types for a date with time-of-day:
TIMESTAMP (Also known more clearly in some databases such as Postgres as TIMESTAMP WITHOUT TIME ZONE)
TIMESTAMP WITH TIME ZONE
The first type is meant to purposely lack any context of time zone or offset-from-UTC. So noon on the 23rd of January next year, 2022-01-23 12:00, means noon anywhere and everywhere. It means noon on Tokyo Japan as well as noon on Toulouse France, and also noon in Toledo Ohio US. These are all obviously different moments, several hours apart. Therefore, this type cannot represent a moment, is not a specific point on the time line.
The second type does represent a moment, is a specific point on ne timeline. When you want to track actual moments, such as when a row was written to the database, or when a shipment arrived in a warehouse, use this type.
Unfortunately, the SQL spec says little about the various date-time types and behaviors. So the various database products vary widely in their support for these types and their interpretations of behavior.
In some databases such as Postgres, a value submitted to a column of the fist type (TIMESTAMP WITHOUT TIME ZONE) containing an indicator of zone or offset will have the date and time recorded as submitted. No adjustment is made. Any zone or offset input is ignored and discarded.
In some databases such as Postgres, a value submitted to a column of the second type (TIMESTAMP WITH TIME ZONE) containing an indicator of zone or offset will have its date and time adjusted to UTC before being written to the database. In such databases, this type is always in UTC, that is, represents a moment as seen with an offset of zero.
What is an offset? Merely a number of hours-minutes-seconds ahead of UTC (+) or behind UTC (-). A time zone, in contrast, is much more. A time zone has a name in Continent/Region format, and contains the history past, present, and future changes to the offset used by the people of a particular region as decided by their politicians.
So the TIMESTAMP WITH TIME ZONE type in databases such as Postgres is a misnomer. No time zone information is stored in the database. Any time zone or offset info submitted with the date and time is used to adjust to UTC. The zone/offset info is then discarded. So if remembering the zone originally submitted is important to you, you will need to store that yourself in a second column. Regarding the misnomer, you can think of the type as being TIMESTAMP WITH REGARD FOR SUBMITTED OFFSET OR TIME ZONE. But be clear, in databases such as Postgres your moment is stored in UTC, always UTC, and is retrieved as UTC, always UTC.
Unfortunately, there is a wrinkle here. All too commonly, tooling and middleware will inject a default time zone, adjusting the retrieved UTC moment into some time zone. While well-intentioned, this anti-feature creates the illusion that the value was stored with that time zone. But the values actually stored in UTC, at least with databases such as Postgres.
You asked:
2021-12-02 14:45:00 Wouldn't that value be inserted as 2021-12-02 14:45:00 UTC in both data types?
No.
In a column of a data type akin to the TIMESTAMP WITHOUT TIME ZONE, that date and time will be stored as submitted, a quarter to 3 PM on the second of December this year.
In a column of a data type akin to TIMESTAMP WITH TIME ZONE, the value stored may depend on the behavior of your particular database and your particular middleware, tooling, and driver. The behavior might simply assume you meant 2021-12-02 14:45:00 as seen in UTC, and store that. Or the behavior may assume you meant 2021-12-02 14:45:00 as seen in a particular time zone. And in databases such as Postgres, an adjustment to UTC would be applied before finally storing. You must study the documentation of your particular database, middleware, tooling, and driver to discover which behavior will be seen in your software. Be sure to conduct experiments to verify your understanding.
You asked:
2021-12-02 14:45:00 … Or, would the 2:45PM be stored as "2:45 PM UTC" in the DATETIME type but would be stored as (if using EST) 2:45 PM EST --> 6:45 PM UTC in the TIMESTAMP type?
”Likely yes”, for the first clause. But no EST involved at all. The date is stored as-is, 2021-12-02, along with the time-of-day as-is, 14:45:00. The EST part is ignored and discarded. (But experiment to verify this behavior in your particular tooling.)
And “maybe” for the second clause. As discussed in last bullet above, the behavior for TIMESTAMP WITH TIME ZONE may vary. Read docs, and conduct experiments.
You said:
though most databases store this the same
No, incorrect. That would be a very big “No”.
Databases vary widely in their support for date-time features, their kinds of date-time types, the names of their types, the technical details of their types, and the behaviors of the database server, middleware, drivers, and tooling.
Some older database systems have legacy data types supplanted by newer types, but all still supported, further complicating the picture.
You said:
I guess I'm not following why the timezone-aware and no-timezone need to be stored as two different types if all date/times get stored as UTC anyways.
You incorrectly assume that the “no-timezone” type stores in UTC. It does not.
That is what the “no-timezone” means: no regard for an offset or zone, no consideration for any offset or zone, no adjustment for any offset or zone, and no concept of offset or zone. The TIMESTAMP WITHOUT TIME ZONE type means simply, literally, a date, a time-of-day, and nothing more. Anything more than that is either (a) a figment of your imagination, or (b) interference by your middleware/tooling/drivers.

Does the SQL type TIME WITH TIMEZONE make sense?

While doing the mapping of some database columns into Java classes I stumbled onto this obscure SQL-92 Standard type (implemented by PostgreSQL, H2, and HyperSQL afaik). I haven't ever used it, but I wanted to understand how clearly map it to a Java type if I ever find it.
Here are the variants I can see:
Case A: The TIME type, such as 15:20:01. It's a "local time". The time zone is evident to the application so the database doesn't record it.
Case B: The TIME with offset, as in 15:20:01+04:00. It represents a "world time". This time can be converted trivially to UTC, or to any other world clock.
Case C: A TIME with a time zone, such as 15:20:01 EDT. Since the rules to interpret a time strongly depend on the specific date I can't really make any sense of it without the date; but then, if I add the date, it becomes a TIMESTAMP, and that's something totally different.
So, did the SQL Standard get it wrong? Or maybe "TIME with time zone" should be always interpreted as "time with offset" (case B)?
For lots of reasons, that you described well, interpreting a point in time with time of day and variable time zone but without a date is effectively undefined. There are use cases though, where you're establishing a policy within an international context this would be a helpful data type. Everyday at 15:20:01+04:00 the cats need to take a nap. Now the intention isn't to evaluate value in iosolation but within the context of adding it to a baseline date. Standards are all about supporting theoretical possibilities eaven if they're not super common.
Case C, a TIME with a time zone, such as 15:20:01 EDT, can be meaningful for things like store opening hours. Imagine you have a nationwide chain of stores. You want to store each store's standard opening hours in the database. The opening and closing time is a local time with an associated time zone. It isn't a time with a UTC offset (your case B), since it is defined in each store's local time zone, and hence daylight savings–or more rarely a change in the time zone definition–will change the UTC offset without actually changing the value of the opening time column. This store opens at 9am year round, but because its time zone has daylight savings, that is a different UTC offset at different times of year. But we aren't storing a date, because the standard opening/closing times are date-independent. (Maybe we'd have effective-from/effective-to dates, or similar, to track changes to standard opening hours over time.)
It isn't exactly case A, because imagine you have a table of stores, with opening_time and closing_time columns – if they are in different timezones, then case A would make those columns be a mix of data from different time zones, without being explicit about that. Now, given the poor support for case C in most databases, that's probably what happens – you'll probably store the time zone as an additional column. But Case C isn't useless in principle, unlike what many people think.

what is realdate in SQL?

I have some SQLite database in which one of the columns has data type as realdate and the column has value as 2453137.5
can anyone please comment on this?
any help is appreciated :)
From SQLlite Docs
SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values:
TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.
Applications can chose to store dates and times in any of these formats and freely convert between formats using the built-in date and time functions.
In your example you are using REAL datatype to store Dates. It will give the output which is not human readable.
For eg., If i'm storing current date and time
CREATE TABLE
IF NOT EXISTS DATEREAL (d1 real);
INSERT INTO DATEREAL (d1)
VALUES(julianday('now'));
SELECT * from DATEREAL;
Output : 2458792.7882345
You can read this using built-in date() and time() as shown below
SELECT
date(d1),
time(d1)
FROM
datereal;
Output :
date(d1) time(d1)
2019-11-05 06:55:03
Check demo here
One of the powerful features of SQLite is allowing you to choose the storage type.
Real number has 2 advantages:
High precision regarding fraction seconds
Longest time range
I got this answer from a user named Zso.
Here's the link to the original post How do DATETIME values work in SQLite?.
Hope this might help you to understand better.

SQL equals does not work for timestamps?

My table has a category 'timestamp' where the timestamps are formatted 2015-06-22 18:59:59
However, using DBVisualizer Free 9.2.8 and Vertica, when I try to pull up rows by timestamp with a
SELECT * FROM table WHERE timestamp = '2015-06-22 18:59:59';
(directly copy-pasting the stamp), nothing comes up. Why is this happening and is there a way around it?
FYI, saying "the timestamps are formatted 2015-06-22 18:59:59" is incorrect if you are indeed using a TIMESTAMP type. Such types have their own internal representation of a date-time value, almost always a count since epoch. In your case with Vertica, 8 bytes are used for such storage. The formatting of the date-time value happens when a string representation is generated. Never confuse the string representation with the date-time value. Conflating the two may well be related to your problem/confusion.
A few different thoughts about possible problems…
String Literals
Are you sure Vertica takes strings as timestamp literals? That format you used is common SQL format. But given that Vertica seems to be a specialized database, I would double-check that.
If strings are not allowed, you may need to call some kind of function to transform the string into a date-time values.
Fractional Second
As the comment by Martin Smith points out, the doc for Timestamp-related data types in Vertica 7.1 says those types can have a fractional second to resolution of microseconds. That means up to 6 decimal places of a fraction.
So if you are searching for "2015-06-22 18:59:59" but the stored value is "2015-06-22 18:59:59.012345", no match on the query.
Half-Open
The fractional seconds issue described above is often the cause of problems people have when handling a span of time. If you naïvely try to pinpoint the ending time, you are likely to have problems. Seeing the "59:59" in your example string makes me think this applies to you.
The better approach to spans of time is "Half-Open" (or Half-Closed, whatever) where the beginning is inclusive while the ending is exclusive. Common notation for this is [). In comparison logic this means: value >= start AND value < stop. Notice the lack of EQUALS SIGN in the stop comparison. In English we would say "look for an hour's worth of invoices starting at 2:00 PM and going up to, but not including, 3:00 PM".
Half-Open for a week means Monday-Monday, for a month the first of one month to the first of the next month, and for a year the January 1 of one year to January 1 of the following year.
Half-Open means not using BETWEEN in SQL. SQL's BETWEEN has often be criticized. Instead do something like the following to look for an hour's worth of invoices. Notice the Z on the end of string literal which means "UTC time zone" ("Z" for "Zulu"). (But verify, as my SQL syntax may need fixing.)
SELECT *
FROM some_table_
WHERE invoice_received_ >= '2015-06-22 18:00:00Z'
AND invoice_received_ < '2015-06-22 19:00:00Z'
;
This query will catch any values such as '2015-06-22 18:59:59.654321" which seems to be eluding you.
Reserved Word
I hope you have not really named your table 'table' and your column 'timestamp'. Such use of keywords and reserved words can cause explicit errors or more subtle weird problems.
Tip: The easy way to avoid any of the over a thousand reserved words in various databases is to append a trailing underscore. The SQL standard explicitly promises to never using a trailing underscore in its reserved words. So use "timestamp_" rather than "timestamp". Another example: "invoice_" table and "received_" column. I recommend doing that as a habit on everything your name in SQL: columns, tables, constraints, indexes, and so on.
Time Zone
You are using the TIMESTAMP which is short for TIMESTAMP WITHOUT TIME ZONE. Or so I presume; the Vertica doc is vague but that is the common usage as seen in the Postgres doc, and may even be standard SQL.
Anyways, TIMESTAMP WITHOUT TIME ZONE is usually the wrong type for most business purposes. The WITH time zone is misnamed and often misunderstood as a consequence: It means "with respect for time zone" where data inputs that include an offset or other time zone information from UTC are adjusted to UTC during the INSERT/UPDATE operations. The WITHOUT type simply ignores any such offset or time zone information.
The WITHOUT type should only be used for the concept of a date-time generally without being tied to any one locality. For example, saying "Christmas this year starts at beginning of December 25, 2015". That means in any time zone rather than a specific time zone. Obviously Christmas starts earlier in Paris, for example, than in Montréal.
If you are timestamping legal documents such as invoices, or booking appointments with people across time zones, or scheduling shipments in various localities, you should be using WITH time zone type.
So back to your possible problem: Test how Vertica or your client app or your database driver is handling your input string. It may be adjusting time zones as part of the parsing of the string using your client machine’s current default time zone. When sent to the database, that value will not match the stored value if during storage no adjustment to UTC was made.
Tip: Generally best practice is to do all your storage and business logic in UTC, adjusting to local time zones only where expected by user.

When to use VARCHAR and DATE/DATETIME

We had this programming discussion on Freenode and this question came up when I was trying to use a VARCHAR(255) to store a Date Variable in this format: D/MM/YYYY. So the question is why is it so bad to use a VARCHAR to store a date. Here are the advantages:
Its faster to code. Previously I used DATE, but date formatting was a real pain.
Its more power hungry to use string than Date? Who cares, we live in the Ghz era.
Its not ethically correct (lolwut?) This is what the other user told me...
So what would you prefer to use to store a date? SQL VARCHAR or SQL DATE?
Why not put screws in with a hammer?
Because it isn't the right tool for the job.
Some of the disadvantages of the VARCHAR version:
You can't easily add / subtract days to the VARCHAR version.
It is harder to extract just month / year.
There is nothing stopping you putting non-date data in the VARCHAR column in the database.
The VARCHAR version is culture specific.
You can't easily sort the dates.
It is difficult to change the format if you want to later.
It is unconventional, which will make it harder for other developers to understand.
In many environments, using VARCHAR will use more storage space. This may not matter for small amounts of data, but in commercial environments with millions of rows of data this might well make a big difference.
Of course, in your hobby projects you can do what you want. In a professional environment I'd insist on using the right tool for the job.
When you'll have database with more than 2-3 million rows you'll know why it's better to use DATETIME than VARCHAR :)
Simple answer is that with databases - processing power isn't a problem anymore. Just the database size is because of HDD's seek time.
Basically with modern harddisks you can read about 100 records / second if they're read in random order (usually the case) so you must do everything you can to minimize DB size, because:
The HDD's heads won't have to "travel" this much
You'll fit more data in RAM
In the end it's always HDD's seek times that will kill you. Eg. some simple GROUP BY query with many rows could take a couple of hours when done on disk compared to couple of seconds when done in RAM => because of seek times.
For VARCHAR's you can't do any searches. If you hate the way how SQL deals with dates so much, just use unix timestamp in 32 bit integer field. You'll have (basically) all advantages of using SQL DATE field, you'll just have to manipulate and format dates using your choosen programming language, not SQL functions.
Two reasons:
Sorting results by the dates
Not sensitive to date formatting changes
So let's take for instance a set of records that looks like this:
5/12/1999 | Frank N Stein
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
If we were to store the data your way, but sorted on the dates in assending order SQL will respond with the resultset that looks like this:
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
Where if we stored the dates as a DATETIME, SQL will respond correctly ordering them like this:
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
1/22/2005 | Drake U. La
Additionally, if somewhere down the road you needed to display dates in a different format, for example like YYYY-MM-DD, then you would need to transform all your data or deal with mixed content. When it's stored as a SQL DATE, you are forced to make the transform in code, and very likely have one spot to change the format to display all dates--for free.
Between DATE/DATETIME and VARCHAR for dates I would go with DATE/DATETIME everytime. But there is a overlooked third option. Storing it as a INTEGER unsigned!
I decided to go with INTEGER unsigned in my last project, and I am really satisfied with making that choice instead of storing it as a DATE/DATETIME. Because I was passing along dates between client and server it made the ideal type for me to use. Instead of having to store it as DATE and having to convert back every time I select, I just select it and use it however I want it. If you want to select the date as a "human-readable" date you can use the FROM_UNIXTIME() function.
Also a integer takes up 4 bytes while DATETIME takes up 8 bytes. Saving 50% storage.
The sorting problem that Berin proposes is also solved using integer as storage for dates.
I'd vote for using the date/datetime types, just for the sake of simplicity/consistency.
If you do store it as a character string, store it in ISO 8601 format:
http://www.iso.org/iso/date_and_time_format
http://xml.coverpages.org/ISO-FDIS-8601.pdf
http://www.cl.cam.ac.uk/~mgk25/iso-time.html
Among other things, ISO 8601 date/time string (A) collate properly, (B) are human readable, (C) are locale-indepedent, and (D) are readily convertable to other formats. To crib from the ISO blurb, ISO 8601 strings offer
representations for the following:
Date
Time of the day
Coordinated universal time (UTC)
Local time with offset to UTC
Date and time
Time intervals
Recurring time intervals
Representations can be in one of two formats: a basic format
that has a minimal number of characters and an extended format
that adds characters to enhance human readability. For example,
the third of January 2003 can be represented as either 20030103
or 2003-01-03.
[and]
offer the following advantages over many of the locally used
representations:
Easily readable and writeable by systems
Easily comparable and sortable
Language independent
Larger units are written in front of smaller units
For most representations the notation is short and of constant length
One last thing: If all you need to do is store a date, then storing it in the ISO 8601 short form YYYYMMDD in a char(8) column takes no more storage than a datetime value (and you don't need to worry about the 3 millisecond gap between the last tick of the one day and the first tick of the next. But that's a matter for another discussion. If you break it up into 3 columns — YYYY char(4), MM char(2), DD char(2) you'll use up the same amount of storage, and get more options for indexing. Even better, store the fields as a short for yyyy (4 bytes), and a tinyint for each of MM and DD — now you're down to 6 bytes for the date. The drawback, of course, to decomposing the date components into their constituent parts is that conversion to proper date/time data types is complicated.