Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 months ago.
This post was edited and submitted for review 3 months ago and failed to reopen the post:
Original close reason(s) were not resolved
Improve this question
I've seen a lot of projects where epoch format was used to represent dates. But as far as I know most of developers use ISO-8601 format for timestamp. I wonder what benefits ISO-8601 has over epoch?
I found this question on SO, but it's more about proper date formatting in general. I know epoch timestamps often used to represent last update or creation time - but I think it's rather about databases. For me epoch timestamp is a little confusing, at least because different languages utilize different time units (i.g. it's seconds in PHP, but milliseconds in Java, Javascript, etc.) while ISO-8601 format is standard.
tl;dr
Which value you would rather see when debugging a critical problem at 3 AM?
1668329462529
2022-11-13T08:51:02.529Z
Use ISO 8601 text for clarity
Regarding the tracking of time using an elapsed count from some epoch reference, two issues to consider:
Various granularities are used by various protocols such as whole seconds, milliseconds, microseconds, hundreds of microseconds, nanoseconds, etc.
Various epoch references are used by various systems/protocols.
So encountering a mere number that is supposed to represent a date-time can be ambiguous. Such numbers are often error-prone because of incorrect assumptions or miscommunications.
A mere number cannot be parsed as a moment by a human mind. A timestamp such as 15687402956 or 1668329058 is unreadable to mere mortals. So debugging is made difficult. And invalid values may go unnoticed.
Generally better to store and exchange date-time values as text in standard ISO 8601 format. This standard defines formats that are thoughtfully designed to be relatively clear and unambiguous across human cultures and languages.
Offset of zero
And it is generally best to report moments (specific points on the timeline) with an offset-from-UTC of zero hours-minutes-seconds unless you have a specific reason to use offsets.
For example, in Java:
Instant instant = Instant.now() ;
String output = instant.toString() ; // Generates text in ISO 8601 format.
See this code run at Ideone.com.
2022-11-13T08:17:31.634762Z
The Z at the end of an ISO 8601 string means an offset of zero, and is pronounced “Zulu”.
Going the other direction:
Instant instant = Instant.parse( "2022-11-13T08:17:31.634762Z" ) ;
ZonedDateTimes
Note that one shortcoming of ISO 8601 is the omission of time zones. The standard covers only offsets.
The java.time.ZonedDateTime class in Java wisely extends the ISO 8601 standard to append the time zone name in brackets when generating text. One of the biggest advantages of ZonedDateTime over an OffsetDateTime is that it adjusts the time zone offset automatically (e.g. in the case of DST or when a country changes its time zone offset).
ZoneId z = ZoneId.of( "Asia/Tokyo" ) ;
ZonedDateTime now = ZonedDateTime.now( z ) ;
String output = now.toString() ; // ISO 8601 format extended to append the name of the zone in brackets.
When run in Ideone.com. Note the formal time zone name (Continent/Region) at the end.
2022-11-13T17:26:08.717876+09:00[Asia/Tokyo]
Hopefully, this representation of time zones will catch on as a de facto standard, and someday be added to the official standard.
Definitions
A quick review of offset & time zone:
An offset is merely a number of hours-minutes-seconds ahead or behind the temporal prime meridian of UTC.
A time zone is a named history of the past, present, and future changes to the offset used by the people of a particular region, as decided by their politicians.
When to use count
The only place I would use a count as a way of tracking time would be where resources are extremely constrained. A count of milliseconds takes only 32-bits.
In commonplace business app programming, such a need has largely evaporated. Memory is cheap, storage is cheap.
In contrast, debugging is expensive, and business miscommunication is expensive. So use ISO 8601 text wherever practicable.
Related
We are working on an API that has an authentication header in which a timestamp has to be processed. Since we might need more use of timestamps in the future and want to be consistent in what timezone we use, I'd like to know if there is a conventional timezone for APIs. UTC+0 for example.
Thanks in advance
In some cases, you will find that HTTP headers (such as the standard Date header) is in RFC5322 format (aka RFC2822, RFC822). For example: Tue, 31 Jan 2017 17:45:00 GMT
In the Date header, the time zone abbreviation is always"GMT" (which is equivalent to UTC for this purpose), even though the RFC5322 format allows for other time zones.
The above format is not really preferred, it's just something we're stuck with. You can certainly use any format you wish for your own HTTP headers though.
A much better format is the RFC3339 format. This is similar to the ISO8601 extended format. An example is: 2017-01-31T17:45:00Z. The Z at the end indicates "Zulu" time, which is the same as GMT or UTC. However, you could also specify a time zone offset from UTC, such as the equivalent local time in the US Pacific time zone: 2017-01-31T09:45:00-08:00
If you have Unix time numbers like 1485884700, the time zone is always UTC. However, this is not a good interchange format because it is not human readable, and offers no context for what epoch or precision is being used. One would have to know those things externally. It is not a good fit for HTTP headers, nor XML or JSON.
If you are talking about the body of your XML/JSON requests, then you should only be using ISO8601. There are a few other formats that people use, but they are not recommended.
With regard to what time zone you should use - that's entirely dependent on context. If you are timestamping - where you take the current time and record it, then you can indeed just work with UTC. I would think for authorization purposes, this would be reasonable. You could also work in a local time, as long as you provided the offset from UTC along with it so there is no ambiguity.
However, you said (in comments) that your data contains information about job vacancies. That sounds to me like you are talking about time in the future - and that is an exception to the "always UTC" rule. Any time you are talking about time in the future, you need to express that in terms of local time, in the most direct form you can, and you also need to provide the identifier (not the offset) of the time zone that applies.
For example, if I am talking about a job vacancy, I might say in my data:
{
"job": "dishwasher",
"available: "2017-02-13",
"start": "08:00",
"end": "16:00",
"tz": "America/New_York"
}
These values would follow the ISO8601 format for date-only and time-only (or if I wanted to combine them 2017-02-13T08:00) - and would not have a time zone specified. Instead, the IANA time zone identifier America/New_York (for US Eastern Time) is provided in a separate field.
This matters because perhaps the job continues into future months. On March 12th, the US Eastern time zone will change from UTC-5 to UTC-4. Therefore, one cannot specify a single offset, nor can you use UTC.
Also, even for a single occurrence, keep in mind that some countries continue to change their minds about what their time zone and DST rules are, which is why there are dozens of updates to the tz database every year. If you were to calculate an offset for a date and time in the future, by the time it rolls around you might find the offset had been changed by the government.
Looking at the functions here. The formatting shows fractional seconds available to the millisecond (0.001) but is it accurate to the millisecond? I have not been able to find the resolution of these calls in any of the documentation.
https://www.sqlite.org/datatype3.html#section_2_2
SQLite does not have a storage class set aside for storing dates
and/or times. Instead, the built-in Date And Time Functions of SQLite
are capable of storing dates and times as TEXT, REAL, or INTEGER
values:
TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November
24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.
Applications can chose to store dates and times in any of these
formats and freely convert between formats using the built-in date and
time functions.
It appears that you can reach the highest resolution with ISO8601 strings. There should be no problem with accuracy with these strings, as long as you're not mixing storage representations.
This depends on the date format.
INTEGER numbers are accurate to the second.
TEXT values are accurate to the millisecond. You could specify more digits in the fractional seconds fields, but the built-in function will ignore all after the first three.
The resolution of Julian day numbers is better than a millisecond, but when formatting them, the built-in functions will not output more than three fractional digits.
If you want to store time values with higher resolution than is permitted by the date and time functions, you can opt for INTEGER storage, but you're on your own for conversion functions and the like.
Storing unix epoch timestamps with up to nanosecond precision will cover dates from 1970 until minimally the year 2262 (if nanoseconds). You're on your own for conversion to/from ISO date strings and time unit-relative arithmetic, but if you're content with that, then you'll have a fast, high precision and compact storage format.
I have a system that receives sensor values from devices at a rate timed in nanoseconds, so storing and indexing on those timestamps has been very helpful. I do have to provide a query interface that converts from ISO, Python datetime and other formats into ns and back, but that's the deal.
Working with 8-byte timestamps helped keep the record length short, with quick inserts and queries.
I have a string in the following format:
14:41:21 Dec 15, 2015 PST
I want to convert that to my server's local time, but I think I'm creating an extra step that can be avoided:
Dim testdate As Date
DateTime.TryParseExact(dateinput, "HH:mm:ss MMM dd, yyyy PST", CultureInfo.InvariantCulture, DateTimeStyles.None, testdate)
testdate = TimeZoneInfo.ConvertTimeToUtc(testdate, TimeZoneInfo.FindSystemTimeZoneById("Pacific Standard Time"))
testdate = testdate.ToLocalTime()
I've played around with this but always off by a couple hours either way, and the above is what I've found to work but just wanted to know if there was a better way. Also note it could be deployed on multiple servers, so I don't want to specify the timezone to convert it to explicitly, reason for localtime.
A few things:
If you're going to include fixed text in a format string, put it in single-tick quotes so it can't get misinterpreted as a formatting token. ('PST')
In the general case, time zone abbreviations should only be used for display purposes. They should not be parsed as input, as they could be ambiguous. For example, there are 5 different interpretations of CST. It might be US Central Standard Time, but it could also be China Standard Time, or one of the others. See the list on Wikipedia.
If you have a limited number of time zone abbreviations you want to support, then you could extract it from the string and use a dictionary, select/case statements, or conditional logic to map them. Just be certain you know the entire set of abbreviations you want to support and exactly which time zones you want them to map to. Also be sure to account for daylight time abbreviations, such as PDT.
Note that some older standards, such as RFC 2822 §4.3 indeed hardcode a few abbreviations, so you may choose to support those if you are parsing that particular format. (Yours is similar, but not quite a match.)
Your code is mostly ok, but you should probably check the result of TryParseExact. Otherwise you might as well use ParseExact which will throw an exception on failure instead of just returning false.
You could use ConvertTime with TimeZoneInfo.Local as the destination zone if you wanted to do the conversion in a single step. The code would be slightly smaller, though would have no technical differences.
Are you sure you really want to do this? Relying on the system's local time zone should usually should not be done in server-based applications. That's something more appropriate for desktop and mobile. In general, server-side code should not rely on the system time zone to be anything in particular. Avoid "local time" APIs, including DateTime.Now, TimeZoneInfo.Local, ToLocalTime, and ToUniversalTime (when it assumes the input is local time). It is better to supply the applicable time zone in your business logic or application configuration.
My table has a category 'timestamp' where the timestamps are formatted 2015-06-22 18:59:59
However, using DBVisualizer Free 9.2.8 and Vertica, when I try to pull up rows by timestamp with a
SELECT * FROM table WHERE timestamp = '2015-06-22 18:59:59';
(directly copy-pasting the stamp), nothing comes up. Why is this happening and is there a way around it?
FYI, saying "the timestamps are formatted 2015-06-22 18:59:59" is incorrect if you are indeed using a TIMESTAMP type. Such types have their own internal representation of a date-time value, almost always a count since epoch. In your case with Vertica, 8 bytes are used for such storage. The formatting of the date-time value happens when a string representation is generated. Never confuse the string representation with the date-time value. Conflating the two may well be related to your problem/confusion.
A few different thoughts about possible problems…
String Literals
Are you sure Vertica takes strings as timestamp literals? That format you used is common SQL format. But given that Vertica seems to be a specialized database, I would double-check that.
If strings are not allowed, you may need to call some kind of function to transform the string into a date-time values.
Fractional Second
As the comment by Martin Smith points out, the doc for Timestamp-related data types in Vertica 7.1 says those types can have a fractional second to resolution of microseconds. That means up to 6 decimal places of a fraction.
So if you are searching for "2015-06-22 18:59:59" but the stored value is "2015-06-22 18:59:59.012345", no match on the query.
Half-Open
The fractional seconds issue described above is often the cause of problems people have when handling a span of time. If you naïvely try to pinpoint the ending time, you are likely to have problems. Seeing the "59:59" in your example string makes me think this applies to you.
The better approach to spans of time is "Half-Open" (or Half-Closed, whatever) where the beginning is inclusive while the ending is exclusive. Common notation for this is [). In comparison logic this means: value >= start AND value < stop. Notice the lack of EQUALS SIGN in the stop comparison. In English we would say "look for an hour's worth of invoices starting at 2:00 PM and going up to, but not including, 3:00 PM".
Half-Open for a week means Monday-Monday, for a month the first of one month to the first of the next month, and for a year the January 1 of one year to January 1 of the following year.
Half-Open means not using BETWEEN in SQL. SQL's BETWEEN has often be criticized. Instead do something like the following to look for an hour's worth of invoices. Notice the Z on the end of string literal which means "UTC time zone" ("Z" for "Zulu"). (But verify, as my SQL syntax may need fixing.)
SELECT *
FROM some_table_
WHERE invoice_received_ >= '2015-06-22 18:00:00Z'
AND invoice_received_ < '2015-06-22 19:00:00Z'
;
This query will catch any values such as '2015-06-22 18:59:59.654321" which seems to be eluding you.
Reserved Word
I hope you have not really named your table 'table' and your column 'timestamp'. Such use of keywords and reserved words can cause explicit errors or more subtle weird problems.
Tip: The easy way to avoid any of the over a thousand reserved words in various databases is to append a trailing underscore. The SQL standard explicitly promises to never using a trailing underscore in its reserved words. So use "timestamp_" rather than "timestamp". Another example: "invoice_" table and "received_" column. I recommend doing that as a habit on everything your name in SQL: columns, tables, constraints, indexes, and so on.
Time Zone
You are using the TIMESTAMP which is short for TIMESTAMP WITHOUT TIME ZONE. Or so I presume; the Vertica doc is vague but that is the common usage as seen in the Postgres doc, and may even be standard SQL.
Anyways, TIMESTAMP WITHOUT TIME ZONE is usually the wrong type for most business purposes. The WITH time zone is misnamed and often misunderstood as a consequence: It means "with respect for time zone" where data inputs that include an offset or other time zone information from UTC are adjusted to UTC during the INSERT/UPDATE operations. The WITHOUT type simply ignores any such offset or time zone information.
The WITHOUT type should only be used for the concept of a date-time generally without being tied to any one locality. For example, saying "Christmas this year starts at beginning of December 25, 2015". That means in any time zone rather than a specific time zone. Obviously Christmas starts earlier in Paris, for example, than in Montréal.
If you are timestamping legal documents such as invoices, or booking appointments with people across time zones, or scheduling shipments in various localities, you should be using WITH time zone type.
So back to your possible problem: Test how Vertica or your client app or your database driver is handling your input string. It may be adjusting time zones as part of the parsing of the string using your client machine’s current default time zone. When sent to the database, that value will not match the stored value if during storage no adjustment to UTC was made.
Tip: Generally best practice is to do all your storage and business logic in UTC, adjusting to local time zones only where expected by user.
We had this programming discussion on Freenode and this question came up when I was trying to use a VARCHAR(255) to store a Date Variable in this format: D/MM/YYYY. So the question is why is it so bad to use a VARCHAR to store a date. Here are the advantages:
Its faster to code. Previously I used DATE, but date formatting was a real pain.
Its more power hungry to use string than Date? Who cares, we live in the Ghz era.
Its not ethically correct (lolwut?) This is what the other user told me...
So what would you prefer to use to store a date? SQL VARCHAR or SQL DATE?
Why not put screws in with a hammer?
Because it isn't the right tool for the job.
Some of the disadvantages of the VARCHAR version:
You can't easily add / subtract days to the VARCHAR version.
It is harder to extract just month / year.
There is nothing stopping you putting non-date data in the VARCHAR column in the database.
The VARCHAR version is culture specific.
You can't easily sort the dates.
It is difficult to change the format if you want to later.
It is unconventional, which will make it harder for other developers to understand.
In many environments, using VARCHAR will use more storage space. This may not matter for small amounts of data, but in commercial environments with millions of rows of data this might well make a big difference.
Of course, in your hobby projects you can do what you want. In a professional environment I'd insist on using the right tool for the job.
When you'll have database with more than 2-3 million rows you'll know why it's better to use DATETIME than VARCHAR :)
Simple answer is that with databases - processing power isn't a problem anymore. Just the database size is because of HDD's seek time.
Basically with modern harddisks you can read about 100 records / second if they're read in random order (usually the case) so you must do everything you can to minimize DB size, because:
The HDD's heads won't have to "travel" this much
You'll fit more data in RAM
In the end it's always HDD's seek times that will kill you. Eg. some simple GROUP BY query with many rows could take a couple of hours when done on disk compared to couple of seconds when done in RAM => because of seek times.
For VARCHAR's you can't do any searches. If you hate the way how SQL deals with dates so much, just use unix timestamp in 32 bit integer field. You'll have (basically) all advantages of using SQL DATE field, you'll just have to manipulate and format dates using your choosen programming language, not SQL functions.
Two reasons:
Sorting results by the dates
Not sensitive to date formatting changes
So let's take for instance a set of records that looks like this:
5/12/1999 | Frank N Stein
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
If we were to store the data your way, but sorted on the dates in assending order SQL will respond with the resultset that looks like this:
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
Where if we stored the dates as a DATETIME, SQL will respond correctly ordering them like this:
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
1/22/2005 | Drake U. La
Additionally, if somewhere down the road you needed to display dates in a different format, for example like YYYY-MM-DD, then you would need to transform all your data or deal with mixed content. When it's stored as a SQL DATE, you are forced to make the transform in code, and very likely have one spot to change the format to display all dates--for free.
Between DATE/DATETIME and VARCHAR for dates I would go with DATE/DATETIME everytime. But there is a overlooked third option. Storing it as a INTEGER unsigned!
I decided to go with INTEGER unsigned in my last project, and I am really satisfied with making that choice instead of storing it as a DATE/DATETIME. Because I was passing along dates between client and server it made the ideal type for me to use. Instead of having to store it as DATE and having to convert back every time I select, I just select it and use it however I want it. If you want to select the date as a "human-readable" date you can use the FROM_UNIXTIME() function.
Also a integer takes up 4 bytes while DATETIME takes up 8 bytes. Saving 50% storage.
The sorting problem that Berin proposes is also solved using integer as storage for dates.
I'd vote for using the date/datetime types, just for the sake of simplicity/consistency.
If you do store it as a character string, store it in ISO 8601 format:
http://www.iso.org/iso/date_and_time_format
http://xml.coverpages.org/ISO-FDIS-8601.pdf
http://www.cl.cam.ac.uk/~mgk25/iso-time.html
Among other things, ISO 8601 date/time string (A) collate properly, (B) are human readable, (C) are locale-indepedent, and (D) are readily convertable to other formats. To crib from the ISO blurb, ISO 8601 strings offer
representations for the following:
Date
Time of the day
Coordinated universal time (UTC)
Local time with offset to UTC
Date and time
Time intervals
Recurring time intervals
Representations can be in one of two formats: a basic format
that has a minimal number of characters and an extended format
that adds characters to enhance human readability. For example,
the third of January 2003 can be represented as either 20030103
or 2003-01-03.
[and]
offer the following advantages over many of the locally used
representations:
Easily readable and writeable by systems
Easily comparable and sortable
Language independent
Larger units are written in front of smaller units
For most representations the notation is short and of constant length
One last thing: If all you need to do is store a date, then storing it in the ISO 8601 short form YYYYMMDD in a char(8) column takes no more storage than a datetime value (and you don't need to worry about the 3 millisecond gap between the last tick of the one day and the first tick of the next. But that's a matter for another discussion. If you break it up into 3 columns — YYYY char(4), MM char(2), DD char(2) you'll use up the same amount of storage, and get more options for indexing. Even better, store the fields as a short for yyyy (4 bytes), and a tinyint for each of MM and DD — now you're down to 6 bytes for the date. The drawback, of course, to decomposing the date components into their constituent parts is that conversion to proper date/time data types is complicated.