Does BigQuery support nanoseconds in any of its date time data type? - google-bigquery

I have done some research on the DATETIME and TIMESTAMP data types, and I understand that they support date-time values represented down to milliseconds and microseconds,
like the one below:
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]
But is it possible to load/represent values that have nanosecond precision,
like:
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDDDDD]]

Actually, BigQuery supports up to microsecond precision, not only millisecond.
No, I don't believe it supports nanosecond precision (maybe a Googler will correct me there), and I certainly can't see anything in the docs. However, this is stated:
An error is produced if the string_expression is invalid, has more than six subsecond digits (i.e. precision greater than microseconds), or represents a time outside of the supported timestamp range.
Thus, this will work:
SELECT CAST('2017-01-01 00:00:00.000000' AS TIMESTAMP)
But this will not ("Could not cast literal "2017-01-01 00:00:00.000000000" to type TIMESTAMP"):
SELECT CAST('2017-01-01 00:00:00.000000000' AS TIMESTAMP)
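If you can afford to drop the extra precision, one workaround (just a sketch; the literal here is illustrative) is to truncate the string to six subsecond digits before casting:
SELECT CAST(SUBSTR('2017-01-01 00:00:00.000000123', 1, 26) AS TIMESTAMP)
If the nanoseconds actually matter, you could store them separately, for example in an INT64 column holding the full nanosecond epoch value or just the subsecond remainder.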

For more context on timestamp precision, consider the supported range of BigQuery timestamps, which is 0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999. With microsecond precision, if you anchor timestamps to the Unix epoch, this means that you can represent the start of this range with the integer value -62135596800000000 and the end with 253402300799999999 (these are the values that you get if you apply the UNIX_MICROS function to the timestamps above).
Now suppose that we wanted nanosecond precision, but we still wanted to be able to express the timestamp as an integer relative to the Unix epoch. The minimum and maximum timestamps would be represented as -62135596800000000000 and 253402300799999999999. Looking at the range of int64, though, we would need a wider integer type, since the min and max of int64 are -9223372036854775808 and 9223372036854775807. Alternatively, we would need to restrict the range of timestamps to approximately 1677-09-21 00:12:43 to 2262-04-11 23:47:16, assuming I did the math correctly. Given that nanosecond precision generally isn't that useful, having the wider timestamp range while still being able to use a 64-bit representation is the best compromise.
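If you want to check the microsecond arithmetic yourself, here is a quick sketch in BigQuery (the expected values come straight from the range discussed above):
SELECT
  UNIX_MICROS(TIMESTAMP '0001-01-01 00:00:00.000000+00') AS min_micros,  -- -62135596800000000
  UNIX_MICROS(TIMESTAMP '9999-12-31 23:59:59.999999+00') AS max_micros   -- 253402300799999999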

Related

Converting timestamp on whole table in BigQuery

I have a table which stores millions of rows of data. Each row has a date that indicates when the data was entered. I store that date in a NUMERIC column, formatted as a UNIX epoch value. However, I want to convert it to a human-readable date (yyyy-mm-dd hh:mm:ss) and later sort by that date rather than the query date.
It took me a long time to find a suitable way. Here's my attempt.
I used SELECT CAST(DATE(timestamp) AS DATE) AS CURR_DT FROM dataset.table but it gave me this error:
No matching signature for function DATE for argument types: NUMERIC. Supported signatures: DATE(TIMESTAMP, [STRING]); DATE(DATETIME); DATE(INT64, INT64, INT64) at [1:13]
I used this method BigQuery: convert epoch to TIMESTAMP but still didn't fully understand it.
I'm a novice in coding, so I hope you guys understand the situation. Thanks!
If I am understanding your question correctly, you would like to take a numeric epoch time that is stored as an integer and convert it to a timestamp?
If so you can use the following in BigQuery Standard SQL:
select TIMESTAMP_SECONDS(1606048220)
It gives the output of:
2020-11-22 12:30:20 UTC
Documentation
If you only want the date component, then you would convert to a date after converting to a timestamp. Presumably you have seconds, so you would use TIMESTAMP_SECONDS() -- but there are similar functions for milliseconds and microseconds.
For just the date:
select date(timestamp_seconds(col))
Note that this removes the time component.
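Putting it together for a NUMERIC epoch column (a sketch; entry_epoch is a hypothetical column name, assumed to hold whole seconds - use TIMESTAMP_MILLIS or TIMESTAMP_MICROS instead if your values are milliseconds or microseconds):
SELECT
  TIMESTAMP_SECONDS(CAST(entry_epoch AS INT64)) AS entry_ts,
  DATE(TIMESTAMP_SECONDS(CAST(entry_epoch AS INT64))) AS entry_date
FROM dataset.table
ORDER BY entry_ts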

Convert DOUBLE column to TIMESTAMP in Firebird database

I have a Firebird database that saves the datetime field as a DOUBLE. I have created a ColdFusion datasource connection, so I can query the data remotely. While the rest of the data is being returned correctly, the datetime field is unreadable. I have tried using CAST and CONVERT to no avail. How can I convert this to a timestamp?
An example of the data stored is: 43016.988360
You can't just convert a DOUBLE PRECISION to a TIMESTAMP, not without explicitly defining how you want it mapped and writing that conversion yourself (or hoping there is an existing third-party UDF that does this for you).
A TIMESTAMP in Firebird is a date + time represented as an 8 byte value, where the date range is from January 1, 1 AD to December 31, 9999 AD and the time range is 00:00 to 23:59:59.9999 (so, 100 microsecond precision).
A DOUBLE PRECISION is - usually - the wrong type for storing date and time information, and as you haven't provided how that double value should be interpreted, we can't help you other than saying: there is no default method in Firebird to do this.
Based on the comments below, it looks like the value is a ColdFusion date value stored as double precision with the number of days since December 30th 1899, see also why is ColdFusion's Epoch Time Dec 30, 1899?. If this is really the case, then you can use the following for conversion to a TIMESTAMP:
select timestamp'1899-12-30 00:00' + 43016.988360 from rdb$database
Which will yield the value 2017-10-08 23:43:14.304. Using the value 43182.4931754 from the comments will yield 2018-03-23 11:50:10.354. That is a millisecond off from your expectation, but that might be a rounding/presentation issue; e.g. I get the exact expected date if I use 43182.49317539 instead.
I would strongly suggest you carefully test this with known values.
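If that interpretation holds for your data, applying it to a whole table is a one-liner (a sketch; mytable and entry_double are hypothetical names):
select timestamp '1899-12-30 00:00' + entry_double as entry_ts
from mytable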

SAS timestamp from scientific notation to yyyy/mm/dd hh:mm:ss

Problem:
My timestamp is being displayed in scientific notation. I would like to display the column without scientific notation, and create a second column formatted as a long date, yyyy/mm/dd hh:mm:ss.
Steps taken:
I've already converted the column from a UNIX Epoch (1960) timestamp to a SAS time (1970) timestamp. But the scientific notation persists. I tried date20.; that doesn't do the trick, either.
Timestamp in Scientific Notation
My current insufficient code fails to format the timestamp column as a date.
proc print data=heart._23a;
format timestamp date9.;
run;
Results:
It results in no errors, but it redimensions my matrix to a 1x3. I need to obtain a matrix of the same dimension, just with a reformatted timestamp. I appreciate any help, but please keep it simple, I am in unknown territory!
datetime17. is the standard timestamp format in SAS, though you have many other choices as well. ymddttm. is the closest to what you're looking for, I believe.
One important distinction here: SAS has two concepts, date and datetime. date is number of days since 1/1/1960 and has no time part, while datetime is number of seconds since 1/1/1960 00:00:00 and has both time and date. You can use datepart to convert datetime -> date, or dhms to convert date -> datetime.
Your question also seems to get the two epochs backwards. UNIX epoch is 1970. SAS epoch is 1960.
Finally, if you want to display the raw number of seconds, use the w.d format instead of the bestw.d format - format timestampvar 14. for example - where w is the total width in characters (digits), including the decimal.
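A minimal sketch pulling these together, using the dataset name from the question and assuming timestamp is a true SAS datetime value (seconds since 1960); ymddttm19. is one width of the ymddttm. format mentioned above:
proc print data=heart._23a;
    format timestamp ymddttm19.; /* displays as yyyy-mm-dd hh:mm:ss */
run;

data heart._23b;
    set heart._23a;
    entry_date = datepart(timestamp); /* datetime -> date (days since 1960) */
    format entry_date yymmdd10.;
run;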

Earliest Timestamp supported in PostgreSQL

I work with different databases in a number of different time zones (and periods of time), and one thing that frequently causes problems is the date/time definition.
For this reason, and since a date is a reference to a starting value, I try to store the base date, i.e. the minimum date supported in that particular computer/database, to keep track of how dates were calculated.
If I am seeing it correctly, this depends on the RDBMS and on the particular storage of the type.
In SQL Server, I found a couple of ways of calculating this "base date";
SELECT CONVERT(DATETIME, 0)
or
SELECT DATEADD(MONTH, 0, 0 )
or even a cast like this:
DECLARE @bin BINARY(8), @dt DATETIME
SET @bin = 0x00000000 + CAST(300 AS BINARY(4))
SET @dt = (SELECT CAST(@bin AS DATETIME) AS BASEDATE)
PRINT CAST(@dt AS NVARCHAR(100))
My question is, is there a similar way of calculating the base date in PostgreSQL, i.e.: the value that is the minimum date supported and is on the base of all calculations?
From the description of the date type, I can see that the minimum date supported is 4713 BC, but is there a way of getting this value programmatically (for instance as a formatted date string), as I do in SQL Server?
The manual states the values as:
Low value: 4713 BC
High value: 294276 AD
with the caveat, as Chris noted, that -infinity is also supported.
See the note later in the same page in the manual; the above is only true if you are using integer timestamps, which are the default in all vaguely recent versions of PostgreSQL. If in doubt:
SHOW integer_datetimes;
will tell you. If you're using floating point datetimes instead, you get greater range and less (non-linear) precision. Any attempt to work out the minimum programmatically must cope with that restriction.
PostgreSQL does not just let you cast zero to a timestamp to get the minimum possible timestamp, nor would this make much sense if you were using floating point datetimes. You can use the Julian date conversion function, but this gives you the epoch, not the minimum time:
postgres=> select to_timestamp(0);
to_timestamp
------------------------
1970-01-01 08:00:00+08
(1 row)
because it accepts negative values. You'd think that giving it negative maxint would work, but the results are surprising to the point where I wonder if we've got a wrap-around bug lurking here:
postgres=> select to_timestamp(-922337203685477);
to_timestamp
---------------------------------
294247-01-10 12:00:54.775808+08
(1 row)
postgres=> select to_timestamp(-92233720368547);
to_timestamp
---------------------------------
294247-01-10 12:00:54.775808+08
(1 row)
postgres=> select to_timestamp(-9223372036854);
to_timestamp
------------------------------
294247-01-10 12:00:55.552+08
(1 row)
postgres=> select to_timestamp(-922337203685);
ERROR: timestamp out of range
postgres=> select to_timestamp(-92233720368);
to_timestamp
---------------------------------
0954-03-26 09:50:36+07:43:24 BC
(1 row)
postgres=> select to_timestamp(-9223372036);
to_timestamp
------------------------------
1677-09-21 07:56:08+07:43:24
(1 row)
(Perhaps related to the fact that to_timestamp takes a double, even though timestamps are stored as integers these days?).
I think it's possibly wisest to just let the timestamp range be any timestamp you don't get an error on. After all, the range of valid timestamps is not continuous:
postgres=> SELECT TIMESTAMP '2000-02-29';
timestamp
---------------------
2000-02-29 00:00:00
(1 row)
postgres=> SELECT TIMESTAMP '2001-02-29';
ERROR: date/time field value out of range: "2001-02-29"
LINE 1: SELECT TIMESTAMP '2001-02-29';
so you can't assume that just because a value is between two valid timestamps, it is itself valid.
The earliest timestamp is '-infinity'. This is a special value. The other side is 'infinity' which is later than any specific timestamp.
I don't know of a way of getting this programmatically. I would just use the value hard-coded, the way you might use NULL. That means you have to handle infinities on the client side, though.
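For example, a quick sketch of how the special values compare (both columns come back true):
SELECT timestamp '-infinity' < timestamp '0001-01-01' AS earlier_than_any,
       timestamp 'infinity'  > timestamp '9999-12-31' AS later_than_any;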

DateTime2 vs DateTime in SQL Server

Which one:
datetime
datetime2
is the recommended way to store date and time in SQL Server 2008+?
I'm aware of differences in precision (and storage space probably), but ignoring those for now, is there a best practice document on when to use what, or maybe we should just use datetime2 only?
The MSDN documentation for datetime recommends using datetime2. Here is their recommendation:
Use the time, date, datetime2 and datetimeoffset data types for new work. These types align with the SQL Standard. They are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications.
datetime2 has a larger date range, a larger default fractional precision, and optional user-specified precision. Also, depending on the user-specified precision, it may use less storage.
DATETIME2 has a date range of 0001-01-01 through 9999-12-31, while the DATETIME type only supports years 1753 through 9999.
Also, if you need to, DATETIME2 can be more precise in terms of time; DATETIME is limited to 3 1/3 milliseconds, while DATETIME2 can be accurate down to 100 ns.
Both types map to System.DateTime in .NET - no difference there.
If you have the choice, I would recommend using DATETIME2 whenever possible. I don't see any benefits using DATETIME (except for backward compatibility) - you'll have less trouble (with dates being out of range and hassle like that).
Plus: if you only need the date (without time part), use DATE - it's just as good as DATETIME2 and saves you space, too! :-) Same goes for time only - use TIME. That's what these types are there for!
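For instance, a minimal sketch of picking the narrowest type that fits (table and column names are made up):
CREATE TABLE dbo.Appointments (
    ApptDate  date,          -- date only: 3 bytes
    ApptStart time(0),       -- time of day to the second: 3 bytes
    CreatedAt datetime2(3)   -- full timestamp at 1 ms precision: 7 bytes
);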
datetime2 wins in most respects, except for compatibility with old apps:
larger range of values
better accuracy
smaller storage space (if an optional user-specified precision is specified)
Please note the following points:
Syntax
datetime2[(fractional seconds precision)] (see storage size below)
Precision, scale
0 to 7 digits, with an accuracy of 100 ns.
The default precision is 7 digits.
Storage size
6 bytes for precision less than 3;
7 bytes for precision 3 and 4.
All other precisions require 8 bytes.
DateTime2(3) has the same number of digits as DateTime but uses 7 bytes of storage instead of 8 (SQLHINTS - DateTime Vs DateTime2)
Find more in the datetime2 (Transact-SQL) MSDN article
I concur with @marc_s and @Adam_Poward -- DateTime2 is the preferred method moving forward. It has a wider range of dates, higher precision, and uses equal or less storage (depending on precision).
One thing the discussion missed, however...
@marc_s states: "Both types map to System.DateTime in .NET - no difference there." This is correct; however, the inverse is not true, and it matters when doing date range searches (e.g. "find me all records modified on 5/5/2010").
.NET's DateTime has similar range and precision to DateTime2. When mapping a .NET DateTime down to the old SQL DateTime, implicit rounding occurs. The old SQL DateTime is accurate to about 3 milliseconds. This means that 23:59:59.997 is as close as you can get to the end of the day. Anything higher is rounded up to the following day.
Try this:
declare @d1 datetime = '5/5/2010 23:59:59.999'
declare @d2 datetime2 = '5/5/2010 23:59:59.999'
declare @d3 datetime = '5/5/2010 23:59:59.997'
select @d1 as 'IAmMay6BecauseOfRounding', @d2 'May5', @d3 'StillMay5Because2msEarlier'
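If I remember the rounding rules right, that select should come back roughly as follows (the exact datetime2 digits depend on its default precision of 7):
IAmMay6BecauseOfRounding   May5                          StillMay5Because2msEarlier
2010-05-06 00:00:00.000    2010-05-05 23:59:59.9990000   2010-05-05 23:59:59.997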
Avoiding this implicit rounding is a significant reason to move to DateTime2. Implicit rounding of dates clearly causes confusion:
Strange datetime behavior in SQL Server
http://bytes.com/topic/sql-server/answers/578416-weird-millisecond-part-datetime-data-sql-server-2000-a
SQL Server 2008 and milliseconds
http://improve.dk/archive/2011/06/16/getting-bit-by-datetime-rounding-or-why-235959-999-ltgt.aspx
http://milesquaretech.com/Blog/post/2011/09/12/DateTime-vs-DateTime2-SQL-is-Rounding-My-999-Milliseconds!.aspx
Almost all the answers and comments have been heavy on the pros and light on the cons. Here's a recap of all pros and cons so far, plus some crucial cons (in #2 below) that I've only seen mentioned once or not at all.
PROS:
1.1. More ISO compliant (ISO 8601) (although I don't know how this comes into play in practice).
1.2. More range (0001-01-01 to 9999-12-31 vs. 1753-01-01 to 9999-12-31) (although the extra range, all prior to year 1753, will likely not be used except in, for example, historical, astronomical, or geologic apps).
1.3. Exactly matches the range of .NET's DateTime type (although both convert back and forth with no special coding if values are within the target type's range and precision, except for con #2.1 below; otherwise errors / rounding will occur).
1.4. More precision (100 nanoseconds, i.e. 0.0000001 sec, vs. 3.33 milliseconds, i.e. 0.00333 sec) (although the extra precision will likely not be used except in, for example, engineering / scientific apps).
1.5. When configured for precision similar to DateTime (similar as in 1 ms, not "same" as in 3.33 ms, as Iman Abidi has claimed), it uses less space (7 vs. 8 bytes), but then of course you'd be losing the precision benefit, which is likely one of the two most touted (the other being range), albeit likely unneeded, benefits.
CONS:
2.1. When passing a Parameter to a .NET SqlCommand, you must specify System.Data.SqlDbType.DateTime2 if you may be passing a value outside the SQL Server DateTime’s range and/or precision, because it defaults to System.Data.SqlDbType.DateTime.
2.2. Cannot be implicitly / easily converted to a floating-point numeric (number of days since the min date-time) value in order to do the following to / with it in SQL Server expressions using numeric values and operators:
2.2.1. add or subtract a number of days or partial days. Note: using the DateAdd function as a workaround is not trivial when you need to consider multiple, if not all, parts of the date-time.
2.2.2. take the difference between two date-times for purposes of "age" calculation. Note: you cannot simply use SQL Server's DateDiff function instead, because it does not compute age as most people would expect: if the two date-times happen to cross a calendar / clock boundary of the units specified, even by a tiny fraction of that unit, it returns a difference of 1 of that unit instead of 0. For example, the DateDiff in days of two date-times only 100 nanoseconds apart will return 1 instead of 0 (days) if those date-times are on different calendar days (i.e. "1999-12-31 23:59:59.9999999" and "2000-01-01 00:00:00.0000000"). The same two date-times, if moved so that they don't cross a calendar day, will return a DateDiff in days of 0.
2.2.3. take the Avg of date-times (in an aggregate query) by simply converting to float first and then back again to DateTime.
NOTE: To convert DateTime2 to a numeric, you have to use something like the following formula, which still assumes your values are not less than the year 1970 (which means you're losing all of the extra range, plus another 217 years). Note: you may not be able to simply adjust the formula to allow for the extra range, because you may run into numeric overflow issues.
25567 + (DATEDIFF(SECOND, {d '1970-01-01'}, @Time) + DATEPART(nanosecond, @Time) / 1.0E+9) / 86400.0 - Source: https://siderite.dev/blog/how-to-translate-t-sql-datetime2-to.html
Of course, you could also cast to DateTime first (and, if necessary, back again to DateTime2), but you'd lose the precision and range (all prior to year 1753) benefits of DateTime2 vs. DateTime, which are probably the two biggest and, at the same time, probably the two least likely to be needed. That begs the question: why use it when you lose the implicit / easy conversion to a floating-point numeric (number of days) for addition / subtraction / "age" (vs. DateDiff) / Avg calculations, which in my experience is a big benefit?
Btw, the Avg of date-times is (or at least should be) an important use case. a) Besides use in getting average duration when date-times (since a common base date-time) are used to represent duration (a common practice), b) it’s also useful to get a dashboard-type statistic on what the average date-time is in the date-time column of a range / group of Rows. c) A standard (or at least should be standard) ad-hoc Query to monitor / troubleshoot values in a Column that may not be valid ever / any longer and / or may need to be deprecated is to list for each value the occurrence count and (if available) the Min, Avg and Max date-time stamps associated with that value.
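To make the conversion cons concrete, here is a small sketch of the numeric tricks that work with datetime but not datetime2, plus the DateDiff boundary behavior described above (names and values are illustrative):
DECLARE @d datetime = '2020-01-01 12:00'
SELECT @d + 1.5 AS PlusOneAndAHalfDays      -- implicit numeric math: 2020-01-03 00:00
SELECT CAST(@d AS float) AS DaysSince1900   -- fractional days since datetime's zero point

-- Avg of date-times via a float round-trip (not possible directly with datetime2):
SELECT CAST(AVG(CAST(d AS float)) AS datetime) AS AvgDt
FROM (VALUES (CAST('2020-01-01' AS datetime)),
             (CAST('2020-01-05' AS datetime))) AS t(d)

-- DATEDIFF counts boundary crossings, not elapsed time:
DECLARE @a datetime2 = '1999-12-31 23:59:59.9999999',
        @b datetime2 = '2000-01-01 00:00:00.0000000'
SELECT DATEDIFF(DAY, @a, @b) AS CrossesMidnight   -- 1, though the values are only 100 ns apart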
Here is an example that will show you the differences in storage size (bytes) and precision between smalldatetime, datetime, datetime2(0), and datetime2(7):
DECLARE @temp TABLE (
sdt smalldatetime,
dt datetime,
dt20 datetime2(0),
dt27 datetime2(7)
)
INSERT @temp
SELECT getdate(),getdate(),getdate(),getdate()
SELECT sdt,DATALENGTH(sdt) as sdt_bytes,
dt,DATALENGTH(dt) as dt_bytes,
dt20,DATALENGTH(dt20) as dt20_bytes,
dt27, DATALENGTH(dt27) as dt27_bytes FROM @temp
which returns
sdt sdt_bytes dt dt_bytes dt20 dt20_bytes dt27 dt27_bytes
------------------- --------- ----------------------- -------- ------------------- ---------- --------------------------- ----------
2015-09-11 11:26:00 4 2015-09-11 11:25:42.417 8 2015-09-11 11:25:42 6 2015-09-11 11:25:42.4170000 8
So if I want to store information down to the second - but not to the millisecond - I can save 2 bytes each if I use datetime2(0) instead of datetime or datetime2(7).
DateTime2 wreaks havoc if you are an Access developer trying to write Now() to the field in question. I just did an Access -> SQL 2008 R2 migration, and it created all the datetime fields as DateTime2. Appending a record with Now() as the value bombed out. It was okay on 1/1/2012 2:53:04 PM, but not on 1/10/2012 2:53:04 PM.
One character made the difference. Hope it helps somebody.
Interpretation of date strings into datetime and datetime2 can be different too, when using non-US DATEFORMAT settings. E.g.
set dateformat dmy
declare @d datetime, @d2 datetime2
select @d = '2013-06-05', @d2 = '2013-06-05'
select @d, @d2
This returns 2013-05-06 (i.e. May 6) for datetime, and 2013-06-05 (i.e. June 5) for datetime2. However, with dateformat set to mdy, both @d and @d2 return 2013-06-05.
The datetime behavior seems at odds with the MSDN documentation of SET DATEFORMAT, which states: "Some character string formats, for example ISO 8601, are interpreted independently of the DATEFORMAT setting." Obviously not true!
Until I was bitten by this, I'd always thought that yyyy-mm-dd dates would just be handled right, regardless of the language / locale settings.
Old question... but I want to add something not already stated by anyone here... (Note: this is my own observation, so don't ask for any reference.)
Datetime2 is faster when used in filter criteria.
TLDR:
In SQL 2016 I had a table with a hundred thousand rows and a datetime column ENTRY_TIME, because it was required to store the exact time down to the second. While executing a complex query with many joins and a subquery, when I used a where clause such as:
WHERE ENTRY_TIME >= '2017-01-01 00:00:00' AND ENTRY_TIME < '2018-01-01 00:00:00'
The query was fine initially when there were hundreds of rows, but as the number of rows increased, the query started to give this error:
Execution Timeout Expired. The timeout period elapsed prior
to completion of the operation or the server is not responding.
I removed the where clause and, unexpectedly, the query ran in 1 second, although now ALL rows for all dates were fetched. I ran the inner query with the where clause, and it took 85 seconds; without the where clause it took 0.01 seconds.
I came across many threads here describing this issue as datetime filtering performance.
I optimized the query a bit, but the real speed-up came from changing the datetime column to datetime2.
Now the same query that timed out previously takes less than a second.
cheers
While there is increased precision with datetime2, some clients don't support date, time, or datetime2 and force you to convert to a string literal. Specifically, Microsoft mentions "down-level" ODBC, OLE DB, JDBC, and SqlClient issues with these data types and has a chart showing how each can map the type.
If you value compatibility over precision, use datetime.
According to this article, if you would like to have the same precision as DateTime while using DateTime2, you simply have to use DateTime2(3). This should give you the same precision, take up one fewer byte, and provide an expanded range.
I just stumbled across one more advantage for DATETIME2: it avoids a bug in the Python adodbapi module, which blows up if a standard library datetime value is passed which has non-zero microseconds for a DATETIME column but works fine if the column is defined as DATETIME2.
As the other answers show, datetime2 is recommended due to its smaller size and greater precision, but here are some thoughts on why NOT to use datetime2, from Nikola Ilic:
lack of a (simple) way to do basic math operations with dates, like GETDATE()+1
every time you do comparisons with DATEADD or DATEDIFF, you will end up with an implicit data conversion to datetime
SQL Server can't use statistics properly for datetime2 columns, due to the way the data is stored, which leads to non-optimal query plans that decrease performance
I think DATETIME2 is the better way to store dates, because it is more efficient than DATETIME. In SQL Server 2008+ you can use DATETIME2; it stores a date and time, takes 6-8 bytes to store, and has a precision of 100 nanoseconds. Anyone who needs greater time precision will want DATETIME2.
The accepted answer is great; just know that if you are sending a DateTime2 to the frontend, it gets rounded to the normal DateTime equivalent.
This caused a problem for me because in one of my solutions I had to compare what was sent with what was in the database when resubmitted, and my simple '==' comparison didn't allow for rounding, so that had to be added.
datetime2 is better:
datetime range: 1753-01-01 through 9999-12-31; datetime2 range: 0001-01-01 through 9999-12-31
datetime accuracy: 0.00333 seconds; datetime2 accuracy: 100 nanoseconds
datetime takes 8 bytes; datetime2 takes 6 to 8 bytes, depending on precision
(6 bytes for precision less than 3, 7 bytes for precision 3 or 4, and 8 bytes for all other precisions)
Select ValidUntil + 1
from Documents
The above SQL won't work with a DateTime2 field.
It returns an error: "Operand type clash: datetime2 is incompatible with int"
Adding 1 to get the next day is something developers have been doing with dates for years. Now Microsoft has a super new datetime2 field that cannot handle this simple functionality.
"Let's use this new type that is worse than the old one", I don't think so!