Hive: Reduce millisecond precision in timestamp - hive

In Hive, is there any way to reduce millisecond precision by truncating (not rounding)?
For example I have the following timestamp with millisecond in 3 decimal places
2019-10-08 21:21:39.163
I want to get a timestamp with exactly 1 decimal place (removing the last two millisecond digits, 63):
2019-10-08 21:21:39.1
I have only got as far as turning the timestamp into a decimal with one decimal place of precision:
cast(floor(cast('2019-10-08 21:21:39.163' AS double)/0.100)*0.100 AS decimal(16,1)) AS updatetime
This gives:
1570537299.1
The problem: I do not know how to turn the above value back into a timestamp with milliseconds. Even better, if there is a better way to reduce timestamp precision from 3 decimal places to 1, I would appreciate it.
The reason I have to cast the above expression to decimal is that if I only do:
floor(cast('2019-10-08 21:21:39.163' AS double)/0.100)*0.100 AS exec_time
This gives something like:
1570537299.100000001
This is not good, since I need to join this table X with another table Y.
Table X has timestamps like 2019-10-08 21:21:39.163.
But table Y stores data at 100ms intervals, so its timestamps are exactly like 2019-10-08 21:21:39.1.
The trailing 00000001 would prevent the timestamps from table X from matching table Y exactly.

If you need to remove the last two millisecond digits, use substr() and cast back to timestamp if necessary. For example:
with your_data as
(
select timestamp('2019-10-08 21:21:39.163') as original_timestamp --your example
)
select original_timestamp,
substr(original_timestamp,1,21) truncated_string,
timestamp(substr(original_timestamp,1,21)) truncated_timestamp --this may not be necessary; timestamp is compatible with string
from your_data
Returns:
original_timestamp        truncated_string        truncated_timestamp
2019-10-08 21:21:39.163   2019-10-08 21:21:39.1   2019-10-08 21:21:39.1
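Outside Hive, the same truncate-by-substring logic can be sketched in Python (a sketch only; the truncate_to_tenths helper is illustrative, not a Hive function):

```python
from datetime import datetime

def truncate_to_tenths(ts: str) -> str:
    # Keep "YYYY-MM-DD HH:MM:SS.d" -- the first 21 characters,
    # mirroring Hive's substr(original_timestamp, 1, 21).
    return ts[:21]

truncated = truncate_to_tenths("2019-10-08 21:21:39.163")
print(truncated)  # 2019-10-08 21:21:39.1

# The truncated string still parses as a timestamp, just as Hive
# can cast it back with timestamp(...).
parsed = datetime.strptime(truncated, "%Y-%m-%d %H:%M:%S.%f")
```

Because this stays in the string domain, there is no floating-point residue like 1570537299.100000001, which is what made the double-based approach unsafe for joins.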

Related

How do I convert a text column to a timestamp column in PostgreSQL?

I have a text column "A" in table X and I want to convert it to a timestamp column without a timezone in table Y.
How can I perform this conversion in PostgreSQL?
Input: Table X
A (text)
2006-08-30 21:30:00
21:30:00
Desired Output: Table Y
Output (timestamp)
2006-08-30 21:30:00
21:30:00
The TIME data type by itself is pretty much useless: which occurred first, 23:00:00 or 01:00:00? Without the date part you cannot tell. However, you can get what you want by applying a cast twice: first string to timestamp, then timestamp to time. Using the Postgres cast operator (::) that becomes: (see demo)
select ts_as_string::timestamp::time from table_x;
NOTES: First, heed the comment by @FrankHeikens and use the proper data type for your timestamp, not text. You can virtually guarantee that at some point someone will insert an invalid value into that column (like 'N/A' perhaps), and Postgres will not stop it; after all, you declared it valid. Second, Postgres 9.5 reached end-of-life in February 2021. You should update your version.
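The double cast can be mimicked outside the database; a minimal Python sketch of string → timestamp → time (the text_to_time helper is illustrative):

```python
from datetime import datetime

def text_to_time(ts_as_string: str):
    # string -> timestamp -> time, mirroring ts_as_string::timestamp::time
    return datetime.strptime(ts_as_string, "%Y-%m-%d %H:%M:%S").time()

print(text_to_time("2006-08-30 21:30:00"))  # 21:30:00
```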

Is there a native technique in PostgreSQL to force "timestamp without time zone" to not include milliseconds?

I am using PostgreSQL 9.6.17. (Migrating from MySQL)
A Java program writes dates into a table. The date format is the following:
2019-01-01 09:00:00
But it can also be 2019-01-01 09:00:00.00 or 2019-01-01 09:00:00.000 when inserted into the database, which messes up the date management in my program on retrieval.
On insertion, I would like all the dates to have the very same format: 2019-01-01 09:00:00. The data type used by the column is timestamp without time zone.
How can I tell PostgreSQL not to store milliseconds in timestamp without time zone, via configuration or SQL query?
The data types doc does not provide any information about that.
Quote from the manual
time, timestamp, and interval accept an optional precision value p which specifies the number of fractional digits retained in the seconds field. By default, there is no explicit bound on precision. The allowed range of p is from 0 to 6.
So just define your column as timestamp(0), e.g.:
create table foo
(
some_timestamp timestamp(0)
);
If you have an existing table with data, you can simply ALTER the column:
alter table some_table
alter column some_timestamp type timestamp(0);
If you now insert a timestamp with milliseconds, the value will be rounded to remove the milliseconds.
Note that technically you still have fractional seconds in the stored value, but they are always set to 0.
You can cast:
mytimestamptz::timestamp(0)
This will round the result to the nearest second. If you want to truncate instead:
date_trunc('second', mytimestamp)
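The round-versus-truncate distinction can be sketched in Python (illustrative helpers; the cast's rounding is modelled here as round-half-up):

```python
from datetime import datetime, timedelta

def round_to_second(ts: datetime) -> datetime:
    # Mirrors casting to timestamp(0): rounds to the nearest second
    # (assuming round-half-up behaviour for the sketch).
    base = ts.replace(microsecond=0)
    return base + timedelta(seconds=1) if ts.microsecond >= 500_000 else base

def trunc_to_second(ts: datetime) -> datetime:
    # Mirrors date_trunc('second', ...): drops the fractional part.
    return ts.replace(microsecond=0)

ts = datetime(2019, 1, 1, 9, 0, 0, 700_000)
print(round_to_second(ts))  # 2019-01-01 09:00:01
print(trunc_to_second(ts))  # 2019-01-01 09:00:00
```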
Retrieve it as a timestamp and manage the precision however you want in the application querying the database; e.g. via JDBC you'll get a Java LocalDateTime object, and in Python a datetime object.
If you want to retrieve timestamps as strings, there are lots of formatting options available:
SELECT to_char("when", 'YYYY-MM-DD HH24:MI:SS') FROM mytable
Drop any milliseconds on input by specifying the precision option on your timestamp type (note that when must be double-quoted, since WHEN is a reserved word):
CREATE TABLE mytable (..., "when" TIMESTAMP(0));

date or string type to bigint

How can I convert date like '2018-03-31' into bigint in Hive?
What Gordon said.
If you have JavaScript timestamps, keep in mind that they are simply the number of milliseconds since 1970-01-01T00:00:00.000Z in 64-bit floating point. They can be converted to BIGINT easily. If you're storing those timestamps in DATETIME(3) or TIMESTAMP(3) data types, use UNIX_TIMESTAMP(date)*1000 to get a useful BIGINT millisecond value.
If you only care about dates (not times) you can use TO_DAYS() to get an integer number of days since 0000-01-01 in the Gregorian calendar. (If you're a historian of antiquity who cares about the Julian calendar, this approach has problems; if you don't know what I'm talking about, you don't need to worry.) But INT will suffice for these day numbers; BIGINT is overkill.
You could do:
select year(date) * 10000 + month(date) * 100 + day(date)
This produces an integer that represents the date.
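The yyyymmdd arithmetic can be sketched in Python (the date_to_int helper name is illustrative):

```python
from datetime import date

def date_to_int(d: date) -> int:
    # Same arithmetic as year(date) * 10000 + month(date) * 100 + day(date)
    return d.year * 10000 + d.month * 100 + d.day

print(date_to_int(date(2018, 3, 31)))  # 20180331
```

These integers sort in the same order as the dates they encode, which is often all a join or range filter needs.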
If you want a Unix timestamp (seconds since 1970-01-01), then:
select unix_timestamp(date)
You can use the unix_timestamp function, which converts a date or timestamp to a Unix timestamp and returns it as a bigint.
Example query:
select unix_timestamp('2018-03-31', 'yyyy-MM-dd');
Output:
+--------------------------------------+
|unix_timestamp(2018-03-31, yyyy-MM-dd)|
+--------------------------------------+
|                            1522434600|
+--------------------------------------+
Note: Tested this code in Hive 1.2.0
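The same conversion can be sketched outside Hive in Python. Note an assumption here: Hive's unix_timestamp() interprets the string in the session time zone, so the 1522434600 above corresponds to a UTC+05:30 session, while this sketch pins the zone to UTC and therefore differs by the offset:

```python
from datetime import datetime, timezone

def to_unix_seconds(date_str: str) -> int:
    # Parse the date and convert to seconds since the Unix epoch, in UTC.
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

# 1522454400 (UTC) vs 1522434600 (UTC+05:30): a difference of 19800 s.
print(to_unix_seconds("2018-03-31"))  # 1522454400
```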

Does BigQuery support nanoseconds in any of its date time data type?

I have done some research on the DATETIME and TIMESTAMP data types and I understand that they support date-time values with millisecond and microsecond precision, like the one below:
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]
But, is it possible to load/represent values that has nanoseconds precision?
like,
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDDDDD]]
Actually, BigQuery supports up to microsecond precision, and not only millisecond.
No, I don't believe it supports nanosecond precision (maybe a Googler will correct me there), and I certainly can't see anything in the docs. However, this is stated:
An error is produced if the string_expression is invalid, has more
than six subsecond digits (i.e. precision greater than microseconds),
or represents a time outside of the supported timestamp range.
Thus, this will work:
SELECT CAST('2017-01-01 00:00:00.000000' AS TIMESTAMP)
But this will not ("Could not cast literal "2017-01-01 00:00:00.000000000" to type TIMESTAMP"):
SELECT CAST('2017-01-01 00:00:00.000000000' AS TIMESTAMP)
For more context on timestamp precision, consider the supported range of BigQuery timestamps, which is 0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999. With microsecond precision, if you anchor timestamps to the Unix epoch, this means that you can represent the start of this range with the integer value -62135596800000000 and the end with 253402300799999999 (these are the values that you get if you apply the UNIX_MICROS function to the timestamps above).
Now suppose that we wanted nanosecond precision, but we still wanted to be able to express the timestamp as an integer relative to the Unix epoch. The minimum and maximum timestamps would be represented as -62135596800000000000 and 253402300799999999999. Looking at the range of int64, though, we would need a wider integer type, since the min and max of int64 are -9223372036854775808 and 9223372036854775807. Alternatively, we would need to restrict the range of timestamps to approximately 1677-09-21 00:12:43 to 2262-04-11 23:47:16, assuming I did the math correctly. Given that nanosecond precision generally isn't that useful, having the wider timestamp range while still being able to use a 64-bit representation is the best compromise.
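That range arithmetic can be checked directly (the epoch-offset values are those the answer attributes to UNIX_MICROS; Python integers are unbounded, so the comparison against the int64 limits is exact):

```python
# BigQuery's supported range, expressed as microseconds since the Unix epoch.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
min_ts_micros = -62135596800000000   # 0001-01-01 00:00:00 UTC
max_ts_micros = 253402300799999999   # 9999-12-31 23:59:59.999999 UTC

# Microsecond precision fits comfortably in int64...
print(INT64_MIN <= min_ts_micros and max_ts_micros <= INT64_MAX)  # True

# ...but the same range at nanosecond precision overflows it.
min_ts_nanos = min_ts_micros * 1000
max_ts_nanos = max_ts_micros * 1000 + 999
print(min_ts_nanos < INT64_MIN and max_ts_nanos > INT64_MAX)      # True
```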

Earliest Timestamp supported in PostgreSQL

I work with different databases in a number of different time zones (and periods of time), and one thing that normally causes problems is the date/time definition.
For this reason, and since a date is a reference to a starting value, to keep track of how it was calculated I try to store the base date, i.e. the minimum date supported in that particular computer/database.
If I am seeing it well, this depends on the RDBMS and on the particular storage of the type.
In SQL Server, I found a couple of ways of calculating this "base date";
SELECT CONVERT(DATETIME, 0)
or
SELECT DATEADD(MONTH, 0, 0 )
or even a cast like this:
DECLARE @300 BINARY(8)
SET @300 = 0x00000000 + CAST(300 AS BINARY(4))
SET @dt = (SELECT CAST(@300 AS DATETIME) AS BASEDATE)
PRINT CAST(@dt AS NVARCHAR(100))
(where @dt is a previously declared datetime variable)
My question is, is there a similar way of calculating the base date in PostgreSQL, i.e.: the value that is the minimum date supported and is on the base of all calculations?
From the description of the date type, I can see that the minimum date supported is 4713 BC, but is there a way of getting this value programmatically (for instance as a formatted date string), as I do in SQL Server?
The manual states the values as:
Low value: 4713 BC
High value: 294276 AD
with the caveat, as Chris noted, that -infinity is also supported.
See the note later in the same page in the manual; the above is only true if you are using integer timestamps, which are the default in all vaguely recent versions of PostgreSQL. If in doubt:
SHOW integer_datetimes;
will tell you. If you're using floating point datetimes instead, you get greater range and less (non-linear) precision. Any attempt to work out the minimum programmatically must cope with that restriction.
PostgreSQL does not just let you cast zero to a timestamp to get the minimum possible timestamp, nor would this make much sense if you were using floating point datetimes. You can use the Julian date conversion function, but this gives you the epoch, not the minimum time:
postgres=> select to_timestamp(0);
to_timestamp
------------------------
1970-01-01 08:00:00+08
(1 row)
because it accepts negative values. You'd think that giving it negative maxint would work, but the results are surprising to the point where I wonder if we've got a wrap-around bug lurking here:
postgres=> select to_timestamp(-922337203685477);
to_timestamp
---------------------------------
294247-01-10 12:00:54.775808+08
(1 row)
postgres=> select to_timestamp(-92233720368547);
to_timestamp
---------------------------------
294247-01-10 12:00:54.775808+08
(1 row)
postgres=> select to_timestamp(-9223372036854);
to_timestamp
------------------------------
294247-01-10 12:00:55.552+08
(1 row)
postgres=> select to_timestamp(-922337203685);
ERROR: timestamp out of range
postgres=> select to_timestamp(-92233720368);
to_timestamp
---------------------------------
0954-03-26 09:50:36+07:43:24 BC
(1 row)
postgres=> select to_timestamp(-9223372036);
to_timestamp
------------------------------
1677-09-21 07:56:08+07:43:24
(1 row)
(Perhaps this is related to the fact that to_timestamp takes a double, even though timestamps are stored as integers these days.)
I think it's possibly wisest to just treat as valid any timestamp you don't get an error on. After all, the range of valid timestamps is not continuous:
postgres=> SELECT TIMESTAMP '2000-02-29';
timestamp
---------------------
2000-02-29 00:00:00
(1 row)
postgres=> SELECT TIMESTAMP '2001-02-29';
ERROR: date/time field value out of range: "2001-02-29"
LINE 1: SELECT TIMESTAMP '2001-02-29';
so you can't assume that just because a value is between two valid timestamps, it is itself valid.
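The same gap shows up in any calendar-aware library; a quick Python sketch (is_valid_date is an illustrative helper):

```python
from datetime import date

def is_valid_date(y: int, m: int, d: int) -> bool:
    # The constructor raises ValueError for impossible calendar dates,
    # much as Postgres rejects TIMESTAMP '2001-02-29'.
    try:
        date(y, m, d)
        return True
    except ValueError:
        return False

print(is_valid_date(2000, 2, 29))  # True  -- 2000 is a leap year
print(is_valid_date(2001, 2, 29))  # False -- rejected, like the Postgres error
```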
The earliest timestamp is '-infinity'. This is a special value. The other side is 'infinity', which is later than any specific timestamp.
I don't know of a way of getting this programmatically. I would just use the value hard-coded, the way you might use NULL. That means you have to handle infinities on the client side, though.