Date comparison in Hive

I'm working with Hive and I have a table structured as follows:
CREATE TABLE t1 (
id INT,
created TIMESTAMP,
some_value BIGINT
);
I need to find every row in t1 that is less than 180 days old. The following query yields no rows even though there is data present in the table that matches the search predicate.
select *
from t1
where created > date_sub(from_unixtime(unix_timestamp()), 180);
What is the appropriate way to perform a date comparison in Hive?

How about:
where unix_timestamp() - created < 180 * 24 * 60 * 60
Date math is usually simplest if you can just do it with the actual timestamp values.
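Since created is a TIMESTAMP, here is a sketch that makes the seconds conversion explicit (Hive coerces a timestamp to epoch seconds in numeric context, but spelling it out avoids surprises):
select *
from t1
where unix_timestamp() - unix_timestamp(created) < 180 * 24 * 60 * 60;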
Or do you want it to only cut off on whole days? Then I think the problem is with how you are converting back and forth between ints and strings. Try:
where created > unix_timestamp(date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),180),'yyyy-MM-dd')
Walking through each UDF:
unix_timestamp() returns a bigint: the current time in seconds since the Unix epoch
from_unixtime(,'yyyy-MM-dd') converts that to a string in the given format, e.g. '2012-12-28'
date_sub(,180) subtracts 180 days from that string and returns a new string in the same format
unix_timestamp(,'yyyy-MM-dd') parses that string back into a bigint
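For concreteness, here is how the chain evaluates step by step (the values shown are illustrative and depend on when and where the query runs):
select unix_timestamp();                                              -- e.g. 1356652800
select from_unixtime(unix_timestamp(), 'yyyy-MM-dd');                 -- e.g. '2012-12-28'
select date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 180);  -- e.g. '2012-07-01'
select unix_timestamp('2012-07-01', 'yyyy-MM-dd');                    -- e.g. 1341100800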
If that's all getting too hairy, you can always write a UDF to do it yourself.

Alternatively, you can also use datediff. The WHERE clause would then be:
In the case of a string timestamp (JDBC format):
datediff(from_unixtime(unix_timestamp()), created) < 180;
In the case of Unix epoch time:
datediff(from_unixtime(unix_timestamp()), from_unixtime(created)) < 180;
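For example, applied to the question's table (a sketch; Hive coerces the TIMESTAMP column to a string for datediff):
select *
from t1
where datediff(from_unixtime(unix_timestamp()), created) < 180;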

I think maybe it's a Hive bug dealing with the timestamp type. I've been trying to use it recently and getting incorrect results.
If I change your schema to use a string instead of timestamp, and supply values in the
yyyy-MM-dd HH:mm:ss
format, then the select query worked for me.
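For reference, a sketch of that modified schema (assuming created values are loaded as 'yyyy-MM-dd HH:mm:ss' strings):
CREATE TABLE t1 (
id INT,
created STRING,
some_value BIGINT
);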
According to the documentation, Hive should be able to convert a BIGINT representing epoch seconds to a timestamp, and all existing datetime UDFs should work with the timestamp data type.
With this simple query:
select from_unixtime(unix_timestamp()),
       cast(unix_timestamp() as timestamp)
from test_tt limit 1;
I would expect both fields to be the same, but I get:
2012-12-29 00:47:43 1970-01-16 16:52:22.063
I'm seeing other weirdness as well.

TIMESTAMP is in milliseconds; unix_timestamp() is in seconds.
You need to multiply the RHS by 1000.
where created > 1000 * date_sub(from_unixtime(unix_timestamp()), 180);
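Tying that back to the earlier test query, a sketch (assuming, as older Hive versions did, that CAST(bigint AS timestamp) interprets the value as milliseconds):
select from_unixtime(unix_timestamp()),
       cast(unix_timestamp() * 1000 as timestamp)  -- scale seconds to milliseconds before the cast
from test_tt limit 1;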

After reviewing this and referring to Date Difference less than 15 minutes in Hive, I came up with a solution. While I'm not sure why Hive doesn't compare dates as strings effectively (they should sort and compare lexicographically), the following solution works:
FROM (
  SELECT id, HighestPrice, LowestPrice,
         unix_timestamp(created) AS c_ts,
         unix_timestamp(date_sub(from_unixtime(unix_timestamp()), 180), 'yyyy-MM-dd') AS c180_ts
  FROM t1
) x
JOIN t1 t ON x.id = t.id
SELECT to_date(t.created),
       x.id,
       AVG(COALESCE(x.HighestPrice, 0)),
       AVG(COALESCE(x.LowestPrice, 0))
WHERE unix_timestamp(t.created) > x.c180_ts
GROUP BY to_date(t.created), x.id;

Related

Convert ISO DateTime to Unix timestamp in SQL WHERE clause - DBeaver client, PostgreSQL

I have a column named 'CreatedAt' in Postgres (DBeaver client) that is an int8 datatype and holds a Unix timestamp value. Example: 1659347651689
I am writing a query into which I'd like to input an ISO datetime in the WHERE clause and have it automatically convert to find the applicable records.
For example:
Normally, I'd write:
select * from table1 where CreatedAt = '2022-08-01 09:54:11.000'
I can't do that because the CreatedAt column value is 1659347651689. Is there a way to have it automatically convert and locate the record with that datetime?
I tried this:
`select * from table1 where CreatedAt = date("CreatedAt",strtotime('2022-08-01 09:53:27.000'))`
but strtotime doesn't exist (guessing because it's a Python command). I tried date and dateadd, but no luck.
Your data appears to be in milliseconds, so:
select to_timestamp(1659347651689/1000);
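Applied to the question's WHERE clause, that might look like the following sketch (to_timestamp returns timestamptz, so the comparison depends on the session time zone, and the integer division drops the milliseconds):
select * from table1
where to_timestamp("CreatedAt" / 1000) = timestamp '2022-08-01 09:54:11';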
Yes thank you Jeroen Mostert and a_horse_with_no_name (great userid). After reading the links here, I got it.
If anyone else is looking, the answer is:
select * from table1 pfs
where timestamp 'epoch' + pfs."CreatedAt" /1000 * interval '1 second' = '2022-08-01 09:53:13.000'

Why do I get an incompatible value type for my column?

I am trying to calculate the difference between two dates in an Oracle database using a JDBC connection. I followed the advice from this question, using a query like this:
SELECT CREATE_DATE - CLOSED
FROM TRANSACTIONS;
and I get the following error:
Incompatible value type specified for column: CREATE_DATE-CLOSED.
Column Type = 11 and Value Type = 8. [10176] Error Code: 10176
What should I change so I can successfully calculate the difference between the dates?
Note: CREATE_DATE and CLOSED both have the TIMESTAMP type.
The answer you found relates to date datatypes, but you are dealing with timestamps. While subtracting two Oracle dates returns a number, subtracting two timestamps produces an INTERVAL datatype. This is probably not what you want, and, apparently, your driver does not handle this datatype properly.
For this use case, one solution is to cast the timestamps to dates before subtracting them:
select cast(create_date as date) - cast(closed as date) from transactions;
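If you need the difference in seconds rather than days (Oracle date subtraction returns days, including a fractional part for the time-of-day difference), one possible variation:
select (cast(create_date as date) - cast(closed as date)) * 86400 as diff_seconds
from transactions;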
As mentioned, it seems that JDBC cannot work with the INTERVAL datatype. What about using the EXTRACT function to get the expected output as a number? If you want the number of seconds between the two timestamps, it would be:
SELECT EXTRACT(SECOND FROM (CREATE_DATE - CLOSED)) FROM TRANSACTIONS;
Here is a list of fields that can be used instead of SECOND:
https://docs.oracle.com/database/121/SQLRF/functions067.htm#SQLRF00639
When we subtract one date from another, Oracle gives us the difference as a number: it's straightforward arithmetic. But when we subtract one timestamp from another - which is what you're doing - the result is an INTERVAL. Older versions of JDBC don't like the INTERVAL datatype (docs).
Here are a couple of workarounds, depending on what you want to do with the result. The first is to calculate the number of seconds from the interval result. extract second from ... only gives us the number of seconds in the interval, which is fine providing none of your intervals are more than fifty-nine seconds long. Longer intervals require us to extract the minutes, hours and even days. So that solution would be:
select t.*
     , extract(day from (t.closed - t.create_date)) * 86400  -- 86400 seconds per day
     + extract(hour from (t.closed - t.create_date)) * 3600
     + extract(minute from (t.closed - t.create_date)) * 60
     + extract(second from (t.closed - t.create_date)) as no_of_secs
from transactions t
A second solution is to follow the advice in the JDBC mapping guide and turn the interval into a string:
select t.*
, cast ((t.closed - t.create_date) as varchar2(128 char)) as intrvl_str
from transactions t
The format of the string interval is verbose: INTERVAL'+000000001 04:40:59.710000'DAY(9)TO SECOND. This may not be useful on the Java side of the application. But with a regex we can turn it into a string that can be converted into a Java 8 Duration object (docs): PnDTnHnMn.nS.
select t.id
     , regexp_replace(cast((t.closed - t.create_date) as varchar2(128 char))
       , 'INTERVAL''\+([0-9]+) ([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]+)''DAY\(9\)TO SECOND'
       , 'P\1DT\2H\3M\4.\5S') as duration
from transactions t
There is a demo on db<>fiddle.

SELECT matching dates from a timestamp with time zone column

I have a timestamp with time zone column against which I'd like to run a query returning all rows matching a given date, e.g. all rows whose timestamp falls on 2019-09-30. I'm trying something like this but haven't been able to figure it out:
SELECT * FROM table WHERE
x='1277' AND
date='2019-09-30 21:40:01.316240 +00:00'::DATE;
There are two options:
range search:
WHERE timestampcol >= TIMESTAMPTZ '2019-09-30'
AND timestampcol < (TIMESTAMPTZ '2019-09-30' + INTERVAL '1 day')
The proper index to make this fast is
CREATE INDEX ON atable (timestampcol);
conversion to date:
WHERE CAST(timestampcol AS date) = '2019-09-30'
The proper index to make this fast is
CREATE INDEX ON atable ((CAST(timestampcol AS date)));
Both methods work equally well. The second method has a shorter WHERE clause, but it needs a specialized index that maybe no other query can benefit from.
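Tying the range-search form back to the question's query, a sketch (the x filter and the date come from the question; timestampcol stands in for the actual column name):
SELECT * FROM atable
WHERE x = '1277'
  AND timestampcol >= TIMESTAMPTZ '2019-09-30'
  AND timestampcol < TIMESTAMPTZ '2019-09-30' + INTERVAL '1 day';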
You can use a comparison between the date-converted column value and a fixed date value:
with tab(x, date) as
(
  select 1277, '2019-09-30 21:40:01.316240 +00:00'::timestamp with time zone union all
  select 1278, '2019-09-29 21:40:01.316240 +00:00'::timestamp with time zone
)
select *
from tab
where date::date = date '2019-09-30';
Demo
I am not familiar with PostgreSQL, but the following page https://www.postgresql.org/docs/9.1/datatype-datetime.html indicates that there is an optional precision value p that specifies the number of fractional digits retained in the seconds field. How was the data type configured for your date field? Is the timestamp column being stored as an eight-byte integer or as a floating-point number? If the latter, the effective limit of precision might be less than 6. All of this info is on the linked page.
You have several options, as mentioned above, and the best one may vary depending on the data type of your field named "date".
For example,
Cast your field to date, returning the format 'yyyy-mm-dd' (the ::DATE cast goes right after the field name):
SELECT * FROM table WHERE x='1277' AND date::DATE='2019-09-30';
Convert it to char and retrieve 10 characters:
SELECT * FROM table WHERE x='1277' AND LEFT(date::varchar,10)='2019-09-30';
Similar to the previous:
SELECT * FROM table WHERE x='1277' AND to_char(date,'yyyymmdd')='20190930';
And there are many others. For more specific info, check the PostgreSQL documentation to see which one is best for you, or post more information so we can better diagnose your problem.
https://www.postgresql.org/docs/9.4/functions-datetime.html

Format int as date in Presto SQL

I have an integer date column "date_created" storing values like...
20180527, 20191205, 20200208
And am wondering what the best way to parse as a date is so I could do something like this in a query...
select * from table where formatted(date_created) > formatted(date_created) - 90
(to return everything within the last 90 days)
I've found some similar examples that convert from date ints representing seconds or milliseconds, but none where the columns are essentially date strings stored as integers.
Appreciate any thoughts on the best way to achieve this.
And am wondering what the best way to parse as a date is so I could do something like this in a query...
You can convert "date as a number" (eg. 20180527 for May 27, 2018) using the following:
cast to varchar
parse_datetime with appropriate format
cast to date (since parse_datetime returns a timestamp)
Example:
presto> SELECT CAST(parse_datetime(CAST(20180527 AS varchar), 'yyyyMMdd') AS date);
_col0
------------
2018-05-27
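Applied as a filter, that might look like the following sketch (date_created is the question's column; the table name is a placeholder):
SELECT *
FROM my_table
WHERE CAST(parse_datetime(CAST(date_created AS varchar), 'yyyyMMdd') AS date)
      >= current_date - interval '90' day;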
However, this is not necessarily the best way to query your data. By adapting your search conditions to the format of your data (and not vice versa), you can potentially benefit from predicate pushdown and partition pruning. See @GordonLinoff's answer for information on how to do this.
You can do the comparison in the world of integers or of dates. You might as well convert the current date minus 90 days to a number:
select t.*
from t
where date_created >= cast(date_format(current_date - interval '90' day,
                                       '%Y%m%d') as int);
The query below is index-friendly for any database, since it does not apply a function to the indexed column:
select * from table where date_created > timestamp (formatted(date) - 90)
In addition, suppose we have a date in the format 20211011_1234 and we want the date from one month earlier, converted back to the original format. We can use the following to convert the date to an int and vice versa:
select cast(date_format(
         cast(parse_datetime(cast(split_part('20211011_1234', '_', 1) as varchar), 'yyyyMMdd') as date)
           - interval '30' day,
         '%Y%m%d') as int) as column_name

How to find the value that failed to convert

I have a query over a column of data which is nvarchar, where all values are in '00:00:00' format. I want to convert them to long. When I convert the top 1000 rows it works fine, but it fails when run against all values. The query is shown below:
SELECT DATEDIFF(second, '00:00', CAST(TimeSpent AS time(7))) * CAST(1000 AS bigint)
     + RIGHT(CAST(TimeSpent AS time(7)), 7)
FROM [mtr].[MatterDocument]
The error message is:
Conversion failed when converting date and/or time from character string
How can I find which value failed to convert?
I suggest that there is some bad data in your MatterDocument table. SQL Server does not support regex searches, but fortunately its LIKE operator does support some primitive regex which we can use:
SELECT *
FROM [mtr].[MatterDocument]
WHERE TimeSpent NOT LIKE '[01][0-9]:[0-5][0-9]:[0-5][0-9]' AND
      TimeSpent NOT LIKE '2[0-3]:[0-5][0-9]:[0-5][0-9]';
Demo
You may verify in the demo that bad, non-acceptable time strings are flushed out. The above query should also flush out strings which maybe aren't even time values at all and somehow made it into your table.
The best long term fix would be to correct your data at its source, and then bring the data into SQL Server as a bona fide date/time type.
Edit: TRY_CAST, as described by @Denis in his answer, might be another approach. But that requires SQL Server 2012 or later. The above query should work in earlier versions as well.
Try using the TRY_CAST function to find the wrong rows (it returns NULL if it cannot convert the value):
SELECT c.TimeSpent  /* any columns that identify rows */
FROM (
    SELECT TimeSpent,  /* any columns that identify rows */
           DATEDIFF(second, '00:00', TRY_CAST(TimeSpent AS time(7))) * CAST(1000 AS bigint)
             + RIGHT(TRY_CAST(TimeSpent AS time(7)), 7) AS Converted
    FROM [mtr].[MatterDocument]
) c
WHERE c.Converted IS NULL
You should find the bad values:
select timespent
from t
where try_cast(TimeSpent AS time(7)) is null;
This will enable you to find the bad values. They are probably times where the hours exceed 23.
I would suggest doing the conversion more simply:
select (left(TimeSpent, 2) * 60 * 60 +
        substring(TimeSpent, 4, 2) * 60 +
        right(TimeSpent, 2)
       ) as seconds
from [mtr].[MatterDocument]
This will do the conversion without the limitations of the SQL Server time data type.
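Since the asker's original expression produced milliseconds, here is a sketch of the same manual conversion scaled to milliseconds (assuming the 'HH:mm:ss'-style layout, with hours possibly exceeding 23):
select TimeSpent,
       (left(TimeSpent, 2) * 3600 +         -- hours (may exceed 23) to seconds
        substring(TimeSpent, 4, 2) * 60 +   -- minutes to seconds
        right(TimeSpent, 2)                 -- seconds
       ) * cast(1000 as bigint) as milliseconds
from [mtr].[MatterDocument];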