Hive equivalent of Teradata statement - sql

I am trying to convert a Teradata query to Hive:
WHERE visit_date BETWEEN (CURRENT_DATE-194) AND (CURRENT_DATE)
Here visit_date is a string in yyyy-mm-dd format.
CURRENT_DATE is valid in Hive, but CURRENT_DATE-194 gives an error.
How can I do it in Hive?

Got the solution by using
visit_date BETWEEN date_sub(CURRENT_DATE,194) AND CURRENT_DATE
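A BETWEEN on a string column works here because zero-padded yyyy-mm-dd strings sort in the same order as the dates they represent. Below is a minimal Python sketch of that window logic; it is not Hive code, and the helper names date_window and in_window are made up for illustration.

```python
from datetime import date, timedelta
from typing import Tuple


def date_window(today: date, days: int) -> Tuple[str, str]:
    """Return (start, end) as yyyy-mm-dd strings.

    Plays the role of date_sub(CURRENT_DATE, days) and CURRENT_DATE.
    """
    start = today - timedelta(days=days)
    return start.isoformat(), today.isoformat()


def in_window(visit_date: str, today: date, days: int = 194) -> bool:
    """Inclusive BETWEEN check done purely on strings."""
    start, end = date_window(today, days)
    return start <= visit_date <= end
```

The string comparison is only safe when every value is zero-padded to the same width; a value such as 2024-1-5 would sort incorrectly.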

To get the data for the past 194 days in Hive, try the query below:
select * from table_1 where visit_date > date_sub(from_unixtime(unix_timestamp()), 194);
Note: unix_timestamp() returns seconds, whereas a TIMESTAMP is in milliseconds.

Related

timestamp string to timestamp in sql

The start_date from Stripe is saved as a string timestamp like "1652789095".
Now I want to filter on this timestamp string to get the last 12 months.
How can I do that?
These are some examples - I'm sure there are plenty of options that would work.
convert to date
select *
from Table
where
to_timestamp(cast(start_date as int))::date > date_add(now(), interval -1 year);
work with unix timestamps
-- approx 1 year ago, by way of example
select *
from Table
where
start_date > '1621253095';
-- exactly one year ago, calculated dynamically
select *
from Table
where
start_date >
cast(unix_timestamp(date_add(now(), interval -1 year)) as char);
I'm not really a MySQL guy, so forgive any syntax errors and fix up the SQL as needed to work in MySQL.
Resources:
PostgreSQL: how to convert from Unix epoch to date?
https://www.postgresonline.com/article_pfriendly/3.html
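To make the two approaches above concrete, here is a small Python sketch of the same logic: parse the epoch string, compute a cutoff one year back, and compare numerically. The function names are hypothetical and the Feb 29 fallback is my own assumption about how "one year ago" should behave.

```python
from datetime import datetime, timezone


def epoch_string_to_datetime(value: str) -> datetime:
    """Parse an epoch-seconds string such as "1652789095" into a UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def one_year_before(moment: datetime) -> datetime:
    """Same calendar date one year earlier; Feb 29 falls back to Feb 28."""
    try:
        return moment.replace(year=moment.year - 1)
    except ValueError:
        return moment.replace(year=moment.year - 1, day=28)


def within_last_year(start_date: str, now: datetime) -> bool:
    """Compare as datetimes, so strings of different lengths cannot mislead."""
    return epoch_string_to_datetime(start_date) > one_year_before(now)
```

One caveat for the pure string comparison: it only matches numeric order while both values have the same number of digits. For example, "999999999" sorts after "1621253095" as a string even though it is the smaller number.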

Extract date, hour from the utc time

I would like to extract the date and hour from the UTC times in the table below in BigQuery. I can get the date or time from a single literal with TIMESTAMP, as in the code below, but I would like to apply it to the entire column. How can I do that?
SELECT EXTRACT(HOUR FROM TIMESTAMP "2020-05-03 16:49:47.583494")
You can do it this way:
SELECT my_column AS original_value,
DATE_FORMAT(STR_TO_DATE(my_column, "%Y-%m-%d %H:%i:%s.%f UTC"), "%e/%m/%Y") AS date,
DATE_FORMAT(STR_TO_DATE(my_column, "%Y-%m-%d %H:%i:%s.%f UTC"), "%l%p") AS hour
FROM my_table;
I am assuming the column is VARCHAR, which is why I parse it with STR_TO_DATE first.
Edit:
My initial assumption was that the OP wanted a MySQL query (I thought BigQuery might be based on it), but it turns out that BigQuery is not based on MySQL. In BigQuery you can use FORMAT_TIMESTAMP instead; this is how the query would look:
SELECT Occurrence AS original_value,
FORMAT_TIMESTAMP("%e/%m/%Y", Occurrence) AS date,
FORMAT_TIMESTAMP("%l%p", Occurrence) AS hour
FROM mytable
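For readers who want to check what the two format strings are meant to produce, here is a rough Python equivalent. It assumes the raw values look like "2020-05-03 16:49:47.583494 UTC"; the function name split_date_hour is made up. Unlike BigQuery's %e and %l, which pad single digits with a leading space, this sketch emits no padding.

```python
from datetime import datetime
from typing import Tuple


def split_date_hour(value: str) -> Tuple[str, str]:
    """Return (d/mm/yyyy, 12-hour clock with AM/PM) for a UTC timestamp string."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f UTC")
    date_part = f"{dt.day}/{dt.month:02d}/{dt.year}"
    hour_12 = dt.hour % 12 or 12  # 0 and 12 both display as 12
    suffix = "AM" if dt.hour < 12 else "PM"
    return date_part, f"{hour_12}{suffix}"
```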

Hive SELECT records from 1 hour ago

I have a Hive table that contains a column called timestamp. The timestamp is a bigint field generated from Java's System.currentTimeMillis(), so I suppose it should be in UTC. Right now I am trying to select records from 1 hour ago. I know in MySQL you can do something like:
SELECT * FROM table WHERE datetimefield >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
In Hive, it seems that NOW() is missing. I did some searching and found unix_timestamp(). I should be able to get the current UTC time in milliseconds with unix_timestamp()*1000.
So if I want to get records from 1 hour ago, I am thinking of doing something like:
SELECT * FROM hivetable WHERE datetimefield >= (unix_timestamp()*1000-3600000);
Can someone tell me whether this is the right way to approach the problem? Also, what if I want to select records from 1 day ago? It seems inconvenient to convert that to milliseconds. Any help or suggested reading would be highly appreciated. Thanks in advance for your help.
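Yes, unix_timestamp() gives you the seconds elapsed since the Unix epoch. Multiply it by 1000 to get milliseconds, subtract 60*60*1000 milliseconds, and compare your field against the result to get the desired records.
For Hive 1.2.0 and higher, current_timestamp is also available. The millisecond comparison looks like this: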
select *
from hivetable
where
datetimefield >= ((unix_timestamp()*1000) - 3600000);
For 1 day, convert the milliseconds to a date and use date_sub:
select *
from hivetable
where
to_date(from_unixtime(cast(datetimefield / 1000 as bigint))) >=
date_sub(from_unixtime(unix_timestamp()), 1);
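The unit conversion is the part that is easy to get wrong, so here is the arithmetic as a short Python sketch. The function name cutoff_millis and its parameters are hypothetical; now_seconds stands in for the value unix_timestamp() would return.

```python
import time
from typing import Optional

MILLIS_PER_HOUR = 60 * 60 * 1000
MILLIS_PER_DAY = 24 * MILLIS_PER_HOUR


def cutoff_millis(hours: int = 0, days: int = 0,
                  now_seconds: Optional[int] = None) -> int:
    """Epoch-millisecond cutoff that lies `hours` and `days` before now.

    now_seconds is in seconds, like unix_timestamp(); the result is in
    milliseconds, like System.currentTimeMillis().
    """
    if now_seconds is None:
        now_seconds = int(time.time())
    return now_seconds * 1000 - hours * MILLIS_PER_HOUR - days * MILLIS_PER_DAY
```

With a constant like MILLIS_PER_DAY, the one-day case can stay in milliseconds as well, which avoids converting the column at all.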

HIVE - date_format( your_date_column, '%Y-%m-%d %H' )

I'm trying to achieve the MySQL equivalent of date_format( your_date_column, '%Y-%m-%d %H' ) as my_date in Hive. I've tried a few options from Hive date formatting but can't get the format right. I haven't found anything that has helped me yet.
Could someone who has already run into this, or knows how to do it, please help?
Recent versions of Hive have a date_format() function; it just uses Java formatting codes instead of C-style ones. Try date_format(your_date_column, 'yyyy-MM-dd HH').
To convert a date to a given string format, you can use Hive's from_unixtime() function.
from_unixtime(bigint unixtime[, string format]) converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone.
The final query is
select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH') as my_date from table1;
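where table1 is a table in my Hive database. Note that unix_timestamp() with no argument formats the current time; pass your column, as in unix_timestamp(your_date_column), to format stored values.
I hope this helps you achieve date_format( your_date_column, '%Y-%m-%d %H' ).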
Let's say you have a column 'birth_day_H' in your table which is in string format.
To filter on birth_day_H, you can use the following expression:
date_format(birth_day_H, 'yyyy-MM-dd HH')
You can use it in a query in the following way:
select * from yourtable
where
date_format(birth_day_H, 'yyyy-MM-dd HH') = '2019-04-16 10';
I worked around this by using concat(substr(your_date_column,1,13), ':00')
If the date column's name is a reserved keyword, such as timestamp in my case, quote it with backticks: concat(substr(`timestamp`,1,13), ':00')
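Since the core of this thread is translating C-style specifiers into Java-style pattern letters, a small lookup table may help. The Python sketch below covers only the six specifiers relevant here and rejects anything else; the names MYSQL_TO_JAVA and mysql_to_java_pattern are invented for illustration and are not part of Hive or MySQL.

```python
# A few MySQL DATE_FORMAT specifiers and their Java pattern-letter
# counterparts. Deliberately incomplete: only the ones used in this thread.
MYSQL_TO_JAVA = {
    "%Y": "yyyy",  # four-digit year
    "%m": "MM",    # two-digit month
    "%d": "dd",    # two-digit day of month
    "%H": "HH",    # hour, 00-23
    "%i": "mm",    # minutes
    "%s": "ss",    # seconds
}


def mysql_to_java_pattern(fmt: str) -> str:
    """Translate a MySQL-style format string into a Java-style pattern."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%":
            spec = fmt[i:i + 2]
            if spec not in MYSQL_TO_JAVA:
                raise ValueError(f"unsupported specifier: {spec!r}")
            out.append(MYSQL_TO_JAVA[spec])
            i += 2
        elif ch.isalpha():
            # Bare letters are pattern letters in Java and would need quoting.
            raise ValueError(f"literal letter needs quoting: {ch!r}")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Note the case sensitivity on the Java side: MM is month while mm is minutes, which is a common source of wrong output when porting formats by hand.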

Convert YYYYMMDD String to Date in Impala

I'm using SQL in Impala to write this query. I'm trying to convert a date string, stored in YYYYMMDD format, into a date format for the purposes of running a query like this:
SELECT datadate,
session_info
FROM database
WHERE datadate >= NOW() - INTERVAL 5 DAY
ORDER BY datadate DESC;
Since the >= NOW() - INTERVAL 5 DAY code won't work with the YYYYMMDD string, I'd like to find a way to convert that into a date format that will work with this type of query. My thought is that it should look something like this (based on similar questions about other SQL query editors), but it's not working in Impala:
SELECT datadate,
session_info,
convert(datetime, '20141008', 102) AS session_date
FROM database
WHERE session_date >= NOW() - INTERVAL 5 DAY
ORDER BY session_date DESC;
Anyone know how to do this in Impala?
EDIT:
I finally found a working solution to the problem. None of the attempts using configurations of CAST or CONVERT would work in Impala, but the below query solves the problem and is fully operational, allowing date math to be performed on a column containing string values:
SELECT datadate,
session_info
FROM database
WHERE datadate >= from_unixtime(unix_timestamp(now() - interval 5 days), 'yyyyMMdd')
ORDER BY datadate DESC;
Native way:
to_timestamp(cast(date_number AS STRING), 'yyyyMMdd')
See Timestamp Literals in the Impala documentation (link updated 2020-08-24):
https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/impala-sql-reference/topics/impala-literals.html
You need to add the dashes to your string so Impala will be able to convert it into a date/timestamp. You can do that with something like:
concat_ws('-', substr(datadate,1,4), substr(datadate,5,2), substr(datadate,7) )
which you can use instead of datadate in your expression.
To ignore the hour/minute/second part, use from_timestamp; the query below returns 2020-01-01.
select from_timestamp(cast('2020-01-01 01:01:01.000000' as TIMESTAMP),'yyyy-MM-dd');
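The working solution above relies on fixed-width yyyyMMdd strings sorting in date order, so the cutoff only has to be formatted the same way as the column. Here is a minimal Python sketch of that idea and of the dash-insertion alternative; both helper names are hypothetical.

```python
from datetime import date, timedelta


def cutoff_yyyymmdd(today: date, days: int) -> str:
    """Format the date `days` before today as yyyyMMdd, matching the column."""
    return (today - timedelta(days=days)).strftime("%Y%m%d")


def add_dashes(datadate: str) -> str:
    """Turn '20141008' into '2014-10-08', like the concat_ws/substr expression."""
    return "-".join([datadate[0:4], datadate[4:6], datadate[6:]])
```

Comparing the raw strings keeps the column untouched in the WHERE clause, while adding dashes changes the column value so it can be cast to a timestamp; which one fits better depends on whether further date math is needed.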