Hive select query with current date in where clause

I am new to Hive. I have a table like the following:
EntriesRunDate (
id string,
run_date string
);
In the above table, an entry is processed when its associated run_date is today's date (run_date is stored in YYYYMMDD format).
To select such rows, I wrote the following Hive query:
select * from EntriesRunDate where run_date = (select from_unixtime(unix_timestamp(),'YYYYMMDD'));
But when I run the above query, I get the following error:
FAILED: SemanticException Line 0:-1 Unsupported SubQuery Expression ''YYYYMMDD'': Only SubQuery expressions that are top level conjuncts are allowed
I think there is a way to do this by setting a variable on the command line and reusing it within Hive, but I want to do everything in Hive. I am not sure whether that is doable.

You don't need a sub-query and can compare the value directly as
where run_date = from_unixtime(unix_timestamp(current_date),'yyyyMMdd')
or using date_format
where run_date = date_format(current_date,'yyyyMMdd')
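Putting this together for the table in the question, here is a minimal sketch, assuming Hive 1.2 or later (where current_date and date_format are available). The pattern letters are case-sensitive Java date pattern letters: 'yyyyMMdd' means calendar year, month, and day of month, whereas 'YYYYMMDD' from the question means week-based year, month, and day of year, so it would not produce today's date in the stored format.

```sql
-- Select the rows whose run_date (a yyyyMMdd string) equals today's date.
-- No subquery is needed: the right-hand side is an ordinary scalar expression.
SELECT id, run_date
FROM EntriesRunDate
WHERE run_date = date_format(current_date, 'yyyyMMdd');
```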

Related

Convert hive function(space, split) to Redshift syntax to generate a date series

I have a segment of code written in HiveQL that generates 3 columns: an index, a date, and a range of dates:
drop table if exists date_list;
create temporary table date_list as
with seq as(
select date_add('2020-02-27',s.i) as dt
from(
select posexplode(split(space(datediff('2020-12-01','2020-02-27')),' ')) as (i,x)
)s
)
select *,
row_number() over() index_n,
int(REGEXP_REPLACE(substring(dt,6,10),'\\-','')) as date_index
from seq;
Output:
.
.
.
Now I need to convert this to Redshift syntax due to a database migration, and functions such as space and split do not exist in Redshift. How do I convert the code so that it runs in Redshift?
Running the same code produces this error:
[Code: 500310, SQL State: 42601] [Amazon](500310) Invalid operation: syntax error at or near ")"
Position: 169;
Thanks!

This can be done with a recursive CTE. I answered a similar question a while ago:
trying to create a date table in Redshift
Generating the other columns from the list of dates is straightforward.
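A minimal sketch of the recursive CTE approach, assuming a Redshift version that supports WITH RECURSIVE. The column names follow the Hive query in the question; the casts and the TO_CHAR expression are my assumptions, not the linked answer's code.

```sql
-- Generate one row per day from 2020-02-27 through 2020-12-01 (inclusive).
WITH RECURSIVE seq (dt) AS (
    SELECT DATE '2020-02-27'          -- anchor row: the first date
    UNION ALL
    SELECT (dt + 1)::date             -- recursive step: the next day
    FROM seq
    WHERE dt < DATE '2020-12-01'      -- stop after the last date is produced
)
SELECT dt,
       ROW_NUMBER() OVER (ORDER BY dt) AS index_n,
       CAST(TO_CHAR(dt, 'MMDD') AS INT) AS date_index  -- month and day as an integer
FROM seq;
```

The date range matches the Hive version, which also includes both end dates. The ORDER BY inside ROW_NUMBER makes the index follow date order instead of depending on an unspecified row order.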

Why does Hive throw me an error while using Order by date?

I am trying to write a query in Hive and I am seeing the following error: "Error while compiling statement:
FAILED: SemanticException Failed to breakup Windowing invocations into
Groups. At least 1 group must only depend on input columns. Also check
for circular dependencies. Underlying error: Primitve type DATE not
supported in Value Boundary expression."
I used the same query in Oracle SQL and it works fine. How can I write a valid ORDER BY in Hive?
Select
Email,
FIRST_VALUE(C.abc_cust_id) Over (Partition By Lower(email) Order By C.regt_date
Desc)As CUSTOMER_ID
from table X

Support for some primitive types (including DATE, which did not exist before) was added after windowing was implemented, and the windowing code was not updated to handle them. See HIVE-13973.
As a workaround, try casting the DATE to STRING:
Over (Partition By Lower(email) Order By cast(C.regt_date as string) Desc)
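Applied to the whole query, the workaround might look like the sketch below. The table name customers is a placeholder, since the question does not give the real one, and it is aliased C to match the column references in the question. Casting a DATE to STRING in Hive yields 'yyyy-MM-dd' text, which sorts in the same order as the dates themselves, so the ordering is preserved.

```sql
-- customers is a hypothetical table name; C matches the alias used in the question.
SELECT
    C.email,
    FIRST_VALUE(C.abc_cust_id) OVER (
        PARTITION BY lower(C.email)
        ORDER BY CAST(C.regt_date AS STRING) DESC  -- 'yyyy-MM-dd' strings sort in date order
    ) AS customer_id
FROM customers C;
```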

Using an UDF to query a table in Hive

I have the following UDF expression available in Hive to convert a bigint time value to a date:
to_date(from_utc_timestamp(from_unixtime(cast(listed_time/1000 AS bigint)),'PST'))
I want to use this UDF to query a table on a specific date. Something like,
SELECT * FROM <table_name>
WHERE date = '2020-03-01'
ORDER BY <something>
LIMIT 10

I would suggest changing the logic: avoid applying the function to the column being filtered, because that is inefficient. The function has to be invoked for every row, which prevents the query from benefiting from an index.
Instead, you can convert the input date to a unix timestamp (possibly with a UDF) and compare it against the raw column. This should look like:
SELECT * FROM <table_name>
WHERE date = to_utc_timestamp('2020-03-01', 'PST') * 1000
ORDER BY <something>
LIMIT 10
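A slightly fuller sketch of the same idea, written as a range filter on the raw column. An equality test against a single millisecond value would only match rows stamped at exactly midnight, so the sketch bounds the whole day instead. Assumptions are labelled in the comments: some_table is a placeholder, listed_time holds epoch milliseconds (as the division by 1000 in the question suggests), and the session time zone is UTC.

```sql
-- Assumptions: some_table is a placeholder name; listed_time is epoch milliseconds;
-- the session time zone is UTC, so unix_timestamp() reads the converted value as UTC.
SELECT *
FROM some_table
WHERE listed_time >= unix_timestamp(to_utc_timestamp('2020-03-01 00:00:00', 'PST')) * 1000
  AND listed_time <  unix_timestamp(to_utc_timestamp('2020-03-02 00:00:00', 'PST')) * 1000
LIMIT 10;
```

Both bounds are constants, so they are computed once and not per row. It is worth checking a few rows against the original to_date expression before relying on the time zone handling.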

Column ambigously defined when using Pivot with subquery

I am trying to run this pivot query to show the dates as columns, in the format "MM/DD/YYYY", with the number of occurrences of certain IDs on each date:
The column which contains the dates is "DATE_POSTED" -- DATA TYPE date
The column which contains the ID's is "ID_INST" -- DATA TYPE varchar2
Query:
SELECT *
FROM (SELECT ID_INST, DATE_POSTED
FROM total.table1) PIVOT XML (COUNT (DATE_POSTED)
FOR (DATE_POSTED)
IN (SELECT distinct DATE_POSTED
FROM total.table1));
The error I am receiving is ORA-00918: column ambiguously defined. I did some searching but I keep getting this error, and I am not sure whether my approach is correct. P.S. I am using the XML keyword because without it I was prompted with: missing keyword

Try the following:
SELECT *
FROM (SELECT ID_INST, TO_CHAR(DATE_POSTED, 'DD-Mon') DATE_POSTED
FROM TOTAL.TABLE1)
PIVOT XML (COUNT(DATE_POSTED)
FOR DATE_POSTED IN (ANY))
The problem might be caused by the fact that a DATE also stores time information in addition to the date.
You therefore get different values for DATE_POSTED, but converting them to char yields the same column name, because the format mask cuts off the time information.
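One way to check whether time components are actually present is to compare the number of distinct raw values with the number of distinct calendar days. This is only a diagnostic sketch against the table named in the question; it does not confirm the cause of the ORA-00918 error by itself.

```sql
-- If distinct_datetimes is larger than distinct_days, some DATE_POSTED values
-- share a calendar day but differ in their time portion.
SELECT COUNT(DISTINCT DATE_POSTED)        AS distinct_datetimes,
       COUNT(DISTINCT TRUNC(DATE_POSTED)) AS distinct_days
FROM total.table1;
```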

Hive doesn't pick up partition with the calculated partition key

My external table auto1_tracking_events_ext is partitioned on a column dt.
First I execute:
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
When I run this query:
select count(*)
from auto1_tracking_events_ext
where dt = '2016-12-05';
It picks up the partition, creates about 3 mappers, and finishes in a couple of seconds.
However, if I run this:
select count(*)
from auto1_tracking_events_ext
where dt = from_unixtime(unix_timestamp()-1*60*60*24, 'yyyy-MM-dd');
It does not pick up the partition and starts 413 mappers and takes quite some time to calculate.
At the time of posting this question:
hive> select from_unixtime(unix_timestamp()-1*60*60*24, 'yyyy-MM-dd');
OK
2016-12-05
Why does Hive not pick up the partition?
UPDATE:
Passing the date string as a hiveconf parameter (as shown below) does not help either.
hive -hiveconf date_yesterday=$(date --date yesterday "+%Y-%m-%d")
hive> select count(*) from auto1_tracking_events_ext where dt = ${hiveconf:date_yesterday};

Your last query, which passes a hiveconf variable, should work if the first query works, because variables are substituted before the query is executed. There is one likely bug: you did not quote the variable. Try this:
hive -hiveconf date_yesterday=$(date --date yesterday "+%Y-%m-%d")
hive> select count(*) from auto1_tracking_events_ext where dt = '${hiveconf:date_yesterday}'; --single quotes here
Without quotes, the condition resolves to where dt=2020-12-12, which is parsed as integer arithmetic (2020 minus 12 minus 12) and not as a date string, so the value must be wrapped in single quotes.
As for unix_timestamp(): the function is non-deterministic, which prevents proper query optimization.
Use current_date or current_timestamp instead:
select count(*)
from auto1_tracking_events_ext
where dt = date_sub(current_date,1);
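To check whether a given filter prunes partitions before running the full query, one option is Hive's EXPLAIN DEPENDENCY, which reports the tables and partitions a query would read. A minimal sketch using the table from the question:

```sql
-- Reports the inputs of the query as JSON without executing it.
-- With pruning, input_partitions should list only yesterday's dt partition.
EXPLAIN DEPENDENCY
SELECT count(*)
FROM auto1_tracking_events_ext
WHERE dt = date_sub(current_date, 1);
```

Running the same statement with the unix_timestamp() version of the filter should show how many partitions that form reads, for comparison.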
