Hive - Partition Column Equal to Current Date - hive

I am trying to insert into a Hive table from another table that does not have a column for today's date. The partition I am trying to create is at the date level. What I am trying to do is something like this:
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date = from_unixtime(unix_timestamp() - (86400*2) , 'yyyy-MM-dd'))
SELECT * FROM table_1;
But when I run this I get the following error:
"cannot recognize input near 'from_unixtime' '(' 'unix_timestamp' in constant"
If I query a table and make that expression one of the columns, it works just fine. Any idea how to set the partition date to the current system date in HiveQL?
Thanks in advance,
Craig

What you want here is Hive dynamic partitioning. It lets the partition each record is inserted into be determined dynamically as the record is selected. In your case, that decision is based on the date when you run the query.
To use dynamic partitioning, your partition clause lists the partition field(s) but not their values. The values come from the last column(s) of the SELECT, in the same order as the partition fields.
When you use dynamic partitions for all partition fields you need to ensure that you are using nonstrict for your dynamic partition mode (hive.exec.dynamic.partition.mode).
In your case, your query would look something like:
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date)
SELECT
*
, from_unixtime(unix_timestamp() - (86400*2) , 'yyyy-MM-dd')
FROM table_1;
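
To make the mapping rule concrete, here is a minimal Python sketch (not Hive itself) that mimics how a dynamic-partition insert routes rows: the trailing value(s) of each selected row are treated as the partition value(s), and the remaining values are the data columns. The function name route_rows and the sample rows are hypothetical and only serve as an illustration.

```python
from collections import defaultdict

def route_rows(rows, num_partition_cols=1):
    """Group rows by their trailing partition value(s), mimicking dynamic partitioning."""
    if num_partition_cols < 1:
        raise ValueError("at least one partition column is required")
    partitions = defaultdict(list)
    for row in rows:
        # The last num_partition_cols values pick the partition; the rest is the data.
        data, key = row[:-num_partition_cols], row[-num_partition_cols:]
        partitions[key].append(data)
    return dict(partitions)

# Hypothetical SELECT output: two data columns followed by the p_date value.
rows = [
    ("a", 1, "2020-04-01"),
    ("b", 2, "2020-04-01"),
    ("c", 3, "2020-04-02"),
]
result = route_rows(rows)
```

Because every row in the query above carries the same computed date, all rows would land in a single partition.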

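Instead of using the unix_timestamp() and from_unixtime() functions, current_date() can be used to get the current date in 'yyyy-MM-dd' format.
current_date() was added in Hive 1.2.0 (see the official Hive documentation). Note that it returns today's date; to keep the two-day offset from the question, date_sub(current_date(), 2) could be used instead.
The revised query would be: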
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date)
SELECT
*
, current_date()
FROM table_1;

If you are running this from a shell script, you can store the current date in a variable. Then create an empty table in Hive using beeline, with just the partition column. Once that is done, you can pass the variable as the value of the partition column when inserting into the partitioned table, and the data will be written to that partition.
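
As a rough sketch of that wrapper approach, written here in Python rather than shell, the date can be computed outside Hive and substituted into a static partition clause as a literal. The helper name build_insert and its days_back parameter are assumptions for illustration; the table and column names come from the question.

```python
from datetime import date, timedelta

def build_insert(run_date, days_back=2):
    """Return an INSERT statement whose static partition value is a literal date."""
    # Compute the partition value outside Hive, since PARTITION (...) only accepts constants.
    p_date = (run_date - timedelta(days=days_back)).isoformat()  # 'yyyy-MM-dd'
    return (
        "INSERT OVERWRITE TABLE table_2_partition\n"
        f"PARTITION (p_date = '{p_date}')\n"
        "SELECT * FROM table_1;"
    )

# A fixed date is used here so the result is reproducible; a real job would pass date.today().
statement = build_insert(date(2020, 4, 3))
print(statement)
```

The resulting string could then be handed to Hive, for example with beeline -e or hive -e.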

Related

How to modify CTAS query to append query results to table based on if new partition doesn't exist? - Athena

I have a query that I want to execute daily, partitioned by the date it's executed. The results of this query should be appended to the same table.
My idea was ideally having something similar to the CREATE TABLE IF NOT EXISTS command for adding data by a new partition every day to the existing table if the partition doesn't already exist, but I can't figure out how I'd be able to integrate this in my query.
My query:
CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
external_location = 's3://my-query-results-location/',
format = 'PARQUET',
parquet_compression = 'SNAPPY',
partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}
What this does is create a new table for the first day it's executed but nothing happens for subsequent days, I'm assuming because of the CREATE TABLE IF NOT EXISTS validating that the table already exists and not proceeding with the logic.
Is there a way to modify my query to create a table for the first day executed and append the results by a new partition for each subsequent day?
I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here as I'm running a CTAS query.
You can simply use INSERT INTO existing_table SELECT....
Presumably your table is already partitioned, so include that partition column in the SELECT and Amazon Athena will automatically put the data in the correct directory.
For example, you might include the column like this: SELECT ... CURRENT_DATE as date_executed
See: INSERT INTO - Amazon Athena

Error inserting data into Hive partitioned table

I'm trying to insert data into a Hive table with partition, the partition condition is yesterday's date in yyyyMMdd format, and I want to do that dynamically so I'm generating it using a query. The date query works fine in my other select statement, however when inserting it's throwing an error like this:
Error picture
Could you guys help me? Thank you and have a nice day.
You can create a view to load the data, or tweak your SQL to do it. Make sure the date column is the last column in the SELECT and that the table is partitioned by this column.
INSERT OVERWRITE TABLE dwh_vts.staging_f_vts_sale_revenue PARTITION(`date`)
SELECT 'N350','10','4500000.000000',DATE_FORMAT(date_sub(CURRENT_DATE,1),'yyyyMMdd')
union
SELECT 'T280','21','3760000.000000',DATE_FORMAT(date_sub(CURRENT_DATE,1),'yyyyMMdd')
Or you can put above SQL into a view and then insert overwrite from the view.

copy data from one table to another partitioning table

%hive
INSERT INTO NEWPARTITIONING partition(year(L_SHIPDATE)) select * from LINEITEM;
I want to copy the data from LINEITEM to the partitioned table NEWPARTITIONING, but I get the following error:
line 1:54 cannot recognize input near ')' 'select' '*' in statement.
I don't understand why this error occurs. Can anyone give me some ideas?
Hive supports DYNAMIC or STATIC partition loading.
The partition specification allows only a column name or a column list (for a dynamic partition load). If you need a function, calculate it in the select, as in the example below:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table NEWPARTITIONING partition (partition_column)
select i.col1,
...
i.colN,
year(L_SHIPDATE) as partition_column --Partition should be the last in column list
from LINEITEM i
Or you can specify static partition in the form partition(partition_column='value'), in this case you do not need to select partition expression:
insert into table NEWPARTITIONING partition (partition_column='2020')
select i.col1,
...
i.colN
from LINEITEM i
where year(L_SHIPDATE) = 2020
In both cases - STATIC and DYNAMIC, Hive does not support functions in partition specification. Functions can be calculated in the query (dynamic load) or calculated in a wrapper shell and passed as a parameter to the script (for static partition).

hive partition by time

I want to implement
alter table dos_sourcedata add partition (data = to_date (current_timestamp ()));
in hive
I run this statement at a specific time every day,
but it always fails.
If you want to create an empty partition using alter table, use a literal value, not an expression, like this:
alter table mytable add partition (partition_date='2020-04-01');
You can also create partitions dynamically when loading data with insert overwrite table ... partition. In that case you can use an expression in the query:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table mytable partition (partition_date)
select
col_1,
...
col_n,
current_date --partition column is the last one
from ...
Use current_date instead of to_date (current_timestamp ()).
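
Since alter table needs a literal, a scheduled job has to compute the date before the statement reaches Hive. The following is a minimal Python sketch of that idea; the helper name build_add_partition is hypothetical, the table and column names are taken from the example above, and IF NOT EXISTS is added here as an assumption so that a rerun on the same day does not fail.

```python
from datetime import date

def build_add_partition(table, column, day):
    """Return an ALTER TABLE statement that adds one partition for a literal date."""
    # The date is formatted as 'yyyy-MM-dd' and embedded as a string constant.
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({column}='{day.isoformat()}');"
    )

# A fixed date keeps the example reproducible; a daily job would pass date.today().
ddl = build_add_partition("mytable", "partition_date", date(2020, 4, 1))
print(ddl)
```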

Insert into hive table with dynamic partition only writing first partition to disk and not all

I am trying to write data into a Hive table and failing. At the end I get an error with Cycle_dt = null, and only one partition is written: the first day's.
set hive.auto.convert.join=true;
set hive.optimize.mapjoin.mapreduce=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set mapred.map.tasks = 100;
Insert into table dynamic.dynamic_test_avro_v1 partition(cycle_dt)
Select date_time as CYCLE_TS, case when evar1 is not null or length(trim(evar1)) > 0 then cast(unbase64(substring(evar1,6,12)) as string) end NRNM ,
prop14 as state, evar8 as FLOW_TYPE, prop25 as KEY, pagename PAGE_NM,
partition_dt as cycle_dt from source.std_avro_v1 WHERE
(partition_dt = '2016-10-02' AND partition_dt < '2016-10-07')
AND (
evar8='google');
I am unsure what is going on here. I have a date range set up to get only those dates as partitions.
From the Hive documentation:
In the dynamic partition inserts, users can give partial partition specifications, which means just specifying the list of partition column names in the PARTITION clause. The column values are optional. If a partition column value is given, we call this a static partition, otherwise it is a dynamic partition. Each dynamic partition column has a corresponding input column from the select statement. This means that the dynamic partition creation is determined by the value of the input column. The dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause.
So, in your query, partition_dt is the value of the dynamic partition. However, you impose the following constraint: (partition_dt = '2016-10-02' AND partition_dt < '2016-10-07'). This reduces to partition_dt = '2016-10-02', so only a single partition is created.
You probably wanted a range of dates: (partition_dt >= '2016-10-02' AND partition_dt < '2016-10-07')
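
The difference between the two predicates can be checked with a small Python sketch that applies both filters to a hypothetical list of dates. ISO-formatted date strings compare in date order, which is why the string comparison works here.

```python
from datetime import date, timedelta

# Hypothetical partition_dt values: 2016-10-01 through 2016-10-08.
days = [(date(2016, 10, 1) + timedelta(days=i)).isoformat() for i in range(8)]

# Predicate from the question: equality AND upper bound keeps only one date.
original = [d for d in days if d == "2016-10-02" and d < "2016-10-07"]

# Corrected predicate: lower bound AND upper bound keeps the whole range.
corrected = [d for d in days if d >= "2016-10-02" and d < "2016-10-07"]

print(original)
print(corrected)
```

With the corrected range, five distinct partition_dt values reach the insert, so five partitions would be expected.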