Error inserting data into Hive partitioned table - hive

I'm trying to insert data into a partitioned Hive table. The partition value is yesterday's date in yyyyMMdd format, and I want to set it dynamically, so I'm generating it with a query. The date expression works fine in my other SELECT statements, but the insert throws an error like this:
Error picture
Could you guys help me? Thank you and have a nice day.

You can either create a view to load the data or tweak your SQL. Make sure the date column is the last column in the SELECT and that the table is partitioned by this column.
INSERT OVERWRITE TABLE dwh_vts.staging_f_vts_sale_revenue PARTITION(`date`)
SELECT 'N350','10','4500000.000000',DATE_FORMAT(date_sub(CURRENT_DATE,1),'yyyyMMdd')
union
SELECT 'T280','21','3760000.000000',DATE_FORMAT(date_sub(CURRENT_DATE,1),'yyyyMMdd')
Or you can put the above SQL into a view and then INSERT OVERWRITE from the view.

Related

How to modify CTAS query to append query results to table based on if new partition doesn't exist? - Athena

I have a query that I want to execute daily, partitioned by the date it's executed. The results of this query should be appended to the same table.
My idea was ideally having something similar to the CREATE TABLE IF NOT EXISTS command for adding data by a new partition every day to the existing table if the partition doesn't already exist, but I can't figure out how I'd be able to integrate this in my query.
My query:
CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
external_location = 's3://my-query-results-location/',
format = 'PARQUET',
parquet_compression = 'SNAPPY',
partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}
What this does is create a new table for the first day it's executed but nothing happens for subsequent days, I'm assuming because of the CREATE TABLE IF NOT EXISTS validating that the table already exists and not proceeding with the logic.
Is there a way to modify my query to create a table for the first day executed and append the results by a new partition for each subsequent day?
I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here as I'm running a CTAS query.
You can simply use INSERT INTO existing_table SELECT....
Presumably your table is already partitioned, so include that partition column in the SELECT and Amazon Athena will automatically put the data in the correct directory.
For example, you might include the column like this: SELECT ... CURRENT_DATE as date_executed
See: INSERT INTO - Amazon Athena

Hive - Create Table statement with 'select query' and 'partition by' commands

I want to create a partitioned table in Hive. I know how to create the table structure first with the "Create table ... Partitioned by" command and then insert the data into the table using the "Insert Into Table" command.
But what I am trying to do is to combine these two commands into a single query like below but it is throwing errors.
CREATE TABLE test_extract AS
SELECT
*
FROM master_extract
PARTITION BY (year string
,month string)
;
Both Year and Month are two separate columns in the master_extract table.
Is there any way to achieve something like this ?
No, this is not possible, because Create Table As Select (CTAS) has restrictions:
The target table cannot be a partitioned table.
The target table cannot be an external table.
The target table cannot be a list bucketing table.
You can create the table separately and then load it with INSERT OVERWRITE.
There has been some development since this question was originally asked and answered. As per the Hive documentation: Starting with Hive 3.2.0, CTAS statements can define a partitioning specification for the target table (HIVE-20241).
You can also see the related ticket here. It was resolved in July 2018.
Therefore, if your Hive version is 3.2.0 or higher, you can simply do
-- In a CTAS the partition columns are listed by name only; their types come from the SELECT
CREATE TABLE test_extract PARTITIONED BY (year, month) AS
SELECT
col1,
col2,
year,
month
FROM master_extract;

Hive - Partition Column Equal to Current Date

I am trying to insert into a Hive table from another table that does not have a column for today's date. The partition I am trying to create is at the date level. What I am trying to do is something like this:
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date = from_unixtime(unix_timestamp() - (86400*2) , 'yyyy-MM-dd'))
SELECT * FROM table_1;
But when I run this I get the following error:
"cannot recognize input near 'from_unixtime' '(' 'unix_timestamp' in constant"
If I query a table and use that expression as one of the columns, it works just fine. Any idea how to set the partition date to the current system date in HiveQL?
Thanks in advance,
Craig
What you want here is Hive dynamic partitioning. This lets the partition each record is inserted into be determined dynamically as the record is selected. In your case, that decision is based on the date when you run the query.
To use dynamic partitioning, your partition clause names the partition field(s) but not the value(s). The values that map to the partition fields are the last values in the SELECT, in the same order as the fields.
When you use dynamic partitions for all partition fields you need to ensure that you are using nonstrict for your dynamic partition mode (hive.exec.dynamic.partition.mode).
In your case, your query would look something like:
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date)
SELECT
*
, from_unixtime(unix_timestamp() - (86400*2) , 'yyyy-MM-dd')
FROM table_1;
Instead of the unix_timestamp() and from_unixtime() functions, current_date() can be used to get the current date in 'yyyy-MM-dd' format.
current_date() was added in Hive 1.2.0 (see the official documentation).
The revised query would be:
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date)
SELECT
*
, current_date()
FROM table_1;
If you are running this from a shell script, you can store the current date in a variable. Then create an empty table in Hive using beeline, with just the partition column. Once that is done, you can use that variable as the partition column value when inserting into the partitioned table, and the data will be inserted into that partition.
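As a rough sketch of the variable-based approach (written in Python rather than shell, purely for illustration): the date is computed outside Hive and rendered into a static partition clause, which Hive accepts because the value is then a plain constant. The function name `build_insert` and the `days_back` parameter are made up for this example; the table names are the ones from the question.

```python
from datetime import date, timedelta


def build_insert(run_date: date, days_back: int = 0) -> str:
    """Render an INSERT with a constant partition value.

    build_insert and days_back are hypothetical names used only for this
    sketch; table_2_partition, table_1 and p_date come from the question.
    """
    # Format as yyyy-MM-dd, the same pattern the question uses.
    p_date = (run_date - timedelta(days=days_back)).strftime("%Y-%m-%d")
    return (
        "INSERT OVERWRITE TABLE table_2_partition "
        f"PARTITION (p_date = '{p_date}') "
        "SELECT * FROM table_1;"
    )


if __name__ == "__main__":
    # The rendered statement could then be handed to beeline or hive -e.
    print(build_insert(date.today()))
```

Since the partition value is a literal here, this variant should not need the nonstrict dynamic partition mode.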

Copy Contents of One Column Of a Table To another of a different Table SQL

I want to copy the content of one column in table A and replace the contents (not insert into it - the number of rows will be the same) of another column in another table.
I can't use a WHERE condition: the table has only just been created at this point, with one empty timestamp column. It will be populated via a pyodbc class after the timestamps have been added; this query will fill in the timestamps for me.
What is the SQL command for this?
Thanks!
After discussion, this is the query needed : INSERT INTO OCAT_test_table (DateTimeStamp) SELECT DateTimeStamp FROM DunbarGen
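To see the effect of that statement without touching a real database, here is a small sketch using Python's built-in sqlite3 module with in-memory tables. The table and column names are the ones from the answer; the `reading` column and the sample rows are invented for the demo, and the real target database reached through pyodbc may behave differently in details.

```python
import sqlite3


def copy_timestamps() -> list:
    """Copy DateTimeStamp from DunbarGen into a freshly created, empty table."""
    conn = sqlite3.connect(":memory:")
    # 'reading' is a hypothetical extra column, only there to show that
    # just the selected column is copied.
    conn.execute("CREATE TABLE DunbarGen (DateTimeStamp TEXT, reading REAL)")
    conn.execute("CREATE TABLE OCAT_test_table (DateTimeStamp TEXT)")
    conn.executemany(
        "INSERT INTO DunbarGen VALUES (?, ?)",
        [("2020-01-01 00:00:00", 1.5), ("2020-01-01 00:30:00", 2.5)],
    )
    # The statement from the answer: one new row per source row.
    conn.execute(
        "INSERT INTO OCAT_test_table (DateTimeStamp) "
        "SELECT DateTimeStamp FROM DunbarGen"
    )
    rows = conn.execute(
        "SELECT DateTimeStamp FROM OCAT_test_table ORDER BY DateTimeStamp"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(copy_timestamps())
```

This only works as a "replace" because the target table starts out empty; on a table that already has rows, the INSERT would append instead.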

merging data from old table into new for a monthly archive

I have a SQL statement to insert data into a table for archiving, but I need a merge statement to run on a monthly basis to update the new table (2) with any data that changed in the old table (1) and should now be moved into the archive.
Part of the issue is removing the moved data from the old table. My insert does not do that, but I need the saved data to be purged from the original table.
Is there a single sql statement that will move data out of one table into another in this way? Or does it need to be a two step operation?
The initial statement moved data depending on age and a few other relative factors.
The insert is:
INSERT /*+ append */
INTO tab1
SELECT *
FROM tab2
WHERE (Postingdate < TO_DATE ('2001/07/01', 'yyyy/mm/dd')
OR jobname IS NULL)
AND STATUS <> '45';
All help appreciated...
The merge statement will let you do this in one statement by adding a delete statement in the update clause. See Oracle Documentation on Merge.
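One caveat worth checking in the Oracle documentation: the DELETE clause of MERGE removes rows from the destination table of the merge, not from the source. If the goal is to purge the archived rows from the original table, the two-step variant mentioned in the question (insert, then delete with the same predicate, inside one transaction) is the straightforward route. Below is a minimal sketch of that pattern using Python's sqlite3 module as a stand-in for Oracle. The table names and the filter come from the question; the `id` column, the text-typed dates, and the sample rows are assumptions for the demo.

```python
import sqlite3

# Same predicate as the question's INSERT, with the cutoff date as a parameter.
PREDICATE = "(postingdate < ? OR jobname IS NULL) AND status <> '45'"


def archive_rows(conn: sqlite3.Connection, cutoff: str) -> int:
    """Copy matching rows from tab2 to tab1, then delete them from tab2."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute(f"INSERT INTO tab1 SELECT * FROM tab2 WHERE {PREDICATE}", (cutoff,))
        deleted = conn.execute(f"DELETE FROM tab2 WHERE {PREDICATE}", (cutoff,)).rowcount
    return deleted


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only postingdate, jobname and status appear in the question.
    for name in ("tab1", "tab2"):
        conn.execute(
            f"CREATE TABLE {name} (id INTEGER, postingdate TEXT, jobname TEXT, status TEXT)"
        )
    conn.executemany(
        "INSERT INTO tab2 VALUES (?, ?, ?, ?)",
        [
            (1, "2001-01-15", "jobA", "10"),  # old enough: archived
            (2, "2001-08-01", None, "10"),    # no job name: archived
            (3, "2001-01-20", "jobC", "45"),  # status 45: stays
            (4, "2001-09-01", "jobD", "10"),  # too recent: stays
        ],
    )
    moved = archive_rows(conn, "2001-07-01")
    archived = [r[0] for r in conn.execute("SELECT id FROM tab1 ORDER BY id")]
    remaining = [r[0] for r in conn.execute("SELECT id FROM tab2 ORDER BY id")]
    conn.close()
    return moved, archived, remaining


if __name__ == "__main__":
    print(demo())
```

Running both statements in a single transaction means a failure in the delete does not leave rows duplicated across the two tables.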
I think you should try this with a partitioned table. My idea is to create a table that is range-partitioned on the date:
-- tab2_part is a placeholder name; Oracle requires literal partition bounds, so a fixed date stands in for sysdate-30
create table tab2_part (id number primary key, name varchar2(100), J_date date)
partition by range (J_date) (
  PARTITION one_mnth VALUES LESS THAN (TO_DATE('2001/07/01', 'yyyy/mm/dd')),
  PARTITION rest VALUES LESS THAN (maxvalue)
);
Then move that partition into another table and truncate the partition.