hive partition by time - hive

I want to run the following statement in Hive at a specific time every day:
alter table dos_sourcedata add partition (data = to_date (current_timestamp ()));
but it always fails.

If you want to create an empty partition using alter table, use a literal value, not an expression, like this:
alter table mytable add partition (partition_date='2020-04-01');
You can also create partitions dynamically when loading data using insert overwrite table ... partition. In this case you can use an expression in the query:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table mytable partition (partition_date)
select
col_1,
...
col_n,
current_date --partition column is the last one
from ...
Use current_date instead of to_date (current_timestamp ()).
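Since the question is about running the statement on a schedule, one option is to let the calling shell compute the date and hand Hive a literal. The following is a minimal sketch under that assumption: the function name build_add_partition is hypothetical, and the commented-out hive -e call assumes the hive CLI is on PATH.

```shell
#!/bin/sh
# Build an ALTER TABLE statement whose partition value is a plain string literal.
# Arguments: table name, partition column, date string (yyyy-MM-dd).
build_add_partition() {
    printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s');\n" "$1" "$2" "$3"
}

# Today's date is evaluated by the shell, not by Hive.
today=$(date +%Y-%m-%d)

stmt=$(build_add_partition dos_sourcedata data "$today")
echo "$stmt"

# To actually run it, for example from a daily cron job:
# hive -e "$stmt"
```

IF NOT EXISTS keeps the job from failing if it is run twice on the same day.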

Related

How to modify CTAS query to append query results to table based on if new partition doesn't exist? - Athena

I have a query that I want to execute daily, partitioned by the date it's executed. The results of this query should be appended to the same table.
My idea was ideally having something similar to the CREATE TABLE IF NOT EXISTS command for adding data by a new partition every day to the existing table if the partition doesn't already exist, but I can't figure out how I'd be able to integrate this in my query.
My query:
CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
external_location = 's3://my-query-results-location/',
format = 'PARQUET',
parquet_compression = 'SNAPPY',
partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}
What this does is create a new table for the first day it's executed but nothing happens for subsequent days, I'm assuming because of the CREATE TABLE IF NOT EXISTS validating that the table already exists and not proceeding with the logic.
Is there a way to modify my query to create a table for the first day executed and append the results by a new partition for each subsequent day?
I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here as I'm running a CTAS query.
You can simply use INSERT INTO existing_table SELECT....
Presumably your table is already partitioned, so include that partition column in the SELECT and Amazon Athena will automatically put the data in the correct directory.
For example, you might include the column like this: SELECT ... CURRENT_DATE as date_executed
See: INSERT INTO - Amazon Athena

How to create partitions (year,month,day) in hive from date column which have MM/dd/yyyy format

Data loaded on a daily basis.
Need to create a partition with the date column.
Date
3/15/2021 8:02:32 AM
12/21/2020 12:20:41 PM
You need to convert the table into a partitioned table, then change the loading SQL so that it inserts into the partitions properly.
Create a new table identical to the original table, but exclude the partition column from the list of columns and add it in partitioned by, like below.
create table new_table ( ... ) partitioned by (partition_dt string);
Load data into new_table from the original table. Make sure the last column in your select clause is the partition column.
set hive.exec.dynamic.partition.mode=nonstrict;
insert into new_table partition(partition_dt )
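-- parse the source format explicitly; a dash-separated value avoids slashes in partition directory names
select src.*, from_unixtime(unix_timestamp(dttm_column, 'M/d/yyyy h:mm:ss a'), 'yyyy-MM-dd') as partition_dt from original_table src;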
Drop original table and rename new_table as original table.
drop table original_table ;
alter table new_table rename to original_table ;

How to add comparing to ON CONFLICT () DO UPDATE

I need to check whether the table already contains any operations by the current user for today.
Usually I compare time in this way: timestamp > CURRENT_TIMESTAMP::date
Could you please help me do this in an INSERT with ON CONFLICT () DO UPDATE?
INSERT INTO table (login, smth, timestamp)
VALUES ('username', 'smth', CURRENT_TIMESTAMP)
ON CONFLICT (login, timestamp) DO UPDATE
SET smth = 'smth',
timestamp = CURRENT_TIMESTAMP
This compares the exact timestamp, but I need to check whether it falls on today, as above: timestamp > CURRENT_TIMESTAMP::date
Thanks!
If you want to store the timestamp but have a unique constraint on the date, you can do that easily in Postgres 12 and later using a generated (computed) column. This requires adding a new column to the table that holds the date:
create table t (
login text,
smth text,
ts timestamp,
ts_date date generated always as (ts::date) stored
);
And then creating a unique constraint:
create unique index unq_t_login_timestamp on t(login, ts_date);
Now you can use on conflict:
INSERT INTO t (login, smth, ts)
VALUES ('username', 'smth', CURRENT_TIMESTAMP)
ON CONFLICT (login, ts_date) DO UPDATE
SET smth = 'smth',
ts = CURRENT_TIMESTAMP;
Here is the code in a db<>fiddle.
EDIT:
It is better to eschew the computed column and just use:
create unique index unq_t_login_timestamp on t(login, (ts::date));
If you can use CTE, see here.
For your question, the query would look like this:
(However, I'm not clear what "timestamp > CURRENT_TIMESTAMP::date" means.)
with
"data"("w_login","w_smth","w_timestamp") as (
select 'username2'::text, 'smth'::text, CURRENT_TIMESTAMP
),
"update" as (
update "table" set ("smth","timestamp")=("w_smth","w_timestamp") from "data"
where "login"="w_login" and "w_timestamp">CURRENT_TIMESTAMP::date
returning *
)
insert into "table"
select * from "data"
where not exists (select * from "update");
DB Fiddle

Oracle incremental query in a table with no ID or timestamp

I need to regularly extract data from an Oracle 11 table using sqlplus. For example, I need every day to extract the new rows inserted into that table.
On a table with a primary key such as RECORD_ID (assuming it is inserted incrementally), that query would be:
SELECT * from TABLE WHERE RECORD_ID > &LAST_RECORD_ID_FROM_PREVIOUS_QUERY
On a table with a RECORD_DATE timestamp, this could similarly be done like:
SELECT * from TABLE WHERE RECORD_DATE > &LAST_RECORD_DATE_FROM_PREVIOUS_QUERY
My question is: how do you do this when you have no timestamps and no incremental column you could use? Can this be achieved with ROWID?
One way would be to enable flashback and then you could do:
SELECT * FROM table_name
MINUS
SELECT * FROM table_name AS OF TIMESTAMP SYSTIMESTAMP - INTERVAL '1' DAY;
As I suspected there isn't any easy solution. It has to be one of:
Add an identity or timestamp column
Do a diff using flashback
Add an insert trigger on the table
Unfortunately none of which is practical in my environment. Case closed!

Hive - Partition Column Equal to Current Date

I am trying to insert into a Hive table from another table that does not have a column for today's date. The partition I am trying to create is at the date level. What I am trying to do is something like this:
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date = from_unixtime(unix_timestamp() - (86400*2) , 'yyyy-MM-dd'))
SELECT * FROM table_1;
But when I run this I get the following error:
"cannot recognize input near 'from_unixtime' '(' 'unix_timestamp' in constant"
If I query a table and make that expression one of the columns, it works just fine. Any idea how to set the partition date to the current system date in HiveQL?
Thanks in advance,
Craig
What you want here is Hive dynamic partitioning. This allows the decision for which partition each record is inserted into be determined dynamically as the record is selected. In your case, that decision is based on the date when you run the query.
To use dynamic partitioning, your partition clause lists the partition field(s) but not the value(s). The values that map to the partition fields are the values at the end of the SELECT, in the same order.
When you use dynamic partitions for all partition fields you need to ensure that you are using nonstrict for your dynamic partition mode (hive.exec.dynamic.partition.mode).
In your case, your query would look something like:
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date)
SELECT
*
, from_unixtime(unix_timestamp() - (86400*2) , 'yyyy-MM-dd')
FROM table_1;
Instead of using the unix_timestamp() and from_unixtime() functions, current_date() can be used to get the current date in 'yyyy-MM-dd' format.
current_date() was added in Hive 1.2.0 (see the official documentation).
The revised query would be:
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date)
SELECT
*
, current_date()
FROM table_1;
If you are running this from a shell script, you can store the current date in a variable. Create an empty table in Hive using beeline with just the partition column. Then, when inserting the data into the partitioned table, pass that variable as the value of the partition column and the data will be inserted into that partition.
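The shell-variable approach can be sketched with Hive variable substitution, which replaces the placeholder before the statement is parsed, so the partition clause sees an ordinary string literal. This is only an illustration: the temporary file, the variable name p_date as a hivevar, and the JDBC_URL environment variable in the commented-out beeline call are all assumptions.

```shell
#!/bin/sh
# Date computed by the shell; adjust the offset as needed.
p_date=$(date +%Y-%m-%d)

# Write the HiveQL to a temporary file. The quoted 'EOF' stops the shell from
# expanding ${hivevar:p_date}, so the placeholder reaches Hive unchanged.
hql_file=$(mktemp)
cat > "$hql_file" <<'EOF'
INSERT OVERWRITE TABLE table_2_partition
PARTITION (p_date='${hivevar:p_date}')
SELECT * FROM table_1;
EOF

echo "p_date=$p_date"
cat "$hql_file"

# To run it (JDBC_URL is a hypothetical environment variable):
# beeline -u "$JDBC_URL" --hivevar p_date="$p_date" -f "$hql_file"
```

Because the partition is static here, the SELECT must not return the partition column, which matches the question's table_1 having no date column.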