Insert overwrite in Hive - hive

I am trying to use INSERT OVERWRITE in Hive. Basically I would like to overwrite not the complete partition but only a few records within it. I have not found a solution for this, nor for an insert overwrite on the destination table based on a filter on a non-partition column.
Is there any way I can achieve it?

Hive is not a regular RDBMS. If you want to update records, do an INSERT OVERWRITE TABLE Table_Name: first stage your changed data in a temporary table (or use a WITH clause), then insert overwrite from it. Combined with table partitioning, this is safe.
QUERY[HIVE]:
WITH TEMP_TABLE AS (SELECT * FROM SOURCE_TABLE_NAME)
INSERT OVERWRITE TABLE TARGET_TABLE_NAME
SELECT * FROM TEMP_TABLE;
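For the original question of overwriting only a few records, one option is to rewrite just the affected partition, carrying the untouched rows through the SELECT. A minimal sketch, assuming a hypothetical sales table partitioned by dt with columns id and amount:
INSERT OVERWRITE TABLE sales PARTITION (dt = '2022-01-01')
SELECT
  id,
  CASE WHEN id = 42 THEN 0 ELSE amount END AS amount  -- change only the matching rows
FROM sales
WHERE dt = '2022-01-01';
Staging the SELECT through a temporary table first, as above, is the safer variant of the same idea.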

Hive is not an RDBMS. What you are trying to achieve with Hive is not recommended. Hive is better suited for batch processing over very large sets of immutable data.
However, from what I could deduce, you are trying to update an existing record in your table. To do so, enable ACID support on the table that needs to be updated and your update queries will start working.
UPDATE <TABLE>
SET <COL1> = 'Value1',
    <COL2> = 'Value2'
WHERE <Some Condition That Only Evaluates To The Rows You Need Updated>
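For reference, a minimal sketch of the ACID prerequisites (exact requirements vary by Hive version; the table and column names here are hypothetical):
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE my_table (col1 STRING, col2 STRING)
CLUSTERED BY (col1) INTO 4 BUCKETS  -- bucketing is required for ACID on Hive 1.x/2.x
STORED AS ORC                       -- ACID tables must be stored as ORC
TBLPROPERTIES ('transactional' = 'true');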

Related

Removing specific rows from table without using DELETE

I am working at a company and I need to find a way to delete specific rows from a table without using the DELETE statement.
So I was thinking of using a partition and then removing it with DROP IF EXISTS PARTITION:
select *, count(validity_date) over(partition by another_column) as indicator from schema.table
Which worked, but when I try dropping the partition using
ALTER TABLE schema.table DROP IF EXISTS PARTITION(year(validity_date) = '2022');
I get an error saying
mismatched input '(' expecting set null in drop partition statement
So my question is: is there any other way to remove specific rows from a table without using the DELETE function?
Thank you !
The problem is not a typo: Hive does not allow function calls such as year(validity_date) inside a partition specification. A partition spec must compare a partition column directly to a constant value, so this only works if the table is actually partitioned by that column, e.g.:
ALTER TABLE schema.table DROP IF EXISTS PARTITION(validity_date = '2022-01-01');
These are the only two options to delete the data: either via DROP PARTITION or via a DELETE query.
P.S. Hive supports ACID operations such as DELETE and UPDATE on a table only from version 0.14 onwards.
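For illustration, the ACID route would be a plain DELETE, assuming the table was created as transactional:
DELETE FROM schema.table WHERE year(validity_date) = 2022;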

How to modify CTAS query to append query results to table based on if new partition doesn't exist? - Athena

I have a query that I want to execute daily, partitioned by the date it's executed. The results of this query should be appended to the same table.
Ideally I would have something similar to the CREATE TABLE IF NOT EXISTS command for adding data as a new partition to the existing table each day if the partition doesn't already exist, but I can't figure out how to integrate this into my query.
My query:
CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
external_location = 's3://my-query-results-location/',
format = 'PARQUET',
parquet_compression = 'SNAPPY',
partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}
What this does is create a new table on the first day it's executed, but nothing happens on subsequent days. I'm assuming this is because CREATE TABLE IF NOT EXISTS sees that the table already exists and does not proceed with the rest of the logic.
Is there a way to modify my query to create a table for the first day executed and append the results by a new partition for each subsequent day?
I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here as I'm running a CTAS query.
You can simply use INSERT INTO existing_table SELECT....
Presumably your table is already partitioned, so include that partition column in the SELECT and Amazon Athena will automatically put the data in the correct directory.
For example, you might include the column like this: SELECT ..., CURRENT_DATE AS date_executed
See: INSERT INTO - Amazon Athena
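Putting it together, the daily run might look like this sketch (col_a, col_b, and source_table are placeholders; Athena routes each row to the partition matching its date_executed value):
INSERT INTO db_name.table_name
SELECT
  col_a,
  col_b,
  CURRENT_DATE AS date_executed
FROM source_table;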

Insert into select, target data unaffected

We have a simple query
INSERT INTO table2
SELECT *
FROM table1
WHERE condition;
I read somewhere that to use the INSERT INTO SELECT statement, the following condition must be fulfilled:
The existing records in the target table are unaffected
What does it mean?
INSERT is a SQL operation that adds new rows to your table without affecting the existing ones. This is in contrast to an UPDATE operation, which can modify multiple existing rows in your table if you use the wrong WHERE clause.
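To illustrate the difference (a minimal sketch; col1 and the condition are placeholders):
INSERT INTO table2 SELECT * FROM table1 WHERE condition;  -- only adds rows; existing rows in table2 stay intact
UPDATE table2 SET col1 = 'x' WHERE condition;             -- modifies existing rows in place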

sql update table from another table snappydata

Hi, I am using SnappyData's SQL utility to update my table from another table, say update Table_A with rows from Table_B.
Table_A(col_key, col_value) -- partitioned table with large number of rows
Table_B(col_key, col_value) -- small batch update in this table
A MERGE would be ideal (update if there is a match, or insert if a row with the key does not exist in Table_A).
But MERGE is not supported in SnappyData (or GemFire), so I am planning to insert first with an outer join to handle new col_key rows, and then run an update to refresh the values in Table_A where the same col_key also appears in Table_B.
However, it seems that "UPDATE ... SET ... FROM ..." is also not supported in GemFire.
So is there a way to implement the "update .. set .. from .." in SnappyData sql statements? Thanks in advance :)
Yes, you can use PUT INTO when using SQL or you can do the same using the Snappy Spark extension APIs too.
I just found that GemFire actually uses a "PUT INTO" statement to support the "INSERT or UPDATE" (MERGE) functionality offered by other DBMSes.
Basically: first retrieve the 'old' values from TABLE_A where the col_key exists in TABLE_B, add them to TABLE_B, then use "PUT INTO" to put the rows from Table_B into Table_A, and it's done!
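A minimal sketch of that final step, assuming the Table_A/Table_B schema from the question and SnappyData's PUT INTO upsert semantics:
PUT INTO Table_A (col_key, col_value)
SELECT col_key, col_value
FROM Table_B;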

merging data from old table into new for a monthly archive

I have a SQL statement to insert data into a table for archiving, but I need a MERGE statement that runs on a monthly basis to update the new table (2) with any data that changed in the old table (1) and should now be moved into the archive.
Part of the issue is removing the moved data from the old table. My insert is not doing that, but I need the archived data to be purged from the original table.
Is there a single sql statement that will move data out of one table into another in this way? Or does it need to be a two step operation?
The initial statement moved data depending on age and a few other related factors.
insert is:
INSERT /*+ append */
INTO tab1
SELECT *
FROM tab2
WHERE (Postingdate < TO_DATE ('2001/07/01', 'yyyy/mm/dd')
OR jobname IS NULL)
AND STATUS <> '45';
All help appreciated...
The merge statement will let you do this in one statement by adding a delete statement in the update clause. See Oracle Documentation on Merge.
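Applied to the tables in the question, it might look like the sketch below (the key column id and the delete condition are assumptions; note that Oracle's DELETE WHERE clause removes matched rows from the merge target, evaluated after the UPDATE has been applied):
MERGE INTO tab1 a
USING tab2 s
  ON (a.id = s.id)
WHEN MATCHED THEN
  UPDATE SET a.status = s.status
  DELETE WHERE (a.postingdate < TO_DATE('2001/07/01', 'yyyy/mm/dd') OR a.jobname IS NULL)
WHEN NOT MATCHED THEN
  INSERT (a.id, a.postingdate, a.jobname, a.status)
  VALUES (s.id, s.postingdate, s.jobname, s.status);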
I think you should try this with a partitioned table. My idea is to create a table that has a range partition on the date:
create table tab_part (  -- table name is a placeholder
  id     number primary key,
  name   varchar2(100),
  j_date date
)
partition by range (j_date) (
  partition one_mnth values less than (to_date('2001/07/01', 'yyyy/mm/dd')),  -- Oracle requires a constant bound here, not sysdate-30
  partition p_rest   values less than (maxvalue)
);
Then move the one_mnth partition into another table and truncate that partition.
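A sketch of that step, assuming archive_tab is a hypothetical non-partitioned table with the same structure:
-- Option 1: swap the partition segment with the archive table (fast, no rows are copied)
ALTER TABLE tab_part EXCHANGE PARTITION one_mnth WITH TABLE archive_tab;
-- Option 2: copy the rows, then empty the partition
INSERT INTO archive_tab SELECT * FROM tab_part PARTITION (one_mnth);
ALTER TABLE tab_part TRUNCATE PARTITION one_mnth;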