Hive INSERT OVERWRITE doesn't work with ACID property enabled

My basic requirement is to update a table using a JOIN with another table. Since that is not supported, I'm trying an INSERT OVERWRITE statement instead, but it throws the error below:
FAILED: SemanticException [Error 10295]: INSERT OVERWRITE not allowed on table with OutputFormat that implements AcidOutputFormat while transaction manager that supports ACID is in use
I have enabled ACID properties. My query is similar to below
INSERT OVERWRITE TABLE tbl1 SELECT
col1,col2,col3 ,
case when B.COL4 is not null then B.COL4 else A.COL4 end as COL4...
FROM
tbl1 A
LEFT JOIN (
SELECT col1,col2,col3,col4..coln
FROM tbl1 rcs LEFT JOIN TBL2... many other conditions and joins,
) B
ON JOIN CONDITIONS;
tbl1 schema:
CREATE TABLE tbl1(
col1 varchar(10),col2 varchar(10),col3 varchar(10))
CLUSTERED BY (col1) INTO 2 BUCKETS
STORED AS ORC TBLPROPERTIES('transactional'='true');

Related

Hive - cannot recognize input 'insert' in select clause

Say I've already created table3 and try to insert data into it using the following code:
WITH table1
AS
(SELECT 1 AS key, 'One' AS value),
table2
AS
(SELECT 1 AS key, 'I' AS value)
INSERT TABLE table3
SELECT t1.key, t1.value, t2.value
FROM table1 t1
JOIN table2 t2
ON (t1.key = t2.key)
However, I got the error "cannot recognize input 'insert' in select clause". If I simply delete the INSERT line, the query runs just fine.
Is this a syntax problem? Or can I not use a WITH clause with INSERT?

Use INTO or OVERWRITE depending on what you need:
INSERT INTO TABLE table3 --this will append data, keeping the existing data intact
or
INSERT OVERWRITE TABLE table3 --will overwrite any existing data
See the Hive manual: Inserting data into Hive Tables from queries.
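To see the difference between the two forms, here is a minimal sketch that runs the question's CTEs with Python's built-in sqlite3 module as a stand-in for Hive. This is an assumption-laden illustration, not Hive itself: SQLite writes INSERT INTO without the TABLE keyword and has no INSERT OVERWRITE, so overwrite is emulated here with a DELETE followed by the INSERT in one transaction. The helper names insert_into, insert_overwrite and row_count are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table3 (key INTEGER, value1 TEXT, value2 TEXT)")

# The question's CTEs followed by the INSERT; SQLite accepts WITH ... INSERT INTO ...
CTE_INSERT = """
WITH table1 AS (SELECT 1 AS key, 'One' AS value),
     table2 AS (SELECT 1 AS key, 'I' AS value)
INSERT INTO table3
SELECT t1.key, t1.value, t2.value
FROM table1 t1
JOIN table2 t2 ON t1.key = t2.key
"""

def insert_into():
    # Append semantics: rows already in table3 are kept.
    with conn:
        conn.execute(CTE_INSERT)

def insert_overwrite():
    # SQLite has no INSERT OVERWRITE; emulate it with DELETE + INSERT in one transaction.
    with conn:
        conn.execute("DELETE FROM table3")
        conn.execute(CTE_INSERT)

def row_count():
    return conn.execute("SELECT COUNT(*) FROM table3").fetchone()[0]

insert_into()
insert_into()
print(row_count())  # 2: appended twice
insert_overwrite()
print(row_count())  # 1: previous rows replaced
```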

Inserting new rows into a table only if they do not already exist

I have a table that can get bigger over time, and I want to insert some of its rows into another table, but I also want to make sure I am not duplicating rows that I have already inserted.
So here is the type of condition for my insert:
INSERT INTO SecondTable(Col1,Col2)
SELECT Col5,Col6
FROM
FirstTable ft
WHERE ft.RecType = 'ABC'
So if I keep running this it will keep inserting the same rows again and again. How can I tell it only insert if it is not already there?

You can use not exists:
INSERT INTO SecondTable(Col1,Col2)
SELECT Col5,Col6
FROM FirstTable ft
WHERE ft.RecType = 'ABC' AND
NOT EXISTS (SELECT 1 FROM SecondTable t2 WHERE t2.Col1 = ft.Col5 AND t2.Col2 = ft.Col6);
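A quick way to convince yourself that the NOT EXISTS version is safe to re-run is to execute it several times against a small data set. The sketch below uses Python's sqlite3 module as a stand-in database; the sample rows are made up, and the subquery compares both columns (Col1 with Col5 and Col2 with Col6).

```python
import sqlite3

# Stand-in database with made-up rows; only RecType 'ABC' rows should be copied.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FirstTable (Col5 TEXT, Col6 TEXT, RecType TEXT);
CREATE TABLE SecondTable (Col1 TEXT, Col2 TEXT);
INSERT INTO FirstTable VALUES ('a', 'x', 'ABC'), ('b', 'y', 'ABC'), ('c', 'z', 'XYZ');
""")

# Copy a row only if an identical (Col1, Col2) pair is not already in SecondTable.
INSERT_NEW = """
INSERT INTO SecondTable (Col1, Col2)
SELECT Col5, Col6
FROM FirstTable ft
WHERE ft.RecType = 'ABC'
  AND NOT EXISTS (SELECT 1 FROM SecondTable t2
                  WHERE t2.Col1 = ft.Col5 AND t2.Col2 = ft.Col6)
"""

for _ in range(3):  # run the same insert repeatedly
    with conn:
        conn.execute(INSERT_NEW)

rows = conn.execute("SELECT Col1, Col2 FROM SecondTable ORDER BY Col1").fetchall()
print(rows)  # [('a', 'x'), ('b', 'y')]: no duplicates after three runs
```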

Create a unique constraint on the columns that identify a row as unique. This also helps preserve the integrity of your table: when you try to insert a duplicate record, the RDBMS will give you an error.
ALTER TABLE SecondTable
ADD UNIQUE (col1, col2, col3);
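Here is a minimal sketch of the constraint rejecting a duplicate, again with Python's sqlite3 module as the stand-in database. SQLite cannot add a constraint through ALTER TABLE, so a unique index (the hypothetical name ux_secondtable) plays the role of ADD UNIQUE, and it covers only Col1 and Col2, the two columns the question inserts. The try_insert helper is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SecondTable (Col1 TEXT, Col2 TEXT)")
# Unique index standing in for ALTER TABLE ... ADD UNIQUE (not supported by SQLite).
conn.execute("CREATE UNIQUE INDEX ux_secondtable ON SecondTable (Col1, Col2)")

def try_insert(col1, col2):
    """Insert one row; return False if the unique index rejects it as a duplicate."""
    try:
        with conn:
            conn.execute("INSERT INTO SecondTable (Col1, Col2) VALUES (?, ?)", (col1, col2))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("a", "x"))  # True
print(try_insert("a", "x"))  # False: duplicate rejected
print(try_insert("a", "y"))  # True: a different combination is allowed
```

Note that the constraint turns a duplicate into an error rather than silently skipping it, so it complements the NOT EXISTS filter rather than replacing it.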

INSERT INTO SecondTable(Col1,Col2)
SELECT ft.Col5, ft.Col6
FROM FirstTable ft
LEFT JOIN SecondTable st ON st.Col1 = ft.Col5 AND st.Col2 = ft.Col6
WHERE st.Col1 IS NULL AND ft.RecType = 'ABC';
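The LEFT JOIN ... IS NULL pattern is an anti-join: a FirstTable row survives the WHERE clause only when it found no partner in SecondTable. A minimal sketch with Python's sqlite3 module and made-up rows, joining on both Col5/Col1 and Col6/Col2; copy_new_rows is a hypothetical helper that reports how many rows each run added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FirstTable (Col5 TEXT, Col6 TEXT, RecType TEXT);
CREATE TABLE SecondTable (Col1 TEXT, Col2 TEXT);
INSERT INTO FirstTable VALUES ('a', 'x', 'ABC'), ('b', 'y', 'ABC');
INSERT INTO SecondTable VALUES ('a', 'x');  -- copied on an earlier run
""")

# Anti-join: unmatched FirstTable rows have NULL in every st column.
INSERT_NEW = """
INSERT INTO SecondTable (Col1, Col2)
SELECT ft.Col5, ft.Col6
FROM FirstTable ft
LEFT JOIN SecondTable st ON st.Col1 = ft.Col5 AND st.Col2 = ft.Col6
WHERE st.Col1 IS NULL AND ft.RecType = 'ABC'
"""

def copy_new_rows():
    """Run the anti-join insert and return how many rows it added."""
    before = conn.execute("SELECT COUNT(*) FROM SecondTable").fetchone()[0]
    with conn:
        conn.execute(INSERT_NEW)
    after = conn.execute("SELECT COUNT(*) FROM SecondTable").fetchone()[0]
    return after - before

print(copy_new_rows())  # 1: only ('b', 'y') was new
print(copy_new_rows())  # 0: nothing left to copy
```

One caveat of this form: the IS NULL test should use a column that is never NULL in real SecondTable rows, otherwise a genuine match could be mistaken for a missing one.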

How to add only new row from one table to another table in sql

I am using a linked server in SQL Server that is connected to a HANA database. I want to import a table from the HANA database and copy it into SQL Server, and I have done the importing part. The issue is that I want a query that selects or inserts only the rows that do not yet exist in SQL Server (the new rows). But my query gives a primary key violation error, which means it is inserting all the data again.
Here is my query:
insert into OACT ( AcctCode,AcctName,CurrTotal,FatherNum,SysTotal,CreateDate,UpdateDate,ActId,FormatCode)
select tab2.AcctCode,tab2.AcctName,tab2.CurrTotal, tab2.FatherNum,tab2.SysTotal,tab2.CreateDate,tab2.UpdateDate,tab2.ActId,tab2.FormatCode
from HanaSql8.."TRAININGDB"."OACT" tab2
Where NOT EXISTS (
Select tab1.AcctCode,tab1.AcctName,tab1.CurrTotal, tab1.FatherNum,tab1.SysTotal,tab1.CreateDate,tab1.UpdateDate,tab1.ActId,tab1.FormatCode
from OACT tab1
where tab1.AcctCode=tab2.AcctCode
);

Your code should work without any problem if the primary key on the target table is defined on the AcctCode column.
Inside EXISTS() or NOT EXISTS() you don't need to select specific column names; it is only a logical check, so you can modify the query as follows:
insert into OACT (
AcctCode,AcctName,CurrTotal,
FatherNum,SysTotal,CreateDate
,UpdateDate,ActId,FormatCode
)
select
tab2.AcctCode,tab2.AcctName,tab2.CurrTotal,
tab2.FatherNum,tab2.SysTotal,tab2.CreateDate,
tab2.UpdateDate,tab2.ActId,tab2.FormatCode
from HanaSql8.."TRAININGDB"."OACT" tab2
Where NOT EXISTS (
Select *
from OACT tab1
where tab1.AcctCode=tab2.AcctCode
);
You can also try a LEFT JOIN as an alternative to NOT EXISTS(). Here is the query:
insert into OACT (
AcctCode,AcctName
)
select
tab2.AcctCode,tab2.AcctName
from HanaSql8.."TRAININGDB"."OACT" tab2
left join OACT tab1
on tab1.AcctCode=tab2.AcctCode
where tab1.AcctCode is null
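Two claims above can be checked on a toy copy of the tables: the select list inside NOT EXISTS has no effect on the result, and with the primary key on AcctCode the NOT EXISTS insert can be re-run without a key violation. The sketch uses Python's sqlite3 module; the local table src is a hypothetical stand-in for the linked HANA table HanaSql8.."TRAININGDB"."OACT", only two of the OACT columns are modelled, and the account rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src  (AcctCode TEXT, AcctName TEXT);               -- stand-in for the linked table
CREATE TABLE OACT (AcctCode TEXT PRIMARY KEY, AcctName TEXT);   -- target, PK on AcctCode
INSERT INTO src  VALUES ('100', 'Cash'), ('200', 'Bank'), ('300', 'Sales');
INSERT INTO OACT VALUES ('100', 'Cash');
""")

# {cols} is the select list inside NOT EXISTS; only whether a row exists matters.
TEMPLATE = """
SELECT tab2.AcctCode FROM src tab2
WHERE NOT EXISTS (SELECT {cols} FROM OACT tab1 WHERE tab1.AcctCode = tab2.AcctCode)
ORDER BY tab2.AcctCode
"""
variants = ["tab1.AcctCode, tab1.AcctName", "*", "1"]
results = [conn.execute(TEMPLATE.format(cols=c)).fetchall() for c in variants]
print(results)  # the same [('200',), ('300',)] for every variant

# The insert itself: the second run finds nothing new, so no primary key violation.
INSERT_NEW = """
INSERT INTO OACT (AcctCode, AcctName)
SELECT tab2.AcctCode, tab2.AcctName FROM src tab2
WHERE NOT EXISTS (SELECT * FROM OACT tab1 WHERE tab1.AcctCode = tab2.AcctCode)
"""
for _ in range(2):
    with conn:
        conn.execute(INSERT_NEW)  # would raise sqlite3.IntegrityError on a PK clash
print(conn.execute("SELECT COUNT(*) FROM OACT").fetchone()[0])  # 3
```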

Selecting rowset when value exists in one of 5 tables with different amounts of columns

Using SQL Server, I need to return the entire row from whichever table contains 'value' in the Filename column (a column that every table has). However, the tables do not have the same number of columns, and each table has its own columns with their own data types; the only column name and type they have in common is Filename, which is the column I need to check for 'value'.
Ideally, I would be able to do something along the lines of:
SELECT * FROM Table1, Table2, Table3, Table4, Table5
WHERE Filename = 'someValue'
That seems natural, since all the tables share the same Filename column.
I have tried using Union but have issues since the number of columns and datatypes of the tables do not align.
I have also tried every combination of JOIN I could find.
I'm sure this could be accomplished with IF EXISTS, but that would be many, many lines of what seems like unnecessary code. Hoping there is a more elegant solution.
Thanks in advance!

You can try joining the tables together. First create a temporary table that stores the input value, then LEFT JOIN each table to this temporary table to get all the records you want. For any table that has no record for that filename, its columns come back as NULL. (The example below uses an id column in place of Filename.)
create table Table1 (id int,value int);
insert into Table1 values (1,10)
create table Table2 (id int,value int);
insert into Table2 values (1,20)
create table Table3 (id int,value int);
insert into Table3 values (2,30)
Here is the query itself:
create table #tmp (id int)
insert into #tmp
values (1)
select t.id, t1.value, t2.value, t3.value from #tmp as t
left join Table1 as t1
on t.id = t1.id
left join Table2 as t2
on t.id = t2.id
left join Table3 as t3
on t.id = t3.id
And this is the result:
id  value  value  value
1   10     20     NULL
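The same temporary-table-plus-LEFT-JOIN query can be reproduced outside SQL Server to check the result. In this sketch, Python's sqlite3 module stands in for SQL Server, and a SQLite TEMP table named tmp plays the role of #tmp; the tables and values are the ones from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER, value INTEGER);
INSERT INTO Table1 VALUES (1, 10);
CREATE TABLE Table2 (id INTEGER, value INTEGER);
INSERT INTO Table2 VALUES (1, 20);
CREATE TABLE Table3 (id INTEGER, value INTEGER);
INSERT INTO Table3 VALUES (2, 30);
-- SQLite's counterpart to SQL Server's #tmp is a TEMP table.
CREATE TEMP TABLE tmp (id INTEGER);
INSERT INTO tmp VALUES (1);
""")

# Each LEFT JOIN keeps the tmp row; tables without a match contribute NULL.
row = conn.execute("""
SELECT t.id, t1.value, t2.value, t3.value
FROM tmp AS t
LEFT JOIN Table1 AS t1 ON t.id = t1.id
LEFT JOIN Table2 AS t2 ON t.id = t2.id
LEFT JOIN Table3 AS t3 ON t.id = t3.id
""").fetchone()
print(row)  # (1, 10, 20, None): SQL NULL arrives in Python as None
```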

This should work too, using SQL Server's undocumented sp_MSforeachtable procedure:
EXEC sp_MSforeachtable
@command1='SELECT * FROM ? where filename = ''someValue''',
@whereand='AND o.id in (select object_id from sys.tables where name in (''Table1'',''Table2'',''Table3''))'
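sp_MSforeachtable simply runs one SELECT per table, returning a separate result set for each, which is why differing column counts do not matter. If you would rather avoid an undocumented procedure, the same one-query-per-table idea can be driven from application code. The sketch below is a swapped-in technique, not part of the answer: it uses Python's sqlite3 module, find_filename is a hypothetical helper, and the table layouts and rows are invented. Table names cannot be bound as query parameters, so they are checked against the catalog before being placed in the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Filename TEXT, Size INTEGER);
CREATE TABLE Table2 (Filename TEXT, Author TEXT, Pages INTEGER);
CREATE TABLE Table3 (Filename TEXT, Checksum TEXT);
INSERT INTO Table1 VALUES ('report.pdf', 1024), ('notes.txt', 12);
INSERT INTO Table2 VALUES ('notes.txt', 'Ann', 3);
INSERT INTO Table3 VALUES ('image.png', 'abc123');
""")

def find_filename(conn, tables, filename):
    """Run SELECT * on each table and return {table: [row dicts]} for tables with matches."""
    known = {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    hits = {}
    for table in tables:
        if table not in known:  # whitelist: table names cannot be passed as ? parameters
            raise ValueError(f"unknown table: {table}")
        cur = conn.execute(f'SELECT * FROM "{table}" WHERE Filename = ?', (filename,))
        rows = cur.fetchall()
        if rows:
            columns = [d[0] for d in cur.description]  # each table keeps its own columns
            hits[table] = [dict(zip(columns, r)) for r in rows]
    return hits

print(find_filename(conn, ["Table1", "Table2", "Table3"], "notes.txt"))
```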

Mapping data between different databases on the same server with the same table names

Hi, I am stuck and need your advice. I have a server with multiple databases.
I want to check whether the data in a table in one database is equal to the data in the table with the same name in another database.
Can anyone suggest how to do that? Thanks in advance.

select * from db1.table1 t1
full outer join db2.table1 t2 on t1.id = t2.id
where t1.id is null or t2.id is null
or t1.col2 <> t2.col2 or ...

It depends on what you need to map.
If you just want to know the differences by primary key, I would try a full join on the PK, so it will tell you which records exist on A but not on B and which exist on B but not on A. Like this:
create table DB_A(id int)
create table DB_B(id int)
insert into DB_A values (1)
insert into DB_A values (2)
insert into DB_B values (2)
insert into DB_B values (3)
select DB_A.ID as 'Exists on A but not on B', DB_B.id as 'Exists on B but not on A'
from DB_A full join DB_B on DB_A.id=DB_B.id
where DB_A.id is null or DB_B.id is null
If you need more than that, such as comparing the values of all columns, I suggest using a data comparison tool. It would not be so straightforward to do it using just SQL.
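To try the key comparison without two real databases, here is a sketch with Python's sqlite3 module and the DB_A/DB_B tables from the answer. One assumption to flag: SQLite only gained FULL JOIN in version 3.39, so the full join is emulated here as two LEFT JOINs combined with UNION ALL, which returns the same "exists on one side only" rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DB_A (id INTEGER);
CREATE TABLE DB_B (id INTEGER);
INSERT INTO DB_A VALUES (1), (2);
INSERT INTO DB_B VALUES (2), (3);
""")

# Full join emulated with two LEFT JOINs so it runs on SQLite builds older than 3.39.
DIFF = """
SELECT a.id AS only_on_a, NULL AS only_on_b
FROM DB_A a LEFT JOIN DB_B b ON a.id = b.id
WHERE b.id IS NULL
UNION ALL
SELECT NULL, b.id
FROM DB_B b LEFT JOIN DB_A a ON a.id = b.id
WHERE a.id IS NULL
"""
rows = conn.execute(DIFF).fetchall()
print(rows)  # the rows (1, None) and (None, 3); add ORDER BY if order matters
```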

We Keep Coding