Hive - cannot recognize input 'insert' in select clause - hive

Say I've already created table3, and try to insert data into it using the following code
WITH table1
AS
(SELECT 1 AS key, 'One' AS value),
table2
AS
(SELECT 1 AS key, 'I' AS value)
INSERT TABLE table3
SELECT t1.key, t1.value, t2.value
FROM table1 t1
JOIN table2 t2
ON (t1.key = t2.key)
However, I got an error as cannot recognize input 'insert' in select clause. If I simply delete the insert sentence, then the query runs just fine.
Is this a syntax problem? Or I cannot use with clause to insert?

Use INTO or OVERWRITE depending on what you need:
INSERT INTO TABLE table3 --this will append data, keeping the existing data intact
or
INSERT OVERWRITE TABLE table3 --will overwrite any existing data
Read manual: Inserting data into Hive Tables from queries

Related

Update table if exist or insert if not

I have 2 SQL tables with same column label, Table1 and Table2.
I want to update Table1 with value of Table2 if there is a same value for an attribute.
The tables has CODE, POSITION, DESCRIPTION as columns.
I do this:
UPDATE Table1
SET CODE = Table2.CODE,
POSITION= Table2.POSITION
FROM Table2
WHERE DESCRIPTION = Table2.DESCRIPTION
It works, but if in the DESCRIPTION Value of Table2 is not present into Table1, I want to insert the entire row, in other words I need to do something like:
UPDATE IF DESCRIPTION EXISTS
ELSE INSERT
How can I do this?
You can do this from first-principals in 2 steps
Update the existing values like you have done:
UPDATE Table1
SET
CODE= t2.CODE,
POSITION= t2.POSITION
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.DESCRITPION = t2.DESCRITPION
Then you can insert the missing records
INSERT INTO Table1 (CODE, POSITION, DESCRITPION)
SELECT CODE, POSITION, DESCRITPION
FROM Table2 t2
WHERE NOT EXISTS (SELECT DESCRITPION
FROM Table1 t1
WHERE t1.DESCRITPION = t2.DESCRITPION)
Most RDBMS have a MERGE statement that allows you to combine these two queries into a single atomic operation, you'll need to check the documentation for your vendor, but an MS SQL Server MERGE solution looks like this:
MERGE INTO Table1 AS tgt
USING Table2 AS src
ON tgt.DESCRITPION = src.DESCRITPION
WHEN MATCHED
THEN UPDATE SET CODE= src.CODE,
POSITION= src.POSITION
WHEN NOT MATCHED THEN
INSERT (CODE, POSITION, DESCRITPION)
VALUES (src.CODE, src.POSITION, src.DESCRITPION)
An additional benefit to the MERGE clause is that it simplifies capturing the changed state into the OUTPUT clause which allows for some complex logic all while keeping the query as a single operation without having to manually create and track transaction scopes.

How to copy an sql column based on foreign key?

How do I run the following query
INSERT INTO table1
SELECT sourceId
FROM table2
WHERE table2.id = table1.productId
given that table2 has an id column and a sourceId column (and few others) and table1 already contains productId and I want to copy the sourceId column into the table as well?
The error message I'm getting with this query is simply "Unknown column 'table1.productId' in where clause", but if I include table1.productId in the SELECT and table1 on the FROM row, I get the "Column count doesn't match value count at row 1" error.
Any idea how to fix up the query?
when you are inserting into a table, you have to specify what to insert in all columns of that table. Should look like this:
INSERT INTO table1 (column1, column2, column3, ...)
SELECT value1, value2, value3, ...
FROM table2
WHERE ...;
But in your case since you said table1 already contains productId, you probably should do an UPDATE and not an INSERT
UPDATE table1 t1
SET t1.sourceId =
(SELECT sourceId
FROM table2
WHERE table2.id = t1.productId);
If this works for you, you can improve on it with JOINs
The select part is supposed to work stand-alone, so you must select from table1 in that part, if you want to refer to its rows.
INSERT INTO table1
SELECT sourceId
FROM table2
WHERE table2.id IN (SELECT productId FROM table1);

How can I effectively check uniqueness of an item in database?

I have a spreadsheet that needs to be uploaded. But each row needs to be check if they are unique in the database. One option that I can think of is to check each row exists in the database or not, that means doing N sql queries for N rows. Is there any alternatives on effectively checking for unique data instead of checking row by row?
we can do that by using NOT EXISTS and NOT IN methods. To do that, first we need to insert that spreadsheet data's into one temp table.
lets take your main table as TABLE_1 and your spreadsheet temp table as TABLE_2.
Using NOT IN -
INSERT INTO TABLE_1 (id, name)
SELECT t2.id, t2.name FROM TABLE_2 t2
WHERE t2.id NOT IN (SELECT id FROM TABLE_1)
Using NOT EXISTS -
INSERT INTO TABLE_1 (id, name)
SELECT t2.id, t2.name FROM TABLE_2 t2
WHERE NOT EXISTS(SELECT id FROM TABLE_1 t1 WHERE t1.id = t2.id)

Copy data from table to table doing INSERT or UPDATE

I need to copy a lot of data from one table to another. If the data already exists, I need to update it, otherwise I need to insert it. The data to be copied is selecting using a WHERE condition. The data has a primary key (a string of up to 12 characters).
If I was just inserting the data, I would do
INSERT INTO T2 SELECT COL1, COL2 FROM T1 WHERE T1.ID ='I'
but I cannot figure out how to do the INSERT / UPDATE. I keep seeing references to upserts and MERGE, but MERGE appears to have issues,and I cannot figure ut how to do the upsert for multiple records.
What is the best solution for this?
If you want to avoid merges (though you should not be afraid of it) you can do something like
update t2
set col1 = t1.col1
,col2 = t1.col2
from t2
join t1
on t2.[joinkey] = t1.[joinkey]
where [where clause]
And after for the ones that you do not have
insert into t2(col1,col2)
select col1,col2 from t1
where not exists (select * from t2 where t1.[joinkey] = t2.[joinkey])
in such way you first update the ones that match and then insert the ones that do not. Also if you want it in one go you can wrap it in a transaction.
It is commonly known as UPSERT operation. Yes you are correct in saying merge has some issues with it so stay away from it.
A simple approach assuming there is a Primary Key column in Both tables called PK_Col would be something like this...
BEGIN TRANSACTION;
-- Update already existing records
UPDATE T2
SET T2.Col1 = T1.Col1
,T2.Col2 = T1.Col2
FROM T2 INNER JOIN T1 ON T2.PK_COl = T1.PK_Col
-- Insert missing records
INSERT INTO T2 (COL1, COL2 )
SELECT COL1, COL2
FROM T1
WHERE T1.ID ='I'
AND NOT EXISTS (SELECT 1
FROM T2
WHERE T2.PK_COl = T1.PK_Col )
COMMIT TRANSACTION;
Wrap the whole UPSERT operation in one transaction.
You can use IF EXISTS something like:
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert table (key, ...)
values (#key, ...)
end
Another solution is to check ##ROWCOUNT
UPDATE MyTable SET FieldA=#FieldA WHERE Key=#Key
IF ##ROWCOUNT = 0
INSERT INTO MyTable (FieldA) VALUES (#FieldA)

In SQL Server, how can I insert values into table EXCEPT repeat rows?

I wrote a python script which would look at a text file and create SQL code which would insert data into a table.
It looked like this:
insert into table1 (date, locid, personid, itemid, amounts)
values (val11,val12,val13,val14,val15)
,(val21,val22,val23,val24,val25)
The data is structured so that for a particular set of values the first four columns (date, locid, personid, itemid), there will be at most one row.
At the moment, I have to check by hand if an entry already exists in the table, then remove it from the insert statement.
How can I enter this data into the DB without manually checking for repeats?
Create a second table, table2, with the DDL for just the columns involved in your insert.
Do your inserts into table2.
Then run:
insert into table1 (date, locid, personid, itemid, amounts)
select t2.* from table2 t2
join(
select locid, personid, itemid, amounts from table2
except
select locid, personid, itemid, amounts from table1) x
on t2.locid = x.locid
and t2.personid = x.personid
and t2.itemid = x.itemid
and t2.amounts = x.amounts
Then you can delete table2.
And table1 will be populated with only those INSERTS where values in all 4 columns are not a match with all 4 columns on any existing row of table1.
This assumes you do not want the INSERT to go through only if there exists a match in all 4 columns. In other words the above will do the INSERT if there is a row where 3 of the 4 columns match. It only stops the INSERT when there already exists a row that is a perfect match on all 4 columns.
If you have duplicate INSERT statements being generated as well, simply add the DISTINCT operator to the query, "select DISTINCT t2.* from table2 t2"
As ludwigmace pointed out in the comments maybe try the following as well and compare the performance difference, it should be functionally equivalent (if the inserts don't contain duplicates, you can get rid of the group by) ---
insert into table1 (date, locid, personid, itemid, amounts)
SELECT t2.date, t2.locid, t2.personid, t2.itemid, t2.amounts
FROM table2 t2
LEFT JOIN t1
ON t2.date = t1.date
AND t2.locid = t1.locid
AND t2.personid = t1.personid
AND t2.itemid = t1.itemid
WHERE t1.date is null
GROUP BY t2.date, t2.locid, t2.personid, t2.itemid, t2.amounts
This should work:
INSERT INTO [table1] ([date], [locid], [personid], [itemid], [amounts])
SELECT val1, val2, val3, val4, val5
WHERE NOT EXISTS
(
SELECT * FROM [table1] WHERE [date]=val1 AND [locid]=val2 AND [personid]=val3 AND [itemid]=val4
)
Instead of inserting values directly, you insert from the result of a select statement.
The select statement is crafted to only return the values that you specify if they don't already exist
You can control the scope of uniqueness (the combination of which columns should be unique) by modifying the Where-clause in the second select (i.e. add or remove comparisons as needed)