How to avoid duplicate entries in Hive?

Create a table with a primary key in Hive, then insert the identical record several times.
How can you prevent a record with the same primary key from being inserted more than once, without using a second temporary table?
drop table t1;
CREATE TABLE IF NOT EXISTS `t1` (
`ID` BIGINT DEFAULT SURROGATE_KEY(),
`Name` STRING NOT NULL DISABLE NOVALIDATE,
CONSTRAINT `PK_t1` PRIMARY KEY (`ID`) DISABLE NOVALIDATE);
select * from t1;
+--------+----------+
| t1.id | t1.name |
+--------+----------+
+--------+----------+
insert into t1 values (1, "Hi");
insert into t1 values (1, "Hi");
insert into t1 values (1, "Hi");
select * from t1;
+--------+----------+
| t1.id | t1.name |
+--------+----------+
| 1 | Hi |
| 1 | Hi |
| 1 | Hi |
+--------+----------+
I tried unsuccessfully with a merge:
MERGE INTO t1
USING (select * from t1) sub
ON sub.id != t1.id
WHEN not matched then insert values (2, "World");

Related

Insert records from 1st table to 2nd table only when the record is not present in the 2nd table

I have one table with the same structure as a second table, and I need to insert the records from table1 into table2 with:
insert into table2(select * from table1);
Table 2 has a primary key on one of its fields, say id, and someone has already inserted data for some of those primary key values:
table1          table2
id | name       id | name
1  | new1       1  | old1
2  | new2       4  | new4
3  | new3       3  | old3
5  | new5       6  | old6
I need to insert into table 2 only those records whose primary key is not already present there.
After the insert, table 2 should look like this:
table 2
id | name
1 | old1
2 | new2
3 | old3
4 | new4
5 | new5
6 | old6
What is the easiest way to do this?

Use a NOT EXISTS condition to select only those rows from table1 that don't exist in table2:
insert into table2 (id, name)
select t1.id, t1.name
from table1 t1
where not exists (select *
from table2 t2
where t2.id = t1.id);
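
The NOT EXISTS insert above can be tried end to end with the sample rows from the question. This is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as a stand-in database; the helper names insert_missing and demo are made up for illustration, and other engines may need small syntax changes.

```python
import sqlite3


def insert_missing(conn):
    """Copy rows from table1 into table2 only when table2 has no row with that id."""
    conn.execute(
        """
        INSERT INTO table2 (id, name)
        SELECT t1.id, t1.name
        FROM table1 t1
        WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
        """
    )


def demo():
    # In-memory SQLite database used only as a stand-in for the real engine.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [(1, "new1"), (2, "new2"), (3, "new3"), (5, "new5")],
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?)",
        [(1, "old1"), (4, "new4"), (3, "old3"), (6, "old6")],
    )
    insert_missing(conn)
    return conn.execute("SELECT id, name FROM table2 ORDER BY id").fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Rows 1 and 3 keep their old names because the subquery finds them in table2, so only ids 2 and 5 are copied over.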

You could use a MERGE statement with only an INSERT in the WHEN NOT MATCHED branch:
MERGE INTO table2 t2
USING table1 t1
ON t1.id = t2.id
WHEN NOT MATCHED
THEN
INSERT
(id, name)
VALUES
(t1.id, t1.name);

Create an SQL query from two tables in postgresql

I have two tables as shown in the image. I want to write a SQL query in PostgreSQL that returns the pkey and the minimum count for each unique 'pkey' in table 1 where 'name1' is not present in the array column 'name' in table 2.
'name' is an array.

You can use ANY to check if one element exists in your name's array.
create table t1 (pkey int, cnt int);
create table t2 (pkey int, name text[]);
insert into t1 values (1, 11),(1, 9),(2, 14),(2, 15),(3, 21),(3,16);
insert into t2 values
(1, array['name1','name2']),
(1, array['name3','name2']),
(2, array['name4','name1']),
(2, array['name5','name2']),
(3, array['name2','name3']),
(3, array['name4','name5']);
select pkey
from t2
where 'name1' = any(name);
| pkey |
| ---: |
| 1 |
| 2 |
select t1.pkey, min(cnt) count
from t1
where not exists (select 1
from t2
where t2.pkey = t1.pkey
and 'name1' = any(name))
group by t1.pkey;
pkey | count
---: | ----:
3 | 16

Insert data into two columns only if not duplicate

I have a table user_interests with the columns id (AUTO_INC), user_id, and user_interest.
I want an easy way to insert data into user_id and user_interest without creating duplicate entries.
For example, suppose the table looks like this beforehand:
+------------------------------+
| ID | user_id | user_interest |
+------------------------------+
| 1 | 2 | Music |
| 2 | 2 | Swimming |
+------------------------------+
If I now insert the values (2, Dance) and (2, Swimming) into (user_id, user_interest), only the (2, Dance) entry should be inserted, not (2, Swimming), since (2, Swimming) already exists in the table.
I have seen upsert commands, and I have also tried writing a command like the one below, but it doesn't work.
INSERT INTO `user_interests`( `user_id`,`interest` )
VALUES ("2","Music")
WHERE (SELECT COUNT(`interest`) FROM `user_interests`
WHERE `interest` = "Music" AND `user_id` = "2"
Having COUNT(`interest`) <=0 )

Use the NOT EXISTS method (@user_id and @user_interest stand for the values to insert):
INSERT INTO your_table (user_id, user_interest)
SELECT @user_id, @user_interest
WHERE NOT EXISTS (SELECT 1 FROM your_table
WHERE user_id = @user_id AND user_interest = @user_interest);
Or create a unique constraint on the table:
ALTER TABLE your_table
ADD CONSTRAINT Constraint_Name UNIQUE (Column_Name1, Column_Name2);
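
Both suggestions can be combined and checked against the sample data from the question. The following is a rough sketch that assumes SQLite via Python's sqlite3 module rather than MySQL; the function names add_interest and demo are hypothetical, and the UNIQUE constraint is declared inline at table creation instead of through ALTER TABLE.

```python
import sqlite3


def add_interest(conn, user_id, interest):
    """Insert (user_id, interest) unless that pair is already stored.

    Returns True when a row was actually inserted.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO user_interests (user_id, user_interest)
        SELECT ?, ?
        WHERE NOT EXISTS (
            SELECT 1 FROM user_interests
            WHERE user_id = ? AND user_interest = ?
        )
        """,
        (user_id, interest, user_id, interest),
    )
    return conn.total_changes > before


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE user_interests (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            user_interest TEXT NOT NULL,
            UNIQUE (user_id, user_interest)
        )
        """
    )
    # Existing rows from the question.
    add_interest(conn, 2, "Music")
    add_interest(conn, 2, "Swimming")
    # New attempt: Dance is new, Swimming is a duplicate.
    inserted = [add_interest(conn, 2, "Dance"), add_interest(conn, 2, "Swimming")]
    # A plain INSERT of a duplicate pair is rejected by the UNIQUE constraint.
    try:
        conn.execute(
            "INSERT INTO user_interests (user_id, user_interest) VALUES (2, 'Music')"
        )
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    rows = conn.execute(
        "SELECT user_id, user_interest FROM user_interests ORDER BY id"
    ).fetchall()
    return inserted, rejected, rows


if __name__ == "__main__":
    print(demo())
```

The NOT EXISTS check skips duplicates quietly, while the constraint acts as a safety net for any insert that bypasses the check.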

Can the same column have primary key & foreign key constraint to another column

Can the same column have primary key & foreign key constraint to another column?
Table1: ID - Primary column, foreign key constraint for Table2 ID
Table2: ID - Primary column, Name
Will this be an issue if I try to delete table1 data?
Delete from table1 where ID=1000;
Thanks.

Assigning Primary Key And Foreign key to the same column in a Table:
create table a1 (
id1 int not null primary key
);
insert into a1 values(1),(2),(3),(4);
create table a2 (
id1 int not null primary key foreign key references a1(id1)
);
insert into a2 values(1),(2),(3);

There should be no problem with that. Consider the following example:
CREATE TABLE table2 (
id int PRIMARY KEY,
name varchar(20)
) ENGINE=INNODB;
CREATE TABLE table1 (
id int PRIMARY KEY,
t2_id int,
FOREIGN KEY (t2_id) REFERENCES table2 (id)
) ENGINE=INNODB;
INSERT INTO table2 VALUES (1, 'First Row');
INSERT INTO table2 VALUES (2, 'Second Row');
INSERT INTO table1 VALUES (1, 1);
INSERT INTO table1 VALUES (2, 1);
INSERT INTO table1 VALUES (3, 1);
INSERT INTO table1 VALUES (4, 2);
The tables now contain:
SELECT * FROM table1;
+----+-------+
| id | t2_id |
+----+-------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
+----+-------+
4 rows in set (0.00 sec)
SELECT * FROM table2;
+----+------------+
| id | name |
+----+------------+
| 1 | First Row |
| 2 | Second Row |
+----+------------+
2 rows in set (0.00 sec)
Now we can successfully delete rows like this:
DELETE FROM table1 WHERE id = 1;
Query OK, 1 row affected (0.00 sec)
DELETE FROM table1 WHERE t2_id = 2;
Query OK, 1 row affected (0.00 sec)
However we won't be able to delete the following:
DELETE FROM table2 WHERE id = 1;
ERROR 1451 (23000): A foreign key constraint fails
If we had defined the foreign key on table1 with the CASCADE option, we would have been able to delete the parent, and all the children would get deleted automatically:
CREATE TABLE table2 (
id int PRIMARY KEY,
name varchar(20)
) ENGINE=INNODB;
CREATE TABLE table1 (
id int PRIMARY KEY,
t2_id int,
FOREIGN KEY (t2_id) REFERENCES table2 (id) ON DELETE CASCADE
) ENGINE=INNODB;
INSERT INTO table2 VALUES (1, 'First Row');
INSERT INTO table2 VALUES (2, 'Second Row');
INSERT INTO table1 VALUES (1, 1);
INSERT INTO table1 VALUES (2, 1);
INSERT INTO table1 VALUES (3, 1);
INSERT INTO table1 VALUES (4, 2);
If we now repeat the previously failed DELETE, the child rows in table1 are deleted along with the parent row in table2:
DELETE FROM table2 WHERE id = 1;
Query OK, 1 row affected (0.00 sec)
SELECT * FROM table1;
+----+-------+
| id | t2_id |
+----+-------+
| 4 | 2 |
+----+-------+
1 row in set (0.00 sec)
SELECT * FROM table2;
+----+------------+
| id | name |
+----+------------+
| 2 | Second Row |
+----+------------+
1 row in set (0.00 sec)
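
The two behaviours shown above (a blocked parent delete without CASCADE, a cascading delete with it) can be reproduced in a small script. This is only a sketch under the assumption that SQLite stands in for MySQL/InnoDB; the helpers build, delete_parent and remaining_children are illustrative names, and SQLite needs PRAGMA foreign_keys = ON before it enforces foreign keys at all.

```python
import sqlite3


def build(cascade):
    """Create table2 (parent) and table1 (child), optionally with ON DELETE CASCADE."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    action = " ON DELETE CASCADE" if cascade else ""
    conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, t2_id INTEGER, "
        "FOREIGN KEY (t2_id) REFERENCES table2 (id)" + action + ")"
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?)", [(1, "First Row"), (2, "Second Row")]
    )
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)", [(1, 1), (2, 1), (3, 1), (4, 2)]
    )
    return conn


def delete_parent(conn, parent_id):
    """Try to delete a parent row; return False if the foreign key blocks it."""
    try:
        conn.execute("DELETE FROM table2 WHERE id = ?", (parent_id,))
        return True
    except sqlite3.IntegrityError:
        return False


def remaining_children(conn):
    return conn.execute("SELECT id, t2_id FROM table1 ORDER BY id").fetchall()


if __name__ == "__main__":
    plain = build(cascade=False)
    print(delete_parent(plain, 1), remaining_children(plain))
    cascading = build(cascade=True)
    print(delete_parent(cascading, 1), remaining_children(cascading))
```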

The answer provided by Jason may have worked at some point in the past, but when I tried it in 2021 against a MySQL 5.7 server, the server rejected it. The syntax I used to get this working was:
CREATE TABLE a1 (
id1 INT NOT NULL PRIMARY KEY
);
INSERT INTO a1 VALUES (1),(2),(3),(4);
CREATE TABLE a2 (
id1 INT NOT NULL,
PRIMARY KEY (id1),
CONSTRAINT `fk_id1` FOREIGN KEY (id1) REFERENCES a1(id1)
);
INSERT INTO a2 VALUES (1),(2),(3);
For one-to-one relationships of this type, I would also strongly recommend creating the foreign key as:
CONSTRAINT `fk_id1` FOREIGN KEY (id1) REFERENCES a1(id1) ON DELETE CASCADE

Yes, it can.
No, it won't.
P.S. But you'll not be able to delete table2 data without deleting corresponding table1 rows obviously.
P.P.S. I've implemented such structure in Postgres, but it must be similar for MySQL.

Merge two rows in SQL

Assuming I have a table containing the following information:
FK | Field1 | Field2
=====================
3 | ABC | *NULL*
3 | *NULL* | DEF
Is there a way to perform a select on the table to get the following?
FK | Field1 | Field2
=====================
3 | ABC | DEF
Thanks
Edit: Fix field2 name for clarity

Aggregate functions may help you out here. Aggregate functions ignore NULLs (at least that's true on SQL Server, Oracle, and Jet/Access), so you could use a query like this (tested on SQL Server Express 2008 R2):
SELECT
FK,
MAX(Field1) AS Field1,
MAX(Field2) AS Field2
FROM
table1
GROUP BY
FK;
I used MAX, but any aggregate which picks one value from among the GROUP BY rows should work.
Test data:
CREATE TABLE table1 (FK int, Field1 varchar(10), Field2 varchar(10));
INSERT INTO table1 VALUES (3, 'ABC', NULL);
INSERT INTO table1 VALUES (3, NULL, 'DEF');
INSERT INTO table1 VALUES (4, 'GHI', NULL);
INSERT INTO table1 VALUES (4, 'JKL', 'MNO');
INSERT INTO table1 VALUES (4, NULL, 'PQR');
Results:
FK Field1 Field2
-- ------ ------
3 ABC DEF
4 JKL PQR
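
The same test data can be run through the MAX/GROUP BY query in a short script. This is a minimal sketch assuming SQLite via Python's sqlite3 module instead of SQL Server; merge_rows and demo are hypothetical helper names. SQLite's MAX also skips NULLs, so the behaviour described above carries over.

```python
import sqlite3


def merge_rows(conn):
    """Collapse the rows of each FK into one, relying on MAX ignoring NULLs."""
    return conn.execute(
        """
        SELECT FK, MAX(Field1) AS Field1, MAX(Field2) AS Field2
        FROM table1
        GROUP BY FK
        ORDER BY FK
        """
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (FK INTEGER, Field1 TEXT, Field2 TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [
            (3, "ABC", None),
            (3, None, "DEF"),
            (4, "GHI", None),
            (4, "JKL", "MNO"),
            (4, None, "PQR"),
        ],
    )
    return merge_rows(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Note that for FK 4 the result mixes values from different source rows (JKL and PQR), which is what picking the per-column maximum implies.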

There are a few ways to do this, depending on data rules that you have not included, but here is one way using what you gave:
SELECT
t1.Field1,
t2.Field2
FROM Table1 t1
LEFT JOIN Table1 t2 ON t1.FK = t2.FK AND t2.Field1 IS NULL
WHERE t1.Field1 IS NOT NULL
Another way:
SELECT
t1.Field1,
(SELECT t2.Field2 FROM Table1 t2 WHERE t2.FK = t1.FK AND t2.Field1 IS NULL) AS Field2
FROM Table1 t1
WHERE t1.Field1 IS NOT NULL

There might be neater methods, but the following could be one approach:
SELECT t.fk,
(
SELECT t1.Field1
FROM `table` t1
WHERE t1.fk = t.fk AND t1.Field1 IS NOT NULL
LIMIT 1
) Field1,
(
SELECT t2.Field2
FROM `table` t2
WHERE t2.fk = t.fk AND t2.Field2 IS NOT NULL
LIMIT 1
) Field2
FROM `table` t
WHERE t.fk = 3
GROUP BY t.fk;
Test Case:
CREATE TABLE `table` (fk int, Field1 varchar(10), Field2 varchar(10));
INSERT INTO `table` VALUES (3, 'ABC', NULL);
INSERT INTO `table` VALUES (3, NULL, 'DEF');
INSERT INTO `table` VALUES (4, 'GHI', NULL);
INSERT INTO `table` VALUES (4, NULL, 'JKL');
INSERT INTO `table` VALUES (5, NULL, 'MNO');
Result:
+------+--------+--------+
| fk | Field1 | Field2 |
+------+--------+--------+
| 3 | ABC | DEF |
+------+--------+--------+
1 row in set (0.01 sec)
Running the same query without the WHERE t.fk = 3 clause, it would return the following result-set:
+------+--------+--------+
| fk | Field1 | Field2 |
+------+--------+--------+
| 3 | ABC | DEF |
| 4 | GHI | JKL |
| 5 | NULL | MNO |
+------+--------+--------+
3 rows in set (0.01 sec)

I had a similar problem. The difference was that I needed far more control over what I was returning, so I ended up with a simple, clear, but rather long query. Here is a simplified version of it based on your example.
select main.id, Field1_Q.Field1, Field2_Q.Field2
from
(
select distinct id
from Table1
) as main
left outer join (
select id, max(Field1) as Field1
from Table1
where Field1 is not null
group by id
) as Field1_Q on main.id = Field1_Q.id
left outer join (
select id, max(Field2) as Field2
from Table1
where Field2 is not null
group by id
) as Field2_Q on main.id = Field2_Q.id
;
The trick here is that the first select, 'main', chooses the rows to display. Then you have one select per field. Each of those selects should be joined on the same values that the 'main' query returns.
Be warned: those other queries need to return only one row per id, or you will be ignoring data.

If one row has a value in the Field1 column and the other rows have NULL there, this query might work:
SELECT
FK,
MAX(Field1) as Field1,
MAX(Field2) as Field2
FROM
(
select FK,ISNULL(Field1,'') as Field1,ISNULL(Field2,'') as Field2 from table1
)
tbl
GROUP BY FK

In my case, I have a table like this:
---------------------------------------------
|company_name|company_ID|CA | WA |
---------------------------------------------
|Costco | 1 |NULL | 2 |
---------------------------------------------
|Costco | 1 |3 |Null |
---------------------------------------------
And I want it to be like below:
---------------------------------------------
|company_name|company_ID|CA | WA |
---------------------------------------------
|Costco | 1 |3 | 2 |
---------------------------------------------
Most of the code is the same:
SELECT
company_name,
company_ID,
MAX(CA) AS CA,
MAX(WA) AS WA
FROM
table1
GROUP BY company_name, company_ID
The only difference is the GROUP BY: if you put two column names into it, rows are grouped by each distinct pair of values.

SELECT Q.FK
,ISNULL(T1.Field1, T2.Field2) AS Field
FROM (SELECT FK FROM Table1
UNION
SELECT FK FROM Table2) AS Q
LEFT JOIN Table1 AS T1 ON T1.FK = Q.FK
LEFT JOIN Table2 AS T2 ON T2.FK = Q.FK
If there is only one table, write Table1 in place of Table2.
