How to delete rows where more than 1 column matches another table? - sql

I have two tables. One (let's call it table1) looks a bit like this:
account_number | offer_code
---------------|-----------
1 | 123
1 | 456
2 | 123
The other table (let's call it table2) looks a bit like this:
account_number | offer_code
---------------|-----------
1 | 123
I want to delete all rows from table1 where the account_number AND the offer_code match a row in table2. So afterwards table1 would look like this:
account_number | offer_code
---------------|-----------
1 | 456
2 | 123
I've tried the following, but it doesn't run:
DELETE
FROM TABLE1 A
INNER JOIN
TABLE2 B
ON A.ACCOUNT_NUMBER = B.ACCOUNT_NUMBER
AND A.OFFER_CODE = B.OFFER_CODE
;
I've also tried the following. It seems to run, but the sheer volume of data in both tables (65.5m rows in table1 and 9m in table2) mean it takes an impractically long time to do so (I was forced to kill the query after 3 hours).
DELETE
FROM TABLE1
WHERE CONCAT(ACCOUNT_NUMBER, OFFER_CODE) IN
(
SELECT CONCAT(ACCOUNT_NUMBER, OFFER_CODE)
FROM TABLE2
)
;
Does anyone know if there is a way to accomplish this efficiently please?

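Large DELETE and UPDATE statements are expensive for a database. Depending on your application (check carefully that this is acceptable!), you can instead build a new table that contains only the rows you want to keep and swap it in. Note that minus also collapses duplicate rows in table1:
create table table1_tmp as
select * from table1
minus
select * from table2;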
alter table table1 rename to table1_tmp2;
alter table table1_tmp rename to table1;
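If an in-place delete is still preferred, a correlated EXISTS is one way to express "both columns match" without concatenating values. The sketch below uses Python's built-in sqlite3 module and the sample rows from the question purely as an illustration; the index name idx_table2_acct_offer is hypothetical, and nothing here demonstrates how this behaves at 65.5m rows on the asker's engine.

```python
import sqlite3


def delete_matching_rows(conn):
    """Delete rows from table1 whose (account_number, offer_code) pair exists in table2."""
    cur = conn.execute(
        """
        DELETE FROM table1
        WHERE EXISTS (
            SELECT 1
            FROM table2
            WHERE table2.account_number = table1.account_number
              AND table2.offer_code = table1.offer_code
        )
        """
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (account_number INTEGER, offer_code INTEGER);
    CREATE TABLE table2 (account_number INTEGER, offer_code INTEGER);
    INSERT INTO table1 VALUES (1, 123), (1, 456), (2, 123);
    INSERT INTO table2 VALUES (1, 123);
    -- Hypothetical index name: a composite index lets each lookup avoid a full scan.
    CREATE INDEX idx_table2_acct_offer ON table2 (account_number, offer_code);
    """
)

deleted = delete_matching_rows(conn)
remaining = conn.execute(
    "SELECT account_number, offer_code FROM table1 ORDER BY account_number, offer_code"
).fetchall()
print(deleted, remaining)
```

Comparing the two columns separately, rather than through CONCAT, also avoids false matches such as (1, 23) against (12, 3).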

Related

Remove duplicate rows based on specific columns

I have a table that contains these columns:
ID (varchar)
SETUP_ID (varchar)
MENU (varchar)
LABEL (varchar)
The thing I want to achieve is to remove all duplicates from the table based on two columns (SETUP_ID, MENU).
Table I have:
id | setup_id | menu | label |
-------------------------------------
1 | 10 | main | txt |
2 | 10 | main | txt |
3 | 11 | second | txt |
4 | 11 | second | txt |
5 | 12 | third | txt |
Table I want:
id | setup_id | menu | label |
-------------------------------------
1 | 10 | main | txt |
3 | 11 | second | txt |
5 | 12 | third | txt |
You can achieve this with a common table expression (CTE):
with cte as (
select id,
row_number() over (partition by setup_id, menu order by id) as rn
from atable )
delete from atable
where id in (select id from cte where rn >= 2);
This keeps the row with the lowest id in each (setup_id, menu) group and gives you your desired output.
Common Table Expression docs
Assuming a table named tbl where both setup_id and menu are defined NOT NULL and id is the PRIMARY KEY.
EXISTS will do nicely:
DELETE FROM tbl t0
WHERE EXISTS (
SELECT FROM tbl t1
WHERE t1.setup_id = t0.setup_id
AND t1.menu = t0.menu
AND t1.id < t0.id
);
This deletes every row where a dupe with lower id is found, effectively only keeping the row with the smallest id from each set of dupes. An index on (setup_id, menu) or even (setup_id, menu, id) will help performance with big tables a lot.
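As a quick check of the EXISTS approach, here is a minimal sketch that runs the same idea against the sample rows from the question using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, so the statement is written without a DELETE alias and with SELECT 1; that adaptation is an assumption of this sketch, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl (
        id       INTEGER PRIMARY KEY,
        setup_id INTEGER NOT NULL,
        menu     TEXT    NOT NULL,
        label    TEXT
    );
    INSERT INTO tbl VALUES
        (1, 10, 'main',   'txt'),
        (2, 10, 'main',   'txt'),
        (3, 11, 'second', 'txt'),
        (4, 11, 'second', 'txt'),
        (5, 12, 'third',  'txt');
    """
)

# Delete every row for which a duplicate with a lower id exists.
conn.execute(
    """
    DELETE FROM tbl
    WHERE EXISTS (
        SELECT 1
        FROM tbl AS t1
        WHERE t1.setup_id = tbl.setup_id
          AND t1.menu     = tbl.menu
          AND t1.id       < tbl.id
    )
    """
)

kept_ids = [row[0] for row in conn.execute("SELECT id FROM tbl ORDER BY id")]
print(kept_ids)
```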
If there is no PK and no reliable UNIQUE (combination of) column(s), you can fall back to using the ctid. If NULL values can be involved, you need to specify how to deal with those.
Consider:
Delete duplicate rows from small table
How to delete duplicate rows without unique identifier
How do I (or can I) SELECT DISTINCT on multiple columns?
After cleaning up duplicates, add a UNIQUE constraint to prevent new dupes:
ALTER TABLE tbl ADD CONSTRAINT tbl_setup_id_menu_uni UNIQUE (setup_id, menu);
If you had an index on (setup_id, menu), drop that now. It's superseded by the UNIQUE constraint.
I have found a solution that fits me the best.
Here it is if anyone needs it:
DELETE FROM table_name
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY setup_id,
menu
ORDER BY id ) AS row_num
FROM table_name ) t
WHERE t.row_num > 1 );
link: https://www.postgresql.org/docs/current/queries-union.html
https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT
Let's say the table name is a:
select distinct on (setup_id, menu) a.* from a;
Key point: The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
This means that any ORDER BY in this query must start with setup_id, menu; further expressions (for example id) then decide which row of each group is kept.
Want the opposite:
EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT ALL is used.
SELECT * FROM a
EXCEPT
select distinct on (setup_id,menu ) a.* from a;
You can try something along these lines to delete all but the first row in case of duplicates (please note that this is not tested in any way!):
DELETE FROM your_table WHERE id IN (
SELECT unnest(duplicate_ids[2:]) FROM (
SELECT array_agg(id ORDER BY id) AS duplicate_ids FROM your_table
GROUP BY SETUP_ID, MENU
HAVING COUNT(*) > 1
) AS dupes
);
The above collects the ids of the duplicate rows (COUNT(*) > 1) in an array (array_agg), then takes all but the first element in that array ([2:]) and "explodes" the id values into rows (unnest).
The outer query just deletes every id that ends up in that result.
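To make the array logic concrete, the following plain-Python sketch mirrors each step on the sample rows from the question. It does not talk to a database; the function name ids_to_delete and the in-memory rows list are made up for illustration.

```python
from collections import defaultdict

# (id, setup_id, menu) tuples taken from the question's sample table.
rows = [
    (1, 10, "main"),
    (2, 10, "main"),
    (3, 11, "second"),
    (4, 11, "second"),
    (5, 12, "third"),
]


def ids_to_delete(rows):
    """Return the ids that the array_agg / [2:] / unnest query would select."""
    groups = defaultdict(list)
    for row_id, setup_id, menu in rows:
        # GROUP BY setup_id, menu with array_agg(id)
        groups[(setup_id, menu)].append(row_id)

    doomed = []
    for ids in groups.values():
        if len(ids) > 1:  # HAVING COUNT(*) > 1
            # duplicate_ids[2:] drops the first element (SQL arrays are 1-based),
            # and unnest turns the remaining elements back into rows.
            doomed.extend(sorted(ids)[1:])
    return sorted(doomed)


print(ids_to_delete(rows))
```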
For MySQL, a similar question has already been answered here: Find and remove duplicate rows by two columns
See whether any of the approaches there helps.
I like the one below for MySQL (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so it only works on older versions):
ALTER IGNORE TABLE your_table ADD UNIQUE (SETUP_ID, MENU);
DELETE t1
FROM table_name t1
join table_name t2 on
t2.setup_id = t1.setup_id and t2.menu = t1.menu and t2.id < t1.id;
There are many ways to find and delete duplicate rows based on conditions, but I like the inner join method, which works quickly even on large amounts of data:
DELETE T1 FROM <TableName> T1
INNER JOIN <TableName> T2
WHERE
T1.id > T2.id AND
T1.<ColumnName1> = T2.<ColumnName1> AND T1.<ColumnName2> = T2.<ColumnName2>;
In your case you can write it as follows:
DELETE T1 FROM <TableName> T1
INNER JOIN <TableName> T2
WHERE
T1.id > T2.id AND
T1.setup_id = T2.setup_id AND T1.menu = T2.menu;
Let me know if you face any issue or need more help.

Comparing two tables and get the values that dont match

I have two tables with articles.
table 1 article and table 2 articlefm
both tables have one field with artnr.
'table 1' has 2192 artnr and 'table 2' has 2195 artnr.
In my query I want to find out the artnr of the 3 articles that are not matched.
If 'table 2' has more articles than 'table 1', then I need a list of those artnr.
How can I do this?
You can do this using a FULL JOIN:
SELECT COALESCE(t1.Artnr, t2.Artnr) AS Artnr,
CASE WHEN t1.Artnr IS NULL THEN 'Table1' ELSE 'Table2' END AS MissingFrom
FROM Table1 AS t1
FULL JOIN Table2 AS t2
ON t1.Artnr = t2.Artnr
WHERE t1.Artnr IS NULL
OR t2.Artnr IS NULL;
Note that just because there is a difference in the count of 3, it does not necessarily mean that there are only 3 records in one table missing from the other. Imagine the following:
Table1 | Table2
-------+-------
1      | 2
2      | 4
3      | 6
4      |
The difference in count is 1, but there are actually 2 records present in table1 that aren't in table2, and 1 in table2 that isn't in table1. Using the above full join method you would get a result like:
Artnr | MissingFrom
------+-------------
1 | Table2
3 | Table2
6 | Table1
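The same comparison can be tried out with Python's built-in sqlite3 module. Because older SQLite versions lack FULL JOIN, this sketch swaps in two LEFT JOINs combined with UNION ALL, which yields the same unmatched rows; that substitution is an assumption of the sketch, not the query from the answer. Each Artnr is labelled with the table it is missing from, following the CASE expression above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Artnr INTEGER);
    CREATE TABLE Table2 (Artnr INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (3), (4);
    INSERT INTO Table2 VALUES (2), (4), (6);
    """
)

missing = conn.execute(
    """
    -- Rows in Table1 with no partner in Table2 are missing from Table2.
    SELECT t1.Artnr AS Artnr, 'Table2' AS MissingFrom
    FROM Table1 AS t1
    LEFT JOIN Table2 AS t2 ON t1.Artnr = t2.Artnr
    WHERE t2.Artnr IS NULL
    UNION ALL
    -- Rows in Table2 with no partner in Table1 are missing from Table1.
    SELECT t2.Artnr, 'Table1'
    FROM Table2 AS t2
    LEFT JOIN Table1 AS t1 ON t1.Artnr = t2.Artnr
    WHERE t1.Artnr IS NULL
    ORDER BY Artnr
    """
).fetchall()

for artnr, missing_from in missing:
    print(artnr, missing_from)
```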
In most databases you can use except (SQL standard) or minus (Oracle specific):
select artnr
from articlefm -- table 2
except
select artnr
from article -- table 1
Otherwise you could try NOT IN:
select artnr
from articlefm -- table 2
where artnr not in
( select artnr
from article -- table 1
)
This will give you the article numbers that exist in 2, but not in 1.
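One caveat with NOT IN is worth checking before relying on it: if the subquery returns a NULL, the NOT IN predicate is never true and the query returns no rows at all. The sketch below shows this with Python's built-in sqlite3 module and made-up sample data (the NULL artnr in article is an assumption for illustration), alongside a NOT EXISTS form that is not affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article   (artnr INTEGER);  -- table 1
    CREATE TABLE articlefm (artnr INTEGER);  -- table 2
    INSERT INTO article   VALUES (1), (2), (NULL);
    INSERT INTO articlefm VALUES (1), (2), (3);
    """
)

# NOT IN: the NULL in article makes "3 NOT IN (...)" evaluate to unknown.
not_in_rows = conn.execute(
    """
    SELECT artnr
    FROM articlefm
    WHERE artnr NOT IN (SELECT artnr FROM article)
    """
).fetchall()

# NOT EXISTS: only asks whether a matching row exists, so NULLs do no harm.
not_exists_rows = conn.execute(
    """
    SELECT fm.artnr
    FROM articlefm AS fm
    WHERE NOT EXISTS (SELECT 1 FROM article AS a WHERE a.artnr = fm.artnr)
    """
).fetchall()

print(not_in_rows, not_exists_rows)
```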

SQL Server : Nested Select Query

I have a SQL query returning results based on a where clause.
I would like to include some more results, from the same table, dependent on what is found in the first select.
My select returns rows with IDs that meet the where criteria. The table may contain more rows with the same ID that do not meet the initial where criteria. Rather than re-querying the DB with a separate call, I would like to use one select statement to also get these extra rows with the same ID. ID is not the index/primary key; it's just a naming convention I am using here.
Pseudo: (two steps)
1: select * from table where condition=xxx
2: for each row returned, (select * from table where id=row.id)
I want to do:
select
id as thisID, field1, field2,
(select id, field1, field2 from table where id = thisID)
from
table
where
condition=xxx
I have multiple joins in my real query, and just cant get the above to work. I unfortunately can not supply the real query, but I get an error of:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS. Invalid column name 'thisID'
My query works fine with the multiple joins, without the above. I am trying to retrieve these extra records as part of the current working query.
Example:
TABLE
select * from table where col3 = 'green'
id, col1, col2, col3
123 | blue | red | green
-------------------------
567 | blue | red | green
-------------------------
123 | blue | red | blue
-------------------------
890 | blue | red | green
-------------------------
I want to return all 4 rows, because although row 3 fails the where condition, it has the same id value as row 1 (123), and I need to include it, as it is part of a "set" that I need to locate / import, called / referenced by id=123.
What I am doing manually now, is getting row one, and then running another query based on row 1's ID, to get row 3 as well.
You can use Where IN
select id as thisID, field1, field2 from table
where id in
(select id from table where condition=xxx)
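Here is a small sketch of that WHERE IN query run against the four example rows from the question, using Python's built-in sqlite3 module rather than SQL Server. The table name items is hypothetical (table itself is a reserved word), and col3 = 'green' stands in for condition=xxx.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO items VALUES
        (123, 'blue', 'red', 'green'),
        (567, 'blue', 'red', 'green'),
        (123, 'blue', 'red', 'blue'),
        (890, 'blue', 'red', 'green');
    """
)

# The inner query finds ids with at least one matching row;
# the outer query then returns every row that shares one of those ids.
rows = conn.execute(
    """
    SELECT id, col3
    FROM items
    WHERE id IN (SELECT id FROM items WHERE col3 = 'green')
    ORDER BY id, col3
    """
).fetchall()

print(rows)
```

The row (123, 'blue') comes back even though it fails the col3 condition, because id 123 also has a green row.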
Try this.
Let's say your table is the one below and is called #Temp:
Id Col1 Col2 Col3
123 blue red green
567 blue red green
123 blue red blue
890 blue red green
First, store the matching ids in a temp table:
Create Table #T1(Id int)
Insert Into #T1
Select Id
From #Temp
Where Col3='green'
Then
Select distinct *
From #Temp
Where Id in (select Id from #T1) Or Col3='Green'
This returns all the rows from the main table.
Update
If you want to keep the approach you are currently using, try something like the following:
select
id as thisID, field1, field2,
(select top 1 id from table where id = t.id) as Id,
(select top 1 field1 from table where id = t.id) as field1,
(select top 1 field2 from table where id = t.id) as field2
from
table t
where
condition=xxx

Display duplicate value of two columns in different rows

I have a table in which there can be two newspaper publishing dates for a particular value which is inserted in a single column only under NewsPaperDate. All the remaining values get duplicated. Now I have to write a query in which the two NewsPaperDate values should be shown in a single row under two columns, NewsPaperDate1 and NewsPaperDate2, with the remaining values. Can anyone help in this? The DataBase is Sql Server
The Table structure is
You need to join the table to itself. There are different ways of doing this but based on your screenshot you could do:
select
a.yonja_no,
a.newspaper_date as newspaperdate1,
b.newspaper_date as newspaperdate2
from newspapertable a, newspapertable b
where a.yonja_no = b.yonja_no
and a.newspapere_s > b.newspapere_s
;
check fiddle link for query execution with sample data
create table tab1(newspaperDate number,b number,c number);
INSERT INTO tab1 VALUES(1,2,3);
INSERT INTO tab1 VALUES(2,2,3);
INSERT INTO tab1 VALUES(3,3,4);
SELECT t1.newspaperDate AS date1,t2.newspaperDate AS date2 , t1.b AS b1,t1.c AS c1 FROM tab1 t1 , tab1 t2
WHERE t1.newspaperDate < t2.newspaperDate AND t1.b=t2.b ;
OUTPUT
| DATE1 | DATE2 | B1 | C1 |
---------------------------
| 1 | 2 | 2 | 3 |
Joining a table to itself is the best approach for your query. Read this: http://www.thunderstone.com/site/texisman/joining_a_table_to_itself.html

Update single data row from table to another

How can I update a complete data row by using a data row from another table.
Example:
Table A
ID | NAME | ... |
----------------------------
1 | Test | ... |
2 | Test2 | ... |
Table B
ID | NAME | ... |
----------------------------
1 | Test97 | ... |
So I want to copy the content of a single row of Table B to Table A and override existing values. I do not want to name all columns. The contents of table A and B are redundant.
Summarize:
I want an equivalent to the following INSERT Statement as an UPDATE Statement:
INSERT INTO destTable
VALUES (SELECT * FROM TABLE2)
FROM srcTable
Any hint, even telling me that it's not possible, is very much appreciated.
You can update a set of columns (you still have to list the columns once):
UPDATE table_a
SET (ID, NAME, etc)
= (SELECT * FROM table_b WHERE table_b.id = table_a.id)
WHERE table_a.id IN (SELECT ID FROM table_b);
1 row updated
Like so:
UPDATE suppliers
SET supplier_name = ( SELECT customers.name
FROM customers
WHERE customers.customer_id = suppliers.supplier_id)
WHERE EXISTS
( SELECT customers.name
FROM customers
WHERE customers.customer_id = suppliers.supplier_id);
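The WHERE EXISTS clause matters here: without it, suppliers that have no matching customer would have supplier_name set to NULL. A minimal sketch with Python's built-in sqlite3 module and invented sample rows (the names 'Acme', 'old one' and 'old two' are made up) shows the guarded update leaving the unmatched supplier alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO suppliers VALUES (1, 'old one'), (2, 'old two');
    INSERT INTO customers VALUES (1, 'Acme');
    """
)

# Copy the customer name onto the supplier with the same id,
# but only for suppliers that actually have a matching customer.
conn.execute(
    """
    UPDATE suppliers
    SET supplier_name = (SELECT customers.name
                         FROM customers
                         WHERE customers.customer_id = suppliers.supplier_id)
    WHERE EXISTS (SELECT 1
                  FROM customers
                  WHERE customers.customer_id = suppliers.supplier_id)
    """
)

rows = conn.execute(
    "SELECT supplier_id, supplier_name FROM suppliers ORDER BY supplier_id"
).fetchall()
print(rows)
```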
You want to use an Oracle MERGE statement. It inserts a row if no match exists and updates the existing row if one does.
Here is a site with an example.
MERGE INTO bonuses b
USING (
SELECT employee_id, salary, dept_no
FROM employee
WHERE dept_no =20) e
ON (b.employee_id = e.employee_id)
WHEN MATCHED THEN
UPDATE SET b.bonus = e.salary * 0.1
DELETE WHERE (e.salary < 40000)
WHEN NOT MATCHED THEN
INSERT (b.employee_id, b.bonus)
VALUES (e.employee_id, e.salary * 0.05)
WHERE (e.salary > 40000);