SQL Server - Exclude Records from other tables - sql

I used the search function which brought me to the following solution.
Starting Point is the following: I have one table A which stores all data.
From that table I select a certain amount of records and store it in table B.
In a new statement I want to select new records from table A that do not appear in table B and store them in table c. I tried to solve this with a AND ... NOT IN statement.
But I still receive records in table C that are in table B.
Important: I can only work with select statements, each statement needs to start with select as well.
Does anybody have an idea where the problem in the following statement could be:
Select *
From
(Select TOP 10000 *
FROM [table_A]
WHERE Email like '%#domain_A%'
AND Id NOT IN (SELECT Id
FROM [table_B]))
Union
(Select TOP 7500 *
FROM table_A]
WHERE Email like '%#domain_B%'
AND Id NOT IN (SELECT Id
FROM [table_B]))
Union
(SELECT TOP 5000 *
FROM [table_A]
WHERE Email like '%#domain_C%'
AND Id NOT IN (SELECT Id
FROM [table_B]))

Try NOT EXISTS instead of NOT IN
SELECT
*
FROM TableA A
WHERE NOT EXISTS
(
SELECT 1 FROM TableB WHERE Id = A.Id
)

So Basically the idea here is to select everything from table A that doesnt exists in table B and Insert all that into Table C?
INSERT INTO Table_C
SELECT a.colum1, a.column2,......
FROM [table_A]
LEFT JOIN [table_B] ON a.id = b.ID
WHERE a.Email like '%#domain_A%' AND b.id IS NULL

Thank you guys all for your feedback, from which I learned a lot.
I was able to fix the statement with your help. Above is the statement which is working now with the desired results:
Select Id
From
(Select TOP 10000 * FROM Table_A
WHERE Email like '%#domain_a%'
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t1
Union
Select Id
From
(Select TOP 7500 * FROM Table_A
WHERE Email like '%#domain_b%'
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t2
Union
Select Id
From
(SELECT TOP 5000 * FROM Table_A
WHERE Email like '%#domain_c%'
AND Id NOT IN (SELECT Id
FROM Table_B)
order by No desc) t3

Related

Returning the first result from another table if statement is true

So I'm struggling to find the logic for the next problem:
I got 2 tables
TABLE A has the following column
Postalcode
1111
2222
3333
4444
TABLE B has the following column
Postalcode
1111AA
1111BB
1111CA
2222AA etc
What I would like to have is that if the Postalcodes first 4 numbers are found from Table A in table B, then I would like to have the first result of that postalcode from Table B (4digits+2letters).
e.g. if the postalcode in A is 1111 and substring(postalcode, 1, 4) of Table B is also 1111, then return the first result of that postalcode from Table B --> 1111AA
I can't seem to find the answer for this and I'm struggling for a while now.
Hope you guys have the solution for me.
If for each record in table A you want to match at most one record from Table B, an OUTER APPLY (SELECT TOP 1 ...) should do the trick.
Try:
select a.PostalCode, b1.Postalcode
from table_a a
outer apply (
select top 1 *
from table_b b
where b.Postalcode LIKE a.Postalcode + '%'
order by b.id
) b1
order by a.PostalCode;
If you wish to omit results that have no matching table_b record, change the OUTER APPLY to a CROSS APPLY. A OUTER APPLY is like a LEFT JOIN while a CROSS APPLY is like an INNER JOIN.
See this db<>fiddle fr a demo.
(Credit Bernd Buffen for the data setup. Note that I changed PostalCode from INT to VARCHAR to simplify the match criteria.)
i have change the sample from #Ergest Basha with a virtual column and index
CREATE TABLE table_a (
Postalcode INT ,
KEY idx_sPortalcode (Postalcode)
);
INSERT INTO table_a VALUES
(1111),
(2222),
(3333),
(4444);
CREATE TABLE table_b (
id INT,
Postalcode VARCHAR(25),
sPostalcode INT AS ( 0 + Postalcode) STORED,
KEY idx_sPortalcode (sPostalcode)
);
INSERT INTO table_b (id,Postalcode) VALUES
(1,'1111AA'),
(2,'1111BB'),
(3,'1111CA'),
(4,'2222AA');
SELECT * FROM table_b;
-- EXPLAIN
SELECT b.Postalcode
FROM table_a a
INNER JOIN table_b b ON b.sPostalcode=a.Postalcode
WHERE a.Postalcode=1111
ORDER BY b.id ASC LIMIT 1;
Something like this: MySQL
select b.Postalcode
from table_a a
inner join table_b b on LEFT(b.Postalcode,4)=a.Postalcode
where a.Postalcode=1111
order by b.id asc limit 1;
Check the demo
SQL Server
select top(1) b.Postalcode
from table_a a
inner join table_b b on LEFT(b.Postalcode,4)=a.Postalcode
where a.Postalcode=1111
order by b.id ;
Demo
Edit based on comments*
I think you need something like below, but check #Bernd Buffen suggestion for performance:
WITH cte AS (
SELECT Postalcode, ROW_NUMBER() OVER ( PARTITION BY LEFT(Postalcode,4) ORDER BY id asc ) row_num
FROM table_b
)
SELECT cte.Postalcode
FROM table_a a
INNER JOIN cte on LEFT(cte.Postalcode,4)=a.Postalcode
WHERE row_num = 1 ;
Demo

Value present in more than one table

I have 3 tables. All of them have a column - id. I want to find if there is any value that is common across the tables. Assuming that the tables are named a.b and c, if id value 3 is present is a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged and there is no need to check for existence in he third table, or check if there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexes in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like carthesian joins with 3 OR conditions. The below query is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;

Find Modified/New/Deleted Records Between Two Tables

I want to find new, modified and deleted records in one table (tableA) by comparing it to another table (tableB). Both tables are of the same schema and has a unique ID field.
In my situation, tableA is originally the same as tableB but it has been edited by some external organisation and once they have done their edits, they send the table back via ZIP file, and we re-populate (truncate and insert) that data to tableA. So I want to find out what records have changed in tableA. I am using SQL Server 2012.
I can get new and modified records with the "except" keyword:
select * from tableA
except
select * form tableB
(Let's call the above results ResultsA)
I can also get deleted and modified records:
select * from tableB
except
select * form tableA
(Let's call the above results ResultsB)
The problem is, both ResultsA and ResultsB have the same records that have been modified/edited. So the modified/edited records are doubled up. I can use inner join or intersect on ResultsA and ResultsB to get just the modified records (call this results ResultsC). But then I will need to use join/except again between ResultsA and ResultsC to get just the new records, and join/except again between ResultsB and ResultsC to get just the deleted records... I tried this and this but they are not working for me.
Obviously this is not good. Are there any elegant and simpler ways to find out the records that have been deleted, modified or added in tableA compared to tableB?
How about:
-- DELETED
SELECT B.*, 'DELETED' AS 'CHANGE_TYPE'
FROM TableB B
LEFT JOIN TableA A ON B.PK_ID = A.PK_ID
WHERE A.PK_ID IS NULL
UNION
-- NEW
SELECT A.*, 'NEW' AS 'CHANGE_TYPE'
FROM TableA A
LEFT JOIN TableB B ON B.PK_ID = A.PK_ID
WHERE B.PK_ID IS NULL
UNION
-- MODIFIED
SELECT B.*, 'MODIFIED' AS 'CHANGE_TYPE'
FROM (
SELECT * FROM TableA
EXCEPT
SELECT * FROM TableB
) S1
INNER JOIN TableB B ON S1.PK_ID = B.PK_ID;
Not exactly elegant, but it works.
Based on what i understood i came up with the following solution.
DECLARE #tableA TABLE (ID INT, Number INT)
DECLARE #tableB TABLE (ID INT, Number INT)
INSERT INTO #tableA VALUES
(1,10),
(2,20),
(3,30),
(4,40)
INSERT INTO #tableB VALUES
(1,11),
(2,20),
(4,40),
(5,50)
SELECT *,'Modified or deleted' as 'Status' FROM
(
select * from #tableA
except
select * from #tableB
)a WHERE ID NOT IN
(
select ID from #tableB
except
select ID from #tableA
)
UNION
SELECT *,'New' as 'Status' FROM
(
select * from #tableB
except
select * from #tableA
)b WHERE ID NOT IN
(
SELECT ID FROM
(
select * from #tableA
except
select * from #tableB
)a WHERE ID NOT IN
(
select ID from #tableB
except
select ID from #tableA
)
)
You can use the OUTPUT clause:
Returns information from, or expressions based on, each row affected by an INSERT, UPDATE, or DELETE statement. These results can be returned to the processing application for use in such things as confirmation messages, archiving, and other such application requirements. Alternatively, results can be inserted into a table or table variable.
See the the following, sorry I don't have a practical code for you. But note the SQL output clause can be used to return any value from ‘inserted’ and ‘deleted’ (New value and Old value) tables when doing an insert or update. follow this for more info
declare #DBOrderItem table
(
OrderItemGuid UniqueIdentifier default newid(),
Name VarChar(100)
);
declare #PayloadOrderItem table
(
OrderItemGuid UniqueIdentifier default newid(),
Name VarChar(100)
);
insert into #DBOrderItem (Name) values ('Phone');
insert into #DBOrderItem (Name) values ('Laptop');
insert into #PayloadOrderItem
select top 1 * from #DBOrderItem;
insert into #PayloadOrderItem (Name) values ('Tablet');
select doi.OrderItemGuid,
doi.Name,
case when poi.OrderItemGuid is null then 'Delete' else 'Update' end ActionType
from #DBOrderItem doi
left join #PayloadOrderItem poi on doi.OrderItemGuid = poi.OrderItemGuid
union
select poi.OrderItemGuid,
poi.Name,
'Add' ActionType
from #PayloadOrderItem poi
left join #DBOrderItem doi on doi.OrderItemGuid = poi.OrderItemGuid
where doi.OrderItemGuid is null;
Another solution that works quite efficiently is to use a where not exists an intersect between the two tables. Its very compact.
SELECT
IsNull(tableB.ID,tableA.ID) as 'ID',
IsNull(tableB.Number,tableA.Number) as 'Number',
'Action' = CASE
WHEN tableB.ID IS NULL THEN 'Deleted'
WHEN tableA.ID IS NULL THEN 'Created'
ELSE 'Updated'
END
FROM tableA
FULL OUTER JOIN tableB
ON tableB.ID = tableA.ID
WHERE
NOT EXISTS (SELECT tableB.* INTERSECT SELECT tableA.*)
This keeps the table scans down to a minimum, and provides detection of new, deleted and changed records.
I put all three from here into fiddle, and its surprising how differently they all compile.
http://sqlfiddle.com/#!6/b1a5a/5
This one works without primary key also a bit more elegant .(in my opinion!)
WITh A AS (SELECT 1,2,3 FROM DUAL
UNION ALL
SELECT 1,3,2 FROM DUAL
UNION ALL
SELECT 1,3,1 FROM DUAL),
B AS (SELECT 1,3,2 FROM DUAL
UNION ALL
SELECT 1,2,3 FROM DUAL
UNION ALL
SELECT 1,3,5 FROM DUAL
)
,
C AS
(SELECT * FROM A
MINUS
SELECT * FROM B
),
D AS( SELECT * FROM b
MINUS
SELECT * FROM A)
SELECT C.* ,'Deleted' FROM c
UNION ALL
SELECT D.* ,'Added' FROM D

SQL: how to find unused primary key

I've got a table with > 1'000'000 entries; this table is referenced from about 130 other tables. My problem is that a lot of those 1-mio-entries is old and unused.
What's the fastet way to find the entries not referenced by any of the other tables? I don't like to do a
select * from (
select * from table-a TA
minus
select * from table-a TA where TA.id in (
select "ID" from (
(select distinct FK-ID "ID" from table-b)
union all
(select distinct FK-ID "ID" from table-c)
...
Is there an easier, more general way?
Thank you all!
You could do this:
select * from table_a a
where not exists (select * from table_b where fk_id = a.id)
and not exists (select * from table_c where fk_id = a.id)
and not exists (select * from table_d where fk_id = a.id)
...
try :
select a.*
from table_a a
left join table_b b on a.id=b.fk_id
left join table_c c on a.id=c.fk_id
left join table_d d on a.id=d.fk_id
left join table_e e on a.id=e.fk_id
......
where b.fk_id is null
and c.fk_id is null
and d.fk_id is null
and e.fk_id is null
.....
you might also try:
select a.*
from table_a a
left join
(select b.fk_id from table_b b union
select c.fk_id from table_c c union
...) table_union on a.id=table_union.fk_id
where table_union.fk_id is null
This is more SQL oriented and it will not take forever like the above solution.
Not sure about efficiency but:
select * from table_a
where id not in (
select id from table_b
union
select id from table_c )
If your concern is allowing the database to continue normal operations while you do the house keeping you could split it into multiple stages:
insert into tblIds
select id from table_a
union
select id from table_b
as may times as you need and then:
delete * from table_a where id not in ( select id from tableIds )
Of course sometimes doing a lot of processing takes a lot of time.
I like #Patrick's answer above, but I would like to add to that.
Rather than building the 130-step query by hand, you could build these INSERT statements by scanning sysObjects, finding key relations and generating your INSERT statements.
That would not only save you time, but should also help you to know for sure whether you've covered all the tables - maybe there are 131, or only 129.
I'm inclined to Marcelo Cantos' answer (and have upvoted it), but here is an alternative in an attempt to circumvent the problem of not having indexes on the foreign keys...
WITH
ids_a AS
(
SELECT id FROM myTable
)
,
ids_b AS
(
SELECT id FROM ids_a WHERE NOT EXISTS (SELECT * FROM table_a WHERE fk_id = ids_a.id)
)
,
ids_c AS
(
SELECT id FROM ids_b WHERE NOT EXISTS (SELECT * FROM table_b WHERE fk_id = ids_b.id)
)
,
...
,
ids_z AS
(
SELECT id FROM ids_y WHERE NOT EXISTS (SELECT * FROM table_y WHERE fk_id = ids_y.id)
)
SELECT * FROM ids_z
All I'm trying to do is to suggest an order to Oracle to minimise its efforts. Unfortunately Oracle will compile this to comething very similar to Marcelo Cantos' answer and it may not performa any differently.

select a value where it doesn't exist in another table

I have two tables
Table A:
ID
1
2
3
4
Table B:
ID
1
2
3
I have two requests:
I want to select all rows in table A that table B doesn't have, which in this case is row 4.
I want to delete all rows that table B doesn't have.
I am using SQL Server 2000.
You could use NOT IN:
SELECT A.* FROM A WHERE ID NOT IN(SELECT ID FROM B)
However, meanwhile i prefer NOT EXISTS:
SELECT A.* FROM A WHERE NOT EXISTS(SELECT 1 FROM B WHERE B.ID=A.ID)
There are other options as well, this article explains all advantages and disadvantages very well:
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
For your first question there are at least three common methods to choose from:
NOT EXISTS
NOT IN
LEFT JOIN
The SQL looks like this:
SELECT * FROM TableA WHERE NOT EXISTS (
SELECT NULL
FROM TableB
WHERE TableB.ID = TableA.ID
)
SELECT * FROM TableA WHERE ID NOT IN (
SELECT ID FROM TableB
)
SELECT TableA.* FROM TableA
LEFT JOIN TableB
ON TableA.ID = TableB.ID
WHERE TableB.ID IS NULL
Depending on which database you are using, the performance of each can vary. For SQL Server (not nullable columns):
NOT EXISTS and NOT IN predicates are the best way to search for missing values, as long as both columns in question are NOT NULL.
select ID from A where ID not in (select ID from B);
or
select ID from A except select ID from B;
Your second question:
delete from A where ID not in (select ID from B);
SELECT ID
FROM A
WHERE NOT EXISTS( SELECT 1
FROM B
WHERE B.ID = A.ID
)
This would select 4 in your case
SELECT ID FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)
This would delete them
DELETE FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)
SELECT ID
FROM A
WHERE ID NOT IN (
SELECT ID
FROM B);
SELECT ID
FROM A a
WHERE NOT EXISTS (
SELECT 1
FROM B b
WHERE b.ID = a.ID)
SELECT a.ID
FROM A a
LEFT OUTER JOIN B b
ON a.ID = b.ID
WHERE b.ID IS NULL
DELETE
FROM A
WHERE ID NOT IN (
SELECT ID
FROM B)