Tricky SQL query to get all unmatched records - sql

I need a SQL query that lists all unmatched destinations. The destination table has a parent-child relation, so if a matched destination has an unmatched child destination, both the parent and the child should be listed. I created the tables below to visualize it; I hope they make the problem easier to understand.
I tried the query below, but it did not work:
SELECT
*
FROM
Table_A AS ParentTable
WHERE
ID NOT IN (SELECT TableA_ID FROM Table_B WHERE TableA_ID IS NOT NULL)
-- this is to find UNmatched records
AND NOT EXISTS (
SELECT * FROM Table_A AS ChildTable
WHERE ChildTable.Parent = ParentTable.Code)
-- this is the part that I was not sure (:
Here is main destination table
TABLE A
ID   Destination                  ParentID
1    France                       0
2    Île-De-France                1
3    Ablis                        2
4    Provence-Alpes-Cote D'azur   1
5    Aix-En-Provence              4
Here is the second table
TABLE B
ID   Destination                  TableA_ID
100  France                       1
101  Île-De-France                2
102  Ablis                        NULL
103  Provence-Alpes-Cote D'azur   4
104  Aix-En-Provence              5
In this situation I need to retrieve the table below, since "Ablis" is not matched.
RESULTING TABLE
ID   Destination                  ParentID
1    France                       0
2    Île-De-France                1
3    Ablis                        2

You can try a recursive cte like below.
; with rcte as
(
SELECT
A.id, A.destination, A.parentid
FROM
TABLE_B b join TABLE_A a on a.destination=b.destination WHERE TableA_ID IS NULL
union all
select
T.id, T.destination, T.parentid
from
table_A T join rcte r
on r.parentid=t.id
)
select * from rcte order by parentid asc

I would use NOT EXISTS with a recursive CTE:
with t as (
select Id, Destination, ParentID
from table_a a
where not exists (select 1 from table_b where a.id = TableA_ID)
union all
-- climb from each collected row (c) to its parent row (a1)
select a1.Id, a1.Destination, a1.ParentID
from t c join table_a a1
on a1.Id = c.ParentID
)
select * from t
order by 3;
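To check the recursive approach against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in (the question does not name its database), and the lowercase table and column names are assumptions. The sketch uses UNION instead of UNION ALL so that a parent shared by several unmatched children is listed once; SQLite accepts that in a recursive CTE, but SQL Server does not, so treat it as an illustration rather than a drop-in replacement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, destination TEXT, parentid INTEGER);
    CREATE TABLE table_b (id INTEGER, destination TEXT, tablea_id INTEGER);
    INSERT INTO table_a VALUES
        (1, 'France', 0), (2, 'Île-De-France', 1), (3, 'Ablis', 2),
        (4, 'Provence-Alpes-Cote D''azur', 1), (5, 'Aix-En-Provence', 4);
    INSERT INTO table_b VALUES
        (100, 'France', 1), (101, 'Île-De-France', 2), (102, 'Ablis', NULL),
        (103, 'Provence-Alpes-Cote D''azur', 4), (104, 'Aix-En-Provence', 5);
""")

unmatched_with_ancestors = conn.execute("""
    WITH RECURSIVE t(id, destination, parentid) AS (
        -- anchor: rows of table_a that no table_b row points at
        SELECT a.id, a.destination, a.parentid
        FROM table_a a
        WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.tablea_id = a.id)
        UNION
        -- recursive step: climb from each collected row to its parent
        SELECT p.id, p.destination, p.parentid
        FROM t c JOIN table_a p ON p.id = c.parentid
    )
    SELECT id, destination, parentid FROM t ORDER BY id
""").fetchall()

for row in unmatched_with_ancestors:
    print(row)
```

With the sample rows this lists France, Île-De-France and Ablis, which is the result table the question asks for.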

Related

How do I update a table that references duplicate records?

I have two SQL tables. One gets a reference value from another table which stores a list of Modules and their ID. But these descriptions are not unique. I am trying to remove the duplicates of Table A but I'm not sure how to update Table B to only reference the single values.
Example:
Table A:                          Table B:
ID   Description   RefID          ID   Name
--   -----------   -----          --   ------------
1    Test 1        2              1    QuickReports
2    Test 2        1              2    QuickReports
I want the results to be the following:
Table A:                          Table B:
ID   Description   RefID          ID   Name
--   -----------   -----          --   ------------
1    Test 1        1              1    QuickReports
2    Test 2        1
I managed to delete the duplicates from Table B using the code below, but I haven't been able to update the records in Table A. Each table has over 500 records.
WITH cte AS(
SELECT
Name,
ROW_NUMBER() OVER (
PARTITION BY
Name
ORDER BY
Name
)row_num
FROM ReportmodulesTest
)
DELETE FROM cte
WHERE row_num > 1;
You would need to update table A first, before deleting from table B.
You tagged your question MySQL but that database would not support the delete statement that you are showing. I suspect that you are running SQL Server, so here is how to do it in that database:
update a
set refid = b.minid
from tablea a
inner join (select name, id, min(id) over(partition by name) minid from tableb) b
on b.id = a.refid and b.minid <> a.refid
In MySQL (8.0 or later, which is needed for the window function), you would phrase the same query as:
update tablea a
inner join (select name, id, min(id) over(partition by name) minid from tableb) b on b.id = a.refid
set a.refid = b.minid
where b.minid <> a.refid
You can update the first table using:
update a join
(select b.*,
min(id) over (partition by name) as min_id
from b
) b
on a.refid = b.id
set a.refid = b.min_id
where a.refid <> b.min_id;
Then you can delete rows in the second table with similar logic:
delete b
from b join
(select b.*,
min(id) over (partition by name) as min_id
from b
) bb
on bb.id = b.id
where b.id <> bb.min_id;
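The update-then-delete order can be tried on the question's sample rows with the sketch below, which uses an in-memory SQLite database through Python's sqlite3 module. This is an illustration under assumptions: SQLite has no UPDATE ... JOIN syntax, so the join is rewritten as a correlated subquery, and the table names a and b and the aliases dup and keep are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, description TEXT, refid INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO a VALUES (1, 'Test 1', 2), (2, 'Test 2', 1);
    INSERT INTO b VALUES (1, 'QuickReports'), (2, 'QuickReports');
""")

with conn:  # both statements commit together or not at all
    # Step 1: repoint every reference to the lowest id that shares its name.
    conn.execute("""
        UPDATE a
        SET refid = (SELECT MIN(keep.id)
                     FROM b AS dup JOIN b AS keep ON keep.name = dup.name
                     WHERE dup.id = a.refid)
        WHERE refid IN (SELECT id FROM b)
    """)
    # Step 2: only now delete the duplicates, which nothing references anymore.
    conn.execute("""
        DELETE FROM b
        WHERE id NOT IN (SELECT MIN(id) FROM b GROUP BY name)
    """)

rows_a = conn.execute("SELECT id, description, refid FROM a ORDER BY id").fetchall()
rows_b = conn.execute("SELECT id, name FROM b ORDER BY id").fetchall()
print(rows_a)
print(rows_b)
```

Running the delete first would leave the first table pointing at an id that no longer exists, which is why the reference update comes first.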
I found a solution that has made this process easier. I first use Row_Number to find duplicates in Table A and SELECT INTO a temporary table.
SELECT
a.Id
, a.Name
, ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Id DESC) RN
INTO
#TestTable
FROM
TableA a WITH(NOLOCK)
I then JOIN Table A and Table B to see where the IDs match and to identify which ID I need to keep and which IDs I need to delete:
SELECT
b.Id
, b.Name
, b.RefId
, ToKeep.Id KeepId
, ToDelete.Id DeleteId
FROM
#TestTable ToDelete
JOIN TableB b WITH(NOLOCK)
ON b.RefId = ToDelete.Id
JOIN #TestTable ToKeep
ON ToDelete.Name = ToKeep.Name
AND ToKeep.RN = 1
WHERE ToDelete.RN > 1
Then using a similar statement, I just update the records:
UPDATE b
SET
b.RefId = ToKeep.Id
FROM #TestTable ToDelete
JOIN TableB b WITH(NOLOCK)
ON b.RefId = ToDelete.Id
JOIN #TestTable ToKeep
ON ToDelete.Name = ToKeep.Name
AND ToKeep.RN = 1
WHERE
ToDelete.RN > 1
Lastly, I can now delete the duplicate records:
DELETE a
FROM #TestTable b
INNER JOIN TableA a
ON b.Id = a.Id
WHERE
b.RN > 1
After this, you can rerun the first SELECT statement to confirm that all duplicates are gone. Just remove the INTO clause and select from TableA directly.
Thanks to an anonymous colleague of mine for this solution and hope this helps someone out there.

SQL - Get one of each in a join

I've got 2 tables. Table A and B.
Table A has an id and some data which isn't important for the question.
Table B has an id and an A_id. The last one is used to combine the 2 of them. There can be either multiple rows with the same A_id, only 1 or none at all.
I need a query which will do the following:
Get only 1 of each row from table A
Join table B into it
No duplicates from table A
I know it might sound complicated, so here is an example
Table A
id other info
1 ...
2 ...
3 ...
4 ...
Table B
id A_id
1 2
2 3
3 3
4 3
Output
A.id other info B.id A_id
1 ... NULL NULL
2 ... 1 2
3 ... 2 3
4 ... NULL NULL
So, even though there are multiple rows in table B of which A_id is 3, I only need the one of them. And even though there is no row in table B of which the A_id is 1 or 4, I still need both of them to show up.
This is as clear as I can possibly describe my question, please give feedback on how I can improve this question.
I think the simplest way is to use a correlated subquery:
select a.*,
(select max(b.id) from b where b.a_id = a.id)
from a;
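As a quick check of the correlated-subquery idea, the sketch below runs it on the question's sample rows in an in-memory SQLite database through Python's sqlite3 module. It uses MIN instead of MAX so that the result matches the expected output in the question (B.id 2 for A.id 3); the column name other and its values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, other TEXT);
    CREATE TABLE b (id INTEGER, a_id INTEGER);
    INSERT INTO a VALUES (1, 'v1'), (2, 'v2'), (3, 'v3'), (4, 'v4');
    INSERT INTO b VALUES (1, 2), (2, 3), (3, 3), (4, 3);
""")

# One output row per row of a; the subquery yields NULL when b has no match.
one_b_per_a = conn.execute("""
    SELECT a.id, a.other,
           (SELECT MIN(b.id) FROM b WHERE b.a_id = a.id) AS b_id
    FROM a
    ORDER BY a.id
""").fetchall()

for row in one_b_per_a:
    print(row)
```

Because the subquery returns a single value, this form only fetches one column from b; the join-based answers are the better fit when several columns of the chosen b row are needed.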
I can't test it right now but it seems that you want something like this
SELECT * FROM A
LEFT JOIN B ON A.ID = B.A_ID
UPDATE:
WITH tmp AS (
SELECT MIN(ID) ID FROM B GROUP BY A_id
)
SELECT A.*, B.* FROM B
INNER JOIN tmp ON B.Id = tmp.ID
RIGHT JOIN A ON A.Id = B.A_Id
Assuming your database supports ANSI SQL and when there are multiple rows in B you want the last one based on ID:
with lastB (B_Id) as (
select max(id) from tableB group by A_id
),
BRows as (
select * from tableB
where Id in (select B_Id from lastB)
)
select a.field1, a.field2, a.fieldN,
b.field1, b.field2, b.fieldN
from tableA a
left join BRows b on a.Id = b.A_Id
EDIT: Oops. You edited your question and want the first one. Then simply change max() to min().
with lastB (B_Id) as (
select min(id) from tableB group by A_id
),
BRows as (
select * from tableB
where Id in (select B_Id from lastB)
)
select a.field1, a.field2, a.fieldN,
b.field1, b.field2, b.fieldN
from tableA a
left join BRows b on a.Id = b.A_Id
EDIT: Here is the MS SQL sample I promised for:
-- T-SQL table variables are declared with an @ prefix
DECLARE @tableA TABLE ( id INT, other VARCHAR(10) );
DECLARE @tableB TABLE
(
id INT ,
A_Id INT ,
other VARCHAR(10)
);
INSERT @tableA
( id, other )
VALUES ( 1, 'v1' ),
( 2, 'v2' ),
( 3, 'v3' ),
( 4, 'v4' );
INSERT @tableB
( id, A_Id, other )
VALUES ( 1, 2, 'v21' ),
( 2, 3, 'v31' ),
( 3, 3, 'v32' ),
( 4, 3, 'v33' );
WITH fromB ( B_Id )
AS ( SELECT MIN(id)
FROM @tableB
GROUP BY A_Id
),
BRows
AS ( SELECT *
FROM @tableB
WHERE id IN ( SELECT B_Id
FROM fromB )
)
SELECT a.id AS A_Id ,
a.other AS A_Other ,
b.id AS B_Id ,
b.other AS B_Other
FROM @tableA a
LEFT JOIN BRows b ON a.id = b.A_Id;
Result:
A_Id A_Other B_Id B_Other
1 v1 NULL NULL
2 v2 1 v21
3 v3 2 v31
4 v4 NULL NULL
A big thank you to Gordon Linoff for pushing me in the right direction. The initial answer was:
select A.*,
(select count(B.A_id) from B where B.A_id = A.id)
from A
I'm sorry for not telling this in my question, but all I actually needed was to get every row from A and at the same time check if there was any row in table B which had the same value in A_id as the A.id had.
This query counts all rows which have the same value of A_id as A.id.
To be clear, the output will give:
A.id other info count
1 ... 0
2 ... 1
3 ... 3
4 ... 0

select sql query to merge results

I have a table old_data and a table new_data. I want to write a select statement that gives me
Rows in old_data stay there
New rows in new_data get added to old_data
unique key is id so rows with id in new_data should update existing ones in old_data
I need to write a select statement that would give me old_data updated with new data and new data added to it.
Example:
Table a:
id count
1 2
2 19
3 4
Table b:
id count
2 22
5 7
I need a SELECT statement that gives me
id count
1 2
2 22
3 4
5 7
Based on your desired results:
SELECT
*
FROM
[TableB] AS B
UNION ALL
SELECT
*
FROM
[TableA] AS A
WHERE
A.id NOT IN (SELECT id FROM [TableB])
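The "new rows win, old rows fill the gaps" logic can be verified on the question's sample data with the sketch below, which runs in an in-memory SQLite database through Python's sqlite3 module. It is a sketch under assumptions: the tables are named old_data and new_data as in the question text, and NOT EXISTS is swapped in for NOT IN because it keeps working if an id is ever NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_data (id INTEGER PRIMARY KEY, "count" INTEGER);
    CREATE TABLE new_data (id INTEGER PRIMARY KEY, "count" INTEGER);
    INSERT INTO old_data VALUES (1, 2), (2, 19), (3, 4);
    INSERT INTO new_data VALUES (2, 22), (5, 7);
""")

merged = conn.execute("""
    -- every row of new_data, whether it updates an old id or adds a new one
    SELECT id, "count" FROM new_data
    UNION ALL
    -- plus the old rows whose id was not replaced
    SELECT o.id, o."count" FROM old_data AS o
    WHERE NOT EXISTS (SELECT 1 FROM new_data AS n WHERE n.id = o.id)
    ORDER BY id
""").fetchall()

for row in merged:
    print(row)
```

Neither table is modified; the merge exists only in the SELECT result, which is what the question asks for.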
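I think this would work pretty neatly with COALESCE (applied to id as well, because a.id is NULL for rows that exist only in b):
SELECT COALESCE(a.id, b.id) AS id, COALESCE(b.count, a.count) AS count
FROM a
FULL OUTER JOIN b
ON a.id = b.id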
Note - if your RDBMS does not support COALESCE, you can write the same logic with CASE as follows:
SELECT CASE WHEN a.id IS NULL THEN b.id
ELSE a.id END AS id,
CASE WHEN b.count IS NULL THEN a.count
ELSE b.count END AS count
FROM ...
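If your RDBMS lacks FULL OUTER JOIN, you can emulate it as follows (the second half keeps only the rows of b that have no match in a, so matched rows are not returned twice):
SELECT a.id AS a_id, a.count AS a_count, b.id AS b_id, b.count AS b_count
FROM a
LEFT JOIN b
ON a.id = b.id
UNION ALL
SELECT a.id, a.count, b.id, b.count
FROM b
LEFT JOIN a
ON b.id = a.id
WHERE a.id IS NULL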
You have to use an UPSERT (an UPDATE followed by an INSERT) to update the old data and add the new data in the old_data table, and then select all rows from old_data. Check the following and let me know what you think about this query:
UPDATE [old_data]
SET [count] = B.[count]
FROM [old_data] AS A
INNER JOIN [new_Data] AS B
ON A.[id] = B.[id]
INSERT INTO [old_data]
([id]
,[count])
SELECT A.[id]
,A.[count]
FROM [new_Data] AS A
LEFT JOIN [old_data] AS B
ON A.[id] = B.[id]
WHERE B.[id] IS NULL
SELECT *
FROM [old_data]

Value present in more than one table

I have 3 tables. All of them have a column, id. I want to find out whether any value is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged, and there is no need to check for existence in the third table or to check whether there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexed in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The query below is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
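The pairwise EXISTS check can be exercised with the sketch below, which uses an in-memory SQLite database through Python's sqlite3 module rather than the PostgreSQL 9.0 of the question, so it demonstrates the logic only and says nothing about the PostgreSQL planner. The helper name has_shared_id and the sample rows are hypothetical.

```python
import sqlite3

def has_shared_id(conn):
    """Return True if some id occurs in at least two of the tables a, b, c."""
    row = conn.execute("""
        SELECT EXISTS (SELECT 1 FROM a JOIN b USING (id))
            OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
            OR EXISTS (SELECT 1 FROM b JOIN c USING (id))
    """).fetchone()
    return bool(row[0])

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1), (1), (2);   -- duplicates inside one table are fine
    INSERT INTO b VALUES (3), (4);
    INSERT INTO c VALUES (5);
""")

before = has_shared_id(conn)              # no id is shared between tables yet
conn.execute("INSERT INTO c VALUES (3)")  # 3 now sits in both b and c
after = has_shared_id(conn)

print(before, after)
```

Each EXISTS can stop at the first matching pair, which fits the requirement that the check may end at the first occurrence.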

Problem combining result of two different queries into one

I have two tables (TableA and TableB).
create table TableA
(A int null)
create table TableB
(B int null)
insert into TableA
(A) values (1)
insert into TableB
(B) values (2)
I can't join them together, but I would still like to show the results from them as one row.
Now I can make select like this:
select
(select A from tableA) as A
, B from TableB
Result:
A B
1 2
But if I now delete from tableB:
delete tableB
Now when I run the same query as before:
select
(select A from tableA) as A
, B from TableB
I see this:
A B
But I was expecting seeing value from tableA
like this:
Expected Result:
A B
1
Why is this happening and how can I still see the value from TableA although selectB is returning 0 rows?
I am using MS SQL Server 2005.
Use a LEFT JOIN (although it's more of a cross join in your case).
If your db supports it:
SELECT a.a, b.b
FROM a
CROSS JOIN b
If not, do something like:
SELECT a.a, b.b
FROM a
LEFT JOIN b ON ( 1=1 )
However, once you have more rows in a or b, this will return the cartesian product:
1 1
1 2
2 1
2 2
This will actually give you what you're looking for, but only if you have at most one row per table:
select
(select A from tableA) as A
, (select B from TableB) as B
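The difference between the two query shapes can be seen with the sketch below, which runs both against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for SQL Server 2005 here, so this only illustrates the row-count behaviour; the variable names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (A INTEGER);
    CREATE TABLE TableB (B INTEGER);
    INSERT INTO TableA (A) VALUES (1);
    INSERT INTO TableB (B) VALUES (2);
""")

# The question's query: one output row per row of TableB.
original = "SELECT (SELECT A FROM TableA) AS A, B FROM TableB"
# Both values as scalar subqueries: always exactly one output row.
scalar = "SELECT (SELECT A FROM TableA) AS A, (SELECT B FROM TableB) AS B"

both_full = conn.execute(scalar).fetchall()

conn.execute("DELETE FROM TableB")
original_after_delete = conn.execute(original).fetchall()
scalar_after_delete = conn.execute(scalar).fetchall()

print(both_full, original_after_delete, scalar_after_delete)
```

The original query is driven by FROM TableB, so an empty TableB produces no rows at all, whereas a SELECT without a FROM clause always returns one row and an empty scalar subquery simply yields NULL.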
give this a try:
-- T-SQL table variables are declared with an @ prefix
DECLARE @TableA table (A int null)
DECLARE @TableB table (B int null)
insert into @TableA (A) values (1)
insert into @TableB (B) values (2)
--this assumes that you don't have a Numbers table, and generates one on the fly with up to 500 rows, you can increase or decrease as necessary, or just join in your Numbers table instead
;WITH Digits AS
(
SELECT 0 AS nbr
UNION SELECT 1 UNION SELECT 2 UNION SELECT 3
UNION SELECT 4 UNION SELECT 5 UNION SELECT 6
UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
, AllNumbers AS
(
SELECT u3.nbr * 100 + u2.nbr * 10 + u1.nbr + 1 AS Number
FROM Digits u1, Digits u2, Digits u3
WHERE u3.nbr * 100 + u2.nbr * 10 + u1.nbr + 1 <= 500
)
, AllRowsA AS
(
SELECT
A, ROW_NUMBER() OVER (ORDER BY A) AS RowNumber
FROM @TableA
)
, AllRowsB AS
(
SELECT
B, ROW_NUMBER() OVER (ORDER BY B) AS RowNumber
FROM @TableB
)
SELECT
a.A,b.B
FROM AllNumbers n
LEFT OUTER JOIN AllRowsA a on n.Number=a.RowNumber
LEFT OUTER JOIN AllRowsB b on n.Number=b.RowNumber
WHERE a.A IS NOT NULL OR b.B IS NOT NULL
OUTPUT:
A B
----------- -----------
1 2
(1 row(s) affected)
If you delete all rows from the TableB table variable, the output is:
A B
----------- -----------
1 NULL
(1 row(s) affected)
try this:
select a, (select b from b) from a
union
select b, (select a from a) from b
This should retrieve all the existing data.
You can filter it further by wrapping it in another SELECT.