Find Groups that don't contain all records - sql

I feel like I should be able to get this and I'm just having a brain fart. I've simplified the problem to the following example:
CREATE TABLE #A (ID int);
CREATE TABLE #B (GroupID char(1), ID int);
INSERT #A VALUES (1);
INSERT #A VALUES (2);
INSERT #A VALUES (3);
INSERT #B VALUES ('X', 1);
INSERT #B VALUES ('X', 2);
INSERT #B VALUES ('X', 3);
INSERT #B VALUES ('Y', 1);
INSERT #B VALUES ('Y', 2);
INSERT #B VALUES ('Z', 1);
INSERT #B VALUES ('Z', 2);
INSERT #B VALUES ('Z', 3);
INSERT #B VALUES ('Z', 4);
So table A contains a set of some records. Table B contains multiple copies of the set contained in A with Group IDs. But some of those groups may be missing one or more records of the set. I want to find the groups that are missing records. So in the above example, my results should be:
GroupID
-------
Y
But for some reason I just can't wrap my head around this today. Any help would be appreciated.

Awesome use-case for relational division! (Here's a must-read blog post about it)
SELECT DISTINCT b1.GroupID
FROM #B b1
WHERE EXISTS (
SELECT 1
FROM #A a
WHERE NOT EXISTS (
SELECT 1
FROM #B b2
WHERE b1.GroupID = b2.GroupID
AND b2.ID = a.ID
)
);
How to read this?
I want all distinct GroupIDs in #B for which there is a record in #A that has no matching record in #B with that GroupID and the same ID.
In fact, this is the "remainder" of the relational division.
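The same "remainder" can also be phrased as a set difference. The following is only a sketch, assuming SQL Server and temp tables named #A and #B loaded with the question's sample data; it is an alternative formulation, not a claim about which form performs better.

```sql
-- Setup copied from the question, using temp tables (an assumption).
IF OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A;
IF OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;
CREATE TABLE #A (ID int);
CREATE TABLE #B (GroupID char(1), ID int);
INSERT INTO #A VALUES (1), (2), (3);
INSERT INTO #B VALUES ('X', 1), ('X', 2), ('X', 3),
                      ('Y', 1), ('Y', 2),
                      ('Z', 1), ('Z', 2), ('Z', 3), ('Z', 4);

-- A group is incomplete when "IDs in #A" minus "IDs in this group"
-- leaves at least one row.
SELECT DISTINCT b1.GroupID
FROM #B b1
WHERE EXISTS (
    SELECT a.ID FROM #A a
    EXCEPT
    SELECT b2.ID FROM #B b2 WHERE b2.GroupID = b1.GroupID
);
-- With the sample data only group Y is returned (it lacks ID 3).
```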

Try this:
SELECT GroupID ,COUNT(GroupID )
FROM #a INNER JOIN #b
ON #a.id=#b.id
GROUP BY GroupID
HAVING COUNT(GroupID )<(SELECT count(*) FROM #a)
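The counting approach relies on each (GroupID, ID) pair appearing only once, and an inner join drops groups that share no ID with #A at all. A hedged variant that tolerates both cases is sketched below; the duplicate ('Y', 2) row and the group 'W' are hypothetical additions made only to show the difference.

```sql
IF OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A;
IF OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;
CREATE TABLE #A (ID int);
CREATE TABLE #B (GroupID char(1), ID int);
INSERT INTO #A VALUES (1), (2), (3);
INSERT INTO #B VALUES ('X', 1), ('X', 2), ('X', 3),
                      ('Y', 1), ('Y', 2), ('Y', 2),   -- hypothetical duplicate row
                      ('Z', 1), ('Z', 2), ('Z', 3), ('Z', 4),
                      ('W', 9);                       -- hypothetical group with no ID from #A

-- Start from #B so every group is kept, and count each matched ID once.
SELECT b.GroupID
FROM #B b
LEFT JOIN #A a ON a.ID = b.ID
GROUP BY b.GroupID
HAVING COUNT(DISTINCT a.ID) < (SELECT COUNT(DISTINCT ID) FROM #A);
-- With this data the incomplete groups are W and Y.
```

COUNT(DISTINCT a.ID) ignores the NULLs produced by unmatched rows, so group W counts as zero matches and group Z is not penalised for its extra ID 4.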

This will give you all the combinations that are missing.
select FullList.*
from (select distinct a.ID,
b.GroupId
from #A a
cross join #B b) FullList
left join #B b
on FullList.ID = b.ID
and FullList.GroupID = b.GroupID
where b.ID is null
The answer to your question would just be the same but with the first line:
select distinct FullList.GroupID

This will give you all the combinations that are missing.
select FullList.*
from (select distinct a.ID,
b.GroupId
from #A a
cross join (select distinct db.GroupId from #B db) b
) as FullList
left join #B b
on FullList.ID = b.ID
and FullList.GroupID = b.GroupID
where b.ID is null
The answer to your question would just be the same but with the first line:
select distinct FullList.GroupID

Related

set value using a conditional of a subquery

Sorry if I am not explaining my issue the best, but basically I have two tables.
Table A has a reference column to table B. On table B there is column X where for each referenced row, there is an unreferenced row with that same value of column X (table B has double the rows of table A). I want to update the reference on table A to be the row of table B that is not currently referenced of the two rows that have the same value on column X.
In pseudo code...
update tableA
set refCol = (select tableB.refCol
from tableB
where colX = (select colX
from tableB
where tableB.refCol = tableA.refCol)
and tableB.refCol != tableA.refCol)
The innermost query returns two rows, the outer query returns one row
Sample tables:
Table A
refCol
------
1
3
Table B
refCol  colX
------  -----
1       hello
2       hello
3       hi
4       hi
Expected output:
Table A
refCol
------
2
4
Any help would be much appreciated.
Refer to the working example below:
create table #tableA(
id int)
create table #tableB(
id int,
name varchar(10)
)
insert into #tableA values(1)
insert into #tableA values(3)
insert into #tableA values(5)
insert into #tableA values(6)
insert into #tableA values(7)
insert into #tableA values(8)
insert into #tableB values (1,'A')
insert into #tableB values (2,'A')
insert into #tableB values (3,'C')
insert into #tableB values (4,'C')
select * from #tableA
select * from #tableB
update aa
set aa.id = ab.id
from #tableA aa
inner join (
    select b.id, b.name, a.id as ta
    from (
        -- rows of #tableB that #tableA does not reference yet
        select b.* from #tableB b
        left join #tableA a on a.id = b.id
        where a.id is null
    ) b
    inner join (
        -- rows of #tableB that #tableA currently references
        select b.* from #tableA a
        inner join #tableB b on a.id = b.id
    ) a on a.name = b.name
) ab on aa.id = ab.ta
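A shorter way to express the same swap is a self-join on the second table. This is a minimal sketch using the column names from the question (refCol, colX) rather than the ones in the answer above, and it assumes every colX value occurs in exactly two rows of the second table.

```sql
IF OBJECT_ID('tempdb..#tableA') IS NOT NULL DROP TABLE #tableA;
IF OBJECT_ID('tempdb..#tableB') IS NOT NULL DROP TABLE #tableB;
CREATE TABLE #tableA (refCol int);
CREATE TABLE #tableB (refCol int, colX varchar(10));
INSERT INTO #tableA VALUES (1), (3);
INSERT INTO #tableB VALUES (1, 'hello'), (2, 'hello'), (3, 'hi'), (4, 'hi');

-- cur is the row currently referenced; other is its twin with the same colX.
UPDATE a
SET a.refCol = other.refCol
FROM #tableA a
INNER JOIN #tableB cur   ON cur.refCol = a.refCol
INNER JOIN #tableB other ON other.colX = cur.colX
                        AND other.refCol <> cur.refCol;

SELECT refCol FROM #tableA ORDER BY refCol;  -- 2 and 4 with the sample data
```

If a colX value had more than two rows, a row in #tableA would match several candidates and the value picked would be arbitrary, so the two-rows-per-value assumption matters.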

SQL Server 2012 How to determine if all the values in table A are in table B

I have two tables (A & B) that share a common value (Color), both tables can have any number of rows. I am trying to find a way to determine if all the distinct 'Colors' in table A exist in table B:
I have tried using EXCEPT, which almost works; unfortunately it returns false when table B has more Colors than table A, which is irrelevant. All I care about is whether every distinct Color from table A is in table B. I have been fiddling with both EXISTS and IN but can't seem to get the correct results.
create table #TableA (Color varchar(10))
create table #TableB (Color varchar(10))
insert into #TableA(Color) values ('red')
insert into #TableA(Color) values ('blue')
insert into #TableA(Color) values ('green')
--insert into #TableA(Color) values ('orange')
insert into #TableB(Color) values ('red')
insert into #TableB(Color) values ('blue')
insert into #TableB(Color) values ('green')
insert into #TableB(Color) values ('yellow')
insert into #TableB(Color) values ('purple')
IF NOT EXISTS (
SELECT Color FROM #TableA
EXCEPT
SELECT Color FROM #TableB
)
SELECT 'true'
ELSE SELECT 'false'
I would like the above code to yield 'true'.
IF table A Colors > table B Colors THEN false
IF table A Colors <= table B Colors THEN true.
There are many ways to accomplish this. You could use a left join for this pretty easily.
if exists
(
SELECT a.Color
FROM #TableA a
left join #TableB b on b.Color = a.Color
where b.Color is null
)
select 'Some Colors in A are not in B'
else
select 'ALL Colors in A exist in B'
You could also just keep your existing query. Adding DISTINCT as below is harmless but not required, because EXCEPT already returns distinct values:
IF NOT EXISTS (
SELECT DISTINCT Color FROM #TableA
EXCEPT
SELECT DISTINCT Color FROM #TableB
)
SELECT 'true'
ELSE SELECT 'false'
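Since the question mentions experimenting with EXISTS, here is a hedged sketch of that form, assuming temp tables named #TableA and #TableB with the question's sample rows. The column alias AllColorsPresent is made up for the example.

```sql
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL DROP TABLE #TableA;
IF OBJECT_ID('tempdb..#TableB') IS NOT NULL DROP TABLE #TableB;
CREATE TABLE #TableA (Color varchar(10));
CREATE TABLE #TableB (Color varchar(10));
INSERT INTO #TableA (Color) VALUES ('red'), ('blue'), ('green');
INSERT INTO #TableB (Color) VALUES ('red'), ('blue'), ('green'), ('yellow'), ('purple');

-- 'true' when no Color in #TableA lacks a match in #TableB.
IF NOT EXISTS (
    SELECT 1
    FROM #TableA a
    WHERE NOT EXISTS (SELECT 1 FROM #TableB b WHERE b.Color = a.Color)
)
    SELECT 'true' AS AllColorsPresent
ELSE
    SELECT 'false' AS AllColorsPresent;
-- With the sample data this yields 'true'.
```

One difference from the EXCEPT version: a NULL Color in #TableA never satisfies b.Color = a.Color, so it would be reported as missing here, whereas EXCEPT treats two NULLs as equal.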

Summing the distinct elements of query result

I have the following three tables representing a tree structure. Every row in #A is the ancestor of zero or more rows in #B. Similarly, every row in #B is the ancestor of zero or more rows in #C. Table #B contains a column value. I need to find the sum of value for all rows in #B that belong to an input ancestor.
For example, consider following content of tables:
CREATE TABLE #A (id varchar(10));
CREATE TABLE #B (id varchar(10), value int);
CREATE TABLE #C (id varchar(10), a_id varchar(10), b_id varchar(10));
INSERT INTO #A(id) VALUES ('A1'), ('A2');
INSERT INTO #B(id, value) VALUES('B1', 41), ('B2', 43), ('B3', 47);
INSERT INTO #C(id, a_id, b_id) VALUES('C1', 'A1', 'B1'), ('C2', 'A1', 'B1'),
('C3', 'A1', 'B2'), ('C4', 'A2', 'B3');
The above content represents following structure:
A1
|--- B1 (41)
|    |-------- C1
|    |-------- C2
|
|--- B2 (43)
     |-------- C3
A2
|--- B3 (47)
     |-------- C4
The parent-child relationship is weirdly defined. Table #B does not have its own column that says which row in table #A is its ancestor. All the mappings should be evaluated from table #C. Columns a_id and b_id in table #C designate grandparent and parent rows in table #A and #B respectively. If there is a row Z in #C where a_id is X and b_id is Y, then X is the ancestor of Y and Y is ancestor of Z. There will not be conflicting mappings in #C.
Problem Statement: For given id A1, find the sum of column value for all rows in #B whose parent is A1. Here there are two children of A1, B1 with value 41 and B2 with value 43 so we expect answer to be 84.
If I do something like below:
SELECT SUM(#B.value) FROM #B
INNER JOIN #C ON #B.id = #C.b_id
INNER JOIN #A ON #C.a_id = #A.id
WHERE #A.id = 'A1'
I get 125, i.e. 41 + 41 + 43, instead of 84, since two rows in #C (C1 and C2) map to B1. I can write the query below to get the values associated with distinct rows in #B, i.e. 41 and 43, but I do not know how to sum the resulting values. Can I get the expected result without creating a temporary table?
SELECT MAX(#B.value) FROM #B
INNER JOIN #C ON #B.id = #C.b_id
INNER JOIN #A ON #C.a_id = #A.id
WHERE #A.id = 'A1'
GROUP BY #B.id;
I am not a SQL expert, so probably there might be a very simple solution to this.
You don't need table #A here, because the IDs are in table #C and the values in table #B. That is all you need. No need to join either. Simply select the IDs needed from #C, then use them to select from #B.
select sum(value)
from #B
where id in
(
select b_id
from #C
where a_id = 'A1'
);
You could do this:
SELECT SUM(#B.value)
FROM #B
WHERE EXISTS
(
SELECT NULL FROM #C
INNER JOIN #A ON #C.a_id = #A.id
WHERE #B.id = #C.b_id
AND #A.id = 'A1'
)
Then you will only sum the #B values that exist in the other tables.
The result will be: 84
A slightly different approach.
SELECT SUM(#B.value) FROM #B
INNER JOIN (
SELECT DISTINCT a_id, b_id FROM #C
) temp
ON
temp.b_id=#B.ID
WHERE temp.a_id='A1';
This has the advantage that you can change the WHERE temp.a_id='A1' to GROUP BY temp.a
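A sketch of that grouped variant follows. It assumes the grouping column meant above is a_id, uses the question's sample rows, and introduces the alias total_value purely for illustration.

```sql
IF OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;
IF OBJECT_ID('tempdb..#C') IS NOT NULL DROP TABLE #C;
CREATE TABLE #B (id varchar(10), value int);
CREATE TABLE #C (id varchar(10), a_id varchar(10), b_id varchar(10));
INSERT INTO #B (id, value) VALUES ('B1', 41), ('B2', 43), ('B3', 47);
INSERT INTO #C (id, a_id, b_id) VALUES ('C1', 'A1', 'B1'), ('C2', 'A1', 'B1'),
                                       ('C3', 'A1', 'B2'), ('C4', 'A2', 'B3');

-- DISTINCT collapses C1 and C2 into a single (A1, B1) pair before summing.
SELECT temp.a_id, SUM(b.value) AS total_value
FROM #B b
INNER JOIN (SELECT DISTINCT a_id, b_id FROM #C) temp
        ON temp.b_id = b.id
GROUP BY temp.a_id;
-- With the sample data: A1 -> 84, A2 -> 47.
```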

Copy records from one table to another without duplicates

IF object_id('tempdb..#A') IS NOT NULL DROP TABLE #A
IF object_id('tempdb..#B') IS NOT NULL DROP TABLE #B
CREATE TABLE #A (fname varchar(20), lname varchar(20))
CREATE TABLE #B (fname varchar(20), lname varchar(20))
INSERT INTO #A
SELECT 'Kevin', 'XP'
UNION ALL
SELECT 'Tammy', 'Win7'
UNION ALL
SELECT 'Wes', 'XP'
UNION ALL
SELECT 'Susan', 'Win7'
UNION ALL
SELECT 'Kevin', 'Win7'
SELECT * FROM #A
INSERT INTO #B
SELECT a.fname, a.lname FROM #A a
WHERE a.fname NOT IN (SELECT fname from #B)
SELECT * FROM #B
DELETE FROM #B
INSERT INTO #B
SELECT a.fname, a.lname FROM #A a
LEFT OUTER JOIN #B b ON a.fname = b.fname
WHERE a.fname NOT IN (SELECT fname from #B)
SELECT * FROM #B
Both of these examples copy all 5 records to the new table.
I only want to see one unique fname so only one Kevin should show up.
Why don't these work, or is there a better way to do it?
It seems like such a simple thing.
This would create rows with unique fname and take Win7 if both Win7 and XP existed.
INSERT INTO #B
SELECT a.fname, MIN(a.lname)
FROM #A a
GROUP BY a.fname
As per comments, given that W comes before X then you should be able to do
INSERT INTO #B
SELECT fname, lname
FROM (
SELECT fname, lname,
ROW_NUMBER() OVER(PARTITION BY fname ORDER BY lname) r
FROM #A
) t
WHERE r=1
Answering your question, why don't your queries work?
INSERT INTO #B
SELECT a.fname, a.lname FROM #A a
WHERE a.fname NOT IN (SELECT fname from #B)
This statement is evaluated in two steps. First, the SELECT part of the query is executed and returns a result set. At that point #B is empty, so every tuple in #A is part of the result. Then, once the result has been computed, it is inserted into #B, which ends up being a copy of #A.
The DBMS does not insert one tuple, and then re-evaluate the query for the next tuple of #A, as your question seems to imply. Insertions are always done AFTER the query has been completely evaluated.
If your goal is to insert into #B the tuples in #A without duplicates, there are many ways to do that. One of them is:
INSERT INTO #B SELECT distinct * from #A;
--dmg
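Note that DISTINCT * only removes rows that are identical in every column, so ('Kevin', 'XP') and ('Kevin', 'Win7') would both survive. If the goal is one row per fname while also skipping names already in #B, something like the sketch below may fit. It assumes the question's sample rows; the pre-existing 'Tammy' row in #B is hypothetical and only there to show the NOT EXISTS filter doing something.

```sql
IF OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A;
IF OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;
CREATE TABLE #A (fname varchar(20), lname varchar(20));
CREATE TABLE #B (fname varchar(20), lname varchar(20));
INSERT INTO #A VALUES ('Kevin', 'XP'), ('Tammy', 'Win7'), ('Wes', 'XP'),
                      ('Susan', 'Win7'), ('Kevin', 'Win7');
INSERT INTO #B VALUES ('Tammy', 'Win7');  -- hypothetical row already in the target

-- GROUP BY removes duplicates within #A; NOT EXISTS skips names already in #B.
-- The filter sees #B as it was before this statement started inserting.
INSERT INTO #B (fname, lname)
SELECT a.fname, MIN(a.lname)
FROM #A a
WHERE NOT EXISTS (SELECT 1 FROM #B b WHERE b.fname = a.fname)
GROUP BY a.fname;

SELECT fname, lname FROM #B ORDER BY fname;
-- Four rows: one each for Kevin, Susan, Tammy and Wes.
```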
Just use DISTINCT over the select query:
INSERT INTO TARGET_TABLE
SELECT DISTINCT * FROM
(
-- some big query
) x

referencing a table variable

I have a problem referencing a table variable in an update statement. It seems I can't use the @a.id column (the compiler says it's not declared).
The following example was written only to illustrate the problem. I know that I could solve it in this example by renaming the column id and avoiding the @a.id reference, but that's not an option; I really can't do that. I saw some solutions that use the from clause to alias the table being updated, but in this example I'm using the from clause for something else. Is there another way to solve it?
declare @a table
(
id int not null,
name varchar(100) null
)
insert into @a (id, name) values (1, null)
insert into @a (id, name) values (2, null)
insert into @a (id, name) values (3, null)
declare @b table
(
id int not null,
name varchar(100) null
)
insert into @b (id, name) values (1, 'one')
insert into @b (id, name) values (2, 'two')
update @a
set
name = f.name
from
(
select
id,
name
from @b
where
id = @a.id
) f
where
@a.id = f.id
Try something like this:
declare @a table
(
id int not null,
name varchar(100) null
)
insert into @a (id, name) values (1, null)
insert into @a (id, name) values (2, null)
insert into @a (id, name) values (3, null)
declare @b table
(
id int not null,
name varchar(100) null
)
insert into @b (id, name) values (1, 'one')
insert into @b (id, name) values (2, 'two')
update upd
set
name = [@b].name
from @a AS upd
INNER JOIN @b
ON upd.id = [@b].id
And BTW, the following will work too:
update @a
set
name = f.name
from
(
select
id,
name
from @b
) f
where
[@a].id = f.id
The reference to @a.id is illegal because the table is out of scope at that point. Here's what you might try:
update f
set name = g.name
from @b g
join @a f
on g.id = f.id