How can I write a query to match groups of records - sql

This is a puzzling SQL task. I'd like to do it with a query instead of stepping through with a cursor and doing it the "hard way".
If I have two tables TableA and TableB each with a grouping column as below:
TableA TableB
------------- -------------
id group id group
------ ------ ------ ------
1 D 1 X
2 D 2 X
3 D 3 Y
4 D 4 Y
4 E 5 Y
5 E 5 Z
5 F 6 Z
Note the group names are not the same name.
I want to know if a given TableB group is comprised entirely of IDs which are also grouped together in any group in TableA. The TableA group can have more IDs than the TableB group, so long as it has all of the same IDs as the TableB group. IDs can be in more than one group in either table.
From the tables above, I should find out that group X from TableB matches a group in TableA, but groups Y and Z do not.
I've tried many different queries, subqueries, recursive CTEs. I've only ended up with wrong results and headaches. The real dataset is significantly larger, so performance should be considered a factor too. Unfortunately, that means the cross-join solution proposed in an answer below won't work.
Is this even possible with SQL?

The idea for this query is to consider every group to every other group. Then count the number of times that the ids match. This involves a cross join to generate all the group pairs, then some joins and aggregation:
select b.group, a.group
from (select group, count(*) as cnt b group by group) bg cross join
(select distinct group a) ag left join
b
on b.group = bg.group left join
a
on a.group = ag.group and a.id = b.id
group by bg.group, ag.group, bg.cnt
having bg.cnt = count(a.id)

This query could give the answer.
SELECT B.[group]
FROM
TableB B
FULL JOIN TableA A ON B.id = A.id
GROUP BY
B.[group]
HAVING
COUNT(DISTINCT A.[group]) = 1

Related

Subqueries vs Multi Table Join

I've 3 tables A, B, C. I want to list the intersection count.
Way 1:-
select count(id) from A a join B b on a.id = b.id join C c on B.id = C.id;
Result Count - X
Way 2:-
SELECT count(id) FROM A WHERE id IN (SELECT id FROM B WHERE id IN (SELECT id FROM C));
Result Count - Y
The result count in each of the query is different. What exactly is wrong?
A JOIN can multiply the number of rows as well as filtering out rows.
In this case, the second count should be the correct one because nothing is double counted -- assuming id is unique in a. If not, it needs count(distinct a.id).
The equivalent using JOIN would use COUNT(DISTINCT):
select count(distinct a.id)
from A a join
B b
on a.id = b.id join
C c
on B.id = C.id;
I mention this for completeness but do not recommend this approach. Multiplying the number of rows just to remove them using distinct is inefficient.
In many databases, the most efficient method might be:
select count(*)
from a
where exists (select 1 from b where b.id = a.id) and
exists (select 1 from c where c.id = a.id);
Note: This assumes there are indexes on the id columns and that id is unique in a.

Query left join without all the right rows from B table

I have 2 tables, A and B.
I need all columns from A + 1 column from B in my select.
Unfortunately, B has multiples rows(all identicals) for 1 row in A
on the join condition.
I tried but I can't isolate one row in A for one row in B with left join for example while keeping my select.
How can I do this query ? Query in ORACLE SQL
Thanks in advance.
This is a good use for outer apply. The structure of the query looks like this:
select a.*, b.col
from a outer apply
(select top 1 b.col
from b
where b.? = a.?
) b;
Normally, you would only use top 1 with order by. In this case, it doesn't seem to make a difference which row you choose.
You can group by on all columns from A, and then use an aggregate (like max or min) to pick any of the identical B values:
select a.*
, b.min_col1
from TableA a
left join
(
select a_id
, min(col1) as min_col1
from TableB
group by
a_id
) b
on b.a_id = a.id

How do I select group wise from a second table

This one is hard to explain and I'm sure I will facepalm when I see the solution, but I just can't get it right...
I have three tables:
Table A contains new records that I want to do something with.
Table B contains all activities from Table C of a specific type (done beforehand).
Table C is sort of a "master" table that contains all activities as well as a customer id and a lot of other stuff.
I need to select all activities that is in Table A from Table B. So far so good.
The part I can't get together is that I also need all the activities from Table B that has the same customer id as an activity contained in Table A.
This is what I'm after:
activity
2
3
4
5
6
The trick here is to get activity 2, because activity 2 is also done by customer 2, even though it is not in Table A.
Here are the tables:
Table A (new records)
activity
3
4
5
6
Table B (all records of a specific type from Table C)
activity
1
2 <-- How do I get this record as well?
3
4
5
6
Table C (all records)
activity customer
1 1
2 2
3 2
4 3
5 3
6 4
7 5
I tried something like this...
SELECT *
FROM table_b b
INNER JOIN table_c c
ON c.activity = b.activity
INNER JOIN table_a a
ON a.activity = b.activity
... but of course it only yields:
activity
3
4
5
6
How can I get activity 2 as well here?
To do this returning one column I would recommend staging the customer_ids of activities in Table_b that are in Table_a into a CTE (common table expression MSDN CTE) then select activities in table_c and join to the CTE to get only activities with a valid customer_id.
example of CTE: (Note the semi-colon ; before the WITH keyword is workaround for an issue in SQL 2005 with multiple statements. It it not necessary if you are in a newer version, or not running batch statements.)
;WITH cte_1 AS (
SELECT distinct c.customer --(only need a distinct result of customer ids)
from table_b b
join table_a a on b.activity = a.activity --(gets only activities in b that are in a)
join table_c c on b.activity = c.activity --(gets customer id of activies in table b)
)
SELECT a.activity
FROM table_c a
JOIN cte_1 b ON a.customer = b.customer
Alternatively you could do this in three joins with a select distinct. However I find the CTE to be an easier way to develop and think about this problem regardless of the way you decide to implement your solution. Although the three join solution will most likely scale and perform better over time with a growing data-set.
Example:
SELECT distinct d.activity
from table_b b
join table_a a on b.activity = a.activity --(gets only activities in b that are in a)
join table_c c on b.activity = c.activity --(gets customer id of activies in table b)
join table_c d ON c.customer = d.customer
Both would output:
2
3
4
5
6
Here is one way to do it
SELECT *
FROM TableB b1
WHERE EXISTS (SELECT 1
FROM Tablec c1
WHERE EXISTS (SELECT 1
FROM TableA a
INNER JOIN Tablec c
ON a.activity = c.activity
WHERE c.customer = c1.customer)
AND c1.activity = b1.activity)
Can you try doing a left join?
SELECT *
FROM table_b b
INNER JOIN table_c c
ON c.activity = b.activity
LEFT JOIN table_a a
ON b.activity = a.activity

SQL: Sort a table by the first relation to a second table

I have a Many-to-Many relationship between two tables. I'd like to sort the first table by the first relationship with the second table and only return a single result from that table. This is on SQL Server. I'd like something like this:
SELECT a.retrieve_me
FROM table_A AS a
JOIN table_B AS b ON a.foo = b.foo
JOIN table_C AS c ON b.bar = c.bar
ORDER BY c.sort_me
Unfortunately it returns MN(k) results, where M is the count of "table_A" and N(k) is the number of relations of a single row k with "table_C." To have it return just the results I wanted without post filtering I tried using DISTINCT on the SELECT clause and using TOP(SELECT COUNT(*) FROM table_A) but neither are valid syntax.
Any ideas? Hoping I can make this as performant as possible.
EDIT:
For clarity
table A
------------
"joe" 1
"betty" 2
"george" 3
table B
------------
1 2
1 3
2 3
2 4
3 1
table C
------------
1 "ashton"
2 "harding"
3 "spring"
4 "merry lane"
I'd like the results returned in the order of "george", "joe", and "betty" which is in the order (george -> ashton, joe -> harding, betty -> merry lane.)
If I understood correctly what you need, cause I think is very hard to follow you .. this should do it:
SELECT a.nm
FROM tablea a
cross apply (select top 1 *
from tableb b
join tablec c on b.id2 = c.id
where a.id = b.id1
order by c.nm) bc
order by bc.nm
http://sqlfiddle.com/#!3/661c0/5/0

sql left join and duplicates in result

Say I have 2 tables, A and B, each A entity can possibly have multiple B entities, in one case if I want to get all B's of some certain A's, I might do it with a simple left join
select A.id aid,B.id bid from A
left join B on B.aid = A.id
where A.id = 1
and it will return a result set like
aid bid
1 1
1 2
1 3
As you can see for the first column, all those 1's are kinda duplicates. Is it possible to modify the SQL statement to let him return a result like
aid bid
1 1,2,3
in other words to link all the bid's together as one entity?
Also what if there's another table C, and each A can have multiple C's, how to I make the SQL return a result set like
aid bid cid
1 1,2,3 1,2
instead of
aid bid cid
1 1 1
1 2 1
1 3 1
1 1 2
1 2 2
1 3 2
Thank you very much!
What DBMS are you using?
I can't speak for others, but in MySQL, starting from 4.1, you can use GROUP_CONCAT
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
EG:
select A.id aid, GROUP_CONCAT(DISTINCT B.id) bid from A
left join B on B.aid = A.id
where A.id = 1
GROUP BY a.id
Try using the COALESCE function.
http://www.sqlteam.com/article/using-coalesce-to-build-comma-delimited-string