SQL query uses "wrong" join - sql

I have an query which gives me the wrong result.
Tables:
A
+----+
| id |
+----+
| 1 |
| 2 |
+----+
B
+----+----+
| id | x | B.id = A.id
+----+----+
| 1 | 1 |
| 1 | 1 |
| 1 | 0 |
+----+----+
C
+----+----+
| id | y | C.id = A.id
+----+----+
| 1 | 1 |
| 1 | 2 |
+----+----+
What I want to do: Select all rows from A. For each row in A count in B all x with value 1 and all x with value 0 with B.id = A.id. For each row in A get the minimum y from C with C.id = A.id.
The result I am expecting is:
+----+------+--------+---------+
| id | min | count1 | count 2 |
+----+------+--------+---------+
| 1 | 1 | 2 | 1 |
| 2 | NULL | 0 | 0 |
+----+------+--------+---------+
First Try:
This doesn't work.
SELECT a.id,
MIN(c.y),
SUM(IF(b.x = 1, 1, 0)),
SUM(IF(b.x = 0, 1, 0))
FROM a
LEFT JOIN b
ON ( a.id = b.id )
LEFT JOIN c
ON ( a.id = c.id )
GROUP BY a.id
+----+------+--------+---------+
| id | min | count1 | count 2 |
+----+------+--------+---------+
| 1 | 1 | 4 | 2 |
| 2 | NULL | 0 | 0 |
+----+------+--------+---------+
Second Try:
This works but I am sure it has a bad performance.
SELECT a.id,
MIN(c.y),
b.x,
b.y
FROM a
LEFT JOIN (SELECT b.id, SUM(IF(b.x = 1, 1, 0)) x, SUM(IF(b.x = 0, 1, 0)) y FROM b) b
ON ( a.id = b.id )
LEFT JOIN c
ON ( a.id = c.id )
GROUP BY a.id
+----+------+--------+---------+
| id | min | count1 | count 2 |
+----+------+--------+---------+
| 1 | 1 | 2 | 1 |
| 2 | NULL | 0 | 0 |
+----+------+--------+---------+
Last Try:
This works too.
SELECT x.*,
SUM(IF(b.x = 1, 1, 0)),
SUM(IF(b.x = 0, 1, 0))
FROM (SELECT a.id,
MIN(c.y)
FROM a
LEFT JOIN c
ON ( a.id = c.id )
GROUP BY a.id) x
LEFT JOIN b
ON ( b.id = x.id )
GROUP BY x.id
Now my question is: Is the last one the best choise or is there a way to write this query with just one select statement (like in the first try)?

Your joins are doing cartesian products for a given value, because there are multiple rows in each table.
You can fix this by using count(distinct) rather than sum():
SELECT a.id, MIN(c.y),
count(distinct (case when b.x = 1 then b.id end)),
count(distinct (case when b.x = 0 then b.id end))
FROM a
LEFT JOIN b
ON ( a.id = b.id )
LEFT JOIN c
ON ( a.id = c.id )
GROUP BY a.id;
You can also fix this by pre-aggregating b (and/or c). And you would need to take that approach if your aggregation function were something like the sum of a column in b.
EDIT:
You are correct. The above query counts the distinct values of B, but B contains rows that are exact duplicates. (Personally, I think having a column with the name id that has duplicates is a sign of poor design, but that is another issue.)
You could solve it by having a real id in the b table, because then the count(distinct) would count the correct values. You can also solve it by aggregating the two tables before joining them in:
SELECT a.id, c.y, x1, x0
FROM a
LEFT JOIN (select b.id,
sum(b.x = 1) as x1,
sum(b.x = 0) as x0
from b
group by b.id
) b
ON ( a.id = b.id )
LEFT JOIN (select c.id, min(c.y) as y
from c
group by c.id
) c
ON ( a.id = c.id );
Here is a SQL Fiddle for the problem.
EDIT II:
You can get it in one statement, but I'm not so sure that it would work on similar data. The idea is that you can count all the cases where x = 1 and then divide by the number of rows in the C table to get the real distinct count:
SELECT a.id, MIN(c.y),
coalesce(sum(b.x = 1), 0) / count(distinct coalesce(c.y, -1)),
coalesce(sum(b.x = 0), 0) / count(distinct coalesce(c.y, -1))
FROM a
LEFT JOIN b
ON ( a.id = b.id )
LEFT JOIN c
ON ( a.id = c.id )
GROUP BY a.id;
It is a little tricky, because you have to handle NULLs to get the right values. Note that this is counting the y value to get a distinct count from the C table. Your question re-enforces why it is a good idea to have a unique integer primary key in every table.

Related

SQL - Intersections

I have table like below:
|Group|User|
| 1 | X |
| 1 | Y |
| 1 | Z |
| 2 | X |
| 2 | Y |
| 2 | Z |
| 3 | X |
| 3 | Z |
| 4 | X |
I want to calculate intersections of groups: 1&2, 1&2&3, 1&2&3&4.
Then I want to see users in each intersection and how many of them are there.
In the given example should be:
1&2 -> 3
1&2&3 -> 2
1&2&3&4 -> 1
Is it possible using SQL?
If yes, how can I start?
On a daily basis I'm working with Python, but now I have to translate this code into SQL code and I have huge problem with it.
In PostgreSQL you can use JOIN or INTERSECT. Example with JOIN:
select count(*)
from t a
join t b on b.usr = a.usr
where a.grp = 1 and b.grp = 2
Then:
select count(*)
from t a
join t b on b.usr = a.usr
join t c on c.usr = a.usr
where a.grp = 1 and b.grp = 2 and c.grp = 3
And:
select count(*)
from t a
join t b on b.usr = a.usr
join t c on c.usr = a.usr
join t d on d.usr = a.usr
where a.grp = 1 and b.grp = 2 and c.grp = 3 and d.grp = 4
Result in 3, 2, and 1 respectively.
EDIT - You can also do:
select count(*)
from (
select usr, count(*)
from t
where grp in (1, 2, 3, 4)
group by usr
having count(*) = 4 -- number of groups
) x
See running example at db<>fiddle.

JOIN - Returns 5 instead of 3 rows for repeating values in ID column

I have the following two tables:
Table A:
id|valA
--+----
1 | a
2 | b
2 | c
Table B:
id|valA
--+----
1 | A
2 | B
2 | C
What I expect to get using join is:
id|valA|valB
--+----+----
1 | a | A
2 | b | B
2 | c | C
However, I get the following
id|valA|valB
--+----+----
1 | a | A
2 | b | B
2 | c | B
2 | c | C
2 | b | C
I am using the following query:
select id, valA, valB from A
inner join B
on A.id = B.id;
How do I use the join to get the expected results?
What am I doing wrong?
Both tables have a duplicate ID of 2, so you get a result with all the possible combinations of the two rows in each table with that ID (bB, bC, cB and cC). If you can rely on the values, you can add them to the join condition:
SELECT a.id, valA, valB
FROM a
JOIN b ON a.id = a.id and valA = LOWER(valB)
If you can't make this assumption, you could add a rank pseudocolumn to the values with the same ID and use in in the join condition too, although it may be a bit clunky:
SELECT idA, valA, valB
FROM (SELECT id AS idA, valA, RANK() OVER (PARTITION BY id ORDER BY valA) AS rankA
FROM a) a
JOIN (SELECT id AS idB, valB, RANK() OVER (PARTITION BY id ORDER BY valB) AS rankB
FROM b) b ON idA = idB AND rankA = rankB

How to query all rows where a given column matches at least all the values in a given array with PostgreSQL?

The request below:
SELECT foos.id,bars.name
FROM foos
JOIN bar_foo ON (bar_foo.foo_id = id )
JOIN bars ON (bars.id = bar_foo.bar_id )
returns a list like this:
id | name
---+-----
1 | a
1 | b
2 | a
2 | y
2 | z
3 | a
3 | b
3 | c
3 | d
How to get the ids for which id must have at least a and b, and more generally the content of a given array ?
From the example above, I would get:
id | name
---+-----
1 | a
1 | b
3 | a
3 | b
3 | c
3 | d
For two values, you can use windowing boolean aggregation:
select *
from (
select f.id, b.name,
bool_or(b.name = 'a') over(partition by id) has_a,
bool_or(b.name = 'b') over(partition by id) has_b
from foos f
join bar_foo bf on bf.foo_id = f.id
join bars b on b.id = bf.bar_id
) t
where has_a and has_b
A more generic approach uses array aggregation:
select *
from (
select f.id, b.name,
array_agg(b.name) over(partition by id) arr_names
from foos f
join bar_foo bf on bf.foo_id = f.id
join bars b on b.id = bf.bar_id
) t
where arr_names #> array['a', 'b']

Get count of foreign key from multiple tables

I have 3 tables, with Table B & C referencing Table A via Foreign Key. I want to write a query in PostgreSQL to get all ids from A and also their total occurrences from B & C.
a | b | c
-----------------------------------
id | txt | id | a_id | id | a_id
---+---- | ---+----- | ---+------
1 | a | 1 | 1 | 1 | 3
2 | b | 2 | 1 | 2 | 4
3 | c | 3 | 3 | 3 | 4
4 | d | 4 | 4 | 4 | 4
Output desired (just the id from A & total count in B & C) :
id | Count
---+-------
1 | 2 -- twice in B
2 | 0 -- occurs nowhere
3 | 2 -- once in B & once in C
4 | 4 -- once in B & thrice in C
SQL so far SQL Fiddle :
SELECT a_id, COUNT(a_id)
FROM
( SELECT a_id FROM b
UNION ALL
SELECT a_id FROM c
) AS union_table
GROUP BY a_id
The query I wrote fetches from B & C and counts the occurrences. But if the key doesn't occur in B or C, it doesn't show up in the output (e.g. id=2 in output). How can I start my selection from table A & join/union B & C to get the desired output
If the query involves large parts of b and / or c it is more efficient to aggregate first and join later.
I expect these two variants to be considerably faster:
SELECT a.id,
, COALESCE(b.ct, 0) + COALESCE(c.ct, 0) AS bc_ct
FROM a
LEFT JOIN (SELECT a_id, count(*) AS ct FROM b GROUP BY 1) b USING (a_id)
LEFT JOIN (SELECT a_id, count(*) AS ct FROM c GROUP BY 1) c USING (a_id);
You need to account for the possibility that some a_id are not present at all in a and / or b. count() never returns NULL, but that's cold comfort in the face of LEFT JOIN, which leaves you with NULL values for missing rows nonetheless. You must prepare for NULL. Use COALESCE().
Or UNION ALL a_id from both tables, aggregate, then JOIN:
SELECT a.id
, COALESCE(ct.bc_ct, 0) AS bc_ct
FROM a
LEFT JOIN (
SELECT a_id, count(*) AS bc_ct
FROM (
SELECT a_id FROM b
UNION ALL
SELECT a_id FROM c
) bc
GROUP BY 1
) ct USING (a_id);
Probably slower. But still faster than solutions presented so far. And you could do without COALESCE() and still not loose any rows. You might get occasional NULL values for bc_ct, in this case.
Another option:
SELECT
a.id,
(SELECT COUNT(*) FROM b WHERE b.a_id = a.id) +
(SELECT COUNT(*) FROM c WHERE c.a_id = a.id)
FROM
a
Use left join with a subquery:
SELECT a.id, COUNT(x.id)
FROM a
LEFT JOIN (
SELECT id, a_id FROM b
UNION ALL
SELECT id, a_id FROM c
) x ON (a.id = x.a_id)
GROUP BY a.id;

SQL Join to Get Row with Maximum Value from Right table

I am having problem with sql join (oracle/ms sql)
I have two tables
A
ID | B_ID
---|------
1 | 1
1 | 4
2 | 3
2 | 2
----------
B
B_ID | B_VA| B_VB
-------|--------|-------
1 | 1 | a
2 | 2 | b
3 | 5 | c
4 | 2 | d
-----------------------
From these two tables I need A.ID, B.B_ID, B.B_VA (MAX), B.B_VB (with max B.B_VA)
So result table would be like
ID | B_ID | B_VA| B_VB
-------|--------|--------|-------
1 | 4 | 2 | d
2 | 3 | 5 | c
I tried some joins without success. Can anyone help me with query to get the result I want.
Thank you
Your logic as described doesn't quite correspond to the data. For instance, b_va is numeric, but the column in the output is a string.
Perhaps you want this. The data in a to be aggregated to get the maximum b_id value. Then each column to be joined to get the corresponding b_vb column. That, at least, conforms to your desired output:
select a.id, a.b_id, b1.b_vb as b_va, b2.b_vb
from (select id, max(b_id) as b_id
from a
group by id
) a join
b b1
on a.id = b1.b_id join
b b2
on a.b_id = b2.b_id;
EDIT:
For the corrected data, I think this is what you want:
select a.id, a.b_id, max(b1.b_va) as b_va, b2.b_vb
from (select id, max(b_id) as b_id
from a
group by id
) a join
b b1
on a.id = b1.b_id join
b b2
on a.b_id = b2.b_id
group by a.id, a.b_id, b2.b_vb;
Try this
SELECT X.ID, Y.B_ID, X.B_VA, Y.B_VB
FROM (SELECT A.ID, MAX(B_VA) AS B_VA
FROM A INNER JOIN B ON A.B_ID = B.B_ID
GROUP BY A.ID) AS X INNER JOIN
A AS Z ON X.ID = Z.ID INNER JOIN
B AS Y ON Z.B_ID=Y.B_ID AND X.B_VA=Y.B_VA