Excluding results from joined tables based on a single count value - sql

I have two tables that I'm joining, and would like to exclude any result that has a count greater than 1 for a value in the second table.
For this example, I have a table called movie_info, which has information about films, each with a unique ID. A second table called crew_info has information about each film's crew (keyed by the film's unique ID). Rather than one entry per film, it has one entry per crew member, so a single film can have several entries. Visually it would look like this:
movie_info:
+-------+
| id    |
+-------+
| '123' |
+-------+
crew_info:
+-------+--------+------------+
| id    | name   | role       |
+-------+--------+------------+
| '123' | 'John' | 'director' |
| '123' | 'Mary' | 'director' |
| '123' | 'Sue'  | 'writer'   |
+-------+--------+------------+
I join the two tables like so:
SELECT a.id, b.*
FROM movie_info as a
LEFT JOIN crew_info as b
on a.id = b.id
So far it's all standard. What I'm trying to do, though, is return only the results where crew_info has exactly one director. So a standalone query like this:
SELECT id
FROM crew_info
WHERE role = 'director'
GROUP BY id
HAVING count(id) = 1
successfully excludes results like this example, where there is more than 1 director. But how exactly do I join this with the movie_info table, so that it's all in one query?
I'm sorry if this is unclear. I am relatively new to SQL so please let me know if there's anything I haven't expressed properly. Thank you.
EDIT: One more thing I forgot about, sorry! If the count is more than 1, how do I still include results if another value is matched? So let's say movie_info also had a field called sequel_id, which is only filled out if the film is a sequel. I want to exclude results that have a director count > 1, AND an empty or null sequel_id, but include results that have a director count > 1 AND a valid sequel_id value. I tried something like (HAVING COUNT(*) = 1 OR (HAVING COUNT(*) > 1 AND sequel_id IS NOT NULL)) but I'm getting a syntax error.

Just use your existing query as an EXISTS sub-query:
SELECT *
FROM movie_info
WHERE EXISTS (
SELECT 1
FROM crew_info
WHERE crew_info.id = movie_info.id
AND crew_info.role = 'director'
HAVING COUNT(*) = 1
)
OR sequel_id IS NOT NULL
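The EXISTS approach can be checked quickly against a throwaway database. The sketch below runs it in SQLite through Python's standard sqlite3 module; the films '456' and '789', their crew, and the sequel_id values are made-up sample data. It adds GROUP BY c.id inside the sub-query because SQLite versions before 3.39 reject HAVING without GROUP BY; since the sub-query is already restricted to a single film, the result is the same.

```python
import sqlite3

# Hypothetical sample data: film '123' has two directors and no sequel_id,
# '456' has one director, '789' has two directors but is a sequel.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie_info (id TEXT PRIMARY KEY, sequel_id TEXT);
    CREATE TABLE crew_info (id TEXT, name TEXT, role TEXT);
    INSERT INTO movie_info VALUES ('123', NULL), ('456', NULL), ('789', '123');
    INSERT INTO crew_info VALUES
        ('123', 'John', 'director'), ('123', 'Mary', 'director'),
        ('123', 'Sue', 'writer'), ('456', 'Ann', 'director'),
        ('789', 'Bob', 'director'), ('789', 'Eve', 'director');
""")

# Keep a film if it has exactly one director, or if it has a sequel_id.
query = """
    SELECT m.id
    FROM movie_info AS m
    WHERE EXISTS (
        SELECT 1
        FROM crew_info AS c
        WHERE c.id = m.id AND c.role = 'director'
        GROUP BY c.id          -- needed by SQLite < 3.39 before HAVING
        HAVING COUNT(*) = 1
    )
    OR m.sequel_id IS NOT NULL
    ORDER BY m.id
"""
kept = [row[0] for row in conn.execute(query)]
print(kept)  # ['456', '789']
```

Film '123' is dropped (two directors, no sequel_id), while '789' survives through the OR branch.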

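Try the following, using a CTE:
with cte as
(
SELECT id
FROM crew_info
WHERE role = 'director'
group by id
HAVING count(id) > 1
) select a.* FROM movie_info as a
LEFT JOIN cte as b
on a.id = b.id
where b.id is null
The LEFT JOIN only finds a match for films with more than one director, so keeping the rows where b.id is null excludes those films.
Without a CTE, you can use a subquery instead:
select a.* FROM movie_info as a
left join (
SELECT id
FROM crew_info
WHERE role = 'director'
group by id
HAVING count(id) > 1
) b on a.id=b.id
where b.id is null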
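The anti-join idea (LEFT JOIN to the set of films that have several directors, then keep only the unmatched rows) can be sketched the same way, in SQLite via Python's sqlite3. The second film '456' and its director are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie_info (id TEXT PRIMARY KEY);
    CREATE TABLE crew_info (id TEXT, name TEXT, role TEXT);
    INSERT INTO movie_info VALUES ('123'), ('456');
    INSERT INTO crew_info VALUES
        ('123', 'John', 'director'), ('123', 'Mary', 'director'),
        ('123', 'Sue', 'writer'), ('456', 'Ann', 'director');
""")

query = """
    SELECT a.id
    FROM movie_info AS a
    LEFT JOIN (
        -- one row per film that has more than one director
        SELECT id
        FROM crew_info
        WHERE role = 'director'
        GROUP BY id
        HAVING COUNT(id) > 1
    ) AS b ON a.id = b.id
    WHERE b.id IS NULL   -- unmatched: the film does not have several directors
    ORDER BY a.id
"""
kept = [row[0] for row in conn.execute(query)]
print(kept)  # ['456']
```

Note that this keeps films with zero or one director; an EXISTS with HAVING COUNT(*) = 1 keeps only films with exactly one.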

Related

Sum of two counts from one table with additional data from another table

I have two tables as follows:
TABLE A
| id | col_a | col_b | user_id |
--------------------------------
| 1 | false | true | 1 |
| 2 | false | true | 2 |
| 3 | true | true | 2 |
| 4 | true | true | 3 |
| 5 | true | false | 1 |
TABLE B
| id | name |
--------------
| 1 | Bob |
| 2 | Jim |
| 3 | Helen |
| 4 | Michael|
| 5 | Jen |
I want to get the sum of two counts, which are the number of true values in col_a and number of true values in col_b. I want to group that data by user_id. I also want to join Table B and get the name of each user. The result would look like this:
|user_id|total (col_a + col_b)|name
------------------------------------
| 1 | 2 | Bob
| 2 | 3 | Jim
| 3 | 2 | Helen
So far I got the total sum with the following query:
SELECT
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_a" is true)+
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_b" is true)
as total
However, I'm not sure how to proceed with grouping these counts by user_id.
Something like this is typically fastest:
SELECT *
FROM "TABLE_B" b
JOIN (
SELECT user_id AS id
, count(*) FILTER (WHERE col_a)
+ count(*) FILTER (WHERE col_b) AS total
FROM "TABLE_A"
GROUP BY 1
) a USING (id);
While fetching all rows, aggregate first, join later. That's cheaper. See:
Query with LEFT JOIN not returning rows for count of 0
The aggregate FILTER clause is typically fastest. See:
For absolute performance, is SUM faster or COUNT?
Aggregate columns with additional (distinct) filters
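Aggregating first and joining later can be tried on the question's own data. This sketch assumes SQLite via Python's sqlite3, lower-case table names table_a and table_b, and booleans stored as 0/1. It uses COUNT(CASE ...) rather than the Postgres FILTER clause, so it does not depend on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The question's data; booleans are stored as 0/1 (assumption for SQLite).
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
""")

query = """
    SELECT b.id AS user_id, a.total, b.name
    FROM table_b AS b
    JOIN (
        -- aggregate table_a down to one row per user first ...
        SELECT user_id,
               COUNT(CASE WHEN col_a THEN 1 END)
             + COUNT(CASE WHEN col_b THEN 1 END) AS total
        FROM table_a
        GROUP BY user_id
    ) AS a ON a.user_id = b.id   -- ... then join the small result
    ORDER BY b.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 2, 'Bob'), (2, 3, 'Jim'), (3, 2, 'Helen')]
```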
Often, you want to keep total counts of 0 in the result. You did say:
get the name of each user.
SELECT b.id AS user_id, b.name, COALESCE(a.total, 0) AS total
FROM "TABLE_B" b
LEFT JOIN (
SELECT user_id AS id
, count(col_a OR NULL)
+ count(col_b OR NULL) AS total
FROM "TABLE_A"
GROUP BY 1
) a USING (id);
...
count(col_a OR NULL) is an equivalent alternative, shortest, and still fast. (Use the FILTER clause from above for best performance.)
The LEFT JOIN keeps all rows from "TABLE_B" in the result.
COALESCE() returns 0 instead of NULL for the total count.
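The effect of the LEFT JOIN plus COALESCE() shows up with the question's data, where Michael and Jen have no rows in the first table. This is a sketch in SQLite via Python's sqlite3, assuming lower-case table names and 0/1 booleans; SQLite also evaluates 0 OR NULL to NULL, so the count(col OR NULL) idiom behaves the same there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
""")

query = """
    SELECT b.id AS user_id, b.name, COALESCE(a.total, 0) AS total
    FROM table_b AS b
    LEFT JOIN (
        -- col OR NULL is NULL for false, so COUNT only counts true values
        SELECT user_id,
               COUNT(col_a OR NULL) + COUNT(col_b OR NULL) AS total
        FROM table_a
        GROUP BY user_id
    ) AS a ON a.user_id = b.id   -- keeps users without any rows in table_a
    ORDER BY b.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```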
If col_a and col_b have only a few true values, this is typically (much) faster, and basically what you had already:
SELECT b.*, COALESCE(aa.ct, 0) + COALESCE(ab.ct, 0) AS total
FROM "TABLE_B" b
LEFT JOIN (
SELECT user_id AS id, count(*) AS ct
FROM "TABLE_A"
WHERE col_a
GROUP BY 1
) aa USING (id)
LEFT JOIN (
SELECT user_id AS id, count(*) AS ct
FROM "TABLE_A"
WHERE col_b
GROUP BY 1
) ab USING (id);
Especially with (small in this case!) partial indexes like:
CREATE INDEX a_true_idx on "TABLE_A" (user_id) WHERE col_a;
CREATE INDEX b_true_idx on "TABLE_A" (user_id) WHERE col_b;
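SQLite also supports partial indexes, so the split-aggregate variant can be sketched there via Python's sqlite3. Table names and 0/1 booleans are assumptions; whether the planner actually uses the partial indexes depends on the engine, data, and statistics, so the sketch only checks that the results match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
    -- partial indexes: only rows where the flag is true are indexed
    CREATE INDEX a_true_idx ON table_a (user_id) WHERE col_a;
    CREATE INDEX b_true_idx ON table_a (user_id) WHERE col_b;
""")

query = """
    SELECT b.id, b.name, COALESCE(aa.ct, 0) + COALESCE(ab.ct, 0) AS total
    FROM table_b AS b
    LEFT JOIN (SELECT user_id, COUNT(*) AS ct
               FROM table_a WHERE col_a GROUP BY user_id) AS aa
           ON aa.user_id = b.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS ct
               FROM table_a WHERE col_b GROUP BY user_id) AS ab
           ON ab.user_id = b.id
    ORDER BY b.id
"""
rows = conn.execute(query).fetchall()
indexes = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
).fetchall()
print(rows)
```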
Aside: use legal, lower-case, unquoted names in Postgres to make your life simpler.
Are PostgreSQL column names case-sensitive?
select user_id,name
, count(case when col_a = true then 1 end)
+ count(case when col_b = true then 1 end) total
from tableA a
join TableB b on a.user_id= b.id
group by user_id,name
You are double-counting Jim. If that is not intended (he shows up in only two rows, not three), you can do the following:
with cte_A as (
select id, user_id
from A
where col_a = true
union -- use UNION ALL instead to double-count Jim
select id, user_id
from A
where col_b = true
)
select B.id as user_id, count(*) as total, B.name
from cte_A
join B
on cte_A.user_id = B.id
group by B.id, B.name
Selecting the row id keeps UNION from collapsing different rows of the same user; it only removes the duplicate of a row that matched both col_a and col_b. If you actually want to double-count, use UNION ALL instead of UNION.
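The difference between UNION and UNION ALL here can be checked on the question's data. This sketch assumes SQLite via Python's sqlite3, lower-case table names, and 0/1 booleans, and keeps each row's id in the CTE so that UNION only removes a row matched by both conditions. set_op is only ever one of two literal keywords, so building the SQL with an f-string is safe in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
""")

def totals(set_op):
    """Count matching rows per user; set_op is 'UNION' or 'UNION ALL'."""
    query = f"""
        WITH cte_a AS (
            SELECT id, user_id FROM table_a WHERE col_a
            {set_op}
            SELECT id, user_id FROM table_a WHERE col_b
        )
        SELECT b.id AS user_id, COUNT(*) AS total, b.name
        FROM cte_a
        JOIN table_b AS b ON cte_a.user_id = b.id
        GROUP BY b.id, b.name
        ORDER BY b.id
    """
    return conn.execute(query).fetchall()

print(totals("UNION"))      # [(1, 2, 'Bob'), (2, 2, 'Jim'), (3, 1, 'Helen')]
print(totals("UNION ALL"))  # [(1, 2, 'Bob'), (2, 3, 'Jim'), (3, 2, 'Helen')]
```

With UNION, a row where both flags are true counts once; with UNION ALL it counts twice, which matches the expected output in the question.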

Left join command is not showing all results

I have a table RESTAURANT:
Id | Name
------------------
0 | 'McDonalds'
1 | 'Burger King'
2 | 'Starbucks'
3 | 'Pans'
And a table ORDER:
Id | ResId | Client
--------------------
0 | 1 | 'Peter'
1 | 2 | 'John'
2 | 2 | 'Peter'
Where 'ResId' is a foreign key from RESTAURANT.Id.
I want to select the number of orders per restaurant:
Expected result:
Restaurant | Number of orders
----------------------------------
'McDonalds' | 0
'Burger King' | 1
'Starbucks' | 2
'Pans' | 0
Actual result:
Restaurant | Number of orders
----------------------------------
'McDonalds' | 0
'Burger King' | 1
'Starbucks' | 2
Command used:
select r.Name, count(o.ResId)
from RESTAURANT r
left join ORDER o on r.Id like o.ResId
group by o.ResId;
Just fix the group by clause:
select r.name, count(o.resid) as cnt_orders
from restaurants r
left join orders o on r.id = o.resid
group by r.id, r.name;
That way, the SELECT and GROUP BY clauses are consistent. I also added the restaurant id to the group, so restaurants that happen to share a name are not aggregated together, and I changed like to =, which is more efficient and does not alter the logic. Finally, I count o.resid rather than *: with a LEFT JOIN, count(*) would also count the NULL-extended row and report 1 for restaurants without orders.
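Which expression is counted matters with a LEFT JOIN. A sketch in SQLite via Python's sqlite3, using the question's data; the orders table is renamed from ORDER (an SQL keyword) as an assumption. COUNT(*) counts the NULL-extended row produced for a restaurant without orders, while COUNT(o.resid) skips NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         resid INTEGER REFERENCES restaurant (id),
                         client TEXT);
    INSERT INTO restaurant VALUES (0, 'McDonalds'), (1, 'Burger King'),
                                  (2, 'Starbucks'), (3, 'Pans');
    INSERT INTO orders VALUES (0, 1, 'Peter'), (1, 2, 'John'), (2, 2, 'Peter');
""")

query = """
    SELECT r.name,
           COUNT(*)       AS count_star,    -- also counts the NULL-extended row
           COUNT(o.resid) AS count_orders   -- NULLs skipped: 0 for no orders
    FROM restaurant AS r
    LEFT JOIN orders AS o ON r.id = o.resid
    GROUP BY r.id, r.name
    ORDER BY r.id
"""
rows = conn.execute(query).fetchall()
print(rows)
# [('McDonalds', 1, 0), ('Burger King', 1, 1), ('Starbucks', 2, 2), ('Pans', 1, 0)]
```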
You could also phrase this with a subquery, so there is no need for outer aggregation. I would prefer:
select r.*,
(select count(*) from orders o where o.resid = r.id) as cnt_orders
from restaurants r
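The correlated-subquery version needs no outer GROUP BY and returns 0 for restaurants without orders, because COUNT(*) over an empty set is 0. A sketch in SQLite via Python's sqlite3, with the table names restaurant and orders assumed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, resid INTEGER, client TEXT);
    INSERT INTO restaurant VALUES (0, 'McDonalds'), (1, 'Burger King'),
                                  (2, 'Starbucks'), (3, 'Pans');
    INSERT INTO orders VALUES (0, 1, 'Peter'), (1, 2, 'John'), (2, 2, 'Peter');
""")

query = """
    SELECT r.name,
           -- evaluated once per restaurant; empty set gives COUNT(*) = 0
           (SELECT COUNT(*) FROM orders AS o WHERE o.resid = r.id) AS cnt_orders
    FROM restaurant AS r
    ORDER BY r.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('McDonalds', 0), ('Burger King', 1), ('Starbucks', 2), ('Pans', 0)]
```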
Your query should be generating an error because the select columns and the group by columns are incompatible. Just aggregate by the unaggregated columns in the select:
select r.Name, count(o.ResId)
from RESTAURANT r left join
ORDER o
on r.Id = o.ResId
group by r.Name;
Notes:
You might want to include r.id in the GROUP BY (and SELECT) in case restaurants can have the same name.
Note the use of = instead of LIKE. The ids look like numbers, so you should use number operations. LIKE is a string operation.
ORDER is a bad name for a table because it is a SQL keyword.
As a general rule, in a LEFT JOIN, you don't want the aggregation keys to be from the second table, because those values could be NULL.
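The last note explains the missing row in the question: grouping by o.ResId puts every restaurant without orders into a single NULL group. A sketch in SQLite via Python's sqlite3 (orders table name assumed) that groups both ways:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, resid INTEGER, client TEXT);
    INSERT INTO restaurant VALUES (0, 'McDonalds'), (1, 'Burger King'),
                                  (2, 'Starbucks'), (3, 'Pans');
    INSERT INTO orders VALUES (0, 1, 'Peter'), (1, 2, 'John'), (2, 2, 'Peter');
""")

# Grouping by the second table's key: McDonalds and Pans share the NULL group.
wrong = conn.execute("""
    SELECT o.resid, COUNT(o.resid)
    FROM restaurant AS r
    LEFT JOIN orders AS o ON r.id = o.resid
    GROUP BY o.resid
    ORDER BY o.resid
""").fetchall()

# Grouping by the first table's key: one group per restaurant.
right = conn.execute("""
    SELECT r.id, COUNT(o.resid)
    FROM restaurant AS r
    LEFT JOIN orders AS o ON r.id = o.resid
    GROUP BY r.id
    ORDER BY r.id
""").fetchall()

print(wrong)  # [(None, 0), (1, 1), (2, 2)]
print(right)  # [(0, 0), (1, 1), (2, 2), (3, 0)]
```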

SQL - Select first group in group by

I have this table in DB2:
+----+-----+----------+
| id | name| key |
+----+-----+----------+
| 1 | foo |111000 |
| 2 | bar |111000 |
| 3 | foo |000111 |
+----+-----+----------+
When I group by name, I can extract the table grouped by name, but how can I automatically extract only the first group, to get this result:
+----+-----+----------+
| id | name| key |
+----+-----+----------+
| 1 | foo |111000 |
| 3 | foo |000111 |
+----+-----+----------+
How can I solve this?
The MIN function identifies which row is the first one by id; you can then use that to filter the result to show only that row.
SELECT id,name,key
FROM Table1
WHERE id IN (SELECT MIN(ID) FROM Table1 GROUP BY name,key)
You could use an inner join on a subselect aggregated by min id:
select * from mytable
inner join (
select min(id) my_id
from mytable
group by name, key
) t on t.my_id = mytable.id
It looks like you want to get all rows whose name is the same as that of the row with min(id). If this is correct, then this should work:
select t1.* from Table1 t1
inner join (
select name
from Table1
where id = (select min(id) from Table1)
) t on t.name = t1.name
Otherwise, please explain what you mean by "first group" and how that is defined.
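Assuming "first group" means the group containing the row with the smallest id, as the answer above guesses, the filter can be checked on the question's data. This sketch uses SQLite via Python's sqlite3 rather than DB2; the table name table1 is hypothetical, and key is stored as text so the leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, "key" TEXT);
    INSERT INTO table1 VALUES (1, 'foo', '111000'), (2, 'bar', '111000'),
                              (3, 'foo', '000111');
""")

query = """
    SELECT t1.id, t1.name, t1."key"
    FROM table1 AS t1
    JOIN (
        -- the name of the row with the smallest id defines the first group
        SELECT name FROM table1 WHERE id = (SELECT MIN(id) FROM table1)
    ) AS first_group ON first_group.name = t1.name
    ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'foo', '111000'), (3, 'foo', '000111')]
```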
In theory, given the way the question is asked, you could also just do a simple select on the name you want.
SELECT id,name,key
From Table1
Where name = 'foo'
It really depends what you mean by 'first group'. If you grouped by name and ordered ascending by name then 'bar' would actually be the 'first group', not 'foo'. Maybe if you clarify that we can give you better answers?

Select and group results using the same column as a parameter

I have a query that returns the following result (example):
+----+-----------+------------+
| ID | FirstName | CourseName |
+----+-----------+------------+
| 1 | Alice | X |
| 2 | Bob | X |
| 2 | Bob | Y |
+----+-----------+------------+
The query takes 3 tables (users, user-courses and course) and, using JOINs, returns the id of each user, their first name, and the names of all the courses they are in.
I need to create a query that returns users who are in specific courses, for example:
Select all users in course X: returns the details of both Alice and Bob.
Select all users in courses X AND Y: returns only Bob, since Alice isn't in course Y.
The result of the X AND Y query will be:
+----+-----------+
| ID | FirstName |
+----+-----------+
| 2 | Bob |
+----+-----------+
Assuming that the user and course tables each have id and name columns, and that user-courses has only foreign-key ids, you can do the following:
For the first question:
select u.* from user u
inner join user-courses uc on uc.user_id=u.id
inner join course c on c.id=uc.course_id and c.name='X';
It filters users through the inner joins and filters the course in the last condition (c.name = 'X'). You can filter in any other way.
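The single-course filter can be sketched in SQLite via Python's sqlite3. The table and column names users, user_courses, course, and first_name are hypothetical stand-ins (user-courses is not a valid unquoted identifier), and the course name is passed as a query parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_courses (user_id INTEGER, course_id INTEGER);
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO course VALUES (10, 'X'), (20, 'Y');
    INSERT INTO user_courses VALUES (1, 10), (2, 10), (2, 20);
""")

query = """
    SELECT u.id, u.first_name
    FROM users AS u
    JOIN user_courses AS uc ON uc.user_id = u.id
    -- the course filter sits in the join condition, as in the answer
    JOIN course AS c ON c.id = uc.course_id AND c.name = ?
    ORDER BY u.id
"""
in_x = conn.execute(query, ("X",)).fetchall()
print(in_x)  # [(1, 'Alice'), (2, 'Bob')]
```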
For the second one:
select * from user
where id in (
select distinct a.* from (
select user_id from user-courses uc inner join course c
on c.id=uc.course_id
where c.name='X'
) a
inner join (
select user_id from user-courses uc inner join course c
on c.id=uc.course_id
where c.name='Y'
) b
on a.user_id=b.user_id
);
MS Access doesn't have INTERSECT, so I used an inner join (between a and b) to achieve the same result. a is the set of users from course 'X' and b the set from course 'Y'. The inner join intersects the two, resulting in the users who are in both courses. Then I used those ids to filter.
I don't have MS Access, so I tried it in PostgreSQL, but I used ANSI SQL, so I hope it works there too.
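The inner-join-as-intersection technique can be checked in SQLite via Python's sqlite3, with the same hypothetical names users, user_courses, course, and first_name. Subquery a holds the users in course 'X' and b the users in course 'Y'; joining them on user_id keeps only the users in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_courses (user_id INTEGER, course_id INTEGER);
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO course VALUES (10, 'X'), (20, 'Y');
    INSERT INTO user_courses VALUES (1, 10), (2, 10), (2, 20);
""")

query = """
    SELECT u.id, u.first_name
    FROM users AS u
    WHERE u.id IN (
        SELECT a.user_id
        FROM (SELECT uc.user_id FROM user_courses AS uc
              JOIN course AS c ON c.id = uc.course_id
              WHERE c.name = 'X') AS a          -- users in course X
        JOIN (SELECT uc.user_id FROM user_courses AS uc
              JOIN course AS c ON c.id = uc.course_id
              WHERE c.name = 'Y') AS b          -- users in course Y
          ON a.user_id = b.user_id              -- intersection of the two
    )
    ORDER BY u.id
"""
in_both = conn.execute(query).fetchall()
print(in_both)  # [(2, 'Bob')]
```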

Choose rows based on two connected column values in one statement - ORACLE

First, I'm not sure the title best represents the issue; any better suggestion is welcome. My problem is that I have the following table:
+----+----------+-------+-----------------+
| ID | SUPPLIER | BUYER | VALIDATION_CODE |
+----+----------+-------+-----------------+
| 1 | A | Z | 937886521 |
| 2 | A | X | 937886521 |
| 3 | B | Z | 145410916 |
| 4 | C | V | 775709785 |
+----+----------+-------+-----------------+
I need to show SUPPLIERS A and B, which have BUYERS Z and X. However, I want this condition to be a one-to-one relationship rather than one-to-many. That is, for supplier A, I want to show only one of the rows with ID 1 and 2, and for supplier B, only row 3. The following script shows supplier A with all possible buyers (which I do not want):
SELECT *
FROM validation
WHERE supplier IN ( 'A', 'B' )
AND buyer IN ( 'X', 'Z');
This shows the following pairs: (A,Z), (A,X), (B,Z). I need to show only (A,X) and (B,Z), in one statement.
The desired result should be like this:
+----+----------+-------+-----------------+
| ID | SUPPLIER | BUYER | VALIDATION_CODE |
+----+----------+-------+-----------------+
| 2 | A | X | 937886521 |
| 3 | B | Z | 145410916 |
+----+----------+-------+-----------------+
You can update the WHERE clause to filter on the desired pairs:
select *
from sample
where (upper(supplier),upper(buyer))
in (('A','X'),('A','Y'),('A','Z'),('B','X'),('B','Y'),('B','Z'));
I used the UPPER function based on your mixed case examples.
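Filtering on explicit (supplier, buyer) pairs can be sketched with the question's data and only the two pairs the question wants. The sketch runs in SQLite via Python's sqlite3; SQLite requires the right-hand side of a row-value IN to be a subquery, so the pairs are written as a VALUES list, whereas Oracle accepts the parenthesized list form used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE validation (id INTEGER PRIMARY KEY, supplier TEXT,
                             buyer TEXT, validation_code TEXT);
    INSERT INTO validation VALUES
        (1, 'A', 'Z', '937886521'), (2, 'A', 'X', '937886521'),
        (3, 'B', 'Z', '145410916'), (4, 'C', 'V', '775709785');
""")

query = """
    SELECT id, supplier, buyer, validation_code
    FROM validation
    -- row-value comparison: both columns must match one listed pair
    WHERE (supplier, buyer) IN (VALUES ('A', 'X'), ('B', 'Z'))
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 'A', 'X', '937886521'), (3, 'B', 'Z', '145410916')]
```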
See if this is what you need:
SELECT MAX(id),
supplier,
MAX(buyer),
MAX(validation_code)
FROM
(SELECT *
FROM Validation
WHERE supplier IN ( 'A', 'B' ) AND buyer IN ( 'X', 'Z')
) filtered
GROUP BY supplier;
I used GROUP BY supplier to flatten the table and included maximum values of ID, Buyer, and Validation_Code.
Alternatively, you could try this:
SELECT id
, supplier
, buyer
, validation_code
FROM (SELECT id
,max(id) OVER(PARTITION BY supplier) AS maxid
,supplier
,buyer
,validation_code
FROM sample
WHERE supplier IN ('A', 'B')
AND buyer IN ('X', 'Z')) x
WHERE x.id=x.maxid
You can look at the results of the inner SQL statement to see what it does.
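The window-function approach can be tried on the question's data, with the supplier/buyer filter applied inside the subquery (taken from the question's WHERE clause). The sketch uses SQLite via Python's sqlite3, which needs SQLite 3.25 or newer for window functions; note that Oracle itself does not accept AS before a table alias, while SQLite does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE validation (id INTEGER PRIMARY KEY, supplier TEXT,
                             buyer TEXT, validation_code TEXT);
    INSERT INTO validation VALUES
        (1, 'A', 'Z', '937886521'), (2, 'A', 'X', '937886521'),
        (3, 'B', 'Z', '145410916'), (4, 'C', 'V', '775709785');
""")

query = """
    SELECT id, supplier, buyer, validation_code
    FROM (
        SELECT id, supplier, buyer, validation_code,
               -- highest id among the filtered rows of each supplier
               MAX(id) OVER (PARTITION BY supplier) AS maxid
        FROM validation
        WHERE supplier IN ('A', 'B') AND buyer IN ('X', 'Z')
    ) AS x
    WHERE x.id = x.maxid
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 'A', 'X', '937886521'), (3, 'B', 'Z', '145410916')]
```

The window is evaluated after the inner WHERE, so supplier C never reaches the outer query.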
Try this query:
select ID,SUPPLIER,BUYER,VALIDATION_CODE from
(select
t2.*,t1.counter
from
validation t2,
(select supplier,count(supplier) as counter from validation group by supplier)t1
where
t1.supplier = t2.supplier)t3
where t3.supplier in('A','B') and
id = case when t3.counter > 1 then
(select max(id) from validation t4 where t4.supplier = t3.supplier) else t3.id end;