In SQL Query a one-to-many relationship with condition - sql

I have the following tables:
event_tbl
| event_id (PK) | event_date | event_location |
|---------------|------------|----------------|
| 1 | 01/01/2018 | Miami |
| 2 | 02/04/2018 | Tampa |
performer_tbl
| performer_id (PK) | event_id (FK) | genre |
|-------------------|---------------|-------|
| 1 | 1 | A |
| 2 | 1 | B |
| 3 | 2 | A |
| 4 | 2 | C |
I want to find events that have both genre A and genre B (should just return event 1), and I'm lost on writing the query. Maybe I just haven't had enough coffee, but all I can come up with is doing two derived columns with a case statement that count either genre and group by the event_id, then filtering both to >0. It just doesn't seem very elegant.

This should do the job (in MySQL; the syntax is easy to adapt for other DBMSs):
SELECT
e.event_id
FROM
event_tbl e
JOIN performer_tbl p USING(event_id)
GROUP BY e.event_id
HAVING SUM(IF(p.genre = 'A', 1, 0)) >= 1 AND SUM(IF(p.genre = 'B', 1, 0)) >= 1;
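As a rough cross-check of the grouping idea, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python. SQLite and the Python harness are assumptions made for illustration (the answer above targets MySQL), and the HAVING clause is a variation on the one above: filter to genres A and B first, then require two distinct genres per event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_tbl (event_id INTEGER PRIMARY KEY, event_date TEXT, event_location TEXT);
    CREATE TABLE performer_tbl (performer_id INTEGER PRIMARY KEY, event_id INTEGER, genre TEXT);
    -- sample rows copied from the question (dates kept as plain text)
    INSERT INTO event_tbl VALUES (1, '01/01/2018', 'Miami'), (2, '02/04/2018', 'Tampa');
    INSERT INTO performer_tbl VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'A'), (4, 2, 'C');
""")

# Keep only A/B performers, then demand that both genres appear for the event.
both_genres = conn.execute("""
    SELECT event_id
    FROM performer_tbl
    WHERE genre IN ('A', 'B')
    GROUP BY event_id
    HAVING COUNT(DISTINCT genre) = 2
    ORDER BY event_id
""").fetchall()

print(both_genres)
```

On the sample data only event 1 has performers in both genres, so it is the only row printed.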

If you are using SQL Server, try the following:
Select * From
event_tbl
where event_id
IN
(
select event_id
from performer_tbl as A
where exists (select 1
from performer_tbl as B
where B.event_id = A.event_id and B.genre = 'A')
and
exists (select 1
from performer_tbl as B
where B.event_id = A.event_id and B.genre = 'B')
)

This should work in any SQL database (at least in MySQL, SQL Server, Postgres, or Oracle):
select event_tbl.* FROM (
select event_id
from performer_tbl
where genre = 'A'
GROUP BY event_id) a_t
INNER JOIN (select event_id
from performer_tbl
where genre = 'B'
GROUP BY event_id) b_t
ON a_t.event_id = b_t.event_id
INNER JOIN event_tbl
ON event_tbl.event_id = a_t.event_id

This also works using left joins. (It needs no function calls or sub-selects, which should help performance, and it is usable in most SQL engines.)
SELECT DISTINCT
p1.event_id
,e.event_date
,e.event_location
FROM
performer_tbl as p1
inner join event_tbl as e on
p1.event_id = e.event_id
left outer join performer_tbl as p2 on
p1.event_id = p2.event_id
AND p2.genre = 'A'
left outer join performer_tbl as p3 on
p1.event_id = p3.event_id
AND p3.genre = 'B'
WHERE
p2.genre IS NOT NULL
AND p3.genre IS NOT NULL;

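If I correctly understand what you need, you can try this (note that IN ('A', 'B') matches events that have genre A or genre B, so the result is not limited to events that have both):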
Select *
from event_tbl e
where exists (select *
from performer_tbl p
where p.event_id = e.event_id
and p.genre in ('A', 'B'))
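To see what a single EXISTS with genre IN ('A', 'B') returns on the question's data, and how it differs from requiring both genres, here is a small sketch. It assumes an in-memory SQLite database driven from Python, which is not what any of the answers used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_tbl (event_id INTEGER PRIMARY KEY, event_location TEXT);
    CREATE TABLE performer_tbl (performer_id INTEGER PRIMARY KEY, event_id INTEGER, genre TEXT);
    INSERT INTO event_tbl VALUES (1, 'Miami'), (2, 'Tampa');
    INSERT INTO performer_tbl VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'A'), (4, 2, 'C');
""")

# One EXISTS with IN: true as soon as the event has genre A *or* genre B.
either_genre = conn.execute("""
    SELECT e.event_id
    FROM event_tbl e
    WHERE EXISTS (SELECT 1 FROM performer_tbl p
                  WHERE p.event_id = e.event_id AND p.genre IN ('A', 'B'))
    ORDER BY e.event_id
""").fetchall()

# Two EXISTS predicates: the event must have genre A *and* genre B.
both_genres = conn.execute("""
    SELECT e.event_id
    FROM event_tbl e
    WHERE EXISTS (SELECT 1 FROM performer_tbl p
                  WHERE p.event_id = e.event_id AND p.genre = 'A')
      AND EXISTS (SELECT 1 FROM performer_tbl p
                  WHERE p.event_id = e.event_id AND p.genre = 'B')
    ORDER BY e.event_id
""").fetchall()

print(either_genre)
print(both_genres)
```

The first query returns both events because event 2 has a genre A performer; only the second form answers the "both A and B" question.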

Related

select single row from foreign table in left join

I want to fetch only the first row where the foreign key matches, but I don't know how to select just that first row.
events table
id | name
----------------
1 | john
----------------
2 | Cat
event_attendee table
id | event_id | type
--------------------------
1 | 1 | User
--------------------------
2 | 1 | Local
--------------------------
3 | 1 | User
--------------------------
4 | 2 | User
--------------------------
5 | 2 | User
I want this result
id | name | event_id | type
------------------------------------
1 | John | 1 | User
------------------------------------
2 | Cat | 2 | User
Tried
select
a.*,
b.*
from
events as a
left join (
select
distinct
event_attendee.event_id,
event_attendee.type
from
event_attendee
left join events on
event_attendee.event_id = events.id
where
events.id = event_attendee.event_id
limit 1
) as b on
a.id = b.event_id
Problem
It only works for the 1st row; for the 2nd row the type comes back empty:
id | name | type
------------------------------------
1 | John | User
------------------------------------
2 | Cat |
You can do this using a lateral join. In Postgres, the syntax is:
select e.*, ea.*
from events e left join lateral
(select ea.event_Id, ea.Type
from event_attendee ea
where ea.event_id = e.id
order by ea.id
limit 1
) ea
on 1=1;
However, distinct on is a way to do this with no subqueries:
select distinct on (e.id) e.*, ea.*
from events e join
event_attendee ea
on ea.event_id = e.id
order by e.id, ea.id;
I would expect the lateral join to work better on larger tables, particularly with the correct indexes.
This is easy with a cross apply:
select *
from events e
cross apply (
select top (1) event_Id, Type
from event_attendee ea
where ea.event_id=e.id
order by id
)x
Edit, alternative compatible method!
select e.*,ea.event_Id, (select type from event_attendee ea2 where ea2.id=ea.id ) Type
from (
select Min(id) Id, event_id
from event_attendee
group by event_id
)ea
join events e on e.id=ea.event_id
Another way is to compute a rank per event and use it to keep only the first record:
select
t_.id, t_.name, t_.type
from
(
select a.*, b.type,
rank() OVER (PARTITION BY a.id ORDER BY b.id asc) rank_
from events a
left join event_attendee b
on
a.id = b.event_id
) t_
where
t_.rank_ = 1
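For engines without LATERAL, CROSS APPLY, or window functions, the same "first attendee per event" result can be sketched with a correlated MIN(id) subquery in the join condition. The example below is only an illustration under assumptions: it runs the question's sample rows in an in-memory SQLite database from Python, and "first" is taken to mean the lowest attendee id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_attendee (id INTEGER PRIMARY KEY, event_id INTEGER, type TEXT);
    INSERT INTO events VALUES (1, 'john'), (2, 'Cat');
    INSERT INTO event_attendee VALUES
        (1, 1, 'User'), (2, 1, 'Local'), (3, 1, 'User'), (4, 2, 'User'), (5, 2, 'User');
""")

# For each event, join only the attendee row with the smallest id.
# LEFT JOIN keeps events that have no attendees at all.
first_attendee = conn.execute("""
    SELECT e.id, e.name, ea.event_id, ea.type
    FROM events e
    LEFT JOIN event_attendee ea
           ON ea.id = (SELECT MIN(x.id)
                       FROM event_attendee x
                       WHERE x.event_id = e.id)
    ORDER BY e.id
""").fetchall()

for row in first_attendee:
    print(row)
```

Unlike the original attempt, the "pick one row" step is evaluated per event rather than once for the whole derived table, so the second event also gets its attendee.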

Select from multiple tables, eliminating duplicate values

I have these tables and values:
Person
------------------
ID | CREATED_BY
------------------
1 |
2 |
3 |
4 |
Account
-----------------------
ID | TYPE | DATA
-----------------------
1 | T1 | USEFUL DATA
2 | T2 |
3 | T3 |
4 | T2 |
Person_account_link
--------------------------
ID | PERSON_ID | ACCOUNT_ID
--------------------------
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3
4 | 3 | 4
I want to select all persons. For persons with a T1 account I want the data column; the other persons should appear in the result without any account information.
(Note that person 1 has two accounts, account 1 and account 2, but only one row must be displayed: the T1 account if it exists, otherwise NULL.)
The result should be :
Table1
-----------------------------------------------------
PERSON_ID | ACCOUNT_ID | ACCOUNT_TYPE | ACCOUNT_DATA
-----------------------------------------------------
1 | 1 | T1 | USEFUL DATA
2 | NULL | NULL | NULL
3 | NULL | NULL | NULL
4 | NULL | NULL | NULL
You can do conditional aggregation :
SELECT p.id,
MAX(CASE WHEN a.type = 'T1' THEN a.id END) AS ACCOUNT_ID,
MAX(CASE WHEN a.type = 'T1' THEN 'T1' END) AS ACCOUNT_TYPE,
MAX(CASE WHEN a.type = 'T1' THEN a.data END) AS ACCOUNT_DATA
FROM person p LEFT JOIN
Person_account_link pl
ON p.id = pl.person_id LEFT JOIN
account a
ON pl.account_id = a.id
GROUP BY p.id;
You need an outer join, starting with Person and continuing to the other two tables. I would also aggregate with group by and min to handle a person who has two or more T1 accounts; in that case the minimum of each column is returned:
select p.id person_id,
min(a.id) account_id,
min(a.type) account_type,
min(a.data) account_data
from Person p
left join Person_account_link pa on p.id = pa.person_id
left join Account a on pa.account_id = a.id and a.type = 'T1'
group by p.id
In Postgres, I like to use the FILTER keyword. In addition, the Person table is not needed if you only want persons with an account. If you want all persons:
SELECT p.id,
MAX(a.id) FILTER (WHERE a.type = 'T1') as account_id,
MAX(a.type) FILTER (WHERE a.type = 'T1') as account_type,
MAX(a.data) FILTER (WHERE a.type = 'T1') as account_data
FROM Person p LEFT JOIN
Person_account_link pl
ON pl.person_id = p.id LEFT JOIN
account a
ON pl.account_id = a.id
GROUP BY p.id;
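The conditional-aggregation approach can be checked against the question's sample data. This is a minimal sketch that assumes an in-memory SQLite database driven from Python; it uses the portable CASE form rather than the Postgres FILTER clause, and the lowercase table and column names are simply how the sketch chose to declare them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, created_by TEXT);
    CREATE TABLE account (id INTEGER PRIMARY KEY, type TEXT, data TEXT);
    CREATE TABLE person_account_link (id INTEGER PRIMARY KEY, person_id INTEGER, account_id INTEGER);
    INSERT INTO person (id) VALUES (1), (2), (3), (4);
    INSERT INTO account VALUES (1, 'T1', 'USEFUL DATA'), (2, 'T2', NULL), (3, 'T3', NULL), (4, 'T2', NULL);
    INSERT INTO person_account_link VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 3, 4);
""")

# One output row per person; non-T1 accounts contribute NULL to every MAX().
rows = conn.execute("""
    SELECT p.id,
           MAX(CASE WHEN a.type = 'T1' THEN a.id   END) AS account_id,
           MAX(CASE WHEN a.type = 'T1' THEN a.type END) AS account_type,
           MAX(CASE WHEN a.type = 'T1' THEN a.data END) AS account_data
    FROM person p
    LEFT JOIN person_account_link pl ON pl.person_id = p.id
    LEFT JOIN account a ON a.id = pl.account_id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Person 1 collapses to a single row carrying the T1 account, and persons 2 to 4 come back with NULL (None in Python) in all three account columns, matching the expected result table.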

Get count of related records in two joined tables

Firstly, I apologize for my English. I want to get auctions with their counts of bids and buys. It should look like this:
id | name | bids | buys
-----------------------
1 | Foo | 4 | 1
2 | Bar | 0 | 0
I have tables like following:
auction:
id | name
---------
1 | Foo
2 | Bar
auction_bid:
id | auction_id
---------------
1 | 1
2 | 1
3 | 1
4 | 1
auction_buy:
id | auction_id
---------------
1 | 1
I can get numbers in two queries:
SELECT *, COUNT(abid.id) AS `bids` FROM `auction` `t` LEFT JOIN auction_bid abid ON (t.id = abid.auction_id) GROUP BY t.id
SELECT *, COUNT(abuy.id) AS `buys` FROM `auction` `t` LEFT JOIN auction_buy abuy ON (t.id = abuy.auction_id) GROUP BY t.id
But when I combined them into one:
SELECT *, COUNT(abid.id) AS `bids`, COUNT(abuy.id) AS `buys` FROM `auction` `t` LEFT JOIN auction_bid abid ON (t.id = abid.auction_id) LEFT JOIN auction_buy abuy ON (t.id = abuy.auction_id) GROUP BY t.id
It returned the wrong amounts (as many bids as buys).
How can I fix this and get both counts in one query?
Joining both child tables pairs every bid with every buy of the same auction, so you'll need to count DISTINCT abuy and abid IDs to eliminate the duplicates:
SELECT t.id, t.name,
COUNT(DISTINCT abid.id) `bids`,
COUNT(DISTINCT abuy.id) `buys`
FROM `auction` `t`
LEFT JOIN auction_bid abid ON t.id = abid.auction_id
LEFT JOIN auction_buy abuy ON t.id = abuy.auction_id
GROUP BY t.id, t.name;
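The over-count and its fix can be reproduced on the question's sample rows. The sketch below assumes an in-memory SQLite database driven from Python (the thread itself uses MySQL); it runs the plain COUNT and the COUNT(DISTINCT ...) versions side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auction (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE auction_bid (id INTEGER PRIMARY KEY, auction_id INTEGER);
    CREATE TABLE auction_buy (id INTEGER PRIMARY KEY, auction_id INTEGER);
    INSERT INTO auction VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO auction_bid VALUES (1, 1), (2, 1), (3, 1), (4, 1);
    INSERT INTO auction_buy VALUES (1, 1);
""")

query = """
    SELECT t.id, t.name, COUNT({d} abid.id) AS bids, COUNT({d} abuy.id) AS buys
    FROM auction t
    LEFT JOIN auction_bid abid ON abid.auction_id = t.id
    LEFT JOIN auction_buy abuy ON abuy.auction_id = t.id
    GROUP BY t.id, t.name
    ORDER BY t.id
"""

# Plain COUNT counts joined rows: 4 bids x 1 buy = 4 rows for auction 1.
naive = conn.execute(query.format(d="")).fetchall()
# COUNT(DISTINCT ...) counts each child id once, regardless of the fan-out.
fixed = conn.execute(query.format(d="DISTINCT")).fetchall()

print(naive)
print(fixed)
```

The naive version reports four buys for auction 1, which is the "bids as much as buys" symptom from the question; the DISTINCT version reports four bids and one buy.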
Try this:
SELECT t.*,COUNT(abid.id) as bids,buys
FROM auction t LEFT JOIN
auction_bid abid ON t.id = abid.auction_id LEFT JOIN
(SELECT t.id, Count(abuy.id) as buys
FROM auction t LEFT JOIN
auction_buy abuy ON t.id = abuy.auction_id
GROUP BY t.id) Temp ON t.id=Temp.id
GROUP BY t.id
Result:
ID NAME BIDS BUYS
1 Foo 2 0
2 Bar 1 1

Count/sum rows in multiple related tables

I have a complex select that - when simplified - looks like this:
select m.ID,
(select sum(AMOUNT) from A where M_ID = m.ID) sumA,
(select sum(AMOUNT) from B where M_ID = m.ID) sumB,
.....
from M;
The tables A,B,... have a foreign key M_ID pointing into table M.
The problem is that this select is very slow. I'd like to rewrite it using table joins, but I don't know how, because
select m.ID,
sum(a.AMOUNT),
sum(b.AMOUNT),
.....
from M
join A on a.M_ID = m.ID
join B on b.M_ID = m.ID
....
group by m.ID;
gives incorrect (much higher) sum results, as each row in A or B can be counted multiple times.
Is there a way how to write that select optimally using e.g. analytical functions or some other ways?
Edit:
The explain plan for the original (not simplified) select looks like this:
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
|* 2 | FILTER | |
|* 3 | TABLE ACCESS BY INDEX ROWID| WORKITEM |
|* 4 | INDEX SKIP SCAN | WORKITEM_U01 |
|* 5 | FILTER | |
|* 6 | TABLE ACCESS FULL | RPRODUCT_INVENTORY_MASTER |
.....
| 31 | SORT AGGREGATE | |
|* 32 | FILTER | |
|* 33 | TABLE ACCESS BY INDEX ROWID| WORKITEM |
|* 34 | INDEX SKIP SCAN | WORKITEM_U01 |
|* 35 | FILTER | |
|* 36 | TABLE ACCESS FULL | RPRODUCT_INVENTORY_MASTER |
| 37 | SORT GROUP BY | |
| 38 | TABLE ACCESS FULL | RPRODUCT |
That's why I want to optimize it. Moreover, the AWR report shows that this select has 50000 gets/exec.
Edit2,3:
The whole select looks like this:
SELECT rprd.ID,
rprd.NAME,
(select sum(AMOUNT) from WORKITEM
where ACTION='REMOVE'
and trunc(CREATED_DATE) = to_date(:1,'DDMMYYYY')
and PAYEE_ID in
(select rim.RPRODUCT_ID from RPRODUCT_INVENTORY_MASTER rim
where rprd.ID = rim.RPRODUCT_ID
and rim.INVENTORY_DATE = to_date(:2,'DDMMYYYY'))),
.....
(select sum(AMOUNT) from WORKITEM
where ACTION='COLLECT'
and trunc(CREATED_DATE) < to_date(:11,'DDMMYYYY')
and PAYEE_ID in
(select rim.RPRODUCT_ID from RPRODUCT_INVENTORY_MASTER rim
where rprd.ID = rim.RPRODUCT_ID
and rim.INVENTORY_DATE < to_date(:12,'DDMMYYYY')))
FROM RPRODUCT rprd
GROUP BY rprd.ID, rprd.NAME
ORDER BY rprd.ID
;
I didn't write it :-), I'm about to re-write it. Note, there are differences in comparison operators, in ACTION values, in dates to compare INVENTORY_DATE to.
Edit4:
I tried to rewrite the query like this (and the exec plan looks better), but have run into the "row multiplicity" issues described above:
with RPRODUCT_INVENTORY_MASTER# as (
select RPRODUCT_ID, min(INVENTORY_DATE) INVENTORY_DATE
from RPRODUCT_INVENTORY_MASTER
group by RPRODUCT_ID),
WORKITEM# as (
select AMOUNT, PAYEE_ID, ACTION, trunc(CREATED_DATE) CREATED_DATE
from WORKITEM
where ACTION in ('REMOVE','ADD','COLLECT')
)
select rprd.ID,
rprd.NAME,
-- sum(wip2.AMOUNT), -- this is singular because of '=' in inventory_date comparison
sum(abs(wip4.AMOUNT)),
.....
sum(wip12.AMOUNT)
from RPRODUCT rprd
left join RPRODUCT_INVENTORY_MASTER# rim4 on rim4.RPRODUCT_ID = rprd.ID
and rim4.INVENTORY_DATE <= to_date(:4 ,'DDMMYYYY')
left join WORKITEM# wip4 on wip4.PAYEE_ID = rim4.RPRODUCT_ID
and wip4.ACTION='REMOVE'
and wip4.CREATED_DATE = to_date(:3 ,'DDMMYYYY')
.....
left join RPRODUCT_INVENTORY_MASTER# rim12 on rim12.RPRODUCT_ID = rprd.ID
and rim12.INVENTORY_DATE < to_date(:12 ,'DDMMYYYY')
left join WORKITEM# wip12 on wip12.PAYEE_ID = rim12.RPRODUCT_ID
and wip12.ACTION='COLLECT'
and wip12.CREATED_DATE < to_date(:11 ,'DDMMYYYY')
group by rprd.ID, rprd.NAME
order by rprd.ID
;
RPRODUCT_INVENTORY_MASTER# always gives at most one row for each rprd.ID. WORKITEM# can have any number of rows for each RPRODUCT_ID = rprd.ID.
Yes, this is a typical problem. I like your original query for its clarity. However, if you are running into performance issues, you have to think of other options.
Here is one option. Because the join multiplies the A and B rows, you can simply divide each sum by the count of the other table's rows. Admittedly, this looks rather strange.
select m.ID,
sum(a.AMOUNT) / count(distinct b.id),
sum(b.AMOUNT) / count(distinct a.id),
.....
from M
join A on a.M_ID = m.ID
join B on b.M_ID = m.ID
....
group by m.ID;
The other option, which I would prefer, is to aggregate first, so that there is at most one A row and one B row per m.ID in the first place:
select m.ID,
a_agg.SUM_AMOUNT,
b_agg.SUM_AMOUNT,
.....
from M
join (select M_ID, sum(AMOUNT) as SUM_AMOUNT from A group by M_ID) a_agg
on a_agg.M_ID = m.ID
join (select M_ID, sum(AMOUNT) as SUM_AMOUNT from B group by M_ID) b_agg
on b_agg.M_ID = m.ID
EDIT: In case an M_ID might not have any A or any B, you would have to replace the joins with LEFT JOIN in both queries. Then in the first query select:
nvl(sum(a.AMOUNT), 0) / greatest(count(distinct b.id), 1),
nvl(sum(b.AMOUNT), 0) / greatest(count(distinct a.id), 1),
And in the second query:
nvl(a_agg.SUM_AMOUNT, 0),
nvl(b_agg.SUM_AMOUNT, 0),
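The difference between summing after the join and aggregating before it can be shown on a tiny data set. Everything in this sketch is an assumption made for illustration: the rows are hypothetical, the database is in-memory SQLite driven from Python rather than Oracle, and COALESCE stands in for nvl.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (id INTEGER PRIMARY KEY);
    CREATE TABLE a (id INTEGER PRIMARY KEY, m_id INTEGER, amount INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, m_id INTEGER, amount INTEGER);
    -- hypothetical rows: m=1 has 2 A rows and 3 B rows, m=2 has only A, m=3 has nothing
    INSERT INTO m VALUES (1), (2), (3);
    INSERT INTO a VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
    INSERT INTO b VALUES (1, 1, 100), (2, 1, 200), (3, 1, 300);
""")

# Summing after joining: each A row is repeated once per B row and vice versa.
naive = conn.execute("""
    SELECT m.id, SUM(a.amount), SUM(b.amount)
    FROM m
    JOIN a ON a.m_id = m.id
    JOIN b ON b.m_id = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

# Aggregating first: each derived table has at most one row per m_id,
# so the outer joins cannot multiply anything.
pre_aggregated = conn.execute("""
    SELECT m.id,
           COALESCE(a_agg.sum_amount, 0),
           COALESCE(b_agg.sum_amount, 0)
    FROM m
    LEFT JOIN (SELECT m_id, SUM(amount) AS sum_amount FROM a GROUP BY m_id) a_agg
           ON a_agg.m_id = m.id
    LEFT JOIN (SELECT m_id, SUM(amount) AS sum_amount FROM b GROUP BY m_id) b_agg
           ON b_agg.m_id = m.id
    ORDER BY m.id
""").fetchall()

print(naive)
print(pre_aggregated)
```

For m=1 the naive query reports 90 and 1200 instead of 30 and 600, and it drops m=2 and m=3 entirely because of the inner joins; the pre-aggregated form returns the true sums and keeps every row of M.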
EDIT: Here is your query modified. The trick is to join with distinct rims.
SELECT
rprd.ID,
rprd.NAME,
nvl(same_date.SUM_AMOUNT, 0),
.....
nvl(earlier_date.SUM_AMOUNT, 0)
FROM RPRODUCT rprd
LEFT JOIN
(
select rim.RPRODUCT_ID, sum(w.AMOUNT) as SUM_AMOUNT
from
(
select distinct RPRODUCT_ID
from RPRODUCT_INVENTORY_MASTER
where INVENTORY_DATE = to_date(:2,'DDMMYYYY')
) rim
left join WORKITEM w
on w.PAYEE_ID = rim.RPRODUCT_ID
and w.ACTION = 'REMOVE'
and trunc(w.CREATED_DATE) = to_date(:1,'DDMMYYYY')
group by rim.RPRODUCT_ID -- one row per product
) same_date on same_date.RPRODUCT_ID = rprd.ID
LEFT JOIN
(
select rim.RPRODUCT_ID, sum(w.AMOUNT) as SUM_AMOUNT
from
(
select distinct RPRODUCT_ID
from RPRODUCT_INVENTORY_MASTER
where INVENTORY_DATE < to_date(:12,'DDMMYYYY')
) rim
left join WORKITEM w
on w.PAYEE_ID = rim.RPRODUCT_ID
and w.ACTION = 'COLLECT' -- as in the original :11/:12 subquery
and trunc(w.CREATED_DATE) < to_date(:11,'DDMMYYYY')
group by rim.RPRODUCT_ID -- one row per product
) earlier_date on earlier_date.RPRODUCT_ID = rprd.ID
-- each derived table yields at most one row per product, so no outer GROUP BY is needed
ORDER BY rprd.ID
;
This should work
select m.ID,
a.aamount,
b.bamount
from M
inner join
(
select M_ID,sum(AMOUNT) as aamount
from A group by M_ID
) a
on a.M_ID = m.ID
inner join
(
select M_ID,sum(AMOUNT) as bamount
from B group by M_ID
) b
on b.M_ID = m.ID;
This should work regardless of the number of m_id rows in the A, B, C, ... tables:
select
M.id,
sum(decode(u.src, 'A', u.sumx, 0)) sum_a,
sum(decode(u.src, 'B', u.sumx, 0)) sum_b,
sum(decode(u.src, 'C', u.sumx, 0)) sum_c,
...
from M,
(select 'A' src, m_id, sum(amount) sumx from A group by m_id
union all
select 'B', m_id, sum(amount) from B group by m_id
union all
select 'C', m_id, sum(amount) from C group by m_id
...
) u
where
M.id=u.m_id
group by
M.id;
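The UNION ALL variant can be sketched the same way. The rows below are hypothetical, the database is in-memory SQLite driven from Python rather than Oracle, and a CASE expression replaces decode, which SQLite does not provide; all of that is assumed for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (id INTEGER PRIMARY KEY);
    CREATE TABLE a (id INTEGER PRIMARY KEY, m_id INTEGER, amount INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, m_id INTEGER, amount INTEGER);
    -- hypothetical rows
    INSERT INTO m VALUES (1), (2), (3);
    INSERT INTO a VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
    INSERT INTO b VALUES (1, 1, 100), (2, 1, 200), (3, 1, 300);
""")

# Stack the per-table sums into one (src, m_id, sumx) row set, then pivot
# them back into columns with conditional sums.
pivoted = conn.execute("""
    SELECT m.id,
           SUM(CASE WHEN u.src = 'A' THEN u.sumx ELSE 0 END) AS sum_a,
           SUM(CASE WHEN u.src = 'B' THEN u.sumx ELSE 0 END) AS sum_b
    FROM m
    JOIN (SELECT 'A' AS src, m_id, SUM(amount) AS sumx FROM a GROUP BY m_id
          UNION ALL
          SELECT 'B', m_id, SUM(amount) FROM b GROUP BY m_id) u
      ON u.m_id = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

print(pivoted)
```

Because M is inner-joined to the stacked sums, m=3 (which has no rows in either table) does not appear; switching to a LEFT JOIN would keep it with zero sums.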

Joining tables based on the maximum value

Here's a simplified example of what I'm talking about:
Table: students
_____________
| id | name |
|----+------|
| 1 | Jim |
| 2 | Joe |
| 3 | Jay |
|____|______|
Table: exam_results
____________________________________
| id | student_id | score | date |
|----+------------+-------+--------|
| 1 | 1 | 73 | 8/1/09 |
| 2 | 1 | 67 | 9/2/09 |
| 3 | 1 | 93 | 1/3/09 |
| 4 | 2 | 27 | 4/9/09 |
| 5 | 2 | 17 | 8/9/09 |
| 6 | 3 | 100 | 1/6/09 |
|____|____________|_______|________|
Assume, for the sake of this question, that every student has at least one exam result recorded.
How would you select each student along with their highest score? Edit: ...AND the other fields in that record?
Expected output:
_________________________
| name | score | date |
|------+-------|--------|
| Jim | 93 | 1/3/09 |
| Joe | 27 | 4/9/09 |
| Jay | 100 | 1/6/09 |
|______|_______|________|
Answers using all types of DBMS are welcome.
Answering the EDITED question (i.e. to get associated columns as well).
In SQL Server 2005+, the best approach would be to use a ranking/window function in conjunction with a CTE, like this:
with exam_data as
(
select r.student_id, r.score, r.date,
row_number() over(partition by r.student_id order by r.score desc) as rn
from exam_results r
)
select s.name, d.score, d.date, d.student_id
from students s
join exam_data d
on s.id = d.student_id
where d.rn = 1;
For an ANSI-SQL compliant solution, a subquery and self-join will work, like this:
select s.name, r.student_id, r.score, r.date
from (
select r.student_id, max(r.score) as max_score
from exam_results r
group by r.student_id
) d
join exam_results r
on r.student_id = d.student_id
and r.score = d.max_score
join students s
on s.id = r.student_id;
This last one assumes there are no duplicate student_id/max_score combinations. If there are, and you want to de-duplicate them, you'll need another subquery that joins on something deterministic to decide which record to pull. For example, assuming a student can't have multiple records with the same date, to break a tie in favor of the most recent max_score you'd do something like the following:
select s.name, r3.student_id, r3.score, r3.date, r3.other_column_a, ...
from (
select r2.student_id, r2.score as max_score, max(r2.date) as max_score_max_date
from (
select r1.student_id, max(r1.score) as max_score
from exam_results r1
group by r1.student_id
) d
join exam_results r2
on r2.student_id = d.student_id
and r2.score = d.max_score
group by r2.student_id, r2.score
) r
join exam_results r3
on r3.student_id = r.student_id
and r3.score = r.max_score
and r3.date = r.max_score_max_date
join students s
on s.id = r3.student_id;
EDIT: Added proper de-duplicating query thanks to Mark's good catch in comments
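The tie problem, and the effect of the extra max(date) level, can be observed with a score tie in the data. This sketch assumes an in-memory SQLite database driven from Python and borrows the tied rows from the SQL Server test bed further down (Jim scoring 93 twice), with dates rewritten as ISO strings so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE exam_results (id INTEGER PRIMARY KEY, student_id INTEGER, score INTEGER, date TEXT);
    INSERT INTO students VALUES (1, 'Jim'), (2, 'Joe'), (3, 'Jay');
    INSERT INTO exam_results VALUES
        (1, 1, 73, '2009-08-01'), (2, 1, 93, '2009-09-02'), (3, 1, 93, '2009-01-03'),
        (4, 2, 27, '2009-04-09'), (5, 2, 17, '2009-08-09'), (6, 3, 100, '2009-01-06');
""")

# Join back on max score only: a tied max score yields one row per tied exam.
plain = conn.execute("""
    SELECT s.name, r.score, r.date
    FROM (SELECT student_id, MAX(score) AS max_score
          FROM exam_results GROUP BY student_id) d
    JOIN exam_results r ON r.student_id = d.student_id AND r.score = d.max_score
    JOIN students s ON s.id = r.student_id
    ORDER BY s.id, r.date
""").fetchall()

# Extra level: among the max-score rows, keep the most recent date.
tie_broken = conn.execute("""
    SELECT s.name, r3.score, r3.date
    FROM (SELECT r2.student_id, r2.score AS max_score, MAX(r2.date) AS max_score_max_date
          FROM (SELECT student_id, MAX(score) AS max_score
                FROM exam_results GROUP BY student_id) d
          JOIN exam_results r2 ON r2.student_id = d.student_id AND r2.score = d.max_score
          GROUP BY r2.student_id, r2.score) r
    JOIN exam_results r3 ON r3.student_id = r.student_id
                        AND r3.score = r.max_score
                        AND r3.date = r.max_score_max_date
    JOIN students s ON s.id = r3.student_id
    ORDER BY s.id
""").fetchall()

print(plain)
print(tie_broken)
```

The plain join returns Jim twice, once per tied exam, while the tie-broken version returns a single row per student and picks Jim's later 93.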
SELECT s.name,
COALESCE(MAX(er.score), 0) AS high_score
FROM STUDENTS s
LEFT JOIN EXAM_RESULTS er ON er.student_id = s.id
GROUP BY s.name
Try this:
Select students.name, max(exam_results.score) As Score from students
INNER JOIN
exam_results
ON students.id = exam_results.student_id
GROUP BY
students.name
With Oracle's analytic functions this is easy:
SELECT DISTINCT
students.name
,FIRST_VALUE(exam_results.score)
OVER (PARTITION BY students.id
ORDER BY exam_results.score DESC) AS score
,FIRST_VALUE(exam_results.date)
OVER (PARTITION BY students.id
ORDER BY exam_results.score DESC) AS date
FROM students, exam_results
WHERE students.id = exam_results.student_id;
Select S.name, T.Score, er.date
from Students S inner join
(Select Student_ID,Max(Score) as Score from Exam_Results
Group by Student_ID) T
On S.id=T.Student_ID inner join Exam_Results er
On er.Student_ID = T.Student_ID And er.Score=T.Score
Using MS SQL Server:
SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
SELECT TOP(1) id FROM exam_results T2
WHERE exam_results.student_id = T2.student_id
ORDER BY score DESC, date ASC)
If there is a tied score, the oldest date is returned (change date ASC to date DESC to return the most recent instead).
Output:
Jim 93 2009-01-03 00:00:00.000
Joe 27 2009-04-09 00:00:00.000
Jay 100 2009-01-06 00:00:00.000
Test bed:
CREATE TABLE students(id int , name nvarchar(20) );
CREATE TABLE exam_results(id int , student_id int , score int, date datetime);
INSERT INTO students
VALUES
(1,'Jim'),(2,'Joe'),(3,'Jay')
INSERT INTO exam_results VALUES
(1, 1, 73, '8/1/09'),
(2, 1, 93, '9/2/09'),
(3, 1, 93, '1/3/09'),
(4, 2, 27, '4/9/09'),
(5, 2, 17, '8/9/09'),
(6, 3, 100, '1/6/09')
SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
SELECT TOP(1) id FROM exam_results T2
WHERE exam_results.student_id = T2.student_id
ORDER BY score DESC, date ASC)
On MySQL, I think you can change the TOP(1) to a LIMIT 1 at the end of the statement. I have not tested this though.
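As a partial check of the LIMIT 1 idea, the same correlated-subquery pattern can be run in SQLite, which also uses LIMIT. This is only a sketch under assumptions: it uses an in-memory SQLite database driven from Python with the test bed's rows (dates rewritten as ISO strings), and it says nothing about whether MySQL accepts the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE exam_results (id INTEGER PRIMARY KEY, student_id INTEGER, score INTEGER, date TEXT);
    INSERT INTO students VALUES (1, 'Jim'), (2, 'Joe'), (3, 'Jay');
    INSERT INTO exam_results VALUES
        (1, 1, 73, '2009-08-01'), (2, 1, 93, '2009-09-02'), (3, 1, 93, '2009-01-03'),
        (4, 2, 27, '2009-04-09'), (5, 2, 17, '2009-08-09'), (6, 3, 100, '2009-01-06');
""")

# For each exam row, keep it only if it is the top row for its student
# when ordered by best score first, then oldest date.
best = conn.execute("""
    SELECT s.name, r.score, r.date
    FROM exam_results r
    JOIN students s ON s.id = r.student_id
    WHERE r.id = (SELECT t2.id
                  FROM exam_results t2
                  WHERE t2.student_id = r.student_id
                  ORDER BY t2.score DESC, t2.date ASC
                  LIMIT 1)
    ORDER BY s.id
""").fetchall()

for row in best:
    print(row)
```

With the tied 93s for Jim, the date ASC tie-break selects the older exam, which agrees with the output listed for the SQL Server version.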