SQL aggregation over one column giving a result from another - sql

I am trying (and failing) to join some tables in a SQLite database. The data itself is complicated but I think I have boiled it down to an illustrative example.
Here are the three tables I want to join.
Table: Events
+----+---------+-------+-----------+
| id | user_id | class | timestamp |
+----+---------+-------+-----------+
| 1 | 'user1' | 6 | 100 |
| 2 | 'user1' | 12 | 400 |
| 3 | 'user1' | 4 | 900 |
| 4 | 'user2' | 6 | 400 |
| 5 | 'user2' | 3 | 800 |
| 6 | 'user2' | 8 | 900 |
+----+---------+-------+-----------+
Table: Games
+---------+---------+------------+-----------+
| user_id | game_id | game_class | timestamp |
+---------+---------+------------+-----------+
| 'user1' | 1 | 'A' | 200 |
| 'user2' | 2 | 'A' | 300 |
| 'user1' | 3 | 'B' | 500 |
| 'user1' | 4 | 'A' | 600 |
| 'user1' | 5 | 'A' | 700 |
+---------+---------+------------+-----------+
Table: AScores
+---------+-------+
| game_id | score |
+---------+-------+
| 1 | 8 |
| 2 | 2 |
| 4 | 9 |
| 5 | 6 |
+---------+-------+
I would like to join these to provide an additional column on the first table containing the users current score in game class A at the time of the event. I.e. I would like theresult of the join to look like this:
Desired Result
+----+----------+-------+-----------+-----------------+
| id | user_id | class | timestamp | current_a_score |
+----+----------+-------+-----------+-----------------+
| 1 | 'user1' | 6 | 100 | (null) |
| 2 | 'user1' | 12 | 400 | 8 |
| 3 | 'user1' | 4 | 900 | 6 |
| 4 | 'user2' | 6 | 400 | 2 |
| 5 | 'user2' | 3 | 800 | 2 |
| 6 | 'user2' | 8 | 900 | 2 |
+----+----------+-------+-----------+-----------------+
The following simple join pulls together the two tables AScores and Games.
SELECT * FROM AScores
INNER JOIN Games
ON AScores.game_id = Games.game_id
And so I was hoping to join this to the Events table as a sub-query. Something like this:
SELECT Events.*, AScoredGames.time_stamp AS game_time_stamp, AScoredGames.score
FROM Events
LEFT OUTER JOIN (
SELECT AScores.score, Games.* FROM AScores
INNER JOIN Games
ON AScores.game_id = Games.game_id
) AS AScoredGames
ON Events.user_id = AScoredGames.user_id
AND Events.time_stamp >= AScoredGames.time_stamp
ORDER BY Events.time_stamp ASC
That results in the following:
+----+---------+-------+------------+-----------------+-------+
| id | user_id | class | time_stamp | game_time_stamp | score |
+----+---------+-------+------------+-----------------+-------+
| 1 | user1 | 6 | 100 | NULL | NULL |
| 2 | user1 | 12 | 400 | 200 | 8 |
| 4 | user2 | 6 | 400 | 300 | 2 |
| 5 | user2 | 3 | 800 | 300 | 2 |
| 6 | user2 | 8 | 900 | 300 | 2 |
| 3 | user1 | 4 | 900 | 200 | 8 |
| 3 | user1 | 4 | 900 | 600 | 9 |
| 3 | user1 | 4 | 900 | 700 | 6 |
+----+---------+-------+------------+-----------------+-------+
So I need to group by Events.id to get rid of the triplicated row with Events.id 3. But what I want to do is to choose the row with the maximum game_time_stamp but then use the row's score. If I do MAX(game_time_stamp) as my aggregation I still have to independently aggregate the score. Is there a way to tie the row choice in the score column's aggregation function to the result of the game_time_stamp column's aggregation function?
(N.B. Existing answers to questions like Select first record in a One-to-Many relation using left join and SQL Server: How to Join to first row seem to suggest I cannot and say one must use a WHERE clause over a sub-query. But I am struggling with that (I'll post another question about that) and I can think of at least one solution and I am hoping there are better ones.)

The following query should do it. It uses a NOT EXISTS condition with a correlated subquery to locate the relevant game record for each event.
SELECT e.*, s.score current_a_score
FROM
events e
LEFT JOIN games g
ON g.user_id = e .user_id
AND g.timestamp < e.timestamp
AND NOT EXISTS (
SELECT 1
FROM games g1
WHERE
g1.user_id = e .user_id
AND g1.timestamp < e.timestamp
AND g1.timestamp > g.timestamp
)
LEFT JOIN ascores s
ON s.game_id = g.game_id
ORDER BY e.id
This DB Fiddle demo with your test data returns :
| id | user_id | class | timestamp | current_a_score |
| --- | ------- | ----- | --------- | --------------- |
| 1 | user1 | 6 | 100 | |
| 2 | user1 | 12 | 400 | 8 |
| 3 | user1 | 4 | 900 | 6 |
| 4 | user2 | 6 | 400 | 2 |
| 5 | user2 | 3 | 800 | 2 |
| 6 | user2 | 8 | 900 | 2 |

I have one work-around, but it feels hacky and relies on the specifics of my data. First note that the time_stamps are all multiples of 100 while the scores are all below 10. I can acombine these in a way that will not interfere with my comparison but will mean they are both encoded in one numeric column. This query gives the desired result:
SELECT Events.id, MIN(Events.user_id) AS user_id, MIN(Events.class) AS class, MIN(Events.time_stamp) AS time_stamp, MAX(AScoredGames.combination) % 10 AS current_a_score
FROM Events
LEFT OUTER JOIN (
SELECT AScores.score, AScores.score + (Games.time_stamp - 10) AS combination, Games.* FROM AScores
INNER JOIN Games
ON AScores.game_id = Games.game_id) AS AScoredGames
ON Events.user_id = AScoredGames.user_id AND Events.time_stamp >= AScoredGames.time_stamp
GROUP BY Events.id
ORDER BY id ASC
(The combining is done in AScores.score + (Games.time_stamp - 10) and so the aggregate function becomes MAX(AScoredGames.combination) % 10.)
Actual Result
+----+---------+-------+------------+-----------------+
| id | user_id | class | time_stamp | current_a_score |
+----+---------+-------+------------+-----------------+
| 1 | user1 | 6 | 100 | NULL |
| 2 | user1 | 12 | 400 | 8 |
| 3 | user1 | 4 | 900 | 6 |
| 4 | user2 | 6 | 400 | 2 |
| 5 | user2 | 3 | 800 | 2 |
| 6 | user2 | 8 | 900 | 2 |
+----+---------+-------+------------+-----------------+

Related

SQL Count depending on certain conditions

I have two tables.
One have userid and email (users table). The other have payments information (payments table) from the userid in users.
users
+--------+------------+
| Userid | Name |
+--------+------------+
| 1 | Alex T |
| 2 | Jeremy T |
| 3 | Frederic A |
+--------+------------+
payments
+--------+-----------+------------+----------+
| Userid | ValuePaid | PaidMonths | Refunded |
+--------+-----------+------------+----------+
| 1 | 1 | 12 | null |
| 1 | 20 | 12 | null |
| 1 | 20 | 12 | null |
| 1 | 20 | 1 | null |
| 2 | 1 | 1 | null |
| 2 | 20 | 12 | 1 |
| 2 | 20 | 12 | null |
| 2 | 20 | 1 | null |
| 3 | 1 | 12 | null |
| 3 | 20 | 1 | 1 |
| 3 | 20 | 1 | null |
+--------+-----------+------------+----------+
I want to count the PaidMonths taking in consideration the following rules:
If ValuePaid < 10 PaidMonths should be = 0.23 (even if in the column the value seen is any other mumber).
If Refund=1 the PaidMonths should be = 0.
Based on this when i join both tables by userid, and sum the PaidMonths based in the previousrules, i expect to see as result:
+--------+------------+------------+
| userid | Name | paidMonths |
+--------+------------+------------+
| 1 | Alex T | 25.23 |
| 2 | Jeremy T | 13.23 |
| 3 | Frederic A | 1.23 |
+--------+------------+------------+
Can you help me to achieve this in the most elegant way? Should a temporary table be used?
The following gives your desired results, using apply with case expression to map your values:
select u.UserID, u.Name, Sum(pm) PaidMonths
from users u join payments p on p.userid=u.userid
cross apply (values(
case
when valuepaid <10 then 0.23
when Refunded=1 then 0
else PaidMonths end
))x(pm)
group by u.UserID, u.Name
See Working Fiddle

Sum with 3 tables to join

I have 3 tables. The link between the first and the second table is REQ_ID and the link between the second and the third table is ENC_ID. There is no direct link between the first and the third table.
INS_RCPT
+----+--------+------+----------+
| ID | REQ_ID | CURR | RCPT_AMT |
+----+--------+------+----------+
| 1 | 1 | USD | 100 |
| 2 | 2 | USD | 200 |
| 3 | 3 | USD | 300 |
+----+--------+------+----------+
ENC_LOG
+----+--------+--------+-------------+
| ID | REQ_ID | ENC_ID | ENC_LOG_AMT |
+----+--------+--------+-------------+
| 1 | 1 | 1 | 20 |
| 2 | 1 | 2 | 50 |
| 3 | 1 | 3 | 30 |
| 4 | 2 | 4 | 20 |
+----+--------+--------+-------------+
ENC_RCPT
+----+--------+--------------+
| ID | ENC_ID | ENC_RCPT_AMT |
+----+--------+--------------+
| 1 | 1 | 10 |
| 2 | 1 | 10 |
| 3 | 2 | 15 |
| 4 | 2 | 25 |
| 5 | 2 | 10 |
| 6 | 3 | 12 |
| 7 | 3 | 18 |
| 8 | 4 | 10 |
+----+--------+--------------+
I would like to have output as follows:
+----+--------+------+----------+-------------+--------------+
| ID | REQ_ID | CURR | RCPT_AMT | ENC_LOG_AMT | ENC_RCPT_AMT |
+----+--------+------+----------+-------------+--------------+
| 1 | 1 | USD | 100 | 100 | 100 |
| 2 | 2 | USD | 200 | 20 | 10 |
| 3 | 3 | USD | 300 | 0 | 0 |
+----+--------+------+----------+-------------+--------------+
I am using SQL Server to write this query. Any help is appreciated.
One approach would be to join the first table to two subqueries which compute the sums separately:
SELECT
ir.ID,
ir.REQ_ID,
ir.CURR,
ir.RCPT_AMT,
el.ENC_LOG_AMT,
er.ENC_RCPT_AMT
FROM INS_RCPT ir
LEFT JOIN
(
SELECT REQ_ID, SUM(ENC_LOG_AMT) AS ENC_LOG_AMT
FROM ENC_LOG
GROUP BY REQ_ID
) el
ON ir.REQ_ID = el.REQ_ID
LEFT JOIN
(
SELECT t1.REQ_ID, SUM(t2.ENC_RCPT_AMT) AS ENC_RCPT_AMT
FROM ENC_LOG t1
INNER JOIN ENC_RCPT t2 ON t1.ENC_ID = t2.ENC_ID
GROUP BY t1.REQ_ID
) er
ON ir.REQ_ID = er.REQ_ID
Demo
Note that your question includes a curve ball. The second subquery needs to return aggregates of the receipt table by REQ_ID, even though this field does not appear in that table. As a result, we actually need to join ENC_LOG to ENC_RCPT in that subquery, and then aggregate by REQ_ID.
You can try the below query. Also change the join from left to inner as per your requirement.
select a.id,a.req_id,a.curr,sum(a.rcpt_amt) rcpt_amt,sum(a.enc_log_amt) enc_log_amt,sum(c.enc_rcpt_amt) enc_rcpt_amt
from
(
select a.id id ,a.req_id req_id ,a.curr curr,sum(rcpt_amt) as rcpt_amt,sum(enc_log_amt) as enc_log_amt
from ins_rcpt a
left join enc_log b
on a.req_id=b.req_id
group by id,req_id,curr
) a
left join enc_rcpt c
on a.enc_id = c.enc_id
group by id,req_id,curr;

Merge columns on two left joins

I have 3 tables as shown:
Video
+----+--------+-----------+
| id | name | videoSize |
+----+--------+-----------+
| 1 | video1 | 1MB |
| 2 | video2 | 2MB |
| 3 | video3 | 3MB |
+----+--------+-----------+
Survey
+----+---------+-----------+
| id | name | questions |
+----+---------+-----------+
| 1 | survey1 | 1 |
| 2 | survey2 | 2 |
| 3 | survey3 | 3 |
+----+---------+-----------+
Sequence
+----+---------+-----------+----------+
| id | videoId | surveyId | sequence |
+----+---------+-----------+----------+
| 1 | null | 1 | 1 |
| 2 | 2 | null | 2 |
| 3 | null | 3 | 3 |
+----+---------+-----------+----------+
I would like to query Sequence and join on both of video and survey tables and merge common columns without specifying the column names (in this case name) like this:
Query Result:
+----+---------+-----------+----------+---------+-----------+-----------+
| id | videoId | surveyId | sequence | name | videoSize | questions |
+----+---------+-----------+----------+---------+-----------+-----------+
| 1 | null | 1 | 1 | survey1 | null | 1 |
| 2 | 2 | null | 2 | video2 | 2MB | null |
| 3 | null | 3 | 3 | survey3 | null | 3 |
+----+---------+-----------+----------+---------+-----------+-----------+
Is this possible?
BTW the below sql doesn't work as it doesn't merge on the name field:
SELECT * FROM "Sequence"
LEFT JOIN "Survey" ON "Survey"."id" = "Sequence"."surveyId"
LEFT JOIN "Video" ON "Video"."id" = "Sequence"."videoId"
This query will show what you want:
select
s.*,
coalesce(y.name, v.name) as name, -- picks the right column
v.videoSize,
y.questions
from sequence s
left join survey y on y.id = s.surveyId
left join video v on v.id = s.videoId
However, the SQL standard requires you to name the columns you want. The only exception being * as shown above.

Need query for JOIN four tables with some conditions?

I have the following four tables:
1) mls_user
2) mls_category
3) bonus_point
4) mls_entry
In mls_user table values are like below:
*-------------------------*
| id | store_id | name |
*-------------------------*
| 1 | 101 | sandeep |
| 2 | 101 | gagan |
| 3 | 102 | santosh |
| 4 | 103 | manu |
| 5 | 101 | jagveer |
*-------------------------*
In mls_category table values are like below:
*---------------------------------*
| cat_no | store_id | cat_value |
*---------------------------------*
| 20 | 101 | 1 |
| 21 | 101 | 4 |
| 30 | 102 | 1 |
| 31 | 102 | 2 |
| 40 | 103 | 1 |
| 41 | 103 | 1 |
*---------------------------------*
In bonus_point table values are like below:
*-----------------------------------*
| user_id | store_id | bonus_point |
| 1 | 101 | 10 |
| 4 | 101 | 5 |
*-----------------------------------*
In mls_entry table values are like below:
*-------------------------------------------------------*
| user_id | store_id | category | distance | status |
*-------------------------------------------------------*
| 1 | 101 | 20 | 10 | Approved |
| 1 | 101 | 21 | 40 | Approved |
| 1 | 101 | 20 | 10 | Approved |
| 2 | 101 | 20 | 5 | Approved |
| 3 | 102 | 30 | 10 | Approved |
| 3 | 102 | 31 | 80 | Approved |
| 4 | 101 | 20 | 15 | Approved |
*-------------------------------------------------------*
And I want below output:
*--------------------------------------------------*
| user name | Points | bonus Point | Total Point |
*--------------------------------------------------*
| Sandeep | 30 | 10 | 40 |
| Santosh | 30 | 0 | 30 |
| Manu | 15 | 5 | 20 |
| Gagan | 5 | 0 | 5 |
| Jagveer | 0 | 0 | 0 |
*--------------------------------------------------*
I tell the calculation of how the points will come for user Sandeep.
Points = ((10+10)/1 + 40/4)=30
Here 1 and 4 is cat value which comes from mls_category.
I am using below code for a particular user but when i
SELECT sum(t1.totald/c.cat_value) as total_distance
FROM mls_category c
join (
select sum(distance) totald, user_id, category
FROM mls_entry
WHERE user_id=1 AND store_id='101' AND status='approved'
group by user_id, category) t1 on c.cat_no = t1.category
I have created tables in online for checking
DEMO
Computing the points (other than the bonus points) requires a separate join between the mls_entry and mls_category tables. I would do this in a separate subquery, and then join this to the larger query.
Here is one approach:
SELECT
u.name,
COALESCE(t1.points, 0) AS points,
COALESCE(b.bonus_point, 0) AS bonus_points,
COALESCE(t1.points, 0) + COALESCE(b.bonus_point, 0) AS total_points
FROM mls_user u
LEFT JOIN
(
SELECT e.user_id, SUM(e.distance / c.cat_value) AS points
FROM mls_entry e
INNER JOIN mls_category c
ON e.store_id = c.store_id AND e.category = c.cat_no
GROUP BY e.user_id
) t1
ON u.id = t1.user_id
LEFT JOIN bonus_point b
ON u.id = b.user_id
ORDER BY
total_points DESC;
This is the output I am getting from the above query in the demo you setup:
The output does not match exactly, because you have (perhaps) a typo in Santosh's data in your question, or otherwise the expected output in your question has a typo.

SQL Query in MANY- MANY RELATIONSHIP exactly one record with matching criteria

I have 3 like with many - many relationship
As:
TABLE 1 : select * from student;
| id | name |
| 1 | sone |
| 2 | stwo |
| 3 | sthree |
| 4 | sfour |
| 6 | ssix |
TABLE 2 : select * from course;
| id | name |
| 100 | CSE |
| 101 | ECE |
| 102 | ITI |
RELATION_SHIP TABLE : select * from student_course
| id | stu_id | cou_id |
| 1 | 1 | 101 |
| 2 | 2 | 102 |
| 3 | 2 | 100 |
| 4 | 3 | 100 |
| 5 | 3 | 101 |
| 6 | 1 | 101 |
| 1 | 6 | 101 |
I need to write a query to select a student with exactly one course 'CSE' and he should not have any other courses.
Thanks in advance
Use query:
SELECT
sc.`stu_id`,
COUNT(sc.`cou_id`) AS cnt
FROM
student_course sc
GROUP BY sc.`stu_id`
HAVING cnt = 1
AND GROUP_CONCAT(cou_id) LIKE
(SELECT
id
FROM
course
WHERE NAME = 'CSE')