Multiple Left Joins - how to? - sql

I have a Rails app running on Heroku, where I'm trying to calculate the rank (position) of a user on a highscore list.
The app is a place for users to bet against each other: they can start a wager (by creating a CHOICE) or bet against an already created Choice (by making a BET).
I have the following SQL, which should give me an array of users based on their total winnings on both Choices and Bets. But it's giving me wrong totals, and I think the problem is in the LEFT JOINs, because if I rewrite the SQL to contain only the Choice or only the Bet table, it works just fine.
Does anyone have any pointers on how to rewrite the SQL to work correctly? :)
SELECT users.id, sum(COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
Result:
+---------------+
| id | total_pl |
+---------------+
| 1 | 830 |
| 4 | 200 |
| 3 | 130 |
| 7 | -220 |
| 5 | -1360 |
| 6 | -4950 |
+---------------+
Below are the two SQL queries where I join to only one table, and the result of each. Note that the sums below do not add up to the result above. The sums below are the correct ones.
SELECT users.id, sum(COALESCE(bets.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
SELECT users.id, sum(COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
+---------------+
| id | total_pl |
+---------------+
| 3 | 170 |
| 1 | 150 |
| 4 | 100 |
| 5 | 80 |
| 7 | 20 |
| 6 | -30 |
+---------------+
+---------------+
| id | total_pl |
+---------------+
| 1 | 20 |
| 4 | 0 |
| 3 | -10 |
| 7 | -30 |
| 5 | -110 |
| 6 | -360 |
+---------------+

This is happening because of the relationship between the two LEFT JOINed tables: if a user has multiple rows in both bets and choices, the number of joined rows is the product of the individual row counts, not their sum.
If you have
choices
id profitloss
================
1 20
1 30
bets
id profitloss
================
1 25
1 35
The result of the join is actually:
bets/choices
id choices.profitloss bets.profitloss
1 20 25
1 20 35
1 30 25
1 30 35
(see where this is going?)
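The row multiplication described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module and hypothetical in-memory tables that mirror the example (one user with two bets and two choices); the table contents are assumptions for illustration, not the asker's data.

```python
import sqlite3

# Hypothetical in-memory tables mirroring the example above:
# user 1 has two rows in bets and two rows in choices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY);
    CREATE TABLE bets    (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users   VALUES (1);
    INSERT INTO bets    VALUES (1, 25), (1, 35);
    INSERT INTO choices VALUES (1, 20), (1, 30);
""")

# Joining both child tables to users pairs every bet with every choice.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM users
    LEFT JOIN bets    ON bets.user_id    = users.id
    LEFT JOIN choices ON choices.user_id = users.id
""").fetchone()[0]

# Summing over the joined rows counts every profitloss value twice.
naive_total = conn.execute("""
    SELECT SUM(COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0))
    FROM users
    LEFT JOIN bets    ON bets.user_id    = users.id
    LEFT JOIN choices ON choices.user_id = users.id
""").fetchone()[0]

true_total = 25 + 35 + 20 + 30

print(joined_rows)              # number of joined rows (2 bets x 2 choices)
print(naive_total, true_total)  # inflated sum next to the real one
```

With two rows on each side, every value appears in two joined rows, so the naive sum comes out at twice the real total.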
Fixing this is actually fairly simple. You haven't specified an RDBMS, but this should work on any of them (or with minor tweaks).
SELECT users.id, COALESCE(bets.profitloss, 0)
+ COALESCE(choices.profitloss, 0) as total_pl
FROM users
LEFT JOIN (SELECT user_id, SUM(profitloss) as profitloss
FROM bets
GROUP BY user_id) bets
ON bets.user_id = users.id
LEFT JOIN (SELECT user_id, SUM(profitloss) as profitloss
FROM choices
GROUP BY user_id) choices
ON choices.user_id = users.id
ORDER BY total_pl DESC
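As a quick check of the query above, here is a minimal sketch that runs it against SQLite from Python. The sample rows (three users, one of them with no bets or choices at all) are made up for illustration, and the derived-table aliases `b` and `c` are shortened versions of the ones used above.

```python
import sqlite3

# Hypothetical sample data: user 1 has rows in both tables,
# user 2 only has a bet, user 3 has nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY);
    CREATE TABLE bets    (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users   VALUES (1), (2), (3);
    INSERT INTO bets    VALUES (1, 25), (1, 35), (2, -10);
    INSERT INTO choices VALUES (1, 20), (1, 30);
""")

# Each derived table is reduced to one row per user before joining,
# so the joins can no longer multiply rows.
totals = conn.execute("""
    SELECT users.id,
           COALESCE(b.profitloss, 0) + COALESCE(c.profitloss, 0) AS total_pl
    FROM users
    LEFT JOIN (SELECT user_id, SUM(profitloss) AS profitloss
               FROM bets
               GROUP BY user_id) AS b ON b.user_id = users.id
    LEFT JOIN (SELECT user_id, SUM(profitloss) AS profitloss
               FROM choices
               GROUP BY user_id) AS c ON c.user_id = users.id
    ORDER BY total_pl DESC
""").fetchall()

for user_id, total_pl in totals:
    print(user_id, total_pl)
```

Because each subquery yields at most one row per user, no outer GROUP BY is needed, and users without any rows still appear with a total of 0.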
(Also, I believe the convention is to name tables singular, not plural.)

Your problem is that the joins are multiplying the rows in your data set. If you ran a SELECT * you would be able to see it. Try this; I was not able to test it because I don't have your tables, but it should work:
SELECT
totals.id
,SUM(totals.total_pl) total_pl
FROM
(
SELECT users.id, sum(COALESCE(bets.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
GROUP BY users.id
UNION ALL SELECT users.id, sum(COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
) totals
GROUP BY totals.id
ORDER BY total_pl DESC

Similar to Clockwork's solution: since the columns are the same in both tables, I would pre-union them and just sum them. At most, the inner query will have two records per user, one for the bets and one for the choices, each pre-summed before the UNION ALL. Then a simple join/sum gets the results:
select
U.ID,
sum( coalesce( PreSum.profit, 0) ) as TotalPL
from
Users U
LEFT JOIN
( select user_id, sum( profitloss ) as Profit
from bets
group by user_id
UNION ALL
select user_id, sum( profitloss ) as Profit
from choices
group by user_id ) PreSum
on U.ID = PreSum.User_ID
group by
U.ID
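The ALL in UNION ALL matters for this approach. The sketch below, which assumes hypothetical data where a user's bets and choices happen to sum to the same amount, runs the pre-union query once with UNION ALL and once with plain UNION in SQLite. The helper name `total_pl` is invented for this example.

```python
import sqlite3

# Hypothetical data: user 1's bets and choices both sum to 50.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY);
    CREATE TABLE bets    (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users   VALUES (1);
    INSERT INTO bets    VALUES (1, 20), (1, 30);
    INSERT INTO choices VALUES (1, 50);
""")

def total_pl(set_operator):
    """Run the pre-union query with the given set operator; return user 1's total."""
    query = f"""
        SELECT u.id, SUM(COALESCE(p.profit, 0)) AS total_pl
        FROM users AS u
        LEFT JOIN (SELECT user_id, SUM(profitloss) AS profit
                   FROM bets GROUP BY user_id
                   {set_operator}
                   SELECT user_id, SUM(profitloss) AS profit
                   FROM choices GROUP BY user_id) AS p
          ON p.user_id = u.id
        GROUP BY u.id
    """
    return conn.execute(query).fetchone()[1]

with_all = total_pl("UNION ALL")  # keeps both (1, 50) rows
without_all = total_pl("UNION")   # removes the duplicate (1, 50) row

print(with_all, without_all)
```

Plain UNION removes duplicate rows, so two identical pre-summed rows collapse into one and half of the total disappears.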

Related

SUM CASE when DISTINCT?

Joining two tables and grouping, we're trying to get the sum of a user's value but only include a user's value once if that user is represented in a grouping multiple times.
Some sample tables:
user table:
| id | net_worth |
------------------
| 1 | 100 |
| 2 | 1000 |
visit table:
| id | location | user_id |
-----------------------------
| 1 | mcdonalds | 1 |
| 2 | mcdonalds | 1 |
| 3 | mcdonalds | 2 |
| 4 | subway | 1 |
We want to find the total net worth of users visiting each location. User 1 visited McDonalds twice, but we don't want to double count their net worth. Ideally we can use a SUM but only add in the net worth value if that user hasn't already been counted for at that location. Something like this:
-- NOTE: Hypothetical query
SELECT
location,
SUM(CASE WHEN DISTINCT user.id then user.net_worth ELSE 0 END) as total_net_worth
FROM visit
JOIN user on user.id = visit.user_id
GROUP BY 1;
The ideal output being:
| location | total_net_worth |
-------------------------------
| mcdonalds | 1100 |
| subway | 100 |
This particular database is Redshift/PostgreSQL, but it would be interesting if there is a generic SQL solution. Is something like the above possible?
You don't want to consider duplicate entries in the visits table. So, select distinct rows from the table instead.
SELECT
v.location,
SUM(u.net_worth) as total_net_worth
FROM (SELECT DISTINCT location, user_id FROM visit) v
JOIN "user" u on u.id = v.user_id
GROUP BY v.location
ORDER BY v.location;
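To see the effect of de-duplicating the visits first, here is a small sketch that loads the sample tables from the question into SQLite and compares a plain join with the DISTINCT subquery. The variable names are arbitrary.

```python
import sqlite3

# Sample tables from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, net_worth INTEGER);
    CREATE TABLE visit  (id INTEGER PRIMARY KEY, location TEXT, user_id INTEGER);
    INSERT INTO "user" VALUES (1, 100), (2, 1000);
    INSERT INTO visit  VALUES (1, 'mcdonalds', 1), (2, 'mcdonalds', 1),
                              (3, 'mcdonalds', 2), (4, 'subway', 1);
""")

# Plain join: user 1 is counted once per visit.
double_counted = conn.execute("""
    SELECT v.location, SUM(u.net_worth)
    FROM visit v
    JOIN "user" u ON u.id = v.user_id
    GROUP BY v.location
    ORDER BY v.location
""").fetchall()

# De-duplicate (location, user_id) pairs before joining.
deduplicated = conn.execute("""
    SELECT v.location, SUM(u.net_worth)
    FROM (SELECT DISTINCT location, user_id FROM visit) v
    JOIN "user" u ON u.id = v.user_id
    GROUP BY v.location
    ORDER BY v.location
""").fetchall()

print(double_counted)
print(deduplicated)
```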
You can use a window function to get the unique users, then join that to the user table:
select v.location, sum(u.net_worth)
from "user" u
join (
select location, user_id,
row_number() over (partition by location, user_id order by id) as rn
from visit
) v on v.user_id = u.id and v.rn = 1
group by v.location;
The above is standard ANSI SQL. In Postgres this can also be expressed using distinct on ():
select v.location, sum(u.net_worth)
from "user" u
join (
select distinct on (user_id, location) *
from visit
order by user_id, location, id
) v on v.user_id = u.id
group by v.location;
You can join the user table with the distinct combinations of location and user id, as in the generic SQL below.
SELECT v.location, SUM(u.net_worth)
FROM (SELECT location, user_id FROM visit GROUP BY location, user_id) v
JOIN "user" u on u.id = v.user_id
GROUP BY v.location;

Efficiently getting multiple counts of foreign key rows in PostgreSQL

I have a database that consists of users who can perform various actions, which I keep track of in multiple tables. I'm creating a point system, so I need to count how many of each type of action the user did. For example, if I had:
users              posts            comments         shares
id | username      id | user_id     id | user_id     id | user_id
-------------      ------------     ------------     ------------
1  | abc           1  | 1           1  | 1           1  | 2
2  | xyz           2  | 1           2  | 2           2  | 2
I would want to return:
user_details
id | username | post_count | comment_count | share_count
---------------------------------------------------------
1 | abc | 2 | 1 | 0
2 | xyz | 0 | 1 | 2
This is slightly different from this question about foreign key counts since I want to return the individual counts per table.
What I've tried so far (example code):
SELECT
users.id,
users.username,
COUNT( DISTINCT posts.id ) as post_count,
COUNT( DISTINCT comments.id ) as comment_count,
COUNT( DISTINCT shares.id ) as share_count
FROM users
LEFT JOIN posts ON posts.user_id = users.id
LEFT JOIN comments ON comments.user_id = users.id
LEFT JOIN shares ON shares.user_id = users.id
GROUP BY users.id
While this works, I had to use DISTINCT in all of my counts because the LEFT JOINS were causing high numbers of duplicate rows. I feel like there must be a better way to do this since (please correct me if I'm wrong) on each LEFT JOIN, the DISTINCT is having to filter out an exponentially growing number of duplicated rows.
Thank you so much for any help you could give me with this!
You can join derived tables that already do the aggregation.
SELECT u.id,
u.username,
coalesce(pc.c, 0) AS post_count,
coalesce(cc.c, 0) AS comment_count,
coalesce(sc.c, 0) AS share_count
FROM users AS u
LEFT JOIN (SELECT p.user_id,
count(*) AS c
FROM posts AS p
GROUP BY p.user_id) AS pc
ON pc.user_id = u.id
LEFT JOIN (SELECT c.user_id,
count(*) AS c
FROM comments AS c
GROUP BY c.user_id) AS cc
ON cc.user_id = u.id
LEFT JOIN (SELECT s.user_id,
count(*) AS c
FROM shares AS s
GROUP BY s.user_id) AS sc
ON sc.user_id = u.id;
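As a sanity check, the pre-aggregated version can be run against the sample data from the question. This is a minimal sketch using Python's sqlite3 module; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE shares   (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users    VALUES (1, 'abc'), (2, 'xyz');
    INSERT INTO posts    VALUES (1, 1), (2, 1);
    INSERT INTO comments VALUES (1, 1), (2, 2);
    INSERT INTO shares   VALUES (1, 2), (2, 2);
""")

# Each derived table has at most one row per user, so the joins
# never multiply rows and no DISTINCT is needed.
user_details = conn.execute("""
    SELECT u.id,
           u.username,
           COALESCE(pc.c, 0) AS post_count,
           COALESCE(cc.c, 0) AS comment_count,
           COALESCE(sc.c, 0) AS share_count
    FROM users AS u
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM posts    GROUP BY user_id) AS pc
           ON pc.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM comments GROUP BY user_id) AS cc
           ON cc.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM shares   GROUP BY user_id) AS sc
           ON sc.user_id = u.id
    ORDER BY u.id
""").fetchall()

for row in user_details:
    print(row)
```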

SQL Server : select from multiple tables

Table Accounts:
+----+------+----------+
| ID | Nick | Dono_CID |
+----+------+----------+
| 2 | Bart | 3 |
+----+------+----------+
Table Logins:
+------------+------------+
| Jogador_ID | TS_Logou |
+------------+------------+
| 2 | 1590116475 |
| 2 | 1590118258 |
+------------+------------+
In short, I intend to identify if there is a row with TS_Logou smaller than the Timestamp of 1 month ago, and if Dono_CID != -1
Note: Accounts.ID = Logins.Jogador_ID
Note 2: There are multiple records in the Logins table. I want to select the last one, in DESC order
My attempt:
SELECT
ct.Nick,
ct.Dono_CID
FROM
Contas AS ct
INNER JOIN
Logins AS lg ON lg.Jogador_ID = ct.ID
WHERE
ct.Dono_CID != -1
AND lg.TS_Logou < 1587524400
GROUP BY
lg.Jogador_ID
ORDER BY
lg.TS_Logou DESC
LIMIT 1
From your attempt, I understand that TS_Logou < 1587524400 means older than one month.
I am trying to select the login with the maximum TS_Logou satisfying the filter condition.
SELECT TOP 1 a.ID, a.Nick, a.Dono_CID
FROM Logins AS l
INNER JOIN Accounts AS a
ON a.ID = l.Jogador_ID
WHERE a.Dono_CID <> -1
AND l.TS_Logou < 1587524400
ORDER BY l.TS_Logou DESC
Here, I select the maximum TS_Logou from Logins for each user and join that derived table to the Accounts table.
SELECT
ac.Nick,
ac.Dono_CID
FROM
Accounts AS ac
INNER JOIN
-- TS_Logou is a Unix timestamp, so compare it with the epoch seconds of one month ago
(SELECT l.Jogador_ID, MAX(l.TS_Logou) AS TS_Logou FROM Logins AS l
WHERE l.TS_Logou < DATEDIFF(SECOND, '1970-01-01', DATEADD(MONTH, -1, GETUTCDATE()))
GROUP BY l.Jogador_ID) AS lg
ON lg.Jogador_ID = ac.ID
WHERE
ac.Dono_CID <> -1
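One possible reading of the requirement is "accounts whose most recent login is older than the cutoff". Under that assumption, the latest login per account can be computed first and then filtered. The sketch below runs this idea in SQLite from Python rather than SQL Server; the accounts Lisa and Homer, their logins, and the alias `last_login` are invented for the example, and the cutoff is the literal from the question.

```python
import sqlite3

CUTOFF = 1587524400  # "one month ago" as a Unix timestamp, taken from the question

# Bart's rows come from the question; Lisa and Homer are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (ID INTEGER PRIMARY KEY, Nick TEXT, Dono_CID INTEGER);
    CREATE TABLE Logins   (Jogador_ID INTEGER, TS_Logou INTEGER);
    INSERT INTO Accounts VALUES (2, 'Bart', 3), (5, 'Lisa', 7), (6, 'Homer', -1);
    INSERT INTO Logins   VALUES (2, 1590116475), (2, 1590118258),
                                (5, 1580000000), (5, 1585000000),
                                (6, 1580000000);
""")

# Reduce Logins to the latest login per account, then filter on it.
inactive = conn.execute("""
    SELECT a.Nick, a.Dono_CID, lg.last_login
    FROM Accounts AS a
    INNER JOIN (SELECT Jogador_ID, MAX(TS_Logou) AS last_login
                FROM Logins
                GROUP BY Jogador_ID) AS lg
            ON lg.Jogador_ID = a.ID
    WHERE a.Dono_CID <> -1
      AND lg.last_login < ?
""", (CUTOFF,)).fetchall()

print(inactive)
```

Bart is excluded because his latest login is newer than the cutoff, and Homer because Dono_CID is -1.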

GROUP BY with SUM without removing empty (null) values

TABLES:
Players
player_no | transaction_id
----------------------------
1 | 11
2 | 22
3 | (null)
1 | 33
Transactions
id | value |
-----------------------
11 | 5
22 | 10
33 | 2
My goal is to fetch all the data, keeping all the players, even those with null values, in the following query:
SELECT p.player_no, COUNT(p.player_no), SUM(t.value) FROM Players p
INNER JOIN Transactions t ON p.transaction_id = t.id
GROUP BY p.player_no
Nevertheless, the results omit the player with the null value, for example:
player_no | count | sum
------------------------
1 | 2 | 7
2 | 1 | 10
What I would like is for the result to include the player with the empty value:
player_no | count | sum
------------------------
1 | 2 | 7
2 | 1 | 10
3 | 0 | 0
What do I miss here?
Actually I use QueryDSL for that, but translated example into pure SQL since it behaves in the same manner.
Use a LEFT JOIN and the COALESCE function:
SELECT p.player_no, COUNT(p.player_no), coalesce(SUM(t.value),0)
FROM Players p
LEFT JOIN Transactions t ON p.transaction_id = t.id
GROUP BY p.player_no
Change your JOIN to a LEFT JOIN, then wrap the value in ISNULL(value, 0) inside your SUM() (IFNULL in MySQL, COALESCE in standard SQL).
A LEFT JOIN keeps all the rows of the left table:
SELECT p.player_no
, COUNT(*) as count
, SUM(isnull(t.value,0))
FROM Players p
LEFT JOIN Transactions t
ON p.transaction_id = t.id
GROUP BY p.player_no
You might be looking for count(t.value) rather than count(*)
I'm just offering this so you have a correct answer:
SELECT p.player_no, COUNT(t.id) as [count], COALESCE(SUM(t.value), 0) as [sum]
FROM Players p LEFT JOIN
Transactions t
ON p.transaction_id = t.id
GROUP BY p.player_no;
You need to pay attention to the aggregation functions as well as the JOIN.
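The difference between the counting variants can be seen on the sample data from the question. This is a small sketch in SQLite via Python; the column aliases `count_star` and `count_tid` are made up for the comparison.

```python
import sqlite3

# Sample data from the question; player 3 has no transaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players      (player_no INTEGER, transaction_id INTEGER);
    CREATE TABLE Transactions (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO Players      VALUES (1, 11), (2, 22), (3, NULL), (1, 33);
    INSERT INTO Transactions VALUES (11, 5), (22, 10), (33, 2);
""")

# COUNT(*) counts joined rows, including the NULL-extended row for player 3;
# COUNT(t.id) only counts rows where a transaction actually matched.
rows = conn.execute("""
    SELECT p.player_no,
           COUNT(*)                  AS count_star,
           COUNT(t.id)               AS count_tid,
           COALESCE(SUM(t.value), 0) AS total
    FROM Players p
    LEFT JOIN Transactions t ON p.transaction_id = t.id
    GROUP BY p.player_no
    ORDER BY p.player_no
""").fetchall()

for row in rows:
    print(row)
```

For player 3 the LEFT JOIN still produces one row, so COUNT(*) reports 1 while COUNT(t.id) reports the 0 the question asks for.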
Please Try This:
SELECT P.player_no,
COUNT(*) as count,
SUM(isnull(T.value,0))
FROM Players P
LEFT JOIN Transactions T
ON P.transaction_id = T.id
GROUP BY P.player_no
Hope this helps.

SQL server matching two table on a column

I have two tables, one storing user skills and another storing the skills required for a job. I want to count how many of each user's skills match a job.
The table structure is
Table1: User_Skills
| ID | User_ID | Skill |
---------------------------
| 1 | 1 | .Net |
---------------------------
| 2 | 1 | Software|
---------------------------
| 3 | 1 | Engineer|
---------------------------
| 4 | 2 | .Net |
---------------------------
| 5 | 2 | Software|
---------------------------
Table2: Job_Skills_Requirement
| ID | Job_ID | Skill |
--------------------------
| 1 | 1 | .Net |
---------------------------
| 2 | 1 | Engineer|
---------------------------
| 3 | 1 | HTML |
---------------------------
| 4 | 2 | Software|
---------------------------
| 5 | 2 | HTML |
---------------------------
I was trying to build comma-separated skill lists and compare them, but the skills can be in a different order.
Edit
All the answers here are excellent. The result I am looking for is matching all jobs with all users as later on I will match other properties as well.
You could join the tables by the skill columns and count the matches:
SELECT user_id, job_id, COUNT(*) AS matching_skills
FROM user_skills u
JOIN job_skills_requirement j ON u.skill = j.skill
GROUP BY user_id, job_id
EDIT:
If you want to also show users and jobs that have no matching skills, you can use a full outer join instead.
SELECT user_id, job_id, COUNT(*) AS matching_skills
FROM user_skills u
FULL OUTER JOIN job_skills_requirement j ON u.skill = j.skill
GROUP BY user_id, job_id
EDIT 2:
As Jiri Tousek commented, the above query will produce nulls where there's no match between a user and a job. If you want a full Cartesian product between them, you could use (abuse?) the cross join syntax and count how many skills actually match between each user and each job:
SELECT user_id,
job_id,
COUNT(CASE WHEN u.skill = j.skill THEN 1 END) AS matching_skills
FROM user_skills u
CROSS JOIN job_skills_requirement j
GROUP BY user_id, job_id
If you want to match all users and all jobs, then Mureinik's otherwise excellent answer is not correct.
You need to generate all the rows first, which I would do using a cross join and then count the matching ones:
select u.user_id, j.job_id, count(jsr.job_id) as skills_in_common
from users u cross join
jobs j left join
user_skills us
on us.user_id = u.user_id left join
Job_Skills_Requirement jsr
on jsr.job_id = j.job_id and
jsr.skill = us.skill
group by u.user_id, j.job_id;
Note: This assumes the existence of a users and a jobs table. You can of course generate these using subqueries.
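Here is a sketch of the subquery variant mentioned in the note, run in SQLite from Python. The users and jobs lists are derived with SELECT DISTINCT from the two skill tables; user 3 with the skill 'Cooking' is a hypothetical addition to show a pair with zero matches.

```python
import sqlite3

# Data from the question, plus a hypothetical user 3 with no matching skill.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_skills (id INTEGER PRIMARY KEY, user_id INTEGER, skill TEXT);
    CREATE TABLE job_skills_requirement (id INTEGER PRIMARY KEY, job_id INTEGER, skill TEXT);
    INSERT INTO user_skills VALUES
        (1, 1, '.Net'), (2, 1, 'Software'), (3, 1, 'Engineer'),
        (4, 2, '.Net'), (5, 2, 'Software'),
        (6, 3, 'Cooking');
    INSERT INTO job_skills_requirement VALUES
        (1, 1, '.Net'), (2, 1, 'Engineer'), (3, 1, 'HTML'),
        (4, 2, 'Software'), (5, 2, 'HTML');
""")

# Build every (user, job) pair first, then count the skills that match.
matches = conn.execute("""
    SELECT u.user_id, j.job_id, COUNT(jsr.job_id) AS skills_in_common
    FROM (SELECT DISTINCT user_id FROM user_skills) AS u
    CROSS JOIN (SELECT DISTINCT job_id FROM job_skills_requirement) AS j
    LEFT JOIN user_skills AS us
           ON us.user_id = u.user_id
    LEFT JOIN job_skills_requirement AS jsr
           ON jsr.job_id = j.job_id AND jsr.skill = us.skill
    GROUP BY u.user_id, j.job_id
    ORDER BY u.user_id, j.job_id
""").fetchall()

for row in matches:
    print(row)
```

Counting `jsr.job_id` rather than `*` is what lets the pairs without any common skill come out as 0.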
WITH User_Skills(ID,User_ID,Skill)AS(
SELECT 1,1,'.Net' UNION ALL
SELECT 2,1,'Software' UNION ALL
SELECT 3,1,'Engineer' UNION ALL
SELECT 4,2,'.Net' UNION ALL
SELECT 5,2 ,'Software'
),Job_Skills_Requirement(ID,Job_ID,Skill)AS(
SELECT 1,1,'.Net' UNION ALL
SELECT 2,1,'Engineer' UNION ALL
SELECT 3,1,'HTML' UNION ALL
SELECT 4,2,'Software' UNION ALL
SELECT 5,2 ,'HTML'
),Job_User_Skill AS (
SELECT j.Job_ID,u.User_ID,u.Skill
FROM Job_Skills_Requirement AS j INNER JOIN User_Skills AS u ON u.Skill=j.Skill
)
SELECT jus.Job_ID,jus.User_ID,COUNT(jus.Skill),STUFF(c.Skills,1,1,'') AS Skill
FROM Job_User_Skill AS jus
CROSS APPLY(SELECT ','+j.Skill FROM Job_User_Skill AS j WHERE j.Job_ID=jus.Job_ID AND j.User_ID=jus.User_ID FOR XML PATH('')) c(Skills)
GROUP BY jus.Job_ID,jus.User_ID,c.Skills
ORDER BY jus.Job_ID
Job_ID      User_ID     (no name)   Skill
----------- ----------- ----------- -------------
1           1           2           .Net,Engineer
1           2           1           .Net
2           1           1           Software
2           2           1           Software