SUM CASE when DISTINCT? - sql

Joining two tables and grouping, we're trying to get the sum of a user's value but only include a user's value once if that user is represented in a grouping multiple times.
Some sample tables:
user table:
| id | net_worth |
------------------
| 1 | 100 |
| 2 | 1000 |
visit table:
| id | location | user_id |
-----------------------------
| 1 | mcdonalds | 1 |
| 2 | mcdonalds | 1 |
| 3 | mcdonalds | 2 |
| 4 | subway | 1 |
We want to find the total net worth of users visiting each location. User 1 visited McDonalds twice, but we don't want to double count their net worth. Ideally we can use a SUM but only add in the net worth value if that user hasn't already been counted for at that location. Something like this:
-- NOTE: Hypothetical query
SELECT
location,
SUM(CASE WHEN DISTINCT user.id then user.net_worth ELSE 0 END) as total_net_worth
FROM visit
JOIN user on user.id = visit.user_id
GROUP BY 1;
The ideal output being:
| location | total_net_worth |
-------------------------------
| mcdonalds | 1100 |
| subway | 100 |
This particular database is Redshift/PostgreSQL, but it would be interesting to know whether there is a generic SQL solution. Is something like the above possible?

You don't want to count a user more than once per location, so select the distinct (location, user_id) rows from the visit table first and join to those instead.
SELECT
v.location,
SUM(u.net_worth) as total_net_worth
FROM (SELECT DISTINCT location, user_id FROM visit) v
JOIN "user" u on u.id = v.user_id -- quoted: USER is a reserved word in PostgreSQL/Redshift
GROUP BY v.location
ORDER BY v.location;
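A minimal sketch to check this answer, assuming an in-memory SQLite database (through Python's built-in sqlite3 module) as a stand-in for Redshift/PostgreSQL, loaded with the sample rows from the question. It runs a naive join next to the DISTINCT-subquery version so the double count is visible:

```python
import sqlite3

# In-memory SQLite stand-in for the question's Redshift/PostgreSQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, net_worth INTEGER);
    CREATE TABLE visit (id INTEGER PRIMARY KEY, location TEXT, user_id INTEGER);
    INSERT INTO "user" VALUES (1, 100), (2, 1000);
    INSERT INTO visit VALUES
        (1, 'mcdonalds', 1), (2, 'mcdonalds', 1),
        (3, 'mcdonalds', 2), (4, 'subway', 1);
""")

# Naive join: user 1 appears twice at mcdonalds, so 100 is added twice.
naive = conn.execute("""
    SELECT v.location, SUM(u.net_worth)
    FROM visit v JOIN "user" u ON u.id = v.user_id
    GROUP BY v.location ORDER BY v.location
""").fetchall()

# Deduplicate (location, user_id) pairs before joining and summing.
deduped = conn.execute("""
    SELECT v.location, SUM(u.net_worth) AS total_net_worth
    FROM (SELECT DISTINCT location, user_id FROM visit) v
    JOIN "user" u ON u.id = v.user_id
    GROUP BY v.location ORDER BY v.location
""").fetchall()

print(naive)    # [('mcdonalds', 1200), ('subway', 100)]
print(deduped)  # [('mcdonalds', 1100), ('subway', 100)]
```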

You can use a window function to get the unique users, then join that to the user table:
select v.location, sum(u.net_worth)
from "user" u
join (
select location, user_id,
row_number() over (partition by location, user_id order by id) as rn
from visit
) v on v.user_id = u.id and v.rn = 1
group by v.location;
The above is standard ANSI SQL. In Postgres, this can also be expressed using DISTINCT ON ():
select v.location, sum(u.net_worth)
from "user" u
join (
select distinct on (user_id, location) *
from visit
order by user_id, location, id
) v on v.user_id = u.id
group by v.location;
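A sketch of the ROW_NUMBER() query from this answer against the same sample data, again assuming SQLite as a stand-in (window functions need SQLite 3.25 or later; DISTINCT ON is PostgreSQL-specific and is not shown). The first query shows which visit rows get rn = 1; the ORDER BY id inside OVER () only makes that choice deterministic, since any one row per pair would do:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, net_worth INTEGER);
    CREATE TABLE visit (id INTEGER PRIMARY KEY, location TEXT, user_id INTEGER);
    INSERT INTO "user" VALUES (1, 100), (2, 1000);
    INSERT INTO visit VALUES
        (1, 'mcdonalds', 1), (2, 'mcdonalds', 1),
        (3, 'mcdonalds', 2), (4, 'subway', 1);
""")

# Number the visits within each (location, user_id) pair; only rn = 1 is kept.
numbered = conn.execute("""
    SELECT id, ROW_NUMBER() OVER (PARTITION BY location, user_id ORDER BY id) AS rn
    FROM visit ORDER BY id
""").fetchall()

totals = conn.execute("""
    SELECT v.location, SUM(u.net_worth) AS total_net_worth
    FROM "user" u
    JOIN (
        SELECT location, user_id,
               ROW_NUMBER() OVER (PARTITION BY location, user_id ORDER BY id) AS rn
        FROM visit
    ) v ON v.user_id = u.id AND v.rn = 1
    GROUP BY v.location ORDER BY v.location
""").fetchall()

print(numbered)  # [(1, 1), (2, 2), (3, 1), (4, 1)]
print(totals)    # [('mcdonalds', 1100), ('subway', 100)]
```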

You can join the user table to the distinct (location, user_id) combinations, as in the following generic SQL:
SELECT v.location, SUM(u.net_worth)
FROM (SELECT location, user_id FROM visit GROUP BY location, user_id) v
JOIN "user" u on u.id = v.user_id
GROUP BY v.location;

Related

SQL Query find users with only one product type

I solemnly swear I did my best to find an existing question, but I'm not sure how to phrase it correctly.
I would like to return records for users that have quota for only one product type.
| user_id | product |
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | B |
| 3 | B |
| 3 | C |
| 3 | D |
In the example above I'd like a query that only returns users who carry quota for only one product type - doesn't really matter which product at this point.
I tried using select user_id, product from table group by 1,2 having count(user) < 2 but this does not work, nor does select user_id, product from table group by 1,2 having count(*) < 2
Any help is appreciated.
Your having clause is good; the issue is with your group by. Try this:
select user_id
, count(distinct product) NumberOfProducts
from table
group by user_id
having count(distinct product) = 1
Or you could do this, which is closer to your original:
select user_id
from table
group by user_id
having count(*) < 2
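The problem is that grouping by 1,2 groups by product as well as user_id. Whether the group by clause accepts ordinal arguments (the way the order by clause does) depends on the database. Where it doesn't, a value like 1 is either rejected or treated as the literal value 1, which is the same for every row and would put the whole table into one group. Where ordinals are supported (MySQL and PostgreSQL, for example), group by 1,2 means group by user_id, product, so every group is a single user/product pair, the count is always 1, and every row is returned.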
Instead, you should group by the user_id:
SELECT user_id
FROM mytable
GROUP BY user_id
HAVING COUNT(*) = 1
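A minimal sketch contrasting the two groupings, assuming SQLite (through Python's sqlite3) and a hypothetical table name quota, since the question's `table` is only a placeholder. SQLite, like MySQL and PostgreSQL, reads GROUP BY 1, 2 as column positions, so the original query keeps every row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "quota" is a hypothetical name for the question's placeholder table.
conn.execute("CREATE TABLE quota (user_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO quota VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (2, "B"), (3, "B"), (3, "C"), (3, "D")],
)

# Positional GROUP BY 1, 2 means (user_id, product): each group is one pair,
# COUNT(*) is always 1, and all seven rows pass the HAVING filter.
original = conn.execute(
    "SELECT user_id, product FROM quota GROUP BY 1, 2 HAVING COUNT(*) < 2"
).fetchall()

# Grouping by user_id alone counts each user's rows.
fixed = conn.execute(
    "SELECT user_id FROM quota GROUP BY user_id HAVING COUNT(*) = 1"
).fetchall()

print(len(original))  # 7
print(fixed)          # [(2,)]
```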
If you want the product, then do:
select user_id, max(product) as product
from table
group by user_id
having min(product) = max(product);
The having clause could also be:
having count(distinct product) = 1
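The MIN/MAX variant can be checked the same way. This sketch assumes SQLite and the same hypothetical quota table, plus a hypothetical user 4 whose single product is stored twice. That shows one design point: COUNT(*) = 1 misses such a user, while MIN = MAX and COUNT(DISTINCT product) = 1 still find them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quota (user_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO quota VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (2, "B"), (3, "B"), (3, "C"), (3, "D"),
     (4, "A"), (4, "A")],  # user 4 is hypothetical: one product, stored twice
)

# Counts rows, so a duplicated row makes user 4 look like it has two products.
by_count = conn.execute(
    "SELECT user_id FROM quota GROUP BY user_id HAVING COUNT(*) = 1 ORDER BY user_id"
).fetchall()

# MIN = MAX holds exactly when all of a user's rows share one product.
by_min_max = conn.execute("""
    SELECT user_id, MAX(product) AS product FROM quota
    GROUP BY user_id HAVING MIN(product) = MAX(product) ORDER BY user_id
""").fetchall()

by_distinct = conn.execute("""
    SELECT user_id FROM quota
    GROUP BY user_id HAVING COUNT(DISTINCT product) = 1 ORDER BY user_id
""").fetchall()

print(by_count)     # [(2,)]
print(by_min_max)   # [(2, 'B'), (4, 'A')]
print(by_distinct)  # [(2,), (4,)]
```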

SQL GROUP BY and retrieve last child records

I'm writing a DB view that pulls data from several tables. The goal is to determine the latest status of each company, which is given by the record with the highest vetting_event_type_position within each company_id group.
Essentially I'm trying to grab the latest record for each company. I'm not a SQL guru at all; I understand I need to group by in order to collapse the related records, but I can't get that to work.
Current results
company_id | name | ... | vetting_event_type_position
-----------------------------------------------------
1 | ABC | ... | 1
1 | ABC | ... | 2
1 | ABC | ... | 3
2 | CBS | ... | 1
2 | CBS | ... | 2
3 | HBO | ... | 1
DESIRED results
company_id | name | ... | vetting_event_type_position
-----------------------------------------------------
1 | ABC | ... | 3
2 | CBS | ... | 2
3 | HBO | ... | 1
SQL Code
SELECT
companies.id as company_id,
companies.name as name,
companies.uuid as uuid,
companies.company_type as company_type,
companies.description as overview,
practice_areas.id as practice_area_id,
practice_areas.name as practice_area_name,
companies.created_at as created_at,
companies.updated_at as updated_at,
companies.created_by as created_by,
companies.updated_by as updated_by,
vettings.id as vetting_id,
vettings.name as vetting_name,
vetting_event_types.name as vetting_event_status,
vetting_events.id as vetting_event_id,
vetting_event_types.position as vetting_event_type_position
FROM
vettings
LEFT OUTER JOIN vetting_events ON (vettings.id = vetting_events.vetting_id)
LEFT OUTER JOIN vetting_event_types ON (vetting_events.vetting_event_type_id = vetting_event_types.id)
RIGHT OUTER JOIN companies ON (companies.id = vettings.company_id)
LEFT OUTER JOIN practice_areas ON (companies.practice_area_id = practice_areas.id)
LEFT OUTER JOIN dispositions ON (companies.disposition_id = dispositions.id)
ORDER BY
name, vetting_name, vetting_event_type_position
;
Associations among tables
companies has_many vettings
vettings has_many vetting_events
vetting_events belongs_to vetting_event_types
or put another way...
companies -> vettings -> vetting_events <- vetting_event_types
I am trying to retrieve the company record with the highest vetting_event_types.position value for each group.
SELECT company_id
,name
,uuid
,company_type
,overview
,practice_area_id
,practice_area_name
,created_at
,updated_at
,created_by
,updated_by
,vetting_id
,vetting_name
,vetting_event_status
,vetting_event_id
,vetting_event_type_position
FROM (
SELECT
companies.id as company_id,
companies.name as name,
companies.uuid as uuid,
companies.company_type as company_type,
companies.description as overview,
practice_areas.id as practice_area_id,
practice_areas.name as practice_area_name,
companies.created_at as created_at,
companies.updated_at as updated_at,
companies.created_by as created_by,
companies.updated_by as updated_by,
vettings.id as vetting_id,
vettings.name as vetting_name,
vetting_event_types.name as vetting_event_status,
vetting_events.id as vetting_event_id,
vetting_event_types.position as vetting_event_type_position,
ROW_NUMBER() OVER (PARTITION BY companies.id ORDER BY vetting_event_types.position DESC) rn
FROM vettings
LEFT OUTER JOIN vetting_events ON (vettings.id = vetting_events.vetting_id)
LEFT OUTER JOIN vetting_event_types ON (vetting_events.vetting_event_type_id = vetting_event_types.id)
RIGHT OUTER JOIN companies ON (companies.id = vettings.company_id)
LEFT OUTER JOIN practice_areas ON (companies.practice_area_id = practice_areas.id)
LEFT OUTER JOIN dispositions ON (companies.disposition_id = dispositions.id)
) A
WHERE A.rn = 1
ORDER BY name, vetting_name, vetting_event_type_position
You can use the ROW_NUMBER analytic function:
Select * from (
Select ...,
Row_number() over (partition by company_id order by vetting_event_type_position desc) as seq
from ...) T
Where seq = 1
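A reduced sketch of the same idea, assuming SQLite 3.25+ and a hypothetical single table company_events that stands in for the view's joined rows (only company_id, name and position are kept, with the values from the question's current results):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in for the joined rows of the original view.
conn.execute("CREATE TABLE company_events (company_id INTEGER, name TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO company_events VALUES (?, ?, ?)",
    [(1, "ABC", 1), (1, "ABC", 2), (1, "ABC", 3),
     (2, "CBS", 1), (2, "CBS", 2), (3, "HBO", 1)],
)

# Rank rows within each company by position, highest first, and keep rank 1.
latest = conn.execute("""
    SELECT company_id, name, position
    FROM (
        SELECT company_id, name, position,
               ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY position DESC) AS seq
        FROM company_events
    ) t
    WHERE seq = 1
    ORDER BY company_id
""").fetchall()

print(latest)  # [(1, 'ABC', 3), (2, 'CBS', 2), (3, 'HBO', 1)]
```

If two rows in one company can tie on position, ROW_NUMBER keeps an arbitrary one of them; RANK() would keep all tied rows instead.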

Select distinct where date is max

This feels really stupid to ask, but I can't do this selection in SQL Server Compact (CE).
If I have two tables like this:
Statuses
id | status | thedate
-------------------------
0 | Single | 2014-01-01
0 | Engaged | 2014-01-02
1 | Single | 2014-01-03
0 | Divorced | 2014-01-04

Users
id | name
-----------
0 | Lisa
1 | John
How can I now select the latest status for each person in Statuses?
the result should be:
Id | Name | Date | Status
--------------------------------
0 | Lisa | 2014-01-04 | Divorced
1 | John | 2014-01-03 | Single
That is, select each distinct id where the date is the highest, and join in the name. As a bonus, sort the list so the latest record is on top.
In SQL Server CE, you can do this using a join:
select u.id, u.name, s.thedate, s.status
from users u join
statuses s
on u.id = s.id join
(select id, max(thedate) as mtd
from statuses
group by id
) as maxs
on s.id = maxs.id and s.thedate = maxs.mtd;
The subquery calculates the maximum date and uses that as a filter for the statuses table.
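A sketch of this join, assuming SQLite stands in for SQL Server CE and using the question's sample rows. It also adds ORDER BY s.thedate DESC for the "latest on top" bonus, which the query above leaves out. The dates are stored as ISO text, so MAX and ORDER BY compare them correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE statuses (id INTEGER, status TEXT, thedate TEXT);
    INSERT INTO users VALUES (0, 'Lisa'), (1, 'John');
    INSERT INTO statuses VALUES
        (0, 'Single', '2014-01-01'), (0, 'Engaged', '2014-01-02'),
        (1, 'Single', '2014-01-03'), (0, 'Divorced', '2014-01-04');
""")

# Join each status row to its own user's maximum date; only the latest row matches.
latest = conn.execute("""
    SELECT u.id, u.name, s.thedate, s.status
    FROM users u
    JOIN statuses s ON u.id = s.id
    JOIN (SELECT id, MAX(thedate) AS mtd FROM statuses GROUP BY id) AS maxs
      ON s.id = maxs.id AND s.thedate = maxs.mtd
    ORDER BY s.thedate DESC
""").fetchall()

print(latest)  # [(0, 'Lisa', '2014-01-04', 'Divorced'), (1, 'John', '2014-01-03', 'Single')]
```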
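Use the following query. The subquery is correlated on id, so each status row is compared only with its own user's latest date:
SELECT U.Id AS Id, U.Name AS Name, S.thedate AS Date, S.status AS Status
FROM Statuses S
INNER JOIN Users U on S.id = U.id
WHERE S.thedate = (
SELECT MAX(S2.thedate)
FROM Statuses S2
WHERE S2.id = S.id)
ORDER BY S.thedate DESC;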
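A row-level check of why the date filter has to be correlated on id: an uncorrelated IN (SELECT MAX(thedate) FROM statuses GROUP BY id) matches any row whose date equals any user's latest date. This sketch assumes SQLite and moves Lisa's "Engaged" row to 2014-01-03 (hypothetical data, chosen to collide with John's latest date):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE statuses (id INTEGER, status TEXT, thedate TEXT);
    INSERT INTO users VALUES (0, 'Lisa'), (1, 'John');
    -- Hypothetical: Lisa's 'Engaged' row shares John's latest date.
    INSERT INTO statuses VALUES
        (0, 'Single', '2014-01-01'), (0, 'Engaged', '2014-01-03'),
        (1, 'Single', '2014-01-03'), (0, 'Divorced', '2014-01-04');
""")

# Uncorrelated: compares each row against the set of every user's maximum date.
uncorrelated = conn.execute("""
    SELECT U.id, U.name, S.thedate, S.status
    FROM statuses S JOIN users U ON S.id = U.id
    WHERE S.thedate IN (SELECT MAX(thedate) FROM statuses GROUP BY id)
    ORDER BY S.thedate DESC, U.id
""").fetchall()

# Correlated: compares each row only against its own user's maximum date.
correlated = conn.execute("""
    SELECT U.id, U.name, S.thedate, S.status
    FROM statuses S JOIN users U ON S.id = U.id
    WHERE S.thedate = (SELECT MAX(S2.thedate) FROM statuses S2 WHERE S2.id = S.id)
    ORDER BY S.thedate DESC, U.id
""").fetchall()

print(uncorrelated)  # 3 rows, including Lisa's stale 'Engaged' row
print(correlated)    # [(0, 'Lisa', '2014-01-04', 'Divorced'), (1, 'John', '2014-01-03', 'Single')]
```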

Writing a complex SQL query, with table relations

I will present my table structures first (only relevant fields will be mentioned)
/* The table Users */
user_id | user_name | user_registration_date
1 | USER1 | 19/09/2010
2 | USER2 | 20/09/2010
/* The table Levels_Completed */
user_id | level_id
1 | 1
1 | 2
2 | 1
I would like to display a scoreboard. The first user on the list will be the one who has completed the most levels.
For the example above, USER1 will be displayed above USER2.
I want to receive the following data:
user_id, user_name, user_registration_date, COUNT(level_id rows) AS score
ordered by score (highest first), with one row per user.
Example:
1 | USER1 | 19/09/2010 | 2
2 | USER2 | 20/09/2010 | 1
I know how to use INNER JOIN, but I think the counting and ordering are above my current level. Help please?
SELECT Users.user_id, user_name, user_registration_date, COUNT(level_id) AS score
FROM Users INNER JOIN Levels_Completed ON Users.user_id = Levels_Completed.user_id
GROUP BY Users.user_id, user_name, user_registration_date
ORDER BY score DESC
Try this:
SELECT
U.user_id,
U.user_name,
U.user_registration_date,
COUNT(L.level_id) as score
FROM Users U
LEFT JOIN Levels_Completed L
ON U.User_Id = L.User_Id
GROUP BY U.user_id, U.user_name, U.user_registration_date
ORDER BY score DESC
SELECT Users.user_id, user_name, user_registration_date, score
FROM Users
INNER JOIN (
SELECT user_id, COUNT(level_id) AS score
FROM Levels_Completed
GROUP BY user_id) AS lc
USING (user_id)
ORDER BY score DESC
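A sketch comparing the LEFT JOIN answer with the pre-aggregated subquery answer, assuming SQLite and the sample rows, plus a hypothetical USER3 with no completed levels. The LEFT JOIN version keeps USER3 with a score of 0, while the inner join drops them. The derived table gets an alias here (lc, a name chosen for this sketch), which MySQL and PostgreSQL versions before 16 require:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER, user_name TEXT, user_registration_date TEXT);
    CREATE TABLE Levels_Completed (user_id INTEGER, level_id INTEGER);
    INSERT INTO Users VALUES
        (1, 'USER1', '19/09/2010'), (2, 'USER2', '20/09/2010'),
        (3, 'USER3', '21/09/2010');  -- hypothetical user with no levels
    INSERT INTO Levels_Completed VALUES (1, 1), (1, 2), (2, 1);
""")

# LEFT JOIN keeps users with no completed levels; COUNT(L.level_id) skips NULLs.
left_join = conn.execute("""
    SELECT U.user_id, U.user_name, U.user_registration_date, COUNT(L.level_id) AS score
    FROM Users U
    LEFT JOIN Levels_Completed L ON U.user_id = L.user_id
    GROUP BY U.user_id, U.user_name, U.user_registration_date
    ORDER BY score DESC
""").fetchall()

# Pre-aggregate, then inner join with USING; users without levels drop out.
pre_aggregated = conn.execute("""
    SELECT Users.user_id, user_name, user_registration_date, score
    FROM Users
    INNER JOIN (
        SELECT user_id, COUNT(level_id) AS score
        FROM Levels_Completed
        GROUP BY user_id
    ) AS lc USING (user_id)
    ORDER BY score DESC
""").fetchall()

print(left_join)       # USER1 (2), USER2 (1), USER3 (0)
print(pre_aggregated)  # USER1 (2), USER2 (1)
```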

Multiple Left Joins - how to?

I have a Rails app running on Heroku, where I'm trying to calculate a user's rank (position) in a high-score list.
The app is a place for users to bet against each other: they can start a wager (by creating a Choice) or bet against an already created Choice (by making a Bet).
I have the following SQL, which should give me an array of users based on their total winnings on both Choices and Bets, but it's giving me some wrong totals. I think the problem is in the LEFT JOINs, because if I rewrite the SQL to contain only the Choice table or only the Bet table, it works just fine.
Does anyone have pointers on how to rewrite the SQL so that it works correctly?
SELECT users.id, sum(COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
Result:
+---------------+
| id | total_pl |
+---------------+
| 1 | 830 |
| 4 | 200 |
| 3 | 130 |
| 7 | -220 |
| 5 | -1360 |
| 6 | -4950 |
+---------------+
Below are the two SQL queries that each join only one table, followed by their results. Note that these results do not add up to the result above; these are the correct sums.
SELECT users.id, sum(COALESCE(bets.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
SELECT users.id, sum(COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
+---------------+
| id | total_pl |
+---------------+
| 3 | 170 |
| 1 | 150 |
| 4 | 100 |
| 5 | 80 |
| 7 | 20 |
| 6 | -30 |
+---------------+
+---------------+
| id | total_pl |
+---------------+
| 1 | 20 |
| 4 | 0 |
| 3 | -10 |
| 7 | -30 |
| 5 | -110 |
| 6 | -360 |
+---------------+
This happens because of the relationship between the two LEFT JOINed tables: if a user has multiple rows in both bets and choices, the number of joined rows is the product of the individual row counts, not their sum.
If you have
choices
id profitloss
================
1 20
1 30
bets
id profitloss
================
1 25
1 35
The result of the join is actually:
bets/choices
id choices.profitloss bets.profitloss
1 20 25
1 20 35
1 30 25
1 30 35
(see where this is going?)
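The arithmetic can be checked directly. This sketch assumes SQLite and stores the example rows above under a user_id column (the example's id). The correct total for user 1 is 20 + 30 + 25 + 35 = 110, but each choices value is paired with both bets values, so the double LEFT JOIN yields four rows that sum to 220:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER);
    CREATE TABLE bets (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users VALUES (1);
    INSERT INTO bets VALUES (1, 25), (1, 35);
    INSERT INTO choices VALUES (1, 20), (1, 30);
""")

# Without GROUP BY: 2 bets x 2 choices = 4 joined rows for user 1.
joined_rows = conn.execute("""
    SELECT users.id, bets.profitloss, choices.profitloss
    FROM users
    LEFT JOIN bets ON bets.user_id = users.id
    LEFT JOIN choices ON choices.user_id = users.id
""").fetchall()

# The question's query sums over those 4 rows, counting every value twice.
total = conn.execute("""
    SELECT users.id, SUM(COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0))
    FROM users
    LEFT JOIN bets ON bets.user_id = users.id
    LEFT JOIN choices ON choices.user_id = users.id
    GROUP BY users.id
""").fetchall()

print(len(joined_rows))  # 4
print(total)             # [(1, 220)]
```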
Fixing this is actually fairly simple: aggregate each table per user before joining. You haven't specified an RDBMS, but this should work on any of them (possibly with minor tweaks).
SELECT users.id, COALESCE(bets.profitloss, 0)
+ COALESCE(choices.profitloss, 0) as total_pl
FROM users
LEFT JOIN (SELECT user_id, SUM(profitloss) as profitloss
FROM bets
GROUP BY user_id) bets
ON bets.user_id = users.id
LEFT JOIN (SELECT user_id, SUM(profitloss) as profitloss
FROM choices
GROUP BY user_id) choices
ON choices.user_id = users.id
ORDER BY total_pl DESC
(As an aside, conventions on singular versus plural table names vary; Rails, which you're using, pluralizes table names by default.)
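A sketch of the pre-aggregated fix, assuming SQLite and the same example rows, plus a hypothetical user 2 who has a bet but no choices (so the COALESCE matters):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER);
    CREATE TABLE bets (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO bets VALUES (1, 25), (1, 35), (2, -40);  -- user 2 is hypothetical
    INSERT INTO choices VALUES (1, 20), (1, 30);
""")

# Each subquery yields at most one row per user, so the joins cannot fan out.
totals = conn.execute("""
    SELECT users.id,
           COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0) AS total_pl
    FROM users
    LEFT JOIN (SELECT user_id, SUM(profitloss) AS profitloss
               FROM bets GROUP BY user_id) bets ON bets.user_id = users.id
    LEFT JOIN (SELECT user_id, SUM(profitloss) AS profitloss
               FROM choices GROUP BY user_id) choices ON choices.user_id = users.id
    ORDER BY total_pl DESC
""").fetchall()

print(totals)  # [(1, 110), (2, -40)]
```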
Your problem is that you are blowing out your data set: the two joins multiply each user's rows. If you ran a SELECT *, you would see it. Try this; I wasn't able to test it because I don't have your tables, but it should work:
SELECT
totals.id
,SUM(totals.total_pl) total_pl
FROM
(
SELECT users.id, sum(COALESCE(bets.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
GROUP BY users.id
UNION ALL SELECT users.id, sum(COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
) totals
GROUP BY totals.id
ORDER BY total_pl DESC
This is similar to Clockwork's solution: since the columns are the same in both tables, I would pre-sum each table and UNION ALL the results. So, at most, the inner query will have two records per user, one for bets and one for choices, each already summed before the UNION ALL. Then a simple join and sum gives the results:
select
U.id,
sum( coalesce( PreSum.profit, 0) ) as TotalPL
from
Users U
LEFT JOIN
( select user_id, sum( profitloss ) as Profit
from bets
group by user_id
UNION ALL
select user_id, sum( profitloss ) as Profit
from choices
group by user_id ) PreSum
on U.ID = PreSum.User_ID
group by
U.ID
order by TotalPL desc
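A sketch of the UNION ALL pre-sum, assuming SQLite, the same example rows, the hypothetical user 2 with only a bet, and a hypothetical user 3 with no bets or choices, who comes out as 0 through the COALESCE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER);
    CREATE TABLE bets (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users VALUES (1), (2), (3);  -- users 2 and 3 are hypothetical
    INSERT INTO bets VALUES (1, 25), (1, 35), (2, -40);
    INSERT INTO choices VALUES (1, 20), (1, 30);
""")

# PreSum holds at most one bets row and one choices row per user.
totals = conn.execute("""
    SELECT U.id, SUM(COALESCE(PreSum.profit, 0)) AS TotalPL
    FROM users U
    LEFT JOIN (
        SELECT user_id, SUM(profitloss) AS profit FROM bets GROUP BY user_id
        UNION ALL
        SELECT user_id, SUM(profitloss) AS profit FROM choices GROUP BY user_id
    ) PreSum ON U.id = PreSum.user_id
    GROUP BY U.id
    ORDER BY TotalPL DESC
""").fetchall()

print(totals)  # [(1, 110), (3, 0), (2, -40)]
```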