Writing a complex SQL query, with table relations

I will present my table structures first (only the relevant fields are shown):
/* The table Users */
user_id | user_name | user_registration_date
1 | USER1 | 19/09/2010
2 | USER2 | 20/09/2010
/* The table Levels_Completed */
user_id | level_id
1 | 1
1 | 2
2 | 1
I would like to display a scoreboard. The first user on the list will be the one who has completed the most levels.
For the example above, USER1 will be displayed above USER2.
I want to receive the following data:
user_id, user_name, user_registration_date, COUNT(level_id rows) AS score
ordered by score, with one row per user.
Example:
1 | USER1 | 19/09/2010 | 2
2 | USER2 | 20/09/2010 | 1
I know how to use INNER JOIN, but I think the counting and ordering are above my current level. Help please?

SELECT Users.user_id, user_name, user_registration_date, COUNT(level_id) AS score
FROM Users INNER JOIN Levels_Completed ON Users.user_id = Levels_Completed.user_id -- INNER JOIN drops users with no completed levels
GROUP BY Users.user_id, user_name, user_registration_date
ORDER BY score DESC

Try this:
SELECT
U.user_id,
U.user_name,
U.user_registration_date,
COUNT(L.level_id) as score
FROM Users U
LEFT JOIN Levels_Completed L
ON U.User_Id = L.User_Id
GROUP BY U.user_id, U.user_name, U.user_registration_date
ORDER BY score DESC
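A quick way to sanity-check the LEFT JOIN version is to run it against the sample data. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for whatever RDBMS you actually use (an assumption). The `USER3` row is a hypothetical addition, not in the question, showing that a user with no completed levels still appears with a score of 0.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, user_name TEXT,
                        user_registration_date TEXT);
    CREATE TABLE Levels_Completed (user_id INTEGER, level_id INTEGER);
    -- USER3 is hypothetical: a user who has completed no levels.
    INSERT INTO Users VALUES (1, 'USER1', '19/09/2010'),
                             (2, 'USER2', '20/09/2010'),
                             (3, 'USER3', '21/09/2010');
    INSERT INTO Levels_Completed VALUES (1, 1), (1, 2), (2, 1);
""")

# The LEFT JOIN answer: COUNT(L.level_id) ignores the NULLs produced for
# users without matching rows, so USER3 gets a score of 0.
scoreboard = conn.execute("""
    SELECT U.user_id, U.user_name, U.user_registration_date,
           COUNT(L.level_id) AS score
    FROM Users U
    LEFT JOIN Levels_Completed L ON U.user_id = L.user_id
    GROUP BY U.user_id, U.user_name, U.user_registration_date
    ORDER BY score DESC
""").fetchall()

for row in scoreboard:
    print(row)
```

With the INNER JOIN variant, USER3 would be missing from the result entirely.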

SELECT Users.user_id, user_name, user_registration_date, score
FROM Users
INNER JOIN (
SELECT user_id, COUNT(level_id) AS score
FROM Levels_Completed
GROUP BY user_id) AS lc -- many databases (e.g. MySQL) require an alias on a derived table
USING (user_id)
ORDER BY score DESC


SUM CASE when DISTINCT?

When joining two tables and grouping, we're trying to get the sum of a user's value, but we want to include each user's value only once, even if that user appears multiple times in a group.
Some sample tables:
user table:
| id | net_worth |
------------------
| 1 | 100 |
| 2 | 1000 |
visit table:
| id | location | user_id |
-----------------------------
| 1 | mcdonalds | 1 |
| 2 | mcdonalds | 1 |
| 3 | mcdonalds | 2 |
| 4 | subway | 1 |
We want to find the total net worth of users visiting each location. User 1 visited McDonalds twice, but we don't want to double count their net worth. Ideally we could use a SUM that only adds in the net worth value if that user hasn't already been counted at that location. Something like this:
-- NOTE: Hypothetical query
SELECT
location,
SUM(CASE WHEN DISTINCT user.id then user.net_worth ELSE 0 END) as total_net_worth
FROM visit
JOIN user on user.id = visit.user_id
GROUP BY 1;
The ideal output being:
| location | total_net_worth |
-------------------------------
| mcdonalds | 1100 |
| subway | 100 |
This particular database is Redshift/PostgreSQL, but it would be interesting if there is a generic SQL solution. Is something like the above possible?
You don't want to count duplicate entries in the visit table, so select the distinct (location, user_id) rows from it first:
SELECT
v.location,
SUM(u.net_worth) as total_net_worth
FROM (SELECT DISTINCT location, user_id FROM visit) v
JOIN "user" u on u.id = v.user_id -- "user" is a reserved word in PostgreSQL/Redshift, so quote it
GROUP BY v.location
ORDER BY v.location;
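To see the double counting and the fix side by side, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for Redshift/PostgreSQL (an assumption; SQLite is used only so the example runs anywhere). The table names, columns, and rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "user" is quoted to mirror the PostgreSQL/Redshift query.
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, net_worth INTEGER);
    CREATE TABLE visit (id INTEGER PRIMARY KEY, location TEXT, user_id INTEGER);
    INSERT INTO "user" VALUES (1, 100), (2, 1000);
    INSERT INTO visit VALUES (1, 'mcdonalds', 1), (2, 'mcdonalds', 1),
                             (3, 'mcdonalds', 2), (4, 'subway', 1);
""")

# Naive join: user 1's two McDonalds visits add their net worth twice.
naive = conn.execute("""
    SELECT v.location, SUM(u.net_worth)
    FROM visit v JOIN "user" u ON u.id = v.user_id
    GROUP BY v.location
    ORDER BY v.location
""").fetchall()

# Deduplicate (location, user_id) pairs first, then sum.
deduped = conn.execute("""
    SELECT v.location, SUM(u.net_worth) AS total_net_worth
    FROM (SELECT DISTINCT location, user_id FROM visit) v
    JOIN "user" u ON u.id = v.user_id
    GROUP BY v.location
    ORDER BY v.location
""").fetchall()

print(naive)    # [('mcdonalds', 1200), ('subway', 100)]
print(deduped)  # [('mcdonalds', 1100), ('subway', 100)]
```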
You can use a window function to get the unique users, then join that to the user table:
select v.location, sum(u.net_worth)
from "user" u
join (
select location, user_id,
row_number() over (partition by location, user_id order by id) as rn
from visit
) v on v.user_id = u.id and v.rn = 1
group by v.location;
The above is standard ANSI SQL. In Postgres, the same thing can also be expressed using distinct on ():
select v.location, sum(u.net_worth)
from "user" u
join (
select distinct on (user_id, location) *
from visit
order by user_id, location, id
) v on v.user_id = u.id
group by v.location;
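The ANSI `ROW_NUMBER()` form from this answer can be tried out with Python's built-in `sqlite3` module as a stand-in database (an assumption; SQLite has no `DISTINCT ON`, so only the window-function form is shown). `ROW_NUMBER()` needs SQLite 3.25 or newer, which recent Python builds bundle.

```python
import sqlite3

print("SQLite version:", sqlite3.sqlite_version)  # window functions need 3.25+

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, net_worth INTEGER);
    CREATE TABLE visit (id INTEGER PRIMARY KEY, location TEXT, user_id INTEGER);
    INSERT INTO "user" VALUES (1, 100), (2, 1000);
    INSERT INTO visit VALUES (1, 'mcdonalds', 1), (2, 'mcdonalds', 1),
                             (3, 'mcdonalds', 2), (4, 'subway', 1);
""")

# Number each user's visits per location (earliest visit id first) and
# keep only rn = 1, i.e. one row per (location, user_id) pair.
rows = conn.execute("""
    SELECT v.location, SUM(u.net_worth)
    FROM "user" u
    JOIN (SELECT location, user_id,
                 ROW_NUMBER() OVER (PARTITION BY location, user_id
                                    ORDER BY id) AS rn
          FROM visit) v
      ON v.user_id = u.id AND v.rn = 1
    GROUP BY v.location
    ORDER BY v.location
""").fetchall()

print(rows)  # [('mcdonalds', 1100), ('subway', 100)]
```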
You can also join the user table to the distinct (location, user_id) combinations, produced here with GROUP BY, in generic SQL:
SELECT v.location, SUM(u.net_worth)
FROM (SELECT location, user_id FROM visit GROUP BY location, user_id) v
JOIN "user" u on u.id = v.user_id
GROUP BY v.location;

Efficiently getting multiple counts of foreign key rows in PostgreSQL

I have a database that consists of users who can perform various actions, which I keep track of in multiple tables. I'm creating a point system, so I need to count how many of each type of action the user did. For example, if I had:
users
id | username
-------------
1 | abc
2 | xyz
posts
id | user_id
------------
1 | 1
2 | 1
comments
id | user_id
------------
1 | 1
2 | 2
shares
id | user_id
------------
1 | 2
2 | 2
I would want to return:
user_details
id | username | post_count | comment_count | share_count
---------------------------------------------------------
1 | abc | 2 | 1 | 0
2 | xyz | 0 | 1 | 2
This is slightly different from this question about foreign key counts since I want to return the individual counts per table.
What I've tried so far (example code):
SELECT
users.id,
users.username,
COUNT( DISTINCT posts.id ) as post_count,
COUNT( DISTINCT comments.id ) as comment_count,
COUNT( DISTINCT shares.id ) as share_count
FROM users
LEFT JOIN posts ON posts.user_id = users.id
LEFT JOIN comments ON comments.user_id = users.id
LEFT JOIN shares ON shares.user_id = users.id
GROUP BY users.id
While this works, I had to use DISTINCT in all of my counts because the LEFT JOINs were causing high numbers of duplicate rows. I feel like there must be a better way to do this since (please correct me if I'm wrong) on each LEFT JOIN, the DISTINCT has to filter out an exponentially growing number of duplicated rows.
Thank you so much for any help you could give me with this!
You can join derived tables that already do the aggregation.
SELECT u.id,
u.username,
coalesce(pc.c, 0) AS post_count,
coalesce(cc.c, 0) AS comment_count,
coalesce(sc.c, 0) AS share_count
FROM users AS u
LEFT JOIN (SELECT p.user_id,
count(*) AS c
FROM posts AS p
GROUP BY p.user_id) AS pc
ON pc.user_id = u.id
LEFT JOIN (SELECT c.user_id,
count(*) AS c
FROM comments AS c
GROUP BY c.user_id) AS cc
ON cc.user_id = u.id
LEFT JOIN (SELECT s.user_id,
count(*) AS c
FROM shares AS s
GROUP BY s.user_id) AS sc
ON sc.user_id = u.id;
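Here is a small sketch that runs the derived-table query on the question's sample rows, using Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption so it runs anywhere). Each derived table collapses to at most one row per user before the join, so the joins cannot multiply rows and no DISTINCT is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE shares (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'abc'), (2, 'xyz');
    INSERT INTO posts VALUES (1, 1), (2, 1);
    INSERT INTO comments VALUES (1, 1), (2, 2);
    INSERT INTO shares VALUES (1, 2), (2, 2);
""")

# Aggregate each table per user first, then LEFT JOIN the one-row-per-user
# results; COALESCE turns "no rows" (NULL) into a count of 0.
counts = conn.execute("""
    SELECT u.id, u.username,
           COALESCE(pc.c, 0) AS post_count,
           COALESCE(cc.c, 0) AS comment_count,
           COALESCE(sc.c, 0) AS share_count
    FROM users AS u
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM posts GROUP BY user_id) AS pc
           ON pc.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM comments GROUP BY user_id) AS cc
           ON cc.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM shares GROUP BY user_id) AS sc
           ON sc.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(counts)  # [(1, 'abc', 2, 1, 0), (2, 'xyz', 0, 1, 2)]
```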

PostgreSQL - How to remove duplicates when doing LEFT OUTER JOIN with WHERE clause?

I have 2 tables:
users table
+--------+---------+
| id | integer |
+--------+---------+
| phone | string |
+--------+---------+
| active | boolean |
+--------+---------+
statuses table
+---------+---------+
| id | integer |
+---------+---------+
| user_id | integer |
+---------+---------+
| step_1 | boolean |
+---------+---------+
| step_2 | boolean |
+---------+---------+
I'm doing a LEFT OUTER JOIN from the users table to the statuses table, with a WHERE clause, like this:
SELECT users.id, statuses.step_1, statuses.step_2
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
ORDER BY users.id DESC
My problem
Some users in the users table have the same phone number, and I want to remove the duplicate users based on phone number.
I don't want to delete them from the database; I just want to exclude them from this query.
For example, say John (ID: 1) and Sara (ID: 2) share the same phone number (+6012-3456789); removing either John or Sara is fine for me.
What I've tried that did not work:
First:
SELECT DISTINCT users.phone
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
ORDER BY users.id DESC
Second:
SELECT users.phone, COUNT(*)
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
GROUP BY phone
HAVING COUNT(users.phone) > 1
I would do this before doing the join. In Postgres, select distinct on is a very useful construct:
SELECT u.id, s.step_1, s.step_2
FROM (SELECT distinct on (phone) u.*
FROM users u
WHERE u.active = 'f'
ORDER BY phone
) u LEFT OUTER JOIN
statuses s
ON u.id = s.user_id
WHERE u.active = 'f'
ORDER BY u.id DESC;
DISTINCT ON returns one row for each distinct value of the expression(s) in parentheses. In this case, that is phone, since the duplicates are defined by phone number. The join then no longer shows these users as duplicates.
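Here is one way:
self-join the users table on phone number and keep a user only when no other inactive user with the same phone number has a lower id. (The users table has no name column, so id is used for the comparison.)
SELECT u.id, statuses.step_1, statuses.step_2
FROM (SELECT u.*
FROM users u
LEFT JOIN users u1
ON u1.phone = u.phone -- another user with the same phone...
AND u1.active = 'f'
AND u1.id < u.id -- ...and a lower id
WHERE u.active = 'f'
AND u1.id IS NULL) u -- keep only the lowest id per phone
LEFT OUTER JOIN statuses
ON u.id = statuses.user_id
ORDER BY u.id DESC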
Or use ROW_NUMBER():
number the rows within each phone number and keep only the row numbered 1.
SELECT u.id, statuses.step_1, statuses.step_2
FROM (SELECT u.*,
Row_number() OVER (PARTITION BY phone ORDER BY id) rn
FROM users u
WHERE u.active = 'f') u
LEFT OUTER JOIN statuses
ON u.id = statuses.user_id
WHERE u.rn = 1
ORDER BY u.id DESC
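Both the self-join (anti-join) and the ROW_NUMBER() approaches can be tried out with Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption). The rows are hypothetical: John (ID 1) and Sara (ID 2) share a phone number as in the question, user 3 is another inactive user, and user 4 is active. SQLite has no boolean type, so `active` is stored as `'t'`/`'f'` text and the step flags as 0/1 (also assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT, active TEXT);
    CREATE TABLE statuses (id INTEGER PRIMARY KEY, user_id INTEGER,
                           step_1 INTEGER, step_2 INTEGER);
    -- Hypothetical rows: users 1 and 2 share a phone number.
    INSERT INTO users VALUES (1, '+6012-3456789', 'f'), (2, '+6012-3456789', 'f'),
                             (3, '555-0003', 'f'), (4, '555-0004', 't');
    INSERT INTO statuses VALUES (1, 1, 1, 0), (2, 2, 1, 1),
                                (3, 3, 0, 0), (4, 4, 1, 1);
""")

# Self-join (anti-join): keep an inactive user only if no other inactive
# user with the same phone has a lower id.
anti_join = conn.execute("""
    SELECT u.id, statuses.step_1, statuses.step_2
    FROM (SELECT u.*
          FROM users u
          LEFT JOIN users u1
            ON u1.phone = u.phone AND u1.active = 'f' AND u1.id < u.id
          WHERE u.active = 'f' AND u1.id IS NULL) u
    LEFT OUTER JOIN statuses ON u.id = statuses.user_id
    ORDER BY u.id DESC
""").fetchall()

# ROW_NUMBER(): number inactive users within each phone by id and keep
# rn = 1 (window functions need SQLite 3.25+).
row_num = conn.execute("""
    SELECT u.id, statuses.step_1, statuses.step_2
    FROM (SELECT u.*,
                 ROW_NUMBER() OVER (PARTITION BY phone ORDER BY id) AS rn
          FROM users u
          WHERE u.active = 'f') u
    LEFT OUTER JOIN statuses ON u.id = statuses.user_id
    WHERE u.rn = 1
    ORDER BY u.id DESC
""").fetchall()

print(anti_join)  # [(3, 0, 0), (1, 1, 0)]
print(row_num)    # [(3, 0, 0), (1, 1, 0)]
```

Sara (ID 2) is dropped because John has the same phone number and a lower id; user 4 is excluded by the `active = 'f'` filter.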

sql select where column is a count

I have a table with users and another table with activities, where a column records the user who performed each activity. How could I write a query that selects each user with the number of activities they have?
I can't work out how to do it, or how to search the web for something like this.
For example:
User table
id | name
1 | john
2 | karen
Activity table
id | user_id
1 | 1
2 | 1
3 | 2
Results
name | Count
john | 2
karen| 1
Use a LEFT JOIN and the COUNT aggregate:
SELECT name, COUNT(a.user_id) count
FROM [User] u LEFT JOIN Activity a
ON u.id = a.user_id
GROUP BY u.id, u.name
Output:
| name | count |
|-------|-------|
| john | 2 |
| karen | 1 |
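Why `COUNT(a.user_id)` rather than `COUNT(*)`: with a LEFT JOIN, a user with no activity still produces one row whose activity columns are NULL. `COUNT(column)` skips NULLs, while `COUNT(*)` counts that row. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption) and adds a hypothetical user `mike` with no activities, who is not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "User" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Activity (id INTEGER PRIMARY KEY, user_id INTEGER);
    -- 'mike' is hypothetical: a user with no activities.
    INSERT INTO "User" VALUES (1, 'john'), (2, 'karen'), (3, 'mike');
    INSERT INTO Activity VALUES (1, 1), (2, 1), (3, 2);
""")

rows = conn.execute("""
    SELECT u.name,
           COUNT(a.user_id) AS activity_count,  -- NULLs from the LEFT JOIN are skipped
           COUNT(*) AS row_count                -- counts the NULL row for mike
    FROM "User" u
    LEFT JOIN Activity a ON u.id = a.user_id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

print(rows)  # [('john', 2, 2), ('karen', 1, 1), ('mike', 0, 1)]
```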
Recommended reading:
A Visual Explanation of SQL Joins
select u.name, count(a.id) as ActivityCount
from [user] u
inner join activity a on u.id = a.user_id -- inner join: users with no activity are omitted
group by u.id, u.name
Very simple to do. You can combine the two tables using a join. To get the count (i.e. the total count), there is a function conveniently called COUNT. All together, it would look something like this:
select u.id, u.name, count(a.id) as ct -- count(*) would give 1 for users with no activity
from tblUser u
left join tblActivity a on u.id = a.user_id
group by u.id, u.name
order by ct desc
select
u.id as user_id, -- name is not necessarily unique
max(u.name) as name,
count(a.Id) as [count]
from
[User] u
left join Activity a -- left join because some users may have no activities
on u.Id = a.user_id
group by u.id

Multiple Left Joins - how to?

I have a Rails app running on Heroku, where I'm trying to calculate the rank (position) of a user in a highscore list.
The app is a place for users to bet against each other. They can start a wager (by creating a Choice) or bet against an already created Choice (by making a Bet).
I have the following SQL, which should give me an array of users based on their total winnings on both Choices and Bets. But it's giving me some wrong totals, and I think the problem is in the LEFT JOINs, because if I rewrite the SQL to contain only the Choice table or only the Bet table, it works just fine.
Does anyone have pointers on how to rewrite the SQL so it works correctly? :)
SELECT users.id, sum(COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
Result:
+---------------+
| id | total_pl |
+---------------+
| 1 | 830 |
| 4 | 200 |
| 3 | 130 |
| 7 | -220 |
| 5 | -1360 |
| 6 | -4950 |
+---------------+
Below are the two SQL queries where I join only one table each, and their two results. Note that the sums below do not add up to the result above; the results below are the correct sums.
SELECT users.id, sum(COALESCE(bets.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
SELECT users.id, sum(COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
+---------------+
| id | total_pl |
+---------------+
| 3 | 170 |
| 1 | 150 |
| 4 | 100 |
| 5 | 80 |
| 7 | 20 |
| 6 | -30 |
+---------------+
+---------------+
| id | total_pl |
+---------------+
| 1 | 20 |
| 4 | 0 |
| 3 | -10 |
| 7 | -30 |
| 5 | -110 |
| 6 | -360 |
+---------------+
This is happening because of the relationship between the two LEFT JOINed tables: if a user has multiple rows in both bets and choices, the number of rows produced for that user is the product of the individual row counts, not their sum.
If you have
choices
id profitloss
================
1 20
1 30
bets
id profitloss
================
1 25
1 35
The result of the join is actually:
bets/choices
id choices.profitloss bets.profitloss
1 20 25
1 20 35
1 30 25
1 30 35
(see where this is going?)
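The row multiplication is easy to reproduce. This sketch uses Python's built-in `sqlite3` module as a stand-in database (an assumption) with the two-row example above; the join column is named `user_id` here (also an assumption, the example above calls it `id`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE bets (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users VALUES (1);
    INSERT INTO choices VALUES (1, 20), (1, 30);
    INSERT INTO bets VALUES (1, 25), (1, 35);
""")

# 2 choices x 2 bets = 4 joined rows for user 1.
joined = conn.execute("""
    SELECT users.id, choices.profitloss, bets.profitloss
    FROM users
    LEFT JOIN choices ON choices.user_id = users.id
    LEFT JOIN bets ON bets.user_id = users.id
    ORDER BY choices.profitloss, bets.profitloss
""").fetchall()

# The question's query: every value is summed twice, giving 220 instead of 110.
naive_total = conn.execute("""
    SELECT SUM(COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0))
    FROM users
    LEFT JOIN bets ON bets.user_id = users.id
    LEFT JOIN choices ON choices.user_id = users.id
    GROUP BY users.id
""").fetchone()[0]

print(joined)       # [(1, 20, 25), (1, 20, 35), (1, 30, 25), (1, 30, 35)]
print(naive_total)  # 220
```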
Fixing this is actually fairly simple. You haven't specified an RDBMS, but this should work on any of them (or with minor tweaks).
SELECT users.id, COALESCE(bets.profitloss, 0)
+ COALESCE(choices.profitloss, 0) as total_pl
FROM users
LEFT JOIN (SELECT user_id, SUM(profitloss) as profitloss
FROM bets
GROUP BY user_id) bets
ON bets.user_id = users.id
LEFT JOIN (SELECT user_id, SUM(profitloss) as profitloss
FROM choices
GROUP BY user_id) choices
ON choices.user_id = users.id
ORDER BY total_pl DESC
(Also, I believe the convention is to name tables singular, not plural.)
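The pre-aggregated version can be checked on a small data set. In this sketch (Python's built-in `sqlite3` module as a stand-in database, an assumption), user 1 has the two choices and two bets from the example above, user 2 is a hypothetical user with a single bet of -10 and no choices, and user 3 is a hypothetical user with neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE bets (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO choices VALUES (1, 20), (1, 30);
    INSERT INTO bets VALUES (1, 25), (1, 35), (2, -10);
""")

# Each derived table has at most one row per user, so the two LEFT JOINs
# cannot multiply rows; COALESCE covers users missing from either table.
totals = conn.execute("""
    SELECT users.id,
           COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0) AS total_pl
    FROM users
    LEFT JOIN (SELECT user_id, SUM(profitloss) AS profitloss
               FROM bets GROUP BY user_id) bets
           ON bets.user_id = users.id
    LEFT JOIN (SELECT user_id, SUM(profitloss) AS profitloss
               FROM choices GROUP BY user_id) choices
           ON choices.user_id = users.id
    ORDER BY total_pl DESC
""").fetchall()

print(totals)  # [(1, 110), (3, 0), (2, -10)]
```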
Your problem is that you are blowing out your data set: the two joins multiply each user's rows. If you did a SELECT * you would be able to see it. Try this. I was not able to test it because I don't have your tables, but it should work:
SELECT
totals.id
,SUM(totals.total_pl) total_pl
FROM
(
SELECT users.id, sum(COALESCE(bets.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
GROUP BY users.id
UNION ALL SELECT users.id, sum(COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
) totals
GROUP BY totals.id
ORDER BY total_pl DESC
Similar to Clockwork's solution: since the columns are the same in both tables, I would pre-union them and just sum them. At most, the inner query will have two records per user, one for bets and one for choices, each pre-summed before the UNION ALL. Then a simple join and sum gives the results:
select
U.id,
sum( coalesce( PreSum.profit, 0) ) as TotalPL
from
Users U
LEFT JOIN
( select user_id, sum( profitloss ) as Profit
from bets
group by user_id
UNION ALL
select user_id, sum( profitloss ) as Profit
from choices
group by user_id ) PreSum
on U.ID = PreSum.User_ID
group by
U.ID
order by TotalPL desc
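The UNION ALL pre-sum variant can be checked the same way. This is a sketch using Python's built-in `sqlite3` module as a stand-in database (an assumption), with hypothetical rows: user 1 has the two choices and two bets from the example earlier in this thread, user 2 has a single bet of -10, and user 3 has no rows in either table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE bets (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO choices VALUES (1, 20), (1, 30);
    INSERT INTO bets VALUES (1, 25), (1, 35), (2, -10);
""")

# PreSum holds at most two rows per user (one per source table); the outer
# GROUP BY adds them, and COALESCE gives 0 to users with no rows at all.
ranking = conn.execute("""
    SELECT U.id, SUM(COALESCE(PreSum.profit, 0)) AS TotalPL
    FROM users U
    LEFT JOIN (SELECT user_id, SUM(profitloss) AS profit
               FROM bets GROUP BY user_id
               UNION ALL
               SELECT user_id, SUM(profitloss) AS profit
               FROM choices GROUP BY user_id) PreSum
           ON U.id = PreSum.user_id
    GROUP BY U.id
    ORDER BY TotalPL DESC
""").fetchall()

print(ranking)  # [(1, 110), (3, 0), (2, -10)]
```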