Efficiently getting multiple counts of foreign key rows in PostgreSQL - sql

I have a database that consists of users who can perform various actions, which I keep track of in multiple tables. I'm creating a point system, so I need to count how many of each type of action the user did. For example, if I had:
users
id | username
-------------
1 | abc
2 | xyz
posts
id | user_id
--------------
1 | 1
2 | 1
comments
id | user_id
--------------
1 | 1
2 | 2
shares
id | user_id
--------------
1 | 2
2 | 2
I would want to return:
user_details
id | username | post_count | comment_count | share_count
---------------------------------------------------------
1 | abc | 2 | 1 | 0
2 | xyz | 0 | 1 | 2
This is slightly different from this question about foreign key counts since I want to return the individual counts per table.
What I've tried so far (example code):
SELECT
users.id,
users.username,
COUNT( DISTINCT posts.id ) as post_count,
COUNT( DISTINCT comments.id ) as comment_count,
COUNT( DISTINCT shares.id ) as share_count
FROM users
LEFT JOIN posts ON posts.user_id = users.id
LEFT JOIN comments ON comments.user_id = users.id
LEFT JOIN shares ON shares.user_id = users.id
GROUP BY users.id
While this works, I had to use DISTINCT in all of my counts because the LEFT JOINS were causing high numbers of duplicate rows. I feel like there must be a better way to do this since (please correct me if I'm wrong) on each LEFT JOIN, the DISTINCT is having to filter out an exponentially growing number of duplicated rows.
Thank you so much for any help you could give me with this!

You can join derived tables that already do the aggregation.
SELECT u.id,
u.username,
coalesce(pc.c, 0) AS post_count,
coalesce(cc.c, 0) AS comment_count,
coalesce(sc.c, 0) AS share_count
FROM users AS u
LEFT JOIN (SELECT p.user_id,
count(*) AS c
FROM posts AS p
GROUP BY p.user_id) AS pc
ON pc.user_id = u.id
LEFT JOIN (SELECT c.user_id,
count(*) AS c
FROM comments AS c
GROUP BY c.user_id) AS cc
ON cc.user_id = u.id
LEFT JOIN (SELECT s.user_id,
count(*) AS c
FROM shares AS s
GROUP BY s.user_id) AS sc
ON sc.user_id = u.id;
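As a rough check of the pre-aggregation idea above, here is a small sketch that loads the question's sample rows and runs the derived-table query. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL (an assumption made only so the example is self-contained); the SQL itself is meant to be the same shape as the answer's query.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE shares   (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users    VALUES (1, 'abc'), (2, 'xyz');
    INSERT INTO posts    VALUES (1, 1), (2, 1);
    INSERT INTO comments VALUES (1, 1), (2, 2);
    INSERT INTO shares   VALUES (1, 2), (2, 2);
""")

# Each derived table is aggregated to one row per user before it is joined,
# so the joins cannot multiply rows and DISTINCT is not needed.
user_details = conn.execute("""
    SELECT u.id, u.username,
           coalesce(pc.c, 0) AS post_count,
           coalesce(cc.c, 0) AS comment_count,
           coalesce(sc.c, 0) AS share_count
    FROM users AS u
    LEFT JOIN (SELECT user_id, count(*) AS c FROM posts    GROUP BY user_id) AS pc
           ON pc.user_id = u.id
    LEFT JOIN (SELECT user_id, count(*) AS c FROM comments GROUP BY user_id) AS cc
           ON cc.user_id = u.id
    LEFT JOIN (SELECT user_id, count(*) AS c FROM shares   GROUP BY user_id) AS sc
           ON sc.user_id = u.id
    ORDER BY u.id
""").fetchall()

# For comparison: how many rows the question's un-aggregated triple join produces.
joined_rows = conn.execute("""
    SELECT count(*)
    FROM users
    LEFT JOIN posts    ON posts.user_id    = users.id
    LEFT JOIN comments ON comments.user_id = users.id
    LEFT JOIN shares   ON shares.user_id   = users.id
""").fetchone()[0]

print(user_details)
print(joined_rows)
```

With only two rows per table the un-aggregated join already yields more rows than there are users; per user, the row count is the product of the matching rows in each joined table.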

Related

How to count occurrence of IDs and show this amount with name of item with this ID from other table in SQL?

if I have tables
Person: ID_Person, Name
Profession: ID_Prof, Prof_Name, ID_Person
If ID_Person appears multiple times in second table and I want to show all Person names with number of their professions how can I do this?
I know that if I want to count something I can write
SELECT ID_Person, count(*) as c
FROM Profession
GROUP BY ID_Person;
but I don't know how to link it with a column from the other table in order to get the proper values.
Here is one way (MySQL InnoDB)
Person
+-----------+-------+
| ID_Person | Name |
+-----------+-------+
| 1 | bob |
| 2 | alice |
+-----------+-------+
Profession
+---------+--------------------+-----------+
| ID_Prof | Prof_Name | ID_Person |
+---------+--------------------+-----------+
| 1 | janitor | 1 |
| 2 | cook | 1 |
| 3 | computer scientist | 2 |
| 4 | home maker | 2 |
| 7 | astronaut | 2 |
+---------+--------------------+-----------+
select Name, count(Prof_Name)
from Person left join Profession
on (Person.ID_Person=Profession.ID_Person)
group by Name;
+-------+------------------+
| Name | count(Prof_Name) |
+-------+------------------+
| alice | 3 |
| bob | 2 |
+-------+------------------+
Hope this helps.
To show only those with multiple professions, join the two tables, aggregate with count() using GROUP BY, and filter using HAVING:
select pe.ID_Person, pe.Name, count(*) as ProfessionCount
from Person pe
inner join Profession pr
on pe.ID_Person = pr.ID_Person
group by pe.ID_Person, pe.Name
having count(*)>1
If you want to show the professions for those people as well:
select
multi.ID_Person
, multi.Name
, multi.ProfessionCount
, prof.ID_Prof
, prof.Prof_Name
from (
select pe.ID_Person, pe.Name, count(*) as ProfessionCount
from Person pe
inner join Profession pr
on pe.ID_Person = pr.ID_Person
group by pe.ID_Person, pe.Name
having count(*)>1
) multi
inner join Profession prof
on multi.ID_Person = prof.ID_Person
You can probably try something like the query below. However, you will have to decide whether you need a left join or an inner join. You would want a left join if someone might have no professions and therefore no rows in the Profession table.
SELECT pe.Name
, Professions = COUNT(pr.Prof_Name)
FROM dbo.Person (NOLOCK) pe
JOIN dbo.Profession (NOLOCK) pr ON pe.ID_Person = pr.ID_Person
GROUP BY pe.Name
You're looking for something like this I believe. The left join will bring in all the data and won't exclude any users.
The join can also be an inner join. An inner join would then only show users that exist in both tables.
LEFT
select x.ID_Person, count(y.ID_Person) as [count] from table1 x
left join table2 y on y.ID_Person = x.ID_Person
where x.ID_Person is not null
group by x.ID_Person
INNER
select x.ID_Person, count(y.ID_Person) from table1 x
inner join table2 y on y.ID_Person= x.ID_Person
group by x.ID_Person
The easiest solution is probably counting in a subquery:
select
id_person,
name,
(select count(*) from profession pr where pr.id_person = p.id_person) as profession_count
from person p;
You can achieve the same with an outer join:
select
p.id_person,
p.name,
coalesce(pr.cnt, 0) as profession_count
from person p
left join (select id_person, count(*) as cnt from profession group by id_person) pr
on pr.id_person = p.id_person;
It's usually a good idea to aggregate before joining. Anyway, this is how to join first and aggregate then:
select
p.id_person,
p.name,
coalesce(count(pr.id_person), 0) as profession_count
from person p
left join profession pr on pr.id_person = p.id_person
group by p.id_person, p.name;
As per standard SQL it would suffice to group by p.id_person, as the name functionally depends on the id (i.e. the id uniquely defines a person, so it's one single name belonging to it). Some DBMS however don't fully comply with the standard here and demand you to either put the name in the group by clause as shown or dummy-aggregate it in the select clause (e.g. max(p.name)) instead.
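To compare the three variants above side by side, here is a minimal sketch using Python's sqlite3 with an in-memory database (a stand-in chosen for self-containment, not the DBMS from the question). The rows for bob and alice come from the earlier answer; the person "carol" with no professions is a hypothetical addition to exercise the zero case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person     (id_person INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profession (id_prof INTEGER PRIMARY KEY, prof_name TEXT,
                             id_person INTEGER);
    -- 'carol' is hypothetical: a person without any profession
    INSERT INTO person VALUES (1, 'bob'), (2, 'alice'), (3, 'carol');
    INSERT INTO profession VALUES
        (1, 'janitor', 1), (2, 'cook', 1),
        (3, 'computer scientist', 2), (4, 'home maker', 2), (7, 'astronaut', 2);
""")

# Variant 1: correlated subquery in the select list.
by_subquery = conn.execute("""
    SELECT p.id_person, p.name,
           (SELECT count(*) FROM profession pr
            WHERE pr.id_person = p.id_person) AS profession_count
    FROM person p
    ORDER BY p.id_person
""").fetchall()

# Variant 2: aggregate first, then outer join the aggregated rows.
by_aggregated_join = conn.execute("""
    SELECT p.id_person, p.name, coalesce(pr.cnt, 0) AS profession_count
    FROM person p
    LEFT JOIN (SELECT id_person, count(*) AS cnt
               FROM profession GROUP BY id_person) pr
           ON pr.id_person = p.id_person
    ORDER BY p.id_person
""").fetchall()

# Variant 3: join first, then aggregate; count(column) skips the NULLs
# produced by the outer join, so unmatched persons get 0.
by_join_then_group = conn.execute("""
    SELECT p.id_person, p.name, count(pr.id_person) AS profession_count
    FROM person p
    LEFT JOIN profession pr ON pr.id_person = p.id_person
    GROUP BY p.id_person, p.name
    ORDER BY p.id_person
""").fetchall()

print(by_subquery)
print(by_aggregated_join)
print(by_join_then_group)
```

All three should agree on this data; which one performs best depends on the DBMS and the data volume, which this sketch does not measure.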

PostgreSQL - How to remove duplicates when doing LEFT OUTER JOIN with WHERE clause?

I have 2 tables:
users table
+--------+---------+
| id | integer |
+--------+---------+
| phone | string |
+--------+---------+
| active | boolean |
+--------+---------+
statuses table
+---------+---------+
| id | integer |
+---------+---------+
| user_id | integer |
+---------+---------+
| step_1 | boolean |
+---------+---------+
| step_2 | boolean |
+---------+---------+
I'm doing LEFT OUTER JOIN statuses table on users table with WHERE clause like this:
SELECT users.id, statuses.step_1, statuses.step_2
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
ORDER BY users.id DESC
My problem
There are some users that have same phone number inside the users table and I want remove the duplicate users based on the phone number.
I don't want to delete them from database. But just want to exclude them for this query only.
For example, say John (ID: 1) and Sara (ID: 2) shared same phone number (+6012-3456789), removing one of them, either John or Sara is fine for me.
What I've tried but did not work?
First:
SELECT DISTINCT users.phone
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
ORDER BY users.id DESC
Second:
SELECT users.phone, COUNT(*)
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
GROUP BY phone
HAVING COUNT(users.phone) > 1
I would do this before doing the join. In Postgres, select distinct on is a very useful construct:
SELECT u.id, s.step_1, s.step_2
FROM (SELECT distinct on (phone) u.*
FROM users u
WHERE u.active = 'f'
ORDER BY phone
) u LEFT OUTER JOIN
statuses s
ON u.id = s.user_id
WHERE u.active = 'f'
ORDER BY u.id DESC;
distinct on returns one row for whatever is in parentheses. In this case, that would be by phone (based on "I want remove the duplicate users based on the phone number"). Then, the join should not be showing these as duplicates.
Here is one way
Self-join the users table on the phone number and use a comparison operator to keep only the user with the lowest id for each phone.
SELECT *
FROM (SELECT u.*
FROM users u
LEFT JOIN users u1
ON u.phone = u1.phone -- same phone number
AND u.id > u1.id -- u1 is a duplicate with a lower id
WHERE u1.id IS NULL) u -- keep users that have no lower-id duplicate
LEFT OUTER JOIN statuses
ON u.id = statuses.user_id
WHERE ( u.active = 'f' )
or use ROW_NUMBER
Generate a row number within each phone number and keep only the row numbered 1:
SELECT *
FROM (SELECT u.*,
Row_number() OVER (PARTITION BY phone ORDER BY id) rn
FROM users u) u
LEFT OUTER JOIN statuses
ON u.id = statuses.user_id
WHERE ( u.active = 'f' )
AND rn = 1
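The row-number approach can be tried out with a small sketch like the one below. It assumes Python's sqlite3 with SQLite 3.25 or newer (window functions are needed) as a stand-in for PostgreSQL, stores booleans as 0/1, and filters on active inside the subquery before numbering. The second and third phone numbers and all status values are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, phone TEXT, active INTEGER);
    CREATE TABLE statuses (id INTEGER PRIMARY KEY, user_id INTEGER,
                           step_1 INTEGER, step_2 INTEGER);
    -- users 1 and 2 share a phone number; user 4 is active and must be excluded
    INSERT INTO users VALUES
        (1, '+6012-3456789', 0),
        (2, '+6012-3456789', 0),
        (3, '+6011-1111111', 0),
        (4, '+6019-9999999', 1);
    INSERT INTO statuses VALUES (1, 1, 1, 0), (2, 3, 1, 1);
""")

# Number the inactive users within each phone (lowest id first), keep only
# the first one per phone, then join the statuses.
rows = conn.execute("""
    SELECT u.id, s.step_1, s.step_2
    FROM (SELECT id, phone,
                 ROW_NUMBER() OVER (PARTITION BY phone ORDER BY id) AS rn
          FROM users
          WHERE active = 0) u
    LEFT OUTER JOIN statuses s ON s.user_id = u.id
    WHERE u.rn = 1
    ORDER BY u.id DESC
""").fetchall()

print(rows)
```

Ordering by id inside the window is an arbitrary tie-breaker; the question says either duplicate may be dropped, so any deterministic order would do.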

sql select where column is a count

I have a table with users, and I have another table with activity, the user who had the activity is logged in a column. how could I make a query so that I can select each user with the count of activities they have.
I really can't think of how to do it nor search for something like this on the web.
so for example
User table
id | name
1 | john
2 | karen
Activity table
id | user_id
1 | 1
2 | 1
3 | 2
Results
name | Count
john | 2
karen| 1
Make use of LEFT JOIN and COUNT aggregate
SELECT name, COUNT(a.user_id) count
FROM [User] u LEFT JOIN Activity a
ON u.id = a.user_id
GROUP BY u.id, u.name
Output:
| name | count |
|-------|-------|
| john | 2 |
| karen | 1 |
Here is a SQLFiddle demo
Recommended reading:
A Visual Explanation of SQL Joins
select name, count(a.Id) as ActivityCount
from [user] u
inner join activity a on u.id = a.user_id
group by name
This is very simple to do. You can combine the two tables by using a join. To add the count per user, there is an aggregate function conveniently called "Count". So all together, it would look something like this:
select u.id, u.name, count(a.id) as ct
from tblUser u
left join tblActivity a on u.id = a.user_id
group by u.id, u.name
order by ct desc
select
u.id as user_id, -- name is not necessary unique
max(u.name) as name,
count(a.Id) as [count]
from
[User] u
left join Activity a -- left join because some users can have no activities
on u.Id = a.user_id
group by u.id

GROUP BY including 0 where none present

I have a table of lists, each of which contains posts. I want a query that tells me how many posts each list has, including an entry with a 0 for each list that doesn't have any posts.
eg.
posts:
id | list_id
--------------
1 | 1
2 | 1
3 | 2
4 | 2
lists:
id
---
1
2
3
should return:
list_id | num_posts
-------------------
1 | 2
2 | 2
3 | 0
I have done so using the following query, but it feels a bit stupid to effectively do the grouping and then execute another sub-query to fill in the blanks:
WITH "count_data" AS (
SELECT "posts"."list_id" AS "list_id", COUNT(DISTINCT "posts"."id") AS "num_posts"
FROM "posts"
INNER JOIN "lists" ON "posts"."list_id" = "lists"."id"
GROUP BY "posts"."list_id"
)
SELECT "lists"."id", COALESCE("count_data"."num_posts", 0)
FROM "lists"
LEFT JOIN "count_data" ON "count_data"."list_id" = "lists"."id"
ORDER BY "count_data"."num_posts" DESC
Thanks!
It'll be more efficient to left join directly, avoiding a seq scan with a big merge join in the process:
select lists.id as list_id, count(posts.list_id) as num_posts
from lists
left join posts on posts.list_id = lists.id
group by lists.id
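The query above relies on count(posts.list_id) ignoring the NULLs that the left join produces for lists without posts. A minimal sketch of that difference, using the question's sample rows and Python's sqlite3 as a stand-in database (an assumption for self-containment):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, list_id INTEGER);
    INSERT INTO lists VALUES (1), (2), (3);
    INSERT INTO posts VALUES (1, 1), (2, 1), (3, 2), (4, 2);
""")

# count(column) only counts non-NULL values, so list 3 gets 0.
with_column = conn.execute("""
    SELECT lists.id AS list_id, count(posts.list_id) AS num_posts
    FROM lists
    LEFT JOIN posts ON posts.list_id = lists.id
    GROUP BY lists.id
    ORDER BY lists.id
""").fetchall()

# count(*) counts joined rows, including the NULL-extended row for list 3.
with_star = conn.execute("""
    SELECT lists.id AS list_id, count(*) AS num_posts
    FROM lists
    LEFT JOIN posts ON posts.list_id = lists.id
    GROUP BY lists.id
    ORDER BY lists.id
""").fetchall()

print(with_column)
print(with_star)
```

The count(*) variant reports one post for the empty list, which is why the column form is the one to use after an outer join.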
If I understand your question, this should work:
SELECT a.id AS list_id, COALESCE(b.num_posts, 0) AS num_posts
FROM lists a
LEFT JOIN (SELECT list_id, COUNT(*) AS num_posts
FROM posts
GROUP BY list_id
) b
ON a.id = b.list_id

Multiple Left Joins - how to?

I have a Rails app running at Heroku, where I'm trying to calculate the rank (position) of a user to a highscore list.
The app is a place for users to bet against each other. They can start a wager (by creating a CHOICE) or they can bet against an already created Choice (by making a BET).
I have the following SQL, which should give me an array of users based on their total winnings on both Choices and Bets. But it's giving me some wrong total winnings, and I think the problem is in the Left Joins, because if I rewrite the SQL to contain only the Choice or only the Bet table then it works just fine.
Anyone with any pointers on how to rewrite the SQL to work correctly :)
SELECT users.id, sum(COALESCE(bets.profitloss, 0) + COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
Result:
+---------------+
| id | total_pl |
+---------------+
| 1 | 830 |
| 4 | 200 |
| 3 | 130 |
| 7 | -220 |
| 5 | -1360 |
| 6 | -4950 |
+---------------+
Below are the two SQL queries where I join to only one table, and the two results from them. Note that the sums below do not add up to the result above. The sums below are the correct ones.
SELECT users.id, sum(COALESCE(bets.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
SELECT users.id, sum(COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
ORDER BY total_pl DESC
+---------------+
| id | total_pl |
+---------------+
| 3 | 170 |
| 1 | 150 |
| 4 | 100 |
| 5 | 80 |
| 7 | 20 |
| 6 | -30 |
+---------------+
+---------------+
| id | total_pl |
+---------------+
| 1 | 20 |
| 4 | 0 |
| 3 | -10 |
| 7 | -30 |
| 5 | -110 |
| 6 | -360 |
+---------------+
This is happening because of the relationship between the two LEFT JOINed tables: if a user has multiple rows in both bets and choices, the number of joined rows for that user is the product of the individual row counts, not their sum.
If you have
choices
id profitloss
================
1 20
1 30
bets
id profitloss
================
1 25
1 35
The result of the join is actually:
choices/bets
id choices.profitloss bets.profitloss
1 20 25
1 20 35
1 30 25
1 30 35
(see where this is going?)
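The multiplication shown in the table above can be reproduced with a short sketch. It uses Python's sqlite3 as a stand-in database and assumes a single user with id 1 owning the four sample rows; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY);
    CREATE TABLE bets    (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users   VALUES (1);
    INSERT INTO choices VALUES (1, 20), (1, 30);
    INSERT INTO bets    VALUES (1, 25), (1, 35);
""")

# Two bets times two choices gives four joined rows for the user.
joined_rows = conn.execute("""
    SELECT count(*)
    FROM users
    LEFT JOIN bets    ON bets.user_id    = users.id
    LEFT JOIN choices ON choices.user_id = users.id
""").fetchone()[0]

# Summing over the joined rows counts every bet and every choice twice.
naive_total = conn.execute("""
    SELECT sum(coalesce(bets.profitloss, 0) + coalesce(choices.profitloss, 0))
    FROM users
    LEFT JOIN bets    ON bets.user_id    = users.id
    LEFT JOIN choices ON choices.user_id = users.id
""").fetchone()[0]

# Summing each table on its own gives the intended total.
true_total = conn.execute("""
    SELECT (SELECT sum(profitloss) FROM bets    WHERE user_id = 1)
         + (SELECT sum(profitloss) FROM choices WHERE user_id = 1)
""").fetchone()[0]

print(joined_rows, naive_total, true_total)
```

Here the naive total is exactly double the true one because each side has two rows; with uneven row counts the distortion differs per table, which matches the odd-looking totals in the question.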
Fixing this is actually fairly simple. You haven't specified an RDBMS, but this should work on any of them (or with minor tweaks).
SELECT users.id, COALESCE(bets.profitloss, 0)
+ COALESCE(choices.profitloss, 0) as total_pl
FROM users
LEFT JOIN (SELECT user_id, SUM(profitloss) as profitloss
FROM bets
GROUP BY user_id) bets
ON bets.user_id = users.id
LEFT JOIN (SELECT user_id, SUM(profitloss) as profitloss
FROM choices
GROUP BY user_id) choices
ON choices.user_id = users.id
ORDER BY total_pl DESC
(Also, I believe the convention is to name tables singular, not plural.)
Your problem is that you are blowing out your data set. If you did a SELECT * you would be able to see it. Try this. I was not able to test it because I don't have your tables, but it should work
SELECT
totals.id
,SUM(totals.total_pl) total_pl
FROM
(
SELECT users.id, sum(COALESCE(bets.profitloss, 0)) as total_pl
FROM users
LEFT JOIN bets ON bets.user_id = users.id
GROUP BY users.id
UNION ALL SELECT users.id, sum(COALESCE(choices.profitloss, 0)) as total_pl
FROM users
LEFT JOIN choices ON choices.user_id = users.id
GROUP BY users.id
) totals
GROUP BY totals.id
ORDER BY total_pl DESC
This is similar to Clockwork's solution: since the columns are the same in both tables, I would pre-union them and then sum. At most, the inner query returns two records per user, one for the bets and one for the choices, each already summed before the UNION ALL. A simple join and sum then gives the results.
select
U.ID,
sum( coalesce( PreSum.profit, 0) ) as TotalPL
from
Users U
LEFT JOIN
( select user_id, sum( profitloss ) as Profit
from bets
group by user_id
UNION ALL
select user_id, sum( profitloss ) as Profit
from choices
group by user_id ) PreSum
on U.ID = PreSum.User_ID
group by
U.ID
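A minimal sketch of the pre-union idea, run against Python's sqlite3 as a stand-in database. The sample rows are hypothetical: user 1 has rows in both tables, user 2 only has a bet, and user 3 has nothing, so the zero case is covered too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY);
    CREATE TABLE bets    (user_id INTEGER, profitloss INTEGER);
    CREATE TABLE choices (user_id INTEGER, profitloss INTEGER);
    INSERT INTO users   VALUES (1), (2), (3);
    INSERT INTO bets    VALUES (1, 25), (1, 35), (2, -10);
    INSERT INTO choices VALUES (1, 20), (1, 30);
""")

# Each branch of the UNION ALL is summed per user first, so the derived
# table holds at most two rows per user and the outer join cannot fan out.
totals = conn.execute("""
    SELECT u.id, sum(coalesce(presum.profit, 0)) AS total_pl
    FROM users u
    LEFT JOIN (SELECT user_id, sum(profitloss) AS profit
               FROM bets
               GROUP BY user_id
               UNION ALL
               SELECT user_id, sum(profitloss) AS profit
               FROM choices
               GROUP BY user_id) presum
           ON u.id = presum.user_id
    GROUP BY u.id
    ORDER BY total_pl DESC, u.id
""").fetchall()

print(totals)
```

The extra u.id in the ORDER BY is only there to make the output order deterministic when two users tie.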