PostgreSQL value of COUNT multiply by a number - sql

I'm a Rails developer and I'm new to writing SQL. I have users, portfolios, views, favorites and endorsements tables. users have many portfolios and many endorsements. portfolios has many views, many favorites and many endorsements.
Here is the script I wrote
top_users = User.find_by_sql(
"SELECT users.*,
COUNT(portfolios.id) +
COUNT(views.id) +
COUNT(favorites.id) +
COUNT(case when endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id then 1 else 0 end) +
COUNT(case when endorsements.user_id = users.id then 1 else 0 end)
AS total
FROM users
LEFT OUTER JOIN portfolios ON portfolios.user_id = users.id
LEFT OUTER JOIN views ON views.subject_id = portfolios.id AND portfolios.user_id = users.id
LEFT OUTER JOIN favorites ON favorites.subject_id = portfolios.id AND portfolios.user_id = users.id
LEFT OUTER JOIN endorsements ON endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id OR endorsements.user_id = users.id
GROUP BY users.id
ORDER BY total DESC LIMIT 8"
)
The total is not what I expect, because each portfolio is worth 50 points, each view 2 points, each favorite 10 points, and each endorsement 2 points.
Let's say we have 3 users:
user | COUNT 1 | COUNT 2 | COUNT 3 | COUNT 4 | COUNT 5
-------------------------------------------------------
1 | 0 | 0 | 0 | 0 | 10
2 | 2 | 2 | 2 | 2 | 0
3 | 5 | 0 | 0 | 0 | 0
With my script, the results come back in the order user 1, user 2, user 3. However, based on the points system, the order should be user 3, user 2, user 1, because user 3's total is 250 points, user 2's is 128 and user 1's is 20. This is the order I expect. I tried this:
top_users = User.find_by_sql(
"SELECT users.*,
COUNT(portfolios.id) * 50 +
COUNT(views.id) * 2 +
COUNT(favorites.id) * 10 +
COUNT(case when endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id then 1 else 0 end) * 2 +
COUNT(case when endorsements.user_id = users.id then 1 else 0 end) * 2
AS total
FROM users
LEFT OUTER JOIN portfolios ON portfolios.user_id = users.id
LEFT OUTER JOIN views ON views.subject_id = portfolios.id AND portfolios.user_id = users.id
LEFT OUTER JOIN favorites ON favorites.subject_id = portfolios.id AND portfolios.user_id = users.id
LEFT OUTER JOIN endorsements ON endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id OR endorsements.user_id = users.id
GROUP BY users.id
ORDER BY total DESC LIMIT 8"
)
The above script does not work for me either. Any thoughts or help would be much appreciated. Again, I'm very new to raw SQL.
UPDATED
I ended up doing this to avoid the double-counting issue when LEFT OUTER JOINing multiple tables.
SELECT t4.id, t4.username, t4.avatar_url, p_count * 50 + ue_count * 2 + fav_count * 10 + ep_count * 2 + COUNT(vp.id) * 2 as point
FROM (SELECT t3.id, t3.username, t3.avatar_url, p_count, ue_count, fav_count, COUNT(ep.id) as ep_count
FROM( SELECT t2.id, t2.username, t2.avatar_url, p_count, ue_count, COUNT(fav_p.id) as fav_count
FROM (SELECT t1.id, t1.username, t1.avatar_url, p_count, COUNT(e.user_id) as ue_count
FROM (SELECT u.*, COUNT(p.user_id) as p_count
FROM users u
LEFT OUTER JOIN (SELECT user_id, id
FROM portfolios) p
ON u.id = p.user_id
GROUP BY u.id) t1
LEFT OUTER JOIN (SELECT user_id
FROM endorsements) e
ON e.user_id = t1.id
GROUP BY t1.id, t1.username, t1.avatar_url, p_count ) t2
LEFT OUTER JOIN (SELECT p.id, p.user_id
FROM portfolios p
INNER JOIN favorites
ON favorites.subject_id = p.id) fav_p
ON fav_p.user_id = t2.id
GROUP BY t2.id, t2.username, t2.avatar_url, p_count, ue_count) t3
LEFT OUTER JOIN (SELECT p.id, p.user_id
FROM portfolios p
INNER JOIN endorsements
ON endorsements.portfolio_id = p.id) ep
ON ep.user_id = t3.id
GROUP BY t3.id, t3.username, t3.avatar_url, p_count, ue_count, fav_count) t4
LEFT OUTER JOIN (SELECT p.id, p.user_id
FROM portfolios p
INNER JOIN views
ON views.subject_id = p.id) vp
ON vp.user_id = t4.id
GROUP BY t4.id, t4.username, t4.avatar_url, p_count, ue_count, fav_count, ep_count
ORDER BY point DESC
LIMIT 8
I'm a beginner and not familiar with SQL. The updated code above solves my problem, but I wonder how bad the performance would be. Thanks for any input.

After reading through a few more times, I think I got what you were saying. Try this.
SELECT users.id
,COUNT(portfolios.id) * 50 +
COUNT(VIEWS.id) * 2 +
COUNT(favorites.id) * 10 +
COUNT(e1.id) * 2 +
COUNT(e2.id) * 2
AS total
FROM users
LEFT JOIN portfolios
ON portfolios.user_id = users.id
LEFT JOIN VIEWS
ON VIEWS.subject_id = portfolios.id
LEFT JOIN favorites
ON favorites.subject_id = portfolios.id
LEFT JOIN endorsements e1
ON e1.portfolio_id = portfolios.id
LEFT JOIN endorsements e2
ON e2.user_id = users.id
GROUP BY users.id
ORDER BY total DESC LIMIT 8
I assumed that an endorsement relates to either a user OR a portfolio. I don't know what the values in your tables look like, but in theory, since an endorsement relates to a user or a portfolio, and a portfolio always relates to a user, it isn't necessary to combine user_id and portfolio_id in a single join condition. In a case like that it's fine to join the portfolios table to endorsements as e1 and the users table to endorsements as e2 and just add the two counts.

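First of all, unless your 'users' table has only one column, this breaks the rule that when you have aggregate functions in your SELECT clause, every column that isn't passed into an aggregate function has to be in your GROUP BY clause. (PostgreSQL 9.1 and later relax this rule when you group by the table's primary key, so GROUP BY users.id is accepted there if id is the primary key.)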
Second, I don't think the CASE expressions inside your COUNT() functions make sense. They repeat the conditions in your joins. You should be able to just count endorsements.id and portfolios.id, I think. I may be a little fuzzy on what you're looking for. Also, what is a subject_id? Is that an id field that determines whether an endorsement belongs to a user or a portfolio?
Does an endorsement have both a user_id and a portfolio_id, or is it one or the other but not both?
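There is one more reason the CASE expressions in the question do not behave as intended: COUNT counts every non-NULL value, and ELSE 0 produces 0, not NULL. A minimal sketch using Python's sqlite3 module (the table contents are made up for illustration):

```python
import sqlite3

# Hypothetical sample data: two endorsements for user 1, one for user 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE endorsements (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO endorsements (user_id) VALUES (1), (1), (2);
""")

count_else_0, count_no_else, sum_else_0 = conn.execute("""
SELECT COUNT(CASE WHEN user_id = 1 THEN 1 ELSE 0 END),  -- counts every row: 0 is not NULL
       COUNT(CASE WHEN user_id = 1 THEN 1 END),         -- non-matching rows become NULL and are skipped
       SUM(CASE WHEN user_id = 1 THEN 1 ELSE 0 END)     -- adds up the 1s and 0s
FROM endorsements
""").fetchone()

print(count_else_0, count_no_else, sum_else_0)
```

So a conditional count needs either COUNT without the ELSE branch or SUM with ELSE 0.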

Any time you have multiple outer joins in a GROUP BY query, you have to be careful of double-counting. So I would change COUNT(portfolios.id) to COUNT(DISTINCT portfolios.id) etc. That should also remove the need for your CASE statements. Once you have those counts, you can multiply by their score values, as you say in your question (* 2 or * 50 or whatever you like).
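To make the double-counting concrete, here is a small sketch that runs the weighted query with and without DISTINCT against an in-memory SQLite database. The schema is a guess at the question's tables, and the rows are invented: user 1 has one portfolio with two views, two favorites and one endorsement, plus one endorsement on the user; user 2 has two portfolios and nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE portfolios (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE views (id INTEGER PRIMARY KEY, subject_id INTEGER);
CREATE TABLE favorites (id INTEGER PRIMARY KEY, subject_id INTEGER);
CREATE TABLE endorsements (id INTEGER PRIMARY KEY, user_id INTEGER, portfolio_id INTEGER);
INSERT INTO users (id) VALUES (1), (2);
INSERT INTO portfolios (id, user_id) VALUES (1, 1), (2, 2), (3, 2);
INSERT INTO views (id, subject_id) VALUES (1, 1), (2, 1);
INSERT INTO favorites (id, subject_id) VALUES (1, 1), (2, 1);
INSERT INTO endorsements (id, user_id, portfolio_id) VALUES (1, NULL, 1), (2, 1, NULL);
""")

# {d} is replaced by either "" or "DISTINCT " so both variants share one query.
QUERY = """
SELECT u.id,
       COUNT({d}p.id) * 50 + COUNT({d}v.id) * 2 + COUNT({d}f.id) * 10
       + COUNT({d}e1.id) * 2 + COUNT({d}e2.id) * 2 AS total
FROM users u
LEFT JOIN portfolios p ON p.user_id = u.id
LEFT JOIN views v ON v.subject_id = p.id
LEFT JOIN favorites f ON f.subject_id = p.id
LEFT JOIN endorsements e1 ON e1.portfolio_id = p.id
LEFT JOIN endorsements e2 ON e2.user_id = u.id
GROUP BY u.id
ORDER BY total DESC, u.id
"""

naive = conn.execute(QUERY.format(d="")).fetchall()
distinct = conn.execute(QUERY.format(d="DISTINCT ")).fetchall()
print(naive)     # user 1's single portfolio is repeated once per view/favorite combination
print(distinct)  # each row is counted once, so user 2 now ranks first
```

With this data the plain COUNT version ranks user 1 first with 264 points, because the 2 views and 2 favorites turn one portfolio into four joined rows. The DISTINCT version gives user 2 100 points and user 1 78 points.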

Related

how to merge together two queries in bigquery

I have 2 queries
Query 1: # of content pieces in a campaign
Select c.campaign_id, sum(case when medium in ('photo', 'story', 'video', 'album') then 1 else 0 end) as totalcontent
From `public_collaboration_contents` as cc
Left join `public_collaborations` as c ON cc.collaboration_id = c.id
Left join `public_influencers` as i on c.influencer_id = i.id
Left join `public_collaboration_tasks` as ct on cc.id = ct.collaboration_content_id
Where cc.state = 'delivered'
And ct.state = 'delivered'
Group by c.campaign_id
Table looks like this:
campaign_id | TotalContent
--------------------------
1233 |
and
Query 2: campaign name, and Engagement rate
select c.campaign_id, brand_name,pc.title, (sum(psip.likes + psip.comments + psip.video_views)/sum(psip.influencer_starting_followers)) as ER FROM
`production.public_campaigns` as pc
LEFT JOIN `public_sponsored_instagram_posts` as psip
ON pc.id = psip.campaign_id
LEFT JOIN `public_instagram_post_performances` pipp
ON pipp.id=pc.id
LEFT JOIN `public_collaboration_contents`pcc
on pcc.id=pc.id
LEFt JOIN `public_collaborations` as c
ON c.id=pcc.collaboration_id
GROUP BY brand_name, title, campaign_id, pc.id
having (sum(psip.likes + psip.comments + psip.video_views)/sum(psip.influencer_starting_followers)) > 0
Table looks like this:
Campaign_id | Brand_name | title | ER
-------------------------------------
1233 | asdf | asdf | 2%
I'm hoping to join them together so my final table looks something like this:
Brand_name | title | ER | TotalContent
--------------------------------------
asdf | asdf | 2% | 50
How can I go about this? I tried joining the two queries, since they pull from similar datasets, but I ended up with the wrong TotalContent value. (Maybe I used the wrong join? I tried all the join types and still didn't get the right number.)
Thank you so much in advance for the help.
EDIT #1:
The query I tried that gave me the wrong count is below (maybe I didn't do the joins right; I'm still very new to SQL, so that could be it):
select c.campaign_id, brand_name,pc.title, sum(case when medium in ('photo', 'story', 'video', 'album') then 1 else 0 end) as totalcontent, (sum(psip.likes + psip.comments + psip.video_views)/sum(psip.influencer_starting_followers)) as ER FROM
`production.public_campaigns` as pc
LEFT JOIN `public_sponsored_instagram_posts` as psip
ON pc.id = psip.campaign_id
LEFT JOIN `public_instagram_post_performances` pipp
ON pipp.id=pc.id
LEFT JOIN `public_collaboration_contents`pcc
on pcc.id=pc.id
LEFt JOIN `public_collaborations` as c
ON c.id=pcc.collaboration_id
GROUP BY brand_name, title, campaign_id, pc.id
having (sum(psip.likes + psip.comments + psip.video_views)/sum(psip.influencer_starting_followers)) > 0

Join on TOP 1 from subquery while referencing outer tables

I am starting with this query, which works fine:
SELECT
C.ContactSys
, ... a bunch of other rows...
FROM Users U
INNER JOIN Contacts C ON U.ContactSys = C.ContactSys
LEFT JOIN UserWatchList UW ON U.UserSys = UW.UserSys
LEFT JOIN Accounts A ON C.AccountSys = A.AccountSys
WHERE
C.OrganizationSys = 1012
AND U.UserTypeSys = 2
AND C.FirstName = 'steve'
Now, I've been given this requirement:
For every visitor returned by the Visitor Search, take ContactSys, get the most recent entry in the GuestLog table for that contact, then return the columns ABC and XYZ from the GuestLog table.
I'm having trouble with that. I need something like this (I think)...
SELECT
C.ContactSys
, GL.ABC
, GL.XYZ
, ... a bunch of other rows...
FROM Users U
INNER JOIN Contacts C ON U.ContactSys = C.ContactSys
LEFT JOIN UserWatchList UW ON U.UserSys = UW.UserSys
LEFT JOIN Accounts A ON C.AccountSys = A.AccountSys
LEFT JOIN (SELECT TOP 1 * FROM GuestLog GU WHERE GU.ContactSys = ????? ORDER BY GuestLogSys DESC) GL ON GL.ContactSys = C.ContactSys
WHERE
C.OrganizationSys = 1012
AND U.UserTypeSys = 2
AND C.FirstName = 'steve'
Only that's not it because that subquery on the JOIN doesn't know anything about the outer tables.
I've been looking at these posts and their answers, but I'm having a hard time translating them to my needs:
SQL: Turn a subquery into a join: How to refer to outside table in nested join where clause?
Reference to outer query in subquery JOIN
Referencing outer query in subquery
Referencing outer query's tables in a subquery
If that is the logic you want, you can use OUTER APPLY:
SELECT C.ContactSys, GL.ABC, GL.XYZ,
... a bunch of other columns ...
FROM Users U JOIN
Contacts C
ON U.ContactSys = C.ContactSys LEFT JOIN
UserWatchList UW
ON U.UserSys = UW.UserSys LEFT JOIN
Accounts A
ON C.AccountSys = A.AccountSys OUTER APPLY
(SELECT TOP 1 gl.*
FROM GuestLog gl
WHERE gl.ContactSys = C.ContactSys
ORDER BY gl.GuestLogSys DESC
) GL
WHERE C.OrganizationSys = 1012 AND
U.UserTypeSys = 2 AND
C.FirstName = 'steve'
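OUTER APPLY is specific to SQL Server (and a few other engines). For comparison, the same "latest row per contact" result can be sketched with a correlated subquery in the join condition, which is a different technique from the answer's but runs on engines without APPLY. The example below uses SQLite through Python, with a cut-down, made-up version of the Contacts and GuestLog tables, and assumes as the answer does that a higher GuestLogSys means a more recent entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (ContactSys INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE GuestLog (GuestLogSys INTEGER PRIMARY KEY, ContactSys INTEGER, ABC TEXT, XYZ TEXT);
INSERT INTO Contacts VALUES (1, 'steve'), (2, 'steve');
INSERT INTO GuestLog VALUES (1, 1, 'old-abc', 'old-xyz'), (2, 1, 'new-abc', 'new-xyz');
""")

rows = conn.execute("""
SELECT c.ContactSys, gl.ABC, gl.XYZ
FROM Contacts c
LEFT JOIN GuestLog gl
       -- keep only the GuestLog row with the highest key for this contact
       ON gl.GuestLogSys = (SELECT MAX(g2.GuestLogSys)
                            FROM GuestLog g2
                            WHERE g2.ContactSys = c.ContactSys)
WHERE c.FirstName = 'steve'
ORDER BY c.ContactSys
""").fetchall()

print(rows)
```

Contact 1 gets its newest log entry, and contact 2, which has no log entries, still appears with NULLs, matching what OUTER APPLY does.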

SQL correct query or not

given these relationships, how could you query the following:
The tourists (name and email) that booked at least a pension whose rating is greater than 9, but didn't book any 3 star hotel with a rating less than 9.
Is the following correct?
SELECT Tourists.name, Tourists.email
FROM Tourists
WHERE EXISTS (
SELECT id FROM Bookings
INNER JOIN Tourists ON Bookings.touristId=Tourists.id
INNER JOIN AccomodationEstablishments ON Bookings.accEstId=AccomodationEstablishments.id
INNER JOIN AccomodationTypes ON AccomodationEstablishments.accType=AccomodationTypes.id
WHERE AccomodationTypes.name = 'Pension' AND
AccomodationEstablishments.rating > 9
) AND NOT EXISTS (
SELECT id FROM Bookings
INNER JOIN Tourists ON Bookings.touristId=Tourists.id
INNER JOIN AccomodationEstablishments ON Bookings.accEstId=AccomodationEstablishments.id
INNER JOIN AccomodationTypes ON AccomodationEstablishments.accType=AccomodationTypes.id
WHERE AccomodationTypes.name = 'Hotel' AND
AccomodationEstablishments.noOfStars = 3 AND
AccomodationEstablishments.rating < 9
)
I would do this using aggregation and having:
SELECT t.name, t.email
FROM Bookings b INNER JOIN
Tourists t
ON b.touristId = t.id INNER JOIN
AccomodationEstablishments ae
ON b.accEstId = ae.id INNER JOIN
AccomodationTypes a
ON ae.accType = a.id
GROUP BY t.name, t.email
HAVING SUM(CASE WHEN a.name = 'Pension' AND ae.rating > 9 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN a.name = 'Hotel' AND ae.noOfStars = 3 AND ae.rating < 9 THEN 1 ELSE 0 END) = 0;
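Your method can also work, but the subqueries need to be correlated with the outer query: as written, each subquery joins Tourists again, so it never refers to the tourist in the outer row. Drop the inner join to Tourists and filter on Bookings.touristId = Tourists.id instead (and qualify id, for example Bookings.id, since several of the joined tables have an id column).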
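A small sketch of the aggregation approach, run on SQLite through Python. The column types and all of the rows are assumptions made for the example: Ann booked a highly rated pension and a good 3-star hotel, Bob booked the same pension plus a poorly rated 3-star hotel, and Cid booked only the good hotel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tourists (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE AccomodationTypes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE AccomodationEstablishments (id INTEGER PRIMARY KEY, accType INTEGER, noOfStars INTEGER, rating REAL);
CREATE TABLE Bookings (id INTEGER PRIMARY KEY, touristId INTEGER, accEstId INTEGER);
INSERT INTO Tourists VALUES (1, 'Ann', 'ann@example.com'), (2, 'Bob', 'bob@example.com'), (3, 'Cid', 'cid@example.com');
INSERT INTO AccomodationTypes VALUES (1, 'Pension'), (2, 'Hotel');
INSERT INTO AccomodationEstablishments VALUES (1, 1, NULL, 9.5), (2, 2, 3, 8.0), (3, 2, 3, 9.2);
INSERT INTO Bookings (touristId, accEstId) VALUES (1, 1), (1, 3), (2, 1), (2, 2), (3, 3);
""")

rows = conn.execute("""
SELECT t.name, t.email
FROM Bookings b
INNER JOIN Tourists t ON b.touristId = t.id
INNER JOIN AccomodationEstablishments ae ON b.accEstId = ae.id
INNER JOIN AccomodationTypes a ON ae.accType = a.id
GROUP BY t.id, t.name, t.email
-- at least one qualifying pension, and no low-rated 3-star hotel
HAVING SUM(CASE WHEN a.name = 'Pension' AND ae.rating > 9 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN a.name = 'Hotel' AND ae.noOfStars = 3 AND ae.rating < 9 THEN 1 ELSE 0 END) = 0
""").fetchall()

print(rows)
```

Only Ann is returned. The sketch also groups by t.id, so two tourists who happen to share a name and email are not merged into one group.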

SQL LEFT JOIN combined with regular joins

I have the following query that joins a bunch of tables.
I'd like to get every record from the INDUSTRY table that has consolidated_industry_id = 1 regardless of whether or not it matches the other tables. I believe this needs to be done with a LEFT JOIN?
SELECT attr.industry_id AS option_id,
attr.industry AS option_name,
uj.ft_job_industry_id,
Avg(CASE
WHEN s.salary > 0 THEN s.salary
END) AS average,
Count(CASE
WHEN s.salary > 0 THEN attr.industry
END) AS count_non_zero,
Count(attr.industry_id) AS count_total
FROM industry attr,
user_job_ft_job uj,
salary_ft_job s,
user_job_ft_job ut,
[user] u,
user_education_mba_school mba
WHERE u.user_id = uj.user_id
AND u.user_id = ut.user_id
AND u.user_id = mba.user_id
AND uj.ft_job_industry_id = attr.industry_id
AND uj.user_job_ft_job_id = s.user_job_id
AND u.include_in_student_site_results = 1
AND u.site_instance_id IN ( 1 )
AND uj.job_type_id = 1
AND attr.consolidated_industry_id = 1
AND mba.mba_graduation_year_id NOT IN ( 8, 9 )
AND uj.admin_approved = 1
GROUP BY attr.industry_id,
attr.industry,
uj.ft_job_industry_id
This returns only one row, but there are 8 matches in the industry table where consolidated_industry_id = 1.
--- EDIT: The real question here is, how do I combine the LEFT JOIN with the regular joins?
Use left join for tables that may miss a corresponding record. Put the conditions for each table in the on clause of the join, not in the where, as that would in effect make them inner joins anyway. Something like:
select
attr.industry_id AS option_id, attr.industry AS option_name,
uj.ft_job_industry_id, AVG(CASE WHEN s.salary > 0 THEN s.salary END) AS average,
COUNT(CASE WHEN s.salary > 0 THEN attr.industry END) as count_non_zero,
COUNT(attr.industry_id) as count_total
from
industry attr
left join user_job_ft_job uj on uj.ft_job_industry_id = attr.industry_id and uj.job_type_id = 1 and uj.admin_approved = 1
left join salary_ft_job s on uj.user_job_ft_job_id = s.user_job_id
left join [user] u on u.user_id = uj.user_id and u.include_in_student_site_results = 1 and u.site_instance_id IN (1)
left join user_job_ft_job ut on u.user_id = ut.user_id
left join user_education_mba_school mba on u.user_id = mba.user_id and mba.mba_graduation_year_id not in (8, 9)
where
attr.consolidated_industry_id = 1
group by
attr.industry_id, attr.industry, uj.ft_job_industry_id
If you have any tables that you know always have a corresponding record, just use an inner join for those.
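The difference between filtering in ON and filtering in WHERE can be seen in a reduced sketch with only the industry and user_job_ft_job tables, run on SQLite through Python. The rows are invented: Banking has an approved job, Consulting has only an unapproved job, and Energy has no jobs at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE industry (industry_id INTEGER PRIMARY KEY, industry TEXT, consolidated_industry_id INTEGER);
CREATE TABLE user_job_ft_job (user_job_ft_job_id INTEGER PRIMARY KEY, ft_job_industry_id INTEGER, admin_approved INTEGER);
INSERT INTO industry VALUES (1, 'Banking', 1), (2, 'Consulting', 1), (3, 'Energy', 1);
INSERT INTO user_job_ft_job VALUES (1, 1, 1), (2, 2, 0);
""")

# Filter on the outer-joined table in WHERE: unmatched and filtered rows both disappear.
filter_in_where = conn.execute("""
SELECT attr.industry, COUNT(uj.user_job_ft_job_id)
FROM industry attr
LEFT JOIN user_job_ft_job uj ON uj.ft_job_industry_id = attr.industry_id
WHERE attr.consolidated_industry_id = 1 AND uj.admin_approved = 1
GROUP BY attr.industry_id, attr.industry
ORDER BY attr.industry_id
""").fetchall()

# Same filter moved into ON: every industry row survives, with a count of 0 where nothing matches.
filter_in_on = conn.execute("""
SELECT attr.industry, COUNT(uj.user_job_ft_job_id)
FROM industry attr
LEFT JOIN user_job_ft_job uj ON uj.ft_job_industry_id = attr.industry_id AND uj.admin_approved = 1
WHERE attr.consolidated_industry_id = 1
GROUP BY attr.industry_id, attr.industry
ORDER BY attr.industry_id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Note that the sketch counts a column from the joined table (uj.user_job_ft_job_id), so industries without a match report 0. Counting attr.industry_id would report 1 for them, because the industry row itself is always present.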

Complex SQL query

I have the these tables:
- Users
- id
- Photos
- id
- user_id
- Classifications
- id
- user_id
- photo_id
I would like to order Users by the total number of Photos + Classifications which they own.
I wrote this query:
SELECT users.id,
COUNT(photos.id) AS n_photo,
COUNT(classifications.id) AS n_classifications,
(COUNT(photos.id) + COUNT(classifications.id)) AS n_sum
FROM users
LEFT JOIN photos ON (photos.user_id = users.id)
LEFT JOIN classifications ON (classifications.user_id = users.id)
GROUP BY users.id
ORDER BY (COUNT(photos.id) + COUNT(classifications.id)) DESC
The problem is that this query does not work as I expect and returns high numbers while I have only a few photos and classifications in the db. It returns something like this:
id n_photo n_classifications n_sum
29 19241 19241 38482
16 16905 16905 33810
1 431 0 431
...
You are missing distinct.
SELECT U.Id, COUNT(DISTINCT P.Id) + COUNT(DISTINCT C.Id) AS n_sum
FROM Users U
LEFT JOIN Photos P ON P.User_Id = U.Id
LEFT JOIN Classifications C ON C.User_Id = U.Id
GROUP BY U.Id
ORDER BY n_sum DESC
I could be misinterpreting your schema, but shouldn't this:
LEFT JOIN classifications ON (classifications.user_id = users.id)
Be this:
LEFT JOIN classifications ON (classifications.user_id = users.id)
AND (classifications.photo_id = photos.id)
?
SELECT users1.id, users1.n_photo, users2.n_classifications
FROM (
SELECT users.id, COUNT(photos.id) AS n_photo
FROM users LEFT OUTER JOIN photos ON photos.user_id = users.id
GROUP BY users.id
) users1
INNER JOIN (
SELECT users.id, COUNT(classifications.id) AS n_classifications
FROM users LEFT OUTER JOIN classifications ON classifications.user_id = users.id
GROUP BY users.id
) users2 ON users1.id = users2.id
Try something more like this instead:
SELECT t.n_id, t.n_photos, t.n_classifications,
(t.n_photos + t.n_classifications) AS n_sum
FROM (
SELECT users.id AS n_id,
(SELECT COUNT(photos.id) FROM photos WHERE photos.user_id = users.id) AS n_photos,
(SELECT COUNT(classifications.id) FROM classifications WHERE classifications.user_id = users.id) AS n_classifications
FROM users
) t
ORDER BY n_sum DESC
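The inflated numbers in the question come from joining two independent child tables to users in one query: each photo row is paired with each classification row of the same user, so both counts become the product of the two. A minimal sketch on SQLite through Python, with invented rows (user 1 has 3 photos and 2 classifications, user 2 has 1 photo), compares the original query with a variant that aggregates each child table in its own derived table before joining:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE classifications (id INTEGER PRIMARY KEY, user_id INTEGER, photo_id INTEGER);
INSERT INTO users (id) VALUES (1), (2);
INSERT INTO photos (id, user_id) VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO classifications (user_id, photo_id) VALUES (1, 1), (1, 2);
""")

# The question's query: 3 photos x 2 classifications = 6 joined rows for user 1.
naive = conn.execute("""
SELECT u.id, COUNT(p.id), COUNT(c.id), COUNT(p.id) + COUNT(c.id) AS n_sum
FROM users u
LEFT JOIN photos p ON p.user_id = u.id
LEFT JOIN classifications c ON c.user_id = u.id
GROUP BY u.id
ORDER BY n_sum DESC
""").fetchall()

# Aggregate each child table first, then join one row per user.
pre_aggregated = conn.execute("""
SELECT u.id,
       COALESCE(p.n, 0) AS n_photo,
       COALESCE(c.n, 0) AS n_classifications,
       COALESCE(p.n, 0) + COALESCE(c.n, 0) AS n_sum
FROM users u
LEFT JOIN (SELECT user_id, COUNT(*) AS n FROM photos GROUP BY user_id) p ON p.user_id = u.id
LEFT JOIN (SELECT user_id, COUNT(*) AS n FROM classifications GROUP BY user_id) c ON c.user_id = u.id
ORDER BY n_sum DESC
""").fetchall()

print(naive)
print(pre_aggregated)
```

COALESCE turns the NULL from an unmatched derived table into 0, so users without photos or classifications still get a numeric total.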