Complex SQL query - sql

I have these tables:
- Users
  - id
- Photos
  - id
  - user_id
- Classifications
  - id
  - user_id
  - photo_id
I would like to order Users by the total number of Photos + Classifications which they own.
I wrote this query:
SELECT users.id,
COUNT(photos.id) AS n_photo,
COUNT(classifications.id) AS n_classifications,
(COUNT(photos.id) + COUNT(classifications.id)) AS n_sum
FROM users
LEFT JOIN photos ON (photos.user_id = users.id)
LEFT JOIN classifications ON (classifications.user_id = users.id)
GROUP BY users.id
ORDER BY (COUNT(photos.id) + COUNT(classifications.id)) DESC
The problem is that this query does not work as I expect and returns high numbers while I have only a few photos and classifications in the db. It returns something like this:
id n_photo n_classifications n_sum
29 19241 19241 38482
16 16905 16905 33810
1 431 0 431
...

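You are missing DISTINCT. The two LEFT JOINs pair every photo of a user with every classification of that user, so a plain COUNT sees each id many times.
SELECT U.Id, COUNT(DISTINCT P.Id) + COUNT(DISTINCT C.Id) AS n_sum
FROM Users U
LEFT JOIN Photos P ON P.User_Id = U.Id
LEFT JOIN Classifications C ON C.User_Id = U.Id
GROUP BY U.Id
ORDER BY COUNT(DISTINCT P.Id) + COUNT(DISTINCT C.Id) DESC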
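To see why the counts in the question come out equal and inflated, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE classifications (id INTEGER PRIMARY KEY, user_id INTEGER, photo_id INTEGER);
INSERT INTO users (id) VALUES (1), (2);
-- hypothetical data: user 1 owns 3 photos and 2 classifications, user 2 owns 1 photo
INSERT INTO photos (id, user_id) VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO classifications (id, user_id, photo_id) VALUES (1, 1, 1), (2, 1, 2);
""")

# Same query twice: once with plain COUNT, once with COUNT(DISTINCT ...)
query = """
SELECT users.id,
       COUNT({d} photos.id) AS n_photo,
       COUNT({d} classifications.id) AS n_classifications
FROM users
LEFT JOIN photos ON photos.user_id = users.id
LEFT JOIN classifications ON classifications.user_id = users.id
GROUP BY users.id
ORDER BY users.id
"""
naive = conn.execute(query.format(d="")).fetchall()
fixed = conn.execute(query.format(d="DISTINCT")).fetchall()

print(naive)  # user 1 shows 3 * 2 = 6 for both counts
print(fixed)  # user 1 shows the real 3 and 2
```

The plain version counts the joined rows (photos times classifications per user), which matches the pattern in the question where both columns show the same large number.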

I could be misinterpreting your schema, but shouldn't this:
LEFT JOIN classifications ON (classifications.user_id = users.id)
Be this:
LEFT JOIN classifications ON (classifications.user_id = users.id)
AND (classifications.photo_id = photos.id)
?

SELECT users1.id, users1.n_photo, users2.n_classifications
FROM (
SELECT users.id, COUNT(photos.id) AS n_photo
FROM users LEFT OUTER JOIN photos ON photos.user_id = users.id
GROUP BY users.id
) users1
INNER JOIN (
SELECT users.id, COUNT(classifications.id) AS n_classifications
FROM users LEFT OUTER JOIN classifications ON classifications.user_id = users.id
GROUP BY users.id
) users2 ON users1.id = users2.id
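The derived-table approach above can be tried end to end with sqlite3. This is only a sketch: the sample data is made up, and the n_sum column and ORDER BY are added here to cover the ordering the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE classifications (id INTEGER PRIMARY KEY, user_id INTEGER, photo_id INTEGER);
-- hypothetical data: user 1 has 1 photo, user 2 has 2 photos and 3 classifications
INSERT INTO users (id) VALUES (1), (2);
INSERT INTO photos (id, user_id) VALUES (1, 1), (2, 2), (3, 2);
INSERT INTO classifications (id, user_id, photo_id) VALUES (1, 2, 1), (2, 2, 2), (3, 2, 3);
""")

# Each child table is aggregated on its own, so the two joins never multiply rows
rows = conn.execute("""
SELECT u1.id, u1.n_photo, u2.n_classifications,
       u1.n_photo + u2.n_classifications AS n_sum
FROM (
    SELECT users.id, COUNT(photos.id) AS n_photo
    FROM users LEFT JOIN photos ON photos.user_id = users.id
    GROUP BY users.id
) u1
JOIN (
    SELECT users.id, COUNT(classifications.id) AS n_classifications
    FROM users LEFT JOIN classifications ON classifications.user_id = users.id
    GROUP BY users.id
) u2 ON u1.id = u2.id
ORDER BY n_sum DESC
""").fetchall()

print(rows)
```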

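Try something more like this instead (correlated subqueries, so no join can multiply the rows):
SELECT users.id AS n_id,
(SELECT COUNT(*) FROM photos WHERE photos.user_id = users.id) AS n_photos,
(SELECT COUNT(*) FROM classifications WHERE classifications.user_id = users.id) AS n_classifications,
-- most databases do not allow reusing the aliases above in the same SELECT list, so the subqueries are repeated
(SELECT COUNT(*) FROM photos WHERE photos.user_id = users.id)
+ (SELECT COUNT(*) FROM classifications WHERE classifications.user_id = users.id) AS n_sum
FROM users
ORDER BY n_sum DESC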

Related

SQL filtering multiple joins by row-specific date window

I am trying to obtain aggregate stats of each customer in their first 60 days. Each user has a different join date, which resides in a user_info table. Currently, the best way I have of doing this is to repeatedly join to the user table each time I need to get aggregate stats from another table, then joining each pair together in a nested subquery. With multiple tables, this query becomes very sluggish and unwieldy. How can I do this in a more parsimonious manner?
My current solution:
SELECT t1.userid
,t1.total_transactions
,t1.days_transact
,t2.total_vouchers
,t2.days_redeemed
FROM (
SELECT u.userid
,SUM(s.transactions) AS total_transactions
,COUNT(DISTINCT s.dated) AS days_transact
FROM (
SELECT userid
,created
FROM schema.user_info
) u
LEFT JOIN (
SELECT userid
,transactions
,dated
FROM schema.transactions
) s
ON u.userid = s.userid
AND s.dated BETWEEN u.created AND DATE_ADD(u.created, 61)
GROUP BY u.userid
) t1
LEFT JOIN (
SELECT u.userid
,SUM(v.vouchers) AS total_vouchers
,COUNT(DISTINCT v.dated) AS days_redeemed
FROM (
SELECT userid
,created
FROM schema.user_info
) u
LEFT JOIN (
SELECT userid
,vouchers
,dated
FROM schema.vouchers
) v
ON u.userid = v.userid
AND v.dated BETWEEN u.created AND DATE_ADD(u.created, 61)
GROUP BY u.userid
) t2
ON t1.userid = t2.userid
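One possible way to avoid joining to user_info once per source table is to stack the source tables with UNION ALL and apply the date window a single time. The sketch below uses sqlite3 with invented rows. It assumes ISO-formatted date strings, drops the schema prefix, and uses SQLite's date(created, '+60 days') in place of DATE_ADD, so the window boundary is an assumption rather than a copy of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_info (userid INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE transactions (userid INTEGER, transactions INTEGER, dated TEXT);
CREATE TABLE vouchers (userid INTEGER, vouchers INTEGER, dated TEXT);
-- hypothetical data
INSERT INTO user_info VALUES (1, '2020-01-01'), (2, '2020-02-01');
INSERT INTO transactions VALUES
    (1, 5, '2020-01-02'), (1, 3, '2020-01-02'),
    (1, 7, '2020-01-10'), (1, 100, '2020-06-01');  -- last row is outside the window
INSERT INTO vouchers VALUES (1, 2, '2020-01-05');
""")

rows = conn.execute("""
WITH events AS (
    -- one row per event; the column that does not apply is left NULL
    SELECT userid, dated, transactions AS n_trans, NULL AS n_vouch FROM transactions
    UNION ALL
    SELECT userid, dated, NULL, vouchers FROM vouchers
)
SELECT u.userid,
       SUM(e.n_trans) AS total_transactions,
       COUNT(DISTINCT CASE WHEN e.n_trans IS NOT NULL THEN e.dated END) AS days_transact,
       SUM(e.n_vouch) AS total_vouchers,
       COUNT(DISTINCT CASE WHEN e.n_vouch IS NOT NULL THEN e.dated END) AS days_redeemed
FROM user_info u
LEFT JOIN events e
       ON e.userid = u.userid
      AND e.dated BETWEEN u.created AND date(u.created, '+60 days')
GROUP BY u.userid
ORDER BY u.userid
""").fetchall()

print(rows)
```

Users with no events in the window keep a row with NULL sums and zero day counts, the same as the LEFT JOINs in the original query.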

How to loop through a cte in main query

I am trying to rank the users on my system based on each user's totalArticleViews and totalArticles. The ranking should be based on the formula (totalArticleViews + (totalArticles * 500)) / 100.
My system allows users to post articles, and a record is created every time any of these articles is read by anyone. My database has the following tables: users, articles, reads.
I have managed to get the views into the formula, but I'm having trouble counting each user's articles and multiplying that count by 500 so that I can add it to the formula and rank all the users.
with article_views AS (
SELECT article_id, COUNT(reads.id) AS views, 1 * 500 AS points
FROM reads
WHERE article_id IN (
SELECT id FROM articles WHERE articles.published_on IS NOT NULL AND
articles.deleted_at IS NULL
)
GROUP BY article_id
),
published AS (
SELECT COUNT(articles.id) AS TotalArticle, COUNT(articles.id) * 500 AS
points
FROM articles
WHERE published_on IS NOT NULL AND deleted_at IS NULL
GROUP BY articles.user_id
)
SELECT
users.id AS user_id,
ROUND((SUM(article_views.views) + () ) / 100.0, 2) AS points,
ROW_NUMBER() OVER (ORDER BY ROUND((SUM(article_views.views) + ()) /
100.0, 2) DESC)
FROM users
LEFT JOIN articles ON users.id = articles.user_id
LEFT JOIN reads ON articles.id = reads.article_id
LEFT JOIN article_views ON reads.article_id = article_views.article_id
WHERE
users.id IN (SELECT user_id FROM role_user WHERE role_id = 2)
AND status = 'ACTIVE'
GROUP BY users.id
ORDER BY points DESC NULLS LAST
I'm stuck at this point
(SUM(article_views.views) + () ) / 100.0, 2)
Include the GROUP BY column user_id in the SELECT list of the published CTE, then join published to users on that column in the main query.
WITH article_views AS (
SELECT r.article_id,
COUNT(r.id) AS views,
1 * 500 AS points
FROM reads r
WHERE r.article_id IN (
SELECT id
FROM articles a
WHERE a.published_on IS NOT NULL
AND a.deleted_at IS NULL
)
GROUP BY r.article_id
),
published AS (
SELECT a.user_id,
COUNT(a.id) AS TotalArticle,
COUNT(a.id) * 500 AS points
FROM articles a
WHERE a.published_on IS NOT NULL
AND a.deleted_at IS NULL
GROUP BY a.user_id
)
SELECT u.id AS user_id,
ROUND((COALESCE(SUM(av.views), 0) + COALESCE(p.points, 0)) / 100.0, 2) AS points,
ROW_NUMBER() OVER (ORDER BY ROUND((COALESCE(SUM(av.views), 0) + COALESCE(p.points, 0))
/ 100.0, 2) DESC) AS rn
FROM users u
LEFT JOIN articles a ON u.id = a.user_id
-- join articles straight to article_views; going through reads would repeat each view count once per read
LEFT JOIN article_views av ON a.id = av.article_id
LEFT JOIN published p ON u.id = p.user_id
WHERE u.id IN (
SELECT user_id FROM role_user WHERE role_id = 2
)
AND u.status = 'ACTIVE'
-- p.points is not aggregated, so it has to appear in GROUP BY
GROUP BY u.id, p.points
ORDER BY points DESC NULLS LAST
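As a rough check of the formula (totalArticleViews + totalArticles * 500) / 100, here is a simplified sqlite3 sketch. It leaves out the published_on, deleted_at, role and status filters and the ROW_NUMBER column, and the per_user CTE name and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE reads (id INTEGER PRIMARY KEY, article_id INTEGER);
-- hypothetical data: user 1 has 2 articles with 3 + 1 reads, user 2 has 1 unread article
INSERT INTO users (id) VALUES (1), (2), (3);
INSERT INTO articles (id, user_id) VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO reads (id, article_id) VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

rows = conn.execute("""
WITH article_views AS (
    SELECT article_id, COUNT(*) AS views
    FROM reads
    GROUP BY article_id
),
per_user AS (
    -- one row per author: article count plus the summed view counts
    SELECT a.user_id,
           COUNT(a.id) AS total_articles,
           COALESCE(SUM(av.views), 0) AS total_views
    FROM articles a
    LEFT JOIN article_views av ON av.article_id = a.id
    GROUP BY a.user_id
)
SELECT u.id,
       ROUND((COALESCE(p.total_views, 0)
              + COALESCE(p.total_articles, 0) * 500) / 100.0, 2) AS points
FROM users u
LEFT JOIN per_user p ON p.user_id = u.id
ORDER BY points DESC
""").fetchall()

print(rows)  # user 1: (4 + 2 * 500) / 100, user 2: (0 + 1 * 500) / 100, user 3: 0
```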

How to optimize multiple subqueries to the same data set

Imagine I have a query like the following one:
SELECT
u.ID,
( SELECT
COUNT(*)
FROM
POSTS p
WHERE
p.USER_ID = u.ID
AND p.TYPE = 1
) AS interesting_posts,
( SELECT
COUNT(*)
FROM
POSTS p
WHERE
p.USER_ID = u.ID
AND p.TYPE = 2
) AS boring_posts,
( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
AND c.TYPE = 1
) AS interesting_comments,
( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
AND c.TYPE = 2
) AS boring_comments
FROM
USERS u;
( Hopefully it's correct because I just came up with it and didn't test it )
where I try to calculate the number of interesting and boring posts and comments that the user has.
Now, the problem with this query is that we have 2 sequential scans on both the posts and comments table and I wonder if there is a way to avoid that?
I could probably LEFT JOIN both posts and comments to the users table and do some aggregation but it's gonna generate a lot of rows before aggregation and I am not sure if that's a good way to go.
Aggregate posts and comments and outer join them to the users table.
select
u.id as user_id,
coalesce(p.interesting, 0) as interesting_posts,
coalesce(p.boring, 0) as boring_posts,
coalesce(c.interesting, 0) as interesting_comments,
coalesce(c.boring, 0) as boring_comments
from users u
left join
(
select
user_id,
count(case when type = 1 then 1 end) as interesting,
count(case when type = 2 then 1 end) as boring
from posts
group by user_id
) p on p.user_id = u.id
left join
(
select
user_id,
count(case when type = 1 then 1 end) as interesting,
count(case when type = 2 then 1 end) as boring
from comments
group by user_id
) c on c.user_id = u.id;
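The pre-aggregation above can be checked on a small data set. This sketch runs the same query shape in sqlite3; the rows are invented, and each of posts and comments is read once inside its own subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, type INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, type INTEGER);
-- hypothetical data
INSERT INTO users (id) VALUES (1), (2);
INSERT INTO posts (id, user_id, type) VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2);
INSERT INTO comments (id, user_id, type) VALUES (1, 1, 2), (2, 2, 1);
""")

rows = conn.execute("""
SELECT u.id,
       COALESCE(p.interesting, 0), COALESCE(p.boring, 0),
       COALESCE(c.interesting, 0), COALESCE(c.boring, 0)
FROM users u
LEFT JOIN (
    -- COUNT skips NULLs, so a CASE without ELSE counts only matching rows
    SELECT user_id,
           COUNT(CASE WHEN type = 1 THEN 1 END) AS interesting,
           COUNT(CASE WHEN type = 2 THEN 1 END) AS boring
    FROM posts
    GROUP BY user_id
) p ON p.user_id = u.id
LEFT JOIN (
    SELECT user_id,
           COUNT(CASE WHEN type = 1 THEN 1 END) AS interesting,
           COUNT(CASE WHEN type = 2 THEN 1 END) AS boring
    FROM comments
    GROUP BY user_id
) c ON c.user_id = u.id
ORDER BY u.id
""").fetchall()

print(rows)
```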
Compare the results and the execution plan (here posts is scanned only once):
with c as (
select distinct
count(1) filter (where TYPE = 1) over (partition by USER_ID) interesting_posts
, count(1) filter (where TYPE = 2) over (partition by USER_ID) boring_posts
, USER_ID
from POSTS
)
, p as (select USER_ID,max(interesting_posts) interesting_posts, max(boring_posts) boring_posts from c group by USER_ID)
SELECT
u.ID, interesting_posts,boring_posts
, ( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
) AS comments
FROM
USERS u
JOIN p on p.USER_ID = u.ID

PostgreSQL value of COUNT multiply by a number

I'm a Rails developer and I'm new to writing SQL scripts. I have users, portfolios, views, favorites and endorsements tables. users have many portfolios and many endorsements. portfolios has many views, many favorites and many endorsements.
Here is the script I wrote
top_users = User.find_by_sql(
"SELECT users.*,
COUNT(portfolios.id) +
COUNT(views.id) +
COUNT(favorites.id) +
COUNT(case when endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id then 1 else 0 end) +
COUNT(case when endorsements.user_id = users.id then 1 else 0 end)
AS total
FROM users
LEFT OUTER JOIN portfolios ON portfolios.user_id = users.id
LEFT OUTER JOIN views ON views.subject_id = portfolios.id AND portfolios.user_id = users.id
LEFT OUTER JOIN favorites ON favorites.subject_id = portfolios.id AND portfolios.user_id = users.id
LEFT OUTER JOIN endorsements ON endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id OR endorsements.user_id = users.id
GROUP BY users.id
ORDER BY total DESC LIMIT 8"
)
The total count is not quite what I expect, because each portfolio is worth 50 points, each view 2 points, each favorite 10 points, and each endorsement 2 points.
Let's say we have 3 users:
user | COUNT 1 | COUNT 2 | COUNT 3 | COUNT 4 | COUNT 5
-------------------------------------------------------
1 | 0 | 0 | 0 | 0 | 10
2 | 2 | 2 | 2 | 2 | 0
3 | 5 | 0 | 0 | 0 | 0
With my script, the results come out in the order user 1, user 2, user 3. However, based on the points system, they should come out in the order user 3, user 2, user 1, because user 3's total is 250 points, user 2's is 128 and user 1's is 20. That is the order I expect. I tried this:
top_users = User.find_by_sql(
"SELECT users.*,
COUNT(portfolios.id) * 50 +
COUNT(views.id) * 2 +
COUNT(favorites.id) * 10 +
COUNT(case when endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id then 1 else 0 end) * 2 +
COUNT(case when endorsements.user_id = users.id then 1 else 0 end) * 2
AS total
FROM users
LEFT OUTER JOIN portfolios ON portfolios.user_id = users.id
LEFT OUTER JOIN views ON views.subject_id = portfolios.id AND portfolios.user_id = users.id
LEFT OUTER JOIN favorites ON favorites.subject_id = portfolios.id AND portfolios.user_id = users.id
LEFT OUTER JOIN endorsements ON endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id OR endorsements.user_id = users.id
GROUP BY users.id
ORDER BY total DESC LIMIT 8"
)
The script above does not work for me either. Any thoughts or help would be much appreciated. Again, I'm very new to raw SQL.
UPDATED
I ended up doing this to avoid the double-counting issue when LEFT OUTER JOINing multiple tables.
SELECT t4.id, t4.username, t4.avatar_url, p_count * 50 + ue_count * 2 + fav_count * 10 + ep_count * 2 + COUNT(vp.id) * 2 as point
FROM (SELECT t3.id, t3.username, t3.avatar_url, p_count, ue_count, fav_count, COUNT(ep.id) as ep_count
FROM( SELECT t2.id, t2.username, t2.avatar_url, p_count, ue_count, COUNT(fav_p.id) as fav_count
FROM (SELECT t1.id, t1.username, t1.avatar_url, p_count, COUNT(e.user_id) as ue_count
FROM (SELECT u.*, COUNT(p.user_id) as p_count
FROM users u
LEFT OUTER JOIN (SELECT user_id, id
FROM portfolios) p
ON u.id = p.user_id
GROUP BY u.id) t1
LEFT OUTER JOIN (SELECT user_id
FROM endorsements) e
ON e.user_id = t1.id
GROUP BY t1.id, t1.username, t1.avatar_url, p_count ) t2
LEFT OUTER JOIN (SELECT p.id, p.user_id
FROM portfolios p
INNER JOIN favorites
ON favorites.subject_id = p.id) fav_p
ON fav_p.user_id = t2.id
GROUP BY t2.id, t2.username, t2.avatar_url, p_count, ue_count) t3
LEFT OUTER JOIN (SELECT p.id, p.user_id
FROM portfolios p
INNER JOIN endorsements
ON endorsements.portfolio_id = p.id) ep
ON ep.user_id = t3.id
GROUP BY t3.id, t3.username, t3.avatar_url, p_count, ue_count, fav_count) t4
LEFT OUTER JOIN (SELECT p.id, p.user_id
FROM portfolios p
INNER JOIN views
ON views.subject_id = p.id) vp
ON vp.user_id = t4.id
GROUP BY t4.id, t4.username, t4.avatar_url, p_count, ue_count, fav_count, ep_count
ORDER BY point DESC
LIMIT 8
I'm a beginner and not familiar with SQL. The updated code above solves my problem, but I wonder how bad its performance would be. Thanks for any input.
After reading through a few more times, I think I got what you were saying. Try this.
SELECT users.id
,COUNT(portfolios.id) * 50 +
COUNT(VIEWS.id) * 2 +
COUNT(favorites.id) * 10 +
COUNT(e1.id) * 2 +
COUNT(e2.id) * 2
AS total
FROM users
LEFT JOIN portfolios
ON portfolios.user_id = users.id
LEFT JOIN VIEWS
ON VIEWS.subject_id = portfolios.id
LEFT JOIN favorites
ON favorites.subject_id = portfolios.id
LEFT JOIN endorsements e1
ON e1.portfolio_id = portfolios.id
LEFT JOIN endorsements e2
ON e2.user_id = users.id
GROUP BY users.id
ORDER BY total DESC LIMIT 8
I assumed that an endorsement relates to either a user OR a portfolio. I don't know what the values in your tables look like, but in theory, since an endorsement relates to a user or a portfolio and a portfolio always relates to a user, it isn't strictly necessary to match on both user_id and portfolio_id in one join condition. In a case like that it's fine to join the portfolios table to endorsements as e1 and the users table to endorsements as e2 and just add the two counts.
First of all, unless your 'users' table only has one column, this breaks the rule that when you have aggregate functions in your SELECT clause, every column that isn't passed into an aggregate function has to be in your GROUP BY clause.
Second, I don't think the CASE expressions inside your COUNT() functions make sense. They repeat the conditions in your joins. You should be able to just count endorsements.id and portfolios.id, I think. I may be a little fuzzy on what you're looking for. Also, what is a subject_id? Is that an id field that determines whether an endorsement belongs to a user or a portfolio?
Does an endorsement have both a user_id and a portfolio_id, or is it one or the other but not both?
Any time you have multiple outer joins in a GROUP BY query, you have to be careful of double-counting. So I would change COUNT(portfolios.id) to COUNT(DISTINCT portfolios.id) etc. That should also remove the need for your CASE statements. Once you have those counts, you can multiply by their score values, as you say in your question (* 2 or * 50 or whatever you like).
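A small sqlite3 sketch of the COUNT(DISTINCT ...) suggestion, using the three example users from the question. It assumes that an endorsement row has either user_id or portfolio_id set, never both (the question leaves this open), and the ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE portfolios (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE views (id INTEGER PRIMARY KEY, subject_id INTEGER);
CREATE TABLE favorites (id INTEGER PRIMARY KEY, subject_id INTEGER);
CREATE TABLE endorsements (id INTEGER PRIMARY KEY, user_id INTEGER, portfolio_id INTEGER);
""")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(1,), (2,), (3,)])
# user 3 owns portfolios 1-5, user 2 owns portfolios 6 and 7
conn.executemany("INSERT INTO portfolios VALUES (?, ?)",
                 [(i, 3) for i in range(1, 6)] + [(6, 2), (7, 2)])
conn.executemany("INSERT INTO views VALUES (?, ?)", [(1, 6), (2, 7)])
conn.executemany("INSERT INTO favorites VALUES (?, ?)", [(1, 6), (2, 6)])
# ten endorsements of user 1, two endorsements of user 2's portfolios
conn.executemany("INSERT INTO endorsements VALUES (?, ?, ?)",
                 [(i, 1, None) for i in range(1, 11)] + [(11, None, 6), (12, None, 7)])

rows = conn.execute("""
SELECT u.id,
       COUNT(DISTINCT p.id) * 50
     + COUNT(DISTINCT v.id) * 2
     + COUNT(DISTINCT f.id) * 10
     + COUNT(DISTINCT e1.id) * 2
     + COUNT(DISTINCT e2.id) * 2 AS total
FROM users u
LEFT JOIN portfolios p ON p.user_id = u.id
LEFT JOIN views v ON v.subject_id = p.id
LEFT JOIN favorites f ON f.subject_id = p.id
LEFT JOIN endorsements e1 ON e1.portfolio_id = p.id
LEFT JOIN endorsements e2 ON e2.user_id = u.id
GROUP BY u.id
ORDER BY total DESC
""").fetchall()

print(rows)
```

DISTINCT keeps the result correct but the joins still build the multiplied intermediate rows, so on large tables pre-aggregating each child table may be cheaper.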

SQL SELECT with m:n relationship

I have m:n relationship between users and tags. One user can have m tags, and one tag can belong to n users. Tables look something like this:
USER:
ID
USER_NAME
USER_HAS_TAG:
USER_ID
TAG_ID
TAG:
ID
TAG_NAME
Let's say that I need to select all users, who have tags "apple", "orange" AND "banana". What would be the most effective way to accomplish this using SQL (MySQL DB)?
SELECT u.*
FROM (
SELECT user_id
FROM tag t
JOIN user_has_tag uht
ON uht.tag_id = t.id
WHERE tag_name IN ('apple', 'orange', 'banana')
GROUP BY
user_id
HAVING COUNT(*) = 3
) q
JOIN user u
ON u.id = q.user_id
By removing HAVING COUNT(*), you get OR instead of AND (though it will not be the most efficient way)
By replacing 3 with 2, you get users that have exactly two of three tags defined.
By replacing = 3 with >= 2, you get users that have at least two of three tags defined.
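The HAVING variants described above can be tried with sqlite3. In this sketch the user names, the tag assignments and the users_with_at_least helper are invented for illustration, and it relies on each (user_id, tag_id) pair appearing at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE user_has_tag (user_id INTEGER, tag_id INTEGER, PRIMARY KEY (user_id, tag_id));
-- hypothetical data
INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
INSERT INTO tag VALUES (1, 'apple'), (2, 'orange'), (3, 'banana'), (4, 'kiwi');
INSERT INTO user_has_tag VALUES
    (1, 1), (1, 2), (1, 3),   -- alice: apple, orange, banana
    (2, 1), (2, 2),           -- bob: apple, orange
    (3, 1), (3, 3), (3, 4),   -- carol: apple, banana, kiwi
    (4, 4);                   -- dave: kiwi
""")

def users_with_at_least(n):
    """Return names of users holding at least n of the three wanted tags."""
    rows = conn.execute("""
        SELECT u.user_name
        FROM (
            SELECT uht.user_id
            FROM tag t
            JOIN user_has_tag uht ON uht.tag_id = t.id
            WHERE t.tag_name IN ('apple', 'orange', 'banana')
            GROUP BY uht.user_id
            HAVING COUNT(*) >= ?
        ) q
        JOIN user u ON u.id = q.user_id
        ORDER BY u.user_name
    """, (n,)).fetchall()
    return [name for (name,) in rows]

print(users_with_at_least(3))  # all three tags (AND)
print(users_with_at_least(2))  # at least two of the three
```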
In addition to the other good answers, it's also possible to check the condition in a WHERE clause:
select *
from user u
where 3 = (
select count(distinct t.id)
from user_has_tag uht
inner join tag t on t.id = uht.tag_id
where t.tag_name in ('apple', 'orange', 'banana')
and uht.user_id = u.id
)
The count(distinct ...) makes sure a tag is counted only once, even if the user has multiple 'banana' tags.
By the way, the site fruitoverflow.com is not yet registered :)
You can do it all with joins...
select u.*
from user u
inner join user_has_tag ut1 on u.id = ut1.user_id
inner join tag t1 on ut1.tag_id = t1.id and t1.tag_name = 'apple'
inner join user_has_tag ut2 on u.id = ut2.user_id
inner join tag t2 on ut2.tag_id = t2.id and t2.tag_name = 'orange'
inner join user_has_tag ut3 on u.id = ut3.user_id
inner join tag t3 on ut3.tag_id = t3.id and t3.tag_name = 'banana'
SELECT *
FROM USER u
INNER JOIN USER_HAS_TAG uht
ON u.id = uht.user_id
INNER JOIN TAG t
ON uht.TAG_ID = t.ID
WHERE t.TAG_NAME IN ('apple','orange','banana')