Problems with multiple joins and a sum - sql

I have the following three PostgreSQL tables:
Post table:
postid | title | author | created
Vote table:
postid | username | vote
where vote is equal to 1 if the user voted the post up, 0 if the user did not vote and -1 if the user voted the post down.
Comment table:
commentID | parentID | postID | content | author | created
where the parentID is null if the comment is not a reply.
For every post, I now want to retrieve its title, author, creation date, the sum of all votes, the vote of the currently logged-in user, and the number of comments.
I already had trouble getting the current user's vote and asked about it here; someone helped me arrive at the following query:
SELECT post.postID as postID, post.title as title, post.author as author,
post.created as created,
COALESCE(sum(votes.vote), 0) as voteCount,
COALESCE(sum(votes.vote) FILTER (WHERE votes.username = :username), 0) as userVote
FROM post
LEFT JOIN votes ON post.postID = votes.postID
GROUP BY post.postID
ORDER BY voteCount DESC
Now I tried another LEFT JOIN to fetch the number of comments like this:
COUNT(DISTINCT comments) FILTER (WHERE comments.parentID IS NULL) as numComments
LEFT JOIN comments on post.postID = comments.postID
However, while the number of comments is correct, the vote sum for each post is now wrong. Because of the extra join, each vote row appears multiple times, which inflates the sum, and I'm having trouble figuring out how to solve this.
I also tried fetching the number of comments in a subquery so that it is independent of the votes, but without success.
Any further help would be very appreciated! :-)
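To see where the inflated sum comes from, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables are trimmed to the relevant columns and the rows are made up. With two LEFT JOINs, each post produces one row per (vote, comment) pair, so SUM adds every vote once per comment, while COUNT(DISTINCT ...) hides the same duplication on the comment side.

```python
import sqlite3

# Made-up sample data: one post with 2 upvotes and 3 top-level comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (postID INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE votes (postID INTEGER, username TEXT, vote INTEGER);
CREATE TABLE comments (commentID INTEGER PRIMARY KEY, parentID INTEGER, postID INTEGER);
INSERT INTO post VALUES (1, 'Hello');
INSERT INTO votes VALUES (1, 'alice', 1), (1, 'bob', 1);
INSERT INTO comments VALUES (10, NULL, 1), (11, NULL, 1), (12, NULL, 1);
""")

# Joining both child tables yields one row per (vote, comment) pair: 2 * 3 = 6 rows.
row_count, vote_sum, comment_count = conn.execute("""
SELECT COUNT(*), SUM(votes.vote), COUNT(DISTINCT comments.commentID)
FROM post
LEFT JOIN votes ON post.postID = votes.postID
LEFT JOIN comments ON post.postID = comments.postID
GROUP BY post.postID
""").fetchone()

# Each vote is counted 3 times (once per comment); DISTINCT masks it for comments.
print(row_count, vote_sum, comment_count)  # 6 6 3
```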

You would typically pre-aggregate in subqueries before joining, like so:
SELECT p.*,
COALESCE(v.voteCount, 0) as voteCount,
COALESCE(v.userVote, 0) as userVote,
COALESCE(c.numComments, 0) as numComments
FROM post p
LEFT JOIN (
SELECT postID,
SUM(vote) as voteCount,
SUM(vote) FILTER (WHERE username = :username) userVote
FROM votes
GROUP BY postID
) v ON v.postID = p.postID
LEFT JOIN (
SELECT postID, count(*) numComments
FROM comments
WHERE parentID IS NULL
GROUP BY postID
) c ON c.postID = p.postID
ORDER BY voteCount DESC
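The same pre-aggregation idea as a runnable sketch in Python's sqlite3. As an assumption for portability, SQLite stands in for PostgreSQL, so SUM(CASE WHEN ... END) replaces the FILTER clause; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (postID INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE votes (postID INTEGER, username TEXT, vote INTEGER);
CREATE TABLE comments (commentID INTEGER PRIMARY KEY, parentID INTEGER, postID INTEGER);
INSERT INTO post VALUES (1, 'Hello'), (2, 'Quiet');
INSERT INTO votes VALUES (1, 'alice', 1), (1, 'bob', 1);
-- two top-level comments and one reply on post 1; post 2 has nothing
INSERT INTO comments VALUES (10, NULL, 1), (11, NULL, 1), (12, 10, 1);
""")

# Each derived table has at most one row per postID, so the joins cannot fan out.
# SUM(CASE ...) stands in for PostgreSQL's SUM(...) FILTER (WHERE ...).
rows = conn.execute("""
SELECT p.postID,
       COALESCE(v.voteCount, 0)   AS voteCount,
       COALESCE(v.userVote, 0)    AS userVote,
       COALESCE(c.numComments, 0) AS numComments
FROM post p
LEFT JOIN (
    SELECT postID,
           SUM(vote) AS voteCount,
           SUM(CASE WHEN username = :username THEN vote END) AS userVote
    FROM votes
    GROUP BY postID
) v ON v.postID = p.postID
LEFT JOIN (
    SELECT postID, COUNT(*) AS numComments
    FROM comments
    WHERE parentID IS NULL
    GROUP BY postID
) c ON c.postID = p.postID
ORDER BY voteCount DESC, p.postID
""", {"username": "alice"}).fetchall()

print(rows)  # [(1, 2, 1, 2), (2, 0, 0, 0)]
```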

Count the values separately. The joins are causing a Cartesian product: each post's vote rows are multiplied by its comment rows. This is a place where correlated subqueries or lateral joins help:
SELECT p.*, v.*, c.*
FROM post p CROSS JOIN LATERAL
     -- an aggregate without GROUP BY always returns exactly one row,
     -- so CROSS JOIN keeps every post, even ones without votes or comments
     (SELECT COALESCE(SUM(v.vote), 0) as voteCount,
             COALESCE(SUM(v.vote) FILTER (WHERE v.username = :username), 0) as userVote
      FROM votes v
      WHERE p.postID = v.postID
     ) v CROSS JOIN LATERAL
     (SELECT COUNT(*) as commentCount
      FROM comments c
      WHERE p.postID = c.postID AND c.parentID IS NULL
     ) c
ORDER BY voteCount DESC;
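SQLite has no LATERAL, so this sketch (Python's sqlite3, made-up rows) uses the correlated-subquery alternative mentioned above. Each scalar subquery aggregates one post's rows independently, so nothing gets multiplied; SUM(CASE ...) again stands in for FILTER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (postID INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE votes (postID INTEGER, username TEXT, vote INTEGER);
CREATE TABLE comments (commentID INTEGER PRIMARY KEY, parentID INTEGER, postID INTEGER);
INSERT INTO post VALUES (1, 'Hello'), (2, 'Quiet');
INSERT INTO votes VALUES (1, 'alice', 1), (1, 'bob', 1), (2, 'bob', -1);
-- post 1: one top-level comment and one reply; post 2: two top-level comments
INSERT INTO comments VALUES (10, NULL, 1), (11, 10, 1), (20, NULL, 2), (21, NULL, 2);
""")

# One correlated subquery per value; each sees only the current post's rows.
rows = conn.execute("""
SELECT p.postID,
       (SELECT COALESCE(SUM(v.vote), 0)
          FROM votes v WHERE v.postID = p.postID) AS voteCount,
       (SELECT COALESCE(SUM(CASE WHEN v.username = :username THEN v.vote END), 0)
          FROM votes v WHERE v.postID = p.postID) AS userVote,
       (SELECT COUNT(*)
          FROM comments c
         WHERE c.postID = p.postID AND c.parentID IS NULL) AS commentCount
FROM post p
ORDER BY voteCount DESC, p.postID
""", {"username": "bob"}).fetchall()

print(rows)  # [(1, 2, 1, 1), (2, -1, -1, 2)]
```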

Related

Optimizing a nested SQL query through (preferably) joins

I am currently trying to fetch a list of Posts from a database, along with their likes and dislikes, and whether the current user has liked each post.
What I have tried:
Here's what the first version of the query looked like:
SELECT
announcements.*,
users.FIRSTNAME,
users.LASTNAME,
((SELECT COUNT(USER_ID) FROM likes_posts WHERE POST_ID = announcements.ID) - (SELECT COUNT(USER_ID) FROM dislikes_posts WHERE POST_ID = announcements.ID)) as TLIKES,
(SELECT COUNT(USER_ID) FROM likes_posts WHERE USER_ID = ? AND POST_ID = announcements.ID) AS USER_LIKED,
(SELECT COUNT(USER_ID) FROM dislikes_posts WHERE USER_ID = ? AND POST_ID = announcements.ID) AS USER_DISLIKED
FROM announcements LEFT JOIN users ON announcements.OWNER_ID = users.ID
WHERE announcements.CHANNEL = ? AND announcements.ID < ? ORDER BY announcements.ID DESC
I have tried optimizing it with several JOINs, but the results are wrong:
SELECT
announcements.*,
users.FIRSTNAME,
users.LASTNAME,
COUNT(likes_posts.USER_ID) AS TLikes,
COUNT(dislikes_posts.USER_ID) AS TDislikes,
UserLiked.ID AS userLiked,
UserDisliked.ID AS userDisliked
FROM announcements
LEFT JOIN likes_posts ON likes_posts.POST_ID = announcements.ID
LEFT JOIN dislikes_posts ON dislikes_posts.POST_ID = announcements.ID
LEFT JOIN likes_posts AS UserLiked ON UserLiked.USER_ID = ?
LEFT JOIN likes_posts AS UserDisliked ON UserDisliked.USER_ID = ?
LEFT JOIN users ON announcements.OWNER_ID = users.ID
WHERE announcements.CHANNEL = ? AND announcements.ID < ?
GROUP BY announcements.ID
ORDER BY announcements.ID DESC
Queries' results
The first query consistently fetches the correct number of likes and dislikes (e.g. 5 and 3).
The second one, however, consistently returns a number that is double the current likes or dislikes, whichever is bigger (e.g. if there are 5 likes and 6 dislikes, the result would be 16 likes and 16 dislikes).
Problem
I'm guessing the second query is somehow fetching the likes_posts table 2 times, which causes the discrepancy between the likes and dislikes.
Here's one way you could do it: aggregate the like and dislike counts first, then join them to the base table. This way each count is computed once per post instead of being multiplied by the rows of the other join.
SELECT
    a.*,
    u.FIRSTNAME,
    u.LASTNAME,
    coalesce(likes.cnt, 0) - coalesce(dislikes.cnt, 0) as TLIKES,
    coalesce(likes.user_cnt, 0) AS USER_LIKED,
    coalesce(dislikes.user_cnt, 0) AS USER_DISLIKED
FROM
    announcements a
LEFT JOIN
    users u ON a.OWNER_ID = u.ID
left join
    (
    -- total likes per post, plus how many of those rows belong to the current user
    select post_id, count(user_id) cnt,
           sum(case when user_id = ? then 1 else 0 end) user_cnt
    from likes_posts
    group by post_id
    ) likes on likes.post_id = a.id
left join
    (
    select post_id, count(user_id) cnt,
           sum(case when user_id = ? then 1 else 0 end) user_cnt
    from dislikes_posts
    group by post_id
    ) dislikes on dislikes.post_id = a.id
WHERE
    a.CHANNEL = ? AND a.ID < ?
ORDER BY
    a.ID DESC
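A runnable sketch of this pre-aggregation in Python's sqlite3, used as a stand-in for the asker's database with made-up rows. It assumes, based on the question's first query, that USER_LIKED and USER_DISLIKED should report whether the current user liked or disliked the post, so each derived table also computes a per-user count with SUM(CASE ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE announcements (ID INTEGER PRIMARY KEY, CHANNEL INTEGER, OWNER_ID INTEGER);
CREATE TABLE users (ID INTEGER PRIMARY KEY, FIRSTNAME TEXT, LASTNAME TEXT);
CREATE TABLE likes_posts (POST_ID INTEGER, USER_ID INTEGER);
CREATE TABLE dislikes_posts (POST_ID INTEGER, USER_ID INTEGER);
INSERT INTO users VALUES (1, 'Ada', 'Lovelace');
INSERT INTO announcements VALUES (100, 7, 1);
-- 5 likes (users 1..5) and 3 dislikes (users 6..8) on announcement 100
INSERT INTO likes_posts VALUES (100, 1), (100, 2), (100, 3), (100, 4), (100, 5);
INSERT INTO dislikes_posts VALUES (100, 6), (100, 7), (100, 8);
""")

SQL = """
SELECT a.ID, u.FIRSTNAME,
       COALESCE(likes.cnt, 0) - COALESCE(dislikes.cnt, 0) AS TLIKES,
       COALESCE(likes.user_cnt, 0)    AS USER_LIKED,
       COALESCE(dislikes.user_cnt, 0) AS USER_DISLIKED
FROM announcements a
LEFT JOIN users u ON a.OWNER_ID = u.ID
LEFT JOIN (
    SELECT POST_ID, COUNT(USER_ID) AS cnt,
           SUM(CASE WHEN USER_ID = ? THEN 1 ELSE 0 END) AS user_cnt
    FROM likes_posts GROUP BY POST_ID
) likes ON likes.POST_ID = a.ID
LEFT JOIN (
    SELECT POST_ID, COUNT(USER_ID) AS cnt,
           SUM(CASE WHEN USER_ID = ? THEN 1 ELSE 0 END) AS user_cnt
    FROM dislikes_posts GROUP BY POST_ID
) dislikes ON dislikes.POST_ID = a.ID
WHERE a.CHANNEL = ? AND a.ID < ?
ORDER BY a.ID DESC
"""

# Placeholders bind in textual order: user (likes), user (dislikes), channel, id bound.
result = conn.execute(SQL, (2, 2, 7, 1000)).fetchall()
print(result)  # [(100, 'Ada', 2, 1, 0)]: 5 - 3 = 2, and user 2 liked the post
```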

Get all posts with sum of votes and if current user voted each post

I have the following two PostgreSQL tables:
Post table:
postid | title | author | created
Vote table:
postid | username | vote
where vote is equal to 1 if the user voted the post up, 0 if the user did not vote and -1 if the user voted the post down.
For every post, I now want to retrieve its title, author, creation date, the sum of all votes, and
the vote of the currently logged-in user.
I wrote a query to receive everything except the vote of the current user like this:
SELECT post.postID as postID, post.title as title, post.author as author,
COALESCE(sum(votes.vote), 0) as voteCount, post.created as created
FROM post LEFT JOIN votes ON post.postID = votes.postID
GROUP BY post.postID ORDER BY voteCount DESC
I tried to fetch the current userVote by running a subquery like
(SELECT vote FROM votes WHERE postID = post.postID AND username = :username) as userVote
However, it does not seem to work, and I can't figure out why or how to fix it.
Any help would be very appreciated :)
You are almost there with your query. Just use a FILTER clause to get the current user's vote:
SELECT
post.postID as postID,
post.title as title,
post.author as author,
post.created as created,
COALESCE(sum(votes.vote), 0) as voteCount,
COALESCE(sum(votes.vote) filter (where votes.username = 'username'), 0) as userVote -- replace 'username' with the current user's name
FROM post
LEFT JOIN votes ON post.postID = votes.postID
GROUP BY 1,2,3,4
ORDER BY 5 DESC
Another way is by using one more left join like below:
SELECT
p.postID as postID,
p.title as title,
p.author as author,
p.created as created,
COALESCE(sum(v1.vote), 0) as voteCount,
COALESCE(v2.vote, 0) as userVote
FROM post p
LEFT JOIN votes v1 ON p.postID = v1.postID
LEFT JOIN votes v2 on p.postID = v2.postID and v2.username = 'username' -- replace 'username' with the current user's name
GROUP BY 1,2,3,4,v2.vote
ORDER BY 5 DESC
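To check that the two approaches agree, here is a sketch in Python's sqlite3 with made-up rows. SUM(CASE ...) stands in for FILTER, and a ? placeholder replaces the literal 'username'. The second-join approach is only safe if each user has at most one vote row per post; otherwise that join fans out as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (postID INTEGER PRIMARY KEY, title TEXT, author TEXT, created TEXT);
CREATE TABLE votes (postID INTEGER, username TEXT, vote INTEGER);
INSERT INTO post VALUES (1, 'A', 'ann', '2020-01-01'), (2, 'B', 'ben', '2020-01-02');
INSERT INTO votes VALUES (1, 'ann', 1), (1, 'ben', 1), (2, 'ann', -1);
""")

# Approach 1: conditional aggregation over a single join.
conditional = conn.execute("""
SELECT p.postID,
       COALESCE(SUM(v.vote), 0) AS voteCount,
       COALESCE(SUM(CASE WHEN v.username = ? THEN v.vote END), 0) AS userVote
FROM post p LEFT JOIN votes v ON p.postID = v.postID
GROUP BY p.postID
ORDER BY p.postID
""", ("ben",)).fetchall()

# Approach 2: a second LEFT JOIN that matches only the current user's vote row.
second_join = conn.execute("""
SELECT p.postID,
       COALESCE(SUM(v1.vote), 0) AS voteCount,
       COALESCE(v2.vote, 0) AS userVote
FROM post p
LEFT JOIN votes v1 ON p.postID = v1.postID
LEFT JOIN votes v2 ON p.postID = v2.postID AND v2.username = ?
GROUP BY p.postID, v2.vote
ORDER BY p.postID
""", ("ben",)).fetchall()

print(conditional)  # [(1, 2, 1), (2, -1, 0)]
print(second_join)  # [(1, 2, 1), (2, -1, 0)]
```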

Group by on join and calculate max of groups

I have 3 tables as described below:
(Image: the posts, post_comments and comments tables, plus the expected output.)
Now, in Postgres, I want to fetch the posts with their highest-liked comments, where the status of that comment must be active.
Expected output: (image of the resulting posts)
Note: for post 1, the highest-liked comment is inactive.
I've tried something like this:
select "posts".*
from "posts"
inner join (select id, max(likes) l from comments innner join post_comments on comments.id = post_comments.alert_id and post_comments.post_id = posts.id) a on posts.id = a.cid ...
This is incomplete, and I'm stuck on how to finish it.
In Postgres, you can get the active comment with the most likes for each post using distinct on:
select distinct on (pc.post_id) pc.*
from post_comments pc join
comments c
on pc.comment_id = c.id
where c.status = 'active'
order by pc.post_id, c.likes desc;
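SQLite has no DISTINCT ON, so this sketch in Python's sqlite3 swaps in ROW_NUMBER() (window functions need SQLite 3.25 or later) to pick the most-liked active comment per post, which is what the query above computes. The rows are made up, and post 1's most-liked comment is deliberately inactive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, likes INTEGER, status TEXT);
CREATE TABLE post_comments (post_id INTEGER, comment_id INTEGER);
INSERT INTO posts VALUES (1, 'first'), (2, 'second');
-- post 1: its most-liked comment (11) is inactive; post 2: two active comments
INSERT INTO comments VALUES (10, 5, 'active'), (11, 9, 'inactive'),
                            (20, 3, 'active'), (21, 7, 'active');
INSERT INTO post_comments VALUES (1, 10), (1, 11), (2, 20), (2, 21);
""")

# Keep only active comments, rank them per post by likes, then take rank 1.
# This mirrors DISTINCT ON (pc.post_id) ... ORDER BY pc.post_id, c.likes DESC.
rows = conn.execute("""
SELECT post_id, comment_id, likes
FROM (
    SELECT pc.post_id, pc.comment_id, c.likes,
           ROW_NUMBER() OVER (PARTITION BY pc.post_id ORDER BY c.likes DESC) AS rn
    FROM post_comments pc
    JOIN comments c ON pc.comment_id = c.id
    WHERE c.status = 'active'
) ranked
WHERE rn = 1
ORDER BY post_id
""").fetchall()

print(rows)  # [(1, 10, 5), (2, 21, 7)]
```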
I think this is quite related to what you want.
Try something like this:
SELECT posts.*, MAX(likes) l
FROM posts
JOIN post_comments ON post_id = posts.id
LEFT JOIN comments ON comment_id = comments.id
GROUP BY posts.id

TSQL left join and only last row from right

I'm writing a SQL query to get each post and only its last comment (if one exists).
But I can't find a way to limit the right side of the LEFT JOIN to a single row.
Here is a sample of the query:
SELECT post.id, post.title,comment.id,comment.message
from post
left outer join comment
on post.id=comment.post_id
If a post has 3 comments, I get 3 rows for that post, but I want only 1 row with the last comment (ordered by date).
Can somebody help me with this query?
SELECT post.id, post.title, comment.id, comment.message
FROM post
OUTER APPLY
(
SELECT TOP 1 *
FROM comment c
WHERE c.post_id = post.id
ORDER BY
date DESC
) comment
or
SELECT *
FROM (
SELECT post.id AS post_id, post.title, comment.id AS comment_id, comment.message,
ROW_NUMBER() OVER (PARTITION BY post.id ORDER BY comment.date DESC) AS rn
FROM post
LEFT JOIN
comment
ON comment.post_id = post.id
) q
WHERE rn = 1
The former is more efficient for few posts with many comments in each; the latter is more efficient for many posts with few comments in each.
Subquery:
SELECT p.id, p.title, c.id, c.message
FROM post p
LEFT join comment c
ON c.post_id = p.id AND c.id =
(SELECT MAX(c2.id) FROM comment c2 WHERE c2.post_id = p.id)
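A small sketch of this subquery answer in Python's sqlite3 with made-up rows, showing that putting the correlated MAX in the ON clause of a LEFT JOIN still returns posts that have no comments. Like the answer, it assumes the highest id is the last comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, message TEXT);
INSERT INTO post VALUES (1, 'has comments'), (2, 'no comments');
INSERT INTO comment VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 1, 'third');
""")

# The correlated subquery picks the highest comment id for each post; because the
# condition sits in the ON clause of a LEFT JOIN, post 2 still appears with NULLs.
rows = conn.execute("""
SELECT p.id, p.title, c.id, c.message
FROM post p
LEFT JOIN comment c
  ON c.post_id = p.id
 AND c.id = (SELECT MAX(c2.id) FROM comment c2 WHERE c2.post_id = p.id)
ORDER BY p.id
""").fetchall()

print(rows)  # [(1, 'has comments', 3, 'third'), (2, 'no comments', None, None)]
```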
You'll want to join to a sub-query that returns the last comment for the post. For example:
select post.id, post.title, lastcomment.lastcommentid, comment.message
from post
left join
(
select post_id as lastpostid, max(id) as lastcommentid
from comment
group by post_id
) lastcomment
on lastcomment.lastpostid = post.id
left join comment on comment.id = lastcomment.lastcommentid
A couple of options:
One way is to do the JOIN on:
SELECT TOP 1 comment.message FROM comment WHERE comment.post_id = post.id ORDER BY comment.id DESC
(Note: I'm assuming that comment.id is an identity column.)
What version of SQL Server are you using? If the ROW_NUMBER() function is available, you can number your comments by whatever "first" means to you and then just add a "WHERE rn = 1" clause. I don't have a handy example or the exact syntax off the top of my head, but I have plenty of queries that do exactly this. The other answers cover many of the ways you could do this.
I'd say profile it and see which one performs best for you.
You didn't say the specific name of your date field, so I filled in [DateCreated]. This is essentially the same as AGoodDisplayName's post above, but it uses the date field instead of relying on the ordering of the ID column.
SELECT p.id, p.title, comment.id, comment.message
FROM post p
LEFT OUTER JOIN comment
ON comment.id = (
SELECT TOP 1 id
FROM comment
WHERE p.id = post_id
ORDER BY [DateCreated] DESC -- newest first, so TOP 1 is the last comment
)

SQL Query Question: X has many Y. Get all X and get only the newest Y per X

Suppose we have two tables, Post and Comment. A Post has many Comments. Assume both are populated so that the number of comments per post varies. I want a query that will grab all posts but only the newest comment per post.
I have been directed to joins and subqueries, but I can't figure it out.
Example Output:
Post1:
Comment4 (newest for post1)
Post2:
Comment2 (newest for post2)
Post3:
Comment 10 (newest for post3)
etc...
Any help would be greatly appreciated. Thanks.
This answer assumes that you have a unique identifier for each comment and that it's an increasing number; that is, later comments have higher numbers than earlier ones. It doesn't have to be sequential, it just has to follow the order of insertion.
First, do a query that extracts the maximum comment id, grouped by post id.
Something like this:
SELECT MAX(ID) MaxCommentID, PostID
FROM Comments
GROUP BY PostID
This will give you a list of post IDs and the highest (latest) comment ID for each one.
Then you join with this, to extract the rest of the data from the comments, for those id's.
SELECT C1.*, C2.PostID
FROM Comments AS C1
INNER JOIN (
SELECT MAX(ID) MaxCommentID, PostID
FROM Comments
GROUP BY PostID
) AS C2 ON C1.ID = C2.MaxCommentID
Then, you join with the posts, to get the information about those posts.
SELECT C1.*, P.*
FROM Comments AS C1
INNER JOIN (
SELECT MAX(ID) MaxCommentID, PostID
FROM Comments
GROUP BY PostID
) AS C2 ON C1.ID = C2.MaxCommentID
INNER JOIN Posts AS P ON C2.PostID = P.ID
An alternate approach doesn't use the PostID from the inner query at all. First, pick out the maximum comment ID for each post; we don't need to know which post it belongs to, because each maximum ID is unique.
SELECT MAX(ID) AS MaxCommentID
FROM Comments
GROUP BY PostID
Then do an IN clause, to get the rest of the data for those comments:
SELECT C1.*
FROM Comments AS C1
WHERE C1.ID IN (
SELECT MAX(ID) AS MaxCommentID
FROM Comments
GROUP BY PostID
)
Then simply join in the posts:
SELECT C1.*, P.*
FROM Comments AS C1
INNER JOIN Posts AS P ON C1.PostID = P.ID
WHERE C1.ID IN (
SELECT MAX(ID) AS MaxCommentID
FROM Comments
GROUP BY PostID
)
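The join variant and the IN variant above should return the same rows. Here is a sketch in Python's sqlite3 with made-up data that runs both, assuming (as the answer does) that a higher comment ID means a newer comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Comments (ID INTEGER PRIMARY KEY, PostID INTEGER, Body TEXT);
INSERT INTO Posts VALUES (1, 'Post1'), (2, 'Post2');
INSERT INTO Comments VALUES (1, 1, 'c1'), (2, 2, 'c2'), (3, 1, 'c3'), (4, 1, 'c4');
""")

# Variant 1: join the comments to the per-post MAX(ID) derived table.
joined = conn.execute("""
SELECT P.Title, C1.ID, C1.Body
FROM Comments AS C1
INNER JOIN (
    SELECT MAX(ID) AS MaxCommentID, PostID
    FROM Comments
    GROUP BY PostID
) AS C2 ON C1.ID = C2.MaxCommentID
INNER JOIN Posts AS P ON C2.PostID = P.ID
ORDER BY P.ID
""").fetchall()

# Variant 2: filter with IN against the same per-post maxima.
filtered = conn.execute("""
SELECT P.Title, C1.ID, C1.Body
FROM Comments AS C1
INNER JOIN Posts AS P ON C1.PostID = P.ID
WHERE C1.ID IN (SELECT MAX(ID) FROM Comments GROUP BY PostID)
ORDER BY P.ID
""").fetchall()

print(joined)    # [('Post1', 4, 'c4'), ('Post2', 2, 'c2')]
print(filtered)  # [('Post1', 4, 'c4'), ('Post2', 2, 'c2')]
```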
Select the newest comment in a subquery, e.g.:
Select *
from Posts po
Inner Join
(
Select a.CommentThread, a.CommentDate, a.CommentBody, a.Post from comments a
inner join
(select commentthread, max(commentdate) as commentdate
from comments
group by commentthread) b
on a.commentthread = b.commentthread
and a.commentdate = b.commentdate
) co
on po.Post = co.post
select *
from post
, comments
where post.post_id = comments.post_id
and comments.comments_id = (select max(z.comments_id) from comments z where z.post_id = post.post_id)
And if you are still stuck with an old MySQL version that doesn't support subqueries, you can use something like:
SELECT
p.id, c1.id
FROM
posts as p
LEFT JOIN
comments as c1
ON
p.id = c1.postId
LEFT JOIN
comments as c2
ON
c1.postId = c2.postId
AND c1.id < c2.id
WHERE
isnull(c2.id)
ORDER BY
p.id
Either way, check your query with EXPLAIN for performance issues.
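The self-join above works as an anti-join: c1 is the newest comment for its post exactly when no c2 for the same post has a larger id. Here is a sketch in Python's sqlite3 with made-up rows; it uses the standard c2.id IS NULL in place of the MySQL isnull() call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY);
CREATE TABLE comments (id INTEGER PRIMARY KEY, postId INTEGER);
INSERT INTO posts VALUES (1), (2), (3);
INSERT INTO comments VALUES (1, 1), (2, 2), (3, 1), (4, 2), (5, 1);
""")

# For each c1, look for a newer c2 on the same post; keep c1 only if none exists.
# Post 3 has no comments, so c1 and c2 are both NULL and the post is still returned.
rows = conn.execute("""
SELECT p.id, c1.id
FROM posts AS p
LEFT JOIN comments AS c1 ON p.id = c1.postId
LEFT JOIN comments AS c2 ON c1.postId = c2.postId AND c1.id < c2.id
WHERE c2.id IS NULL
ORDER BY p.id
""").fetchall()

print(rows)  # [(1, 5), (2, 4), (3, None)]
```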