Help with SQL Join on two tables - sql

I have two tables, one is a table of forum threads. It has a last post date column.
Another table has PostID, UserId, and DateViewed.
I want to join these tables so I can compare DateViewed and LastPostDate for the current user. However, if they have never viewed the thread, there will not be a row in the 2nd table.
This seems easy but I can't wrap my head around it. Advice please.
Thanks in advance.

What is it that you're trying to do specifically - determine if there are unread posts?
You just need to use an outer join:
SELECT p.PostID, p.LastPostDate, ...,
CASE
WHEN v.DateViewed IS NULL OR v.DateViewed < p.LastPostDate THEN 1
ELSE 0
END AS Unread
FROM Posts p
LEFT JOIN PostViews v
ON v.PostID = p.PostID
AND v.UserID = @UserID
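Note that I've placed the UserID test in the JOIN condition. If you put it in the WHERE clause instead, the threads this user has never viewed drop out of the result: the outer join gives those rows a NULL v.UserID, so the comparison never evaluates to true and the WHERE clause filters them away.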
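To see the difference concretely, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the answer above; the rows, the user IDs (7 and 8), and the ISO date strings are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (PostID INTEGER PRIMARY KEY, LastPostDate TEXT);
    CREATE TABLE PostViews (PostID INTEGER, UserID INTEGER, DateViewed TEXT);
    INSERT INTO Posts VALUES (1, '2024-01-10'), (2, '2024-01-20'), (3, '2024-01-30');
    -- user 7 viewed thread 1 after its last post, thread 2 before it, and never thread 3
    INSERT INTO PostViews VALUES
        (1, 7, '2024-01-15'), (2, 7, '2024-01-05'), (3, 8, '2024-02-01');
""")

# User test in the ON clause: every thread survives the LEFT JOIN.
in_join = conn.execute("""
    SELECT p.PostID,
           CASE WHEN v.DateViewed IS NULL OR v.DateViewed < p.LastPostDate
                THEN 1 ELSE 0 END AS Unread
    FROM Posts p
    LEFT JOIN PostViews v ON v.PostID = p.PostID AND v.UserID = ?
    ORDER BY p.PostID
""", (7,)).fetchall()

# User test in the WHERE clause: thread 3 is filtered out, because its only
# joined row belongs to user 8.
in_where = conn.execute("""
    SELECT p.PostID
    FROM Posts p
    LEFT JOIN PostViews v ON v.PostID = p.PostID
    WHERE v.UserID = ?
    ORDER BY p.PostID
""", (7,)).fetchall()

print(in_join)   # one row per thread, with its Unread flag
print(in_where)  # only the threads user 7 has a view row for
```

The dates compare correctly here only because ISO-formatted strings sort chronologically; with a real date type the comparison is the same.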

So you're thinking something like:
SELECT t.UserID, t.PostID, t.LastPostDate, v.DateViewed
FROM dbo.Threads t
LEFT JOIN dbo.Views v ON v.PostID = t.PostID
AND v.UserID = t.UserID
WHERE t.UserID = @user;
v.DateViewed will be NULL if there's no corresponding row in Views.
If you have lots of rows in Views, you may prefer to do something like:
SELECT t.UserID, t.PostID, t.LastPostDate, v.DateViewed
FROM dbo.Threads t
CROSS APPLY (SELECT MAX(vw.DateViewed) as DateViewed
FROM dbo.Views vw
WHERE vw.PostID = t.PostID
AND vw.UserID = t.UserID
) v
WHERE t.UserID = @user;

The key is to use a LEFT JOIN, which will cause non-existent rows on the right side to come up as all NULL:
SELECT threads.lastpostdate, posts.dateviewed
FROM threads
LEFT JOIN posts
ON threads.id=posts.postid

Related

How to join three tables having relation parent-child-child's child. And I want to access all records related to parent

I have three tables:
articles(id,title,message)
comments(id,article_id,commentedUser_id,comment)
comment_likes(id, likedUser_id, comment_id, action_like, action_dislike)
For each article I want to show comments.id, comments.commentedUser_id, comments.comment, and a likes column: the count of comment_likes rows where action_like = 'like' and comment_id = comments.id, for the comments where comments.article_id = articles.id.
In other words, I want to count all the likes that belong to each comment, and show all comments of the article.
action_like has only two values: NULL or 'like'.
SELECT c.id , c.CommentedUser_id , c.comment , (cl.COUNT(action_like) WHERE action_like='like' AND comment_id='c.id') as likes
FROM comment_likes as cl
LEFT JOIN comments as c ON c.id=cl.comment_id
WHERE c.article_id=article.id
It shows nothing. I know this is the wrong way to do it; it is only meant to show what I am after.
I guess you are looking for something like the query below. It returns the like count per article and comment.
SELECT
a.id AS article_id,
c.id AS comment_id,
c.commentedUser_id,
c.comment,
COUNT(CASE WHEN cl.action_like = 'like' THEN 1 END) AS likes
FROM articles a
INNER JOIN comments c ON a.id = c.article_id
LEFT JOIN comment_likes cl ON c.id = cl.comment_id
GROUP BY a.id, c.id, c.commentedUser_id, c.comment
If you need results for a specific article, add a WHERE clause before the GROUP BY, for example WHERE a.id = N.
I would recommend a correlated subquery for this:
SELECT a.id as article_id, c.id as comment_id,
c.CommentedUser_id, c.comment,
(SELECT COUNT(*)
FROM comment_likes cl
WHERE cl.comment_id = c.id AND
cl.action_like = 'like'
) as num_likes
FROM articles a INNER JOIN
comments c
ON a.id = c.article_id;
This is a case where a correlated subquery often has noticeably better performance than an outer aggregation, particularly with the right index. The index you want is on comment_likes(comment_id, action_like).
Why is the performance better? Many databases implement the GROUP BY by sorting the data. Sorting is an expensive operation whose cost grows super-linearly; that is, twice as much data takes more than twice as long to sort.
The correlated subquery breaks the problem down into smaller pieces. In fact, no sorting should be necessary -- just scanning the index and counting the matching rows.
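As a quick check that the two formulations agree, here is a sketch with Python's sqlite3 module. The schema is trimmed to the columns that matter, the sample rows are invented, and the index name idx_likes is an arbitrary choice. It only compares results; it makes no claim about timing on a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER, comment TEXT);
    CREATE TABLE comment_likes (id INTEGER PRIMARY KEY, comment_id INTEGER, action_like TEXT);
    -- the index suggested above, so each subquery can count from the index
    CREATE INDEX idx_likes ON comment_likes (comment_id, action_like);
    INSERT INTO articles VALUES (1, 'first');
    INSERT INTO comments VALUES (10, 1, 'a'), (11, 1, 'b'), (12, 1, 'c');
    INSERT INTO comment_likes (comment_id, action_like) VALUES
        (10, 'like'), (10, 'like'), (10, NULL), (11, NULL);
""")

# Outer join plus aggregation: one GROUP BY over all joined rows.
grouped = conn.execute("""
    SELECT c.id, COUNT(CASE WHEN cl.action_like = 'like' THEN 1 END) AS likes
    FROM articles a
    JOIN comments c ON c.article_id = a.id
    LEFT JOIN comment_likes cl ON cl.comment_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# Correlated subquery: one small count per comment, no GROUP BY.
correlated = conn.execute("""
    SELECT c.id,
           (SELECT COUNT(*) FROM comment_likes cl
            WHERE cl.comment_id = c.id AND cl.action_like = 'like') AS likes
    FROM articles a
    JOIN comments c ON c.article_id = a.id
    ORDER BY c.id
""").fetchall()

print(grouped)
print(correlated)
```

Comment 12 has no rows in comment_likes at all and still appears with a count of 0 in both results.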

SQL query returns same value in each column

I'm having an issue with SQL joins in a query that joins the Post table to the Comment, Click and Vote tables and is meant to return stats about each post's activity. The query below is what I've been using.
SELECT
p.PostID,
p.Title,
CASE
WHEN COUNT(cm.CommentID) IS NULL THEN 0
ELSE COUNT(cm.CommentID)
END AS CommentCount,
CASE
WHEN COUNT(cl.ClickID) IS NULL THEN 0
ELSE COUNT(cl.ClickID)
END AS ClickCount,
CASE
WHEN SUM(vt.Value) IS NULL THEN 0
ELSE SUM(vt.Value)
END AS VoteScore
FROM
Post p
LEFT OUTER JOIN Comment cm ON p.PostID = cm.PostID
LEFT OUTER JOIN Click cl ON p.PostID = cl.PostID
LEFT OUTER JOIN Vote vt ON p.PostID = vt.PostID
GROUP BY
p.PostID,
p.Title
Yields the following result
| PostID | CommentCount | ClickCount | VoteScore |
|--------|--------------|------------|-----------|
| 41 | 60| 60| 60|
| 50 | 1683| 1683| 1683|
This, I know, isn't correct. When I comment out all but one of the joins:
SELECT
p.PostID
,p.Title
,CASE
WHEN COUNT(cm.CommentID) IS NULL THEN 0
ELSE COUNT(cm.CommentID)
END AS CommentCount
/*
,CASE
WHEN COUNT(cl.ClickID) IS NULL THEN 0
ELSE COUNT(cl.ClickID)
END AS ClickCount
,CASE
WHEN SUM(vt.Value) IS NULL THEN 0
ELSE SUM(vt.Value)
END AS VoteScore
*/
FROM
Post p
LEFT OUTER JOIN Comment cm ON p.PostID = cm.PostID
/*
LEFT OUTER JOIN Click cl ON p.PostID = cl.PostID
LEFT OUTER JOIN Vote vt ON p.PostID = vt.PostID
*/
GROUP BY
p.PostID,
p.Title
I get
| PostID | CommentCount |
|--------|--------------|
| 41 | 3|
Which is correct. Any ideas what I've done wrong?
Thanks.
The result that is being returned is expected because the query is producing a Cartesian (or semi-Cartesian) product. The query is basically telling MySQL to perform "cross join" operations on the rows returned from comment, click and vote.
Each row returned from comment (for a given postid) gets matched to each row from click (for the same postid). And then each of the rows in that result gets matched to each row from vote (for the same postid).
So, for two rows from comment, and three rows from click and four rows from vote, that will return a total of 24 (=2x3x4) rows.
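The arithmetic can be reproduced with a throwaway SQLite database. This is a sketch using Python's sqlite3 module, with a minimal made-up schema (only postid, a surrogate id, and a vote value of 1) and exactly two comments, three clicks and four votes for post 41.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (postid INTEGER PRIMARY KEY);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, postid INTEGER);
    CREATE TABLE click (id INTEGER PRIMARY KEY, postid INTEGER);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, postid INTEGER, value INTEGER);
    INSERT INTO post VALUES (41);
""")
conn.executemany("INSERT INTO comment (postid) VALUES (?)", [(41,)] * 2)
conn.executemany("INSERT INTO click (postid) VALUES (?)", [(41,)] * 3)
conn.executemany("INSERT INTO vote (postid, value) VALUES (?, 1)", [(41,)] * 4)

JOINS = """
    FROM post p
    LEFT JOIN comment cm ON cm.postid = p.postid
    LEFT JOIN click cl ON cl.postid = p.postid
    LEFT JOIN vote vt ON vt.postid = p.postid
"""

# Number of rows the three joins produce for the single post.
joined_rows = conn.execute("SELECT COUNT(*)" + JOINS).fetchone()[0]

# Aggregating over those rows gives the same inflated number in every column.
counts = conn.execute(
    "SELECT COUNT(cm.id), COUNT(cl.id), SUM(vt.value)" + JOINS + "GROUP BY p.postid"
).fetchone()

print(joined_rows, counts)
```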
The usual pattern for fixing this is to avoid the cross join operations.
There are a couple of approaches to do that.
correlated subqueries in select list
If you only need a single aggregate (e.g. COUNT or SUM) from each of the three tables, you could remove the joins, and use correlated subqueries in the SELECT list. Write a query that gets a count for a single postid, for example
SELECT COUNT(1)
FROM comment cmt
WHERE cmt.postid = ?
Then wrap that query in parens, reference it in the SELECT list of another query, and replace the question mark with a reference to postid from the table in the outer query.
SELECT p.postid
, ( SELECT COUNT(1)
FROM comment cmt
WHERE cmt.postid = p.postid
) AS comment_count
FROM post p
Repeat the same pattern to get "counts" from click and vote.
The downside of this approach is that the subquery in the SELECT list will get executed for each row returned by the outer query. So this can get expensive if the outer query returns a lot of rows. If comment is a large table, then to get reasonable performance it's critical that there's an appropriate index available on comment (one with postid as its leading column).
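Repeating the pattern for all three tables might look like the following sketch, run here through Python's sqlite3 module. The schema, the index names, and the sample rows (three comments, five clicks, four votes of value 1 for post 41, nothing for post 50) are assumptions for illustration. COALESCE is added around SUM because SUM over no rows yields NULL, whereas COUNT already yields 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (postid INTEGER PRIMARY KEY);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, postid INTEGER);
    CREATE TABLE click (id INTEGER PRIMARY KEY, postid INTEGER);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, postid INTEGER, value INTEGER);
    -- indexes that let each subquery look up a single postid cheaply
    CREATE INDEX comment_postid ON comment (postid);
    CREATE INDEX click_postid ON click (postid);
    CREATE INDEX vote_postid ON vote (postid);
    INSERT INTO post VALUES (41), (50);
""")
conn.executemany("INSERT INTO comment (postid) VALUES (?)", [(41,)] * 3)
conn.executemany("INSERT INTO click (postid) VALUES (?)", [(41,)] * 5)
conn.executemany("INSERT INTO vote (postid, value) VALUES (?, 1)", [(41,)] * 4)

# One correlated subquery per statistic; no joins, so no row multiplication.
stats = conn.execute("""
    SELECT p.postid,
           (SELECT COUNT(1) FROM comment cmt WHERE cmt.postid = p.postid) AS comment_count,
           (SELECT COUNT(1) FROM click clk WHERE clk.postid = p.postid) AS click_count,
           (SELECT COALESCE(SUM(v.value), 0) FROM vote v
            WHERE v.postid = p.postid) AS vote_score
    FROM post p
    ORDER BY p.postid
""").fetchall()

print(stats)
```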
pre-aggregate in inline views
Another approach is to "pre-aggregate" the results in inline views. Write a query that returns the comment count for each postid. For example
SELECT cmt.postid
, COUNT(1)
FROM comment cmt
GROUP BY cmt.postid
Wrap that query in parens and reference it in the FROM clause of another query, assign an alias. The inline view query basically takes the place of a table in the outer query.
SELECT p.postid
, cm.postid
, cm.comment_count
FROM post p
LEFT
JOIN ( SELECT cmt.postid
, COUNT(1) AS comment_count
FROM comment cmt
GROUP BY cmt.postid
) cm
ON cm.postid = p.postid
And repeat that same pattern for click and vote. The trick here is the GROUP BY clause in the inline view query, which guarantees that it won't return duplicate postid values. Joining to at most one row per postid cannot multiply the rows of the outer query.
The downside of this approach is that the derived table won't be indexed. So for a large number of postid values, it may be expensive to perform the join in the outer query. (More recent versions of MySQL partially address this downside by automatically creating an appropriate index.)
We can work around this limitation by creating a temporary table with an appropriate index. But this approach requires additional SQL statements, and is not entirely suitable for an ad hoc single statement. For batch processing of large sets, though, the additional complexity can be worth it for some significant performance gains.
collapse Cartesian product by DISTINCT values
As an entirely different approach, leave your query like it is, with the cross join operations, and allow MySQL to produce the Cartesian product. Then the aggregates in the SELECT list can filter out the duplicates. This requires that you have a column (or expression produced) from comment that is UNIQUE for each row in comment for a given postid.
SELECT p.postid
, COUNT(DISTINCT c.id) AS comment_count
FROM post p
LEFT
JOIN comment c
ON c.postid = p.postid
GROUP BY p.postid
The big downside of this approach is that it has the potential to produce a huge intermediate result, which is then "collapsed" with a "Using filesort" operation (to satisfy the GROUP BY). And this can be pretty expensive for large sets.
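The example above joins only comment. The sketch below extends it to all three tables, which is an assumption about how the pattern would be applied to the original query; the schema and rows are invented (three comments, five clicks, four votes of value 1). It also shows a limit of the technique: DISTINCT repairs the counts, but a plain SUM over the joined rows is still multiplied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (postid INTEGER PRIMARY KEY);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, postid INTEGER);
    CREATE TABLE click (id INTEGER PRIMARY KEY, postid INTEGER);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, postid INTEGER, value INTEGER);
    INSERT INTO post VALUES (41);
""")
conn.executemany("INSERT INTO comment (postid) VALUES (?)", [(41,)] * 3)
conn.executemany("INSERT INTO click (postid) VALUES (?)", [(41,)] * 5)
conn.executemany("INSERT INTO vote (postid, value) VALUES (?, 1)", [(41,)] * 4)

row = conn.execute("""
    SELECT p.postid,
           COUNT(DISTINCT c.id) AS comment_count,  -- each comment counted once
           COUNT(DISTINCT k.id) AS click_count,    -- each click counted once
           SUM(v.value)         AS vote_score      -- still summed over every joined row
    FROM post p
    LEFT JOIN comment c ON c.postid = p.postid
    LEFT JOIN click k ON k.postid = p.postid
    LEFT JOIN vote v ON v.postid = p.postid
    GROUP BY p.postid
""").fetchone()

print(row)
```

So for a SUM, one of the other two approaches (correlated subquery or pre-aggregated inline view) is the safer choice.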
This isn't an exhaustive list of all possible query patterns to achieve the result you are looking to return. Just a representative sampling.
You probably want something like this:
SELECT
p.PostID,
p.Title,
(SELECT COUNT(*) FROM Comment cm
WHERE cm.PostID = p.PostID) AS CommentCount,
(SELECT COUNT(*) FROM Click cl
WHERE p.PostID = cl.PostID) AS ClickCount ,
(SELECT COALESCE(SUM(vt.Value), 0) FROM Vote vt
WHERE p.PostID = vt.PostID) AS VoteScore
FROM
Post p
The problem with your query is that the second and third LEFT JOIN operations duplicate records: after the first LEFT JOIN has been applied you have, for example, 3 records for the post having PostID = 41. The second LEFT JOIN now joins to each of these 3 records, so PostID = 41 is matched 3 times in the second LEFT JOIN.
If there is a 1:many relationship directly between (Post, Comment), (Post, Click) and (Post, Vote), then the above query will most probably give you what you want.
Your query isn’t doing what you think it is doing. When you join and count rows like this, you’re creating a new dataset with x rows, and then just counting the rows in that dataset three times. Hence you get the same count three times.
What you want to do is count each comment and each click only once, no matter how many times the joins repeat it, for example with COUNT(DISTINCT ...). A SUM cannot be de-duplicated that way, so the vote total is taken from a subquery instead of a third join:
SELECT
p.PostID
,p.Title
,COUNT(DISTINCT cm.CommentID) AS CommentCount
,COUNT(DISTINCT cl.ClickID) AS ClickCount
,(SELECT COALESCE(SUM(vt.Value), 0)
FROM Vote vt
WHERE vt.PostID = p.PostID) AS VoteScore
FROM
Post p
LEFT OUTER JOIN Comment cm ON p.PostID = cm.PostID
LEFT OUTER JOIN Click cl ON p.PostID = cl.PostID
GROUP BY
p.PostID,
p.Title
It has been explained already what is wrong with your query: With say 3 comments, 5 clicks and 4 votes (each vote with a value of 1) for postid 41, you get 3x5x4=60 counts for the first and second count expression and 3x5x4x1=60 for the sum.
When dealing with several outer joins in combination with aggregation, you must not join tables first and aggregate later, but aggregate first and join the aggregates then:
select
p.postid,
p.title,
coalesce(cm.cnt, 0) as commentcount,
coalesce(cl.cnt, 0) as clickcount,
coalesce(vt.total, 0) as votescore
from post p
left outer join (select postid, count(*) as cnt from comment group by postid) cm
on cm.postid = p.postid
left outer join (select postid, count(*) as cnt from click group by postid) cl
on cl.postid = p.postid
left outer join (select postid, sum(value) as total from vote group by postid) vt
on vt.postid = p.postid;
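Here is that "aggregate first, join afterwards" query run against the 3 comments, 5 clicks and 4 votes from the explanation, next to the join-then-aggregate version. It is a sketch using Python's sqlite3 module; the schema is reduced to the columns the query needs, and post 50 (with no activity) is an extra made-up row to show the COALESCE at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (postid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, postid INTEGER);
    CREATE TABLE click (id INTEGER PRIMARY KEY, postid INTEGER);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, postid INTEGER, value INTEGER);
    INSERT INTO post VALUES (41, 'busy'), (50, 'quiet');
""")
conn.executemany("INSERT INTO comment (postid) VALUES (?)", [(41,)] * 3)
conn.executemany("INSERT INTO click (postid) VALUES (?)", [(41,)] * 5)
conn.executemany("INSERT INTO vote (postid, value) VALUES (?, 1)", [(41,)] * 4)

# Join first, aggregate later: every statistic is inflated to 3 x 5 x 4.
naive = conn.execute("""
    SELECT p.postid, COUNT(cm.id), COUNT(cl.id), SUM(vt.value)
    FROM post p
    LEFT JOIN comment cm ON cm.postid = p.postid
    LEFT JOIN click cl ON cl.postid = p.postid
    LEFT JOIN vote vt ON vt.postid = p.postid
    GROUP BY p.postid
    ORDER BY p.postid
""").fetchall()

# Aggregate first, join the aggregates: each derived table has one row per postid.
fixed = conn.execute("""
    SELECT p.postid,
           COALESCE(cm.cnt, 0) AS commentcount,
           COALESCE(cl.cnt, 0) AS clickcount,
           COALESCE(vt.total, 0) AS votescore
    FROM post p
    LEFT JOIN (SELECT postid, COUNT(*) AS cnt FROM comment GROUP BY postid) cm
        ON cm.postid = p.postid
    LEFT JOIN (SELECT postid, COUNT(*) AS cnt FROM click GROUP BY postid) cl
        ON cl.postid = p.postid
    LEFT JOIN (SELECT postid, SUM(value) AS total FROM vote GROUP BY postid) vt
        ON vt.postid = p.postid
    ORDER BY p.postid
""").fetchall()

print(naive[0])
print(fixed)
```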
COUNT counts non-NULL values, so once the NULLs are turned into 0 they are counted as well. Change your counts to SUM and move the aggregate outside the CASE expression, so that each row contributes 1 or 0. Note that this only changes how NULLs are handled; the rows multiplied by the three joins are still all summed.
E.g.
SELECT
p.PostID,
p.Title,
SUM(CASE
WHEN cm.CommentID IS NULL THEN 0
ELSE 1
END) AS CommentCount,
SUM(CASE
WHEN cl.ClickID IS NULL THEN 0
ELSE 1
END) AS ClickCount,
SUM(CASE
WHEN vt.Value IS NULL THEN 0
ELSE vt.Value
END) AS VoteScore
FROM
Post p
LEFT OUTER JOIN Comment cm ON p.PostID = cm.PostID
LEFT OUTER JOIN Click cl ON p.PostID = cl.PostID
LEFT OUTER JOIN Vote vt ON p.PostID = vt.PostID
GROUP BY
p.PostID,
p.Title

Unable to Group on MSAccess SQL multiple search query

please can you help me before I go out of my mind. I've spent a while on this now and resorted to asking you helpful wonderful people. I have a search query:
SELECT Groups.GroupID,
Groups.GroupName,
( SELECT Sum(SiteRates.SiteMonthlySalesValue)
FROM SiteRates
WHERE InvoiceSites.SiteID = SiteRates.SiteID
) AS SumOfSiteRates,
( SELECT Count(InvoiceSites.SiteID)
FROM InvoiceSites
WHERE SiteRates.SiteID = InvoiceSites.SiteID
) AS CountOfSites
FROM (InvoiceSites
INNER JOIN (Groups
INNER JOIN SitesAndGroups
ON Groups.GroupID = SitesAndGroups.GroupID
) ON InvoiceSites.SiteID = SitesAndGroups.SiteID)
INNER JOIN SiteRates
ON InvoiceSites.SiteID = SiteRates.SiteID
GROUP BY Groups.GroupID
With the following table relationship
http://m-ls.co.uk/ExtFiles/SQL-Relationship.jpg
Without the GROUP BY entry I can get a list of the entries I want but it drills the results down by SiteID where instead I want to GROUP BY the GroupID. I know this is possible but lack the expertise to complete this.
Any help would be massively appreciated.
I think all you need to do is add Groups.GroupName to the GROUP BY clause. However, I would opt for a slightly different approach and avoid the subqueries if possible. Since you have already joined to all the required tables, you can just use normal aggregate functions:
SELECT Groups.GroupID,
Groups.GroupName,
SUM(SiteRates.SiteMonthlySalesValue) AS SumOfSiteRates,
COUNT(InvoiceSites.SiteID) AS CountOfSites
FROM (InvoiceSites
INNER JOIN (Groups
INNER JOIN SitesAndGroups
ON Groups.GroupID = SitesAndGroups.GroupID
) ON InvoiceSites.SiteID = SitesAndGroups.SiteID)
INNER JOIN SiteRates
ON InvoiceSites.SiteID = SiteRates.SiteID
GROUP BY Groups.GroupID, Groups.GroupName;
I think what you are looking for is something like the following:
SELECT Groups.GroupID, Groups.GroupName, SumResults.SiteID, SumResults.SumOfSiteRates, SumResults.CountOfSites
FROM Groups INNER JOIN
(
SELECT SitesAndGroups.SiteID, Sum(SiteRates.SiteMonthlySalesValue) AS SumOfSiteRates, Count(InvoiceSites.SiteID) AS CountOfSites
FROM SitesAndGroups INNER JOIN (InvoiceSites INNER JOIN SiteRates ON InvoiceSites.SiteID = SiteRates.SiteID) ON SitesAndGroups.SiteID = InvoiceSites.SiteID
GROUP BY SitesAndGroups.SiteID
) AS SumResults ON Groups.SiteID = SumResults.SiteID
The inner query groups your information by SiteID. It is then used as a derived table in the FROM clause and joined to the Groups table to pull in the group information that you want.

Help with Complicated SELECT query

I have this SELECT query:
SELECT Auctions.ID, Users.Balance, Users.FreeBids,
COUNT(CASE WHEN Bids.Burned=0 AND Auctions.Closed=0 THEN 1 END) AS 'ActiveBids',
COUNT(CASE WHEN Bids.Burned=1 AND Auctions.Closed=0 THEN 1 END) AS 'BurnedBids'
FROM (Users INNER JOIN Bids ON Users.ID=Bids.BidderID)
INNER JOIN Auctions
ON Bids.AuctionID=Auctions.ID
WHERE Users.ID=@UserID
GROUP BY Users.Balance, Users.FreeBids, Auctions.ID
My problem is that it returns no rows if the UserID can't be found in the Bids table.
I know it has something to do with my
(Users INNER JOIN Bids ON Users.ID=Bids.BidderID)
But I don't know how to make it return rows even if the user is not in the Bids table.
You're doing an INNER JOIN, which only returns rows if there are results on both sides of the join. To get what you want, change your FROM clause like this:
Users LEFT JOIN Bids ON Users.ID=Bids.BidderID
You may also have to change your SELECT statement to handle Bids.Burned being NULL.
If you want to return rows even if there's no matching Auction, then you'll have to make some deeper changes to your query.
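A sketch of the effect, using Python's sqlite3 module. The table and column names come from the question; the rows are invented (user 1 has three bids on an open auction, user 2 has none). Here both joins are switched together, which is one way to make the "deeper change" mentioned above, since an INNER JOIN to Auctions would discard the NULL-extended Bids row again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, Balance REAL, FreeBids INTEGER);
    CREATE TABLE Auctions (ID INTEGER PRIMARY KEY, Closed INTEGER);
    CREATE TABLE Bids (ID INTEGER PRIMARY KEY, BidderID INTEGER,
                       AuctionID INTEGER, Burned INTEGER);
    INSERT INTO Users VALUES (1, 10.0, 2), (2, 5.0, 0);
    INSERT INTO Auctions VALUES (100, 0);
    INSERT INTO Bids (BidderID, AuctionID, Burned) VALUES
        (1, 100, 0), (1, 100, 1), (1, 100, 0);
""")

QUERY = """
    SELECT Auctions.ID, Users.Balance, Users.FreeBids,
           COUNT(CASE WHEN Bids.Burned = 0 AND Auctions.Closed = 0 THEN 1 END) AS ActiveBids,
           COUNT(CASE WHEN Bids.Burned = 1 AND Auctions.Closed = 0 THEN 1 END) AS BurnedBids
    FROM Users
    {join} Bids ON Users.ID = Bids.BidderID
    {join} Auctions ON Bids.AuctionID = Auctions.ID
    WHERE Users.ID = ?
    GROUP BY Users.Balance, Users.FreeBids, Auctions.ID
"""

def bid_summary(join, user_id):
    """Run the question's query with the given join keyword for both joins."""
    return conn.execute(QUERY.format(join=join), (user_id,)).fetchall()

print(bid_summary("INNER JOIN", 2))  # user without bids: no rows at all
print(bid_summary("LEFT JOIN", 2))   # user without bids: one row, NULL auction ID
print(bid_summary("LEFT JOIN", 1))   # user with bids: unchanged
```

The COUNT(CASE ...) expressions already cope with Bids.Burned being NULL, because a CASE with no matching WHEN yields NULL and COUNT skips NULLs.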
My problem is that it returns no rows if the UserID can't be found in the Bids table.
Then INNER JOIN Bids/Auctions should probably be left outer joins. The way you've written it, you're filtering users so that only those in bids and auctions appear.
Left join is the simple answer, but if you're worried about performance I'd consider re-writing it a little bit. For one thing, the order of the columns in the group by matters to performance (although it often doesn't change the results). Generally, you want to group by a column that's indexed first.
Also, it's possible to re-write this query to only have one group by, which will probably speed things up.
Try this out:
with UserBids as (
select
a.ID
, b.BidderID
, ActiveBids = count(case when b.Burned = 0 then 1 end)
, BurnedBids = count(case when b.Burned = 1 then 1 end)
from Bids b
join Auctions a
on a.ID = b.AuctionID
where a.Closed = 0
group by b.BidderID, a.ID
)
select
b.ID
, u.Balance
, u.FreeBids
, b.ActiveBids
, b.BurnedBids
from Users u
left join UserBids b
on b.BidderID = u.ID
where u.ID = @UserID;
If you're not familiar with the with UserBids as..., it's called a CTE (common table expression), and is basically a way to make a one-time use view, and a nice way to structure your queries.
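For anyone who wants to try the CTE shape without a SQL Server instance, here is a sketch that runs it in SQLite through Python's sqlite3 module. Two things differ from the T-SQL above and are adaptations, not part of the answer: the aliases are written as `expr AS name` (the `name = expr` form is T-SQL specific), and the CTE is aliased ub. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, Balance REAL, FreeBids INTEGER);
    CREATE TABLE Auctions (ID INTEGER PRIMARY KEY, Closed INTEGER);
    CREATE TABLE Bids (ID INTEGER PRIMARY KEY, BidderID INTEGER,
                       AuctionID INTEGER, Burned INTEGER);
    INSERT INTO Users VALUES (1, 10.0, 2), (2, 5.0, 0);
    INSERT INTO Auctions VALUES (100, 0), (101, 1);   -- 101 is closed
    INSERT INTO Bids (BidderID, AuctionID, Burned) VALUES
        (1, 100, 0), (1, 100, 0), (1, 100, 1), (1, 101, 0);
""")

QUERY = """
    WITH UserBids AS (
        -- one row per bidder and open auction
        SELECT a.ID, b.BidderID,
               COUNT(CASE WHEN b.Burned = 0 THEN 1 END) AS ActiveBids,
               COUNT(CASE WHEN b.Burned = 1 THEN 1 END) AS BurnedBids
        FROM Bids b
        JOIN Auctions a ON a.ID = b.AuctionID
        WHERE a.Closed = 0
        GROUP BY b.BidderID, a.ID
    )
    SELECT ub.ID, u.Balance, u.FreeBids, ub.ActiveBids, ub.BurnedBids
    FROM Users u
    LEFT JOIN UserBids ub ON ub.BidderID = u.ID
    WHERE u.ID = ?
"""

with_bids = conn.execute(QUERY, (1,)).fetchall()
without_bids = conn.execute(QUERY, (2,)).fetchall()
print(with_bids)
print(without_bids)
```

For a user without bids the two count columns come back as NULL rather than 0, because the whole UserBids row is missing; wrapping them in COALESCE(..., 0) in the outer SELECT would turn them into zeros.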

SQL help: COUNT aggregate, list of entries and its comment count

So, what I intended to do is to fetch a list of entries/posts with their category and user details, AND each of its total published comments. (entries, categories, users, and comments are separate tables)
The query below fetches the records fine, but it seems to skip the entries with no comments. As far as I can see, the JOINs are good (LEFT JOIN on the comments table), and the query is correct. What did I miss?
SELECT entries.entry_id, entries.title, entries.content,
entries.preview_image, entries.preview_thumbnail, entries.slug,
entries.view_count, entries.posted_on, entry_categories.title AS category_title,
entry_categories.slug AS category_slug, entry_categories.parent AS category_parent,
entry_categories.can_comment AS can_comment, entry_categories.can_rate AS can_rate,
users.user_id, users.group_id, users.username, users.first_name, users.last_name,
users.avatar_small, users.avatar_big, users.score AS user_score,
COUNT(entry_comments.comment_id) AS comment_count
FROM (entries)
JOIN entry_categories ON entries.category = entry_categories.category_id
JOIN users ON entries.user_id = users.user_id
LEFT JOIN entry_comments ON entries.entry_id = entry_comments.entry_id
WHERE `entries`.`publish` = 'Y'
AND `entry_comments`.`publish` = 'Y'
AND `entry_comments`.`deleted_at` IS NULL
AND `category` = 5
GROUP BY entries.entry_id, entries.title, entries.content,
entries.preview_image, entries.preview_thumbnail, entries.slug,
entries.view_count, entries.posted_on, category_title, category_slug,
category_parent, can_comment, can_rate, users.user_id, users.group_id,
users.username, users.first_name, users.last_name, users.avatar_big,
users.avatar_small, user_score
ORDER BY posted_on desc
edit: I am using MySQL 5.0
Well, you're doing a left join on entry_comments, with conditions:
`entry_comments`.`publish` = 'Y'
`entry_comments`.`deleted_at` IS NULL
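For the entries with no comments, the LEFT JOIN fills every entry_comments column with NULL, so the publish = 'Y' condition is not satisfied and the WHERE clause removes those rows.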
I guess this should solve the problem:
WHERE `entries`.`publish` = 'Y'
AND (
(`entry_comments`.`publish` = 'Y'
AND `entry_comments`.`deleted_at` IS NULL)
OR
`entry_comments`.`id` IS NULL
)
AND `category` = 5
In the OR condition, I put entry_comments.id, assuming this is the primary key of the entry_comments table, so you should replace it with the real primary key of entry_comments.
It's because you are setting a filter on columns in the entry_comments table. Replace the first of those conditions with:
AND IFNULL(`entry_comments`.`publish`, 'Y') = 'Y'
Because your other filter on this table is an IS NULL one, this is all you need to do to allow the unmatched rows from the LEFT JOIN through.
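The sketch below, using Python's sqlite3 module, compares the original WHERE clause with the OR-based fix on a cut-down, made-up schema (entry 1 has two published comments, entry 2 has none, entry 3 has only an unpublished one). The third variant, which moves the comment filters into the ON clause, is not proposed in the answers above; it is included as an assumption about what the asker ultimately wants, namely every published entry with a count of its published comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (entry_id INTEGER PRIMARY KEY, publish TEXT);
    CREATE TABLE entry_comments (comment_id INTEGER PRIMARY KEY, entry_id INTEGER,
                                 publish TEXT, deleted_at TEXT);
    INSERT INTO entries VALUES (1, 'Y'), (2, 'Y'), (3, 'Y');
    INSERT INTO entry_comments VALUES
        (1, 1, 'Y', NULL), (2, 1, 'Y', NULL), (3, 3, 'N', NULL);
""")

def comment_counts(join_filter, where_filter):
    """Count comments per entry with extra conditions in ON and/or WHERE."""
    sql = f"""
        SELECT e.entry_id, COUNT(c.comment_id) AS comment_count
        FROM entries e
        LEFT JOIN entry_comments c ON e.entry_id = c.entry_id {join_filter}
        WHERE e.publish = 'Y' {where_filter}
        GROUP BY e.entry_id
        ORDER BY e.entry_id
    """
    return conn.execute(sql).fetchall()

published = "c.publish = 'Y' AND c.deleted_at IS NULL"

# Filters in WHERE, as in the question: entries 2 and 3 vanish.
original = comment_counts("", f"AND {published}")
# OR ... IS NULL fix: entry 2 comes back, entry 3 (only unpublished comments) does not.
or_fix = comment_counts("", f"AND (({published}) OR c.comment_id IS NULL)")
# Filters in the ON clause: every published entry is kept.
in_join = comment_counts(f"AND {published}", "")

print(original, or_fix, in_join, sep="\n")
```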
Try changing the LEFT JOIN to a LEFT OUTER JOIN
OR
I'm no expert with this style of SQL joins (more of an Oracle man myself), but the wording of the left join leads me to believe that it is joining entry_comments on to entries with entry_comments on the left. You really want it to be the other way around (I think).
So try something like:
LEFT OUTER JOIN entries ON entries.entry_id = entry_comments.entry_id