Simple SQL question about getting rows and associated counts


this oughta be an easy one.
My question is very similar to this one; basically, I've got a table of posts, a table of comments with a foreign key for the post_id, and a table of votes with a foreign key for the post id. I'd like to do a single query and get back a result set containing one row per post, along with the count of associated comments and votes.
From the question I've linked to above, it seems that for getting a table back containing just a row for each post and a comment count, this is the right approach:
SELECT a.ID, a.Title, COUNT(c.ID) AS NumComments
FROM Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
GROUP BY a.ID, a.Title
I thought adding vote count would be as easy as adding another left join, as in
SELECT a.ID, a.Title, COUNT(c.ID) AS NumComments, COUNT(v.id) AS NumVotes
FROM Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
LEFT JOIN Votes v ON v.ParentID = a.ID
GROUP BY a.ID, a.Title
but I'm getting bad numbers back. What am I missing?

SELECT
a.ID,
a.Title,
COUNT(DISTINCT c.ID) AS NumComments,
COUNT(DISTINCT v.id) AS NumVotes
FROM
Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
LEFT JOIN Votes v ON v.ParentID = a.ID
GROUP BY
a.ID,
a.Title

SELECT id, title,
(
SELECT COUNT(*)
FROM comments c
WHERE c.ParentID = a.ID
) AS NumComments,
(
SELECT COUNT(*)
FROM votes v
WHERE v.ParentID = a.ID
) AS NumVotes
FROM articles a

try:
COUNT(DISTINCT c.ID) AS NumComments

You are thinking in trees, not recordsets.
In the recordset, each Comment and each Vote is returned multiple times, combined with each other. Run the query without the GROUP BY and the COUNT to see what I mean.
The solution is simple: use COUNT(DISTINCT c.ID) and COUNT(DISTINCT v.ID).
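To see the row multiplication concretely, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The sample rows (one article with two comments and three votes, one article with none) are made up for illustration, and SQLite stands in for whatever engine the question actually uses.

```python
import sqlite3

# In-memory SQLite database with made-up sample data:
# article 1 has 2 comments and 3 votes, article 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Articles (ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Comments (ID INTEGER PRIMARY KEY, ParentID INTEGER);
    CREATE TABLE Votes    (ID INTEGER PRIMARY KEY, ParentID INTEGER);
    INSERT INTO Articles (ID, Title) VALUES (1, 'First'), (2, 'Second');
    INSERT INTO Comments (ParentID) VALUES (1), (1);
    INSERT INTO Votes (ParentID) VALUES (1), (1), (1);
""")

# Shared FROM/JOIN/GROUP BY part, identical to the query in the question.
JOINS = """
    FROM Articles a
    LEFT JOIN Comments c ON c.ParentID = a.ID
    LEFT JOIN Votes v ON v.ParentID = a.ID
    GROUP BY a.ID, a.Title
    ORDER BY a.ID
"""

# Plain COUNT sees every comment/vote combination: 2 * 3 = 6 joined rows.
naive = conn.execute(
    "SELECT a.ID, COUNT(c.ID), COUNT(v.ID)" + JOINS
).fetchall()

# COUNT(DISTINCT ...) collapses the repeated IDs again.
distinct = conn.execute(
    "SELECT a.ID, COUNT(DISTINCT c.ID), COUNT(DISTINCT v.ID)" + JOINS
).fetchall()

# Correlated subqueries never build the combined rows at all.
subqueries = conn.execute("""
    SELECT a.ID,
           (SELECT COUNT(*) FROM Comments c WHERE c.ParentID = a.ID),
           (SELECT COUNT(*) FROM Votes v WHERE v.ParentID = a.ID)
    FROM Articles a
    ORDER BY a.ID
""").fetchall()

print(naive)       # [(1, 6, 6), (2, 0, 0)]
print(distinct)    # [(1, 2, 3), (2, 0, 0)]
print(subqueries)  # [(1, 2, 3), (2, 0, 0)]
```

Both fixes return the same numbers here; which one is faster depends on the engine and the data, so it is worth measuring on the real tables.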

Related

Combine a CROSS JOIN and a LEFT JOIN

I have two tables named author and commit_metrics. Both of them have an id field. Author has author_name and author_email. Commit_metrics has author_id and author_date.
I am trying to write a query that will get the number of commits that each author had in a given week, even if that number is 0. Here's what I have so far:
SELECT a.id, a.author_name, a.author_email, c.week_num, COUNT(c.id)
FROM author AS a
CROSS JOIN generate_series(1, 610) AS s(n)
LEFT JOIN (SELECT c.id,
c.author_id,
c.author_date,
WEEK_NUMBER(c.author_date) AS week_num
FROM commit_metrics c) AS c ON s.n = c.week_num AND a.id = c.author_id
WHERE c.week_num IS NOT NULL
GROUP BY a.id, a.author_name, a.author_email, c.week_num
ORDER BY c.week_num DESC, a.author_name;
WEEK_NUMBER is a function I wrote for this query:
CREATE OR REPLACE FUNCTION WEEK_NUMBER(date TIMESTAMP) RETURNS INTEGER AS
$$
SELECT TRUNC(DATE_PART('day', date - '2008-01-01') / 7)::INTEGER;
$$ LANGUAGE SQL;
Currently, the query works like a charm with one major caveat. It doesn't properly calculate 0 when the author made no commits in a given week. I'm not sure why it doesn't. When I do the query with just the FROM and CROSS JOIN, it properly prints the many thousand combined authors/weeks. However, when I add the LEFT JOIN, it loses any week where the author did not make a commit.
Any help would be greatly appreciated. I'm open to doing away with the generate_series call if it's unnecessary.
Also, I found this post, but I don't think it's helpful for my case.

Although you are using a left join, "WHERE c.week_num IS NOT NULL" filters out all of the cases where there is no commit. Try this:
SELECT a.id, a.author_name, a.author_email, s.n as week_num, COUNT(c.id) as post_count
FROM author AS a
CROSS JOIN generate_series(1, 610) AS s(n)
LEFT JOIN (SELECT c.id,
c.author_id,
c.author_date,
WEEK_NUMBER(c.author_date) AS week_num
FROM commit_metrics c) AS c ON s.n = c.week_num AND a.id = c.author_id
GROUP BY a.id, a.author_name, a.author_email, s.n
ORDER BY s.n DESC, a.author_name;

Your WHERE clause excludes the rows where the commit_metrics columns are NULL, which is exactly the case when the author has no commits during the selected week. Just remove that condition from the WHERE clause to get your desired output.
If you need the WHERE clause to eliminate some of the CROSS JOIN records based on your data, you will need that CROSS JOIN and WHERE to be in a sub-select that you LEFT JOIN to, or create some more complicated logic in the current WHERE clause.
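The effect of that WHERE condition can be reproduced on a small scale. The sketch below runs in SQLite through Python's sqlite3 module and makes several simplifying assumptions that are not in the question: a recursive CTE named weeks replaces generate_series, the week number is stored directly in a week_num column instead of being computed by WEEK_NUMBER, and the two authors and three commits are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE commit_metrics (
        id INTEGER PRIMARY KEY, author_id INTEGER, week_num INTEGER
    );
    INSERT INTO author (id, author_name) VALUES (1, 'ann'), (2, 'bob');
    -- ann commits twice in week 1, bob once in week 3 (sample data).
    INSERT INTO commit_metrics (author_id, week_num) VALUES (1, 1), (1, 1), (2, 3);
""")

# weeks(n) yields 1..3 and plays the role of generate_series(1, 610).
QUERY = """
    WITH RECURSIVE weeks(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM weeks WHERE n < 3
    )
    SELECT a.author_name, w.n, COUNT(c.id)
    FROM author a
    CROSS JOIN weeks w
    LEFT JOIN commit_metrics c ON c.week_num = w.n AND c.author_id = a.id
    {where}
    GROUP BY a.id, a.author_name, w.n
    ORDER BY w.n, a.author_name
"""

# With the filter, unmatched author/week pairs are thrown away after the join.
filtered = conn.execute(
    QUERY.format(where="WHERE c.week_num IS NOT NULL")
).fetchall()

# Without it, every author/week pair survives and COUNT(c.id) yields 0.
unfiltered = conn.execute(QUERY.format(where="")).fetchall()

print(filtered)    # [('ann', 1, 2), ('bob', 3, 1)]
print(unfiltered)  # six rows, four of them with a count of 0
```

Grouping by the series value (w.n here, s.n in the answers) rather than the commit-side column matters too, since the commit-side value is NULL for empty weeks.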

Remove the filtering condition. Also a subquery is not needed and you want to select s.n instead of c.week_num:
SELECT a.id, a.author_name, a.author_email, s.n as week_num, COUNT(c.id)
FROM author a CROSS JOIN
generate_series(1, 610) AS s(n) LEFT JOIN
commit_metrics c
ON s.n = WEEK_NUMBER(c.author_date) AND a.id = c.author_id
GROUP BY a.id, a.author_name, a.author_email, s.n
ORDER BY s.n DESC, a.author_name;

Get the row with max(timestamp)

I need to select most recently commented articles, with the last comment for each article, i.e. other columns of the row which contains max(c.created):
SELECT a.id, a.title, a.text, max(c.created) AS cid, c.text?
FROM subscriptions s
JOIN articles a ON a.id=s.article_id
JOIN comments c ON c.article_id=a.id
WHERE s.user_id=%d
GROUP BY a.id, a.title, a.text
ORDER BY max(c.created) DESC LIMIT 10;
Postgres tells me that I have to put c.text into GROUP BY. Obviously, I don't want to do this. min/max doesn't fit either. I have no idea how to select this.
Please advise.

In PostgreSQL, DISTINCT ON is probably the optimal solution for this kind of query:
SELECT DISTINCT ON (a.id)
a.id, a.title, a.text, c.created, c.text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = %d
ORDER BY a.id, c.created DESC
This retrieves each article with its latest comment and the associated additional columns.
Explanation, links and a benchmark in this closely related answer.
To get the latest 10, wrap this in a subquery:
SELECT *
FROM (
SELECT DISTINCT ON (a.id)
a.id, a.title, a.text, c.created, c.text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = 12
ORDER BY a.id, c.created DESC
) x
ORDER BY created DESC
LIMIT 10;
Alternatively, you could use window functions in combination with standard DISTINCT:
SELECT DISTINCT
a.id, a.title, a.text
,first_value(c.created) OVER w AS c_created
,first_value(c.text) OVER w AS c_text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = 12
WINDOW w AS (PARTITION BY c.article_id ORDER BY c.created DESC)
ORDER BY c_created DESC
LIMIT 10;
This works, because DISTINCT (unlike aggregate functions) is applied after window functions.
You'd have to test which is faster. I'd guess the last one is slower.
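DISTINCT ON is PostgreSQL-specific. As a portable sketch of the same "latest row per group" idea, the example below swaps in a different technique, a correlated MAX() subquery, and runs it in SQLite through Python's sqlite3 module. The subscriptions table is left out for brevity and the sample rows are invented. Note one difference in behaviour: if two comments on the same article share the same created value, this variant returns both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY, article_id INTEGER, created TEXT, text TEXT
    );
    INSERT INTO articles (id, title) VALUES (1, 'A'), (2, 'B');
    -- ISO date strings sort correctly as text (sample data).
    INSERT INTO comments (article_id, created, text) VALUES
        (1, '2020-01-01', 'old on A'),
        (1, '2020-01-03', 'new on A'),
        (2, '2020-01-02', 'only on B');
""")

# Keep only the comment whose timestamp equals the newest one
# for its article, then order articles by that timestamp.
latest = conn.execute("""
    SELECT a.id, a.title, c.created, c.text
    FROM articles a
    JOIN comments c ON c.article_id = a.id
    WHERE c.created = (
        SELECT MAX(c2.created) FROM comments c2 WHERE c2.article_id = a.id
    )
    ORDER BY c.created DESC
    LIMIT 10
""").fetchall()

print(latest)
# [(1, 'A', '2020-01-03', 'new on A'), (2, 'B', '2020-01-02', 'only on B')]
```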

Is there a simpler way to write this query?

I have three tables. Categories, topics, and posts. Each topic has a foreign key that references the category that it's under. Each post has a foreign key that references the topic that it's under.
The purpose of this query is to basically be the front page query. I want each category along with the number of topics and number of posts in each category. This is the query I have, and it works. Is this the simplest way of going about it?
SELECT c.*,
COUNT(t.idCategory) AS tCount,
p.pCount
FROM categories AS c
LEFT JOIN topics AS t
ON c.id = t.idCategory
LEFT JOIN (SELECT t.idCategory,
COUNT(p2.idTopic) AS pCount
FROM topics AS t
LEFT JOIN posts AS p2
ON t.id = p2.idTopic
GROUP BY t.idCategory) AS p
ON c.id = p.idCategory
GROUP BY t.idCategory
ORDER BY c.id
Thanks!

If you are talking of simplicity I guess this could be an answer:
Select
c.*,
(Select count(*) from topics t where c.id = t.idCategory) as tCount,
(Select count(*) from posts p join topics t2 on t2.id = p.idTopic where c.id = t2.idCategory) as pCount
From categories c

You can put together the topics and posts inside the derived table first before joining with the categories:
SELECT
c.id,
COUNT(tp.id) AS TotalTopics,
COALESCE(SUM(tp.TotalPosts), 0) AS TotalPosts
FROM categories AS c
LEFT JOIN (
SELECT
t.id,
t.idCategory,
COUNT(p.id) AS TotalPosts
FROM topics AS t
LEFT JOIN posts AS p ON t.id = p.idTopic
GROUP BY
t.id,
t.idCategory) AS tp ON c.id = tp.idCategory
GROUP BY
c.id
ORDER BY c.id
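As a rough check of the two approaches in the answers (correlated subqueries versus a derived table that counts posts per topic first), here is a small sketch in SQLite via Python's sqlite3 module. The table layout follows the question; the sample rows, including a category with no topics, are invented, and summing the per-topic counts per category is one way to write the derived-table variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY);
    CREATE TABLE topics (id INTEGER PRIMARY KEY, idCategory INTEGER);
    CREATE TABLE posts  (id INTEGER PRIMARY KEY, idTopic INTEGER);
    INSERT INTO categories (id) VALUES (1), (2);
    -- Category 1 has two topics with 3 and 1 posts; category 2 is empty.
    INSERT INTO topics (id, idCategory) VALUES (10, 1), (11, 1);
    INSERT INTO posts (idTopic) VALUES (10), (10), (10), (11);
""")

# Variant 1: one correlated subquery per count.
by_subquery = conn.execute("""
    SELECT c.id,
           (SELECT COUNT(*) FROM topics t WHERE t.idCategory = c.id),
           (SELECT COUNT(*)
              FROM posts p
              JOIN topics t2 ON t2.id = p.idTopic
             WHERE t2.idCategory = c.id)
    FROM categories c
    ORDER BY c.id
""").fetchall()

# Variant 2: count posts per topic first, then sum them per category.
by_derived_table = conn.execute("""
    SELECT c.id,
           COUNT(tp.id),
           COALESCE(SUM(tp.TotalPosts), 0)
    FROM categories c
    LEFT JOIN (
        SELECT t.id, t.idCategory, COUNT(p.id) AS TotalPosts
        FROM topics t
        LEFT JOIN posts p ON p.idTopic = t.id
        GROUP BY t.id, t.idCategory
    ) tp ON tp.idCategory = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print(by_subquery)       # [(1, 2, 4), (2, 0, 0)]
print(by_derived_table)  # [(1, 2, 4), (2, 0, 0)]
```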

Single SQL query on many to many relationship

I have a simple database with few tables (and some sample columns):
Posts (ID, Title, Content)
Categories (ID, Title)
PostCategories (ID, ID_Post, ID_Category)
Is there a way to create single SQL query which will return posts with categories that are assigned to each post?

In MySQL, you can use the GROUP_CONCAT function:
select p.*, group_concat(DISTINCT c.title ORDER BY c.title DESC SEPARATOR ', ')
from Posts p
inner join PostCategories pc on p.ID = pc.ID_Post
inner join Categories c on pc.ID_Category = c.ID
group by p.id, p.title, p.content

Simple joins work well.
SELECT posts.id, posts.title, categories.id, categories.title
FROM posts
JOIN posts_categories ON posts.id = posts_categories.post_id
JOIN categories ON posts_categories.category_id = categories.id

select p.*, c.*
from Posts p
inner join PostCategories pc on p.ID = pc.ID_Post
inner join Categories c on pc.ID_Category = c.ID
If you mean with only one record per post, I will need to know what database platform you are using.

Sure. If I understand your question correctly, it should be as simple as
SELECT Posts.title, Categories.title
FROM Posts, Categories, PostCategories
WHERE PostCategories.ID_Post = Posts.ID AND PostCategories.ID_Category = Categories.ID
ORDER BY Posts.title, Categories.title;
Getting one row per Post will be a little more complicated, and will depend on what RDBMS you're using.
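As one illustration of the "one row per post" case, the sketch below uses SQLite (through Python's sqlite3 module), whose group_concat(value, separator) aggregate is similar in spirit to MySQL's GROUP_CONCAT. The choice of SQLite and the sample rows are assumptions made for the example; other engines have their own equivalents. SQLite does not promise any particular order inside the concatenated string, so the example does not rely on one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE PostCategories (
        ID INTEGER PRIMARY KEY, ID_Post INTEGER, ID_Category INTEGER
    );
    INSERT INTO Posts (ID, Title) VALUES (1, 'Hello'), (2, 'World');
    INSERT INTO Categories (ID, Title) VALUES (1, 'sql'), (2, 'python');
    -- Post 1 is in both categories, post 2 only in 'sql' (sample data).
    INSERT INTO PostCategories (ID_Post, ID_Category) VALUES (1, 1), (1, 2), (2, 1);
""")

# One row per post, with its category titles joined into one string.
rows = conn.execute("""
    SELECT p.ID, p.Title, group_concat(c.Title, ', ')
    FROM Posts p
    JOIN PostCategories pc ON pc.ID_Post = p.ID
    JOIN Categories c ON c.ID = pc.ID_Category
    GROUP BY p.ID, p.Title
    ORDER BY p.ID
""").fetchall()

for post_id, title, categories in rows:
    # Sort in Python because the aggregate's internal order is unspecified.
    print(post_id, title, sorted(categories.split(", ")))
```

Because inner joins are used, a post with no categories would not appear at all; a LEFT JOIN would keep it with a NULL category list.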

We can use this query also.
select e.*,c.* from Posts e, Categories c, PostCategories cp where cp.id in ( select s.id from PostCategories s where s.ID_Post=e.id and s.ID_Category=c.id );

SQL Join and Count can't GROUP BY correctly?

So let's say I want to select the ID of all my blog posts and then a count of the comments associated with that blog post, how do I use GROUP BY or ORDER BY so that the returned list is in order of number of comments per post?
I have this query which returns the data but not in the order I want? Changing the group by makes no difference:
SELECT p.ID, count(c.comment_ID)
FROM wp_posts p, wp_comments c
WHERE p.ID = c.comment_post_ID
GROUP BY c.comment_post_ID;

I'm not familiar with pre-SQL92 syntax, so I'll express it in a way that I'm familiar with:
SELECT c.comment_post_ID, COUNT(c.comment_ID)
FROM wp_comments c
GROUP BY c.comment_post_ID
ORDER BY COUNT(c.comment_ID) -- ASC or DESC
What database engine are you using? In SQL Server, at least, there's no need for a join unless you're pulling more data from the posts table. With a join:
SELECT p.ID, COUNT(c.comment_ID)
FROM wp_posts p
JOIN wp_comments c ON c.comment_post_ID = p.ID
GROUP BY p.ID
ORDER BY COUNT(c.comment_ID)

SELECT p.ID, count(c.comment_ID) AS [count]
FROM wp_posts p, wp_comments c
WHERE p.ID = c.comment_post_ID
GROUP BY c.comment_post_ID
ORDER BY [count] DESC;

Probably some posts have no related rows in the comments table, so try grouping by the post ID and using a LEFT JOIN. Explicit JOIN syntax is worth learning; it is very helpful and makes this kind of query easier to get right:
SELECT p.ID, count(c.comment_ID)
FROM wp_posts p
LEFT JOIN wp_comments c ON (p.ID = c.comment_post_ID)
GROUP BY p.ID
I also encountered that kind of situation in my SQL query journeys :)
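Putting the pieces from these answers together (LEFT JOIN so posts without comments are kept, GROUP BY the post ID, ORDER BY the count), a minimal sketch in SQLite via Python's sqlite3 module might look like this. Only the two columns used by the query are created, and the three posts and four comments are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY);
    CREATE TABLE wp_comments (
        comment_ID INTEGER PRIMARY KEY, comment_post_ID INTEGER
    );
    INSERT INTO wp_posts (ID) VALUES (1), (2), (3);
    -- Post 1 has one comment, post 2 has three, post 3 has none.
    INSERT INTO wp_comments (comment_post_ID) VALUES (1), (2), (2), (2);
""")

# Most-commented posts first; p.ID breaks ties so the order is stable.
ranked = conn.execute("""
    SELECT p.ID, COUNT(c.comment_ID) AS num_comments
    FROM wp_posts p
    LEFT JOIN wp_comments c ON c.comment_post_ID = p.ID
    GROUP BY p.ID
    ORDER BY num_comments DESC, p.ID
""").fetchall()

print(ranked)  # [(2, 3), (1, 1), (3, 0)]
```

GROUP BY only decides which rows are counted together; the ordering of the result comes entirely from ORDER BY, which is why changing the GROUP BY alone made no difference.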
