I need to construct some rather simple SQL, I suppose, but as it's a rare event that I work with DBs these days I can't figure out the details.
I have a table 'posts' with the following columns:
id, caption, text
and a table 'comments' with the following columns:
id, name, text, post_id
What would the (single) SQL statement look like which retrieves the captions of all posts which have one or more comments associated with it through the 'post_id' key? The DBMS is MySQL if it has any relevance for the SQL query.
select p.caption, count(c.id)
from posts p join comments c on p.id = c.post_id
group by p.caption
having count(c.id) > 0
SELECT DISTINCT p.caption, p.id
FROM posts p,
comments c
WHERE c.post_ID = p.ID
I think using a join would be a lot faster than using the IN clause or a subquery.
SELECT DISTINCT caption
FROM posts
INNER JOIN comments ON posts.id = comments.post_id
Forget about counts and subqueries.
The inner join will pick up all the comments that have valid posts and exclude all the posts that have 0 comments. The DISTINCT will coalesce the duplicate caption entries for posts that have more than 1 comment.
I find this syntax to be the most readable in this situation:
SELECT * FROM posts P
WHERE EXISTS (SELECT * FROM Comments WHERE post_id = P.id)
It expresses your intent better than most of the others in this thread: "give me all the posts ..." (select * from posts) "... that have any comments" (where exists (select * from comments ... )). It's essentially the same as the joins above, but because you're not actually doing a join, you don't have to worry about getting duplicates of the records in Posts, so you'll get just one record per post.
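To make the duplicate-row point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (the sample rows are made up for illustration). It runs a plain join, a DISTINCT join, and the EXISTS form against the same data.

```python
import sqlite3

# Hypothetical sample data: post 1 has two comments, post 2 has one, post 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, caption TEXT, text TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, name TEXT, text TEXT, post_id INTEGER);
    INSERT INTO posts VALUES (1, 'First', 'a'), (2, 'Second', 'b'), (3, 'Third', 'c');
    INSERT INTO comments VALUES (1, 'ann', 'x', 1), (2, 'bob', 'y', 1), (3, 'cat', 'z', 2);
""")

def captions(sql):
    """Run a query and return the first column of every row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

plain_join = captions(
    "SELECT p.caption FROM posts p JOIN comments c ON c.post_id = p.id")
distinct_join = captions(
    "SELECT DISTINCT p.caption FROM posts p JOIN comments c ON c.post_id = p.id")
with_exists = captions(
    "SELECT p.caption FROM posts p "
    "WHERE EXISTS (SELECT 1 FROM comments c WHERE c.post_id = p.id)")

print(plain_join)     # 'First' appears once per comment
print(distinct_join)  # duplicates collapsed by DISTINCT
print(with_exists)    # one row per post, no DISTINCT needed
```

One difference worth keeping in mind: DISTINCT on caption alone would also merge two different posts that happen to share a caption, whereas the EXISTS form keeps both.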
SELECT caption FROM posts
INNER JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id;
No need for a having clause or count().
edit: Should be an inner join of course (to avoid nulls if a comment is orphaned), thanks to jishi.
Just going off the top of my head here but maybe something like:
SELECT caption FROM posts WHERE id IN (SELECT post_id FROM comments GROUP BY post_id HAVING count(*) > 0)
You're basically looking at performing a subquery --
SELECT p.caption FROM posts p WHERE (SELECT COUNT(*) FROM comments c WHERE c.post_id=p.id) > 0;
This has the effect of running the SELECT COUNT(*) subquery for each row in the posts table. Depending on the size of your tables, you might consider adding a column, comment_count, to your posts table to store the number of corresponding comments, so that you can simply do
SELECT p.caption FROM posts p WHERE comment_count > 0
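If you go the comment_count route, the column has to be kept in sync somehow. The following is only a sketch of one way to do that with triggers, written against SQLite through Python's sqlite3 module; the trigger names are hypothetical, and MySQL's trigger syntax differs slightly (it requires FOR EACH ROW, for instance).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, caption TEXT,
                        comment_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, text TEXT, post_id INTEGER);

    -- Hypothetical trigger names; they keep posts.comment_count in sync.
    CREATE TRIGGER comments_after_insert AFTER INSERT ON comments
    BEGIN
        UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
    END;
    CREATE TRIGGER comments_after_delete AFTER DELETE ON comments
    BEGIN
        UPDATE posts SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
    END;

    INSERT INTO posts (id, caption) VALUES (1, 'First'), (2, 'Second');
    INSERT INTO comments (text, post_id) VALUES ('x', 1), ('y', 1), ('z', 2);
    DELETE FROM comments WHERE text = 'z';  -- post 2 drops back to zero comments
""")

# The lookup itself no longer touches the comments table.
commented = [row[0] for row in conn.execute(
    "SELECT caption FROM posts WHERE comment_count > 0 ORDER BY id")]
print(commented)
```

The trade-off is extra write work on every comment insert or delete in exchange for a cheap read.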
Related
I'm having a bit of trouble debugging a SQL query and would really appreciate some help.
Here is the query:
SELECT p.id, p.type, p.submission_id,
p.title, p.description, p.date, extract('epoch' FROM p.time) AS time,
podcasts.image_url, podcasts.title AS podcast_title,
COUNT(u1) as upvote_count, u2.id as upvote_id,
episodes.mp3_url, episodes.duration,
COUNT(c) as comment_count
FROM posts AS p LEFT JOIN upvotes AS u1 ON p.id=u1.post_id AND u1.comment_id=-1
LEFT JOIN upvotes AS u2 ON p.id=u2.post_id AND u2.user_id=$1 AND u2.comment_id=-1
LEFT JOIN episodes ON p.submission_id = episodes.id
LEFT JOIN podcasts ON episodes.podcast_id=podcasts.id
LEFT JOIN comments AS c ON c.post_id=p.id
WHERE p.type='podcast' AND p.time IS NOT NULL
GROUP BY(p.id, u2.id, podcasts.image_url, episodes.mp3_url, episodes.duration, podcasts.title);
The unexpected behavior comes from the two COUNT statements. I expect upvote_count to be equivalent to
SELECT COUNT(*) FROM upvotes WHERE upvotes.post_id = (individual post id);
for each individual post, and the same for the comment count (which I expect to return the total number of comments for each post). However, I am getting strange, seemingly random results from these queries for those two fields. Can anybody help me diagnose the problem?
count() (and all other aggregate functions) ignores null values.
However, COUNT(c) references the complete row ("record") from the table alias c, and that record is always not null, even when all of its columns are null.
You need to change both count() calls and pass a column from that table to it, e.g. count(u1.post_id) and count(c.post_id)
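A small sketch of the column-counting suggestion, run against SQLite via Python's sqlite3 module rather than PostgreSQL (so the whole-row COUNT(c) form itself is not reproduced here; the tables and rows are made up). It shows how COUNT(*) and COUNT(column) differ on the NULL-extended rows a LEFT JOIN produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO posts VALUES (1, 'has comments'), (2, 'no comments');
    INSERT INTO comments VALUES (1, 1), (2, 1);
""")

rows = conn.execute("""
    SELECT p.id,
           COUNT(*)         AS row_count,      -- counts the NULL-extended row too
           COUNT(c.post_id) AS comment_count   -- skips rows where c.post_id IS NULL
    FROM posts AS p
    LEFT JOIN comments AS c ON c.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
print(rows)
```

Post 2 has no comments, so only the column-based count reports zero for it.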
I'm trying to count authors who don't have any articles in our system, which aggregates authorship across sites. I've got a query working, but it isn't performant.
The best query I have thus far is this:
select count(*) as count_all
from (
select authors.id
from authors
left outer join site_authors on site_authors.author_id = authors.id
left outer join articles on articles.site_author_id = site_authors.id
group by authors.id
having count(articles.id) = 0
) a;
However, the subquery is rather inefficient. I was hoping there's a way to flatten this. I have several similar queries that add extra conditions on the left outer joins, so adding a count column to my schema isn't really an option here.
Extra rub: this is a cross-platform query and needs to work against pgSQL, SQLite, and MySQL.
You can try a slightly different query, but I'm not sure that it will be faster:
select count(*)
from authors as a
where not exists (
select b.id
from site_authors as b
inner join
articles as c
on a.id=b.author_id and b.id=c.site_author_id)
Of course, I assume you have proper indexes on the tables:
site_authors: unique (author_id, id)
articles: non unique (site_author_id)
Assuming that 'normal' joins are simpler and faster, you could subtract the number of authors with articles from the total number of authors:
SELECT (SELECT COUNT(*)
FROM authors) -
(SELECT COUNT(DISTINCT site_authors.author_id)
FROM site_authors
JOIN articles ON articles.site_author_id = site_authors.id)
Alternatively, try a subquery:
SELECT COUNT(*)
FROM authors
WHERE id NOT IN (SELECT site_authors.author_id
FROM site_authors
JOIN articles ON articles.site_author_id = site_authors.id)
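As a sanity check, here is a sketch that runs the original grouped query, the subtraction, and the NOT IN form against the same made-up data in SQLite (via Python's sqlite3 module) and compares the counts. The rows are hypothetical: author 1 has articles, author 2 has a site_authors row but no articles, and authors 3 and 4 have nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY);
    CREATE TABLE site_authors (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, site_author_id INTEGER NOT NULL);
    INSERT INTO authors VALUES (1), (2), (3), (4);
    INSERT INTO site_authors VALUES (10, 1), (11, 2), (12, 1);
    INSERT INTO articles VALUES (100, 10), (101, 12);
""")

queries = {
    "grouped": """
        SELECT COUNT(*) FROM (
            SELECT authors.id FROM authors
            LEFT OUTER JOIN site_authors ON site_authors.author_id = authors.id
            LEFT OUTER JOIN articles ON articles.site_author_id = site_authors.id
            GROUP BY authors.id
            HAVING COUNT(articles.id) = 0) a""",
    "subtract": """
        SELECT (SELECT COUNT(*) FROM authors) -
               (SELECT COUNT(DISTINCT site_authors.author_id)
                FROM site_authors
                JOIN articles ON articles.site_author_id = site_authors.id)""",
    "not_in": """
        SELECT COUNT(*) FROM authors
        WHERE id NOT IN (SELECT site_authors.author_id
                         FROM site_authors
                         JOIN articles ON articles.site_author_id = site_authors.id)""",
}

# Each query returns a single row with a single count.
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)
```

Note that author_id is declared NOT NULL in this sketch (an assumption about the schema). If the NOT IN subquery can return a NULL, the NOT IN form matches no rows at all, so the subtraction or a NOT EXISTS is the safer choice when that column is nullable.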
It might be simpler and faster to use NOT IN rather than a join. Sql processors are pretty smart about using indexes even when it looks obtuse. Something like this:
Select count(*)
from authors
where id not in (select author_id from site_authors
where id in (select site_author_id from articles));
Be sure that author_id and site_author_id are indexed. The optimizer will notice what you are doing and create an indexed lookup for the "NOT IN" clause.
I have a database with the following two tables, member and Posts. I am looking for a way to get the count of how many posts a user has.
(Source: http://i.stack.imgur.com/FDv31.png)
I have tried many variations of the following SQL command without any success. Instead of showing the count of posts for a single user, it shows a single row with all the posts as the count.
In the end I want something like this
(Source: http://i.stack.imgur.com/EbaEj.png)
Might be that I'm missing something here, but this query would seem to give you the results you want:
SELECT member.ID,
member.Name,
(SELECT COUNT(*) FROM Posts WHERE member.ID = Posts.user_id) AS total
FROM member;
I have left comment out of the query as it is not obvious what comment you want to be returned in that column for the group of comments that is counted.
See a SQL Fiddle demo here.
Edit
Sorry, misinterpreted your question :-) This query will properly return all the comments, along with the person who posted them and the total number of comments that the person made:
SELECT Posts.ID,
member.Name,
(SELECT COUNT(*) FROM Posts WHERE member.ID = Posts.user_id) AS total,
Posts.comment
FROM Posts
INNER JOIN member ON Posts.user_id = member.ID
GROUP BY Posts.ID, member.Name, member.ID, Posts.comment;
See an updated SQL Fiddle demo here.
You could use a subquery to calculate the total posts per member:
select m.ID
, m.Name
, coalesce(grp.total, 0)
, p.comment
from member m
left join
posts p
on p.user_id = m.id
left join
(
select user_id
, count(*) as total
from posts
group by
user_id
) grp
on grp.user_id = m.id
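Here is a minimal sketch of the grouped-subquery idea, reduced to one row per member and run on SQLite through Python's sqlite3 module. The sample members and posts are invented; the point is that the LEFT JOIN plus COALESCE keeps members with no posts and reports 0 for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Posts (ID INTEGER PRIMARY KEY, user_id INTEGER, comment TEXT);
    INSERT INTO member VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Posts (user_id, comment) VALUES (1, 'hi'), (1, 'again'), (2, 'hello');
""")

totals = conn.execute("""
    SELECT m.ID, m.Name, COALESCE(grp.total, 0) AS total
    FROM member m
    LEFT JOIN (SELECT user_id, COUNT(*) AS total   -- one row per posting user
               FROM Posts
               GROUP BY user_id) grp ON grp.user_id = m.ID
    ORDER BY m.ID
""").fetchall()
print(totals)
```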
select
a.id
, a.name
, count(1) over (partition by b.user_id) as TotalCountPerUser
, b.comment
from member a join posts b
on a.id = b.user_id
Well, it can, but I can't query ;)
Here's my query:
SELECT code.id AS codeid, code.title AS codetitle, code.summary AS codesummary, code.author AS codeauthor, code.date, code.challengeid, ratingItems.*, FORMAT((ratingItems.totalPoints / ratingItems.totalVotes), 1) AS rating, code_tags.*, tags.*, users.firstname AS authorname, users.id AS authorid, GROUP_CONCAT(tags.tag SEPARATOR ', ') AS taggroup,
COUNT(DISTINCT comments.codeid) AS commentcount
FROM (code)
JOIN code_tags ON code_tags.code_id = code.id
JOIN tags ON tags.id = code_tags.tag_id
JOIN users ON users.id = code.author
LEFT JOIN comments ON comments.codeid = code.id
LEFT JOIN ratingItems ON uniqueName = code.id
WHERE `code`.`approved` = 1
GROUP BY code_id
ORDER BY date desc
LIMIT 15
The important line is the second one - the one I've indented. I'm asking it to COUNT the number of comments on a particular post, but it doesn't return the right number. For example, something with two comments will return "1". Something with 8 comments by two different authors will still return "1"...
Any ideas?
Thanks!
Jack
EDIT: Forgot to mention. When I remove the DISTINCT part, something with 8 comments from two authors returns "28". Sorry, I'm not a MySQL expert and don't really understand why it's returning that :(
You group by code.id and in each group you count (DISTINCT comments.codeid), but comments.codeid = code.id as defined in JOIN, that's why you always get 1.
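You need to count some other field on comments. If there is a surrogate primary key, that is the way to go: COUNT(DISTINCT comments.commentid).
Also, if no other join multiplied the rows in each group, a simple COUNT(comments.commentid) would work. Here the other joins (code_tags, tags) do multiply them, which is why the count without DISTINCT comes out too high, so keep the DISTINCT.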
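To see the effect in isolation, here is a cut-down sketch on SQLite via Python's sqlite3 module. The schema is a simplified guess at the asker's tables, and commentid is assumed to be the comments primary key. One snippet has two tags and three comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE code (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE code_tags (code_id INTEGER, tag_id INTEGER);
    CREATE TABLE comments (commentid INTEGER PRIMARY KEY, codeid INTEGER);
    INSERT INTO code VALUES (1, 'snippet');
    INSERT INTO code_tags VALUES (1, 1), (1, 2);          -- two tags
    INSERT INTO comments (codeid) VALUES (1), (1), (1);   -- three comments
""")

all_rows, distinct_codeid, distinct_commentid = conn.execute("""
    SELECT COUNT(*),                             -- tags x comments
           COUNT(DISTINCT comments.codeid),      -- always 1: it equals code.id
           COUNT(DISTINCT comments.commentid)    -- the real number of comments
    FROM code
    JOIN code_tags ON code_tags.code_id = code.id
    LEFT JOIN comments ON comments.codeid = code.id
    GROUP BY code.id
""").fetchone()
print(all_rows, distinct_codeid, distinct_commentid)
```

Each extra joined table multiplies the rows per group, so a count without DISTINCT grows with the number of tags as well as the number of comments.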
I have posts, votes, and comments tables. Each post can have N 'yes votes', N 'no votes' and N comments. I am trying to get a set of posts sorted by number of yes votes.
I have a query that does exactly this, but it is running far too slowly. On a data set of 1500 posts and 15K votes, it takes 0.48 seconds on my dev machine. How can I optimize this?
select
p.*,
v.yes,
x.no
from
posts p
left join (select post_id, vote_type_id, count(1) as yes from votes where (vote_type_id = 1) group by post_id) v on v.post_id = p.id
left join (select post_id, vote_type_id, count(1) as no from votes where (vote_type_id = 2) group by post_id) x on x.post_id = p.id
left join (select post_id, count(1) as comment_count from comments group by post_id) p on p.confession_id = p.id
order by
yes desc
limit
0, 10
EDIT:
Votes and Comments both have a post_id FK
Adding an index on vote_type_id and post_id in the votes table shaved 0.1 sec off the query execution.
Add a 'yes_count' column and use a trigger to update the vote count for each post when the vote is made. You can index this column, then it should be very fast.
Use EXPLAIN to check the query execution plan so you can see why it is slow; usually it is enough to look at the plan and then create the appropriate indexes. Tables of 1.5k and 15k rows are really small, so that query should be much faster.
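As a rough illustration of reading a plan before and after adding an index, here is a sketch using SQLite's EXPLAIN QUERY PLAN from Python's sqlite3 module. MySQL's EXPLAIN output looks different, and the index name below is hypothetical, but the workflow is the same: check the plan, add an index, check again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes (id INTEGER PRIMARY KEY, post_id INTEGER, vote_type_id INTEGER)")

query = "SELECT post_id, COUNT(*) FROM votes WHERE vote_type_id = 1 GROUP BY post_id"

def plan(sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)
# Hypothetical index name; the column order matches filter first, then group.
conn.execute("CREATE INDEX idx_votes_type_post ON votes (vote_type_id, post_id)")
after = plan(query)

print(before)  # a full scan of votes
print(after)   # a search that names the new index
```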
Why don't you add yes and no columns? Rather than adding a new entry each time, just increment the count.
If I misunderstood your database or you can't modify it, do you at least have a foreign key from votes.post_id to posts.id? Foreign keys are crucial if you do any joins.
First off, your current query shouldn't compile, as it uses p as an alias for both the comments and the posts table.
Second, you're joining votes twice: once for no, and once for yes. Using a CASE statement, you can compute the sums of both with a single join. Here's a sample query:
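select
p.*,
-- count(distinct ...) keeps the two joins from multiplying each other's rows;
-- this assumes votes has an id primary key
count(distinct case when v.vote_type_id = 1 then v.id end) as yes,
count(distinct case when v.vote_type_id = 2 then v.id end) as no,
count(distinct c.id) as comment_count
from posts p
left join votes v on v.post_id = p.id
left join comments c on c.post_id = p.id
group by p.id
order by yes desc
limit 0, 10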
Third, you could verify that the proper foreign keys exist for the relations between posts, votes and comments. A (post_id, vote_type_id) index on the votes table could also help.
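Another way to combine the CASE idea with correct totals is to aggregate votes once in a derived table and join the result to posts, so votes and comments never multiply each other. A sketch under made-up data, run on SQLite through Python's sqlite3 module (the column aliases y, n and total are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, post_id INTEGER, vote_type_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO votes (post_id, vote_type_id)
        VALUES (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1);
    INSERT INTO comments (post_id) VALUES (1), (1);
""")

ranked = conn.execute("""
    SELECT p.id,
           COALESCE(v.y, 0)     AS yes_votes,
           COALESCE(v.n, 0)     AS no_votes,
           COALESCE(c.total, 0) AS comment_count
    FROM posts p
    -- one pass over votes yields both tallies
    LEFT JOIN (SELECT post_id,
                      SUM(CASE WHEN vote_type_id = 1 THEN 1 ELSE 0 END) AS y,
                      SUM(CASE WHEN vote_type_id = 2 THEN 1 ELSE 0 END) AS n
               FROM votes
               GROUP BY post_id) v ON v.post_id = p.id
    LEFT JOIN (SELECT post_id, COUNT(*) AS total
               FROM comments
               GROUP BY post_id) c ON c.post_id = p.id
    ORDER BY yes_votes DESC, p.id
    LIMIT 10
""").fetchall()
print(ranked)
```

Each derived table has at most one row per post, so the outer query needs no GROUP BY.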