Counting from two tables according to a single criterion - sql

I am trying to do something like this:
SELECT COUNT(topic.topic_id) + COUNT(post.post_id)
FROM topic, post WHERE author_id = ?
Both tables have column author_id.
I get column reference "author_id" is ambiguous error.
How can I tell it that author_id is present in both tables?

While you could, you most probably do not want to join both tables, since that might result in different counts. Explanation in this related answer:
Two SQL LEFT JOINS produce incorrect result
Two subqueries would be fastest:
SELECT (SELECT COUNT(topic_id) FROM topic WHERE author_id = ?)
+ (SELECT COUNT(post_id) FROM post WHERE author_id = ?) AS result
If topic_id and post_id are defined NOT NULL in their respective tables, you can slightly simplify:
SELECT (SELECT COUNT(*) FROM topic WHERE author_id = ?)
+ (SELECT COUNT(*) FROM post WHERE author_id = ?) AS result
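To see why the two subqueries are safer than a join, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and sample rows are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; author 1 has 2 topics and 3 posts.
    CREATE TABLE topic (topic_id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
    CREATE TABLE post  (post_id  INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
    INSERT INTO topic (author_id) VALUES (1), (1), (2);
    INSERT INTO post  (author_id) VALUES (1), (1), (1), (2);
""")

author = 1

# Two independent scalar subqueries: each table is counted on its own.
subquery_total = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM topic WHERE author_id = ?)
         + (SELECT COUNT(*) FROM post  WHERE author_id = ?) AS result
    """,
    (author, author),
).fetchone()[0]

# A join pairs every topic of the author with every post of the author,
# so both COUNTs see 2 * 3 = 6 rows.
joined_total = conn.execute(
    """
    SELECT COUNT(t.topic_id) + COUNT(p.post_id) AS result
    FROM topic t
    JOIN post p ON p.author_id = t.author_id
    WHERE t.author_id = ?
    """,
    (author,),
).fetchone()[0]

print(subquery_total)  # 5
print(joined_total)    # 12
```

The join multiplies rows before counting, which is the effect described in the related answer linked above.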
If at least one of both author_id columns is unique, a JOIN would work, too, in this case (but slower, and I wouldn't use it):
SELECT COUNT(t.topic_id) + COUNT(p.post_id) AS result
FROM topic t
LEFT JOIN post p USING (author_id)
WHERE t.author_id = ?;
If you want to enter the value only once, use a CTE:
WITH x AS (SELECT ? AS author_id) -- enter value here
SELECT (SELECT COUNT(*) FROM topic JOIN x USING (author_id))
+ (SELECT COUNT(*) FROM post JOIN x USING (author_id)) AS result
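A quick way to try the CTE variant is Python's sqlite3 module, which also accepts WITH and USING. This is only a sketch: the helper name count_for_author and the sample rows are hypothetical, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data for illustration only.
    CREATE TABLE topic (topic_id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
    CREATE TABLE post  (post_id  INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
    INSERT INTO topic (author_id) VALUES (7), (7), (8);
    INSERT INTO post  (author_id) VALUES (7), (8), (8), (8);
""")


def count_for_author(author_id):
    """Return topics + posts for one author; the value is bound only once."""
    row = conn.execute(
        """
        WITH x AS (SELECT ? AS author_id)  -- the single place the value is entered
        SELECT (SELECT COUNT(*) FROM topic JOIN x USING (author_id))
             + (SELECT COUNT(*) FROM post  JOIN x USING (author_id)) AS result
        """,
        (author_id,),
    ).fetchone()
    return row[0]


print(count_for_author(7))  # 3
print(count_for_author(8))  # 4
print(count_for_author(9))  # 0
```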
But be sure to understand how joins work. Read the chapter about Joined Tables in the manual.

Related

Select records where column value is unique

I have a table of posts in a forum (mybb_posts, with the username of the poster).
I want all the posts posted by people who only posted once, in other words, all the rows where username is a single occurrence in the username column.
So far I am using this:
SELECT *
FROM mybb_posts
WHERE username IN
(SELECT username
FROM
(SELECT username,
count(*) COUNT
FROM `mybb_posts`
GROUP BY username) tbl1
WHERE COUNT=1)
But the three nested SELECTs look ugly.
Is there a more elegant/efficient/simple way? All the answers I have seen on SO and elsewhere focus on getting the unique ids.
This is for a MySQL database, if you want to suggest non-standard solutions (but standard ones are preferred).
"all the rows where username is a single occurrence in the username column."
This suggests window functions:
SELECT p.*
FROM (SELECT p.*, COUNT(*) OVER (PARTITION BY p.username) as cnt
FROM mybb_posts p
) p
WHERE cnt = 1;
As a note: You don't need two nested subqueries for your version. You can use a HAVING clause:
SELECT p.*
FROM mybb_posts p
WHERE p.username IN (SELECT p2.username
FROM mybb_posts p2
GROUP BY p2.username
HAVING COUNT(*) = 1
);
The most portable solution that I can think of is NOT EXISTS with a correlated subquery. This works in most databases, including those that do not support window functions (such as MySQL 5.x or MS Access). It should also be a rather efficient option.
For this, you need a primary key in your table. Assuming that it is called post_id, that would be:
select p.*
from mybb_posts p
where not exists (
select 1
from mybb_posts p1
where p1.username = p.username and p1.post_id <> p.post_id
)
For performance, you need an index on (username, post_id).
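As a sanity check, the HAVING version and the NOT EXISTS version can be compared on a toy table. The sketch below uses Python's sqlite3 module; the post_id column, the index name, and the sample rows are assumptions for illustration, not the real MyBB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical cut-down table: 'ann' posted twice, 'bob' and 'cid' once.
    CREATE TABLE mybb_posts (post_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    INSERT INTO mybb_posts (username) VALUES ('ann'), ('bob'), ('ann'), ('cid');
    CREATE INDEX mybb_posts_username_idx ON mybb_posts (username, post_id);
""")

# Uncorrelated subquery with GROUP BY / HAVING.
having_rows = conn.execute("""
    SELECT p.post_id, p.username
    FROM mybb_posts p
    WHERE p.username IN (SELECT p2.username
                         FROM mybb_posts p2
                         GROUP BY p2.username
                         HAVING COUNT(*) = 1)
    ORDER BY p.post_id
""").fetchall()

# Correlated NOT EXISTS: keep a post if no other post shares its username.
not_exists_rows = conn.execute("""
    SELECT p.post_id, p.username
    FROM mybb_posts p
    WHERE NOT EXISTS (SELECT 1
                      FROM mybb_posts p1
                      WHERE p1.username = p.username
                        AND p1.post_id <> p.post_id)
    ORDER BY p.post_id
""").fetchall()

print(having_rows)      # [(2, 'bob'), (4, 'cid')]
print(not_exists_rows)  # [(2, 'bob'), (4, 'cid')]
```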

Counting empty relations from a SQL table

I'm trying to count authors who don't have any articles in our system, which aggregates authorship across sites. I've got a query working, but it isn't performant.
The best query I have thus far is this:
select count(*) as count_all
from (
select authors.id
from authors
left outer join site_authors on site_authors.author_id = authors.id
left outer join articles on articles.site_author_id = site_authors.id
group by authors.id
having count(articles.id) = 0
) a;
However, the subquery is rather inefficient. I was hoping there's a way to flatten this. I have several similar queries that add extra conditions on the left outer joins, so adding a count column to my schema isn't really an option here.
Extra rub: this is a cross-platform query and needs to work against PostgreSQL, SQLite, and MySQL.
You can try a slightly different query, but I'm not sure that it will be faster:
select count(*)
from authors as a
where not exists (
select b.id
from site_authors as b
inner join
articles as c
on a.id=b.author_id and b.id=c.site_author_id)
Of course, I assume you have proper indexes on the tables:
site_authors: unique (author_id, id)
articles: non unique (site_author_id)
Assuming that 'normal' joins are simpler and faster, you could subtract the number of authors with articles from the total number of authors:
SELECT (SELECT COUNT(*)
FROM authors) -
(SELECT COUNT(DISTINCT site_authors.author_id)
FROM site_authors
JOIN articles ON articles.site_author_id = site_authors.id)
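Before benchmarking, it is worth checking that the rewrites return the same number as the original query. Here is a small sketch with Python's sqlite3 module. The column types and sample rows are assumptions: author 1 and author 3 have articles, author 2 has a site_authors row but no articles, and author 4 has nothing at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data for illustration only.
    CREATE TABLE authors (id INTEGER PRIMARY KEY);
    CREATE TABLE site_authors (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, site_author_id INTEGER NOT NULL);
    INSERT INTO authors (id) VALUES (1), (2), (3), (4);
    INSERT INTO site_authors (id, author_id) VALUES (10, 1), (11, 1), (12, 2), (13, 3);
    INSERT INTO articles (id, site_author_id) VALUES (100, 10), (101, 10), (102, 13);
""")

queries = {
    # The original GROUP BY / HAVING form from the question.
    "grouped": """
        SELECT COUNT(*) FROM (
            SELECT authors.id
            FROM authors
            LEFT OUTER JOIN site_authors ON site_authors.author_id = authors.id
            LEFT OUTER JOIN articles ON articles.site_author_id = site_authors.id
            GROUP BY authors.id
            HAVING COUNT(articles.id) = 0
        ) a
    """,
    # Correlated NOT EXISTS (correlation written in WHERE instead of ON).
    "not_exists": """
        SELECT COUNT(*)
        FROM authors AS a
        WHERE NOT EXISTS (
            SELECT 1
            FROM site_authors AS b
            JOIN articles AS c ON c.site_author_id = b.id
            WHERE b.author_id = a.id)
    """,
    # All authors minus the distinct authors that have at least one article.
    "subtraction": """
        SELECT (SELECT COUNT(*) FROM authors)
             - (SELECT COUNT(DISTINCT site_authors.author_id)
                FROM site_authors
                JOIN articles ON articles.site_author_id = site_authors.id)
    """,
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)  # {'grouped': 2, 'not_exists': 2, 'subtraction': 2}
```

Which form is fastest depends on the engine and the indexes, so measure on each of the three platforms rather than assuming.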
Alternatively, try a subquery:
SELECT COUNT(*)
FROM authors
WHERE id NOT IN (SELECT site_authors.author_id
FROM site_authors
JOIN articles ON articles.site_author_id = site_authors.id)
It might be simpler and faster to use NOT IN rather than a join. SQL engines are pretty smart about using indexes even when it looks obtuse. Something like this:
Select count(*)
from authors
where id not in (select author_id
from site_authors
where id in (select site_author_id from articles));
Be sure that author_id and site_author_id are indexed. The optimizer will notice what you are doing and create an indexed lookup for the "NOT IN" clause.
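One caveat with the NOT IN queries above: if the subquery can return a NULL, NOT IN matches no rows at all, because the comparison evaluates to unknown. A minimal sketch with Python's sqlite3 module follows. The nullable author_id column is an assumption made purely to trigger the effect; if your columns are declared NOT NULL, this does not apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: author_id is nullable here on purpose.
    CREATE TABLE authors (id INTEGER PRIMARY KEY);
    CREATE TABLE site_authors (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO authors (id) VALUES (1), (2), (3);
    INSERT INTO site_authors (id, author_id) VALUES (10, 1), (11, NULL);
""")

# The NULL in the subquery result makes "id NOT IN (...)" unknown for every
# author that is not listed, so nothing is counted.
not_in_count = conn.execute("""
    SELECT COUNT(*) FROM authors
    WHERE id NOT IN (SELECT author_id FROM site_authors)
""").fetchone()[0]

# Filtering NULLs out of the subquery restores the expected answer.
not_in_filtered_count = conn.execute("""
    SELECT COUNT(*) FROM authors
    WHERE id NOT IN (SELECT author_id FROM site_authors
                     WHERE author_id IS NOT NULL)
""").fetchone()[0]

# NOT EXISTS is not affected by NULLs in the first place.
not_exists_count = conn.execute("""
    SELECT COUNT(*) FROM authors a
    WHERE NOT EXISTS (SELECT 1 FROM site_authors s WHERE s.author_id = a.id)
""").fetchone()[0]

print(not_in_count)           # 0
print(not_in_filtered_count)  # 2
print(not_exists_count)       # 2
```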

Using UNNEST with a JOIN

I want to be able to use unnest() function in PostgreSQL in a complicated SQL query that has many JOINs. Here's the example query:
SELECT 9 as keyword_id, COUNT(DISTINCT mentions.id) as total, tags.parent_id as tag_id
FROM mentions
INNER JOIN taggings ON taggings.mention_id = mentions.id
INNER JOIN tags ON tags.id = taggings.tag_id
WHERE mentions.taglist && ARRAY[9] AND mentions.search_id = 3
GROUP BY tags.parent_id
I want to eliminate the taggings table here, because my mentions table has an integer array field named taglist that consists of all linked tag ids of mentions.
I tried following:
SELECT 9 as keyword_id, COUNT(DISTINCT mentions.id) as total, tags.parent_id as tag_id
FROM mentions
INNER JOIN tags ON tags.id IN (SELECT unnest(taglist))
WHERE mentions.taglist && ARRAY[9] AND mentions.search_id = 3
GROUP BY tags.parent_id
This works but brings different results than the first query.
So what I want to do is to use the result of the SELECT unnest(taglist) in a JOIN query to compensate for the taggings table.
How can I do that?
UPDATE: taglist is the same set as the respective list of tag ids of mention.
Technically, your query might work like this (not entirely sure about the objective of this query):
SELECT 9 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id AS tag_id
FROM (
SELECT m.id, unnest(m.taglist) AS tag_id
FROM mentions m
WHERE m.search_id = 3
AND 9 = ANY (m.taglist)
) m
JOIN tags t USING (tag_id) -- assumes tags.tag_id!
GROUP BY t.parent_id;
However, it seems to me you are going in the wrong direction here. Normally one would remove the redundant array taglist and keep the normalized database schema. Then your original query should serve well, only with the syntax shortened using aliases:
SELECT 9 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id AS tag_id
FROM mentions m
JOIN taggings mt ON mt.mention_id = m.id
JOIN tags t ON t.id = mt.tag_id
WHERE 9 = ANY (m.taglist)
AND m.search_id = 3
GROUP BY t.parent_id;
Unravel the mystery
<rant>
The root cause for your "different results" is the unfortunate naming convention that some intellectually challenged ORMs impose on people.
I am speaking of id as column name. Never use this anti-pattern in a database with more than one table. Right, that means basically any database. As soon as you join a bunch of tables (that's what you do in a database) you end up with a bunch of columns named id. Utterly pointless.
The ID column of a table named tag should be tag_id (unless there is another descriptive name). Never id.
</rant>
Your query inadvertently counts tags instead of mentions:
SELECT 25 AS keyword_id, count(m.id) AS total, t.parent_id AS tag_id
FROM (
SELECT unnest(m.taglist) AS id
FROM mentions m
WHERE m.search_id = 4
AND 25 = ANY (m.taglist)
) m
JOIN tags t USING (id)
GROUP BY t.parent_id;
It should work this way:
SELECT 25 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id
FROM (
SELECT m.id, unnest(m.taglist) AS tag_id
FROM mentions m
WHERE m.search_id = 4
AND 25 = ANY (m.taglist)
) m
JOIN tags t ON t.id = m.tag_id
GROUP BY t.parent_id;
I also added back the DISTINCT to your count() that got lost along the way in your query.
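The queries need PostgreSQL to run, but the counting difference itself can be sketched in plain Python. Everything below is made-up sample data: the list comprehension plays the role of the unnest() subquery, and the dictionary lookup plays the role of the join to tags.

```python
from collections import defaultdict

# Hypothetical rows of the mentions table.
mentions = [
    {"id": 1, "search_id": 4, "taglist": [25, 30, 31]},
    {"id": 2, "search_id": 4, "taglist": [25, 30]},
    {"id": 3, "search_id": 4, "taglist": [30]},      # lacks tag 25, filtered out
    {"id": 4, "search_id": 5, "taglist": [25, 31]},  # other search, filtered out
]
# Hypothetical tags table: tag id -> parent_id.
tags = {25: 100, 30: 100, 31: 200}

# Like the subquery: one (mention id, tag id) row per array element.
unnested = [
    (m["id"], tag_id)
    for m in mentions
    if m["search_id"] == 4 and 25 in m["taglist"]
    for tag_id in m["taglist"]
]

rows_per_parent = defaultdict(int)      # like count(...) over the unnested rows
mentions_per_parent = defaultdict(set)  # like count(DISTINCT m.id)
for mention_id, tag_id in unnested:
    parent_id = tags[tag_id]
    rows_per_parent[parent_id] += 1
    mentions_per_parent[parent_id].add(mention_id)

distinct_counts = {parent: len(ids) for parent, ids in mentions_per_parent.items()}

print(dict(rows_per_parent))  # {100: 4, 200: 1}
print(distinct_counts)        # {100: 2, 200: 1}
```

Parent 100 gets four unnested rows but only two distinct mentions, which is the gap between counting tags and counting mentions.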
Something like this should work:
...
tags t INNER JOIN
(SELECT UNNEST(taglist) as idd) a ON t.id = a.idd
...

MySQL COUNT can't count

Well, it can, but I can't query ;)
Here's my query:
SELECT code.id AS codeid, code.title AS codetitle, code.summary AS codesummary, code.author AS codeauthor, code.date, code.challengeid, ratingItems.*, FORMAT((ratingItems.totalPoints / ratingItems.totalVotes), 1) AS rating, code_tags.*, tags.*, users.firstname AS authorname, users.id AS authorid, GROUP_CONCAT(tags.tag SEPARATOR ', ') AS taggroup,
COUNT(DISTINCT comments.codeid) AS commentcount
FROM (code)
JOIN code_tags ON code_tags.code_id = code.id
JOIN tags ON tags.id = code_tags.tag_id
JOIN users ON users.id = code.author
LEFT JOIN comments ON comments.codeid = code.id
LEFT JOIN ratingItems ON uniqueName = code.id
WHERE `code`.`approved` = 1
GROUP BY code_id
ORDER BY date desc
LIMIT 15
The important line is the second one - the one I've indented. I'm asking it to COUNT the number of comments on a particular post, but it doesn't return the right number. For example, something with two comments will return "1". Something with 8 comments by two different authors will still return "1"...
Any ideas?
Thanks!
Jack
EDIT: Forgot to mention. When I remove the DISTINCT part, something with 8 comments from two authors returns "28". Sorry, I'm not a MySQL expert and don't really understand why it's returning that :(
You group by code.id and in each group you count (DISTINCT comments.codeid), but comments.codeid = code.id as defined in JOIN, that's why you always get 1.
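You need to count some other column of comments. If there is a surrogate primary key, count that instead: COUNT(DISTINCT comments.commentid). Keep the DISTINCT, because the joins to code_tags and tags repeat each comment once per tag, which is why the count is inflated when you remove it.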
Also, if the comments in every group are known to be distinct, a simple COUNT(*) should work.
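The three counting variants can be compared side by side with Python's sqlite3 module. This is a sketch under assumptions: the schema is cut down to the columns that matter, comments.id is assumed to be the comment's primary key, and the rows are invented (code 1 has three tags and two comments, code 2 has one tag and no comments).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical cut-down schema and data for illustration only.
    CREATE TABLE code (id INTEGER PRIMARY KEY);
    CREATE TABLE code_tags (code_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, codeid INTEGER NOT NULL);
    INSERT INTO code (id) VALUES (1), (2);
    INSERT INTO code_tags (code_id, tag_id) VALUES (1, 1), (1, 2), (1, 3), (2, 1);
    INSERT INTO comments (id, codeid) VALUES (50, 1), (51, 1);
""")

rows = conn.execute("""
    SELECT code.id,
           COUNT(DISTINCT comments.codeid) AS by_codeid,      -- never more than 1
           COUNT(comments.id)              AS plain_count,    -- inflated by the tag join
           COUNT(DISTINCT comments.id)     AS by_comment_id   -- actual number of comments
    FROM code
    JOIN code_tags ON code_tags.code_id = code.id
    LEFT JOIN comments ON comments.codeid = code.id
    GROUP BY code.id
    ORDER BY code.id
""").fetchall()

print(rows)  # [(1, 1, 6, 2), (2, 0, 0, 0)]
```

For code 1, the plain count is 3 tags times 2 comments, while counting distinct comment ids gives the real figure.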

SQL: Get all posts with any comments

I need to construct some rather simple SQL, I suppose, but as it's a rare event that I work with DBs these days I can't figure out the details.
I have a table 'posts' with the following columns:
id, caption, text
and a table 'comments' with the following columns:
id, name, text, post_id
What would the (single) SQL statement look like which retrieves the captions of all posts which have one or more comments associated with it through the 'post_id' key? The DBMS is MySQL if it has any relevance for the SQL query.
select p.caption, count(c.id)
from posts p join comments c on p.id = c.post_id
group by p.caption
having count(c.id) > 0
SELECT DISTINCT p.caption, p.id
FROM posts p,
comments c
WHERE c.post_ID = p.ID
I think using a join would be a lot faster than using the IN clause or a subquery.
SELECT DISTINCT caption
FROM posts
INNER JOIN comments ON posts.id = comments.post_id
Forget about counts and subqueries.
The inner join will pick up all the comments that have valid posts and exclude all the posts that have 0 comments. The DISTINCT will coalesce the duplicate caption entries for posts that have more than 1 comment.
I find this syntax to be the most readable in this situation:
SELECT * FROM posts P
WHERE EXISTS (SELECT * FROM comments WHERE post_id = P.id)
It expresses your intent better than most of the others in this thread - "give me all the posts ..." (select * from posts) "... that have any comments" (where exists (select * from comments ... )). It's essentially the same as the joins above, but because you're not actually doing a join, you don't have to worry about getting duplicates of the records in posts, so you'll just get one record per post.
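The duplicate issue is easy to reproduce. Below is a small sketch with Python's sqlite3 module; the tables are trimmed to the columns needed and the rows are made up (the first post has two comments, the second has none, the third has one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical cut-down tables and data for illustration only.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, caption TEXT NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL);
    INSERT INTO posts (id, caption) VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO comments (post_id) VALUES (1), (1), (3);
""")


def captions(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Plain inner join: one row per comment, so 'first' appears twice.
joined = captions("""
    SELECT p.caption
    FROM posts p
    JOIN comments c ON c.post_id = p.id
    ORDER BY p.id
""")

# DISTINCT collapses the duplicates after the join.
distinct = captions("""
    SELECT DISTINCT p.caption
    FROM posts p
    JOIN comments c ON c.post_id = p.id
    ORDER BY p.caption
""")

# EXISTS never produces the duplicates in the first place.
exists = captions("""
    SELECT p.caption
    FROM posts p
    WHERE EXISTS (SELECT 1 FROM comments c WHERE c.post_id = p.id)
    ORDER BY p.id
""")

print(joined)    # ['first', 'first', 'third']
print(distinct)  # ['first', 'third']
print(exists)    # ['first', 'third']
```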
SELECT caption FROM posts
INNER JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id;
No need for a having clause or count().
edit: Should be an inner join of course (to avoid nulls if a comment is orphaned), thanks to jishi.
Just going off the top of my head here but maybe something like:
SELECT caption FROM posts WHERE id IN (SELECT post_id FROM comments GROUP BY post_id HAVING count(*) > 0)
You're basically looking at performing a subquery --
SELECT p.caption FROM posts p WHERE (SELECT COUNT(*) FROM comments c WHERE c.post_id=p.id) > 0;
This has the effect of running the SELECT COUNT(*) subquery for each row in the posts table. Depending on the size of your tables, you might consider adding an additional column, comment_count, into your posts table to store the number of corresponding comments, such that you can simply do
SELECT p.caption FROM posts p WHERE comment_count > 0