sql distinct or group by to get correct order - sql

Okay, so I have a list of posts, and some posts are replies to other posts. I'd like to get a list of parent posts ordered by their most recent reply, newest first.
I've tried GROUP BY, but it always returns the wrong order. DISTINCT is the only way I've managed to get it to work, but then it only returns the post ID and not the rest of the data.
example of database here
The order I want to pull the posts out in is 1, 3, 5, 4, 2. These are the non-reply posts in the order of the latest reply.
SELECT DISTINCT `thread`
FROM
(
SELECT COALESCE(NULLIF(`parent_post`, 0), `postID`) AS `thread`
FROM `posts`
ORDER BY `postID` DESC
LIMIT 100
) `sub`
This pulls them out in the correct order but only returns the postID and not the rest of the fields. I've tried GROUP BY, but it loses the correct order.

A straightforward translation of your requirements to SQL would be:
select *
from posts p1
where parent_post = 0
order by (
select max("datetime")
from posts p2
where p2.parent_post = p1.postID
) desc
I.e. select all rows from posts that are thread starters (not replies) and order them by the latest timestamp from any of their replies in descending order.
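As a minimal sketch of the ordering described above, the following Python script runs an equivalent query against an in-memory SQLite database. The sample rows are made up, and the COALESCE fallback to the thread's own timestamp (so that threads without replies still sort sensibly) is an assumption that goes beyond the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE posts ('
    ' postID INTEGER PRIMARY KEY,'
    ' parent_post INTEGER NOT NULL,'
    ' "datetime" TEXT NOT NULL)'
)

# Hypothetical sample data: parent_post = 0 marks a thread starter.
rows = [
    (1, 0, "2024-01-01 10:00"),  # thread starter
    (2, 0, "2024-01-01 11:00"),  # thread starter with no replies
    (3, 0, "2024-01-01 12:00"),  # thread starter
    (4, 1, "2024-01-02 09:00"),  # reply to post 1
    (5, 3, "2024-01-02 10:00"),  # reply to post 3
    (6, 1, "2024-01-03 08:00"),  # latest reply to post 1
]
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)

# Thread starters ordered by their latest reply, newest first.
# Assumption: a thread without replies falls back to its own timestamp.
query = '''
SELECT p1.postID
FROM posts p1
WHERE p1.parent_post = 0
ORDER BY COALESCE(
    (SELECT MAX(p2."datetime")
     FROM posts p2
     WHERE p2.parent_post = p1.postID),
    p1."datetime"
) DESC
'''
thread_order = [row[0] for row in conn.execute(query)]
print(thread_order)
```

Because the outer query selects from posts directly, replacing p1.postID with p1.* returns every column of each thread starter, which is what the DISTINCT approach could not do.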

Related

How to MAX(COUNT(x)) in SQLite

I have an SQLite table blog_posts. Every blog post has an id and blog_id.
If I want to know how many blog posts every blog has:
SELECT blog_id, count(1) posts FROM blog_posts group by blog_id
What do I do if I want to know how many posts the blog with the most posts has? (I don't need the blog_id.) Apparently this is illegal:
SELECT max(count(1)) posts FROM blog_posts group by blog_id
I'm pretty sure I'm missing something, but I don't see it...
Other solution:
select count(*) as Result from blog_posts
group by blog_id
order by Result desc
limit 1
I'm not sure which solution would run faster: this one or the one with the subquery.
You can use a subquery. Here's how you do it:
1. Get the number of posts for each blog.
2. Select the maximum number of posts.
Example:
select max(num_posts) as max_posts
from (
select blog_id, count(*) as num_posts
from blog_posts
group by blog_id
) a
(The subquery is in the (...)).
NB: I'm not a SQLite power user and so I don't know if this works, but the SQLite docs indicate that subqueries are supported.
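To check that both answers agree, here is a small sketch that runs the subquery version and the ORDER BY ... LIMIT 1 version against an in-memory SQLite database from Python. The table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, blog_id INTEGER NOT NULL)"
)
# Hypothetical data: blog 1 has 2 posts, blog 2 has 3 posts, blog 3 has 1 post.
conn.executemany(
    "INSERT INTO blog_posts (blog_id) VALUES (?)",
    [(1,), (1,), (2,), (2,), (2,), (3,)],
)

# Approach 1: aggregate per blog in a subquery, then take the maximum.
via_subquery = conn.execute(
    """
    SELECT MAX(num_posts)
    FROM (
        SELECT blog_id, COUNT(*) AS num_posts
        FROM blog_posts
        GROUP BY blog_id
    ) AS a
    """
).fetchone()[0]

# Approach 2: sort the per-blog counts and keep only the largest one.
via_order_limit = conn.execute(
    """
    SELECT COUNT(*) AS result
    FROM blog_posts
    GROUP BY blog_id
    ORDER BY result DESC
    LIMIT 1
    """
).fetchone()[0]

print(via_subquery, via_order_limit)
```

Both queries have to compute every per-blog count first, so which one is faster likely depends on the engine and data; measuring on the real table is the only reliable answer.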

Select items that are the top N results for a related table

Say I have a game where a question is asked, people post responses which are scored, and the top 10 responses win. I have a SQL database that stores all of this information, so I might have tables such as Users, Questions, and Responses. The Responses table has foreign_keys user_id and question_id, and attribute total_score.
Obviously, for a particular Question I can retrieve the top 10 Responses with an order and limit:
SELECT * FROM Responses WHERE question_id=? ORDER BY total_score DESC LIMIT 10;
What I'm looking for is a way I can determine, for a particular User, a list of all their Responses that are winners (in the top 10 for their particular Question). It is simple programmatically to step through each Response and see if it is included in the top 10 for its Question, but I would like to optimize this so I am not doing N+1 queries where N is the number of Responses the User has submitted.
If you use Oracle, Microsoft SQL Server, DB2, or PostgreSQL, these databases support windowing functions. Join the user's responses to other responses to the same question. Then partition by question and order by score descending. Use the row number within each partition to restrict the set to those in the top 10. Also pass along the user_id of the given user so you can pick them out of the top 10, since you're only interested in the given user's responses.
SELECT *
FROM (
SELECT r1.user_id AS given_user, r2.*,
ROW_NUMBER() OVER (PARTITION BY r2.question_id ORDER BY r2.total_score DESC) AS rownum
FROM Responses r1 JOIN Responses r2 ON r1.question_id = r2.question_id
WHERE r1.user_id = ?
) t
WHERE rownum <= 10 AND user_id = given_user;
However, if you use a database that doesn't support windowing functions (MySQL before 8.0 or SQLite before 3.25, for example), you can use this different solution:
Query for the user's responses, and use a join to match other responses to the respective questions with greater score (or earlier PK in the case of ties). Group by question, and count the number of responses that have higher score. If the count is fewer than 10, then the user's response is among the top 10 per question.
SELECT r1.*
FROM Responses r1
LEFT OUTER JOIN Responses r2 ON r1.question_id = r2.question_id
AND (r1.total_score < r2.total_score
OR r1.total_score = r2.total_score AND r1.response_id > r2.response_id)
WHERE r1.user_id = ?
GROUP BY r1.question_id
HAVING COUNT(*) < 10;
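The join-and-count approach can be tried out with the sketch below, which uses Python's sqlite3 module and a made-up Responses table. It deviates from the query above in two labeled ways: it groups by r1.response_id rather than r1.question_id, so that several responses by one user to the same question are judged separately, and it counts r2.response_id so that a response with no better competitor counts as zero. It also uses a top 2 instead of a top 10 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Responses ("
    " response_id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " question_id INTEGER NOT NULL,"
    " total_score INTEGER NOT NULL)"
)
# Hypothetical data: three users answer two questions.
conn.executemany(
    "INSERT INTO Responses VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 50), (2, 2, 1, 70), (3, 3, 1, 60),
        (4, 1, 2, 90), (5, 2, 2, 10), (6, 3, 2, 20),
    ],
)

WINNERS_SQL = """
SELECT r1.response_id
FROM Responses r1
LEFT JOIN Responses r2
  ON r1.question_id = r2.question_id
 AND (r1.total_score < r2.total_score
      OR (r1.total_score = r2.total_score
          AND r1.response_id > r2.response_id))
WHERE r1.user_id = ?
GROUP BY r1.response_id              -- one group per response of the user
HAVING COUNT(r2.response_id) < ?     -- fewer than N responses rank above it
ORDER BY r1.response_id
"""

def winning_responses(user_id, top_n):
    """Return the ids of the user's responses that are in the top N for their question."""
    return [row[0] for row in conn.execute(WINNERS_SQL, (user_id, top_n))]

print(winning_responses(1, 2))
print(winning_responses(3, 2))
```

The whole check runs as a single query per user, which avoids the N+1 pattern from the question.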
Try an embedded select statement. I don't have access to a DB tool today so I can't confirm the syntax/output. Just make the appropriate changes to capture all the columns you need. You can also add questions to the main query and join off of responses.
select *
from users
, responses
where users.user_id=responses.user_id
and responses.response_id in (SELECT z.response_id
FROM Responses z
WHERE z.user_id = users.user_id
ORDER BY total_score DESC
LIMIT 10)
Or you can really optimize it by adding another field like "IsTopPost". You would have to update the top posts when someone votes, but your query would be simple:
SELECT * FROM Responses WHERE user_id=? and IsTopPost = 1
I think something like this should do the trick:
SELECT
user_id, question_id, response_id
FROM
Responses AS r1
WHERE
user_id = ?
AND
response_id IN (SELECT response_id
FROM Responses AS r2
WHERE r2.question_id = r1.question_id
ORDER BY total_score DESC LIMIT 10)
Effectively, for each question_id, a subquery is performed which determines the top 10 responses for that question_id.
You may want to consider adding a column which marks certain Responses as 'winners'. That way, you can simply select those rows and save the database from having to calculate the top 10's over and over again.

Fetch last item in a category that fits specific criteria

Let's assume I have a database with two tables: categories and articles. Every article belongs to a category.
Now, let's assume I want to fetch the latest article of each category that fits specific criteria (read: the article does). If it weren't for those extra criteria, I could just add a column called last_article_id or something similar to the categories table, even though that wouldn't be properly normalized.
How can I do this though? I assume there's something using GROUP BY and HAVING?
Try with:
SELECT *
FROM categories AS c
LEFT JOIN (SELECT * FROM articles ORDER BY id DESC) AS a
ON c.id = a.id_category
AND /criterias about joining/
WHERE /more criterias/
GROUP BY c.id
If you provide us with the table schemas, we could be a little more specific, but you could try something like this (see 12.2.9.6. EXISTS and NOT EXISTS, and SELECT Syntax for LIMIT):
SELECT *
FROM articles a
WHERE EXISTS (
SELECT 1
FROM articles
where category_id = a.category_id
AND <YourCriteria Here>
ORDER BY <Order Required: ID DESC, LastDate DESC or something?>
LIMIT 1
)
Assuming the ids in the articles table are always increasing, this should work. Using the id is not semantically correct IMHO; you should use a timestamp field if one is available.
SELECT * FROM articles WHERE article_id IN
(
SELECT
MAX(article_id)
FROM
articles
WHERE [your filters here]
GROUP BY
category_id
)
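A small sketch of the MAX(article_id) approach, run from Python against an in-memory SQLite database. The published column stands in for "your filters here" and is purely hypothetical, as is the sample data; the sketch also relies on the stated assumption that a higher article_id means a newer article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    " article_id INTEGER PRIMARY KEY,"
    " category_id INTEGER NOT NULL,"
    " published INTEGER NOT NULL)"  # hypothetical filter column
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        (1, 10, 1), (2, 10, 1), (3, 10, 0),  # newest article in 10 is unpublished
        (4, 20, 1), (5, 20, 0),
        (6, 30, 0),                          # category 30 has no match at all
    ],
)

# The inner query finds the newest matching article id per category;
# the outer query fetches the full rows for those ids.
latest_matching = conn.execute(
    """
    SELECT article_id, category_id
    FROM articles
    WHERE article_id IN (
        SELECT MAX(article_id)
        FROM articles
        WHERE published = 1
        GROUP BY category_id
    )
    ORDER BY category_id
    """
).fetchall()
print(latest_matching)
```

Categories without any matching article simply do not appear in the result; joining from categories with a LEFT JOIN would be needed to list them with empty article columns.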

Need help with Join

So I'm trying to build a simple forum. It'll be a list of topics in descending order by the date of either the topic (if no replies) or latest reply. Here's the DB structure:
Topics
id, subject, date, poster
Posts
id, topic_id, message, date, poster
The forum itself will consist of an HTML table with the following headers:
Topic | Last Post | Replies
What would the query or queries look like to produce such a structure? I was thinking it would involve a cross join, but not sure... Thanks in advance.
Of course you can write a query for this, but I advise you to add 'replies' and 'last post' fields to the Topics table and update them on every new post. That could really improve your database speed; not now, but once you have thousands of topics.
SELECT *
FROM
`Topics`,
(
SELECT *, COUNT(*) AS `replies`
FROM `Posts`
GROUP BY `Posts`.`topic_id`
ORDER BY `Posts`.`date` DESC
) AS `TopicPosts`
WHERE `Topics`.`id` = `TopicPosts`.`topic_id`
ORDER BY `TopicPosts`.`date` DESC
This 'should' work, or almost work in the case it doesn't, but I agree with the other poster, it's probably better to store this data in the topics table for all sorts of reasons, even if it is duplication of data.
The forum itself will consist of an HTML table with the following headers:
Topic | Last Post | Replies
If "Last Post" is meant to be a date, it's simple.
SELECT
t.id,
t.subject,
MAX(p.date) AS last_post,
COUNT(p.id) AS count_replies
FROM
Topics t
INNER JOIN Posts p ON p.topic_id = t.id
GROUP BY
t.id,
t.subject
If you want other things to display along with the last post date, like its id or the poster, it gets a little more complex.
SELECT
t.id,
t.subject,
aggregated.reply_count,
aggregated.distinct_posters,
last_post.id,
last_post.date,
last_post.poster
FROM
Topics t
INNER JOIN (
SELECT p.topic_id,
MAX(p.date) AS last_date,
COUNT(p.id) AS reply_count,
COUNT(DISTINCT p.poster) AS distinct_posters
FROM Posts p
GROUP BY p.topic_id
) AS aggregated ON aggregated.topic_id = t.id
INNER JOIN Posts AS last_post ON last_post.topic_id = aggregated.topic_id
AND last_post.date = aggregated.last_date
As an example, I've added the count of distinct posters for a topic to show you where this approach can be extended.
The query relies on the assumption that no two posts within one topic can ever have the same date. If you expect this to happen, the query must be changed to account for it.
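The question also asks for topics without replies to be listed by their own date, which an INNER JOIN would drop. The sketch below, run from Python against an in-memory SQLite database, is one possible way to cover that case: it swaps in a LEFT JOIN and falls back to the topic date with COALESCE. Both changes and the sample rows are assumptions, not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Topics (id INTEGER PRIMARY KEY, subject TEXT, date TEXT, poster TEXT);
    CREATE TABLE Posts (id INTEGER PRIMARY KEY, topic_id INTEGER, message TEXT,
                        date TEXT, poster TEXT);
    """
)
# Hypothetical sample data; topic 2 has no replies.
conn.executemany(
    "INSERT INTO Topics VALUES (?, ?, ?, ?)",
    [
        (1, "Hello", "2024-01-01", "ann"),
        (2, "Rules", "2024-01-05", "bob"),
        (3, "Help", "2024-01-02", "cat"),
    ],
)
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "hi", "2024-01-03", "bob"),
        (2, 1, "yo", "2024-01-06", "cat"),
        (3, 3, "sure", "2024-01-04", "ann"),
    ],
)

# LEFT JOIN keeps topics that have no posts; COUNT(p.id) is 0 for them
# and COALESCE falls back to the topic's own date.
forum_rows = conn.execute(
    """
    SELECT t.id,
           t.subject,
           COALESCE(MAX(p.date), t.date) AS last_post,
           COUNT(p.id) AS replies
    FROM Topics t
    LEFT JOIN Posts p ON p.topic_id = t.id
    GROUP BY t.id, t.subject, t.date
    ORDER BY last_post DESC
    """
).fetchall()
for row in forum_rows:
    print(row)
```

Each result row maps directly onto the Topic | Last Post | Replies columns of the HTML table.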

Select N rows from a table with a non-unique foreign key

I have asked a similar question before, and while the answers I got were spectacular, I might need to clarify.
Just like this question, I want to return N rows depending on a value in a column.
In my example, I have a blog where I want to show my posts along with a preview of the comments: the last three comments, to be exact.
I have what I need for my posts, but I am racking my brain to get the comments right. The comments table has a foreign key of post_id, and multiple comments can be attached to one post, so if a post has 20 comments I just want to return the last three. What makes this somewhat tricky is that I want to do it in one query and not a "limit 3" query per blog post, which makes rendering a page with a lot of posts very query heavy.
SELECT *
FROM replies
GROUP BY post_id
HAVING COUNT( post_id ) <=3
This query does what I want but only returns one of each comment and not three.
SELECT l.*
FROM (
SELECT post_id,
COALESCE(
(
SELECT id
FROM replies li
WHERE li.post_id = dlo.post_id
ORDER BY
li.post_id, li.id
LIMIT 2, 1
), CAST(0xFFFFFFFF AS DECIMAL)) AS mid
FROM (
SELECT DISTINCT post_id
FROM replies dl
) dlo
) lo, replies l
WHERE l.post_id >= lo.post_id
AND l.post_id <= lo.post_id
AND l.id <= lo.mid
Having an index on replies (post_id, id) (in this order) will greatly improve this query.
Note the usage of l.post_id >= lo.post_id AND l.post_id <= lo.post_id: this makes the index usable.
See the article in my blog for details:
Advanced row sampling (how to select N rows from a table for each GROUP)
Do you track comment date? You can sort those results to grab only the 3 most recent ones.
Following Ian Jacobs' idea:
declare @PostID int
select top 3 post_id, comment
from replies
where post_id = @PostID
order by createdate desc
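For the single-query requirement from the question, one possible sketch is a correlated count: keep a comment only if fewer than three newer comments exist for the same post. The example below runs it from Python against an in-memory SQLite database. The sample rows are invented, and treating a higher id as a newer comment is an assumption; a creation date column could be compared instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE replies (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL, comment TEXT)"
)
# Hypothetical data: post 1 has five comments, post 2 has two.
conn.executemany(
    "INSERT INTO replies (id, post_id, comment) VALUES (?, ?, ?)",
    [(i, 1, "comment %d" % i) for i in range(1, 6)]
    + [(6, 2, "comment 6"), (7, 2, "comment 7")],
)

# Keep a comment if fewer than 3 comments on the same post are newer.
# Assumption: a higher id means a newer comment.
last_three = conn.execute(
    """
    SELECT r.post_id, r.id
    FROM replies r
    WHERE (
        SELECT COUNT(*)
        FROM replies newer
        WHERE newer.post_id = r.post_id
          AND newer.id > r.id
    ) < 3
    ORDER BY r.post_id, r.id DESC
    """
).fetchall()
print(last_three)
```

An index on replies (post_id, id) should help the correlated subquery here as well.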