Need help with Join - sql

So I'm trying to build a simple forum. It'll be a list of topics in descending order by the date of either the topic (if no replies) or latest reply. Here's the DB structure:
Topics
id, subject, date, poster
Posts
id, topic_id, message, date, poster
The forum itself will consist of an HTML table with the following headers:
Topic | Last Post | Replies
What would the query or queries look like to produce such a structure? I was thinking it would involve a cross join, but not sure... Thanks in advance.

You can certainly write a query for this, but I would advise adding 'replies' and 'last post' fields to the Topics table and updating them on every new post. It will not matter much now, but it could noticeably improve query speed once you have thousands of topics.
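A minimal sketch of that idea, using SQLite through Python only so it can be run as-is. The `replies` and `last_post` columns and the trigger name are hypothetical, and the trigger assumes replies are inserted in chronological order; in MySQL the trigger syntax differs slightly, or the update can be done from application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (
    id INTEGER PRIMARY KEY,
    subject TEXT,
    date TEXT,
    poster TEXT,
    replies INTEGER NOT NULL DEFAULT 0,  -- hypothetical cached reply count
    last_post TEXT                       -- hypothetical cached date of latest activity
);
CREATE TABLE Posts (
    id INTEGER PRIMARY KEY,
    topic_id INTEGER,
    message TEXT,
    date TEXT,
    poster TEXT
);
-- Keep the cached columns in sync whenever a reply is added.
CREATE TRIGGER posts_after_insert AFTER INSERT ON Posts
BEGIN
    UPDATE Topics
    SET replies = replies + 1,
        last_post = NEW.date
    WHERE id = NEW.topic_id;
END;
""")

# A new topic starts with last_post equal to its own date.
conn.execute("INSERT INTO Topics (id, subject, date, poster, last_post) "
             "VALUES (1, 'Hello', '2024-01-01', 'ann', '2024-01-01')")
conn.execute("INSERT INTO Topics (id, subject, date, poster, last_post) "
             "VALUES (2, 'Quiet', '2024-01-03', 'bob', '2024-01-03')")
conn.execute("INSERT INTO Posts (topic_id, message, date, poster) "
             "VALUES (1, 'Hi', '2024-01-02', 'bob')")
conn.execute("INSERT INTO Posts (topic_id, message, date, poster) "
             "VALUES (1, 'Again', '2024-01-05', 'cat')")

# The forum listing is now a plain single-table read.
forum_rows = conn.execute(
    "SELECT subject, last_post, replies FROM Topics ORDER BY last_post DESC"
).fetchall()
```

The trade-off is that every write to Posts also touches Topics, in exchange for a listing query that needs no join or aggregate.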

SELECT `Topics`.*, `TopicPosts`.`replies`, `TopicPosts`.`last_post`
FROM
`Topics`,
(
SELECT `topic_id`, COUNT(*) AS `replies`, MAX(`date`) AS `last_post`
FROM `Posts`
GROUP BY `topic_id`
) AS `TopicPosts`
WHERE `Topics`.`id` = `TopicPosts`.`topic_id`
ORDER BY `TopicPosts`.`last_post` DESC
This should work, or come close if it doesn't. That said, I agree with the other poster: it's probably better to store this data in the Topics table for all sorts of reasons, even though it duplicates data.

The forum itself will consist of an
HTML table with the following headers:
Topic | Last Post | Replies
If "Last Post" is meant to be a date, it's simple.
SELECT
t.id,
t.subject,
MAX(p.date) AS last_post,
COUNT(p.id) AS count_replies
FROM
Topics t
INNER JOIN Posts p ON p.topic_id = t.id
GROUP BY
t.id,
t.subject
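Note that the INNER JOIN drops topics that have no replies yet, while the question wants those listed by the topic's own date. A possible variant, sketched here against SQLite through Python so it can be run directly (the sample rows are made up), uses LEFT JOIN and falls back to `t.date`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (id INTEGER PRIMARY KEY, subject TEXT, date TEXT, poster TEXT);
CREATE TABLE Posts (id INTEGER PRIMARY KEY, topic_id INTEGER, message TEXT, date TEXT, poster TEXT);
INSERT INTO Topics VALUES (1, 'Hello', '2024-01-01', 'ann');
INSERT INTO Topics VALUES (2, 'Quiet', '2024-01-03', 'bob');
INSERT INTO Posts VALUES (1, 1, 'Hi', '2024-01-02', 'bob');
INSERT INTO Posts VALUES (2, 1, 'Again', '2024-01-05', 'cat');
""")

rows = conn.execute("""
SELECT t.subject,
       COALESCE(MAX(p.date), t.date) AS last_post,  -- topic date when there are no replies
       COUNT(p.id) AS count_replies                 -- counts only matched posts, so 0 for none
FROM Topics t
LEFT JOIN Posts p ON p.topic_id = t.id
GROUP BY t.id, t.subject, t.date
ORDER BY last_post DESC
""").fetchall()
```

Topic 2 has no posts, so it shows up with zero replies and its own date as the last activity.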
If you want other things to display along with the last post date, like its id or the poster, it gets a little more complex.
SELECT
t.id,
t.subject,
aggregated.reply_count,
aggregated.distinct_posters,
last_post.id,
last_post.date,
last_post.poster
FROM
Topics t
INNER JOIN (
SELECT topic_id,
MAX(date) AS last_date,
COUNT(id) AS reply_count,
COUNT(DISTINCT poster) AS distinct_posters
FROM Posts
GROUP BY topic_id
) AS aggregated ON aggregated.topic_id = t.id
INNER JOIN Posts AS last_post ON last_post.topic_id = t.id
AND last_post.date = aggregated.last_date
As an example, I've added the count of distinct posters for a topic to show you where this approach can be extended.
The query relies on the assumption that no two posts within one topic can ever have the same date. If you expect this to happen, the query must be changed to account for it.
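One way to account for duplicate dates, assuming post ids always increase over time (an assumption, not something the question states), is to pick the last post by `MAX(id)` instead of `MAX(date)`. The sketch below runs both joins on made-up data in SQLite through Python to show the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (id INTEGER PRIMARY KEY, subject TEXT, date TEXT, poster TEXT);
CREATE TABLE Posts (id INTEGER PRIMARY KEY, topic_id INTEGER, message TEXT, date TEXT, poster TEXT);
INSERT INTO Topics VALUES (1, 'Hello', '2024-01-01', 'ann');
INSERT INTO Posts VALUES (1, 1, 'first',  '2024-01-02', 'bob');
INSERT INTO Posts VALUES (2, 1, 'second', '2024-01-05', 'cat');
INSERT INTO Posts VALUES (3, 1, 'third',  '2024-01-05', 'dan');  -- same date as post 2
""")

# Joining on the latest date returns the topic twice when two posts share that date.
by_date = conn.execute("""
SELECT t.id, last_post.id
FROM Topics t
INNER JOIN (
    SELECT topic_id, MAX(date) AS last_date FROM Posts GROUP BY topic_id
) AS aggregated ON aggregated.topic_id = t.id
INNER JOIN Posts AS last_post
    ON last_post.topic_id = t.id AND last_post.date = aggregated.last_date
ORDER BY last_post.id
""").fetchall()

# Joining on the highest id returns exactly one row per topic.
by_id = conn.execute("""
SELECT t.id, last_post.id, last_post.poster
FROM Topics t
INNER JOIN (
    SELECT topic_id, MAX(id) AS last_id FROM Posts GROUP BY topic_id
) AS aggregated ON aggregated.topic_id = t.id
INNER JOIN Posts AS last_post ON last_post.id = aggregated.last_id
""").fetchall()
```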

Related

sql distinct or group by to get correct order

Okay, so I have a list of posts, and some posts are replies to other posts. I'd like to get a list of parent posts ordered by their most recent reply, newest first.
I've tried GROUP BY, but it always returns the wrong order. DISTINCT is the only way I've managed to get it to work, but then it only returns the post id and not the rest of the data.
example of database here
The order I want to pull the posts out in is 1,3,5,4,2. These are the non-reply posts in order of their latest reply.
SELECT DISTINCT `thread`
FROM
(
SELECT COALESCE(NULLIF(`parent_post`, 0), `postID`) AS `thread`
FROM `posts`
ORDER BY `postID` DESC
LIMIT 100
) `sub`
This pulls them out in the correct order, but it only returns the postID and not the rest of the fields. I've tried GROUP BY, but it loses the correct order.
A straightforward translation of your requirements to SQL would be:
select *
from posts p1
where parent_post = 0
order by (
select max(p2.`datetime`)
from posts p2
where p2.parent_post = p1.postID
) desc
I.e. select all rows from posts that are thread starters (not replies) and order them by the latest timestamp from any of their replies in descending order.
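A runnable sketch of the same ordering, using SQLite through Python with made-up rows. The `datetime` column name follows the answer above; the COALESCE fallback to the thread's own timestamp is an addition, so that threads without replies still sort by when they were started rather than by NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (postID INTEGER PRIMARY KEY, parent_post INTEGER, datetime TEXT);
INSERT INTO posts VALUES (1, 0, '2024-01-01 10:00');  -- thread
INSERT INTO posts VALUES (2, 0, '2024-01-01 11:00');  -- thread
INSERT INTO posts VALUES (3, 1, '2024-01-01 12:00');  -- reply to 1
INSERT INTO posts VALUES (4, 0, '2024-01-01 13:00');  -- thread without replies
INSERT INTO posts VALUES (5, 2, '2024-01-01 14:00');  -- reply to 2
""")

thread_ids = [row[0] for row in conn.execute("""
SELECT p1.postID
FROM posts p1
WHERE p1.parent_post = 0
ORDER BY COALESCE(
    (SELECT MAX(p2.datetime) FROM posts p2 WHERE p2.parent_post = p1.postID),
    p1.datetime  -- no replies: fall back to the thread's own timestamp
) DESC
""")]
```

Because the outer query selects from `posts` directly, every column of the thread row is available, not just the id.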

How to perform count in Access across three tables?

I have three tables; for the sake of simplicity, let's say:
Category (1...many) Topic (1...many) Post
What I am trying to get is each CategoryID together with the total number of topics in that category and the total number of posts.
The best I have managed so far is the following query:
SELECT category.ID, COUNT(topic.id) AS topiccount, COUNT(post.id) AS postcount
FROM ((category)
LEFT JOIN topic ON topic.categoryid = category.id)
LEFT JOIN post ON post.topicid = topic.id
GROUP BY category.id
Unfortunately, even when only 6 topics in the table are associated with a category, I get '7' as the result.
I did some research, and it seems I have to use the DISTINCT keyword inside COUNT. However, Access does not support that, and I could not find an appropriate way to do it with subqueries.
Thank you for any help!
You get one extra record because you are not counting topics; you are counting the joined topic-post rows. Use the following:
SELECT category.id, Count(TP.id) AS topiccount, Nz(Sum(TP.numofposts),0) AS postcount
FROM (category LEFT JOIN (
SELECT topic.id, count(post.id) as numofposts, topic.categoryId
FROM topic LEFT JOIN post on topic.id = post.topicId
GROUP BY topic.id, topic.categoryId
) as TP ON category.id=TP.categoryid)
GROUP BY category.id
The Nz is there to ensure that a category with no topics returns 0 rather than a Null sum.
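To see the over-counting and the fix side by side, here is a small sketch on made-up data. It runs on SQLite through Python rather than Access, so COALESCE stands in for Nz; the structure of the queries is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY);
CREATE TABLE topic (id INTEGER PRIMARY KEY, categoryid INTEGER);
CREATE TABLE post (id INTEGER PRIMARY KEY, topicid INTEGER);
INSERT INTO category VALUES (1);
INSERT INTO category VALUES (2);           -- category without topics
INSERT INTO topic VALUES (1, 1);
INSERT INTO topic VALUES (2, 1);           -- topic without posts
INSERT INTO post VALUES (1, 1);
INSERT INTO post VALUES (2, 1);
INSERT INTO post VALUES (3, 1);
""")

# Naive version: each topic is repeated once per post, so COUNT(topic.id) is inflated.
naive = conn.execute("""
SELECT category.id, COUNT(topic.id), COUNT(post.id)
FROM category
LEFT JOIN topic ON topic.categoryid = category.id
LEFT JOIN post ON post.topicid = topic.id
GROUP BY category.id
ORDER BY category.id
""").fetchall()

# Pre-aggregate posts per topic first, so each topic appears exactly once in the outer join.
fixed = conn.execute("""
SELECT category.id, COUNT(TP.id), COALESCE(SUM(TP.numofposts), 0)
FROM category
LEFT JOIN (
    SELECT topic.id AS id, topic.categoryid AS categoryid, COUNT(post.id) AS numofposts
    FROM topic
    LEFT JOIN post ON post.topicid = topic.id
    GROUP BY topic.id, topic.categoryid
) AS TP ON TP.categoryid = category.id
GROUP BY category.id
ORDER BY category.id
""").fetchall()
```

Category 1 really has 2 topics and 3 posts; the naive query reports 4 topics because topic 1 is joined to three post rows.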

SQL count distinct values for records but filter some dups

I have an MS SQL Server 2008 table of survey responses and I need to produce some reports. The table is fairly basic: it has an autonumber key, a user ID for the person responding, a date, and then a field for each individual question. Most of the questions are multiple choice, and the value in the response field is a short varchar representation of that choice.
What I need to do is count the number of responses for each choice option (i.e. for question 1, 10 people answered A, 20 answered B, and so forth). That is not overly complex. The twist is that some people have taken the survey multiple times, so several records share the same user ID. For those users, I am only supposed to include the latest response in my report, based on the survey date field. What would be the best way to exclude the older survey records for users that have multiple records?
Since you didn't give us your DB schema I've had to make some assumptions but you should be able to use row_number to identify the latest survey taken by a user.
with cte as
(
SELECT
Row_number() over (partition by userID, surveyID order by id desc) rn,
surveyID
FROM
User_survey
)
SELECT
a.answer_type,
Count(a.answer) answercount
FROM
cte
INNER JOIN Answers a
ON cte.surveyID = a.surveyID
WHERE
cte.rn = 1
GROUP BY
a.answer_type
Maybe not the most efficient query, but what about:
select userid, max(survey_date) from my_table group by userid
then you can inner join on the same table to get additional data.
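A sketch of that join-back approach, assuming a hypothetical single table `responses` with one column `q1` standing in for a question field (the real schema was not posted). It runs on SQLite through Python; the SQL itself should carry over to SQL Server unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE responses (id INTEGER PRIMARY KEY, userid TEXT, survey_date TEXT, q1 TEXT);
INSERT INTO responses VALUES (1, 'u1', '2024-01-01', 'A');  -- superseded by row 2
INSERT INTO responses VALUES (2, 'u1', '2024-02-01', 'B');
INSERT INTO responses VALUES (3, 'u2', '2024-01-15', 'A');
INSERT INTO responses VALUES (4, 'u3', '2024-01-20', 'B');
""")

q1_counts = conn.execute("""
SELECT r.q1, COUNT(*) AS answercount
FROM responses r
INNER JOIN (
    -- latest survey date per user
    SELECT userid, MAX(survey_date) AS latest
    FROM responses
    GROUP BY userid
) AS l ON l.userid = r.userid AND l.latest = r.survey_date
GROUP BY r.q1
ORDER BY r.q1
""").fetchall()
```

If a user can submit twice on the same survey_date, both rows would survive the join; joining on the maximum autonumber key instead avoids that.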

Fetch last item in a category that fits specific criteria

Let's assume I have a database with two tables: categories and articles. Every article belongs to a category.
Now, let's assume I want to fetch the latest article of each category that fits a specific criterion (read: the article does). If it weren't for that extra criterion, I could just add a column called last_article_id or something similar to the categories table, even though that wouldn't be properly normalized.
How can I do this though? I assume there's something using GROUP BY and HAVING?
Try with:
SELECT *
FROM categories AS c
LEFT JOIN (SELECT * FROM articles ORDER BY id DESC) AS a
ON c.id = a.id_category
AND /criterias about joining/
WHERE /more criterias/
GROUP BY c.id
If you provide us with the table schemas we could be a little more specific, but you could try something like the following correlated subquery (see the MySQL manual on subqueries and on the SELECT syntax for LIMIT):
SELECT *
FROM articles a
WHERE a.id = (
SELECT id
FROM articles
WHERE category_id = a.category_id
AND <your criteria here>
ORDER BY <required order, e.g. id DESC or LastDate DESC>
LIMIT 1
)
Assuming the ids in the articles table are always increasing, this should work. Ordering by id is not semantically correct IMHO; you should use a date/time stamp field if one is available.
SELECT * FROM articles WHERE article_id IN
(
SELECT
MAX(article_id)
FROM
articles
WHERE [your filters here]
GROUP BY
category_id
)
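Here is the MAX-per-category query run on made-up data, with a hypothetical `published = 1` filter standing in for "[your filters here]". It uses SQLite through Python only to make the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, category_id INTEGER, title TEXT, published INTEGER);
INSERT INTO articles VALUES (1, 1, 'a', 1);
INSERT INTO articles VALUES (2, 1, 'b', 1);
INSERT INTO articles VALUES (3, 1, 'c', 0);  -- newest in category 1, but filtered out
INSERT INTO articles VALUES (4, 2, 'd', 1);
INSERT INTO articles VALUES (5, 2, 'e', 0);
INSERT INTO articles VALUES (6, 3, 'f', 0);  -- category 3 has no matching article
""")

latest = conn.execute("""
SELECT article_id, category_id, title
FROM articles
WHERE article_id IN (
    SELECT MAX(article_id)
    FROM articles
    WHERE published = 1        -- hypothetical filter
    GROUP BY category_id
)
ORDER BY category_id
""").fetchall()
```

The filter has to sit inside the subquery; applying it only in the outer query would first pick the newest article per category and then discard it if it does not match.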

selecting and displaying ranked items and a user's votes, a la reddit, digg, et al

when selecting ranked objects from a database (eg, articles users have voted on), what is the best way to show:
the current page of items
the user's rating, per item (if they've voted)
rough schema:
articles: id, title, content, ...
user: id, username, ...
votes: id, user_id, article_id, vote_value
is it better/ideal to:
select the current page of items
select the user's vote, limiting them to the page of items with an 'IN' clause
or
select the current page of items and just 'JOIN' vote data from the table of user votes
or, something entirely different?
this is theoretically in a high-traffic environment, and using an rdbms like mysql. fwiw, i see this on the side of "thinking it out before doing" and not "premature optimization."
thanks!
The JOIN would be faster; it would save a round trip to the database.
However, I wouldn't worry at all about this until you actually get some traffic. Many people have spoken out against premature optimization, I'll quote a random one:
More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.
If you need to order on votes, use this:
SELECT *
FROM (
SELECT a.*, (
SELECT SUM(vote_value)
FROM votes v
WHERE v.article_id = a.id
) AS votes
FROM articles a
) AS ranked
ORDER BY
votes DESC
LIMIT 100, 10
This will count the votes and paginate in a single query.
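A runnable sketch of ranking with pagination, on made-up data in SQLite through Python. The `score` alias, the COALESCE for articles without votes, and the `a.id` tie-breaker are additions for the example; `LIMIT ? OFFSET ?` is used because it reads the same in SQLite and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE votes (id INTEGER PRIMARY KEY, user_id INTEGER, article_id INTEGER, vote_value INTEGER);
INSERT INTO articles VALUES (1, 'one');
INSERT INTO articles VALUES (2, 'two');
INSERT INTO articles VALUES (3, 'three');
INSERT INTO articles VALUES (4, 'four');     -- no votes at all
INSERT INTO votes (user_id, article_id, vote_value) VALUES (7, 1, 1);
INSERT INTO votes (user_id, article_id, vote_value) VALUES (8, 1, 1);
INSERT INTO votes (user_id, article_id, vote_value) VALUES (7, 2, -1);
INSERT INTO votes (user_id, article_id, vote_value) VALUES (7, 3, 1);
INSERT INTO votes (user_id, article_id, vote_value) VALUES (8, 3, 1);
INSERT INTO votes (user_id, article_id, vote_value) VALUES (9, 3, 1);
""")

page_size, page_number = 2, 2  # second page, two items per page
page = conn.execute("""
SELECT a.id,
       COALESCE((SELECT SUM(v.vote_value)
                 FROM votes v
                 WHERE v.article_id = a.id), 0) AS score
FROM articles a
ORDER BY score DESC, a.id      -- tie-breaker keeps pages stable
LIMIT ? OFFSET ?
""", (page_size, (page_number - 1) * page_size)).fetchall()
```

The full ranking is 3, 1, 4, 2, so the second page holds articles 4 and 2.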
If you want to show only the user's own votes, use LEFT JOIN:
SELECT a.*, vote_value
FROM articles a
LEFT JOIN
votes v
-- @current_user is a placeholder for the logged-in user's id
ON v.user_id = @current_user
AND v.article_id = a.id
ORDER BY
a.timestamp DESC
LIMIT 100, 10
Having an index on votes (user_id, article_id) will greatly improve this query.
Note that you can make (user_id, article_id) the PRIMARY KEY for votes, which will improve this query even more.
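A sketch of the LEFT JOIN with a composite primary key on votes, again on made-up data in SQLite through Python. The user id is passed as a bound parameter, and articles are ordered by id here because the sample table has no timestamp column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE votes (
    user_id INTEGER,
    article_id INTEGER,
    vote_value INTEGER,
    PRIMARY KEY (user_id, article_id)   -- one vote per user per article
);
INSERT INTO articles VALUES (1, 'one');
INSERT INTO articles VALUES (2, 'two');
INSERT INTO articles VALUES (3, 'three');
INSERT INTO votes VALUES (7, 1, 1);
INSERT INTO votes VALUES (7, 3, -1);
INSERT INTO votes VALUES (8, 1, 1);
""")

current_user = 7  # hypothetical logged-in user id
page_with_votes = conn.execute("""
SELECT a.id, v.vote_value
FROM articles a
LEFT JOIN votes v
    ON v.user_id = ?            -- filter in the ON clause, so unvoted articles are kept
   AND v.article_id = a.id
ORDER BY a.id
LIMIT ? OFFSET ?
""", (current_user, 10, 0)).fetchall()
```

Article 2 comes back with a NULL vote (None in Python), which the page can render as "not voted yet".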