How to write this challenging SQL (MySQL) command?

This is the scenario:
I am developing a website which is similar to stackoverflow.com.
After an asker/seeker posts a question, other users can post their answers to it.
A user can post more than one answer to the same question, but usually only the latest answer is displayed. Users can also comment on an answer. Leaving comments aside, the SQL statement is:
mysql_query("SELECT * , COUNT( memberid ) AS num_revisions
FROM (
SELECT *
FROM answers
WHERE questionid ='$questionid'
ORDER BY create_moment DESC
) AS dyn_table JOIN users
USING ( memberid )
GROUP BY memberid order by answerid asc")or die(mysql_error());
When comments are taken into consideration, there will be three tables.
For a particular question, I want to select the latest answer each solver gave, how many answers (num_revisions) each solver gave on the question, the name of the solver, and the comments on these latest answers.
How to write this SQL statement? I am using MySQL.
I hope you can understand my question. If you are not clear about my question, just ask for clarification.
It is a little more complex than stackoverflow.com. On stackoverflow.com, you can only give one answer to a question, but on my website a user can answer a question many times. Only the latest answer is treated seriously.
The columns of the comment table are commentid, answerid, comment, giver, comment_time.
So it is question--->answer---->comment.

You can use a correlated subquery so that you only get the latest answer per member. Here's T-SQL that works like your example (only answers for a given question); you'll have to convert it to the MySQL flavour:
select *
from answers a
where questionid = '$questionid'
and answerid in (select top 1 answerid
from answers a2
where a2.questionid = a.questionid
and a2.memberid = a.memberid
order by create_moment desc)
order by create_moment
You haven't provided the schema for your comments table so I can't yet include that :)
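A possible MySQL translation of the query above, sketched as a runnable Python script with SQLite standing in for MySQL (both accept this LIMIT syntax). MySQL rejects LIMIT inside an IN (...) subquery, so the sketch compares answerid with = against a scalar subquery instead of using TOP 1. The sample rows are invented, and the ? placeholder binds the question id rather than interpolating $questionid into the SQL string.

```python
import sqlite3

# Invented sample data, reduced to the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE answers (
    answerid      INTEGER PRIMARY KEY,
    questionid    INTEGER,
    memberid      INTEGER,
    create_moment TEXT
);
INSERT INTO answers VALUES
    (1, 7, 100, '2010-01-01 10:00'),
    (2, 7, 100, '2010-01-02 10:00'),  -- member 100 answers again
    (3, 7, 200, '2010-01-01 12:00'),
    (4, 8, 100, '2010-01-03 09:00');  -- answer to another question
""")

# "= (scalar subquery ... LIMIT 1)" instead of "IN (SELECT TOP 1 ...)",
# because MySQL does not support LIMIT inside an IN subquery.
LATEST_PER_MEMBER = """
SELECT a.answerid, a.memberid, a.create_moment
FROM answers a
WHERE a.questionid = ?
  AND a.answerid = (SELECT a2.answerid
                    FROM answers a2
                    WHERE a2.questionid = a.questionid
                      AND a2.memberid = a.memberid
                    ORDER BY a2.create_moment DESC
                    LIMIT 1)
ORDER BY a.create_moment
"""

rows = conn.execute(LATEST_PER_MEMBER, (7,)).fetchall()
for row in rows:
    print(row)
```

Member 100's older answer (answerid 1) is filtered out, so each member keeps only their most recent answer to question 7.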
-Krip

How about this (obviously answers will repeat if there is more than one comment):
select *
from answers a
left outer join comment c on c.answerid = a.answerid
join users u on u.memberid = a.memberid
where questionid = 1
and a.answerid in (select top 1 answerid
from answers a2
where a2.questionid = a.questionid
and a2.memberid = a.memberid
order by create_moment desc)
order by a.create_moment, c.comment_time
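To also cover num_revisions and the solver's name from the question, one option is a correlated COUNT(*) next to the latest-answer filter. The runnable Python/SQLite sketch below is only an illustration under assumptions: the users.name column and all rows are invented, SQLite stands in for MySQL, and a scalar subquery with LIMIT 1 replaces TOP 1 because MySQL does not allow LIMIT inside IN (...).

```python
import sqlite3

# Invented sample data; users.name is an assumed column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (memberid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE answers (answerid INTEGER PRIMARY KEY, questionid INTEGER,
                      memberid INTEGER, create_moment TEXT);
CREATE TABLE comment (commentid INTEGER PRIMARY KEY, answerid INTEGER,
                      comment TEXT, giver INTEGER, comment_time TEXT);
INSERT INTO users VALUES (100, 'alice'), (200, 'bob');
INSERT INTO answers VALUES
    (1, 7, 100, '2010-01-01 10:00'),   -- alice, first revision
    (2, 7, 100, '2010-01-02 10:00'),   -- alice, latest revision
    (3, 7, 200, '2010-01-01 12:00');   -- bob, only answer
INSERT INTO comment VALUES
    (10, 2, 'nice fix', 200, '2010-01-02 11:00'),
    (11, 2, 'thanks',   100, '2010-01-02 12:00'),
    (12, 1, 'outdated', 200, '2010-01-01 11:00');  -- on an old revision
""")

QUERY = """
SELECT a.answerid,
       u.name,
       (SELECT COUNT(*)                         -- answers by this member
          FROM answers r                        -- on this question
         WHERE r.questionid = a.questionid
           AND r.memberid = a.memberid) AS num_revisions,
       c.comment
FROM answers a
JOIN users u ON u.memberid = a.memberid
LEFT JOIN comment c ON c.answerid = a.answerid  -- keep uncommented answers
WHERE a.questionid = ?
  AND a.answerid = (SELECT a2.answerid          -- latest answer per member
                      FROM answers a2
                     WHERE a2.questionid = a.questionid
                       AND a2.memberid = a.memberid
                     ORDER BY a2.create_moment DESC
                     LIMIT 1)
ORDER BY a.create_moment, c.comment_time
"""

rows = conn.execute(QUERY, (7,)).fetchall()
for row in rows:
    print(row)
```

Bob's uncommented answer survives thanks to the LEFT JOIN (its comment is None), and the comment on Alice's superseded answer does not appear.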
-Krip

Related

Querying Stackoverflow public dataset on BigQuery on Q&A SQL

I have a homework assignment:
We want to find all questions about the Python pandas library, as well as their answers.
Write a query that retrieves all the questions whose title contains the word "pandas" from the posts_questions table, as well as all the corresponding answers for each such question from the posts_answers table, where each row in the returned table represents a (question, answer) pair. If a question has several answers, the same question will appear in multiple rows of the returned table. The returned table should contain the ... of the question as well as the following fields: the id, title, tags, answer_count, score, creation time (creation_date) and the body text (body) of both the question and the answer. In the bodies, all newline characters ('\n') must be removed.
For this I wrote the following SQL code:
SELECT tb1.id as q_id,tb1.title as q_title,tb1.tags as q_tags
,tb1.creation_date as q_creation_date,tb1.score as q_score,tb1.answer_count as q_answer_count
,REPLACE(tb1.body,'\n',' ') as body_qustion,REPLACE(tb2.body,'\n',' ') as body_answer
from `bigquery-public-data.stackoverflow.posts_questions` as tb1
left join `bigquery-public-data.stackoverflow.posts_answers` as tb2
on tb1.id=tb2.id
where( tb1.title like "%pandas%" or tb1.title like "%Pandas%" or tb1.title like "%PANDAS%")
group by tb1.id ,tb1.title ,tb1.tags,tb1.creation_date,tb1.score
,tb1.answer_count,body_qustion,body_answer
But the problem is that when a question has, for example, 3 answers, I expect the query to return 3 rows for it; instead it returns only one, and I don't know why.
The data is in bigquery-public-data.stackoverflow.posts_questions and bigquery-public-data.stackoverflow.posts_answers.
You have joined on the wrong ID of the answer table. In the answer table, the id column is the ID of the answer itself, whereas parent_id is the ID of the question. You can play with the query below to get a better understanding.
Query:
SELECT
  q.id AS q_id,  # id of the question in the question table
  a.id AS a_id,  # id of the answer in the answer table
  q.title AS q_title,
  q.tags AS q_tags,
  q.creation_date AS q_creation_date,
  q.score AS q_score,
  q.answer_count AS q_answer_count,
  REPLACE(q.body, '\n', ' ') AS body_question,
  REPLACE(a.body, '\n', ' ') AS body_answer
FROM
  `bigquery-public-data.stackoverflow.posts_questions` q
LEFT JOIN
  `bigquery-public-data.stackoverflow.posts_answers` a
ON
  q.id = a.parent_id  # join each answer to its question id
WHERE
  LOWER(q.title) LIKE '%pandas%'
  # extra filters, not required by the assignment; they narrow the result
  AND q.creation_date >= '2021-01-01'
  AND q.creation_date < '2021-02-01'  # half-open range keeps all of Jan 31
  AND q.answer_count > 1
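A small reproduction of the join problem, runnable with Python's built-in sqlite3 module on invented rows (SQLite stands in for BigQuery). Because an answer's id identifies the answer itself, joining q.id = a.id does not match a question to its answers, so with a LEFT JOIN each question comes back once, with NULLs for the answer; joining on parent_id yields one row per answer. SQLite does not treat '\n' in a string literal as a newline, so the sketch uses char(10) instead.

```python
import sqlite3

# Toy stand-ins for the two BigQuery tables (only a few columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts_questions (id INTEGER, title TEXT, body TEXT);
CREATE TABLE posts_answers   (id INTEGER, parent_id INTEGER, body TEXT);
INSERT INTO posts_questions VALUES
    (1, 'Pandas merge question', 'line1' || char(10) || 'line2'),
    (2, 'numpy question', 'unrelated');
INSERT INTO posts_answers VALUES
    (10, 1, 'answer A'),
    (11, 1, 'answer B'),
    (12, 1, 'answer C'),
    (13, 2, 'numpy answer');
""")


def pairs(join_column):
    # join_column is "id" (the asker's join) or "parent_id" (the fix).
    sql = f"""
        SELECT q.id, a.id, REPLACE(q.body, char(10), ' ') AS body_question
        FROM posts_questions q
        LEFT JOIN posts_answers a ON q.id = a.{join_column}
        WHERE LOWER(q.title) LIKE '%pandas%'
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()


wrong = pairs("id")         # no answer id equals the question id
right = pairs("parent_id")  # one row per (question, answer) pair
print(wrong)
print(right)
```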

SQL Count returning username of who has asked the most questions

I'm having trouble working out how to write an SQL query and wondered if anyone could help. In my application, users can ask questions, and I would like to implement some functionality to work out who the most active question poster is.
The table structure is as follows:
User: UserID (primary key), Username
Question: Question ID (primary key), UserID (foreign key), QuestionText, DateTime Asked
What I would like to do is find out who has asked the most questions and then return their username. I'm having trouble finding solutions to similar problems on the internet. All I can do is count the number of questions asked and the number of different users who asked them, e.g. the total number of questions asked is 9 and the total number of users who have posted questions is 2.
Thanks for your help.
This selects only one question poster: the one who has posted the maximum number of questions. If several users are tied, only one of them is returned.
SQL Server
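SELECT TOP 1 username
FROM
(
    SELECT u.userid, u.username, COUNT(*) AS numQuestion
    FROM [user] u  -- USER is a reserved word in SQL Server
    INNER JOIN question q
        ON u.userid = q.userid
    GROUP BY u.userid, u.username  -- qualify userid: it exists in both tables
) Z
ORDER BY numQuestion DESC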
MySQL
SELECT username
FROM
(
    SELECT u.userid, u.username, COUNT(*) AS numQuestion
    FROM user u
    INNER JOIN question q
        ON u.userid = q.userid
    GROUP BY u.userid, u.username  -- qualify userid: it exists in both tables
) Z
ORDER BY numQuestion DESC
LIMIT 1
Or you could also try this:
SELECT COUNT(*) AS counter,
    user.username
FROM user
JOIN question ON user.userid = question.userid
GROUP BY user.userid, user.username
ORDER BY counter DESC
LIMIT 1
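Both answers sort the per-user counts and keep the first row, so when two users are tied one of them is returned arbitrarily. The runnable Python/SQLite sketch below (invented data; the table name users and the lowercased column names are assumptions) runs that query and a tie-aware variant that returns every user whose count equals the maximum.

```python
import sqlite3

# Invented data; the question calls the table "User".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (userid INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE question (questionid INTEGER PRIMARY KEY, userid INTEGER,
                       questiontext TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
INSERT INTO question (userid, questiontext) VALUES
    (1, 'q1'), (1, 'q2'), (1, 'q3'),
    (2, 'q4'), (2, 'q5'), (2, 'q6'),
    (3, 'q7');
""")

# Same shape as the MySQL answer: count per user, sort, keep one row.
TOP_ONE = """
SELECT u.username, COUNT(*) AS num_questions
FROM users u
JOIN question q ON q.userid = u.userid
GROUP BY u.userid, u.username
ORDER BY num_questions DESC
LIMIT 1
"""

# Tie-aware variant: every user whose count equals the maximum count.
ALL_TOP = """
SELECT u.username, COUNT(*) AS num_questions
FROM users u
JOIN question q ON q.userid = u.userid
GROUP BY u.userid, u.username
HAVING COUNT(*) = (SELECT MAX(cnt)
                   FROM (SELECT COUNT(*) AS cnt
                         FROM question
                         GROUP BY userid) AS per_user)
ORDER BY u.username
"""

top_one = conn.execute(TOP_ONE).fetchone()
all_top = conn.execute(ALL_TOP).fetchall()
print(top_one)  # ann or ben: the tie is broken arbitrarily
print(all_top)
```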

SQL Statement that never returns same row twice?

Requirements: I have a table of several thousand questions. Users can view these multiple choice questions and then answer them. Once a question is answered, it should not be shown to the same user again even if he logs in after a while.
Question
How would I go about doing this efficiently? Would Bloom Filters work?
Create a QuestionsAnswered table and join on it in your select. When the user answers a question, insert the question ID and the user ID into the table.
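CREATE TABLE QuestionsAnswered (
    UserID INT NOT NULL,
    QuestionID INT NOT NULL,
    PRIMARY KEY (UserID, QuestionID)  -- one row per user/question pair
);

-- questions this user has not answered yet (#UserID is a placeholder)
SELECT *
FROM Question
WHERE ID NOT IN (SELECT QuestionID
                 FROM QuestionsAnswered
                 WHERE UserID = #UserID);

-- when the user answers a question
INSERT INTO QuestionsAnswered
    (UserID, QuestionID)
VALUES
    (#UserID, #QuestionID);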
Could you add something to the users info in the database which contains a list of answered questions?
So when that user comes back you can only show them questions which are NOT answered?
Create a many-to-many table between users and questions (userQuestions) to store the questions that have been answered already. Then you'd only display questions that don't exist in that userQuestions table for that user.
You insert each question shown into a log table with question_id/user_id, then show him the ones that don't match:
SELECT [TOP 1] ...
FROM questions
WHERE question_id NOT IN (
SELECT question_id
FROM question_user_log
WHERE user_id = <current_user>)
[ORDER BY ...]
or
SELECT [TOP 1] ...
FROM questions AS q
LEFT OUTER JOIN question_user_log AS l ON q.question_id = l.question_id
AND l.user_id = <current_user>
WHERE l.question_id IS NULL
[ORDER BY...]
After you show the question, run:
INSERT INTO question_user_log (question_id, user_id)
VALUES (<question shown>, <current_user>);
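The log-table approach can be sketched end to end with Python's sqlite3 module. The composite primary key on (question_id, user_id) is an addition that keeps a question from being logged twice for the same user; the table contents and the user ids 7 and 42 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (question_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE question_user_log (
    question_id INTEGER NOT NULL,
    user_id     INTEGER NOT NULL,
    PRIMARY KEY (question_id, user_id)  -- log each question once per user
);
INSERT INTO questions VALUES (1, 'Q one'), (2, 'Q two'), (3, 'Q three');
INSERT INTO question_user_log VALUES (2, 42);  -- user 42 has already seen Q two
""")


def next_question(user_id):
    """Return the lowest-id question never shown to user_id, or None."""
    return conn.execute(
        """
        SELECT q.question_id, q.title
        FROM questions AS q
        LEFT OUTER JOIN question_user_log AS l
               ON q.question_id = l.question_id
              AND l.user_id = ?
        WHERE l.question_id IS NULL   -- no log row: not shown yet
        ORDER BY q.question_id
        LIMIT 1
        """,
        (user_id,),
    ).fetchone()


def log_shown(user_id, question_id):
    """Record that question_id was shown to user_id."""
    conn.execute(
        "INSERT INTO question_user_log (question_id, user_id) VALUES (?, ?)",
        (question_id, user_id),
    )


shown_to_7 = []
while True:
    q = next_question(7)
    if q is None:  # user 7 has now seen every question
        break
    shown_to_7.append(q[0])
    log_shown(7, q[0])

print(shown_to_7)         # [1, 2, 3]
print(next_question(42))  # (1, 'Q one')
```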
BTW, if you cannot create a table to track questions shown, you can query the questions in a deterministic order (i.e. by Id or by Title) and each time select the first one whose rank is higher than the last rank shown (using ROW_NUMBER() in SQL Server/Oracle/DB2, or LIMIT in MySQL). You'd track the last rank shown somewhere in your user state (you do have a user state, otherwise the whole question is pointless).
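A minimal sketch of this no-extra-table variant, assuming the deterministic order is the question id itself, so a simple id > last_id comparison can stand in for ROW_NUMBER(). The user_state dictionary is a hypothetical placeholder for wherever the application keeps per-user state; SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO questions (id, title) VALUES (?, ?)",
    [(5, "first"), (9, "second"), (12, "third")],
)

# Hypothetical per-user state; a real app might keep this in the session
# or in a column of the users table.
user_state = {"last_shown_id": 0}


def next_question(state):
    """Return the next question after the last one shown, or None."""
    row = conn.execute(
        "SELECT id, title FROM questions WHERE id > ? ORDER BY id LIMIT 1",
        (state["last_shown_id"],),
    ).fetchone()
    if row is not None:
        state["last_shown_id"] = row[0]  # remember the position
    return row


shown = []
while True:
    row = next_question(user_state)
    if row is None:
        break
    shown.append(row[0])

print(shown)  # [5, 9, 12]
```

Gaps in the ids do not matter. Unlike the log table, this only remembers a position, so it assumes every user walks through the questions in the same fixed order.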

Need help with Join

So I'm trying to build a simple forum. It'll be a list of topics in descending order by the date of either the topic (if no replies) or latest reply. Here's the DB structure:
Topics
id, subject, date, poster
Posts
id, topic_id, message, date, poster
The forum itself will consist of an HTML table with the following headers:
Topic | Last Post | Replies
What would the query or queries look like to produce such a structure? I was thinking it would involve a cross join, but not sure... Thanks in advance.
Of course you can write a query for this, but I advise you to add 'replies' and 'last post' fields to the Topics table and update them on every new post. That could really improve your database speed: not now, but once you have thousands of topics.
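One way to keep such replies and last post fields up to date is a trigger on Posts. The sketch below uses SQLite trigger syntax through Python's sqlite3 module (MySQL's CREATE TRIGGER additionally requires FOR EACH ROW), invents the column names replies and last_post, and assumes replies are inserted in date order, since last_post is simply overwritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (
    id        INTEGER PRIMARY KEY,
    subject   TEXT,
    date      TEXT,
    poster    TEXT,
    replies   INTEGER NOT NULL DEFAULT 0,  -- denormalized reply count
    last_post TEXT                         -- date of the newest reply
);
CREATE TABLE Posts (
    id       INTEGER PRIMARY KEY,
    topic_id INTEGER,
    message  TEXT,
    date     TEXT,
    poster   TEXT
);
-- Keep the two extra columns in step with every new post.
CREATE TRIGGER posts_after_insert AFTER INSERT ON Posts
BEGIN
    UPDATE Topics
       SET replies   = replies + 1,
           last_post = NEW.date
     WHERE id = NEW.topic_id;
END;
INSERT INTO Topics (id, subject, date, poster) VALUES
    (1, 'Hello', '2010-01-01', 'ann'),
    (2, 'Quiet', '2010-01-05', 'ben');
INSERT INTO Posts (topic_id, message, date, poster) VALUES
    (1, 'hi',    '2010-01-10', 'ben'),
    (1, 'hello', '2010-01-11', 'cat');
""")

# The forum page then needs no join or GROUP BY at all.
rows = conn.execute("""
    SELECT id, subject, replies, COALESCE(last_post, date) AS activity
    FROM Topics
    ORDER BY activity DESC
""").fetchall()
print(rows)
```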
SELECT `Topics`.*, `TopicPosts`.`replies`, `TopicPosts`.`last_date`
FROM
    `Topics`,
    (
        SELECT `topic_id`, COUNT(*) AS `replies`, MAX(`date`) AS `last_date`
        FROM `Posts`
        GROUP BY `Posts`.`topic_id`
    ) AS `TopicPosts`
WHERE `Topics`.`id` = `TopicPosts`.`topic_id`
ORDER BY `TopicPosts`.`last_date` DESC
This 'should' work (or at least almost work), but I agree with the other poster: it's probably better to store this data in the Topics table for all sorts of reasons, even if it means duplicating data.
The forum itself will consist of an HTML table with the following headers:
Topic | Last Post | Replies
If "Last Post" is meant to be a date, it's simple.
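SELECT
    t.id,
    t.subject,
    COALESCE(MAX(p.date), t.date) AS last_post,  -- topic date when there are no replies
    COUNT(p.id) AS count_replies
FROM
    Topics t
    LEFT JOIN Posts p ON p.topic_id = t.id  -- LEFT JOIN keeps topics without replies
GROUP BY
    t.id,
    t.subject,
    t.date
ORDER BY last_post DESC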
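The question also wants topics with no replies, ordered by the topic's own date. A runnable Python/SQLite sketch of that summary on invented data: the LEFT JOIN keeps reply-less topics, COUNT(p.id) counts 0 for them, and COALESCE falls back to the topic date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (id INTEGER PRIMARY KEY, subject TEXT, date TEXT, poster TEXT);
CREATE TABLE Posts  (id INTEGER PRIMARY KEY, topic_id INTEGER, message TEXT,
                     date TEXT, poster TEXT);
INSERT INTO Topics VALUES
    (1, 'Hello', '2010-01-01', 'ann'),
    (2, 'Quiet', '2010-01-05', 'ben'),   -- no replies yet
    (3, 'Busy',  '2010-01-02', 'cat');
INSERT INTO Posts VALUES
    (1, 1, 'hi', '2010-01-03', 'ben'),
    (2, 3, 'a',  '2010-01-04', 'ann'),
    (3, 3, 'b',  '2010-01-08', 'ben');
""")

rows = conn.execute("""
    SELECT t.id,
           t.subject,
           COALESCE(MAX(p.date), t.date) AS last_post,  -- topic date if no replies
           COUNT(p.id) AS count_replies                  -- 0 for reply-less topics
    FROM Topics t
    LEFT JOIN Posts p ON p.topic_id = t.id
    GROUP BY t.id, t.subject, t.date
    ORDER BY last_post DESC
""").fetchall()
for row in rows:
    print(row)
```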
If you want other things to display along with the last post date, like its id or the poster, it gets a little more complex.
SELECT
    t.id,
    t.subject,
    aggregated.reply_count,
    aggregated.distinct_posters,
    last_post.id,
    last_post.date,
    last_post.poster
FROM
    Topics t
    INNER JOIN (
        SELECT p.topic_id,
            MAX(p.date) AS last_date,
            COUNT(p.id) AS reply_count,
            COUNT(DISTINCT p.poster) AS distinct_posters
        FROM Posts p
        GROUP BY p.topic_id
    ) AS aggregated ON aggregated.topic_id = t.id
    INNER JOIN Posts AS last_post
        ON last_post.topic_id = t.id
        AND last_post.date = aggregated.last_date
ORDER BY last_post.date DESC
As an example, I've added the count of distinct posters for a topic to show you where this approach can be extended.
The query relies on the assumption that no two posts within one topic can ever have the same date. If you expect this to happen, the query must be changed to account for it.
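If two posts in one topic can share a date, one way to still get exactly one last post per topic (a swapped-in technique, not the answer's own) is to join on a correlated subquery that orders by date and then by id. The Python/SQLite sketch below assumes the post with the higher id wins a tie; the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (id INTEGER PRIMARY KEY, subject TEXT, date TEXT, poster TEXT);
CREATE TABLE Posts  (id INTEGER PRIMARY KEY, topic_id INTEGER, message TEXT,
                     date TEXT, poster TEXT);
INSERT INTO Topics VALUES
    (1, 'Hello', '2010-01-01 08:00', 'ann'),
    (2, 'Quiet', '2010-01-01 09:00', 'ben');
INSERT INTO Posts VALUES
    (1, 1, 'x', '2010-01-03 10:00', 'ann'),
    (2, 1, 'y', '2010-01-03 10:00', 'ben'),  -- same date as post 1
    (3, 2, 'z', '2010-01-02 09:00', 'cat');
""")

rows = conn.execute("""
    SELECT t.id,
           t.subject,
           last_post.id,
           last_post.date,
           last_post.poster,
           (SELECT COUNT(*) FROM Posts p WHERE p.topic_id = t.id) AS reply_count
    FROM Topics t
    JOIN Posts AS last_post
      ON last_post.id = (SELECT p2.id            -- exactly one post per topic:
                           FROM Posts p2         -- newest date, then highest id
                          WHERE p2.topic_id = t.id
                          ORDER BY p2.date DESC, p2.id DESC
                          LIMIT 1)
    ORDER BY last_post.date DESC
""").fetchall()
for row in rows:
    print(row)
```

Topics without any posts drop out because of the inner JOIN; a LEFT JOIN would keep them with NULLs.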

Select N rows from a table with a non-unique foreign key

I have asked a similar question before, and while the answers I got were spectacular, I might need to clarify.
Just like this question, I want to return N rows depending on a value in a column.
My example: I have a blog where I want to show my posts along with a preview of the comments, specifically the last three comments.
I have what I need for my posts, but I am racking my brain to get the comments right. The comments table has a foreign key post_id, and obviously multiple comments can be attached to one post, so if a post has 20 comments I just want to return the last three. What makes this somewhat tricky is that I want to do it in one query, not a "LIMIT 3" query per blog post, which makes rendering a page with a lot of posts very query-heavy.
SELECT *
FROM replies
GROUP BY post_id
HAVING COUNT( post_id ) <=3
This query does what I want, except that it returns only one comment per post instead of three.
SELECT l.*
FROM (
    SELECT post_id,
        COALESCE(
            (
            -- id of the 3rd-newest comment of this post
            -- (assumes a higher id means a newer comment)
            SELECT id
            FROM replies li
            WHERE li.post_id = dlo.post_id
            ORDER BY
                li.post_id DESC, li.id DESC
            LIMIT 2, 1
            ), 0) AS mid  -- fewer than 3 comments: keep them all
    FROM (
        SELECT DISTINCT post_id
        FROM replies dl
    ) dlo
) lo, replies l
WHERE l.post_id >= lo.post_id
AND l.post_id <= lo.post_id
AND l.id >= lo.mid
Having an index on replies (post_id, id) (in this order) will greatly improve this query.
Note the use of l.post_id >= lo.post_id AND l.post_id <= lo.post_id: this is to make the index usable.
See the article in my blog for details:
Advanced row sampling (how to select N rows from a table for each GROUP)
Do you track comment date? You can sort those results to grab only the 3 most recent ones.
Following Ian Jacobs' idea:
DECLARE @PostID INT;
SET @PostID = 1;  -- example value: the post whose comments you want
SELECT TOP 3 post_id, comment
FROM replies
WHERE post_id = @PostID
ORDER BY createdate DESC;
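To get the three newest comments of every post in a single query, rather than one TOP 3 query per post, a portable alternative to the row-sampling query keeps each comment that has fewer than three newer comments on the same post. A minimal Python/SQLite sketch with invented data, assuming a higher id means a newer comment (with a date column you would compare dates and need a tie-breaker).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE replies (id INTEGER PRIMARY KEY, post_id INTEGER, comment TEXT)")
conn.execute("CREATE INDEX replies_post_id_id ON replies (post_id, id)")
conn.executemany(
    "INSERT INTO replies (id, post_id, comment) VALUES (?, ?, ?)",
    [(1, 1, 'c1'), (2, 1, 'c2'), (3, 1, 'c3'), (4, 1, 'c4'), (5, 1, 'c5'),
     (6, 2, 'd1'), (7, 2, 'd2')],
)

N = 3  # comments to keep per post
rows = conn.execute("""
    SELECT r.post_id, r.id, r.comment
    FROM replies r
    WHERE (SELECT COUNT(*)              -- how many newer comments exist
             FROM replies newer
            WHERE newer.post_id = r.post_id
              AND newer.id > r.id) < ?
    ORDER BY r.post_id, r.id DESC
""", (N,)).fetchall()
for row in rows:
    print(row)
```

The correlated COUNT runs once per comment, so the index on (post_id, id) matters as the table grows.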