SQL count distinct values for records but filter some dups

I have an MS SQL Server 2008 table of survey responses and I need to produce some reports. The table is fairly basic: it has an autonumber key, a user ID for the person responding, a date, and then a field for each individual question. Most of the questions are multiple choice, and the value in each response field is a short varchar text representation of the chosen option.
What I need to do is count the responses for each choice option (i.e., for question 1, 10 people answered A, 20 answered B, and so forth). That is not overly complex. The twist is that some people have taken the survey multiple times, so they have more than one record with the same user ID. For those users I am only supposed to include the latest data in my report, based on the survey date field. What would be the best way to exclude the older survey records for users who have multiple records?

Since you didn't give us your DB schema, I've had to make some assumptions, but you should be able to use ROW_NUMBER() to identify the latest survey taken by each user.
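;WITH cte AS
(
    SELECT
        -- rn = 1 marks each user's latest attempt at each survey
        -- (assumes a higher id means a later attempt; order by the survey date column otherwise)
        ROW_NUMBER() OVER (PARTITION BY userID, surveyID ORDER BY id DESC) AS rn,
        id,
        surveyID
    FROM
        User_survey
)
SELECT
    a.answer_type,
    COUNT(a.answer) AS answercount
FROM
    cte
    INNER JOIN Answers a
        -- assumed column: each answer points at one survey attempt, so older attempts drop out
        ON cte.id = a.user_survey_id
WHERE
    cte.rn = 1
GROUP BY
    a.answer_type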
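As a quick check of the ROW_NUMBER() idea against the single-table layout described in the question, here is a minimal sketch driven from Python's built-in sqlite3 module (SQLite 3.25+ is needed for window functions) rather than SQL Server. The table name `responses` and the columns `user_id`, `survey_date` and `q1` are hypothetical stand-ins for the real schema.

```python
import sqlite3

# Hypothetical single-table layout: one row per survey taken, one column per question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE responses (
           id INTEGER PRIMARY KEY AUTOINCREMENT,  -- the autonumber key
           user_id INTEGER NOT NULL,
           survey_date TEXT NOT NULL,             -- ISO dates sort correctly as text
           q1 TEXT)"""
)
conn.executemany(
    "INSERT INTO responses (user_id, survey_date, q1) VALUES (?, ?, ?)",
    [
        (1, "2012-01-05", "A"),
        (1, "2012-03-01", "B"),  # user 1 retook the survey; only this row should count
        (2, "2012-02-10", "A"),
        (3, "2012-02-11", "B"),
    ],
)


def latest_answer_counts(conn):
    """Count each q1 choice, using only every user's most recent survey."""
    sql = """
        WITH ranked AS (
            SELECT q1,
                   ROW_NUMBER() OVER (
                       PARTITION BY user_id
                       ORDER BY survey_date DESC, id DESC  -- id breaks same-day ties
                   ) AS rn
            FROM responses
        )
        SELECT q1, COUNT(*) AS answer_count
        FROM ranked
        WHERE rn = 1  -- rn = 1 is the newest survey for each user
        GROUP BY q1
        ORDER BY q1
    """
    return conn.execute(sql).fetchall()


print(latest_answer_counts(conn))  # → [('A', 1), ('B', 2)]
```

Because the filter is on `rn = 1` rather than on a date comparison, each user contributes exactly one row even if two surveys share the same date.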

Maybe not the most efficient query, but what about:
select userid, max(survey_date) from my_table group by userid
Then you can inner join this result back to the same table on userid and survey_date to get the rest of each user's latest record.
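The MAX() plus join-back approach can be sketched like this, again run through Python's sqlite3 module instead of SQL Server. `my_table` comes from the answer; the columns `id` and `q1` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, userid INTEGER, survey_date TEXT, q1 TEXT)"
)
conn.executemany(
    "INSERT INTO my_table (userid, survey_date, q1) VALUES (?, ?, ?)",
    [
        (1, "2012-01-05", "A"),
        (1, "2012-03-01", "B"),  # user 1's latest survey
        (2, "2012-02-10", "A"),
        (3, "2012-02-11", "B"),
    ],
)

# Step 1: latest survey_date per user. Step 2: join back to fetch that row's answers.
LATEST_COUNTS_SQL = """
    SELECT t.q1, COUNT(*) AS answer_count
    FROM my_table AS t
    INNER JOIN (
        SELECT userid, MAX(survey_date) AS last_date
        FROM my_table
        GROUP BY userid
    ) AS latest
        ON latest.userid = t.userid
       AND latest.last_date = t.survey_date
    GROUP BY t.q1
    ORDER BY t.q1
"""

rows = conn.execute(LATEST_COUNTS_SQL).fetchall()
print(rows)  # → [('A', 1), ('B', 2)]
```

One caveat of this design: if a user submitted twice on the same survey_date, both rows equal the MAX and both get counted. A ROW_NUMBER() with a tie-breaker avoids that.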

Related

Link data from sql from 2/3 columns

I don't know if the question title is clear enough, but here is my question:
I have a table UsersMovements which contains users along with their movements.
UsersMovements:
ID
UserID
MovementID
Comments
Time/Date
I need a query that tells me whether users 1, 2 and 3 have been in a common MovementID, given that I don't know in advance what that MovementID is.
The real case is that I want to see whether X users that I select have been in the same area (within a limited time interval, since I have Time/Date in the table).
Thank you

If you want to select the list of movements that have userid 1, 2 and 3, you can use GROUP BY with HAVING:
select movementid
from usermovements
where userid in(1,2,3)
group by movementid
having count(distinct userid)=3
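Here is a small runnable sketch of the GROUP BY / HAVING query, using Python's sqlite3 module. The helper `common_movements` is hypothetical: it generalizes the hard-coded `3` to "however many user IDs were passed in" (it assumes the list has no repeated IDs). The sample data is made up, and the duplicate visit shows why `COUNT(DISTINCT userid)` matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usermovements (id INTEGER PRIMARY KEY, userid INTEGER, movementid INTEGER)"
)
conn.executemany(
    "INSERT INTO usermovements (userid, movementid) VALUES (?, ?)",
    [
        (1, 10), (2, 10), (3, 10),  # all three users share movement 10
        (1, 20), (2, 20),           # user 3 never visited movement 20
        (1, 30), (1, 30), (2, 30),  # duplicate visit by user 1 must not count twice
    ],
)


def common_movements(conn, user_ids):
    """Movements visited by every user in user_ids (user_ids must be distinct)."""
    placeholders = ",".join("?" * len(user_ids))  # only "?" marks go into the SQL text
    sql = f"""
        SELECT movementid
        FROM usermovements
        WHERE userid IN ({placeholders})
        GROUP BY movementid
        HAVING COUNT(DISTINCT userid) = ?
        ORDER BY movementid
    """
    return [row[0] for row in conn.execute(sql, [*user_ids, len(user_ids)])]


print(common_movements(conn, [1, 2, 3]))  # → [10]
```

With a plain `COUNT(userid)`, movement 30 would reach 3 and be reported wrongly. A Time/Date range for the "limited interval" part would go into the WHERE clause.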

SQL JOIN to select MAX value among multiple user attempts returns two values when both attempts have the same value

Good morning, everyone!
I have a pretty simple SELECT/JOIN statement that gets some imported data from a placement test and returns the highest-scoring attempt a user made, i.e. the best score. Users can take this test multiple times, so we just use the best attempt. What if a user makes multiple attempts (say, takes it twice) and receives the SAME score both times?
My current query ends up returning BOTH of those records: both rows equal the MAX() value, so the join matches both of them. There are no primary keys set up on this yet; the query I'm using below is the one I hope to add into an INSERT statement for another table once I only get a SINGLE best attempt per user (StudentID), with StudentID set as the key. So you see my problem...
I've tried a few DISTINCT or TOP statements in my query but either I'm putting them into the wrong part of the query or they still return two records for a user who had identically scored attempts. Any suggestions?
SELECT p.*
FROM
(SELECT
StudentID, MAX(PlacementResults) AS PlacementResults
FROM AleksMathResults
GROUP BY StudentID)
AS mx
JOIN AleksMathResults p ON mx.StudentID = p.StudentID AND mx.PlacementResults = p.PlacementResults
ORDER BY
StudentID

Sounds like you want row_number():
SELECT amr.*
FROM (SELECT amr.*,
ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY PlacementResults DESC) as seqnum
FROM AleksMathResults amr
) amr
WHERE seqnum = 1;
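To see the ROW_NUMBER() version return exactly one row per student even when the best score is tied, here is a sketch run through Python's sqlite3 module. `AttemptDate` is a hypothetical column used only as a deterministic tie-breaker; without a second ORDER BY key, which tied attempt wins is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AttemptDate is a hypothetical column, used only as a tie-breaker.
conn.execute(
    "CREATE TABLE AleksMathResults (StudentID INTEGER, PlacementResults INTEGER, AttemptDate TEXT)"
)
conn.executemany(
    "INSERT INTO AleksMathResults VALUES (?, ?, ?)",
    [
        (1, 80, "2020-01-10"),
        (1, 80, "2020-02-15"),  # same best score twice: exactly one row must survive
        (2, 65, "2020-01-12"),
        (2, 90, "2020-03-01"),
    ],
)

BEST_ATTEMPT_SQL = """
    SELECT StudentID, PlacementResults, AttemptDate
    FROM (
        SELECT amr.*,
               ROW_NUMBER() OVER (
                   PARTITION BY StudentID
                   ORDER BY PlacementResults DESC, AttemptDate DESC  -- latest tied attempt wins
               ) AS seqnum
        FROM AleksMathResults AS amr
    ) AS best
    WHERE seqnum = 1
    ORDER BY StudentID
"""

rows = conn.execute(BEST_ATTEMPT_SQL).fetchall()
print(rows)  # → [(1, 80, '2020-02-15'), (2, 90, '2020-03-01')]
```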

How to get last value from a table category wise?

I have a problem retrieving the last value of every category from my table, which should not be sorted. For example, I want the daily inventory value from the last appearance of nov-1 in the table, without sorting the daily inventory column, i.e. "471". Is there a way to achieve this?
Similarly, I need to get the last daily inventory value of the next week, and I should be able to do this for multiple items in the table too.
P.S.: nov-1 represents the 1st week of November.

Question from the comments on the original post: will I be able to achieve what I need if I introduce an ID column? If so, how can I do it?
Here's a way to do it (no guarantee that it's the most efficient way to do it)...
;WITH SetID AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY Week ORDER BY Week) AS rowid, * FROM <TableName>
),
MaxRow AS
(
SELECT LastRecord = MAX(rowid), Week
FROM SetID
GROUP BY Week
)
SELECT a.*
FROM SetID a
INNER JOIN MaxRow b
ON a.rowid = b.LastRecord
AND b.Week = a.Week
ORDER BY a.Week
I feel like there's more to the table, though, and this is also untested on large amounts of data. I'd be afraid that a different rowid could be assigned on each run, since ORDER BY Week inside PARTITION BY Week doesn't give ROW_NUMBER() any real order within a week. (I haven't used ROW_NUMBER() enough to know if this would return unexpected data in practice.)
I suppose this example is meant to reinforce the idea that, if you had a dedicated row ID on the table, it would be possible. Also, I believe @Larnu's comment on your original post (that introducing an ID column which retains the current order would mean reinserting all your data) is a concern too.
Here's a SQLFiddle example.
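Assuming an autonumber ID column does get introduced, here is a sketch of how it makes "last row per week" well defined, run through Python's sqlite3 module. The table `inventory`, its columns and all values are hypothetical; the partition also includes `Item`, since the question needs this per item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: an autonumber id records insertion order.
conn.execute(
    """CREATE TABLE inventory (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           Item TEXT, Week TEXT, DailyInventory INTEGER)"""
)
conn.executemany(
    "INSERT INTO inventory (Item, Week, DailyInventory) VALUES (?, ?, ?)",
    [
        ("widget", "nov-1", 500),
        ("widget", "nov-1", 480),
        ("widget", "nov-1", 490),  # last widget row for nov-1 (not the largest value)
        ("widget", "nov-2", 300),
        ("widget", "nov-2", 310),
        ("gadget", "nov-1", 40),
    ],
)

LAST_PER_WEEK_SQL = """
    SELECT Item, Week, DailyInventory
    FROM (
        SELECT Item, Week, DailyInventory,
               ROW_NUMBER() OVER (
                   PARTITION BY Item, Week
                   ORDER BY id DESC  -- newest row first, so rn = 1 is the last one inserted
               ) AS rn
        FROM inventory
    ) AS numbered
    WHERE rn = 1
    ORDER BY Item, Week
"""

rows = conn.execute(LAST_PER_WEEK_SQL).fetchall()
print(rows)  # → [('gadget', 'nov-1', 40), ('widget', 'nov-1', 490), ('widget', 'nov-2', 310)]
```

Unlike `ORDER BY Week` inside `PARTITION BY Week`, ordering by the ID gives the same answer on every run, and the DailyInventory column itself is never sorted.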

SQL Distinct Query

My SQL seems to be letting me down this morning. I have a table with the columns
Id, Guid,AttributeId,AttributeValue,CreationDate,Status
This stores data from a questionnaire that has around 15 pages. Each time you move on to the next question (next page), the entire questionnaire is persisted to the table. That is, after completing the 1st question, that question's data is stored in the table; after completing the 2nd question, the 1st and 2nd questions are persisted, meaning that we now have two lots of the 1st question and one lot of the 2nd question saved in the table.
I need to write a query that will return the latest lot of saved data for a given questionnaire (and for all questionnaires), i.e. if the user got to question 13, I would want only that set of data returned.

Something like...
SELECT Q.*
FROM Questionnaire Q
INNER JOIN (
SELECT TOP 1 Guid, CreationDate
FROM Questionnaire
ORDER BY CreationDate DESC
) Q2
ON Q2.Guid = Q.Guid AND Q2.CreationDate = Q.CreationDate
...ought to do it. The join on Guid is possibly redundant, and you'll presumably need a WHERE clause inside the subquery to restrict it to the particular user/session's Guid; otherwise TOP 1 picks the single most recent save across all questionnaires.
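For the "all questionnaires" part, one option is to take MAX(CreationDate) per Guid and join back, sketched here through Python's sqlite3 module. It assumes that Guid identifies one questionnaire and that every row written in the same save shares exactly the same CreationDate; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Questionnaire (
           Id INTEGER PRIMARY KEY, Guid TEXT, AttributeId INTEGER,
           AttributeValue TEXT, CreationDate TEXT, Status TEXT)"""
)
conn.executemany(
    "INSERT INTO Questionnaire (Guid, AttributeId, AttributeValue, CreationDate) "
    "VALUES (?, ?, ?, ?)",
    [
        ("g1", 1, "a", "2013-05-01 09:00"),  # g1, first save: question 1
        ("g1", 1, "a", "2013-05-01 09:05"),  # g1, second save: questions 1 and 2
        ("g1", 2, "b", "2013-05-01 09:05"),
        ("g2", 1, "x", "2013-05-02 10:00"),  # g2 was only saved once
    ],
)

LATEST_LOT_SQL = """
    SELECT q.Guid, q.AttributeId, q.AttributeValue
    FROM Questionnaire AS q
    INNER JOIN (
        SELECT Guid, MAX(CreationDate) AS LastSave
        FROM Questionnaire
        GROUP BY Guid  -- one "latest save" per questionnaire
    ) AS latest
        ON latest.Guid = q.Guid
       AND latest.LastSave = q.CreationDate
    ORDER BY q.Guid, q.AttributeId
"""

rows = conn.execute(LATEST_LOT_SQL).fetchall()
print(rows)  # → [('g1', 1, 'a'), ('g1', 2, 'b'), ('g2', 1, 'x')]
```

For a single questionnaire, adding `WHERE q.Guid = ?` to the outer query is enough.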

Maybe this does the trick...
SELECT TOP 1 * FROM Questionnaire ORDER BY CreationDate DESC

Select items that are the top N results for a related table

Say I have a game where a question is asked, people post responses which are scored, and the top 10 responses win. I have a SQL database that stores all of this information, so I might have tables such as Users, Questions, and Responses. The Responses table has foreign keys user_id and question_id, and an attribute total_score.
Obviously, for a particular Question I can retrieve the top 10 Responses with an order and limit:
SELECT * FROM Responses WHERE question_id=? ORDER BY total_score DESC LIMIT 10;
What I'm looking for is a way I can determine, for a particular User, a list of all their Responses that are winners (in the top 10 for their particular Question). It is simple programmatically to step through each Response and see if it is included in the top 10 for its Question, but I would like to optimize this so I am not doing N+1 queries where N is the number of Responses the User has submitted.

If you use Oracle, Microsoft SQL Server, DB2, or PostgreSQL, these databases support windowing functions. Join the user's responses to other responses to the same question. Then partition by question and order by score descending. Use the row number within each partition to restrict the set to those in the top 10. Also pass along the user_id of the given user so you can pick them out of the top 10, since you're only interested in the given user's responses.
SELECT *
FROM (
SELECT r1.user_id AS given_user, r2.*,
ROW_NUMBER() OVER (PARTITION BY r2.question_id ORDER BY r2.total_score DESC) AS rn -- not "rownum", which is reserved in Oracle
FROM Responses r1 JOIN Responses r2 ON r1.question_id = r2.question_id
WHERE r1.user_id = ?
) t
WHERE rn <= 10 AND user_id = given_user;
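Here is a runnable sketch of the windowing approach, using Python's sqlite3 module (SQLite 3.25+). Two small deviations from the query above, both assumptions of this sketch: it partitions by `r1.response_id` and keeps the row where `r2` is `r1` itself, so a user with two responses to the same question still gets one result row per response; and it adds `response_id` as a tie-breaker. `top_n` is a hypothetical parameter (2 here, to keep the data small).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Responses (response_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "question_id INTEGER, total_score INTEGER)"
)
conn.executemany(
    "INSERT INTO Responses VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 50), (2, 2, 1, 90), (3, 3, 1, 70),  # question 1
        (4, 1, 2, 95), (5, 2, 2, 10), (6, 3, 2, 20),  # question 2
    ],
)


def winning_responses(conn, user_id, top_n=10):
    """Return the ids of user_id's responses that rank in the top_n of their question."""
    sql = """
        SELECT response_id
        FROM (
            SELECT r1.response_id AS mine,
                   r2.response_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY r1.response_id  -- rank every answer to r1's question
                       ORDER BY r2.total_score DESC, r2.response_id
                   ) AS rn
            FROM Responses AS r1
            JOIN Responses AS r2 ON r1.question_id = r2.question_id
            WHERE r1.user_id = ?
        ) AS t
        WHERE rn <= ? AND response_id = mine  -- keep the row where r2 is the user's own response
        ORDER BY response_id
    """
    return [row[0] for row in conn.execute(sql, (user_id, top_n))]


print(winning_responses(conn, 3, top_n=2))  # → [3, 6]
```

This is a single query per user, so it avoids the N+1 pattern from the question.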
However, if you use a database that doesn't support windowing functions (for example, MySQL before 8.0 or SQLite before 3.25), you can use this different solution:
Query for the user's responses, and use a join to match other responses to the same question that have a greater score (or an earlier PK in the case of ties). Group by the user's response and count the number of responses that beat it. If that count is fewer than 10, the user's response is among the top 10 for its question.
SELECT r1.*
FROM Responses r1
LEFT OUTER JOIN Responses r2 ON r1.question_id = r2.question_id
  AND (r1.total_score < r2.total_score
    OR (r1.total_score = r2.total_score AND r1.response_id > r2.response_id))
WHERE r1.user_id = ?
GROUP BY r1.response_id
HAVING COUNT(r2.response_id) < 10;
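The same result without window functions can be checked with this sketch, run through Python's sqlite3 module. It groups by the user's response and counts only the matched `r2` rows, so a response that nothing beats counts 0 rather than 1. The data and the `top_n` parameter are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Responses (response_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "question_id INTEGER, total_score INTEGER)"
)
conn.executemany(
    "INSERT INTO Responses VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 50), (2, 2, 1, 90), (3, 3, 1, 70),  # question 1
        (4, 1, 2, 95), (5, 2, 2, 10), (6, 3, 2, 20),  # question 2
    ],
)


def winning_responses(conn, user_id, top_n=10):
    """User's responses beaten by fewer than top_n others on the same question."""
    sql = """
        SELECT r1.response_id
        FROM Responses AS r1
        LEFT OUTER JOIN Responses AS r2
            ON r1.question_id = r2.question_id
           AND (r1.total_score < r2.total_score
                OR (r1.total_score = r2.total_score AND r1.response_id > r2.response_id))
        WHERE r1.user_id = ?
        GROUP BY r1.response_id
        HAVING COUNT(r2.response_id) < ?  -- number of responses ranked above r1
        ORDER BY r1.response_id
    """
    return [row[0] for row in conn.execute(sql, (user_id, top_n))]


print(winning_responses(conn, 1, top_n=2))  # → [4]
```

The self-join grows with the number of responses per question, which is the usual trade-off against the windowing version.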

Try an embedded select statement. I don't have access to a DB tool today so I can't confirm the syntax/output. Just make the appropriate changes to capture all the columns you need. You can also add questions to the main query and join off of responses.
select *
from users
, responses
where users.user_id=responses.user_id
and responses.response_id in (SELECT z.response_id
FROM Responses z
WHERE z.question_id = responses.question_id -- top 10 for this response's question
ORDER BY z.total_score DESC
LIMIT 10)

Or you can really optimize it by adding another field like "IsTopPost". You would have to update the top posts when someone votes, but your query would be simple:
SELECT * FROM Responses WHERE user_id=? and IsTopPost = 1
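One way to keep an IsTopPost flag up to date is to recompute it for a single question whenever that question's scores change. The sketch below runs on Python's sqlite3 module; `refresh_top_posts`, `top_posts` and `top_n` are hypothetical names, and the UPDATE relies on SQLite accepting LIMIT inside an IN subquery on the table being updated (MySQL rejects both, so it would need rewriting there).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Responses (response_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "question_id INTEGER, total_score INTEGER, IsTopPost INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO Responses (response_id, user_id, question_id, total_score) VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 50), (2, 2, 1, 90), (3, 3, 1, 70),
     (4, 1, 2, 95), (5, 2, 2, 10), (6, 3, 2, 20)],
)


def refresh_top_posts(conn, question_id, top_n=10):
    """Recompute IsTopPost for one question; call it whenever that question's scores change."""
    conn.execute(
        """
        UPDATE Responses
        SET IsTopPost = CASE WHEN response_id IN (
                SELECT response_id
                FROM Responses
                WHERE question_id = ?
                ORDER BY total_score DESC, response_id
                LIMIT ?
            ) THEN 1 ELSE 0 END
        WHERE question_id = ?
        """,
        (question_id, top_n, question_id),
    )


def top_posts(conn, user_id):
    # Reading the winners is just a filter on the flag.
    sql = "SELECT response_id FROM Responses WHERE user_id = ? AND IsTopPost = 1 ORDER BY response_id"
    return [r[0] for r in conn.execute(sql, (user_id,))]


for qid in (1, 2):
    refresh_top_posts(conn, qid, top_n=2)
print(top_posts(conn, 1))  # → [4]

# A vote pushes response 1 to the top of question 1; only that question needs a refresh.
conn.execute("UPDATE Responses SET total_score = 100 WHERE response_id = 1")
refresh_top_posts(conn, 1, top_n=2)
print(top_posts(conn, 1))  # → [1, 4]
```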

I think something like this should do the trick:
SELECT
user_id, question_id, response_id
FROM
Responses AS r1
WHERE
user_id = ?
AND
response_id IN (SELECT response_id
FROM Responses AS r2
WHERE r2.question_id = r1.question_id
ORDER BY total_score DESC LIMIT 10)
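Effectively, for each question_id, a subquery is performed which determines the top 10 responses for that question_id. Note that MySQL rejects LIMIT inside an IN subquery ("This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"), so on MySQL you would need one of the other approaches; SQLite and PostgreSQL accept it.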
You may want to consider adding a column which marks certain Responses as 'winners'. That way, you can simply select those rows and save the database from having to calculate the top 10's over and over again.
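Here is the correlated-subquery version run end to end on SQLite through Python's sqlite3 module. The sample data and the `top_n` parameter (standing in for the hard-coded 10) are hypothetical, and `response_id` is added as a tie-breaker inside the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Responses (response_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "question_id INTEGER, total_score INTEGER)"
)
conn.executemany(
    "INSERT INTO Responses VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 50), (2, 2, 1, 90), (3, 3, 1, 70),  # question 1
        (4, 1, 2, 95), (5, 2, 2, 10), (6, 3, 2, 20),  # question 2
    ],
)


def winning_responses(conn, user_id, top_n=10):
    """(user_id, question_id, response_id) for each of the user's top-N responses."""
    sql = """
        SELECT r1.user_id, r1.question_id, r1.response_id
        FROM Responses AS r1
        WHERE r1.user_id = ?
          AND r1.response_id IN (
              SELECT r2.response_id
              FROM Responses AS r2
              WHERE r2.question_id = r1.question_id  -- correlated: one top-N list per question
              ORDER BY r2.total_score DESC, r2.response_id
              LIMIT ?
          )
        ORDER BY r1.question_id
    """
    return conn.execute(sql, (user_id, top_n)).fetchall()


print(winning_responses(conn, 3, top_n=2))  # → [(3, 1, 3), (3, 2, 6)]
```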
