SQL Distinct Query - sql

My SQL seems to be letting me down this morning. I have a table with the columns
Id, Guid, AttributeId, AttributeValue, CreationDate, Status
This stores data from a questionnaire which has around 15 pages. Each time you move on to the next question (next page), the entire questionnaire is persisted to the table. I.e. after completing the 1st question, that question's data is stored in the table; after completing the 2nd question, the 1st and 2nd questions are both persisted, meaning that we now have two lots of the 1st question and one lot of the 2nd question saved in the table.
I need to write a query that will return the latest lot of saved data for a given questionnaire (and for all questionnaires), i.e. if the user got to question 13, I would want only that set of data returned.

Something like...
SELECT Q.*
FROM Questionnaire Q
INNER JOIN (
SELECT TOP 1 Guid, CreationDate
FROM Questionnaire
ORDER BY CreationDate DESC
) Q2
ON Q2.Guid = Q.Guid AND Q2.CreationDate = Q.CreationDate
...ought to do it. The join to Guid is possibly redundant - and you'll presumably need a WHERE somewhere to ensure you get the questionnaire for the particular user / session.

Maybe this does the trick...
SELECT TOP 1 * FROM Questionnaire ORDER BY CreationDate DESC
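For the "all questionnaires" part of the question, one possible variation is to take MAX(CreationDate) per Guid instead of a single TOP 1 row. The sketch below runs the idea against an in-memory SQLite database from Python, so it uses SQLite syntax rather than T-SQL. The sample rows are made up, and it assumes that every row written by one save shares the same CreationDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Questionnaire ("
    "Id INTEGER PRIMARY KEY, Guid TEXT, AttributeId INTEGER, "
    "AttributeValue TEXT, CreationDate TEXT, Status TEXT)"
)
rows = [
    # Hypothetical data. g1 was saved after question 1, then again after question 2.
    ("g1", 1, "yes", "2013-01-01 09:00:00", "open"),
    ("g1", 1, "yes", "2013-01-01 09:05:00", "open"),
    ("g1", 2, "blue", "2013-01-01 09:05:00", "open"),
    # g2 was saved only once.
    ("g2", 1, "no", "2013-01-02 10:00:00", "open"),
]
conn.executemany(
    "INSERT INTO Questionnaire "
    "(Guid, AttributeId, AttributeValue, CreationDate, Status) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Assumption: all rows of one save share the same CreationDate,
# so joining on (Guid, latest CreationDate) returns the whole latest lot.
latest = conn.execute("""
    SELECT Q.Guid, Q.AttributeId, Q.AttributeValue, Q.CreationDate
    FROM Questionnaire Q
    INNER JOIN (
        SELECT Guid, MAX(CreationDate) AS CreationDate
        FROM Questionnaire
        GROUP BY Guid
    ) Q2 ON Q2.Guid = Q.Guid AND Q2.CreationDate = Q.CreationDate
    ORDER BY Q.Guid, Q.AttributeId
""").fetchall()

for row in latest:
    print(row)
```

Adding a WHERE Guid = ... filter to the outer query would narrow this to a single questionnaire.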

Related

Return most recent active rows of one sql table based on activity of another table

I am making a small posting forum as a learning project to teach myself Bootstrap/PHP/SQL.
The main forum page will display a list of topics ranked by their most recent responses. So a topic named Cookies that had a reply one minute ago would sit above a topic named Pie that had a reply two days ago, and so on and so forth.
I have constructed 2 tables, frm_THREAD and frm_POST and have been able to write an appropriate join SQL statement to pull the data I want back in one query rather than several. I want to push this so that I am pulling back only the data I need, no extra.
Right now my SQL returns a table like this with 13 rows.
I want to return a table with three rows as shown here, each row representing an active topic. The topics can be sorted by datetime.
I think the final magic is in how I sort/order the results, but I haven't hit on the right phrasing to filter out the unnecessary data. I want to return only one row for each topic: the most recent active row, combining the two tables, sortable by datetime so that I can display the most active topics first. The post table holds the date/timestamp of recent activity, which can then be used to sort the topic ID/name. For brevity's sake I sliced out things like topic name and post comments, and tried to create a simple bare-bones bit of SQL that concentrates on what I need, which I can expand once I have the final solution.
This is my SQL in its current state.
SELECT t.ThreadID, t.isTITLE,
p.postID, p.isACTIVE, p.dateCREATED
FROM frm_THREAD AS t
INNER JOIN frm_POST AS p
ON t.ThreadID = p.threadID
ORDER BY p.dateCREATED, t.ThreadID
TO SUMMARIZE: I want to return one row representing each topic, with the rows sorted by activity as determined by the timestamp of the latest reply found in the post table.
HERE ARE THE THREAD AND POST TABLES I AM DRAWING ON - The yellow highlights show the data I am specifically trying to pull back from the frm_POST data, merged with the frm_THREAD data
You can use row_number() and keep only the first row per thread:
SELECT ThreadID, isTITLE, postID, isACTIVE, dateCREATED
FROM (SELECT t.ThreadID, t.isTITLE, p.postID, p.isACTIVE, p.dateCREATED,
ROW_NUMBER() OVER (PARTITION BY t.ThreadID ORDER BY p.dateCREATED DESC) AS seqnum
FROM frm_THREAD t JOIN
frm_POST p
ON t.ThreadID = p.threadID
) pt
WHERE seqnum = 1
ORDER BY dateCREATED DESC, ThreadID;
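To see the one-row-per-thread behaviour in action, here is a small sketch that runs a ROW_NUMBER() query of this shape against an in-memory SQLite database from Python. The thread titles and timestamps are invented for illustration, and it assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE frm_THREAD (ThreadID INTEGER PRIMARY KEY, isTITLE TEXT);
    CREATE TABLE frm_POST (postID INTEGER PRIMARY KEY, threadID INTEGER,
                           isACTIVE INTEGER, dateCREATED TEXT);
    -- Hypothetical sample data: three threads, five posts.
    INSERT INTO frm_THREAD VALUES (1, 'Cookies'), (2, 'Pie'), (3, 'Cake');
    INSERT INTO frm_POST VALUES
        (1, 1, 1, '2020-01-01 10:00:00'),
        (2, 2, 1, '2020-01-02 10:00:00'),
        (3, 1, 1, '2020-01-05 10:00:00'),
        (4, 3, 1, '2020-01-03 10:00:00'),
        (5, 2, 1, '2020-01-04 10:00:00');
""")

# Number each thread's posts newest-first, keep only number 1,
# then sort the surviving rows so the most recently active thread is on top.
latest_posts = conn.execute("""
    SELECT ThreadID, isTITLE, postID, dateCREATED
    FROM (SELECT t.ThreadID, t.isTITLE, p.postID, p.dateCREATED,
                 ROW_NUMBER() OVER (PARTITION BY t.ThreadID
                                    ORDER BY p.dateCREATED DESC) AS seqnum
          FROM frm_THREAD t
          JOIN frm_POST p ON t.ThreadID = p.threadID) pt
    WHERE seqnum = 1
    ORDER BY dateCREATED DESC, ThreadID
""").fetchall()

for row in latest_posts:
    print(row)
```

Without the seqnum = 1 filter, the query returns every post again, only with an extra numbering column.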

How to get last value from a table category wise?

I have a problem retrieving the last value of every category from my table, which should not be sorted. For example, I want the daily inventory value of nov-1's last appearance in the table, without sorting the daily inventory column, i.e. "471". Is there a way to achieve this?
Similarly, I need to get the last daily inventory value for the next week, and I should be able to do this for multiple items in the table too.
P.S.: nov-1 represents the 1st week of November.
Question from comments of initial post: will I be able to achieve what I need if I introduce a column id? If so, how can I do it?
Here's a way to do it (no guarantee that it's the most efficient way to do it)...
;WITH SetID AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY Week ORDER BY Week) AS rowid, * FROM <TableName>
),
MaxRow AS
(
SELECT LastRecord = MAX(rowid), Week
FROM SetID
GROUP BY Week
)
SELECT a.*
FROM SetID a
INNER JOIN MaxRow b
ON a.rowid = b.LastRecord
AND b.Week = a.Week
ORDER BY a.Week
I feel like there's more to the table though, and this is also untested on large amounts of data. I'd be afraid that a different RowID could be potentially assigned upon each run. (I haven't used ROW_NUMBER() enough to know if this would throw unexpected data.)
I suppose this example is to reinforce the idea that, if you had a dedicated rowID on the table, it's possible. Also, I believe @Larnu's comment on your original post (introducing an ID column that retains the current order, but requires reinserting all your data) is a concern too.
Here's a SQLFiddle example.
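If the table did have a dedicated id column that reflects insertion order, the "last row per week" becomes deterministic, because ROW_NUMBER() can order by that id. A rough sketch follows, run against in-memory SQLite from Python. The table name Inventory, its columns, and all values other than the 471 mentioned in the question are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: id is assumed to increase in insertion order.
conn.execute(
    "CREATE TABLE Inventory ("
    "id INTEGER PRIMARY KEY, Item TEXT, Week TEXT, DailyInventory INTEGER)"
)
conn.executemany(
    "INSERT INTO Inventory (Item, Week, DailyInventory) VALUES (?, ?, ?)",
    [
        ("widget", "nov-1", 500),
        ("widget", "nov-1", 480),
        ("widget", "nov-1", 471),  # last nov-1 row for widget
        ("widget", "nov-2", 460),
        ("widget", "nov-2", 455),  # last nov-2 row for widget
        ("gadget", "nov-1", 90),
        ("gadget", "nov-1", 75),   # last nov-1 row for gadget
    ],
)

# Ordering by id DESC inside each (Item, Week) partition makes rn = 1
# the most recently inserted row, without sorting on DailyInventory.
last_per_week = conn.execute("""
    SELECT Item, Week, DailyInventory
    FROM (SELECT Item, Week, DailyInventory,
                 ROW_NUMBER() OVER (PARTITION BY Item, Week
                                    ORDER BY id DESC) AS rn
          FROM Inventory) numbered
    WHERE rn = 1
    ORDER BY Item, Week
""").fetchall()

for row in last_per_week:
    print(row)
```

The key difference from ordering by Week inside a Week partition is that id breaks the ties, so repeated runs should pick the same row.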

Paginated UNION ALL result - best performance

We have a situation where we need results from 4 different tables combined into one list and paginate it through OFFSET/FETCH.
We want to select records from tables a, b, c & d, order them by CreatedDatetime and then OFFSET X, FETCH Y. The tables are quite big (in terms of number of rows), and it sounds horrible to do just a UNION ALL and then paginate, because it would probably mean compiling the whole list of records and then taking the paginated part.
Problem is that none of the tables can be taken as reference to extract Start/End Datetime window because every collection might but also might not contain records from any of the table. For example, ending result might contain records from any combination of tables a; a/b; a/b/c; a/b/c/d; b; b/c;.... and we need fixed size number to be returned (paging size, for example, being 20).
Any ideas on how to most effectively approach this?
UPDATE
Based on a question from @HABO
There are unfortunately no special clues like that about queries. We are showing user activities in the system. There are different kinds of it (tables we select over). Now, query pops up data for administrator who views the activities. How administrator will look at data may vary drastically: some users will have thousands of activities in last few hours and admin will want to see them all. In other cases, users will have 3 actions in a day and admin will see just first page of data.
PS. These are not pure log tables, as activities act as state machines over time, each having their own states, which we also look at in these queries.
If you know the page size (e.g. 100), then you can simply write 4 TOP 100 queries (ordered by create date) and then do a UNION ALL on the results.
That way, even if all of the first 100 records come from one table, you are covered.
For subsequent paging queries, you'll need to record the last displayed row from each table and use this as your high-water mark for the next fetch (SELECT TOP 100 * FROM TableA WHERE RowID > @HighWater).
Should be fairly efficient...
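A rough sketch of this "top N per table, then union" idea, run against in-memory SQLite from Python. It is a variation rather than the exact scheme above: instead of one RowID high-water mark per table, it keeps a single composite mark (dt, src, id) taken from the last row shown. The table names, the columns id and dt, and the sample rows are all assumptions; it also relies on SQLite row-value comparisons (3.15 or later) and LIMIT in place of TOP.

```python
import sqlite3

TABLES = ["table1", "table2", "table3", "table4"]  # hypothetical names

def fetch_page(conn, page_size, cursor=None):
    """Return up to page_size rows (dt, src, id), newest first.

    cursor is the last row of the previous page, or None for the first page.
    """
    parts, params = [], []
    for src, table in enumerate(TABLES, start=1):
        where = ""
        if cursor is not None:
            # Only rows that sort strictly after the high-water mark.
            where = f"WHERE (dt, {src}, id) < (?, ?, ?)"
            params.extend(cursor)
        # Each table contributes at most page_size candidate rows.
        parts.append(
            f"SELECT * FROM (SELECT dt, {src} AS src, id FROM {table} {where} "
            f"ORDER BY dt DESC, id DESC LIMIT ?)"
        )
        params.append(page_size)
    sql = (" UNION ALL ".join(parts)
           + " ORDER BY dt DESC, src DESC, id DESC LIMIT ?")
    params.append(page_size)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
for name in TABLES:
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, dt TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "2021-01-01"), (2, "2021-01-05"), (3, "2021-01-09")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, "2021-01-02"), (2, "2021-01-05")])
conn.execute("INSERT INTO table4 VALUES (1, '2021-01-08')")  # table3 stays empty

page1 = fetch_page(conn, 2)
page2 = fetch_page(conn, 2, cursor=page1[-1])
page3 = fetch_page(conn, 2, cursor=page2[-1])
print(page1, page2, page3)
```

Because every table is asked for at most page_size rows past the mark, the union never holds more than 4 × page_size rows, regardless of how large the tables are. Indexes on (dt, id) would be needed for the per-table queries to stay cheap in practice.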
This is where a cache comes in useful. You can either cache the result of the query in your application layer and do the paging there if it is not too large, or cache the results of the query in a table (or temp table) if it is large.
There would be filters, I suppose. From what you say, those may vary a lot, so in the worst case every column can be a filter.
My suggestion is to use 5 views: one for each table and a final one that unions them. Just make sure all filter columns reach the physical tables as directly as possible.
Finally, select from the master view and fetch, but be careful with the ORDER BY clause. Make sure the ORDER BY columns form a unique combination, otherwise a row may change pages on a plain refresh. If the user defines the ordering, force-add some key columns at the end.
How to make sure ORDER BY has distinct values for a safe fetch/offset:
In each of the 4 views, create a new column with a simple constant number as its value, e.g. 1, 2, 3, 4 AS [TableSource].
Make sure you select the PK of each table. If you don't have one, you have to create one in the views, probably using ROW_NUMBER or NEWID, as [Pk] for example.
Finally, when selecting from the master view, ORDER BY CreateDate, Pk, TableSource. This way any row is placed at exactly the same position within the same set of data, resulting in correct paging.
Example of safely isolating a page of 30 rows ordered by CreateDate:
SELECT * FROM (
SELECT src, id, ROW_NUMBER() OVER(ORDER BY dt DESC,src,id)rn FROM (
SELECT 1 src, id, dt FROM table1 /*WHERE x=y*/ UNION ALL
SELECT 2 src, id, dt FROM table2 /*WHERE x=y*/ UNION ALL
SELECT 3 src, id, dt FROM table3 /*WHERE x=y*/ UNION ALL
SELECT 4 src, id, dt FROM table4 /*WHERE x=y*/)alltables
)data WHERE data.rn BETWEEN 3001 AND 3030

Select Last Updated Row with condition

I'm working on building a workload tracking system, I have a table that currently has listed all the tasks to be completed (each with a unique ID), but also has all the updates with a datestamp so that I can track how long it took for the status to be updated.
My dilemma is that for a form I want to query only the latest update, currently the select query shows both the original task and the updated task separately.
In words, I guess what I need to do is to select only a task given that the ID is the last one with that same task number (which is different than the ID, there will be duplicates when it is updated)
So if I have:
ID Task Date
1 A 4/30/13
2 B 5/2/13
3 A 5/3/13
I want the table to show only:
ID Task Date
3 A 5/3/13
2 B 5/2/13
How can I do this? I think I'm missing something simple...
There are multiple ways to approach this query, even in Access. Here is a way using IN with a subquery:
select t.*
from t
where t.id in (select MAX(id) as maxid
from t
group by task
)
order by task
The subquery finds the maximum ids for all the tasks. It then returns the rows from the original table that match those ids.
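As a quick check, the same IN-with-subquery shape can be run on the three sample rows from the question. The sketch below uses in-memory SQLite from Python rather than Access, so treat it as an illustration of the logic only; the table name t follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (ID INTEGER PRIMARY KEY, Task TEXT, "Date" TEXT)')
# The three rows from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", "4/30/13"), (2, "B", "5/2/13"), (3, "A", "5/3/13")],
)

# Keep only the row with the highest ID for each task.
latest_tasks = conn.execute("""
    SELECT t.ID, t.Task, t."Date"
    FROM t
    WHERE t.ID IN (SELECT MAX(ID) FROM t GROUP BY Task)
    ORDER BY t.Task
""").fetchall()

for row in latest_tasks:
    print(row)
```

This relies on a higher ID always meaning a later update, which holds for the sample data but is an assumption about the real table.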

SQL count distinct values for records but filter some dups

I have an MS SQL 2008 table of survey responses and I need to produce some reports. The table is fairly basic: it has an autonumber key, a user ID for the person responding, a date, and then a bunch of fields for each individual question. Most of the questions are multiple choice, and the data value in the response field is a short varchar text representation of that choice.
What I need to do is count the number of distinct responses for each choice option (ie. for question 1, 10 people answered A, 20 answered B, and so forth). That is not overly complex. However, the twist is that some people have taken the survey multiple times (so they would have the same User ID field). For these responses, I am only supposed to include the latest data in my report (based on the survey date field). What would be the best way to exclude the older survey records for those users that have multiple records?
Since you didn't give us your DB schema I've had to make some assumptions but you should be able to use row_number to identify the latest survey taken by a user.
with cte as
(
SELECT
Row_number() over (partition by userID, surveyID order by id desc) rn,
surveyID
FROM
User_survey
)
SELECT
a.answer_type,
Count(a.answer) answercount
FROM
cte
INNER JOIN Answers a
ON cte.surveyID = a.surveyID
WHERE
cte.rn = 1
GROUP BY
a.answer_type
Maybe not the most efficient query, but what about:
select userid, max(survey_date) from my_table group by userid
then you can inner join on the same table to get additional data.
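The join-back step might look roughly like the sketch below, which counts answers to one question using only each user's latest survey. It runs against in-memory SQLite from Python; the column q1 and the sample rows are invented, and it assumes a user never has two surveys with the same survey_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table layout: one row per survey response.
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, userid INTEGER, survey_date TEXT, q1 TEXT)"
)
conn.executemany(
    "INSERT INTO my_table (userid, survey_date, q1) VALUES (?, ?, ?)",
    [
        (1, "2010-01-01", "A"),  # superseded by user 1's later survey
        (1, "2010-02-01", "B"),
        (2, "2010-01-15", "A"),
        (3, "2010-01-20", "B"),
    ],
)

# Inner query: latest survey date per user.
# Outer query: join back to fetch that row's answer, then count per choice.
counts = conn.execute("""
    SELECT s.q1, COUNT(*) AS answercount
    FROM my_table s
    INNER JOIN (
        SELECT userid, MAX(survey_date) AS survey_date
        FROM my_table
        GROUP BY userid
    ) latest ON latest.userid = s.userid
            AND latest.survey_date = s.survey_date
    GROUP BY s.q1
    ORDER BY s.q1
""").fetchall()

print(counts)
```

User 1's older "A" response is excluded, so A is counted once and B twice.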