SQL Server Join with Latest 2 Entries

I know the title of the post is bad but hear me out. A question like this arose the other day at work, and while I found a way around it, the problem still haunts me.
Let's assume Stack Overflow has only 3 tables.
Users ( username )
Comments ( comment, creationdate )
UsersCommentsJoin, the join table between the first 2 tables.
Now let's say I want to write a query that returns all the users along with their 2 most recent comments. The result set would look like this:
|username| most recent comment | second most recent comment|
How on earth do I go about creating that query? I solved this problem earlier by simply returning only the most recent comment and not even trying to get the second one, and boy, let me tell you, it seemed a WHOLE lot more involved than I expected, with subselects, TOP and other weird DB acrobatics.
Bonus round: why do some queries that seem logically easy turn out to be monster queries, at least from my rookie perspective?
EDIT: I was using an MS SQL server.

You can use a crosstab query pivoting on ROW_NUMBER
-- Number each user's comments, newest first
WITH UC
     AS (SELECT UCJ.userId,
                C.comment,
                ROW_NUMBER() OVER (PARTITION BY UCJ.userId
                                   ORDER BY C.creationdate DESC) RN
         FROM   UsersCommentsJoin UCJ
                JOIN Comments C
                  ON C.commentId = UCJ.commentId)
-- Pivot rows 1 and 2 into two columns per user
SELECT U.username,
       MAX(CASE
             WHEN RN = 1 THEN comment
           END) AS MostRecent,
       MAX(CASE
             WHEN RN = 2 THEN comment
           END) AS SecondMostRecent
FROM   Users U
       JOIN UC
         ON UC.userId = U.userId
WHERE  UC.RN <= 2
GROUP  BY U.userId,
          U.username
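To try the ROW_NUMBER crosstab idea without a SQL Server instance, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in (an assumption: SQLite 3.25 or newer, which supports window functions). The userId and commentId key columns and the sample rows are made up for illustration; the question's tables only list username, comment and creationdate.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (userId INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE Comments (commentId INTEGER PRIMARY KEY, comment TEXT, creationdate TEXT);
CREATE TABLE UsersCommentsJoin (userId INTEGER, commentId INTEGER);
INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO Comments VALUES
  (1, 'a-old',  '2020-01-01'),
  (2, 'a-mid',  '2020-02-01'),
  (3, 'a-new',  '2020-03-01'),
  (4, 'b-only', '2020-01-15');
INSERT INTO UsersCommentsJoin VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# Rank each user's comments newest-first, then fold ranks 1 and 2 into columns.
rows = conn.execute("""
WITH UC AS (
  SELECT UCJ.userId, C.comment,
         ROW_NUMBER() OVER (PARTITION BY UCJ.userId
                            ORDER BY C.creationdate DESC) AS RN
  FROM UsersCommentsJoin UCJ
  JOIN Comments C ON C.commentId = UCJ.commentId
)
SELECT U.username,
       MAX(CASE WHEN UC.RN = 1 THEN UC.comment END) AS MostRecent,
       MAX(CASE WHEN UC.RN = 2 THEN UC.comment END) AS SecondMostRecent
FROM Users U
JOIN UC ON UC.userId = U.userId
WHERE UC.RN <= 2
GROUP BY U.userId, U.username
ORDER BY U.username
""").fetchall()

for row in rows:
    print(row)
```

A user with only one comment still shows up, with NULL (None in Python) in the second column, because the CASE for RN = 2 never matches.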

Related

Trouble with PIVOT in SQL (Azure SQL)

I need to list survey responses in one row. The table of survey responses lists a questionID and a ResponseID (these are multiple choice questions), so one row for each response. There are 12 questions. Things like the date of the response, worker who conducted the survey, and worker who entered the survey are kept in other tables.
So, I have a query that gets the responses for one survey into 12 rows. Now I need to get all that into one row. Pivot, right?
But I could never get it to work. :-( Tried several solutions from this and other fora (including Mickey's documentation here: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017).
Then I found this solution, that doesn't use pivot at all, here: SQL Pivot Table Grouping
It worked great, but the example has only two questions. Some of our surveys will have over 50 questions, so I'm guessing that won't be a very elegant solution.
So I'm back to my pivot issue.
Experience level is somewhere between idiot and novice, so I'm probably missing something obvious.
Here's the query by itself (works as expected):
SELECT AssessmentResponses.ID, AssessmentQuestions.QuestionNumber,
AssessmentResponseAnswers.QuestionID,
AssessmentAnswerChoices.AnswerChoiceNumber
FROM (AssessmentResponses RIGHT JOIN AssessmentResponseAnswers
ON AssessmentResponses.ID = AssessmentResponseAnswers.AssessmentResponseID)
LEFT JOIN (AssessmentQuestions RIGHT JOIN AssessmentAnswerChoices
ON AssessmentQuestions.ID = AssessmentAnswerChoices.AssessmentQuestionID)
ON AssessmentResponseAnswers.AnswerChoiceID = AssessmentAnswerChoices.AnswerChoiceID
WHERE AssessmentResponses.AssessmentID = 1 AND AssessmentResponses.RespondentID = 44;
Here's how I tried to make it pivot:
SELECT ID, [1A], [1B], [2A], [2B], [3A], [3B], [4A], [4B], [5A], [5B], [6A], [6B]
FROM (
SELECT AssessmentResponses.ID, AssessmentQuestions.QuestionNumber,
AssessmentResponseAnswers.QuestionID,
AssessmentAnswerChoices.AnswerChoiceNumber
FROM (AssessmentResponses RIGHT JOIN AssessmentResponseAnswers
ON AssessmentResponses.ID = AssessmentResponseAnswers.AssessmentResponseID)
LEFT JOIN (AssessmentQuestions RIGHT JOIN AssessmentAnswerChoices
ON AssessmentQuestions.ID = AssessmentAnswerChoices.AssessmentQuestionID)
ON AssessmentResponseAnswers.AnswerChoiceID = AssessmentAnswerChoices.AnswerChoiceID
WHERE AssessmentResponses.AssessmentID = 1 AND AssessmentResponses.RespondentID = 44
) AS Src
PIVOT
(
MAX(AnswerChoiceNumber)
FOR QuestionNumber IN ([1A], [1B], [2A], [2B], [3A], [3B], [4A], [4B], [5A], [5B], [6A], [6B])
)
AS Pvt;
I was hoping that would give me 1 row with 13 columns (ID plus the twelve questions). But it gave me still 12 rows: the 13 columns are there, and it just gives null values for 11 of the twelve questions. (In row1, 1A has an answer; in row2, 1B has an answer, etc.)
What am I missing?
Tweaked your code just a bit to get rid of "QuestionID" in the subquery. It is not involved in what you're pivoting on, so SQL Server will think it's one of your keys.
SELECT ID, [1A], [1B], [2A], [2B], [3A], [3B], [4A], [4B], [5A], [5B], [6A], [6B]
FROM (
SELECT AssessmentResponses.ID, AssessmentQuestions.QuestionNumber,
AssessmentAnswerChoices.AnswerChoiceNumber
FROM (AssessmentResponses RIGHT JOIN AssessmentResponseAnswers
ON AssessmentResponses.ID = AssessmentResponseAnswers.AssessmentResponseID)
LEFT JOIN (AssessmentQuestions RIGHT JOIN AssessmentAnswerChoices
ON AssessmentQuestions.ID = AssessmentAnswerChoices.AssessmentQuestionID)
ON AssessmentResponseAnswers.AnswerChoiceID = AssessmentAnswerChoices.AnswerChoiceID
WHERE AssessmentResponses.AssessmentID = 1 AND AssessmentResponses.RespondentID = 44
) AS Src
PIVOT
(
MAX(AnswerChoiceNumber)
FOR QuestionNumber IN ([1A], [1B], [2A], [2B], [3A], [3B], [4A], [4B], [5A], [5B], [6A], [6B])
)
AS Pvt;
I imagine you're used to the designer. A few tips (not implemented above) to make code more readable:
You can usually omit parentheses around your join statements
I'd re-work your joins to associate one on-statement per join statement
Aliases!
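The effect of a stray column in the source subquery can be reproduced in a small sketch. SQLite (used here through Python's sqlite3 module) has no PIVOT operator, so this assumes conditional aggregation with an explicit GROUP BY as a stand-in: PIVOT implicitly groups by every source column that is neither aggregated nor pivoted on, which is what the explicit GROUP BY imitates. The flattened Src table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened version of the question's subquery output.
CREATE TABLE Src (ID INTEGER, QuestionID INTEGER,
                  QuestionNumber TEXT, AnswerChoiceNumber INTEGER);
INSERT INTO Src VALUES (7, 101, '1A', 3), (7, 102, '1B', 1), (7, 103, '2A', 4);
""")

# One output column per question number (only three shown for brevity).
cols = """
    MAX(CASE WHEN QuestionNumber = '1A' THEN AnswerChoiceNumber END) AS "1A",
    MAX(CASE WHEN QuestionNumber = '1B' THEN AnswerChoiceNumber END) AS "1B",
    MAX(CASE WHEN QuestionNumber = '2A' THEN AnswerChoiceNumber END) AS "2A"
"""

# QuestionID left in the grouping key: one row per question, mostly NULLs.
with_extra_key = conn.execute(
    f"SELECT ID, {cols} FROM Src GROUP BY ID, QuestionID ORDER BY QuestionID"
).fetchall()

# Only ID in the grouping key: a single row per response.
id_only = conn.execute(f"SELECT ID, {cols} FROM Src GROUP BY ID").fetchall()

print(with_extra_key)
print(id_only)
```

The first result has three sparse rows, matching the symptom described in the question; the second collapses them into one row.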
In my opinion this is not really a job for PIVOT. Maybe something like this:
SELECT ar.ID
,(
SELECT aac.AnswerChoiceNumber
FROM AssessmentResponseAnswers ara, AssessmentAnswerChoices aac, AssessmentQuestions aq
WHERE ar.ID=ara.AssessmentResponseID
AND ara.AnswerChoiceID=aac.AnswerChoiceID
AND aq.ID=aac.AssessmentQuestionID
AND aq.QuestionNumber='1A'
) AS [1A]
,(
SELECT aac.AnswerChoiceNumber
FROM AssessmentResponseAnswers ara, AssessmentAnswerChoices aac, AssessmentQuestions aq
WHERE ar.ID=ara.AssessmentResponseID
AND ara.AnswerChoiceID=aac.AnswerChoiceID
AND aq.ID=aac.AssessmentQuestionID
AND aq.QuestionNumber='1B'
) AS [1B]
, (...)
FROM AssessmentResponses ar
WHERE ar.AssessmentID=1
AND ar.RespondentID=44
I had trouble following your joins, so I reconstructed them as best I could. They may be correct, but check them against your schema.

Slow Query Due to Sub Select

I have several SQL Server 2014 queries that pull back a data set where we need to get a count on related, but different criteria along with that data. We do this with a sub query, but that is slowing it down immensely. It was fine until now where we are getting more data in our database to count on. Here is the query:
SELECT
T.*,
ISNULL((SELECT COUNT(1)
FROM EventRegTix ERT, EventReg ER
WHERE ER.EventRegID = ERT.EventRegID
AND ERT.TicketID = T.TicketID
AND ER.OrderCompleteFlag = 1), 0) AS NumTicketsSold
FROM
Tickets T
WHERE
T.EventID = 12345
AND T.DeleteFlag = 0
AND T.ActiveFlag = 1
ORDER BY
T.OrderNumber ASC
I am pretty sure it's mostly due to the correlation from the sub query back to the outer Tickets table. If I change the T.TicketID to an actual ticket # (999 for example), the query is MUCH faster.
I have attempted to join together these queries into one, but since there are other fields in the sub query, I just cannot get it to work properly. I was playing around with
COUNT(1) OVER (PARTITION BY T.TicketID) AS NumTicketsSold
but could not figure that out either.
Any help would be much appreciated!
I would write this as:
SELECT T.*,
(SELECT COUNT(1)
FROM EventRegTix ERT JOIN
EventReg ER
ON ER.EventRegID = ERT.EventRegID
WHERE ERT.TicketID = T.TicketID AND ER.OrderCompleteFlag = 1
) AS NumTicketsSold
FROM Tickets T
WHERE T.EventID = 12345 AND
T.DeleteFlag = 0 AND
T.ActiveFlag = 1
ORDER BY T.OrderNumber ASC;
Proper, explicit, standard JOIN syntax does not improve performance; it is just the correct syntax. COUNT(*) cannot return NULL values, so COALESCE() or a similar function is unnecessary.
You need indexes. The obvious ones are on Tickets(EventID, DeleteFlag, ActiveFlag, OrderNumber), EventRegTix(TicketID, EventRegID), and EventReg(EventRegID, OrderCompleteFlag).
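As a rough sketch of the suggestions above, the following uses Python's sqlite3 module as a stand-in for SQL Server 2014 (an assumption; query plans and timings will differ between engines). The index names and the sample rows are hypothetical. It creates the three suggested indexes and runs the correlated count with explicit JOIN syntax, showing that a ticket with no completed sales comes back as 0 rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tickets (TicketID INTEGER PRIMARY KEY, EventID INTEGER,
                      DeleteFlag INTEGER, ActiveFlag INTEGER, OrderNumber INTEGER);
CREATE TABLE EventReg (EventRegID INTEGER PRIMARY KEY, OrderCompleteFlag INTEGER);
CREATE TABLE EventRegTix (EventRegID INTEGER, TicketID INTEGER);

-- Indexes on the columns suggested in the answer (index names are made up).
CREATE INDEX IX_Tickets_Event ON Tickets (EventID, DeleteFlag, ActiveFlag, OrderNumber);
CREATE INDEX IX_EventRegTix_Ticket ON EventRegTix (TicketID, EventRegID);
CREATE INDEX IX_EventReg_Complete ON EventReg (EventRegID, OrderCompleteFlag);

-- Ticket 1 has two completed sales and one incomplete; ticket 2 has none;
-- ticket 3 is flagged as deleted and should be filtered out.
INSERT INTO Tickets VALUES (1, 12345, 0, 1, 10), (2, 12345, 0, 1, 20), (3, 12345, 1, 1, 30);
INSERT INTO EventReg VALUES (100, 1), (101, 0), (102, 1);
INSERT INTO EventRegTix VALUES (100, 1), (101, 1), (102, 1);
""")

sold = conn.execute("""
SELECT T.TicketID,
       (SELECT COUNT(*)
        FROM EventRegTix ERT
        JOIN EventReg ER ON ER.EventRegID = ERT.EventRegID
        WHERE ERT.TicketID = T.TicketID
          AND ER.OrderCompleteFlag = 1) AS NumTicketsSold
FROM Tickets T
WHERE T.EventID = 12345 AND T.DeleteFlag = 0 AND T.ActiveFlag = 1
ORDER BY T.OrderNumber
""").fetchall()

print(sold)
```

Whether the indexes are actually used is best confirmed with the execution plan on the real server.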
I would try with OUTER APPLY :
SELECT T.*, T1.*
FROM Tickets T OUTER APPLY
(SELECT COUNT(1) AS NumTicketsSold
FROM EventRegTix ERT JOIN
EventReg ER
ON ER.EventRegID = ERT.EventRegID
WHERE ERT.TicketID = T.TicketID AND ER.OrderCompleteFlag = 1
) T1
WHERE T.EventID = 12345 AND
T.DeleteFlag = 0 AND
T.ActiveFlag = 1
ORDER BY T.OrderNumber ASC;
And, obviously, you need indexes on Tickets(EventID, DeleteFlag, ActiveFlag, OrderNumber), EventRegTix(TicketID, EventRegID), and EventReg(EventRegID, OrderCompleteFlag) to get the performance gain.
Fixed this - query went from 5+ seconds to 1/2 second or less. Issues were:
1) No indexes. Did not know all FK fields needed indexes as well. I indexed all the fields that we joined or were in WHERE clause.
2) Used SQL Execution Plan to see where the bottleneck was. Told me no index, hence 1) above! :)
Thanks for all your help guys, hopefully this post helps someone else.
Dennis
PS: Changed the syntax too!

Access data source using SQL to show most recent entry per site

First of all I am a complete beginner to SQL and have been thrown in at the deep end a bit ! I'm learning as I go along and each mistake I make or question I ask will hopefully help me develop... please be kind :)
I have a working query that extracts electricity meter readings and other information. I want to find the most recent reading for each site. This is the query at the moment:
PARAMETERS [Site Group] Text ( 255 );
SELECT
Lookup.Lookup_Name AS [Group],
Contacts.Name AS Site,
Points.Number AS MPAN,
Max(DataElectricity.Date) AS MaxDate,
DataElectricity.M1_Present,
DataElectricity.M2_Present,
DataElectricity.M3_Present,
DataElectricity.M4_Present,
DataElectricity.M5_Present,
DataElectricity.M6_Present,
DataElectricity.M7_Present,
DataElectricity.M8_Present,
DataElectricity.Direct
FROM
DataElectricity INNER JOIN (Lookup INNER JOIN (Points INNER JOIN Contacts ON Points.Contacts_Id = Contacts.Id) ON Lookup.Lookup_Id = Contacts.Group_1) ON DataElectricity.Point_Id = Points.Id
WHERE
((DataElectricity.Direct)='D')
GROUP BY
Lookup.Lookup_Name, Contacts.Name, Points.Number, DataElectricity.M1_Present, DataElectricity.M2_Present, DataElectricity.M3_Present, DataElectricity.M4_Present, DataElectricity.M5_Present, DataElectricity.M6_Present, DataElectricity.M7_Present, DataElectricity.M8_Present, DataElectricity.Direct
ORDER BY
Lookup.Lookup_Name, Contacts.Name, Max(DataElectricity.Date) DESC;
However this returns all the readings for a site rather than just the most recent... I'm sure this is simple but I can't figure it out.
Any advice or guidance is gratefully received :)
Can't you just use top 1 to get only the first result?
SELECT top 1 ...
I have evolved the code a bit further using caspian's suggestion of SELECT top 1... but am struggling to refine it further and produce the result I need.
PARAMETERS [Site Group] Text ( 255 );
SELECT
Lookup.Lookup_Name,
Contacts.Name AS Site,
Points.Number AS MPAN,
DataElectricity.M1_Present,
DataElectricity.M2_Present,
DataElectricity.M3_Present,
DataElectricity.M4_Present,
DataElectricity.M5_Present,
DataElectricity.M6_Present,
DataElectricity.M7_Present,
DataElectricity.M8_Present,
DataElectricity.Direct
FROM
(
SELECT TOP 1 DataElectricity.Date AS MaxDate,
DataElectricity.M1_Present,
DataElectricity.M2_Present,
DataElectricity.M3_Present,
DataElectricity.M4_Present,
DataElectricity.M5_Present,
DataElectricity.M6_Present,
DataElectricity.M7_Present,
DataElectricity.M8_Present,
DataElectricity.Point_id
FROM
DataElectricity
ORDER BY MaxDate DESC
)
DataElectricity INNER JOIN (Lookup INNER JOIN (Points INNER JOIN Contacts ON Points.Contacts_Id = Contacts.Id) ON Lookup.Lookup_Id = Contacts.Group_1) ON DataElectricity.Point_Id = Points.Id
WHERE
((Lookup.Lookup_Name)=Lookup_Name)
ORDER BY
Lookup.Lookup_Name, Contacts.Name, MaxDate DESC;
I do have a Google Drive file showing a small example of the data tables and the desired result, with hopefully a clear guide as to how the tables connect.
https://docs.google.com/file/d/0BybrcUCD29TxWVRsV1VtTm1Bems/edit?usp=sharing
The actual data contains hundreds of Site Groups each with potentially hundreds of sites.
I would like my end users to be able to select the Site Group name from the Lookup.Lookup_Name list and for it to return all the relevant sites and readings.
.... I really hope that makes sense !

sqlzoo track join exercise

I've found a great site to practice SQL - http://sqlzoo.net. My SQL is very weak, which is why I want to improve it by working on the exercises online. But I have this one problem that I cannot solve. Can you please give me a hand?
3a. Find the songs that appear on more than 2 albums. Include a count of the number of times each shows up.
album(asin, title, artist, price, release, label, rank)
track(album, dsk, posn, song)
My answer was marked incorrect when I ran the query.
select a.song, count(a.song) from track a, track b
where a.song = b.song
a.album != b.album
group by a.song
having count(a.song) > 2
thanks in advance! :D
I realize this answer may be late, but for future reference to anyone taking on this tutorial, the answer is as follows:
SELECT track.song, count(DISTINCT album.title)
FROM album INNER JOIN track ON (album.asin = track.album)
GROUP BY track.song
HAVING count(DISTINCT album.title) > 2
Something that may help you with this kind of query: what to group by is usually signalled by the word "each". As per the tip in the previous answers, you want to count distinct albums, since the database description mentions that album titles would be repeated when the two tables are joined.
Your original answer is very close, with the GROUP BY and HAVING clause. What is wrong, is just that you don't need to join the track table against itself.
SELECT song, count(*)
FROM track
GROUP BY song
HAVING count(*) > 2
Another answer here uses COUNT(DISTINCT album), which is necessary only if a song can appear on an album more than once.
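The difference between the two counts only shows up when a song is listed twice on the same album. A small sketch, assuming Python's sqlite3 module as a stand-in for the sqlzoo engine and a handful of made-up track rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track (album TEXT, dsk INTEGER, posn INTEGER, song TEXT);
-- Song X is on three albums; song Y is on album A twice (two discs) and on B;
-- song Z is on one album only.
INSERT INTO track VALUES
  ('A', 1, 1, 'X'), ('B', 1, 1, 'X'), ('C', 1, 1, 'X'),
  ('A', 1, 2, 'Y'), ('A', 2, 1, 'Y'), ('B', 1, 2, 'Y'),
  ('A', 1, 3, 'Z');
""")

# Counts track rows: Y qualifies even though it is on only two albums.
by_rows = conn.execute("""
SELECT song, COUNT(*) FROM track
GROUP BY song HAVING COUNT(*) > 2 ORDER BY song
""").fetchall()

# Counts distinct albums: only X qualifies.
by_albums = conn.execute("""
SELECT song, COUNT(DISTINCT album) FROM track
GROUP BY song HAVING COUNT(DISTINCT album) > 2 ORDER BY song
""").fetchall()

print(by_rows)
print(by_albums)
```

If the data guarantees one row per song per album, both queries return the same result and COUNT(*) is enough.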
If they support nested queries, you can:
select song, count(*)
from (
select a.song
from track a
group by a.song, a.album
) t
group by song
having count(*) > 2
or (the best way to write it), if they support this syntax:
select a.song, count(distinct a.album)
from track a
group by a.song
having count(distinct a.album) > 2

JOIN; only one record please!

OK, I have a complicated query from a poorly designed DB... In one query I need to get a count from one database, information from another with a link from another, here goes:
Each blog has a type (news, report, etc.) and a section id for a certain part of the site, but it can also be linked to multiple computer games and sections.
type ( blog_id, title, body, etc...) // yes I know the type is the name of the blog and not just an id number in the table not my design
blog_link ( blog_id, blog_type, section_id, game_id )
blog_comments (blog_id, blog_type, comment, etc...)
So the query goes a little like this:
SELECT bl.`blog_id`, count(bc.`blog_id`) AS 'comment_count', t.`added`
FROM blog_link bl
JOIN type t ON t.`id` = bl.`blog_id`
JOIN blog_comments bc ON (`item_id` = bl.`blog_id` AND `blog_type` = '[$type]')
WHERE bl.`section_id` = [$section_id] AND bl.`blog_type` = '[$type]'
GROUP BY bl.`blog_id`
ORDER BY `added` DESC
LIMIT 0,20
Now this is fine so long as I do not have multiple games associated with one blog.
Edit: So currently if more than one game is associated the comment_count is multiplied by the amount of games associated... not good.
I have no idea how I could do this... It just isn't working! If I could somehow group by the blog_id before I join it would be gold... anyone got an Idea?
Many thanks in advance
Dorjan
edit2: I've offered a bounty as this problem surely can be solved!! Come on guys!
It seems like you just want to get a DISTINCT count, so just add DISTINCT inside the count. Although you will need to add some sort of unique identifier for each comment. Ideally you would have a unique id (ie. auto increment) for each comment, but if you don't you could probably use blog_id+author+timestamp.
SELECT bl.`blog_id`, count(DISTINCT CONCAT(bc.`blog_id`,bc.`author`,bc.`timestamp`)) AS 'comment_count',...
That should give you a unique comment count.
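Here is a minimal sketch of why the plain count is inflated and how a DISTINCT count corrects it, using Python's sqlite3 module in place of MySQL (an assumption). The comment_id column is hypothetical: it plays the role of the unique identifier the answer recommends, and the section id 5, type 'news' and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_link (blog_id INTEGER, blog_type TEXT, section_id INTEGER, game_id INTEGER);
-- comment_id is a hypothetical auto-increment style key.
CREATE TABLE blog_comments (comment_id INTEGER PRIMARY KEY, item_id INTEGER,
                            blog_type TEXT, comment TEXT);
-- Blog 1 is linked to two games, so it has two rows in blog_link.
INSERT INTO blog_link VALUES (1, 'news', 5, 10), (1, 'news', 5, 11);
INSERT INTO blog_comments VALUES
  (1, 1, 'news', 'first'), (2, 1, 'news', 'second'), (3, 1, 'news', 'third');
""")

counts = conn.execute("""
SELECT bl.blog_id,
       COUNT(bc.item_id)             AS inflated_count,  -- 3 comments x 2 links
       COUNT(DISTINCT bc.comment_id) AS comment_count    -- each comment once
FROM blog_link bl
JOIN blog_comments bc
  ON bc.item_id = bl.blog_id AND bc.blog_type = bl.blog_type
WHERE bl.section_id = ? AND bl.blog_type = ?
GROUP BY bl.blog_id
""", (5, "news")).fetchall()

print(counts)
```

The values are passed as bound parameters here rather than spliced into the SQL string, which avoids quoting problems.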
I think you need to get the blogs of type "X" first, then do a count of comments for those blogs.
SELECT
EXPR1.blog_id,
count(bc.`blog_id`) AS 'comment_count'
FROM
(
SELECT
bl.blog_id, t.added
FROM
blog_link bl
JOIN
type t ON t.id = bl.blog_id
WHERE
bl.`section_id` = [$section_id]
AND
bl.`blog_type` = '[$type]'
GROUP BY
bl.`blog_id`
ORDER BY
`added` DESC
LIMIT 0,20
) AS EXPR1
JOIN
blog_comments bc ON
(
bc.item_id = EXPR1.blog_id
)
GROUP BY
EXPR1.blog_id
Not tested :
SELECT bl.`blog_id`, count(bc.`blog_id`) AS 'comment_count', t.`added`
FROM
(
SELECT DISTINCT blog_id, blog_type
FROM blog_link
WHERE
`section_id` = [$section_id]
AND `blog_type` = '[$type]'
) bl
INNER JOIN blog_comments bc ON (
bc.`item_id` = bl.`blog_id` AND bc.`blog_type` = bl.`blog_type`
)
INNER JOIN type t ON t.`id` = bl.`blog_id`
GROUP BY bl.`blog_id`
ORDER BY t.`added` DESC
LIMIT 0,20
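The idea of collapsing blog_link to one row per blog before joining can be checked with a small sketch. It assumes Python's sqlite3 module instead of MySQL, leaves out the type table for brevity, and uses made-up rows with section id 5 and type 'news'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_link (blog_id INTEGER, blog_type TEXT, section_id INTEGER, game_id INTEGER);
CREATE TABLE blog_comments (item_id INTEGER, blog_type TEXT, comment TEXT);
-- Blog 1 is linked to two games; blog 2 to one game.
INSERT INTO blog_link VALUES (1, 'news', 5, 10), (1, 'news', 5, 11), (2, 'news', 5, 12);
INSERT INTO blog_comments VALUES
  (1, 'news', 'first'), (1, 'news', 'second'), (1, 'news', 'third'),
  (2, 'news', 'only one');
""")

# De-duplicate the link rows first, so each comment is joined exactly once.
deduped = conn.execute("""
SELECT bl.blog_id, COUNT(bc.item_id) AS comment_count
FROM (SELECT DISTINCT blog_id, blog_type
      FROM blog_link
      WHERE section_id = ? AND blog_type = ?) bl
JOIN blog_comments bc
  ON bc.item_id = bl.blog_id AND bc.blog_type = bl.blog_type
GROUP BY bl.blog_id
ORDER BY bl.blog_id
""", (5, "news")).fetchall()

print(deduped)
```

Because the derived table has one row per blog, the comment count no longer depends on how many games a blog is linked to.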