Trouble with PIVOT in SQL (Azure SQL) - sql

I need to list survey responses in one row. The table of survey responses lists a questionID and a ResponseID (these are multiple choice questions), so one row for each response. There are 12 questions. Things like the date of the response, worker who conducted the survey, and worker who entered the survey are kept in other tables.
So, I have a query that gets the responses for one survey into 12 rows. Now I need to get all that into one row. Pivot, right?
But I could never get it to work. :-( Tried several solutions from this and other fora (including Mickey's documentation here: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017).
Then I found this solution, that doesn't use pivot at all, here: SQL Pivot Table Grouping
It worked great, but the example has only two questions. Some of our surveys will have over 50 questions, so I'm guessing that won't be a very elegant solution.
So I'm back to my pivot issue.
Experience level is somewhere between idiot and novice, so I'm probably missing something obvious.
Here's the query by itself (works as expected):
SELECT AssessmentResponses.ID, AssessmentQuestions.QuestionNumber,
AssessmentResponseAnswers.QuestionID,
AssessmentAnswerChoices.AnswerChoiceNumber
FROM (AssessmentResponses RIGHT JOIN AssessmentResponseAnswers
ON AssessmentResponses.ID = AssessmentResponseAnswers.AssessmentResponseID)
LEFT JOIN (AssessmentQuestions RIGHT JOIN AssessmentAnswerChoices
ON AssessmentQuestions.ID = AssessmentAnswerChoices.AssessmentQuestionID)
ON AssessmentResponseAnswers.AnswerChoiceID = AssessmentAnswerChoices.AnswerChoiceID
WHERE AssessmentResponses.AssessmentID = 1 AND AssessmentResponses.RespondentID = 44;
Here's how I tried to make it pivot:
SELECT ID, [1A], [1B], [2A], [2B], [3A], [3B], [4A], [4B], [5A], [5B], [6A], [6B]
FROM (
SELECT AssessmentResponses.ID, AssessmentQuestions.QuestionNumber,
AssessmentResponseAnswers.QuestionID,
AssessmentAnswerChoices.AnswerChoiceNumber
FROM (AssessmentResponses RIGHT JOIN AssessmentResponseAnswers
ON AssessmentResponses.ID = AssessmentResponseAnswers.AssessmentResponseID)
LEFT JOIN (AssessmentQuestions RIGHT JOIN AssessmentAnswerChoices
ON AssessmentQuestions.ID = AssessmentAnswerChoices.AssessmentQuestionID)
ON AssessmentResponseAnswers.AnswerChoiceID = AssessmentAnswerChoices.AnswerChoiceID
WHERE AssessmentResponses.AssessmentID = 1 AND AssessmentResponses.RespondentID = 44
) AS Src
PIVOT
(
MAX(AnswerChoiceNumber)
FOR QuestionNumber IN ([1A], [1B], [2A], [2B], [3A], [3B], [4A], [4B], [5A], [5B], [6A], [6B])
)
AS Pvt;
I was hoping that would give me 1 row with 13 columns (ID plus the twelve questions). But it gave me still 12 rows: the 13 columns are there, and it just gives null values for 11 of the twelve questions. (In row1, 1A has an answer; in row2, 1B has an answer, etc.)
What am I missing?

Tweaked your code just a bit to get rid of "QuestionID" in the subquery. It is not involved in what you're pivoting on, so SQL Server will think it's one of your keys.
SELECT ID, [1A], [1B], [2A], [2B], [3A], [3B], [4A], [4B], [5A], [5B], [6A], [6B]
FROM (
SELECT AssessmentResponses.ID, AssessmentQuestions.QuestionNumber,
AssessmentAnswerChoices.AnswerChoiceNumber
FROM (AssessmentResponses RIGHT JOIN AssessmentResponseAnswers
ON AssessmentResponses.ID = AssessmentResponseAnswers.AssessmentResponseID)
LEFT JOIN (AssessmentQuestions RIGHT JOIN AssessmentAnswerChoices
ON AssessmentQuestions.ID = AssessmentAnswerChoices.AssessmentQuestionID)
ON AssessmentResponseAnswers.AnswerChoiceID = AssessmentAnswerChoices.AnswerChoiceID
WHERE AssessmentResponses.AssessmentID = 1 AND AssessmentResponses.RespondentID = 44
) AS Src
PIVOT
(
MAX(AnswerChoiceNumber)
FOR QuestionNumber IN ([1A], [1B], [2A], [2B], [3A], [3B], [4A], [4B], [5A], [5B], [6A], [6B])
)
AS Pvt;
I imagine you're used to the designer. A few tips (not implemented above) to make code more readable:
You can usually omit parentheses around your join statements
I'd re-work your joins to associate one on-statement per join statement
Aliases!

In my opinion this is not exactly a work to do with pivot. Maybe like this:
SELECT ar.ID
,(
SELECT aac.AnswerChoiceNumber
FROM AssessmentResponseAnswers ara, AssessmentAnswerChoices aac, AssessmentQuestions aq
WHERE ar.ID=ara.AssessmentResponseID
AND ara.AnswerChoiceID=aac.AnswerChoiceID
AND aq.ID=aac.AssessmentQuestionID
AND aq.QuestionNumber='1A'
) 1A
,(
SELECT aac.AnswerChoiceNumber
FROM AssessmentResponseAnswers ara, AssessmentAnswerChoices aac, AssessmentQuestions aq
WHERE ar.ID=ara.AssessmentResponseID
AND ara.AnswerChoiceID=aac.AnswerChoiceID
AND aq.ID=aac.AssessmentQuestionID
AND aq.QuestionNumber='1B'
) 1B
, (...)
FROM AssessmentResponses ar
WHERE ar.AssessmentID=1
AND ar.RespondentID=44
I was having problems with your usage of the joins but I tried to figure it out - maybe it is correct.

Related

Query does execute without errors, but returns no output

My query here has a sub-query in it but it returns no output, but in reality it has to give some output because I manually checked and output exists.I have posted the query below.
select mac.mac_id,mac.mac1,mac.mac_type,record.soc_id
from mso_charter.mac
join record on mac.record_id = record.record_id
where mac.mac_type='ethB' and record.soc_id IN (select soc from d);
Sample data is below
mac_id mac1 mac_type record_id--- for table mac
1 6142 ethA 1
2 6412 ethB 1
3 2313 ethC 1
record_id soc_id ---- for table record
1 Qu132
1 as432
1 342aq
soc --- for table d
a12w2
23we
qw12
mso_charter is the schema name mac,d and record is the table name.
Note that your subquery is actually still a join and can be written that way:
select mac.mac_id,mac.mac1,mac.mac_type,record.soc_id
from mso_charter.mac
join record using(record_id)
join d on record.soc_id = d.soc
where mac.mac_type='ethB';
As per the comment we still need a data set to reproduce and help.
Should be select soc_id from d instead of select select soc from d
According to your sample data, d has a column soc_id. That should be used for the comparison:
select m.mac_id, m.mac1, m.mac_type, r.soc_id
from mso_charter.mac m join
record r
on m.record_id = r.record_id
where m.mac_type = 'ethB' and
r.soc_id in (select d.soc_id from d);
It is possible that ids look the same but are not, because of international characters, hidden characters, spaces in the wrong place, and so on.
If this doesn't work then try the following:
Remove the soc_id condition and see if any rows match the first condition and join.
If that still returns nothing, remove the entire where clause to see if anything matches the join.
None of your record.soc_id match any of your d.soc_id. So you get no row.
Also, you write select soc from d. soc, not soc_id. Typo or error?
So, thanks to all who tried helping me in this situation. I actually had did a very silly mistake.The reason I am posting the right answer is probably because if someone else in future get stuck in such a issue or something similar it would be helpful to them.
select m.mac_id,m.mac,m.mac_type,r.soc_id
from mso_charter.mac m
join mso_charter.record r on m.record_id = r.record_id
where m.mac_type = 'ethB' and r.soc_id IN (select d.soc_id from d);
Mistake was I had not mentioned the schema name while performing join and there were multiple tables named record in other schema's, it was just out of frustration we tend to forget small things which costed me few hours to work over.

COUNT is outputting more than one row

I am having a problem with my SQL query using the count function.
When I don't have an inner join, it counts 55 rows. When I add the inner join into my query, it adds a lot to it. It suddenly became 102 rows.
Here is my SQL Query:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
GROUP BY [fmsStage].[dbo].[File].[FILENUMBER]
Also, I have to do TOP 1 at the SELECT statement because it returns 51 rows with random numbers inside of them. (They are probably not random, but I can't figure out what they are.)
What do I have to do to make it just count the rows from [fmsStage].[dbo].[file].[FILENUMBER]?
First, your query would be much clearer like this:
SELECT COUNT(f.[FILENUMBER])
FROM [fmsStage].[dbo].[File] f INNER JOIN
[fmsStage].[dbo].[Container] c
ON v.[FILENUMBER] = c.[FILENUMBER]
WHERE f.[RELATIONCODE] = 'SHIP02' AND
c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08';
No GROUP BY is necessary. Otherwise you'll just one row per file number, which doesn't seem as useful as the overall count.
Note: You might want COUNT(DISTINCT f.[FILENUMBER]). Your question doesn't provide enough information to make a judgement.
Just remove GROUP BY Clause
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'

Include missing years in Group By query

I am fairly new in Access and SQL programming. I am trying to do the following:
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
and group by year even when there is no amount in some of the years. I would like to have these years listed as well for a report with charts. I'm not certain if this is possible, but every bit of help is appreciated.
My code so far is as follows:
SELECT
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
FROM
Base_CustomerT
INNER JOIN (
SO_SalesOrderPaymentHistoryLineT
INNER JOIN SO_SalesOrderT
ON SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId
) ON Base_CustomerT.CustomerId = SO_SalesOrderT.CustomerId
GROUP BY
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
SO_SalesOrderPaymentHistoryLineT.PaymentType,
Base_CustomerT.IsActive
HAVING
(((SO_SalesOrderPaymentHistoryLineT.PaymentType)=1)
AND ((Base_CustomerT.IsActive)=Yes))
ORDER BY
Base_CustomerT.SalesRep,
Base_CustomerT.Customer;
You need another table with all years listed -- you can create this on the fly or have one in the db... join from that. So if you had a table called alltheyears with a column called y that just listed the years then you could use code like this:
WITH minmax as
(
select min(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as minyear,
max(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as maxyear)
from SalesOrderPaymentHistoryLineT
), yearsused as
(
select y
from alltheyears, minmax
where alltheyears.y >= minyear and alltheyears.y <= maxyear
)
select *
from yearsused
join ( -- your query above goes here! -- ) T
ON year(T.SO_SalesOrderPaymentHistoryLineT.DatePaid) = yearsused.y
You need a data source that will provide the year numbers. You cannot manufacture them out of thin air. Supposing you had a table Interesting_year with a single column year, populated, say, with every distinct integer between 2000 and 2050, you could do something like this:
SELECT
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
Sum(NZ(data.Amount)) AS [Sum Of PaymentPerYear]
FROM
(SELECT * FROM Base_CustomerT INNER JOIN Year) AS base
LEFT JOIN
(SELECT * FROM
SO_SalesOrderT
INNER JOIN SO_SalesOrderPaymentHistoryLineT
ON (SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId)
) AS data
ON ((base.CustomerId = data.CustomerId)
AND (base.year = Year(data.DatePaid))),
WHERE
(data.PaymentType = 1)
AND (base.IsActive = Yes)
AND (base.year BETWEEN
(SELECT Min(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT)
AND (SELECT Max(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT))
GROUP BY
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
ORDER BY
base.SalesRep,
base.Customer;
Note the following:
The revised query first forms the Cartesian product of BaseCustomerT with Interesting_year in order to have base customer data associated with each year (this is sometimes called a CROSS JOIN, but it's the same thing as an INNER JOIN with no join predicate, which is what Access requires)
In order to have result rows for years with no payments, you must perform an outer join (in this case a LEFT JOIN). Where a (base customer, year) combination has no associated orders, the rest of the columns of the join result will be NULL.
I'm selecting the CustomerId from Base_CustomerT because you would sometimes get a NULL if you selected from SO_SalesOrderT as in the starting query
I'm using the Access Nz() function to convert NULL payment amounts to 0 (from rows corresponding to years with no payments)
I converted your HAVING clause to a WHERE clause. That's semantically equivalent in this particular case, and it will be more efficient because the WHERE filter is applied before groups are formed, and because it allows some columns to be omitted from the GROUP BY clause.
Following Hogan's example, I filter out data for years outside the overall range covered by your data. Alternatively, you could achieve the same effect without that filter condition and its subqueries by ensuring that table Intersting_year contains only the year numbers for which you want results.
Update: modified the query to a different, but logically equivalent "something like this" that I hope Access will like better. Aside from adding a bunch of parentheses, the main difference is making both the left and the right operand of the LEFT JOIN into a subquery. That's consistent with the consensus recommendation for resolving Access "ambiguous outer join" errors.
Thank you John for your help. I found a solution which works for me. It looks quiet different but I learned a lot out of it. If you are interested here is how it looks now.
SELECT DISTINCTROW
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
FROM
Base_Customer_RevenueYearQ
LEFT JOIN CustomerPaymentPerYearQ
ON (Base_Customer_RevenueYearQ.RevenueYear = CustomerPaymentPerYearQ.[RevenueYear])
AND (Base_Customer_RevenueYearQ.CustomerId = CustomerPaymentPerYearQ.CustomerId)
GROUP BY
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
;

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)

Why do I have to use DISTINCT for this to work?

here's my problem: I have an SQL query that makes 4 calls to a lookup table to return their values from a list of combinations in another table. I finally got this working, and for some reason, when I run the query without DISTINCT, I get a ton of data back, so I'm guessing that I'm either missing something or not doing this correctly. It would be really great if this would not only work, but also return the list alphabetically by the first colour name.
I'm putting my SQL here I hope I've explained this well enough:
SELECT DISTINCT
colour1.ColourID AS colour1_ColourID,
colour1.ColourName AS colour1_ColourName,
colour1.ColourHex AS colour1_ColourHex,
colour1.ManufacturerColourID AS colour1_ManufacturerColourID,
colour2.ColourID AS colour2_ColourID,
colour2.ColourName AS colour2_ColourName,
colour2.ColourHex AS colour2_ColourHex,
colour2.QEColourID2 AS colour2_QEColourID2,
colour3.ColourID AS colour3_ColourID,
colour3.ColourName AS colour3_ColourName,
colour3.ColourHex AS colour3_ColourHex,
colour3.QEColourID3 AS colour3_QEColourID3,
colour4.ColourID AS colour4_ColourID,
colour4.ColourName AS colour4_ColourName,
colour4.ColourHex AS colour4_ColourHex,
colour4.QEColourID4 AS colour4_QEColourID4,
Combinations.ID,
Combinations.ManufacturerColourID AS Combinations_ManufacturerColourID,
Combinations.QEColourID2 AS Combinations_QEColourID2,
Combinations.QEColourID3 AS Combinations_QEColourID3,
Combinations.QEColourID4 AS Combinations_QEColourID4,
Combinations.ColourSupplierID,
ColourSuppliers.ColourSupplier
FROM
ColourSuppliers INNER JOIN
(
colour4 INNER JOIN
(
colour3 INNER JOIN
(
colour2 INNER JOIN
(
colour1 INNER JOIN Combinations ON
colour1.ColourID=Combinations.ManufacturerColourID
) ON colour2.ColourID=Combinations.QEColourID2
) ON colour3.ColourID=Combinations.QEColourID3
) ON colour4.ColourID=Combinations.QEColourID4
) ON ColourSuppliers.ColourSupplierID=Combinations.ColourSupplierID
WHERE Combinations.ColourSupplierID = ?
Thanks
Steph
It looks as though you've probably got multiple records for each set of four colour combinations in the Combinations table - posting the structure of the table might help us to work it out.
Adding the clause order by colour1.ColourName to the end of the query should sort it alphabetically by the first colour name.
My guess (and it is a guess because your SQL query is very wide!) is that you're getting the cartesian product.