Complex SQL Pivot query - sql

A quick background so that my problem makes sense: The system collects data from the user in the form of questionnaires. Users belong to Organisations, Organisations belong to Sectors, and Questions/Calculations (as found on the questionnaires) differ across the Sectors. (Questions are answered by users; Calculations are calculated by the system.)
The following tables exist:
Sectors (SectorID, Name)
Organisations (OrganisationID, Name, SectorID)
Years (YearID, Name)
Questions (QuestionID, DisplayText, CommonName, SectorID)
Answers (AnswerID, Answer, OrganisationID, YearID, QuestionID)
Calculations (CalculationID, DisplayText, CommonName, SectorID)
CalculationResults (CalculationResultID, Result, OrganisationID, YearID, CalculationID)
I need to display data in the following way:
The thing that makes this particularly complex (for me) is that questions are displayed (to the user) in different ways across the different sectors that they belong to, but some of them can still be common questions. E.g. "Manufacturing sales" would be the same thing as "Sales (manufacturing)". I need to be using the CommonName field to determine commonality.
I've managed to use SQL Pivot to get close to what I want - SQL Fiddle (if you run the SQL you'll notice the nulls and the "commonality" issue). However some things are missing from my attempt:
Commonality and column names - I need the column names to be the CommonName field, not the QuestionID field.
I've only selected from the Answers table - I need to also select from the CalculationResults table which is identically structured.
Edit: Desired result with the SQL Fiddle data is:
(The two blocks with the orange corners need to shift all the way to the left, so that there are a total of 3 columns for the Questions - the 3 unique CommonName values. The next 3 columns are for the 3 unique CommonName values for Calculations. I hope I've made sense, if not let me know.)
Edit2: Another edit just for fun. I've definitely thought about redesigning the db but it's not an option at this stage - too risky on this legacy system. In case anyone saw the design and thought that. I need a solution in the form of Pivot hopefully.

Sometimes, instead of PIVOT, you can use an aggregate over a CASE expression, e.g. MAX(CASE WHEN ... THEN ... END), to get the same data. Sometimes it's also faster.
For your problem you can use OUTER APPLY with a dynamically built list of MAX(CASE) columns:
DECLARE @Questions NVARCHAR(MAX),
        @Calculations NVARCHAR(MAX),
        @Sql NVARCHAR(MAX)
-- One MAX(CASE ...) column per distinct question CommonName
SELECT @Questions = COALESCE(@Questions + ', ', '')
    + 'MAX(CASE WHEN q.CommonName = ''' + CommonName + ''' THEN a.Answer END) AS ' + QUOTENAME(CommonName)
FROM Questions
GROUP BY CommonName
-- One MAX(CASE ...) column per distinct calculation CommonName
SELECT @Calculations = COALESCE(@Calculations + ', ', '')
    + 'MAX(CASE WHEN c.CommonName = ''' + CommonName + ''' THEN cr.Result END) AS ' + QUOTENAME(CommonName)
FROM Calculations
GROUP BY CommonName
SET @Sql = N'
SELECT
    o.Name As [Organisation],
    y.Name As [Year],
    q.*,
    c.*
FROM
    Organisations o
    CROSS JOIN Years y
    OUTER APPLY (
        SELECT ' + @Questions + '
        FROM Answers a
        JOIN Questions q ON a.QuestionID = q.QuestionID
        WHERE a.OrganisationID = o.OrganisationID
        AND a.YearID = y.YearID
    ) q
    OUTER APPLY (
        SELECT ' + @Calculations + '
        FROM CalculationResults cr
        JOIN Calculations c ON cr.CalculationID = c.CalculationID
        WHERE cr.OrganisationID = o.OrganisationID
        AND cr.YearID = y.YearID
    ) c
'
-- Run the generated statement (the snippet above only builds the string)
EXEC sp_executesql @Sql
SQL FIDDLE DEMO
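For anyone who wants to try the MAX(CASE) idea without a SQL Server instance, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is only a stand-in (an assumption, not what the answer targets), the sample rows are made up, and quote_ident / quote_literal are hypothetical helpers playing the role of QUOTENAME and the doubled single quotes. It covers Questions/Answers only; Calculations would follow the same pattern.

```python
import sqlite3

# In-memory stand-in for the real schema (trimmed to the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Questions (QuestionID INTEGER PRIMARY KEY, CommonName TEXT, SectorID INTEGER);
CREATE TABLE Answers (Answer TEXT, OrganisationID INTEGER, YearID INTEGER, QuestionID INTEGER);
INSERT INTO Questions VALUES (1, 'Sales', 1), (2, 'Sales', 2), (3, 'Staff', 1);
INSERT INTO Answers VALUES ('100', 10, 2014, 1), ('7', 10, 2014, 3), ('250', 20, 2014, 2);
""")


def quote_ident(name):
    # Hypothetical helper: quote a column alias, doubling embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    # Hypothetical helper: quote a string literal, doubling embedded quotes.
    return "'" + value.replace("'", "''") + "'"


# One MAX(CASE ...) column per distinct CommonName, built dynamically.
names = [r[0] for r in conn.execute(
    "SELECT DISTINCT CommonName FROM Questions ORDER BY CommonName")]
columns = ", ".join(
    f"MAX(CASE WHEN q.CommonName = {quote_literal(n)} THEN a.Answer END) AS {quote_ident(n)}"
    for n in names
)

sql = f"""
SELECT a.OrganisationID, a.YearID, {columns}
FROM Answers a
JOIN Questions q ON q.QuestionID = a.QuestionID
GROUP BY a.OrganisationID, a.YearID
ORDER BY a.OrganisationID, a.YearID
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Questions 1 and 2 belong to different sectors but share the CommonName 'Sales', so they land in the same output column, which is the "commonality" the question asks for.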

Basically, we want the position of each QuestionID within its group, where the groups are defined by SectorID and year Name.
You can do this using PARTITION BY with something like this:
ROW_NUMBER() OVER(PARTITION BY q.SectorID, y.Name ORDER BY a.QuestionID)
This should do it:
DECLARE @cols AS NVARCHAR(MAX)
      , @query AS NVARCHAR(MAX);
-- Build the pivot column list: [1],[2],[3],...
SELECT @cols = STUFF(
    (SELECT DISTINCT
        ',' + QUOTENAME(CAST(ROW_NUMBER() OVER(PARTITION BY q.SectorID
            , y.Name ORDER BY a.QuestionID) AS VARCHAR(10)))
    FROM Answers a
    LEFT JOIN Years y ON a.YearID = y.YearID
    LEFT JOIN Organisations o ON a.OrganisationID = o.OrganisationID
    LEFT JOIN Questions q ON a.QuestionID = q.QuestionID
    FOR XML PATH(''), TYPE).value
    ('.', 'NVARCHAR(MAX)'), 1, 1, '');
SET @query = '
SELECT Organisation, Year, ' + @cols + ' from
(
    SELECT QuestionID = ROW_NUMBER() OVER(PARTITION BY q.SectorID
        , y.Name ORDER BY a.QuestionID)
    , o.Name AS Organisation
    , y.Name AS Year
    , a.Answer
    FROM Answers a
    LEFT JOIN Years y ON a.YearID = y.YearID
    LEFT JOIN Organisations o ON a.OrganisationID = o.OrganisationID
    LEFT JOIN Questions q ON a.QuestionID = q.QuestionID
) src
pivot
(
    max(Answer)
    for QuestionID in (' + @cols + ')
) piv
order by Organisation, Year
';
PRINT(@query);
EXECUTE (@query);
RESULT:

Related

The multi-part identifier could not be bound - stuff cmd

I'm attempting a Stuff Cmd to combine multiple rows to a single entry. I keep getting "The multi-part identifier "SPCLT.CD_VAL_DESC" could not be bound." (under the first SELECT statement)
STUFF(
(SELECT
',' + SPCLT.CD_VAL_DESC -- <-- the error is reported on this line
FROM
(
SELECT DISTINCT
SPCLT.CD_VAL_DESC SPECIALTY
FROM PIN_STATUS PS
INNER JOIN PROV_TYPE_SPCLT SPC
ON PS.PROV_ID = SPC.PROV_ID
AND SPC.VLDT_IND = 'Y'
INNER JOIN CODE_REF SPCLT
ON SPC.SPCLT_CD = SPCLT.CD_VAL
AND SPCLT.CD_REF_NM = 'SPECIALTY'
AND SPCLT.VLDT_IND = 'Y'
WHERE SPC.VLDT_IND = 'Y'
) SPCLTY
for xml
path('')
)
,1,1,'') SPECIALTIES
You need to pay attention to the format of your SQL, and then the answer would probably jump out and bite you on the nose... happens to everyone.
Your query:
STUFF(
(
SELECT
',' + SPCLT.CD_VAL_DESC
FROM
(
SELECT DISTINCT
SPCLT.CD_VAL_DESC SPECIALTY
FROM PIN_STATUS PS
INNER JOIN PROV_TYPE_SPCLT SPC
ON PS.PROV_ID = SPC.PROV_ID
AND SPC.VLDT_IND = 'Y'
INNER JOIN CODE_REF SPCLT
ON SPC.SPCLT_CD = SPCLT.CD_VAL
AND SPCLT.CD_REF_NM = 'SPECIALTY'
AND SPCLT.VLDT_IND = 'Y'
WHERE SPC.VLDT_IND = 'Y'
) SPCLTY
for xml path('')
)
,1,1,'') SPECIALTIES
...is divided into sub-queries. The STUFF() function is acting on the first SELECT beneath it.
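That first SELECT is taking data FROM a sub-query, which has been aliased as SPCLTY. The inner alias SPCLT is not visible outside that sub-query, so within the outer SELECT you need to reference SPCLTY and not SPCLT. And because the sub-query exposes its only column under the alias SPECIALTY, the reference becomes SPCLTY.SPECIALTY.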
Adding a bit of whitespace makes it a little clearer, I think.
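The scoping rule can be seen in isolation with a small sketch. This one uses Python's standard-library sqlite3 as a stand-in for SQL Server (an assumption; the error text differs between engines) and a cut-down, made-up CODE_REF table. The first query repeats the mistake and fails; the second references the derived table's alias and column alias and works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CODE_REF (CD_VAL TEXT, CD_VAL_DESC TEXT);
INSERT INTO CODE_REF VALUES ('01', 'Cardiology'), ('02', 'Oncology');
""")

# The derived table: inside it the alias is SPCLT, outside it is SPCLTY,
# and the only column it exposes is called SPECIALTY.
derived = "(SELECT DISTINCT SPCLT.CD_VAL_DESC AS SPECIALTY FROM CODE_REF SPCLT) SPCLTY"

# Referencing the inner alias from the outer SELECT fails.
try:
    conn.execute(f"SELECT SPCLT.CD_VAL_DESC FROM {derived}").fetchall()
    inner_alias_error = None
except sqlite3.OperationalError as exc:
    inner_alias_error = str(exc)
print("inner alias:", inner_alias_error)

# Referencing the derived table's own alias and column alias works.
rows = conn.execute(f"SELECT SPCLTY.SPECIALTY FROM {derived} ORDER BY 1").fetchall()
print("outer alias:", rows)
```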

how to omit null values using SQL query

I am trying to display only the rows in which there is data for Researchers.
I cannot manage to omit the rows with NULL values. I even tried the solution from "How to remove null rows from sql query result?".
This is my Query:
SELECT Submission.Title AS [Submission_Title], CA.Surname AS [Researchers], Submission.Status AS [Status]
FROM Submission
CROSS APPLY (SELECT STUFF((SELECT DISTINCT ', ' + r.Surname
FROM ResearcherSubmission rs INNER JOIN Researcher r
ON r.ResearcherID = rs.ResearcherID
WHERE CONCAT (DATENAME(MONTH,[Submission].[SubmissionDate]), ' ',DATEPART (YEAR,[Submission].[SubmissionDate])) = 'October 2015'
AND Submission.SubmissionID = rs.SubmissionID
FOR XML PATH (''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, ' ')) AS CA (Surname)
GROUP BY convert(varchar(10),datename(month,Submission.SubmissionDate)), Submission.Title, CA.Surname, Submission.Status;
This is my Current output:
Any suggestions? Thank you.
Quickfix, without reading query:
WITH cte AS
(
SELECT Submission.Title AS [Submission_Title], CA.Surname AS [Researchers], Submission.Status AS [Status]
FROM Submission
CROSS APPLY (SELECT STUFF((SELECT DISTINCT ', ' + r.Surname
FROM ResearcherSubmission rs INNER JOIN Researcher r
ON r.ResearcherID = rs.ResearcherID
WHERE CONCAT (DATENAME(MONTH,[Submission].[SubmissionDate]), ' ',DATEPART (YEAR,[Submission].[SubmissionDate])) = 'October 2015'
AND Submission.SubmissionID = rs.SubmissionID
FOR XML PATH (''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, ' ')) AS CA (Surname)
GROUP BY convert(varchar(10),datename(month,Submission.SubmissionDate)), Submission.Title, CA.Surname, Submission.Status
)
SELECT *
FROM cte
WHERE Researchers IS NOT NULL;
There is probably a more elegant solution, but you need to share sample data and structures.
This part may cause problems:
SELECT DISTINCT ', ' + r.Surname
Try CONCAT instead, or:
SELECT DISTINCT ', ' + ISNULL(r.Surname, '')
You should filter out the researchers before the group by rather than afterwards. When possible, it is better (performance-wise) to put conditions before aggregation.
SELECT s.Title AS Submission_Title, CA.Surname AS Researchers, s.Status
FROM Submission s CROSS APPLY
     (SELECT STUFF((SELECT DISTINCT ', ' + r.Surname
                    FROM ResearcherSubmission rs INNER JOIN
                         Researcher r
                         ON r.ResearcherID = rs.ResearcherID
                    WHERE s.SubmissionID = rs.SubmissionID
                    FOR XML PATH (''), TYPE).value('.', 'NVARCHAR(MAX)'
                   ), 1, 2, ' ')
     ) AS CA(Surname)
WHERE s.SubmissionDate >= '2015-10-01' AND s.SubmissionDate < '2015-11-01' AND
      CA.Surname IS NOT NULL -- drop submissions with no researchers
GROUP BY YEAR(s.SubmissionDate), MONTH(s.SubmissionDate), s.Title, CA.Surname, s.Status;
Note the changes made:
Table aliases make the query easier to write and to read.
I changed the date comparison to have no functions on the date itself. This would allow SQL Server to use an index, if appropriate.
I also moved the date comparison from the CROSS APPLY subquery to the outer query. This could be a big gain in efficiency. Why do the extra work for rows that will be filtered out anyway?
I added the IS NOT NULL condition to the WHERE clause.
The date key in the outer GROUP BY is redundant because the query is only using one month of data. I simplified the logic but left it.
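A minimal sketch of the two filters (the half-open date range and the IS NOT NULL test), written in Python with the standard-library sqlite3 module so it can be run anywhere. The assumptions are loud: SQLite stands in for SQL Server, group_concat stands in for the STUFF/FOR XML PATH concatenation, dates are stored as ISO 'YYYY-MM-DD' text, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Submission (SubmissionID INTEGER PRIMARY KEY, Title TEXT, SubmissionDate TEXT);
CREATE TABLE Researcher (ResearcherID INTEGER PRIMARY KEY, Surname TEXT);
CREATE TABLE ResearcherSubmission (ResearcherID INTEGER, SubmissionID INTEGER);
INSERT INTO Submission VALUES
    (1, 'Paper A', '2015-10-05'),
    (2, 'Paper B', '2015-10-20'),   -- no researchers linked
    (3, 'Paper C', '2015-11-02');   -- outside October 2015
INSERT INTO Researcher VALUES (1, 'Naidoo'), (2, 'Smith');
INSERT INTO ResearcherSubmission VALUES (1, 1), (2, 1), (1, 3);
""")

sql = """
SELECT Title, Researchers
FROM (
    SELECT s.Title,
           -- NULL when the submission has no linked researchers
           (SELECT group_concat(r.Surname, ', ')
            FROM ResearcherSubmission rs
            JOIN Researcher r ON r.ResearcherID = rs.ResearcherID
            WHERE rs.SubmissionID = s.SubmissionID) AS Researchers
    FROM Submission s
    -- half-open range: no function wrapped around the date column
    WHERE s.SubmissionDate >= '2015-10-01' AND s.SubmissionDate < '2015-11-01'
) x
WHERE Researchers IS NOT NULL
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Only 'Paper A' survives: 'Paper B' is dropped by the IS NOT NULL test and 'Paper C' by the date range.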

display more than one value using a SQL query

I am trying to display multiple authors per title in a single column. At the moment there are repeating rows, because some Titles have more than one FirstName. Is there a form of concatenation that can be used to resolve this and display all the authors in a single field, perhaps separated by commas?
This is my current query:
SELECT
Submission.Title, Researcher.FirstName, Submission.Type
FROM
Submission
INNER JOIN
((Faculty
INNER JOIN
School ON Faculty.FacultyID = School.[FacultyID])
INNER JOIN
(Researcher
INNER JOIN
ResearcherSubmission ON Researcher.ResearcherID = ResearcherSubmission.ResearcherID)
ON School.SchoolID = Researcher.SchoolID)
ON Submission.SubmissionID = ResearcherSubmission.SubmissionID
GROUP BY
Submission.Title, Researcher.FirstName, Submission.Type;
This is the output it generates:
This is the output I am trying to generate:
Title                  FirstName               Type
--------------------------------------------------------------
21st Century Business  Matthew, Teshar         Book Chapter
A Family Tree...       Keshant, Lawrence       Book Chapter
Benefits of BPM...     Jafta                   Journal Article
Business Innovation    Matthew, Morna, Teshar  Book Chapter
You may include the concatenation logic within a CROSS APPLY:
SELECT
Submission.Title
, CA.FirstNames
, Submission.Type
FROM Submission
CROSS APPLY (
SELECT
STUFF((
SELECT /* DISTINCT ??? */
', ' + r.FirstName
FROM ResearcherSubmission rs
INNER JOIN Researcher r ON r.ResearcherID = rs.ResearcherID
WHERE Submission.SubmissionID = rs.SubmissionID
FOR XML PATH (''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, ' ')
) AS CA (FirstNames)
GROUP BY
Submission.Title
, CA.FirstNames
, Submission.Type
;
NB: I'm not sure if you need to include DISTINCT in the subquery when concatenating the names, e.g. if there were 'Jane' (Smith) and 'Jane' (Jones), do you want the final list as 'Jane' or 'Jane, Jane'?
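The DISTINCT question is easy to see on a tiny example. The sketch below uses Python's standard-library sqlite3, with SQLite's group_concat standing in for the STUFF/FOR XML PATH idiom (an assumption; the syntax is not T-SQL). The flat Authors table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (Title TEXT, ResearcherID INTEGER, FirstName TEXT);
INSERT INTO Authors VALUES
    ('Cellular', 1, 'Jane'),   -- two different researchers,
    ('Cellular', 2, 'Jane'),   -- both called Jane
    ('BPM', 3, 'Jafta');
""")

rows = conn.execute("""
SELECT Title,
       group_concat(FirstName, ', ')    AS AllNames,      -- keeps duplicates
       group_concat(DISTINCT FirstName) AS DistinctNames  -- collapses them
FROM Authors
GROUP BY Title
ORDER BY Title
""").fetchall()
for row in rows:
    print(row)
```

With DISTINCT the two Janes collapse into one name, which hides the fact that the title has two authors; without it the list reads 'Jane, Jane'.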
You can do this in your application logic as well.
But if you want to do this with a query, you should be able to do something like this:
SELECT DISTINCT
sm.Title,
STUFF(
(SELECT ', ' + r.FirstName
FROM ResearcherSubmission rs
INNER JOIN Researcher r ON r.ResearcherID = rs.ResearcherID
WHERE sm.SubmissionID = rs.SubmissionID
FOR XML PATH('')), 1, 2, '') AS FirstNames,
sm.Type
FROM Submission sm
You can use the query below to generate the output you want from the output you already have.
CREATE TABLE #temptable(Title VARCHAR(200), FirstName VARCHAR(200), Type VARCHAR(200))
INSERT INTO #temptable
SELECT 'Book1','Matt','Chapter' UNION
SELECT 'Book1','Tesh','Chapter' UNION
SELECT 'BPM','Jafta','Article' UNION
SELECT 'Ethics','William','Journal' UNION
SELECT 'Ethics','Lawrence','Journal' UNION
SELECT 'Ethics','Vincent','Journal' UNION
SELECT 'Cellular','Jane','Conference'
SELECT Title
,STUFF((SELECT ', ' + CAST(FirstName AS VARCHAR(10)) [text()]
FROM #temptable
WHERE Title = t.Title
FOR XML PATH(''), TYPE)
.value('.','NVARCHAR(MAX)'),1,2,' ') List_Output
,Type
FROM #temptable t
GROUP BY Title,Type

SQL Aggregation of text on joined table

Pardon the lack of correct terminology, I'm a professional software engineer usually dealing with Direct3D frameworks. I'm self taught on databases.
I have a _People table and an _Ethnicities table. Since people may have more than one cultural group I have a link table _linkPersonEthnicity. Sample data is shown below:
What I want is output in the following form:
To illustrate the problem I present the following (runnable) query:
select lPE.Person, Sum(E.ID) as SumOfIDs,
Ethnicity = stuff(
(select ', ' + Max(E.Name) as [text()]
from _linkPersonEthnicity xPE
where xPE.Person = lPE.Person
for xml path('')
),
1, 2, '')
from _Ethnicities E
join _linkPersonEthnicity lPE on lPE.Ethnicity = E.ID
group by lPE.Person
It returns the Person's ID, a sum of the IDs found for the person's ethnicity, and concatenates the maximum Name with commas. The data is grouped correctly, and the SumOfIDs works, proving the correct data is used.
Naturally I would like to take away the Max aggregate function, but cannot since it is not in the group by list.
Any ideas how to make this work?
Thanks in advance,
AM
(Many thanks to other answers on StackOverflow for getting me this far! Particularly @Jonathan Leffler for his explanation of the partitioning process and @Marc_s for illustrating a text concatenation technique.)
I've also tried coalesce from an answer to concatenating strings by @Chris Shaffer
declare @Names VARCHAR(8000)
select @Names = COALESCE(@Names + ', ', '') + E.Name
from _Ethnicities E join _linkPersonEthnicity lPE on lPE.Ethnicity = E.ID
where lPE.Person = 1001;
select @Names
Same problem. If I remove the where and add group by the text field Name cannot be accessed.
If I understand correctly, you need the join to be in the subquery rather than the outer query:
select lPE.Person, Sum(lpe.ethnicity) as SumOfIDs,
Ethnicity = stuff((select ', ' + E.Name as [text()]
from _linkPersonEthnicity lPE2 join
_Ethnicities e
on lpe2.Ethnicity = e.id
where lpe2.Person = lPE.Person
for xml path('')
), 1, 2, '')
from _linkPersonEthnicity lPE
group by lPE.Person;
By the way, do you really want the sum of the ids or a count?

Convert a query of views into a single query with derived tables

Is it possible to convert (as in get the query as text) a query that consists of a lot of views within views into a query that is just based on the original tables?
The obvious is to go through all the views and then do it manually but wondered whether or not there was a quicker way?
It's not pretty, and like @David Faber commented, not sure how practical this will be, but here goes...
There's a whole lot of assumptions that one has to make for this to work, like
All of the views start with SELECT. (no CTE's)
The view is not enclosed in [ ] when referenced in another view
View names don't have spaces in them
This version only goes 1 level deep, but it should be possible to resolve any additional views you find in the output in a similar fashion.
And probably some more I didn't think of
I'm using the sys tables, and not the newer schema information objects, simply because I know the sys table structure better.
Assuming the following view
CREATE VIEW VW_DUMMY
AS
SELECT c.Name as Company, g.Name as [Group], gu.UserId
FROM VW_Company c
JOIN VW_Group g ON c.Id = g.CompanyId
JOIN VW_GroupUser gu ON g.Id = gu.GroupId AND gu.CompanyId = c.Id
Here's what I did.
1) Grab the view definition from syscomments for VW_DUMMY.
2) Strip off the CREATE VIEW part.
3) Grab a list of objects that VW_DUMMY depends on from sysdepends.
4) Grab the view definition from syscomments for all the dependent objects and strip off the CREATE VIEW part.
5) Replace the name of the 'depends' object in the original view with the definition...
1)
DECLARE @SQL VARCHAR(MAX);
-- FOR XML PATH encodes carriage returns as &#x0D;, so strip that entity
SET @SQL = REPLACE((
    SELECT c.text AS [text()]
    FROM syscomments c
    WHERE c.id = OBJECT_ID('VW_DUMMY')
    FOR XML PATH('')), '&#x0D;', '');
2)
SET @SQL = SUBSTRING(@SQL, PATINDEX('%SELECT%', @SQL), LEN(@SQL))
3), 4) and 5)
SELECT @SQL = REPLACE(@SQL, ' ' + name + ' ', '(' + SUBSTRING(text, PATINDEX('%SELECT%', text), LEN(text)) + ') ')
FROM (
    SELECT DISTINCT OBJECT_NAME(depid) as name,
        REPLACE((
            SELECT c.text AS [text()]
            FROM syscomments c
            WHERE c.id = d.depid
            FOR XML PATH('')), '&#x0D;', '') as text
    FROM sysdepends d
    WHERE d.id = OBJECT_ID('VW_DUMMY')
    AND exists(select 1 from sysobjects c where c.id = d.depid and c.type='V')) data
SELECT @SQL
I tried it on VW_DUMMY in my database, and the output is some of the worst formatted code that you might ever see, but the result is the same as the view.
Here's the output (bad formatting is deliberate)
SELECT c.Name as Company, g.Name as [Group], gu.UserId
FROM(SELECT *
FROM Company
) c
JOIN(SELECT *
FROM [Group]
) g ON c.Id = g.CompanyId
JOIN(SELECT *
FROM [GroupUser]
) gu ON g.Id = gu.GroupId AND gu.CompanyId = c.Id
Does that help?
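The same one-level textual substitution can be tried outside SQL Server. The sketch below is an illustration only: it uses Python's standard-library sqlite3, reads view definitions from sqlite_master instead of syscomments/sysdepends (a SQLite-specific assumption), and view_body / inline_one_level are hypothetical helper names. It carries the same limitations as the approach above (views must start with SELECT, one level deep), and the tables and rows are made up.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (Id INTEGER, Name TEXT);
CREATE TABLE Grp (Id INTEGER, CompanyId INTEGER, Name TEXT);
INSERT INTO Company VALUES (1, 'Acme');
INSERT INTO Grp VALUES (10, 1, 'Admins');
CREATE VIEW VW_Company AS SELECT * FROM Company;
CREATE VIEW VW_Group AS SELECT * FROM Grp;
CREATE VIEW VW_DUMMY AS
SELECT c.Name AS Company, g.Name AS GroupName
FROM VW_Company c
JOIN VW_Group g ON c.Id = g.CompanyId;
""")


def view_body(name):
    # Fetch the stored CREATE VIEW text and keep everything from the first SELECT.
    (sql,) = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = ?", (name,)
    ).fetchone()
    return sql[sql.upper().index("SELECT"):].rstrip().rstrip(";")


def inline_one_level(name):
    # Replace each referenced view name with its definition in parentheses.
    body = view_body(name)
    others = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' AND name <> ?", (name,))]
    for other in others:
        pattern = rf"\b{re.escape(other)}\b"
        body = re.sub(pattern, lambda m, o=other: "(" + view_body(o) + ")", body)
    return body


inlined = inline_one_level("VW_DUMMY")
print(inlined)

# The inlined query should return the same rows as the original view.
from_view = conn.execute("SELECT * FROM VW_DUMMY").fetchall()
from_inlined = conn.execute(inlined).fetchall()
print(from_view == from_inlined)
```

Matching on word boundaries rather than on surrounding spaces is a small design change from the REPLACE above: it also catches a view name at the end of a line, at the cost of still being purely textual.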