SQL Aggregation of text on joined table

Pardon the lack of correct terminology; I'm a professional software engineer who usually works with Direct3D frameworks, but I'm self-taught on databases.
I have a _People table and an _Ethnicities table. Since people may belong to more than one cultural group, I have a link table, _linkPersonEthnicity. Sample data is shown below:
What I want is output in the following form:
To illustrate the problem I present the following (runnable) query:
select lPE.Person, Sum(E.ID) as SumOfIDs,
Ethnicity = stuff(
(select ', ' + Max(E.Name) as [text()]
from _linkPersonEthnicity xPE
where xPE.Person = lPE.Person
for xml path('')
),
1, 2, '')
from _Ethnicities E
join _linkPersonEthnicity lPE on lPE.Ethnicity = E.ID
group by lPE.Person
It returns the person's ID, the sum of the IDs of the person's ethnicities, and the maximum Name, concatenated with commas. The data is grouped correctly, and SumOfIDs works, which proves the correct data is being used.
Naturally, I would like to remove the Max aggregate function, but I cannot, because E.Name is not in the GROUP BY list.
Any ideas how to make this work?
Thanks in advance,
AM
(Many thanks to other answers on StackOverflow for getting me this far! Particularly @Jonathan Leffler for his explanation of the partitioning process and @Marc_s for illustrating a text concatenation technique.)
I've also tried COALESCE, from an answer by @Chris Shaffer on concatenating strings:
declare @Names VARCHAR(8000)
select @Names = COALESCE(@Names + ', ', '') + E.Name
from _Ethnicities E join _linkPersonEthnicity lPE on lPE.Ethnicity = E.ID
where lPE.Person = 1001;
select @Names
Same problem: if I remove the WHERE clause and add a GROUP BY, the text field Name can no longer be accessed.

If I understand correctly, you need the join to be in the subquery rather than in the outer query:
select lPE.Person, Sum(lpe.ethnicity) as SumOfIDs,
Ethnicity = stuff((select ', ' + E.Name as [text()]
from _linkPersonEthnicity lPE2 join
_Ethnicities e
on lpe2.Ethnicity = e.id
where lpe2.Person = lPE.Person
for xml path('')
), 1, 2, '')
from _linkPersonEthnicity lPE
group by lPE.Person;
By the way, do you really want the sum of the ids or a count?
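For readers without SQL Server at hand, here is a minimal sketch of the same fix using Python's built-in sqlite3 module. SQLite has no FOR XML PATH or STUFF(), so group_concat() stands in for that concatenation trick, and the sample rows are invented for illustration; only the table and column names come from the question. As in the answer, the join to _Ethnicities lives inside the correlated subquery, so the outer query groups only the link table.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE _Ethnicities (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE _linkPersonEthnicity (Person INTEGER, Ethnicity INTEGER);
INSERT INTO _Ethnicities VALUES (1, 'Irish'), (2, 'Italian'), (3, 'Korean');
INSERT INTO _linkPersonEthnicity VALUES (1001, 1), (1001, 3), (1002, 2);
""")

# The outer query groups only the link table; the correlated subquery
# joins to _Ethnicities and concatenates the names for the current Person.
query = """
SELECT lPE.Person,
       SUM(lPE.Ethnicity) AS SumOfIDs,
       (SELECT group_concat(E.Name, ', ')
          FROM _linkPersonEthnicity lPE2
          JOIN _Ethnicities E ON E.ID = lPE2.Ethnicity
         WHERE lPE2.Person = lPE.Person) AS Ethnicity
FROM _linkPersonEthnicity lPE
GROUP BY lPE.Person
ORDER BY lPE.Person
"""
rows = conn.execute(query).fetchall()
for person, sum_of_ids, names in rows:
    print(person, sum_of_ids, names)
```

SQLite does not guarantee the order in which group_concat() joins the names, so don't rely on it without further work.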

Related

SQL Server Stuff() functions to concatenate data

I am trying to retrieve data for a list of student conditions.
SELECT DISTINCT
DF.DFKEY AS StudentID,
Condition = STUFF((SELECT DISTINCT ',' + DFCOND.ENR_COND
FROM DFCOND
WHERE DFCOND.SKEY = DF.DFKEY
GROUP BY DFCOND.ENR_COND
FOR XML PATH('')), 1, 1, '')
FROM
DF
LEFT JOIN
DFCOND ON df.dfkey = dfcond.skey
WHERE
DFCOND.ENR_COND IN ('12', 'CDOC', 'CONSUPP', 'CSEM')
ORDER BY
DF.DFKEY
In this code, each student can be assigned many conditions, but I only want to display the ones listed in the WHERE ... IN list. DFCOND is the table that stores students' condition data, and DF is the table that stores student information.
My problem is that when I run it, all of a student's conditions are displayed, as if the WHERE ... IN filter were skipped. How can I fix it?
For example:
Student ID (DF.DFKEY) | Conditions (DFCOND.ENR_COND)
AA12 | 70%,12,DOC
Since '12' is in the WHERE ... IN list, I only need:
Student ID | condition1
AA12 | 12
The two tables are connected on DF.DFKEY and DFCOND.SKEY, which both represent the student ID.
Thank you for any advice.
You want the filtering in the subquery:
select DF.DFKEY as StudentID,
STUFF((Select Distinct ',' + DFCOND.ENR_COND
from DFCOND
where DFCOND.SKEY = DF.DFKEY AND
DFCOND.ENR_COND in ('12', 'CDOC', 'CONSUPP', 'CSEM')
for xml path ('')
), 1, 1, ''
) as conditions
from DF
order by DF.DFKEY;
You also don't need the JOIN in the outer query, nor the GROUP BY in the subquery.
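Here is a small SQLite sketch of the same idea, run from Python. group_concat(DISTINCT ...) stands in for SELECT DISTINCT ... FOR XML PATH/STUFF, and the rows are hypothetical, loosely based on the AA12 example. Because the IN filter sits inside the correlated subquery, it restricts which conditions are concatenated rather than which students are returned. A side effect worth knowing about: a student with no matching condition now appears with a NULL list instead of disappearing from the result.

```python
import sqlite3

# Hypothetical rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DF (DFKEY TEXT PRIMARY KEY);
CREATE TABLE DFCOND (SKEY TEXT, ENR_COND TEXT);
INSERT INTO DF VALUES ('AA12'), ('BB34');
INSERT INTO DFCOND VALUES ('AA12', '70%'), ('AA12', '12'), ('AA12', 'DOC'),
                          ('BB34', '70%');
""")

# The IN list is applied inside the subquery, separately for each student.
query = """
SELECT DF.DFKEY AS StudentID,
       (SELECT group_concat(DISTINCT DFCOND.ENR_COND)
          FROM DFCOND
         WHERE DFCOND.SKEY = DF.DFKEY
           AND DFCOND.ENR_COND IN ('12', 'CDOC', 'CONSUPP', 'CSEM')) AS conditions
FROM DF
ORDER BY DF.DFKEY
"""
rows = conn.execute(query).fetchall()
print(rows)
```

If students without any listed condition should be left out, a WHERE EXISTS on the same filter in the outer query would do that.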

Complex SQL Pivot query

A quick background so that my problem makes sense: the system collects data from users in the form of questionnaires. Users belong to Organisations, Organisations belong to Sectors, and Questions/Calculations (as found on the questionnaires) differ across the Sectors. (Questions are answered by users; Calculations are calculated by the system.)
The following tables exist:
Sectors (SectorID, Name)
Organisations (OrganisationID, Name, SectorID)
Years (YearID, Name)
Questions (QuestionID, DisplayText, CommonName, SectorID)
Answers (AnswerID, Answer, OrganisationID, YearID, QuestionID)
Calculations (CalculationID, DisplayText, CommonName, SectorID)
CalculationResults (CalculationResultID, Result, OrganisationID, YearID, CalculationID)
I need to display data in the following way:
The thing that makes this particularly complex (for me) is that questions are displayed (to the user) in different ways across the different sectors that they belong to, but some of them can still be common questions. E.g. "Manufacturing sales" would be the same thing as "Sales (manufacturing)". I need to be using the CommonName field to determine commonality.
I've managed to use SQL Pivot to get close to what I want - SQL Fiddle (if you run the SQL you'll notice the nulls and the "commonality" issue). However some things are missing from my attempt:
Commonality and column names - I need the column names to be the CommonName field, not the QuestionID field.
I've only selected from the Answers table - I need to also select from the CalculationResults table which is identically structured.
Edit: Desired result with the SQL Fiddle data is:
(The two blocks with the orange corners need to shift all the way to the left, so that there are a total of 3 columns for the Questions - the 3 unique CommonName values. The next 3 columns are for the 3 unique CommonName values for Calculations. I hope I've made sense, if not let me know.)
Edit2: Another edit just for fun. I've definitely thought about redesigning the db (in case anyone saw the design and thought that), but it's not an option at this stage; it's too risky on this legacy system. Ideally, I need a solution in the form of a PIVOT.
Sometimes, instead of PIVOT, you can use an aggregate over a CASE expression, such as MAX(CASE ... END), to get the same data, and sometimes it's faster.
For your problem, you can use OUTER APPLY with dynamically built MAX(CASE ...) columns:
DECLARE @Questions NVARCHAR(MAX),
@Calculations NVARCHAR(MAX),
@Sql NVARCHAR(MAX)
SELECT @Questions = COALESCE(@Questions + ', ', '')
+ 'MAX(CASE WHEN q.CommonName = ''' + CommonName + ''' THEN a.Answer END) AS ' + QUOTENAME(CommonName)
FROM Questions
GROUP BY CommonName
SELECT @Calculations = COALESCE(@Calculations + ', ', '')
+ 'MAX(CASE WHEN c.CommonName = ''' + CommonName + ''' THEN cr.Result END) AS ' + QUOTENAME(CommonName)
FROM Calculations
GROUP BY CommonName
SET @Sql = N'
SELECT
o.Name As [Organisation],
y.Name As [Year],
q.*,
c.*
FROM
Organisations o
CROSS JOIN Years y
OUTER APPLY (
SELECT ' + @Questions + '
FROM Answers a
JOIN Questions q ON a.QuestionID = q.QuestionID
WHERE a.OrganisationID = o.OrganisationID
AND a.YearID = y.YearID
) q
OUTER APPLY (
SELECT ' + @Calculations + '
FROM CalculationResults cr
JOIN Calculations c ON cr.CalculationID = c.CalculationID
WHERE cr.OrganisationID = o.OrganisationID
AND cr.YearID = y.YearID
) c
'
-- Run the generated statement
EXEC sp_executesql @Sql
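To make the conditional-aggregation idea concrete, here is a minimal sketch in Python with SQLite. SQLite has no OUTER APPLY, so a plain GROUP BY over a join replaces it, and only the Questions/Answers half is shown. The schema follows the question, but the rows are invented, and quote_ident() is a hypothetical helper playing the role of QUOTENAME(). The column list is generated from the distinct CommonName values, so "Manufacturing sales" and "Sales (manufacturing)" land in one column. The common names are passed into the CASE expressions as bound parameters rather than being spliced into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Organisations (OrganisationID INTEGER PRIMARY KEY, Name TEXT, SectorID INTEGER);
CREATE TABLE Years (YearID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Questions (QuestionID INTEGER PRIMARY KEY, DisplayText TEXT,
                        CommonName TEXT, SectorID INTEGER);
CREATE TABLE Answers (AnswerID INTEGER PRIMARY KEY, Answer TEXT,
                      OrganisationID INTEGER, YearID INTEGER, QuestionID INTEGER);
INSERT INTO Organisations VALUES (1, 'Org A', 1), (2, 'Org B', 2);
INSERT INTO Years VALUES (1, '2013');
-- Two sectors word the same question differently but share a CommonName.
INSERT INTO Questions VALUES (1, 'Manufacturing sales', 'Sales', 1),
                             (2, 'Sales (manufacturing)', 'Sales', 2),
                             (3, 'Staff count', 'Staff', 1);
INSERT INTO Answers VALUES (1, '100', 1, 1, 1), (2, '5', 1, 1, 3), (3, '250', 2, 1, 2);
""")


def quote_ident(name):
    """Hypothetical helper: double-quote an SQLite identifier, like QUOTENAME()."""
    return '"' + name.replace('"', '""') + '"'


common_names = [r[0] for r in conn.execute(
    "SELECT DISTINCT CommonName FROM Questions ORDER BY CommonName")]

# One MAX(CASE ...) column per CommonName; the names are bound as parameters.
columns = ", ".join(
    f"MAX(CASE WHEN q.CommonName = ? THEN a.Answer END) AS {quote_ident(n)}"
    for n in common_names)

sql = f"""
SELECT o.Name AS Organisation, y.Name AS Year, {columns}
FROM Answers a
JOIN Questions q ON q.QuestionID = a.QuestionID
JOIN Organisations o ON o.OrganisationID = a.OrganisationID
JOIN Years y ON y.YearID = a.YearID
GROUP BY o.Name, y.Name
ORDER BY o.Name, y.Name
"""
cur = conn.execute(sql, common_names)
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)
print(rows)
```

Unlike the CROSS JOIN + OUTER APPLY version, these inner joins drop organisation/year pairs that have no answers at all. The calculations side would be a second set of MAX(CASE ...) columns built the same way from Calculations and CalculationResults.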
Basically, we want to number each QuestionID in order, grouped by SectorID and year Name.
You can do this using PARTITION BY with something like this:
ROW_NUMBER() OVER(PARTITION BY q.SectorID, y.Name ORDER BY a.QuestionID)
This should do it:
DECLARE @cols AS NVARCHAR(MAX)
, @query AS NVARCHAR(MAX);
SELECT @cols = STUFF(
(SELECT DISTINCT
','+QUOTENAME(CAST(ROW_NUMBER() OVER(PARTITION BY q.SectorID
, y.Name ORDER BY a.QuestionID) AS VARCHAR(10)))
FROM Answers a
LEFT JOIN Years y ON a.YearID = y.YearID
LEFT JOIN Organisations o ON a.OrganisationID = o.OrganisationID
LEFT JOIN Questions q ON a.QuestionID = q.QuestionID
FOR XML PATH(''), TYPE).value
('.', 'NVARCHAR(MAX)'), 1, 1, '');
SET @query = '
SELECT Organisation, Year, '+@cols+' from
(
SELECT QuestionID = ROW_NUMBER() OVER(PARTITION BY q.SectorID
, y.Name ORDER BY a.QuestionID)
, o.Name AS Organisation
, y.Name AS Year
, a.Answer
FROM Answers a
LEFT JOIN Years y ON a.YearID = y.YearID
LEFT JOIN Organisations o ON a.OrganisationID = o.OrganisationID
LEFT JOIN Questions q ON a.QuestionID = q.QuestionID
) src
pivot
(
max(Answer)
for QuestionID in ('+@cols+')
) piv
order by Organisation, Year
';
PRINT(@query);
EXECUTE (@query);
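The positional numbering this answer relies on can be tried on its own. Below is a minimal SQLite sketch (window functions need SQLite 3.25 or later, and the rows are invented). SQLite has no PIVOT, so a MAX(CASE ...) per position lines the sectors up side by side, and the position list is hard-coded to two columns to keep it short. One deliberate variant: ROW_NUMBER() is computed over the Questions table per sector rather than over the Answers rows. If several organisations share a sector, numbering the Answers rows directly would keep counting across organisations. Either way, this aligns questions by their order within each sector, not by CommonName.

```python
import sqlite3

# Window functions (ROW_NUMBER ... OVER) require SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Organisations (OrganisationID INTEGER PRIMARY KEY, Name TEXT, SectorID INTEGER);
CREATE TABLE Questions (QuestionID INTEGER PRIMARY KEY, SectorID INTEGER);
CREATE TABLE Answers (Answer TEXT, OrganisationID INTEGER, YearID INTEGER, QuestionID INTEGER);
INSERT INTO Organisations VALUES (1, 'Org A', 1), (2, 'Org B', 1), (3, 'Org C', 2);
INSERT INTO Questions VALUES (10, 1), (11, 1), (20, 2), (21, 2);
INSERT INTO Answers VALUES ('a1', 1, 1, 10), ('a2', 1, 1, 11),
                           ('b1', 2, 1, 10), ('b2', 2, 1, 11),
                           ('c1', 3, 1, 20), ('c2', 3, 1, 21);
""")

sql = """
WITH numbered AS (
    -- Position of each question within its sector (1, 2, ...).
    SELECT QuestionID,
           ROW_NUMBER() OVER (PARTITION BY SectorID ORDER BY QuestionID) AS Pos
    FROM Questions
)
SELECT o.Name AS Organisation, a.YearID,
       MAX(CASE WHEN n.Pos = 1 THEN a.Answer END) AS "1",
       MAX(CASE WHEN n.Pos = 2 THEN a.Answer END) AS "2"
FROM Answers a
JOIN numbered n ON n.QuestionID = a.QuestionID
JOIN Organisations o ON o.OrganisationID = a.OrganisationID
GROUP BY o.Name, a.YearID
ORDER BY o.Name
"""
rows = conn.execute(sql).fetchall()
print(rows)
```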

How to do a cross tab where the number of columns varies? (ADO SQL Server)

I'm trying to write a query that has a variable number of columns depending on the data but I've never done this kind of thing.
We're running ADO and hope to have a single query (possibly with subqueries) but no other coding or GO statements, stored procedures, etc.
We're planning to use the results of this query in an editable grid.
Below is a sample of our data. We have a list of employees and a list of Projects. Note that this isn't a "summed" cross tab. There's only one source number per cell.
We want the query results to have one column for each project. The cells in this column would contain the hours for that employee on that project.
If we add a project, we want another column to appear in the query results.
Edit: Since we're writing the query in code and then submitting it, we can generate the query dynamically; the SQL itself does not need to generate code dynamically. For example, with our data below, we'll be able to read the Project table (in our native language) and know that we have 3 projects and what their names are. From reading up, I can see we could use them in a PIVOT, but I'm just not sure how...
How about something like this: http://sqlfiddle.com/#!6/2ded1/6
declare @pivotquery as nvarchar(max)
declare @columnname as nvarchar(max)
select @columnname= isnull(@columnname + ',','')
+ quotename(name)
from (select name from project) as t2
set @pivotquery =
N'with t1 as (
select ph.employee, e.name as empname, p.name, ph.hours
from project_hours ph
inner join project p
on p.id = ph.project
inner join employees e
on e.id = ph.employee
)
select *
from t1
pivot(sum(hours) for name in (' + @columnname + ')) as pivot_table'
exec sp_executesql @pivotquery
With significant help, and further explanation, from here: http://sqlhints.com/2014/03/18/dynamic-pivot-in-sql-server/
EDIT: Reading your question again, I notice that you're building the query in code, in which case you probably don't need the dynamic SQL above, just a simple PIVOT whose FOR name IN (...) list your program builds, like this: http://sqlfiddle.com/#!6/2ded1/12
with t1 as (
select ph.employee, e.name as empname, p.name, ph.hours
from project_hours ph
inner join project p
on p.id = ph.project
inner join employees e
on e.id = ph.employee
)
select *
from t1
pivot(sum(hours)
for name in ([First Floor], [Basement], [Parking Lot A]))
as hours_summary
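Since the question says the program already reads the Project table, the FOR name IN (...) list can be built client-side. Here is a minimal Python sketch of that string-building step. quotename() is a hypothetical helper that mimics SQL Server's QUOTENAME() for bracket-quoted names (wrap in [ ] and double any ]), and the query text mirrors the one above. Sending the string through ADO, or any other driver, is not shown.

```python
def quotename(name):
    """Hypothetical stand-in for SQL Server's QUOTENAME(): bracket-quote a name."""
    return "[" + name.replace("]", "]]") + "]"


def build_pivot_query(project_names):
    """Build the PIVOT query with one column per project name."""
    columns = ", ".join(quotename(n) for n in project_names)
    return (
        "with t1 as (\n"
        "  select ph.employee, e.name as empname, p.name, ph.hours\n"
        "  from project_hours ph\n"
        "  inner join project p on p.id = ph.project\n"
        "  inner join employees e on e.id = ph.employee\n"
        ")\n"
        "select *\n"
        "from t1\n"
        f"pivot(sum(hours) for name in ({columns})) as hours_summary"
    )


# In practice the names would come from reading the Project table.
sql = build_pivot_query(["First Floor", "Basement", "Parking Lot A"])
print(sql)
```

Quoting each name, rather than pasting it in raw, keeps a project name containing spaces or a closing bracket from breaking the generated statement.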
For future readers, here is a generalizable ANSI-syntax SQL query using two nested derived table subqueries.
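This query should work on most RDBMSs (SQL Server, MySQL, SQLite, Oracle, PostgreSQL, DB2), as it uses no CTEs (WITH), no window functions, and no database-specific functions such as SQL Server's PIVOT(). Note that the square-bracket identifiers are SQL Server/Access/SQLite syntax; other systems need their own identifier quoting (ANSI double quotes, or backticks in MySQL):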
SELECT [Key], [Employee Name], [Age],
Max(FF) As [First Floor],
Max(BSMT) As [Basement],
Max(PRKLotA) As [Parking Lot A]
FROM (
SELECT dT.[Key],
dT.[Employee Name],
dT.[Age],
CASE WHEN dT.Project = 'First Floor' THEN dT.Hours END As FF,
CASE WHEN dT.Project = 'Basement' THEN dT.Hours END As BSMT,
CASE WHEN dT.Project = 'Parking Lot A' THEN dT.Hours END As PRKLotA
FROM (
SELECT [Employees].[Key],
[Employees].[Employee Name],
Employees.Age, Projects.Project,
[Project Hours].Hours
FROM [Project Hours]
INNER JOIN Employees ON [Project Hours].[Employees FK] = Employees.[Key]
INNER JOIN Projects ON [Project Hours].[Projects FK] = Projects.[Key]
) AS dT
) As dT2
GROUP BY [Key], [Employee Name], [Age]
Output:
Key Employee Name Age First Floor Basement Parking Lot A
1 Tim 40 1000 3000
2 John 5 2000 4000 5000
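As a quick portability check, the same two-level derived-table pattern can be run on SQLite from Python. This sketch switches to ANSI double-quoted identifiers and uses a simplified, hypothetical schema with invented rows (two projects instead of three), so treat it as an illustration of the CASE/MAX technique rather than a copy of the original database. Explicit aliases keep the derived-table column names unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees ("Key" INTEGER PRIMARY KEY, "Employee Name" TEXT, Age INTEGER);
CREATE TABLE Projects ("Key" INTEGER PRIMARY KEY, Project TEXT);
CREATE TABLE "Project Hours" ("Employees FK" INTEGER, "Projects FK" INTEGER, Hours INTEGER);
INSERT INTO Employees VALUES (1, 'Ann', 30), (2, 'Bob', 45);
INSERT INTO Projects VALUES (1, 'First Floor'), (2, 'Basement');
INSERT INTO "Project Hours" VALUES (1, 1, 10), (1, 2, 20), (2, 2, 7);
""")

query = """
SELECT "Key", "Employee Name", Age,
       MAX(FF) AS "First Floor",
       MAX(BSMT) AS "Basement"
FROM (
    -- One CASE per project spreads Hours across columns (NULL elsewhere).
    SELECT dT."Key" AS "Key",
           dT."Employee Name" AS "Employee Name",
           dT.Age AS Age,
           CASE WHEN dT.Project = 'First Floor' THEN dT.Hours END AS FF,
           CASE WHEN dT.Project = 'Basement' THEN dT.Hours END AS BSMT
    FROM (
        SELECT e."Key" AS "Key", e."Employee Name" AS "Employee Name",
               e.Age AS Age, p.Project AS Project, ph.Hours AS Hours
        FROM "Project Hours" ph
        INNER JOIN Employees e ON ph."Employees FK" = e."Key"
        INNER JOIN Projects p ON ph."Projects FK" = p."Key"
    ) AS dT
) AS dT2
GROUP BY "Key", "Employee Name", Age
ORDER BY "Key"
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

MAX() simply collapses each employee's rows to the single non-NULL value per column, which fits the question's "only one source number per cell".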

SQL Server dynamically change WHERE clause in a SELECT based on returned data

I'm mainly a presentation/logic-tier developer and don't mess around with SQL all that much, but I have a problem and am wondering whether it's impossible within SQL, as it's not a full programming language.
I have a field ContactID, which has a CompanyID attached to it.
In another table, the CompanyID is attached to a CompanyName.
I am trying to create a SELECT statement that returns ONE ContactID and, in a separate column, an aggregate of all the companies (by name) attached to that contact.
E.g.:
ContactID - CompanyID - CompanyName
***********************************
1 001 Lol
1 002 Haha
1 003 Funny
2 002 Haha
2 004 Lmao
I want to return
ContactID - Companies
*********************
1 Lol, Haha, Funny
2 Haha, Lmao
I have found the logic to do so with ONE ContactID at a time:
SELECT x.ContactID, substring(
(
SELECT ', '+y.CompanyName AS [text()]
FROM TblContactCompany x INNER JOIN TblCompany y ON x.CompanyID = y.CompanyID WHERE x.ContactID = 13963
For XML PATH (''), root('MyString'), type
).value('/MyString[1]','varchar(max)')
, 3, 1000)
[OrgNames] from TblContact x WHERE x.ContactID = 13963
As you can see here, I am hardcoding the ContactID 13963, which is necessary to return only the companies this individual is linked to.
The issue is that I want to return this aggregate information PER ROW in a much bigger SELECT (over a whole table full of ContactIDs).
I want to have x.ContactID = (this.ContactID) but I can't figure out how!
Failing this, could I run one statement to return a list of ContactIDs and then, in the same stored procedure, run another statement that LOOPS through this list (essentially performing the second statement x times, where x is the number of ContactIDs)?
Any help greatly appreciated.
You want a correlated subquery:
SELECT ct.ContactID,
stuff((SELECT ', ' + co.CompanyName AS [text()]
FROM TblContactCompany cc INNER JOIN
TblCompany co
ON cc.CompanyID = co.CompanyID
WHERE cc.ContactID = ct.ContactId
For XML PATH (''), root('MyString'), type
).value('/MyString[1]', 'varchar(max)'),
1, 2, '')
[OrgNames]
from TblContact ct;
Note the where clause on the inner subquery.
I also made two other changes:
I changed the table aliases to better represent the table names. This makes queries easier to understand. (Plus, the aliases had to be changed because you were using x in the outer query and the inner query.)
I replaced the substring() with stuff(), which does exactly what you want.
You could use a table variable to store the required ContactIDs and then use an IN clause in your main query's WHERE clause, like below:
WHERE
...
x.ContactID IN (SELECT ContactID FROM @YourTableVariable)
I guess all you need to do is use unique table aliases in your subquery and correlate the subquery with the outer table x:
SELECT x.ContactID, substring(
(
SELECT ', '+z.CompanyName AS [text()]
FROM TblContactCompany y, TblCompany z WHERE y.CompanyID = z.CompanyID AND y.ContactId = x.ContactId
For XML PATH (''), root('MyString'), type
).value('/MyString[1]','varchar(max)')
, 3, 1000)
[OrgNames] from TblContact x
Don't loop, or you will get performance problems (RBAR, "row by agonising row"). Use set-based queries instead.
This is untested but should give you an idea of how it may work:
SELECT
x.ContactID,
substring(
(SELECT ', '+co.CompanyName AS [text()]
FROM TblContactCompany y
INNER JOIN TblCompany co ON y.CompanyID = co.CompanyID
WHERE y.ContactID = x.ContactID
For XML PATH (''), root('MyString'), type).value('/MyString[1]','varchar(max)')
, 3, 1000)
[OrgNames]
FROM TblContact x
And I have a feeling you can use CONCAT instead of substring
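To make the RBAR point concrete, here is a small SQLite sketch in Python built on the sample contacts and companies from the question, with group_concat() standing in for FOR XML PATH. It computes each contact's company list twice: once in a per-contact loop, as the question proposes, and once with a single correlated query. It then checks that both agree. Only the second asks the database for everything in one set-based statement. SQLite leaves the group_concat() order unspecified, so the comparison sorts the names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TblContact (ContactID INTEGER PRIMARY KEY);
CREATE TABLE TblCompany (CompanyID TEXT PRIMARY KEY, CompanyName TEXT);
CREATE TABLE TblContactCompany (ContactID INTEGER, CompanyID TEXT);
INSERT INTO TblContact VALUES (1), (2);
INSERT INTO TblCompany VALUES ('001', 'Lol'), ('002', 'Haha'), ('003', 'Funny'), ('004', 'Lmao');
INSERT INTO TblContactCompany VALUES (1, '001'), (1, '002'), (1, '003'), (2, '002'), (2, '004');
""")

NAMES_FOR = """
SELECT group_concat(co.CompanyName, ', ')
FROM TblContactCompany cc
JOIN TblCompany co ON co.CompanyID = cc.CompanyID
WHERE cc.ContactID = ?
"""

# Row by row: one extra query per contact (the loop the question proposes).
looped = {}
for (contact_id,) in conn.execute("SELECT ContactID FROM TblContact").fetchall():
    looped[contact_id] = conn.execute(NAMES_FOR, (contact_id,)).fetchone()[0]

# Set-based: one statement, with the subquery correlated on ct.ContactID.
set_based = dict(conn.execute("""
SELECT ct.ContactID,
       (SELECT group_concat(co.CompanyName, ', ')
          FROM TblContactCompany cc
          JOIN TblCompany co ON co.CompanyID = cc.CompanyID
         WHERE cc.ContactID = ct.ContactID)
FROM TblContact ct
""").fetchall())


def normalise(d):
    # group_concat order is unspecified, so compare sorted name lists.
    return {k: sorted(v.split(", ")) for k, v in d.items()}


print(normalise(set_based))
```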

carriage return in sql server 2012

Hey, I am using the following query to display the problem list, separated by commas.
SELECT tt.VrNo, STUFF((select ','+ Er1.ErrorDesc
from ( select * from CallRegErrors )as Main
left join ErrorMaster ER1 on Main.ErrorCode=ER1.ErrorCode
WHERE (main.VrNo = tt.VrNo)
FOR XML PATH('')) ,1,1,'') AS Problemlist
The query gives output like a,b,c,d, etc.
But my actual requirement is to display each error description on a new line, like:
a
b
c
d
etc
I tried the following query for it:
SELECT tt.VrNo, STUFF((select char(13)+char(10)+ Er1.ErrorDesc
from ( select * from CallRegErrors )as Main
left join ErrorMaster ER1 on Main.ErrorCode=ER1.ErrorCode
WHERE (main.VrNo = tt.VrNo)
FOR XML PATH('')) ,1,1,'') AS Problemlist
and I have also used:
SELECT tt.VrNo,Replace(STUFF((select ','+ Er1.ErrorDesc as [text()] from ( select * from CallRegErrors )as Main left join ErrorMaster ER1 on Main.ErrorCode=ER1.ErrorCode
WHERE (main.VrNo = tt.VrNo)
FOR XML PATH('')),1,1,''),',',char(13)+char(10)) AS Problemlist
from (select main.VrNo, Er1.ErrorDesc from ( select * from CallRegErrors )as Main left join ErrorMaster ER1 on Main.ErrorCode=ER1.ErrorCode )as tt
group by tt.VrNo
but after using the above query I get the problem list separated by spaces instead of commas, which is still not the output I want.
Please help.
Thanks in advance
I think we need more information before we can help you.
I think you are trying to format the information at the child level in a parent child relationship into a list. You probably saw something like this blog on the web.
However, your query is not correctly formatted.
Is ErrorMaster (Production.ProductCategory) the parent and CallRegErrors (Production.ProductSubcategory) the child?
If so, just change the query to use your table and field names and it should work.
I used the REPLACE function on the overall result to change COMMAS to CR + LF.
-- Sample database
USE AdventureWorks2012
GO
-- Change SQL from www.sqlandme.com for this users problem
SELECT
CAT.Name AS [Category],
REPLACE(STUFF((
SELECT ',' + SUB.Name AS [text()]
FROM Production.ProductSubcategory SUB
WHERE SUB.ProductCategoryID = CAT.ProductCategoryID
FOR XML PATH('')
), 1, 1, '' ), ',', CHAR(13) + CHAR(10))
AS [Sub Categories]
FROM Production.ProductCategory CAT
In SSMS, you can only see the carriage returns when the results are output as TEXT (Results to Text); the grid view does not display line breaks.
I hope this solves your problem. If not, please write back with more information!!
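One point worth illustrating: REPLACE(..., ',', CHAR(13) + CHAR(10)) also turns any comma inside an error description into a line break. The minimal Python sketch below uses plain strings and hypothetical descriptions to compare that with using CR+LF as the separator in the first place. In T-SQL, the separator route would mean STUFF(..., 1, 2, '') to strip the two-character prefix. Because FOR XML PATH('') entitizes CR as &#x0D;, it also needs the TYPE directive with .value('.', 'nvarchar(max)') to get the raw characters back.

```python
CRLF = "\r\n"

# Hypothetical error descriptions for one VrNo; the second contains a comma.
descriptions = ["Display blank", "Fan noisy, vibrates", "No power"]

# Approach in the answer: join with commas, then turn every comma into CR+LF.
replaced = ",".join(descriptions).replace(",", CRLF)

# Alternative: use CR+LF as the separator from the start.
joined = CRLF.join(descriptions)

print(replaced.split(CRLF))  # the embedded comma also became a line break
print(joined.split(CRLF))    # each description stays on its own line
```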