Formatting SQL Output (Pivot) - sql

This is running on SQL Server 2008.
Anyway, I have sales data, and I can write a query to get the output to look like this:
id | Name | Period | Sales
1 | Customer X | 2013/01 | 50
1 | Customer X | 2013/02 | 45
etc. Currently, after running this data, I am rearranging the data in the code behind so that the final output looks like this:
id | Name | 2013/01 | 2013/02
1 | Customer X | 50 | 45
The issues are:
The date (YYYY/MM) range is an input from the user.
If the user selects more outputs (like, say, address, and a ton of other possible fields relating to that customer), that information is duplicated in every line. When you're doing 10-15 items per line, over a period of 5+ years, for 50000+ users, this causes problems with running out of memory, and is also inefficient.
I've considered pulling only the necessary data (the customer id -- how they're joined together, the period, and the sales figure), and then after the fact running a separate query to get the additional data. This doesn't seem like it would be efficient though, but it's a possibility.
The other, which is what I'm thinking should be the best option, would be to rewrite my query to go ahead and do what my current code behind is doing, and pivot the data together, that way the customer data is never duplicated and I'm not moving a lot of unnecessary data around.
To give a better example of what I'm working with, let's assume these tables:
Address
id | HouseNum | Street | Unit | City | State
Customer
id | Name |
Sales
id | Period | Sales
So I would like to join these tables on the customer id and display all of the address data. Assume the user inputs "2012/01 -- 2012/12"; I can translate that into 2012/01, 2012/02 ... 2012/12 in my code behind and pass it into the query before it executes, so I have that available.
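The range-to-list translation described above can be sketched as follows. This is a minimal illustration in Python rather than the asker's actual code behind; the function name `expand_periods` is hypothetical, and it assumes both ends of the range are well-formed "YYYY/MM" strings.

```python
from typing import List


def expand_periods(start: str, end: str) -> List[str]:
    """Return every 'YYYY/MM' period from start to end, inclusive."""
    start_year, start_month = (int(part) for part in start.split("/"))
    end_year, end_month = (int(part) for part in end.split("/"))
    # Convert each end of the range to a running month index so the
    # range can cross year boundaries without special cases.
    first = start_year * 12 + (start_month - 1)
    last = end_year * 12 + (end_month - 1)
    return [
        f"{index // 12:04d}/{index % 12 + 1:02d}"
        for index in range(first, last + 1)
    ]


# The range from the question: twelve periods, 2012/01 through 2012/12.
periods = expand_periods("2012/01", "2012/12")
```

The resulting list is what would be turned into the pivot column list before the query executes.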
What I want it to look like would be:
id | Name | HouseNum | Street | City | State | 2012/01 | 2012/02 | ... | 2012/12
1 | X | 100 | Main St. | ABC | DEF | 30 | | ... | 20
(no sales data for that customer on 2012/02 -- if any of the data is blank I want it to be a blank string "", not a NULL)
I realize I may not be explaining this the best way possible, so just let me know and I'll add more information. Thank you!
edit: oh, one last thing. Would it be possible to add a Min, Max, Avg, & Total columns to the end, which sum up all of the pivoted data? It wouldn't be a big deal to do it on the code behind, but the more sql server can do for me the better, imo!
edit: One more, the period is in the tables as "2013/01" etc, but I'd like to rename them to "Jan 2013" etc, if it's not too complicated?
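For the renaming in the last edit, one option is to compute the display label in the code behind and use it as the column alias. A minimal sketch in Python, assuming the period is always a "YYYY/MM" string; `period_label` and `MONTH_ABBREVIATIONS` are hypothetical names, not part of the original code:

```python
# English month abbreviations, spelled out so the result does not depend
# on the machine's locale settings.
MONTH_ABBREVIATIONS = (
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
)


def period_label(period: str) -> str:
    """Turn a stored period such as '2013/01' into a label such as 'Jan 2013'."""
    year, month = period.split("/")
    return f"{MONTH_ABBREVIATIONS[int(month) - 1]} {year}"


label = period_label("2013/01")
```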

You can implement the PIVOT function to transform the data from rows into columns. You can use the following to get the result:
select id,
name,
HouseNum,
Street,
City,
State,
isnull([2013/01], 0) [2013/01],
isnull([2013/02], 0) [2013/02],
isnull([2012/02], 0) [2012/02],
isnull([2012/12], 0) [2012/12],
MinSales,
MaxSales,
AvgSales,
TotalSales
from
(
select c.id,
c.name,
a.HouseNum,
a.Street,
a.city,
a.state,
s.period,
s.sales,
min(s.sales) over(partition by c.id) MinSales,
max(s.sales) over(partition by c.id) MaxSales,
avg(s.sales) over(partition by c.id) AvgSales,
sum(s.sales) over(partition by c.id) TotalSales
from customer c
inner join address a
on c.id = a.id
inner join sales s
on c.id = s.id
) src
pivot
(
sum(sales)
for period in ([2013/01], [2013/02], [2012/02], [2012/12])
) piv;
See SQL Fiddle with Demo.
If you have an unknown number of period values that you want to transform into columns, then you will have to use dynamic SQL to get the result:
DECLARE @cols AS NVARCHAR(MAX),
@colsNull AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(period)
from Sales
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
select @colsNull = STUFF((SELECT distinct ', IsNull(' + QUOTENAME(period) + ', 0) as '+ QUOTENAME(period)
from Sales
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT id,
name,
HouseNum,
Street,
City,
State,' + @colsNull + ' ,
MinSales,
MaxSales,
AvgSales,
TotalSales
from
(
select c.id,
c.name,
a.HouseNum,
a.Street,
a.city,
a.state,
s.period,
s.sales,
min(s.sales) over(partition by c.id) MinSales,
max(s.sales) over(partition by c.id) MaxSales,
avg(s.sales) over(partition by c.id) AvgSales,
sum(s.sales) over(partition by c.id) TotalSales
from customer c
inner join address a
on c.id = a.id
inner join sales s
on c.id = s.id
) x
pivot
(
sum(sales)
for period in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo. These give the result:
| ID | NAME | HOUSENUM | STREET | CITY | STATE | 2012/02 | 2012/12 | 2013/01 | 2013/02 | MINSALES | MAXSALES | AVGSALES | TOTALSALES |
---------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | Customer X | 100 | Maint St. | ABC | DEF | 0 | 20 | 50 | 45 | 20 | 50 | 38 | 115 |
| 2 | Customer Y | 108 | Lost Rd | Unknown | Island | 10 | 0 | 0 | 0 | 10 | 10 | 10 | 10 |
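Since the asker already has the period list in the code behind, the two column lists that the dynamic version assembles could also be built there. The following is a rough Python sketch of that idea, not the answer's method; `quotename` is a hypothetical helper that imitates what T-SQL's QUOTENAME does with its default bracket delimiter (wrap in brackets, double any closing bracket).

```python
def quotename(name: str) -> str:
    """Bracket-quote an identifier, doubling any closing bracket inside it."""
    return "[" + name.replace("]", "]]") + "]"


def build_pivot_lists(periods):
    """Return the PIVOT IN list and the matching ISNULL-wrapped select list."""
    cols = ", ".join(quotename(period) for period in periods)
    cols_null = ", ".join(
        f"ISNULL({quotename(period)}, 0) AS {quotename(period)}"
        for period in periods
    )
    return cols, cols_null


cols, cols_null = build_pivot_lists(["2012/02", "2012/12"])
```

Quoting every name before it is concatenated into the statement matters here because the period values end up as identifiers in the generated SQL.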

Related

SQL Server 2012 Merge Records in OUTER APPLY related table

I have a Table "Customers" with customers details and Table "CALLS" where I store the result of each phonecall
When I need to get a list of the customers I have to call I use this query
SELECT *
FROM (
SELECT TOP (50) S.ID,S.URL,S.Phone,S.Email
FROM dbo.Customers AS S
WHERE URL is not null and City like 'Berl%'
ORDER BY S.ID
) AS S
OUTER APPLY (
SELECT TOP (3) I.CalledOn, I.Answer
FROM dbo.Calls AS I
WHERE S.URL = I.URL
ORDER BY I.CalledOn DESC
) AS I;
which returns every customer in the city together with their last 3 answers.
But this returns up to 3 records for each customer, while I would like only one record per customer,
with the 3 values of CalledOn and Answer combined into that same record.
To be more clear:
Now:
+-----------+---------------+-------------+------------------+
|Customer 1 | 555-333 333 | 02-10-17 | Call Tomorrow |
+-----------+---------------+-------------+------------------+
|Customer 2 | 555-444 333 | 02-10-17 | Call Tomorrow |
+-----------+---------------+-------------+------------------+
|Customer 1 | 555-333 333 | 02-11-17 | Call Tomorrow |
+-----------+---------------+-------------+------------------+
|Customer 1 | 555-333 333 | 02-12-17 | Stop Calling |
+-----------+---------------+-------------+------------------+
Expected
+-----------+---------------+--------------------------------+
|Customer 1 | 555-333 333   | 02-12-17 : Stop Calling        |
|           |               | 02-11-17 : Call Tomorrow       |
|           |               | 02-10-17 : Call Tomorrow       |
+-----------+---------------+--------------------------------+
|Customer 2 | 555-444 333   | 02-10-17 : Call Tomorrow       |
+-----------+---------------+--------------------------------+
Currently I'm achieving this with server-side logic, but I'm sure it can be done more easily and in a better way with T-SQL.
Can you suggest a direction?
Thanks
For SQL-Server 2012
SELECT TOP (50) S.ID, S.URL, S.Phone, S.Email,
STUFF((SELECT CHAR(10) + concat (I.CalledOn, ' ', I.Answer)
FROM dbo.Calls AS I
WHERE S.URL = I.URL
ORDER BY I.CalledOn DESC
FOR XML PATH('')
), 1, 1, '') AS CallAnswer
FROM dbo.Customers AS S
WHERE URL is not null and City like 'Berl%'
ORDER BY S.ID
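vNext (SQL Server 2017, where STRING_AGG is available):
SELECT TOP (50) S.ID, S.URL, S.Phone, S.Email,
(SELECT STRING_AGG(CONCAT(I.CalledOn, ' ', I.Answer), CHAR(13))
WITHIN GROUP (ORDER BY I.CalledOn DESC)
FROM (SELECT TOP (3) C.CalledOn, C.Answer
FROM dbo.Calls AS C
WHERE S.URL = C.URL
ORDER BY C.CalledOn DESC) AS I
) AS CallAnswer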
FROM dbo.Customers AS S
WHERE URL is not null and City like 'Berl%'
ORDER BY S.ID
Check it here: http://rextester.com/HSIEL20631

Crosstab query in SQL that compares and adds columns

I have a table in sql server that contains three columns: "date", "noon", and "3pm." The first column is self-explanatory, but the latter two contain the names of guest speakers at a venue according to the time they arrived. I want to write a cross-tab query that writes speaker names into the column header and counts the number of times that speaker spoke on that date.
Example
Date | Noon | 3pm
092916 | Tom | <null>
092816 | Dick | Tom
092716 | <null> | Suzy
Desired Output
Date | Dick | Tom | Suzy
092916 | <null> | 1 | <null>
092816 | 1 | 1 | <null>
092716 | <null> | <null> | 1
I can do this pretty easily with a crosstab query if I only select one time and put a count into the value category, but I'm having trouble with merging multiple times so that I can get an accurate count of who spoke on what day.
You can build your query dynamically.
This will create a COUNT(CASE ...) expression for each name found in either the noon or 3pm column, similar to COUNT(CASE WHEN 'Dick' IN ([Noon],[3pm]) THEN 1 END) as [Dick]
DECLARE @speakers NVARCHAR(MAX),
@sql NVARCHAR(MAX)
SET @speakers = STUFF((
SELECT ',COUNT(CASE WHEN ''' + [Name] + ''' IN ([Noon],[3pm]) THEN 1 END) as ' + QUOTENAME([Name])
FROM (SELECT [Noon] AS [Name] FROM Table1
UNION ALL SELECT [3pm] FROM Table1) t
GROUP BY t.Name
FOR XML PATH('')
), 1, 1, '')
SET @sql = N'SELECT Date, ' + @speakers + ' FROM Table1 GROUP BY Date'
--PRINT @sql to see what's going on
EXEC(@sql)
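To see what the generated COUNT(CASE ...) columns produce, here is a small self-contained sketch that runs the same idea against the question's three sample rows. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server (an assumption made for illustration), so identifiers are double-quoted instead of bracketed, and `quote_identifier` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table1 ("Date" TEXT, "Noon" TEXT, "3pm" TEXT)')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [("092916", "Tom", None), ("092816", "Dick", "Tom"), ("092716", None, "Suzy")],
)

# Distinct speaker names from both time slots, skipping NULLs.
names = [row[0] for row in conn.execute(
    'SELECT "Noon" AS name FROM Table1 WHERE "Noon" IS NOT NULL '
    'UNION SELECT "3pm" FROM Table1 WHERE "3pm" IS NOT NULL ORDER BY name'
)]


def quote_identifier(name):
    """Double-quote an identifier for SQLite, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


# One counting column per speaker; the name being compared is passed as a
# bound parameter, and only the column alias is concatenated into the text.
count_columns = ", ".join(
    f'COUNT(CASE WHEN ? IN ("Noon", "3pm") THEN 1 END) AS {quote_identifier(name)}'
    for name in names
)
sql = f'SELECT "Date", {count_columns} FROM Table1 GROUP BY "Date" ORDER BY "Date" DESC'
rows = conn.execute(sql, names).fetchall()
```

Note that COUNT yields 0 for a speaker who did not appear on a date, rather than the null shown in the desired output.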
You could use this query:
select *
from (
select date, noon as speaker, count(*) as times
from events
group by date, noon
union all
select date, [3pm], count(*)
from events
group by date, [3pm]
) as u
pivot (
sum(times)
for speaker in ([Dick], [Tom], [Suzy])
) as piv
order by date desc;
... which gives you a count per cell (null, 1 or 2):
Date | Dick | Tom | Suzy
092916 | <null> | 1 | <null>
092816 | 1 | 1 | <null>
092716 | <null> | <null> | 1

Count by unique ID, group by another

I've inherited some scripts that count the number of people in a team by department; the current scripts create a table for each individual department and the previous user would copy/paste the data into Excel. I've been tasked to pull this report into SSRS so I need one table for all the departments by team.
Current Table
+-------+-----------+---------+
| Dept | DataMatch | Team |
+-------+-----------+---------+
| 01 | 4687Joe | Dodgers |
| 01 | 3498Cindy | RedSox |
| 01 | 1057Bob | Yankees |
| 01 | 0497Lucy | Dodgers |
| 02 | 7934Jean | Yankees |
| 02 | 4584Tom | Dodgers |
+-------+-----------+---------+
Desired Results
+-------+---------+--------+---------+
| Dept | Dodgers | RedSox | Yankees |
+-------+---------+--------+---------+
| 01 | 2 | 1 | 1 |
| 02 | 1 | 0 | 1 |
+-------+---------+--------+---------+
The DataMatch field is the unique identifier I will be counting. I started by wrapping each department in a CTE however this results in the Dept as the Column which would not work for my report, so I need to transpose my results and I haven't been able to figure that out. There are 60 departments and my query was getting very long.
Current query
SELECT Dept, DataMatch, Team INTO #temp_Team
FROM TeamDatabase
WHERE Status = 14
AND Team <> 'Missing'
;WITH A_cte (Team, Dept01)
AS
(
SELECT Team
, COUNT(DISTINCT datamatch) AS 'Dept01'
FROM #temp_Team
WHERE Dept = '01'
GROUP BY Team
),
B_cte (Team, Dept02) AS
(
SELECT Team
, COUNT(DISTINCT datamatch) AS 'Dept02'
FROM #temp_Team
WHERE Dept = '02'
GROUP BY Team
)
SELECT A_cte.Team
, A_cte.Dept01
, B_cte.Dept02
FROM A_cte
INNER JOIN B_cte
ON A_cte.Team=B_cte.Team
Which results in:
+----------------------------+-------+-------+
| Team | Dept01 | Dept02 |
+----------------------------+-------+-------+
| RedSox | 144 | 141 |
| Yankees | 63 | 236 |
| Dodgers | 298 | 196 |
+----------------------------+-------+-------+
I feel that using a pivot on my already very long query would be excessive and impact performance, 60 departments with over 30,000 rows.
What, most likely basic, step am I missing?
TL;DR - How do I count people by team and list by department?
I would replace the whole query with a dynamic pivot instead of adding a pivot to your CTEs.
You can add your Status/Team conditions to the SELECT inside the dynamic query at the bottom. They would be WHERE STATUS=14 AND TEAM !=''MISSING'' - note that is two single quotes to nest it within the string.
IF OBJECT_ID('tempdb..#data') IS NOT NULL DROP TABLE #data
CREATE TABLE #data (Dept VARCHAR(50), DataMatch NVARCHAR(50), Team VARCHAR(50))
INSERT INTO #data (Dept, DataMatch, Team)
VALUES ('01', '4687Joe','Dodgers'),
('01', '3498Cindy','RedSox'),
('01', '1057Bob','Yankees'),
('01', '0497Lucy','Dodgers'),
('02', '7934Jean','Yankees'),
('02', '4584Tom','Dodgers')
DECLARE @cols AS NVARCHAR(MAX),
@sql AS NVARCHAR(MAX)
SET @cols = STUFF(
(SELECT N',' + QUOTENAME(y) AS [text()]
FROM (SELECT DISTINCT Team AS y FROM #data) AS Y
ORDER BY y
FOR XML PATH('')),
1, 1, N'');
SET @sql = 'SELECT Dept, '+@cols+'
FROM (SELECT Dept, DataMatch, Team
FROM #data D) SUB
PIVOT (COUNT([DataMatch]) FOR Team IN ('+@cols+')) AS P'
PRINT @sql
EXEC (@sql)
In case you don't want to use a dynamic pivot, here is just a stand-alone query... again, add your conditions as you need.
SELECT Dept, Dodgers, RedSox, Yankees
FROM (SELECT Dept, DataMatch, Team
FROM #data D) SUB
PIVOT (COUNT([DataMatch]) FOR Team IN ([Dodgers], [RedSox], [Yankees])) AS P
I'm not sure I follow what relevance your existing query has, but to get from your current table to your desired results is a pretty straightforward usage of PIVOT:
SELECT *
FROM Table1
PIVOT(COUNT(DataMatch) FOR Team IN (Dodgers,RedSox,Yankees))pvt
Demo: SQL Fiddle
And this of course could be done dynamically if the teams list isn't static.
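As a cross-check on what the pivot should return for the sample rows, the same count of distinct DataMatch values per department and team can be worked out in a few lines of Python. This is only an illustration of the expected numbers, not a replacement for the SQL; the variable names are made up for the sketch.

```python
from collections import defaultdict

# The six sample rows from the question: (Dept, DataMatch, Team).
rows = [
    ("01", "4687Joe", "Dodgers"),
    ("01", "3498Cindy", "RedSox"),
    ("01", "1057Bob", "Yankees"),
    ("01", "0497Lucy", "Dodgers"),
    ("02", "7934Jean", "Yankees"),
    ("02", "4584Tom", "Dodgers"),
]

# Collect the distinct DataMatch values seen for each (Dept, Team) pair.
seen = defaultdict(set)
for dept, data_match, team in rows:
    seen[(dept, team)].add(data_match)

teams = sorted({team for _, _, team in rows})
depts = sorted({dept for dept, _, _ in rows})

# One row per department, one column per team; missing pairs count as 0.
table = {
    dept: {team: len(seen.get((dept, team), ())) for team in teams}
    for dept in depts
}
```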

SQL Server Pivot for counting instances in join table

I have 3 tables; category, location and business.
The category and location tables simply have an id, and a name.
Each business record has a categoryID, and a locationID, and a name field.
I'd like to construct a table that shows as a matrix, the number of businesses in each location and category combination. So having the categories as columns and locations as rows, with the counts in as the cell data.
Having a totals column and row would also be amazing.
I know I should be able to do this with pivot tables but I'm unable to get my head around the syntax for the pivots.
Any help would be much appreciated.
Thanks,
Nick
Edit: Here is a SQL Fiddle of my tables: http://sqlfiddle.com/#!2/4d6d2/1
Desired output:
| Activities | Bars | Sweet shops | Total
Chester | 1 | 0 | 0 | 1
Frodsham | 0 | 2 | 0 | 2
Stockport | 1 | 0 | 1 | 2
Total | 2 | 2 | 1 | 5
To get the final result that you want you can use the PIVOT function. I would first start with a subquery that returns all of your data plus gives you a total of each activity per location:
select l.name location,
c.name category,
count(b.locationid) over(partition by b.locationid) total
from location l
left join business b
on l.id = b.locationid
left join category c
on b.categoryid = c.id;
See SQL Fiddle with Demo. Using the windowing function count() over() creates the total number of activities for each location. Once you have this, then you can pivot the data to convert your categories to columns:
select
isnull(location, 'Total') Location,
sum([Activities]) Activities,
sum([Bars]) bars,
sum([Sweet Shops]) SweetShops,
sum(tot) total
from
(
select l.name location,
c.name category,
count(b.locationid) over(partition by b.locationid) tot
from location l
left join business b
on l.id = b.locationid
left join category c
on b.categoryid = c.id
) d
pivot
(
count(category)
for category in ([Activities], [Bars], [Sweet Shops])
) piv
group by grouping sets(location, ());
See SQL Fiddle with Demo. I also implemented GROUPING SETS() to create the final row with the totals for each activity.
The above works great if you have a limited number of activities but if your activities will be unknown, then you will want to use dynamic SQL:
DECLARE
@cols AS NVARCHAR(MAX),
@colsgroup AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT ',' + QUOTENAME(name)
from dbo.category
group by id, name
order by id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
select @colsgroup = STUFF((SELECT ', sum(' + QUOTENAME(name)+ ') as '+ QUOTENAME(name)
from dbo.category
group by id, name
order by id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = N'SELECT
Isnull(location, ''Total'') Location, '+ @colsgroup + ', sum(Total) as Total
from
(
select l.name location,
c.name category,
count(b.locationid) over(partition by b.locationid) total
from location l
left join business b
on l.id = b.locationid
left join category c
on b.categoryid = c.id
) x
pivot
(
count(category)
for category in ('+@cols+')
) p
group by grouping sets(location, ());'
exec sp_executesql @query;
See SQL Fiddle with Demo. Both versions give the result:
| LOCATION | ACTIVITIES | BARS | SWEET SHOPS | TOTAL |
|-----------|------------|------|-------------|-------|
| Chester | 1 | 0 | 0 | 1 |
| Frodsham | 0 | 1 | 0 | 1 |
| Stockport | 1 | 0 | 1 | 2 |
| Total | 2 | 1 | 1 | 4 |
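The shape of that result (one count per location and category, a total column, and a final total row) can be reproduced with a short Python sketch. The business rows below are hypothetical, chosen only so that the counts match the result table above; the real data lives in the linked fiddle.

```python
from collections import Counter

# Hypothetical (location, category) pairs, one per business, picked to
# reproduce the counts in the result table.
businesses = [
    ("Chester", "Activities"),
    ("Frodsham", "Bars"),
    ("Stockport", "Activities"),
    ("Stockport", "Sweet Shops"),
]
categories = ["Activities", "Bars", "Sweet Shops"]
locations = ["Chester", "Frodsham", "Stockport"]

counts = Counter(businesses)

# One row per location: a count per category plus the row total.
matrix = []
for location in locations:
    cells = [counts[(location, category)] for category in categories]
    matrix.append([location, *cells, sum(cells)])

# Final row: column-wise totals, including the grand total in the last cell.
column_totals = [
    sum(row[index] for row in matrix)
    for index in range(1, len(categories) + 2)
]
matrix.append(["Total", *column_totals])
```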
`SELECT b.businessName, count(l.locationId),count(b.categoryId)
FROM businesses b JOIN locations l ON b.locationId=l.locationId
JOIN categories c ON b.categoryId=c.categoryId GROUP BY b.businessName;`

SQL: Putting an individuals distinct diagnosis into one horizontal row

I'm using Microsoft SQL Server 2008 for a mental health organization.
I have a table that lists all of our clients and their diagnoses, but each diagnosis that a client has is in a new row. I want them all to be in a single row listed out horizontally with the date for each diagnosis. Some people have just one diagnosis, some have 20, some have none.
Here's an example of how my data sort of looks now (only with a lot fewer clients, we have thousands):
And here's the format I'd like it to end up:
Any solutions you could offer or hints in the right direction would be great, thanks!
In order to get the result, I would first unpivot and then pivot your data. The unpivot will take your date and diagnosis columns and convert them into rows. Once the data is in rows, then you can apply the pivot.
If you have a known number of values, then you can hard-code your query similar to this:
select *
from
(
select person, [case#], age,
col+'_'+cast(rn as varchar(10)) col,
value
from
(
select person,
[case#],
age,
diagnosis,
convert(varchar(10), diagnosisdate, 101) diagnosisDate,
row_number() over(partition by person, [case#]
order by DiagnosisDate) rn
from yourtable
) d
cross apply
(
values ('diagnosis', diagnosis), ('diagnosisDate', diagnosisDate)
) c (col, value)
) t
pivot
(
max(value)
for col in (diagnosis_1, diagnosisDate_1,
diagnosis_2, diagnosisDate_2,
diagnosis_3, diagnosisDate_3,
diagnosis_4, diagnosisDate_4)
) piv;
See SQL Fiddle with Demo.
I am going to assume that you will have an unknown number of diagnosis values for each case. If that is the case, then you will need to use dynamic sql to generate the result:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT ',' + QUOTENAME(col+'_'+cast(rn as varchar(10)))
from
(
select row_number() over(partition by person, [case#]
order by DiagnosisDate) rn
from yourtable
) t
cross join
(
select 'Diagnosis' col union all
select 'DiagnosisDate'
) c
group by col, rn
order by rn, col
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT person,
[case#],
age,' + @cols + '
from
(
select person, [case#], age,
col+''_''+cast(rn as varchar(10)) col,
value
from
(
select person,
[case#],
age,
diagnosis,
convert(varchar(10), diagnosisdate, 101) diagnosisDate,
row_number() over(partition by person, [case#]
order by DiagnosisDate) rn
from yourtable
) d
cross apply
(
values (''diagnosis'', diagnosis), (''diagnosisDate'', diagnosisDate)
) c (col, value)
) t
pivot
(
max(value)
for col in (' + @cols + ')
) p '
execute(@query);
See SQL Fiddle with Demo. Both queries give the result:
| PERSON | CASE# | AGE | DIAGNOSIS_1 | DIAGNOSISDATE_1 | DIAGNOSIS_2 | DIAGNOSISDATE_2 | DIAGNOSIS_3 | DIAGNOSISDATE_3 | DIAGNOSIS_4 | DIAGNOSISDATE_4 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| John | 13784 | 56 | Depression | 03/13/2012 | Brain Injury | 03/14/2012 | Spinal Cord Injury | 03/15/2012 | Hypertension | 03/16/2012 |
| Kate | 2643 | 37 | Bipolar | 03/11/2012 | Hypertension | 03/12/2012 | (null) | (null) | (null) | (null) |
| Kevin | 500934 | 25 | Down Syndrome | 03/18/2012 | Clinical Obesity | 03/19/2012 | (null) | (null) | (null) | (null) |
| Pete | 803342 | 34 | Schizophenia | 03/17/2012 | (null) | (null) | (null) | (null) | (null) | (null) |
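The number-then-spread idea behind these queries (give each diagnosis a sequence number per person and case, ordered by date, then turn each numbered pair into its own columns) can be sketched in Python. This is an illustration only, using the John and Kate rows from the result above; the variable names are invented for the sketch.

```python
from collections import defaultdict
from datetime import date

# (person, case#, age, diagnosis, diagnosis date), taken from the result table.
rows = [
    ("John", 13784, 56, "Depression", date(2012, 3, 13)),
    ("John", 13784, 56, "Brain Injury", date(2012, 3, 14)),
    ("John", 13784, 56, "Spinal Cord Injury", date(2012, 3, 15)),
    ("John", 13784, 56, "Hypertension", date(2012, 3, 16)),
    ("Kate", 2643, 37, "Hypertension", date(2012, 3, 12)),
    ("Kate", 2643, 37, "Bipolar", date(2012, 3, 11)),
]

# Group the diagnoses by person and case, like PARTITION BY person, [case#].
grouped = defaultdict(list)
for person, case_no, age, diagnosis, diagnosed_on in rows:
    grouped[(person, case_no, age)].append((diagnosed_on, diagnosis))

wide = []
for (person, case_no, age), items in sorted(grouped.items()):
    record = {"person": person, "case#": case_no, "age": age}
    # Sorting by date and enumerating from 1 plays the role of ROW_NUMBER().
    for rn, (diagnosed_on, diagnosis) in enumerate(sorted(items), start=1):
        record[f"diagnosis_{rn}"] = diagnosis
        record[f"diagnosisDate_{rn}"] = diagnosed_on.strftime("%m/%d/%Y")
    wide.append(record)
```

A person with fewer diagnoses simply has fewer keys here, which corresponds to the (null) cells in the pivoted result.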
For this type of pivoting, I think the aggregate/group method is feasible:
select d.[case#], d.person,
max(case when seqnum = 1 then diagnosis end) as d1,
max(case when seqnum = 1 then diagnosisdate end) as d1date,
max(case when seqnum = 2 then diagnosis end) as d2,
max(case when seqnum = 2 then diagnosisdate end) as d2date,
. . . -- and so on, for as many groups as you want
from (select d.*, row_number() over (partition by [case#] order by diagnosisdate) as seqnum
from diagnoses d
) d
group by d.[case#], d.person
Since you are dealing with sensitive medical information, identifiable information (name, age, etc.) shouldn't be stored in the same table as the medical information. Also, if you extract the person info into its own table and add a Diagnosis table that has the personID as a foreign key, you can establish the one-to-many relationship you want.
Unless you use Dynamic SQL, the PIVOT operator will not work here. I assume that patients can come in on any date. The PIVOT operator works with a finite and predefined number of columns. Your options are to use Dynamic SQL to create the PIVOT table, or to use Excel or a reporting tool like SSRS to do a Pivot report.
I think the Dynamic SQL option would not be practical here, since you could end up with hundreds of columns, one for each of the patient visit dates.
If you want to explore the Dynamic SQL option anyway, have a look here:
https://www.simple-talk.com/blogs/2007/09/14/pivots-with-dynamic-columns-in-sql-server-2005/