Crosstab query in SQL that compares and adds columns - sql

I have a table in SQL Server that contains three columns: "date", "noon", and "3pm". The first column is self-explanatory, but the latter two contain the names of guest speakers at a venue according to the time they arrived. I want to write a cross-tab query that writes speaker names into the column headers and counts the number of times each speaker spoke on that date.
Example
Date | Noon | 3pm
092916 | Tom | <null>
092816 | Dick | Tom
092716 | <null> | Suzy
Desired Output
Date | Dick | Tom | Suzy
092916 | <null> | 1 | <null>
092816 | 1 | 1 | <null>
092716 | <null> | <null> | 1
I can do this pretty easily with a crosstab query if I only select one time and put a count into the value category, but I'm having trouble with merging multiple times so that I can get an accurate count of who spoke on what day.

You can build your query dynamically.
This creates a COUNT(CASE ...) expression for each name found in either the Noon or 3pm column, similar to COUNT(CASE WHEN 'Dick' IN ([Noon],[3pm]) THEN 1 END) as [Dick]
DECLARE @speakers NVARCHAR(MAX),
        @sql NVARCHAR(MAX)
-- Build one COUNT(CASE ...) column per distinct speaker name
SET @speakers = STUFF((
    SELECT ',COUNT(CASE WHEN ''' + [Name] + ''' IN ([Noon],[3pm]) THEN 1 END) as ' + QUOTENAME([Name])
    FROM (SELECT [Noon] AS [Name] FROM Table1
          UNION ALL SELECT [3pm] FROM Table1) t
    GROUP BY t.Name
    FOR XML PATH('')
), 1, 1, '')
SET @sql = N'SELECT Date, ' + @speakers + ' FROM Table1 GROUP BY Date'
--Print @sql to see what's going on
EXEC(@sql)
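To see the shape of the generated statement without a SQL Server instance, the same idea can be sketched in Python against an in-memory SQLite database. This is only an illustration under assumptions: SQLite stands in for SQL Server, the table name Table1 follows the answer above, and the column list is assembled in Python rather than with FOR XML PATH. Speaker names are bound as parameters instead of being spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table1 ("Date" TEXT, "Noon" TEXT, "3pm" TEXT)')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [("092916", "Tom", None), ("092816", "Dick", "Tom"), ("092716", None, "Suzy")],
)

# Collect every distinct, non-NULL speaker from both time slots.
speakers = [
    row[0]
    for row in conn.execute(
        'SELECT "Noon" AS name FROM Table1 WHERE "Noon" IS NOT NULL '
        'UNION SELECT "3pm" FROM Table1 WHERE "3pm" IS NOT NULL '
        "ORDER BY name"
    )
]


def quote_ident(name):
    # Double any embedded quote so the name is safe as a column alias.
    return '"' + name.replace('"', '""') + '"'


# One COUNT(CASE ...) expression per speaker, mirroring the T-SQL above.
columns = ", ".join(
    f'COUNT(CASE WHEN ? IN ("Noon", "3pm") THEN 1 END) AS {quote_ident(s)}'
    for s in speakers
)
sql = f'SELECT "Date", {columns} FROM Table1 GROUP BY "Date" ORDER BY "Date" DESC'
rows = conn.execute(sql, speakers).fetchall()

for row in rows:
    print(row)
```

Like the COUNT(CASE ...) version above, this returns 0 rather than NULL for a speaker who did not appear on a date.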

You could use this query:
select *
from (
select date, noon as speaker, count(*) as times
from events
group by date, noon
union all
select date, [3pm], count(*)
from events
group by date, [3pm]
) as u
pivot (
sum(times)
for speaker in ([Dick], [Tom], [Suzy])
) as piv
order by date desc;
... which gives you a count per cell (null, 1 or 2):
Date | Dick | Tom | Suzy
092916 | <null> | 1 | <null>
092816 | 1 | 1 | <null>
092716 | <null> | <null> | 1

Related

SQL count and group then pivot

So, I have been having this problem and I guess I am just too overloaded to figure it out. I have a database that I need to count from. That's all good. But where I run into a problem is that I need to store the result as only 2 rows, one for all the dates and one for the counts. Here is an example:
obj_name | date_made
--------------------
1 | 2016-3-04
2 | 2016-5-23
3 | 2016-5-23
4 | 2016-5-23
5 | 2016-6-07
6 | 2016-6-07
7 | 2016-6-07
8 | 2016-6-07
9 | 2016-9-12
10 | 2016-9-12
What I want is to count how many objects are created on a certain date, then return it as 2 rows - one with all the dates then one with all the counts
Row1 | 2016-3-04 | 2016-5-23 | 2016-6-07 | 2016-9-12
Row2 | 1 | 3 | 4 | 2
If anyone can help that would be much appreciated.
Here is what I have so far. I can get all the info I need, but as 2 columns, and I need it as 2 rows:
SELECT datem,
SUM(num) AS total_num
FROM (
SELECT date_made AS datem,
obj_name,
COUNT(1) AS num
FROM db.tn
GROUP BY 1,2
) sub
GROUP BY 1
ORDER BY 1 DESC
You can try a dynamic pivot query like below
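DECLARE @cols AS NVARCHAR(MAX), @query AS NVARCHAR(MAX)
-- DISTINCT keeps each date from appearing twice in the column list
SELECT @cols = STUFF((SELECT DISTINCT ',' + QUOTENAME(date_made)
                      FROM tbl
                      FOR XML PATH(''), TYPE
                     ).value('.', 'NVARCHAR(MAX)')
                    ,1,1,'')
-- The inner query must GROUP BY date_made to produce one count per date
SELECT @query =
    'SELECT * FROM '+
    '(SELECT COUNT(1) [count], date_made FROM tbl GROUP BY date_made) src '+
    ' pivot '+
    '( max([count]) for date_made in ('+@cols+'))p'
EXEC(@query)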
If I read this correctly, you are going to end up with an aggregate table that looks like this:
date_made | count
----------|------
2016-3-04 | 1
2016-5-23 | 3
2016-6-07 | 4
2016-9-12 | 2
And then you want to pivot the table on its side to look like the output in your initial question. Therefore, I think this is a repeat of this question:
Simple way to transpose columns and rows in Sql?
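If the transposition is allowed to happen in the calling code rather than in SQL, the two-row layout falls out of an ordinary GROUP BY followed by a transpose. The sketch below is an assumption-laden illustration, not the asker's setup: it uses Python with an in-memory SQLite table named tn and the sample dates from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tn (obj_name INTEGER, date_made TEXT)")
dates = ["2016-3-04"] + ["2016-5-23"] * 3 + ["2016-6-07"] * 4 + ["2016-9-12"] * 2
conn.executemany("INSERT INTO tn VALUES (?, ?)", list(enumerate(dates, start=1)))

# Step 1: the ordinary aggregate, one (date, count) row per date.
pairs = conn.execute(
    "SELECT date_made, COUNT(*) FROM tn GROUP BY date_made ORDER BY date_made"
).fetchall()

# Step 2: transpose the pairs into one row of dates and one row of counts.
row1, row2 = (list(column) for column in zip(*pairs))

print(row1)
print(row2)
```

The dates are stored as text here, so the ORDER BY is a string sort; that happens to match chronological order for this sample but would not for two-digit months.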

How to combine two records?

I have a table that looks like this
ID | Value | Type
-----------------------
1 | 50 | Travel
1 | 25 | Non-Travel
1 | 25 | Non-Travel
1 | 25 | Non-Travel
1 | 50 | Travel
1 | 75 | Non-Travel
How can I query this to make the output rearrange to this?
ID | Travel | Non-Travel
------------------------
1 | 100 | 150
The query to actually get the first table I posted has many joins and a BIT column in one of the tables where 0 or NULL is non-travel and 1 is travel. So I have something like this:
SELECT
[ID]
,CASE WHEN [IsTravel] IN (0,NULL) THEN ISNULL(SUM([VALUE]),0) END AS 'NonTravel'
,CASE WHEN [IsTravel] = 1 THEN ISNULL(SUM([VALUE]),0) END AS 'Travel'
FROM
...
However the result ends up showing this
ID | Travel | Non-Travel
------------------------
1 | 100 | NULL
1 | NULL | 150
How can I edit my query to combine the rows to show this result?
ID | Travel | Non-Travel
------------------------
1 | 100 | 150
Thanks in advance.
SELECT ID,
       SUM(CASE WHEN Type = 'Travel' THEN Value ELSE 0 END) AS [Travel],
       SUM(CASE WHEN Type = 'Non-Travel' THEN Value ELSE 0 END) AS [Non-Travel]
FROM #Table1
GROUP BY ID
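You need to wrap each of your conditionals in an aggregate such as SUM() or MAX(), and GROUP BY the other columns to roll up the values and remove the NULLs. The aggregate goes around the CASE rather than inside it (SQL Server does not allow an aggregate nested inside another aggregate), and IN (0,NULL) never matches a NULL, so NULL has to be tested explicitly. Something like this:
SELECT
    [ID]
    ,SUM(CASE WHEN [IsTravel] = 0 OR [IsTravel] IS NULL THEN [VALUE] ELSE 0 END) AS 'NonTravel'
    ,SUM(CASE WHEN [IsTravel] = 1 THEN [VALUE] ELSE 0 END) AS 'Travel'
FROM
...
GROUP BY [ID]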
If the logic gets too cluttered or confusing (don't know without seeing your whole current query) then drop those results into a temp table or CTE and do the simple MAX() and GROUP BY from there.
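The NULL handling is the easy part to get wrong, so here is a small runnable sketch of it. It assumes Python with an in-memory SQLite table named expenses (a hypothetical name), and the split of the non-travel rows between 0 and NULL is invented for illustration; only the totals match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (ID INTEGER, Value INTEGER, IsTravel INTEGER)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [(1, 50, 1), (1, 25, 0), (1, 25, None), (1, 25, 0), (1, 50, 1), (1, 75, None)],
)

# The aggregate wraps the CASE, and NULL is tested with IS NULL.
sql = """
SELECT ID,
       SUM(CASE WHEN IsTravel = 1 THEN Value ELSE 0 END) AS Travel,
       SUM(CASE WHEN IsTravel = 0 OR IsTravel IS NULL THEN Value ELSE 0 END) AS NonTravel
FROM expenses
GROUP BY ID
"""
rows = conn.execute(sql).fetchall()

# For comparison: IN (0, NULL) only matches the rows that hold a real 0.
in_matches = conn.execute(
    "SELECT COUNT(*) FROM expenses WHERE IsTravel IN (0, NULL)"
).fetchone()[0]

print(rows)
print(in_matches)
```

Four rows are non-travel in this sample, but the IN (0, NULL) filter finds only the two that store 0, which is why the explicit IS NULL test matters.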
You can use pivot as below:
Select * from (
Select Id, [Value], [Type] from yourtable ) a
pivot (sum([Value]) for [Type] in ([Travel],[Non-Travel]) ) p
Output as below:
+----+------------+--------+
| Id | Non-Travel | Travel |
+----+------------+--------+
| 1 | 150 | 100 |
+----+------------+--------+
For dynamic list of Travel types you can do dynamic query as below:
Declare @cols1 varchar(max)
Declare @query nvarchar(max)
Select @cols1 = stuff((select Distinct ','+QuoteName([Type]) from #traveldata for xml path('')),1,1,'')
Set @query = ' Select * from (
Select Id, [Value], [Type] from #traveldata ) a
pivot (sum([Value]) for [Type] in (' + @cols1 + ') ) p '
Exec sp_executesql @query

Count by unique ID, group by another

I've inherited some scripts that count the number of people in a team by department; the current scripts create a table for each individual department and the previous user would copy/paste the data into Excel. I've been tasked to pull this report into SSRS so I need one table for all the departments by team.
Current Table
+-------+-----------+---------+
| Dept | DataMatch | Team |
+-------+-----------+---------+
| 01 | 4687Joe | Dodgers |
| 01 | 3498Cindy | RedSox |
| 01 | 1057Bob | Yankees |
| 01 | 0497Lucy | Dodgers |
| 02 | 7934Jean | Yankees |
| 02 | 4584Tom | Dodgers |
+-------+-----------+---------+
Desired Results
+-------+---------+--------+---------+
| Dept | Dodgers | RedSox | Yankees |
+-------+---------+--------+---------+
| 01 | 2 | 1 | 1 |
| 02 | 1 | 0 | 1 |
+-------+---------+--------+---------+
The DataMatch field is the unique identifier I will be counting. I started by wrapping each department in a CTE however this results in the Dept as the Column which would not work for my report, so I need to transpose my results and I haven't been able to figure that out. There are 60 departments and my query was getting very long.
Current query
SELECT Dept, DataMatch, Team INTO #temp_Team
FROM TeamDatabase
WHERE Status = 14
AND Team <> 'Missing'
;WITH A_cte (Team, Dept01)
AS
(
SELECT Team
, COUNT(DISTINCT datamatch) AS 'Dept01'
FROM #temp_Team
WHERE Dept = '01'
GROUP BY Team
),
B_cte (Team, Dept02) AS
(
SELECT Team
, COUNT(DISTINCT datamatch) AS 'Dept02'
FROM #temp_Team
WHERE Dept = '02'
GROUP BY Team
)
SELECT A_cte.Team
, A_cte.Dept01
, B_cte.Dept02
FROM A_cte
INNER JOIN B_cte
ON A_cte.Team=B_cte.Team
Which results in:
+----------------------------+-------+-------+
| Team | Prg01 | Prg02 |
+----------------------------+-------+-------+
| RedSox | 144 | 141 |
| Yankees | 63 | 236 |
| Dodgers | 298 | 196 |
+----------------------------+-------+-------+
I feel that using a pivot on my already very long query would be excessive and impact performance, 60 departments with over 30,000 rows.
What, most likely basic, step am I missing?
TL;DR - How do I count people by team and list by department?
I would replace the whole query with a dynamic pivot instead of adding a pivot to your CTEs.
You can add your Status/Team conditions to the SELECT inside the dynamic query at the bottom. They would be WHERE STATUS=14 AND TEAM !=''MISSING'' - note that is two single quotes to nest it within the string.
IF OBJECT_ID('tempdb..#data') IS NOT NULL DROP TABLE #data
CREATE TABLE #data (Dept VARCHAR(50), DataMatch NVARCHAR(50), Team VARCHAR(50))
INSERT INTO #data (Dept, DataMatch, Team)
VALUES ('01', '4687Joe','Dodgers'),
('01', '3498Cindy','RedSox'),
('01', '1057Bob','Yankees'),
('01', '0497Lucy','Dodgers'),
('02', '7934Jean','Yankees'),
('02', '4584Tom','Dodgers')
DECLARE @cols AS NVARCHAR(MAX),
        @sql AS NVARCHAR(MAX)
-- Build the bracketed, comma-separated list of team names
SET @cols = STUFF(
    (SELECT N',' + QUOTENAME(y) AS [text()]
     FROM (SELECT DISTINCT Team AS y FROM #data) AS Y
     ORDER BY y
     FOR XML PATH('')),
    1, 1, N'');
SET @sql = 'SELECT Dept, '+@cols+'
FROM (SELECT Dept, DataMatch, Team
      FROM #data D) SUB
PIVOT (COUNT([DataMatch]) FOR Team IN ('+@cols+')) AS P'
PRINT @sql
EXEC (@sql)
In case you don't want to use a dynamic pivot, here is just a stand-alone query... again, add your conditions as you need.
SELECT Dept, Dodgers, RedSox, Yankees
FROM (SELECT Dept, DataMatch, Team
FROM #data D) SUB
PIVOT (COUNT([DataMatch]) FOR Team IN ([Dodgers], [RedSox], [Yankees])) AS P
I'm not sure I follow what relevance your existing query has, but to get from your current table to your desired results is a pretty straightforward usage of PIVOT:
SELECT *
FROM Table1
PIVOT(COUNT(DataMatch) FOR Team IN (Dodgers,RedSox,Yankees))pvt
Demo: SQL Fiddle
And this of course could be done dynamically if the teams list isn't static.
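For a rough picture of that dynamic variant, here is a sketch using conditional aggregation instead of PIVOT. It assumes Python with an in-memory SQLite table named team_data (a hypothetical name) loaded with the sample rows from the question; COUNT(DISTINCT ...) is used because the asker's original CTEs count distinct DataMatch values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team_data (Dept TEXT, DataMatch TEXT, Team TEXT)")
conn.executemany(
    "INSERT INTO team_data VALUES (?, ?, ?)",
    [
        ("01", "4687Joe", "Dodgers"),
        ("01", "3498Cindy", "RedSox"),
        ("01", "1057Bob", "Yankees"),
        ("01", "0497Lucy", "Dodgers"),
        ("02", "7934Jean", "Yankees"),
        ("02", "4584Tom", "Dodgers"),
    ],
)

# The team list is read from the data, so new teams need no code change.
teams = [r[0] for r in conn.execute("SELECT DISTINCT Team FROM team_data ORDER BY Team")]

# One distinct count per team; the team value is bound as a parameter and
# only the (quote-escaped) alias is placed in the SQL text.
columns = ", ".join(
    'COUNT(DISTINCT CASE WHEN Team = ? THEN DataMatch END) AS "{}"'.format(
        t.replace('"', '""')
    )
    for t in teams
)
sql = f"SELECT Dept, {columns} FROM team_data GROUP BY Dept ORDER BY Dept"
rows = conn.execute(sql, teams).fetchall()

for row in rows:
    print(row)
```

A department with nobody on a given team gets 0 in that column, which matches the desired results table in the question.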

Formatting SQL Output (Pivot)

This is running on SQL Server 2008.
Anyway, I have sales data, and I can write a query to get the output to look like this:
id | Name | Period | Sales
1 | Customer X | 2013/01 | 50
1 | Customer X | 2013/02 | 45
etc. Currently, after running this data, I am rearranging the data in the code behind so that the final output looks like this:
id | Name | 2013/01 | 2013/02
1 | Customer X | 50 | 45
The issues are:
The date (YYYY/MM) range is an input from the user.
If the user selects more outputs (like, say, address, and a ton of other possible fields relating to that customer), that information is duplicated in every line. When you're doing 10-15 items per line, over a period of 5+ years, for 50000+ users, this causes problems with running out of memory, and is also inefficient.
I've considered pulling only the necessary data (the customer id -- how they're joined together, the period, and the sales figure), and then after the fact running a separate query to get the additional data. This doesn't seem like it would be efficient though, but it's a possibility.
The other, which is what I'm thinking should be the best option, would be to rewrite my query to go ahead and do what my current code behind is doing, and pivot the data together, that way the customer data is never duplicated and I'm not moving a lot of unnecessary data around.
To give a better example of what I'm working with, let's assume these tables:
Address
id | HouseNum | Street | Unit | City | State
Customer
id | Name |
Sales
id | Period | Sales
So I would like to join these tables on the customer id, display all of the address data, assume the user inputs "2012/01 -- 2012/12", I can translate that into 2012/01, 2012/02 ... 2012/12 in my code behind to input into the query before it executes, so I have that available.
What I want it to look like would be:
id | Name | HouseNum | Street | City | State | 2012/01 | 2012/02 | ... | 2012/12
1 | X | 100 | Main St. | ABC | DEF | 30 | | ... | 20
(no sales data for that customer on 2012/02 -- if any of the data is blank I want it to be a blank string "", not a NULL)
I realize I may not be explaining this the best way possible, so just let me know and I'll add more information. Thank you!
edit: oh, one last thing. Would it be possible to add a Min, Max, Avg, & Total columns to the end, which sum up all of the pivoted data? It wouldn't be a big deal to do it on the code behind, but the more sql server can do for me the better, imo!
edit: One more, the period is in the tables as "2013/01" etc, but I'd like to rename them to "Jan 2013" etc, if it's not too complicated?
You can implement the PIVOT function to transform the data from rows into columns. You can use the following to get the result:
select id,
name,
HouseNum,
Street,
City,
State,
isnull([2013/01], 0) [2013/01],
isnull([2013/02], 0) [2013/02],
isnull([2012/02], 0) [2012/02],
isnull([2012/12], 0) [2012/12],
MinSales,
MaxSales,
AvgSales,
TotalSales
from
(
select c.id,
c.name,
a.HouseNum,
a.Street,
a.city,
a.state,
s.period,
s.sales,
min(s.sales) over(partition by c.id) MinSales,
max(s.sales) over(partition by c.id) MaxSales,
avg(s.sales) over(partition by c.id) AvgSales,
sum(s.sales) over(partition by c.id) TotalSales
from customer c
inner join address a
on c.id = a.id
inner join sales s
on c.id = s.id
) src
pivot
(
sum(sales)
for period in ([2013/01], [2013/02], [2012/02], [2012/12])
) piv;
See SQL Fiddle with Demo.
If you have an unknown number of period values that you want to transform into columns, then you will have to use dynamic SQL to get the result:
DECLARE @cols AS NVARCHAR(MAX),
        @colsNull AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)
-- Column list for the PIVOT ... IN (...) clause
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(period)
                      from Sales
                      FOR XML PATH(''), TYPE
                     ).value('.', 'NVARCHAR(MAX)')
                    ,1,1,'')
-- Same columns wrapped in IsNull() for the outer SELECT list
select @colsNull = STUFF((SELECT distinct ', IsNull(' + QUOTENAME(period) + ', 0) as '+ QUOTENAME(period)
                          from Sales
                          FOR XML PATH(''), TYPE
                         ).value('.', 'NVARCHAR(MAX)')
                        ,1,1,'')
set @query = 'SELECT id,
name,
HouseNum,
Street,
City,
State,' + @colsNull + ' ,
MinSales,
MaxSales,
AvgSales,
TotalSales
from
(
select c.id,
c.name,
a.HouseNum,
a.Street,
a.city,
a.state,
s.period,
s.sales,
min(s.sales) over(partition by c.id) MinSales,
max(s.sales) over(partition by c.id) MaxSales,
avg(s.sales) over(partition by c.id) AvgSales,
sum(s.sales) over(partition by c.id) TotalSales
from customer c
inner join address a
on c.id = a.id
inner join sales s
on c.id = s.id
) x
pivot
(
sum(sales)
for period in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo. These give the result:
| ID | NAME | HOUSENUM | STREET | CITY | STATE | 2012/02 | 2012/12 | 2013/01 | 2013/02 | MINSALES | MAXSALES | AVGSALES | TOTALSALES |
---------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | Customer X | 100 | Maint St. | ABC | DEF | 0 | 20 | 50 | 45 | 20 | 50 | 38 | 115 |
| 2 | Customer Y | 108 | Lost Rd | Unknown | Island | 10 | 0 | 0 | 0 | 10 | 10 | 10 | 10 |
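The question also asked for blank strings instead of NULLs and for headers such as "Jan 2013", which the queries above do not cover. A hedged sketch of both follows, assuming Python with an in-memory SQLite table in place of SQL Server, the sales figures from the result table above, and a hypothetical helper named period_label; the address and customer joins are left out to keep it short.

```python
import sqlite3

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def period_label(period):
    # "2013/01" -> "Jan 2013"
    year, month = period.split("/")
    return f"{MONTHS[int(month) - 1]} {year}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, period TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2013/01", 50), (1, "2013/02", 45), (1, "2012/12", 20), (2, "2012/02", 10)],
)

# Periods requested by the user, already expanded by the calling code.
periods = ["2012/02", "2012/12", "2013/01", "2013/02"]

# One column per period: the summed sales as text, or '' when there are none.
period_columns = ", ".join(
    "COALESCE(CAST(SUM(CASE WHEN period = ? THEN sales END) AS TEXT), '') "
    f'AS "{period_label(p)}"'
    for p in periods
)
sql = (
    f"SELECT id, {period_columns}, "
    "MIN(sales) AS MinSales, MAX(sales) AS MaxSales, "
    "AVG(sales) AS AvgSales, SUM(sales) AS TotalSales "
    "FROM sales GROUP BY id ORDER BY id"
)
cursor = conn.execute(sql, periods)
headers = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(headers)
for row in rows:
    print(row)
```

Returning '' forces the period columns to text, so any numeric formatting or sorting on them has to happen before the conversion. The summary columns here cover every row for the customer, not only the selected periods, which is also how the window functions above behave.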

SQL - how to create a dynamic matrix showing attribution values per item over time (where number of attributes varies per date)

I have:
items which are described by a set of ids (GroupType, ID, Name)
VALUES table which gets populated with factor values on each date so that an item gets only a certain set of factors with values per date.
FACTORS table containing static descriptions of the factors.
Looking for:
I want to create a temporary table with a matrix showing factor values for each item per date so that one could see in user friendly way which Factors were populated on a given date (with corresponding values).
Values
Date GroupType ID Name FactorId Value
01/01/2013 1 1 A 1 10
01/01/2013 1 1 A 2 8
01/01/2013 1 1 A 3 12
01/01/2013 1 2 B 3 5
01/01/2013 1 2 B 4 6
02/01/2013 1 1 A 1 7
02/01/2013 1 1 A 2 6
02/01/2013 1 2 B 3 9
02/01/2013 1 2 B 4 9
Factors
FactorId FactorName
1 Factor1
2 Factor2
3 Factor3
4 Factor4
. .
. .
. .
temporary table Factor Values Matrix
Date Group ID Name Factor1 Factor2 Factor3 Factor4 Factor...
01/01/2013 1 1 A 10 8 12
01/01/2013 1 2 B 5 6
02/01/2013 1 1 A 7 6
02/01/2013 1 2 B 9 9
Any help is greatly appreciated!
This type of data transformation is known as a PIVOT, which takes values from rows and converts them into columns.
In SQL Server 2005+, there is a function that will perform this rotation of data.
Static Pivot:
If your values will be set then you can hard-code the FactorNames into the columns by using a static pivot.
select date, grouptype, id, name, Factor1, Factor2, Factor3, Factor4
from
(
select v.date,
v.grouptype,
v.id,
v.name,
f.factorname,
v.value
from [values] v
left join factors f
on v.factorid = f.factorid
-- where v.date between date1 and date2
) src
pivot
(
max(value)
for factorname in (Factor1, Factor2, Factor3, Factor4)
) piv;
See SQL Fiddle with Demo.
Dynamic Pivot:
In your case, you stated that you are going to have an unknown number of values. If so, then you will need to use dynamic SQL to generate a SQL string that will be executed at run-time:
DECLARE @cols AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)
-- Column list built from the factor names
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(FactorName)
                      from factors
                      FOR XML PATH(''), TYPE
                     ).value('.', 'NVARCHAR(MAX)')
                    ,1,1,'')
set @query = 'SELECT date, grouptype, id, name,' + @cols + ' from
(
select v.date,
v.grouptype,
v.id,
v.name,
f.factorname,
v.value
from [values] v
left join factors f
on v.factorid = f.factorid
-- where v.date between date1 and date2
) x
pivot
(
max(value)
for factorname in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo.
Both of these versions generate the same result:
| DATE | GROUPTYPE | ID | NAME | FACTOR1 | FACTOR2 | FACTOR3 | FACTOR4 |
------------------------------------------------------------------------------
| 2013-01-01 | 1 | 1 | A | 10 | 8 | 12 | (null) |
| 2013-01-01 | 1 | 2 | B | (null) | (null) | 5 | 6 |
| 2013-02-01 | 1 | 1 | A | 7 | 6 | 11 | (null) |
| 2013-02-01 | 1 | 1 | B | (null) | (null) | 9 | 9 |
If you want to filter the results based on a date range, then you will just need to add a WHERE clause to the above queries.
It looks like you are simply trying to pivot the rows into columns. I think this does what you want:
select Date, GroupType, ID, Name,
       max(case when factorid = 1 then value end) as Factor1,
       max(case when factorid = 2 then value end) as Factor2,
       max(case when factorid = 3 then value end) as Factor3,
       max(case when factorid = 4 then value end) as Factor4
from t
group by Date, GroupType, ID, Name