SQL Pivot query with multiple column grouping - sql

I have table data as follows:
day group category appcount
-----------------------------------------
Fri F27-28 music 4
Fri F27-28 radio 1
Fri F27-28 show 1
Fri F27-28 video 8
Fri F29-32 music 6
Fri F29-32 radio 2
Fri F29-32 video 22
Fri M22- music 1
Fri M22- video 2
Fri M23-26 music 4
Fri M23-26 video 8
Now, I would like to have a result by pivoting category and day, as follows.
Age Group music-Fri music-Mon music-Sun music-Tue music-Sat music-Thu music-Wed radio-Fri radio-Mon radio-Sun radio-Tues radio-Sat radio-Thu radio-Wed
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
F27-28 4 16 5 11 17 13 9 1 1 3 2 8 2
F29-32 6 2 6 4 4 4 2 5 2 3 2
F33-42 2 2 3 1 1 3
M22- 1 15 14 10 4 4
M23-26 4 7 5 2 12 14 7
I have tried a few queries but could not achieve grouping on two columns, day and category. Sorry for the data format issue. Please help.
Here is my query:
SELECT *
FROM (
    SELECT [group],
           [day],
           category,
           appcount as counts
    FROM monthly_age_apps
) AS s
PIVOT (
    SUM(counts)
    FOR [category] IN (video, radio, show, music)
) AS pvt
Thank you, Sathish

I find it simpler to just use conditional aggregation for many pivot queries:
select [group] as agegroup,
       sum(case when category = 'music' and [day] = 'Fri' then appcount else 0 end) as music_fri,
       sum(case when category = 'music' and [day] = 'Mon' then appcount else 0 end) as music_mon,
       . . .
from monthly_age_apps
group by [group];

For ANSI SQL you can use conditional aggregation:
SELECT t."group",
       MAX(CASE WHEN t."day" = 'Sun' AND t.category = 'music' THEN t.appcount END) AS "Sun-Music",
       MAX(CASE WHEN t."day" = 'Sun' AND t.category = 'radio' THEN t.appcount END) AS "Sun-Radio",
       MAX(CASE WHEN t."day" = 'Mon' AND t.category = 'music' THEN t.appcount END) AS "Mon-Music",
       MAX(CASE WHEN t."day" = 'Mon' AND t.category = 'radio' THEN t.appcount END) AS "Mon-Radio",
       ........
FROM YourTable t
GROUP BY t."group"
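If you would rather keep the PIVOT operator, a common workaround is to concatenate the two grouping columns into one pivot key in the source subquery, because PIVOT can only spread a single column. A hedged sketch, assuming SQL Server and the monthly_age_apps table from the question:
SELECT *
FROM (
    SELECT [group],
           category + '-' + [day] AS cat_day,   -- e.g. 'music-Fri'
           appcount
    FROM monthly_age_apps
) AS s
PIVOT (
    SUM(appcount)
    FOR cat_day IN ([music-Fri], [music-Mon], [music-Tue], [music-Wed],
                    [radio-Fri], [radio-Mon] /* ...remaining category-day pairs... */)
) AS pvt;
Every category-day pair still has to be listed explicitly (or generated with dynamic SQL), which is why the conditional-aggregation answers above are usually easier to maintain.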

Related

How do I transpose a result set and group by week?

I have a view based on this query:
SELECT CONVERT(VARCHAR(10), date, 103) AS date,
eventid, name, time, pts
FROM results
WHERE DATEPART(yy, date) = 2019;
This provides a data set such as this:
Date EventID Name Time Points
24/04/2019 10538 Fred Flintstone 22:27 10
24/04/2019 10538 Barney Rubble 22:50 9
24/04/2019 10538 Micky Mouse 23:17 8
24/04/2019 10538 Yogi Bear 23:54 7
24/04/2019 10538 Donald Duck 24:07 6
01/05/2019 10541 Barney Rubble 21:58 10
01/05/2019 10541 Fred Flintstone 22:00 9
01/05/2019 10541 Donald Duck 23:39 8
01/05/2019 10541 Yogi Bear 23:43 7
12/06/2019 10569 Fred Flintstone 22:06 10
12/06/2019 10569 Barney Rubble 22:22 9
12/06/2019 10569 Micky Mouse 23:05 8
12/06/2019 10569 Donald Duck 23:55 7
I need an output row for each name listing the pts per round and a total in the form:
Name 24/04/2019 01/05/2019 12/06/2019 total
Fred Flintstone 10 9 10 29
Barney Rubble 9 10 9 28
Yogi Bear 7 7 7 21
Micky Mouse 8 8 16
Donald Duck 6 8 14
There could be up to 16 non-consecutive event dates for the year.
Nothing wrong with PIVOT but, for me, the easiest and most performant way to do this would be to perform a Cross Tab. The syntax is less verbose, more portable, and easier to understand.
First, some DDL and easily consumable sample data. <<< Learn how to do this; it will get you better answers more quickly.
SET NOCOUNT ON;
SET DATEFORMAT dmy; -- I need this because I'm American
-- DDL and easily consumable sample data
CREATE TABLE #Results
(
[Date] DATE,
EventId INT,
[Name] VARCHAR(40), -- if indexed, go as narrow as possible
[Time] TIME,
Points INT,
INDEX uq_poc_results CLUSTERED([Name],[EventId]) -- a covering index is vital for a query like this
); -- note: ^^^ this is a bad clustered index candidate; I went this route for simplicity
INSERT #Results VALUES
('24/04/2019',10538, 'Fred Flintstone', '22:27',10),
('24/04/2019',10538, 'Barney Rubble', '22:50',9),
('24/04/2019',10538, 'Micky Mouse', '23:17',8),
('24/04/2019',10538, 'Yogi Bear', '23:54',7),
('24/04/2019',10538, 'Donald Duck', '23:07',6),
('01/05/2019',10541, 'Barney Rubble', '21:58',10),
('01/05/2019',10541, 'Fred Flintstone', '22:00',9),
('01/05/2019',10541, 'Donald Duck', '23:39',8),
('01/05/2019',10541, 'Yogi Bear', '23:43',7),
('12/06/2019',10569, 'Fred Flintstone', '22:06',10),
('12/06/2019',10569, 'Barney Rubble', '22:22',9),
('12/06/2019',10569, 'Micky Mouse', '23:05',8),
('12/06/2019',10569, 'Donald Duck', '23:55',7);
Note that I created a clustered index on (Name,EventId) - I would use a non-clustered index that covered the columns you need in the real world. If you have a lot of rows then you will want that index.
Basic Cross-Tab
SELECT [Name] = r.[Name],
[24/04/2019] = MAX(CASE r.[Date] WHEN '24/04/2019' THEN r.Points ELSE 0 END),
[01/05/2019] = MAX(CASE r.[Date] WHEN '01/05/2019' THEN r.Points ELSE 0 END),
[12/06/2019] = MAX(CASE r.[Date] WHEN '12/06/2019' THEN r.Points ELSE 0 END)
FROM #Results AS r
GROUP BY r.[Name];
Results:
Name 24/04/2019 01/05/2019 12/06/2019
-------------------- ------------ ------------ ------------
Barney Rubble 9 10 9
Donald Duck 6 8 7
Fred Flintstone      10           9            10
Micky Mouse 8 0 8
Yogi Bear 7 7 0
To get the total we can wrap this logic in a subquery and add the columns like this:
SELECT
[Name] = piv.N,
[24/04/2019] = piv.D1,
[01/05/2019] = piv.D2,
[12/06/2019] = piv.D3,
Total = piv.D1+piv.D2+piv.D3
FROM
(
SELECT r.[Name],
MAX(CASE r.[Date] WHEN '24/04/2019' THEN r.Points ELSE 0 END),
MAX(CASE r.[Date] WHEN '01/05/2019' THEN r.Points ELSE 0 END),
MAX(CASE r.[Date] WHEN '12/06/2019' THEN r.Points ELSE 0 END)
FROM #Results AS r
GROUP BY r.[Name]
) AS piv(N,D1,D2,D3);
Returns:
Name 24/04/2019 01/05/2019 12/06/2019 Total
------------------- ----------- ----------- ----------- -------
Barney Rubble 9 10 9 28
Donald Duck 6 8 7 21
Fred Flintstone     10          9           10          29
Micky Mouse 8 0 8 16
Yogi Bear 7 7 0 14
Not only does this get you what you need with very little SQL, you also benefit from pre-aggregation inside the subquery. A huge benefit of this approach over PIVOT is that you can do multiple aggregations in one query. Below are two examples of how to use this approach for multiple aggregations; the first uses a standard GROUP BY twice, the other uses window aggregate functions (... OVER (PARTITION BY ... ORDER BY ...)):
--==== Traditional Approach
SELECT
[Name] = piv.N,
[24/04/2019] = MAX(piv.D1),
[01/05/2019] = MAX(piv.D2),
[12/06/2019] = MAX(piv.D3),
Total = MAX(f.Ttl),
Avg1 = AVG(piv.D1), -- 1st date (24/04/2019)
Avg2 = AVG(piv.D2), -- 2nd date...
Avg3 = AVG(piv.D3), -- 3rd date...
TotalAvg = AVG(f.Ttl) ,
Mn = MIN(f.Ttl) ,
Mx = MAX(f.Ttl)
FROM
(
SELECT r.[Name],
MAX(CASE r.[Date] WHEN '24/04/2019' THEN r.Points ELSE 0 END),
MAX(CASE r.[Date] WHEN '01/05/2019' THEN r.Points ELSE 0 END),
MAX(CASE r.[Date] WHEN '12/06/2019' THEN r.Points ELSE 0 END)
FROM #Results AS r
GROUP BY r.[Name]
) AS piv(N,D1,D2,D3)
CROSS APPLY (VALUES(piv.D1+piv.D2+piv.D3)) AS f(Ttl)
GROUP BY piv.N;
--==== Leveraging Window Aggregates
SELECT
[Name] = piv.N,
[24/04/2019] = piv.D1,
[01/05/2019] = piv.D2,
[12/06/2019] = piv.D3,
Total = f.Ttl,
Avg1 = AVG(piv.D1) OVER(PARTITION BY piv.N ORDER BY (SELECT NULL)), -- 1st date (24/04/2019)
Avg2 = AVG(piv.D2) OVER(PARTITION BY piv.N ORDER BY (SELECT NULL)), -- 2nd date...
Avg3 = AVG(piv.D3) OVER(PARTITION BY piv.N ORDER BY (SELECT NULL)), -- 3rd date...
TotalAvg = AVG(f.Ttl) OVER(PARTITION BY piv.N ORDER BY (SELECT NULL)),
Mn = MIN(f.Ttl) OVER(PARTITION BY piv.N ORDER BY (SELECT NULL)),
Mx = MAX(f.Ttl) OVER(PARTITION BY piv.N ORDER BY (SELECT NULL))
FROM
(
SELECT r.[Name],
MAX(CASE r.[Date] WHEN '24/04/2019' THEN r.Points ELSE 0 END),
MAX(CASE r.[Date] WHEN '01/05/2019' THEN r.Points ELSE 0 END),
MAX(CASE r.[Date] WHEN '12/06/2019' THEN r.Points ELSE 0 END)
FROM #Results AS r
GROUP BY r.[Name]
) AS piv(N,D1,D2,D3)
CROSS APPLY (VALUES(piv.D1+piv.D2+piv.D3)) AS f(Ttl);
Both Return:
Name 24/04/2019 01/05/2019 12/06/2019 Total Avg1 Avg2 Avg3 TotalAvg Mn Mx
----------------- ----------- ----------- ----------- ------ ------ ------ ------ ---------- ------ ------
Barney Rubble 9 10 9 28 9 10 9 28 28 28
Donald Duck 6 8 7 21 6 8 7 21 21 21
Fred Flintstone   10          9           10          29     10     9      10     29         29     29
Micky Mouse 8 0 8 16 8 0 8 16 16 16
Yogi Bear 7 7 0 14 7 7 0 14 14 14
To handle the columns dynamically you need to have a look at:
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs by Jeff Moden.
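As a rough illustration of that idea (not the article's code), here is a minimal dynamic cross-tab sketch. It assumes SQL Server 2017+ for STRING_AGG and the #Results temp table created above, and builds one MAX(CASE ...) column per distinct event date:
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- one "[dd/mm/yyyy] = MAX(CASE ...)" expression per distinct event date
SELECT @cols = STRING_AGG(
           QUOTENAME(CONVERT(CHAR(10), d.[Date], 103)) +
           ' = MAX(CASE r.[Date] WHEN ''' + CONVERT(CHAR(8), d.[Date], 112) +
           ''' THEN r.Points ELSE 0 END)', ', ')
           WITHIN GROUP (ORDER BY d.[Date])
FROM (SELECT DISTINCT [Date] FROM #Results) AS d;

SET @sql = N'SELECT [Name] = r.[Name], ' + @cols + N'
FROM #Results AS r
GROUP BY r.[Name];';

EXEC sys.sp_executesql @sql;
The dynamic batch can see #Results because it is a temp table; the column list is the only part that changes as event dates come and go.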

How do I transform a calendar year column into multiple year-month columns based on the month columns and the calendar year column

I have data like this
id MoYear CalenderYear jan feb mar dec
1 2017 2017 1 2 4 0
1 2017 2018 1 0 6 10
2 2018 2018 80 5 8 22
3 2017 2018 30 12 0 3
Now I want output like this:
id MOyear jan_17 feb_17 mar_17 dec_17 jan_18 feb_18 mar_18 dec_18
1 2017 1 2 4 0 1 0 6 10
2 2018 null null null null 80 5 8 22
3 2017 null null null null 30 12 0 3
I have a calendar year column and month columns; based on the calendar year and month columns I need to make multiple year-month columns.
I can get to the solution by unpivoting and then pivoting back, but the data is so large that it takes a lot of memory and the performance is very bad.
Not sure if this will be a better approach, but you can achieve your output using CASE statements as well if you don't want to do the unpivot/pivot.
Data Creation:
select 1 as ID, 2017 as MOYEar, 2017 as calenderyear, 1 as Jan, 2 as feb,
4 as mar, 0 as dece into #temp union all
select 1 as ID, 2017 as MOYEar, 2018 as calenderyear, 1 as Jan, 0 as feb,
6 as mar, 10 as dece union all
select 2 as ID, 2018 as MOYEar, 2018 as calenderyear, 80 as Jan, 5 as feb,
8 as mar, 22 as dece union all
select 3 as ID, 2017 as MOYEar, 2018 as calenderyear, 30 as Jan, 12 as feb,
0 as mar, 3 as dece
Query:
Select ID, MOYEar, max(case when calenderyear = '2017' then Jan else null end) as Jan_17,
max(case when calenderyear = '2017' then Feb else null end ) as Feb_17,
max(case when calenderyear = '2017' then Mar else null end ) as Mar_17,
max(case when calenderyear = '2017' then Dece else null end) as Dece_17,
max(case when calenderyear = '2018' then Jan else null end ) as Jan_18,
max(case when calenderyear = '2018' then Feb else null end ) as Feb_18,
max(case when calenderyear = '2018' then Mar else null end ) as Mar_18,
max(case when calenderyear = '2018' then Dece else null end) as Dece_18 from #temp
Group by ID, MOYEar
Output:
ID MOYEar Jan_17 Feb_17 Mar_17 Dece_17 Jan_18 Feb_18 Mar_18 Dece_18
1 2017 1 2 4 0 1 0 6 10
3 2017 NULL NULL NULL NULL 30 12 0 3
2 2018 NULL NULL NULL NULL 80 5 8 22
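If the set of calendar years is not fixed at two, a hedged dynamic-SQL sketch in the same spirit (assuming SQL Server and the #temp table created above; FOR XML PATH keeps it working on versions without STRING_AGG) can build the CASE column list from the distinct years:
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- build "max(case when calenderyear = 2017 then Jan end) as Jan_17, ..." for every year/month
SELECT @cols = STUFF((
    SELECT ', max(case when calenderyear = ' + CAST(y.calenderyear AS VARCHAR(4))
         + ' then ' + m.col + ' end) as ' + m.col + '_'
         + RIGHT(CAST(y.calenderyear AS VARCHAR(4)), 2)
    FROM (SELECT DISTINCT calenderyear FROM #temp) AS y
    CROSS JOIN (VALUES (1, 'Jan'), (2, 'feb'), (3, 'mar'), (12, 'dece')) AS m(mo, col)
    ORDER BY y.calenderyear, m.mo
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '');

SET @sql = N'select ID, MOYEar, ' + @cols + N' from #temp group by ID, MOYEar;';
EXEC sys.sp_executesql @sql;
The generated statement is the same conditional aggregation as above, just with one Jan_xx/feb_xx/mar_xx/dece_xx column per calendar year present in the data.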

Select Count usage divided by month

I have a table license_Usage which works like a log of the usage of licenses per day:
ID User license date
1 1 A 22/1/2015
2 1 A 23/1/2015
3 1 B 23/1/2015
4 1 A 24/1/2015
5 2 A 22/2/2015
6 2 A 23/2/2015
7 1 B 23/2/2015
I want to count how many licenses a user used in a month; the result should look like:
User Jan Feb
1 2 1 ...
2 0 2
How can I manage to do that?
You need a PIVOT or cross tab query. e.g.
SELECT [User],
COUNT(CASE WHEN Month = 1 THEN 1 END) AS Jan,
COUNT(CASE WHEN Month = 2 THEN 1 END) AS Feb,
COUNT(CASE WHEN Month = 3 THEN 1 END) AS Mar
/*TODO - Fill in other 9 months using above pattern*/
FROM [license]
CROSS APPLY (SELECT MONTH([date])) AS CA(Month)
WHERE [date] >= '20150101'
AND [date] < '20160101'
AND [license] = 'A'
GROUP BY [User]
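If "how many licenses" is meant as the number of distinct license types per month rather than the number of rows, a hedged variant of the same cross tab (dropping the license = 'A' filter and using the license_Usage table named in the question) could be:
SELECT [User],
       COUNT(DISTINCT CASE WHEN MONTH([date]) = 1 THEN [license] END) AS Jan,
       COUNT(DISTINCT CASE WHEN MONTH([date]) = 2 THEN [license] END) AS Feb
       /*TODO - Fill in the other 10 months using the same pattern*/
FROM license_Usage
WHERE [date] >= '20150101'
  AND [date] < '20160101'
GROUP BY [User]
COUNT(DISTINCT ...) ignores the NULLs produced when the CASE does not match, so each license is counted at most once per month.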

How to convert Dynamic 7 day rows into columns with t-sql

Background Info
I have a large table, 400M+ rows, that changes daily (one day's data drops out and a new day's data drops in). The table is partitioned on a 'day' field so there are 31 partitions.
Each row in the table has data similar to this:
ID, Postcode, DeliveryPoint, Quantity, Day, Month
1 SN1 1BG A1 6 29 1
2 SN1 1BG A1 1 28 1
3 SN1 1BG A2 2 27 1
4 SN1 1BG A1 3 28 1
5 SN2 1AQ B1 1 29 12
6 SN1 1BG A1 2 26 12
I need to pull out 7 days of data in the format:
Postcode, Deliverypoint, 7dayAverage, Day1,day2,Day3,Day4,Day5,Day6,Day7
SN1 1BG A1 2 0 1 2 1 3 4 0
I can easily extract the data for the 7 day period but need to create a columnar version as shown above.
I have something like this:
select postcode,deliverypoint,
sum (case day when 23 then quantity else 0 end) as day1,
sum (case day when 24 then quantity else 0 end) as day2,
sum(case day when 25 then quantity else 0 end) as day3,
sum(case day when 26 then quantity else 0 end) as day4,
sum(case day when 27 then quantity else 0 end) as day5,
sum(case day when 28 then quantity else 0 end) as day6,
sum(case day when 29 then quantity else 0 end) as day7,
sum(quantity)*1.0/@daysinweek as wkavg
into #allweekdp
from maintable dp with (nolock)
where day in (select day from #days)
group by postcode,deliverypoint
where #days has the day numbers in the 7 day period.
But as you can see, I've hard-coded the day numbers into the query. I want to get them out of my temporary table #days but can't see a way of doing it (an array would be perfect here).
Or am I going about this in completely the wrong way?
Kind Regards
Steve
If I understand correctly, what I would do is:
1. Convert the day and month columns into datetime values,
2. Get the first day of the week and the day of the week (1-7) for each date, and
3. Pivot the data and group by the first day of the week.
see here: sqlfiddle
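The linked fiddle is not reproduced here, but a minimal sketch of those three steps might look like the following. It assumes SQL Server 2012+ for DATEFROMPARTS and uses a hypothetical @year variable, since the table only stores Day and Month:
DECLARE @year INT = 2015;  -- hypothetical: the table has no year column

SELECT dp.Postcode,
       dp.DeliveryPoint,
       wk.WeekStart,
       SUM(dp.Quantity) * 1.0 / 7 AS [7DayAverage],
       SUM(CASE wk.WeekDayNo WHEN 1 THEN dp.Quantity ELSE 0 END) AS Day1,
       SUM(CASE wk.WeekDayNo WHEN 2 THEN dp.Quantity ELSE 0 END) AS Day2,
       SUM(CASE wk.WeekDayNo WHEN 3 THEN dp.Quantity ELSE 0 END) AS Day3,
       SUM(CASE wk.WeekDayNo WHEN 4 THEN dp.Quantity ELSE 0 END) AS Day4,
       SUM(CASE wk.WeekDayNo WHEN 5 THEN dp.Quantity ELSE 0 END) AS Day5,
       SUM(CASE wk.WeekDayNo WHEN 6 THEN dp.Quantity ELSE 0 END) AS Day6,
       SUM(CASE wk.WeekDayNo WHEN 7 THEN dp.Quantity ELSE 0 END) AS Day7
FROM maintable AS dp
CROSS APPLY (SELECT DATEFROMPARTS(@year, dp.[Month], dp.[Day])) AS d(FullDate)        -- step 1
CROSS APPLY (SELECT DATEADD(DAY, -(DATEDIFF(DAY, 0, d.FullDate) % 7), d.FullDate),    -- step 2
                    DATEDIFF(DAY, 0, d.FullDate) % 7 + 1) AS wk(WeekStart, WeekDayNo) -- (Monday-based)
GROUP BY dp.Postcode, dp.DeliveryPoint, wk.WeekStart;                                 -- step 3
Filtering to the 7-day window (for example with WHERE dp.[Day] IN (SELECT [day] FROM #days)) would slot in before the GROUP BY, as in the original query.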
As utexaspunk suggested, PIVOT might be the way to go. I've never been comfortable with PIVOT and prefer to pivot manually so I control how everything looks, so I'm using a solution similar to how you did your script. No idea how the performance of my way and utexaspunk's will compare.
Declare @Min_Day Integer = (Select MIN(day) From #days);
With Day_Coding_CTE as (
Select Distinct day
, day - @Min_Day + 1 as Day_Label
From #days
)
, Non_Columnar_CTE as (
Select dp.postcode
, dp.deliverypoint
, d.day
, c.Day_Label
, SUM(quantity) as Quantity
From maintable dp with (nolock)
Left Outer Join #days d
on dp.day = d.day --It also seems like you'll need more criteria here, but you'll have to figure out what those should be
Left Outer Join Day_Coding_CTE c
on d.day = c.day
Group By dp.postcode, dp.deliverypoint, d.day, c.Day_Label
)
Select postcode
, deliverypoint
, SUM(Case
When Day_Label = 1
Then Quantity
Else 0
End) as Day1
, SUM(Case
When Day_Label = 2
Then Quantity
Else 0
End) as Day2
, SUM(Case
When Day_Label = 3
Then Quantity
Else 0
End) as Day3
, SUM(Case
When Day_Label = 4
Then Quantity
Else 0
End) as Day4
, SUM(Case
When Day_Label = 5
Then Quantity
Else 0
End) as Day5
, SUM(Case
When Day_Label = 6
Then Quantity
Else 0
End) as Day6
, SUM(Case
When Day_Label = 7
Then Quantity
Else 0
End) as Day7
, SUM(Quantity)*1.0/@daysinweek as wkavg
From Non_Columnar_CTE
Group by postcode
, deliverypoint

Count parts of total value as columns per row (pivot table)

I'm stuck on a seemingly easy query and couldn't manage to get it working over the last few hours.
I have a table files that holds file names and some values such as the records in the file, the DATE of creation (create_date), the DATE of processing (processing_date), and so on. There can be multiple files for a create date at different hours, and it is likely that they will not get processed on the day of creation; in fact it can take up to three days or longer for them to get processed.
So let's assume I have these rows, as an example:
create_date | processing_date
------------------------------
2012-09-10 11:10:55.0 | 2012-09-11 18:00:18.0
2012-09-10 15:20:18.0 | 2012-09-11 13:38:19.0
2012-09-10 19:30:48.0 | 2012-09-12 10:59:00.0
2012-09-11 08:19:11.0 | 2012-09-11 18:14:44.0
2012-09-11 22:31:42.0 | 2012-09-21 03:51:09.0
What I want, in a single query, is a grouping column (create_date truncated to the day) with 11 additional columns for the differences between processing_date and create_date, so that the result should roughly look like this:
create_date | diff0days | diff1days | diff2days | ... | diff10days
-------------------------------------------------------------------
2012-09-10  |     0     |     2     |     1     | ... |     0
2012-09-11  |     1     |     0     |     0     | ... |     1
and so on, I hope you get the point :)
I have tried this and so far it works getting a single aggregated column for a create_date with a difference of - for example - 3:
SELECT TRUNC(f.create_date, 'DD') as created, count(1)
FROM files f
WHERE TRUNC(f.process_date, 'DD') - TRUNC(f.create_date, 'DD') = 3
GROUP BY TRUNC(f.create_date, 'DD')
I tried combining the single queries and I tried sub-queries, but that didn't help, or at least my knowledge of SQL is not sufficient.
What I need is a hint so that I can include the various differences as columns, as shown above. How could I possibly achieve this?
That's basically the pivoting problem:
SELECT TRUNC(f.create_date, 'DD') as created
, sum(case TRUNC(f.process_date, 'DD') - trunc(f.create_date, 'DD')
when 0 then 1 end) as diff0days
, sum(case TRUNC(f.process_date, 'DD') - trunc(f.create_date, 'DD')
when 1 then 1 end) as diff1days
, sum(case TRUNC(f.process_date, 'DD') - trunc(f.create_date, 'DD')
when 2 then 1 end) as diff2days
, ...
FROM files f
GROUP BY
TRUNC(f.create_date, 'DD')
SELECT CreateDate,
sum(CASE WHEN DateDiff(day, CreateDate, ProcessDate) = 1 THEN 1 ELSE 0 END) AS Diff1,
sum(CASE WHEN DateDiff(day, CreateDate, ProcessDate) = 2 THEN 1 ELSE 0 END) AS Diff2,
...
FROM table
GROUP BY CreateDate
ORDER BY CreateDate
As you are using Oracle 11g, you can also get the desired result by using a PIVOT query.
Here is an example:
-- sample of data from your question
create table Your_table(create_date, processing_date) as
(
  select '2012-09-10', '2012-09-11' from dual union all
  select '2012-09-10', '2012-09-11' from dual union all
  select '2012-09-10', '2012-09-12' from dual union all
  select '2012-09-11', '2012-09-11' from dual union all
  select '2012-09-11', '2012-09-21' from dual
);

with t2 as (
  select create_date
       , processing_date
       , to_date(processing_date, 'YYYY-MM-DD')
         - to_date(create_date, 'YYYY-MM-DD') dif
    from your_table
)
select create_date
     , max(diff0)  diff0
     , max(diff1)  diff1
     , max(diff2)  diff2
     , max(diff3)  diff3
     , max(diff4)  diff4
     , max(diff5)  diff5
     , max(diff6)  diff6
     , max(diff7)  diff7
     , max(diff8)  diff8
     , max(diff9)  diff9
     , max(diff10) diff10
  from (select *
          from t2
         pivot(
           count(dif)
           for dif in ( 0 diff0
                      , 1 diff1
                      , 2 diff2
                      , 3 diff3
                      , 4 diff4
                      , 5 diff5
                      , 6 diff6
                      , 7 diff7
                      , 8 diff8
                      , 9 diff9
                      , 10 diff10
                      )
         ) pd
       ) res
 group by create_date;
Result:
Create_Date Diff0 Diff1 Diff2 Diff3 Diff4 Diff5 Diff6 Diff7 Diff8 Diff9 Diff10
--------------------------------------------------------------------------------
2012-09-10 0 2 1 0 0 0 0 0 0 0 0
2012-09-11 1 0 0 0 0 0 0 0 0 0 1