Efficient Multiple Group-bys - sql

I have the following table:
Year  Week  Day_1  Day_2  Day_3
2020  1     Walk   Jump   Swim
2020  3     Walk   Swim   Walk
2020  1     Jump   Walk   Swim
I want to group by Year, Week and Event (Walk, Jump, Swim) and count the number of times each event occurs in Day_1, Day_2 and Day_3, i.e.:
Year  Week  Event  Count_Day_1  Count_Day_2  Count_Day_3
2020  1     Walk   1            1            0
2020  3     Walk   1            0            1
2020  1     Jump   1            1            0
2020  3     Jump   0            0            0
2020  1     Swim   0            0            2
2020  3     Swim   0            1            0
How can I do this efficiently?

In BigQuery, I would unpivot using arrays and then aggregate:
with t as (
select 2020 as year, 1 as week, 'Walk' as day_1, 'Jump' as day_2, 'Swim' as day_3 union all
select 2020, 3, 'Walk', 'Swim', 'Walk' union all
select 2020, 1, 'Jump', 'Walk', 'Swim'
)
select t.year, t.week, s.event,
countif(day = 1) as day_1, countif(day = 2) as day_2, countif(day = 3) as day_3
from t cross join
unnest([struct(t.day_1 as event, 1 as day),
struct(t.day_2 as event, 2 as day),
struct(t.day_3 as event, 3 as day)
]) s
group by t.year, t.week, s.event;
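For anyone who wants to try the unpivot-then-aggregate idea without BigQuery, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the query above: the table name t, the function name count_events_per_day and the use of UNION ALL in place of UNNEST are all mine. Like the query above, it only returns events that actually occur in a given year and week.

```python
import sqlite3

# Sample rows from the question: (year, week, day_1, day_2, day_3)
SAMPLE = [
    (2020, 1, "Walk", "Jump", "Swim"),
    (2020, 3, "Walk", "Swim", "Walk"),
    (2020, 1, "Jump", "Walk", "Swim"),
]

def count_events_per_day(rows):
    """Unpivot the three day columns, then count per (year, week, event)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (year int, week int, day_1 text, day_2 text, day_3 text)")
    con.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    sql = """
        select year, week, event,
               sum(day = 1) as day_1,   -- a comparison yields 0 or 1 in SQLite
               sum(day = 2) as day_2,
               sum(day = 3) as day_3
        from (select year, week, day_1 as event, 1 as day from t
              union all
              select year, week, day_2, 2 from t
              union all
              select year, week, day_3, 3 from t)
        where event is not null
        group by year, week, event
        order by year, week, event
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

for row in count_events_per_day(SAMPLE):
    print(row)
```

The (2020, 3, Jump) row with all zeros is missing from this result, because no source row carries that combination.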

Consider this less verbose option
select year, week, event,
countif(offset = 0) as day_1,
countif(offset = 1) as day_2,
countif(offset = 2) as day_3
from `project.dataset.table`,
unnest([day_1, day_2, day_3]) event with offset
where not event is null
group by year, week, event
If applied to sample data in your question - output is

Demo code is MS SQL!
If you want to generate a full grid for every week and every year for every event then there are two pre-aggregates required, one for event and another one for every year and week.
Like:
CREATE TABLE
#OriginalData
(
numYear smallint,
numWeek tinyint,
dscDay1 nvarchar(20),
dscDay2 nvarchar(20),
dscDay3 nvarchar(20)
)
;
INSERT INTO
#OriginalData
(
numYear, numWeek, dscDay1, dscDay2, dscDay3
)
VALUES
( 2020, 1, N'Walk', N'Jump', N'Swim' ),
( 2020, 3, N'Walk', N'Swim', N'Walk' ),
( 2020, 1, N'Jump', N'Walk', N'Swim' )
;
SELECT
numYear, numWeek, dscDay1, dscDay2, dscDay3
FROM
#OriginalData
;
WITH
cteNormalise
(
dscActivity
)
AS
(
SELECT
dscDay1
FROM
#OriginalData
GROUP BY
dscDay1
UNION
SELECT
dscDay2
FROM
#OriginalData
GROUP BY
dscDay2
UNION
SELECT
dscDay3
FROM
#OriginalData
GROUP BY
dscDay3
),
cteGrid
(
numYear,
numWeek
)
AS
(
SELECT
numYear,
numWeek
FROM
#OriginalData
GROUP BY
numYear,
numWeek
)
SELECT
--/* Debug output */ *
YearWeek.numYear,
YearWeek.numWeek,
Normalised.dscActivity,
Count( Day1.dscDay1 ) AS CountDay1,
Count( Day2.dscDay2 ) AS CountDay2,
Count( Day3.dscDay3 ) AS CountDay3
FROM
cteNormalise AS Normalised
CROSS JOIN cteGrid AS YearWeek
LEFT OUTER JOIN #OriginalData AS Day1
ON Day1.dscDay1 = Normalised.dscActivity
AND Day1.numYear = YearWeek.numYear
AND Day1.numWeek = YearWeek.numWeek
LEFT OUTER JOIN #OriginalData AS Day2
ON Day2.dscDay2 = Normalised.dscActivity
AND Day2.numYear = YearWeek.numYear
AND Day2.numWeek = YearWeek.numWeek
LEFT OUTER JOIN #OriginalData AS Day3
ON Day3.dscDay3 = Normalised.dscActivity
AND Day3.numYear = YearWeek.numYear
AND Day3.numWeek = YearWeek.numWeek
GROUP BY
YearWeek.numYear,
YearWeek.numWeek,
Normalised.dscActivity
ORDER BY
YearWeek.numYear,
Normalised.dscActivity,
YearWeek.numWeek
;
This works on the sample data, but note two caveats. First, the three LEFT OUTER JOINs multiply rows: if an event matches several rows in more than one day column for the same year and week, the counts are inflated. Second, efficiency is questionable because the data has to be normalised before the actual aggregation happens.
If possible, I suggest first converting the table into 3NF with just the key columns Year, Week, Event and Day. A fairly efficient summary can then be produced, at the cost of normalising beforehand; otherwise the cost of the transformation is paid in every query.
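To make the "normalise first, then aggregate" suggestion concrete, here is a rough sketch in Python with the built-in sqlite3 module. It is not the MS SQL code above: the table name t, the CTE names unpivoted, events and grid, and the function name full_grid_counts are assumptions for illustration. The long (Year, Week, Event, Day) form is built once, and the full grid of year/week and event is left-joined to it, so combinations that never occur still appear with zero counts.

```python
import sqlite3

# Sample rows from the question: (year, week, day_1, day_2, day_3)
SAMPLE = [
    (2020, 1, "Walk", "Jump", "Swim"),
    (2020, 3, "Walk", "Swim", "Walk"),
    (2020, 1, "Jump", "Walk", "Swim"),
]

def full_grid_counts(rows):
    """Count events per day column for every (year, week) x event combination."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (year int, week int, day_1 text, day_2 text, day_3 text)")
    con.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    sql = """
        with unpivoted as (          -- one row per (year, week, event, day)
            select year, week, day_1 as event, 1 as day from t
            union all select year, week, day_2, 2 from t
            union all select year, week, day_3, 3 from t
        ),
        events as (select distinct event from unpivoted where event is not null),
        grid as (select distinct year, week from t)
        select g.year, g.week, e.event,
               count(case when u.day = 1 then 1 end) as count_day_1,
               count(case when u.day = 2 then 1 end) as count_day_2,
               count(case when u.day = 3 then 1 end) as count_day_3
        from grid g
        cross join events e
        left join unpivoted u
               on u.year = g.year and u.week = g.week and u.event = e.event
        group by g.year, g.week, e.event
        order by g.year, g.week, e.event
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

for row in full_grid_counts(SAMPLE):
    print(row)
```

Because there is only one join against the long form, each source value is counted once, however many rows match.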

You need to find the distinct events, cross join them with your table, and use conditional aggregation as follows:
select t.year, t.week, e.event,
count(case when t.day_1 = e.event then 1 end) as count_day_1,
count(case when t.day_2 = e.event then 1 end) as count_day_2,
count(case when t.day_3 = e.event then 1 end) as count_day_3
from your_Table t
cross join (select day_1 as event from your_table
union select day_2 from your_table
union select day_3 from your_table) e
group by t.year, t.week, e.event

Related

SQL - '1' IF hour in month EXISTS, '0' IF NOT EXISTS

I have a table that has aggregations down to the hour level YYYYMMDDHH. The data is aggregated and loaded by an external process (I don't have control over). I want to test the data on a monthly basis.
The question I am looking to answer is: Does every hour in the month exist?
I'm looking to produce output that will return a 1 if the hour exists or 0 if the hour does not exist.
The aggregation table looks something like this...
YYYYMM YYYYMMDD YYYYMMDDHH DATA_AGG
201911 20191101 2019110100 100
201911 20191101 2019110101 125
201911 20191101 2019110103 135
201911 20191101 2019110105 95
… … … …
201911 20191130 2019113020 100
201911 20191130 2019113021 110
201911 20191130 2019113022 125
201911 20191130 2019113023 135
And defined as...
CREATE TABLE YYYYMMDDHH_DATA_AGG (
YYYYMM VARCHAR,
YYYYMMDD VARCHAR,
YYYYMMDDHH VARCHAR,
DATA_AGG INT
);
I'm looking to produce the following below...
YYYYMMDDHH HOUR_EXISTS
2019110100 1
2019110101 1
2019110102 0
2019110103 1
2019110104 0
2019110105 1
... ...
In the example above, two hours do not exist, 2019110102 and 2019110104.
I assume I'd have to join the aggregation table against a computed table that contains all the YYYYMMDDHH combos???
The database is Snowflake, but assume most generic ANSI SQL queries will work.
You can get what you want with a recursive CTE
The recursive CTE generates the list of possible Hours. And then a simple left outer join gets you the flag for if you have any records that match that hour.
WITH RECURSIVE CTE (YYYYMMDDHH) as
(
SELECT YYYYMMDDHH
FROM YYYYMMDDHH_DATA_AGG
WHERE YYYYMMDDHH = (SELECT MIN(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG)
UNION ALL
SELECT TO_VARCHAR(DATEADD(HOUR, 1, TO_TIMESTAMP(C.YYYYMMDDHH, 'YYYYMMDDHH')), 'YYYYMMDDHH') YYYYMMDDHH
FROM CTE C
WHERE TO_VARCHAR(DATEADD(HOUR, 1, TO_TIMESTAMP(C.YYYYMMDDHH, 'YYYYMMDDHH')), 'YYYYMMDDHH') <= (SELECT MAX(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG)
)
SELECT
C.YYYYMMDDHH,
IFF(A.YYYYMMDDHH IS NOT NULL, 1, 0) HOUR_EXISTS
FROM CTE C
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG A
ON C.YYYYMMDDHH = A.YYYYMMDDHH;
If your timerange is too long you'll have issues with the cte recursing too much. You can create a table or temp table with all of the possible hours instead. For example:
CREATE OR REPLACE TEMPORARY TABLE HOURS (YYYYMMDDHH VARCHAR) AS
SELECT TO_VARCHAR(DATEADD(HOUR, SEQ4(), TO_TIMESTAMP((SELECT MIN(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG), 'YYYYMMDDHH')), 'YYYYMMDDHH')
FROM TABLE(GENERATOR(ROWCOUNT => 10000)) V
ORDER BY 1;
SELECT
H.YYYYMMDDHH,
IFF(A.YYYYMMDDHH IS NOT NULL, 1, 0) HOUR_EXISTS
FROM HOURS H
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG A
ON H.YYYYMMDDHH = A.YYYYMMDDHH
WHERE H.YYYYMMDDHH <= (SELECT MAX(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG);
You can then fiddle with the generator count to make sure you have enough hours.
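If you want to experiment with the recursive-CTE idea outside Snowflake, here is a small sketch in Python with the built-in sqlite3 module. It is an assumption-laden illustration rather than a translation: the table name agg, the helper to_ts and the function hour_exists are hypothetical, and SQLite's datetime() and strftime() stand in for DATEADD and TO_VARCHAR. As in the answer above, it covers the range from the earliest to the latest hour present in the data.

```python
import sqlite3

def to_ts(yyyymmddhh):
    """Turn '2019110100' into '2019-11-01 00:00:00', which SQLite can do date math on."""
    s = yyyymmddhh
    return f"{s[0:4]}-{s[4:6]}-{s[6:8]} {s[8:10]}:00:00"

def hour_exists(existing):
    """existing: list of YYYYMMDDHH strings. Returns [(yyyymmddhh, 0 or 1), ...]."""
    con = sqlite3.connect(":memory:")
    con.execute("create table agg (yyyymmddhh text)")
    con.executemany("insert into agg values (?)", [(h,) for h in existing])
    sql = """
        with recursive hours(ts) as (
            select ?                                   -- first hour in the data
            union all
            select datetime(ts, '+1 hour') from hours
            where ts < ?                               -- stop at the last hour
        )
        select strftime('%Y%m%d%H', h.ts) as yyyymmddhh,
               case when a.yyyymmddhh is null then 0 else 1 end as hour_exists
        from hours h
        left join agg a on a.yyyymmddhh = strftime('%Y%m%d%H', h.ts)
        order by h.ts
    """
    rows = con.execute(sql, (to_ts(min(existing)), to_ts(max(existing)))).fetchall()
    con.close()
    return rows

for row in hour_exists(["2019110100", "2019110101", "2019110103", "2019110105"]):
    print(row)
```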
You can generate a table with every hour of the month and LEFT OUTER JOIN your aggregation to it:
WITH EVERY_HOUR AS (
SELECT TO_CHAR(DATEADD(HOUR, HH, TO_DATE(YYYYMM::TEXT, 'YYYYMM')),
'YYYYMMDDHH')::NUMBER YYYYMMDDHH
FROM (SELECT DISTINCT YYYYMM FROM YYYYMMDDHH_DATA_AGG) t
CROSS JOIN (
SELECT ROW_NUMBER() OVER (ORDER BY NULL) - 1 HH
FROM TABLE(GENERATOR(ROWCOUNT => 745))
) h
QUALIFY YYYYMMDDHH < (YYYYMM + 1) * 10000
)
SELECT h.YYYYMMDDHH, NVL2(a.YYYYMM, 1, 0) HOUR_EXISTS
FROM EVERY_HOUR h
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG a ON a.YYYYMMDDHH = h.YYYYMMDDHH
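The "every hour of the month" part is easy to cross-check outside the database. Below is a minimal sketch in plain Python; the function names hours_in_month and hour_exists_flags are made up for this example, and it assumes the keys are strings in YYYYMMDDHH form as in the question.

```python
from datetime import datetime, timedelta

def hours_in_month(yyyymm):
    """Return every hour of the given month as YYYYMMDDHH strings."""
    start = datetime.strptime(yyyymm, "%Y%m")          # first day of the month, 00:00
    if start.month == 12:
        end = datetime(start.year + 1, 1, 1)           # roll over to January
    else:
        end = datetime(start.year, start.month + 1, 1)
    hours = []
    current = start
    while current < end:
        hours.append(current.strftime("%Y%m%d%H"))
        current += timedelta(hours=1)
    return hours

def hour_exists_flags(yyyymm, existing):
    """Pair every hour of the month with 1 if it is in `existing`, else 0."""
    present = set(existing)
    return [(h, 1 if h in present else 0) for h in hours_in_month(yyyymm)]

flags = hour_exists_flags("201911", ["2019110100", "2019110101", "2019110103", "2019110105"])
print(flags[:6])
```

Unlike a min-to-max range, this also reports hours missing at the very start or end of the month.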
Here's something that might help get you started. I'm guessing you want to have 'synthetic' [YYYYMMDD] values? Otherwise, if the values aren't there, they shouldn't appear in the list.
DROP TABLE IF EXISTS #_hours
DROP TABLE IF EXISTS #_temp
--Populate a table with hours ranging from 00 to 23
CREATE TABLE #_hours ([hour_value] VARCHAR(2))
DECLARE @_i INT = 0
WHILE (@_i < 24)
BEGIN
INSERT INTO #_hours
SELECT FORMAT(@_i, '0#')
SET @_i += 1
END
-- Replicate OP's sample data set
CREATE TABLE #_temp (
[YYYYMM] INTEGER
, [YYYYMMDD] INTEGER
, [YYYYMMDDHH] INTEGER
, [DATA_AGG] INTEGER
)
INSERT INTO #_temp
VALUES
(201911, 20191101, 2019110100, 100),
(201911, 20191101, 2019110101, 125),
(201911, 20191101, 2019110103, 135),
(201911, 20191101, 2019110105, 95),
(201911, 20191130, 2019113020, 100),
(201911, 20191130, 2019113021, 110),
(201911, 20191130, 2019113022, 125),
(201911, 20191130, 2019113023, 135)
SELECT X.YYYYMM, X.YYYYMMDD, X.YYYYMMDDHH
-- Case: If 'target_hours' doesn't exist, then 0, else 1
, CASE WHEN X.target_hours IS NULL THEN '0' ELSE '1' END AS [HOUR_EXISTS]
FROM (
-- Select right 2 characters from converted [YYYYMMDDHH] to act as 'target values'
SELECT T.*
, RIGHT(CAST(T.[YYYYMMDDHH] AS VARCHAR(10)), 2) AS [target_hours]
FROM #_temp AS T
) AS X
-- Right join to keep all of our hours and only the target hours that match.
RIGHT JOIN #_hours AS H ON H.hour_value = X.target_hours
Sample output:
YYYYMM YYYYMMDD YYYYMMDDHH HOUR_EXISTS
201911 20191101 2019110100 1
201911 20191101 2019110101 1
NULL NULL NULL 0
201911 20191101 2019110103 1
NULL NULL NULL 0
201911 20191101 2019110105 1
NULL NULL NULL 0
With (almost) standard sql, you can do a cross join of the distinct values of YYYYMMDD to a list of all possible hours and then left join to the table:
select concat(d.YYYYMMDD, h.hour) as YYYYMMDDHH,
case when t.YYYYMMDDHH is null then 0 else 1 end as hour_exists
from (select distinct YYYYMMDD from tablename) as d
cross join (
select '00' as hour union all select '01' union all
select '02' union all select '03' union all
select '04' union all select '05' union all
select '06' union all select '07' union all
select '08' union all select '09' union all
select '10' union all select '11' union all
select '12' union all select '13' union all
select '14' union all select '15' union all
select '16' union all select '17' union all
select '18' union all select '19' union all
select '20' union all select '21' union all
select '22' union all select '23'
) as h
left join tablename as t
on concat(d.YYYYMMDD, h.hour) = t.YYYYMMDDHH
order by concat(d.YYYYMMDD, h.hour)
Maybe in Snowflake you can construct the list of hours with a sequence much easier instead of all those UNION ALLs.
This version accounts for the full range of days, across months and years. It's a simple cross join of the set of possible days with the set of possible hours of the day -- left joined to actual dates.
set first = (select min(yyyymmdd::number) from YYYYMMDDHH_DATA_AGG);
set last = (select max(yyyymmdd::number) from YYYYMMDDHH_DATA_AGG);
with
hours as (select row_number() over (order by null) - 1 h from table(generator(rowcount=>24))),
days as (
select
row_number() over (order by null) - 1 as n,
to_date($first::text, 'YYYYMMDD')::date + n as d,
to_char(d, 'YYYYMMDD') as yyyymmdd
from table(generator(rowcount=>($last-$first+1)))
)
select days.yyyymmdd || lpad(hours.h,2,0) as YYYYMMDDHH, nvl2(t.yyyymmddhh,1,0) as HOUR_EXISTS
from days cross join hours
left join YYYYMMDDHH_DATA_AGG t on t.yyyymmddhh = days.yyyymmdd || lpad(hours.h,2,0)
order by 1
;
$first and $last can be packed in as sub-queries if you prefer.

SQL Addition Formula

Noob alert...
I have an example table as follows.
I am trying to create a column in SQL that shows what percentage of size S each customer had per year.
So output should be something like:
(Correction: the customer C for 2019 Percentage should be 1)
Window functions will get you there.
CREATE TABLE #TestData
(
[Customer] NVARCHAR(2)
, [CustomerYear] INT
, [CustomerCount] INT
, [CustomerSize] NVARCHAR(2)
);
INSERT INTO #TestData (
[Customer]
, [CustomerYear]
, [CustomerCount]
, [CustomerSize]
)
VALUES ( 'A', 2017, 1, 'S' )
, ( 'A', 2017, 1, 'S' )
, ( 'B', 2017, 1, 'S' )
, ( 'B', 2017, 1, 'S' )
, ( 'B', 2018, 1, 'S' )
, ( 'A', 2018, 1, 'S' )
, ( 'C', 2017, 1, 'S' )
, ( 'C', 2019, 1, 'S' );
SELECT DISTINCT [Customer]
, [CustomerYear]
, SUM([CustomerCount]) OVER ( PARTITION BY [Customer]
, [CustomerYear]
) AS [CustomerCount]
, SUM([CustomerCount]) OVER ( PARTITION BY [CustomerYear] ) AS [TotalCount]
, SUM([CustomerCount]) OVER ( PARTITION BY [Customer]
, [CustomerYear]
) * 1.0 / SUM([CustomerCount]) OVER ( PARTITION BY [CustomerYear] ) AS [CustomerPercentage]
FROM #TestData
ORDER BY [CustomerYear]
, [Customer];
Will give you
Customer CustomerYear CustomerCount TotalCount CustomerPercentage
-------- ------------ ------------- ----------- ---------------------------------------
A 2017 2 5 0.400000000000
B 2017 2 5 0.400000000000
C 2017 1 5 0.200000000000
A 2018 1 2 0.500000000000
B 2018 1 2 0.500000000000
C 2019 1 1 1.000000000000
Assuming there are no duplicate rows for a customer in a year, you can use window functions:
select t.*,
sum(count) over (partition by year) as year_cnt,
count * 1.0 / sum(count) over (partition by year) as ratio
from t;
Break it apart into tasks - that's probably the best rule to follow when it comes to SQL. So, I created a temp table #tmp, which I populated with your sample data, and started out with this query:
select
customer,
year
from #tmp
where size = 'S'
group by customer, year
... this gets a row for each customer/year combo for 'S' entries.
Next, I want the total count for that customer/year combo:
select
customer,
year,
SUM(itemCount) as customerItemCount
from #tmp
where size = 'S'
group by customer, year
... now, how do we get the count for all customers for a specific year? We need a subquery - and we need that subquery to reference the year from the main query.
select
customer,
year,
SUM(itemCount) as customerItemCount,
(select SUM(itemCount) from #tmp t2 where year=t.year) as FullTotalForYear
from #tmp t
where size = 'S'
GROUP BY customer, year
... that make sense? That new line in the ()'s is a subquery - and it's hitting the table again - but this time, it's just getting a SUM() over the particular year that matches the main table.
Finally, we just need to divide one of those columns by the other to get the actual percent (making sure not to make it int/int - which will always be an int), and we'll have our final answer:
select
customer,
year,
cast(SUM(itemCount) as float) /
(select SUM(itemCount) from #tmp t2 where year=t.year)
as PercentageOfYear
from #tmp t
where size = 'S'
GROUP BY customer, year
Make sense?
With a join of 2 groupings:
the 1st by size, year, customer and
the 2nd by size, year.
select
t.customer, t.year, t.count, t.size,
ty.total_count, 1.0 * t.count / ty.total_count percentage
from (
select t.customer, t.year, sum(t.count) count, t.size
from tablename t
group by t.size, t.year, t.customer
) t inner join (
select t.year, sum(t.count) total_count, t.size
from tablename t
group by t.size, t.year
) ty
on ty.size = t.size and ty.year = t.year
order by t.size, t.year, t.customer;
See the demo
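Since the demo link may not be reachable, here is a small runnable sketch of the same "join of two groupings" idea using Python's built-in sqlite3 module. The table name sales, the column name cnt and the function name yearly_share are placeholders of mine; the rows are the test data from the window-function answer further up.

```python
import sqlite3

# (customer, year, count, size), taken from the test data earlier in this thread
ROWS = [
    ("A", 2017, 1, "S"), ("A", 2017, 1, "S"),
    ("B", 2017, 1, "S"), ("B", 2017, 1, "S"),
    ("B", 2018, 1, "S"), ("A", 2018, 1, "S"),
    ("C", 2017, 1, "S"), ("C", 2019, 1, "S"),
]

def yearly_share(rows):
    """Join per-customer totals to per-year totals and divide."""
    con = sqlite3.connect(":memory:")
    con.execute("create table sales (customer text, year int, cnt int, size text)")
    con.executemany("insert into sales values (?, ?, ?, ?)", rows)
    sql = """
        select c.customer, c.year, c.cnt, y.total,
               1.0 * c.cnt / y.total as share       -- 1.0 forces non-integer division
        from (select customer, year, size, sum(cnt) as cnt
              from sales
              group by size, year, customer) c
        join (select year, size, sum(cnt) as total
              from sales
              group by size, year) y
          on y.size = c.size and y.year = c.year
        order by c.year, c.customer
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

for row in yearly_share(ROWS):
    print(row)
```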

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group them and count by the attr column row-wise, and also create additional columns to show their counts per day and percentages as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using group by, but I can't work out how to separate the counts into multiple columns. I tried to generate the day1 percentage with
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this also is not giving me correct answer, I'm getting all zeroes for percentage and count as 1. Any help is appreciated. I'm trying to do this in Redshift which follows postgresql syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, 1.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to create a day by day if you feel the need
I am trying to enhance @johnHC's query. By the way, if you need 7 days, you have to add those days as further CASE WHEN expressions.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, 1.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs for calculating the values I am using window functions which is a bit shorter and more readable I think. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: count the rows per day and attr, and the rows per day.
B: for readability I convert the dates into numbers, taking the difference between the maximum date in the table and the date of the current row. This gives a counter from 0 (the most recent day, shown first) up to n - 1 (the earliest day).
C: calculate the percentage and round it.
D: pivot by filtering on the day numbers. COALESCE turns the NULL values into 0. To add more days, duplicate these columns with further day numbers.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
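To sanity-check the expected numbers from the question without a database, the same count, per-day total and pivot steps can be sketched in a few lines of plain Python. The function name pivot_counts and the (attr, date string) row layout are assumptions for illustration only; days are ordered most recent first, as in the question's example.

```python
from collections import Counter

# (attr, date) pairs from the question, with the time part dropped
ROWS = [
    ("abc", "2018-08-06"),
    ("def", "2018-08-06"),
    ("pqr", "2018-08-05"),
    ("abc", "2018-08-06"),
    ("def", "2018-08-05"),
]

def pivot_counts(rows):
    """Return (days, table) where table[attr] is a list of (count, percent) per day."""
    per_day_attr = Counter((day, attr) for attr, day in rows)   # step A, per day and attr
    per_day = Counter(day for _, day in rows)                   # step A, per day
    days = sorted(per_day, reverse=True)                        # step B, most recent first
    table = {}
    for attr in sorted({attr for attr, _ in rows}):
        table[attr] = [
            (per_day_attr[(day, attr)],
             round(100.0 * per_day_attr[(day, attr)] / per_day[day], 1))  # step C
            for day in days                                               # step D
        ]
    return days, table

days, table = pivot_counts(ROWS)
for attr, cells in table.items():
    print(attr, cells)
```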
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.

Full Outer Join, Coalesce, and Group By (Oh My!)

I'm going to ask this in two parts, because my logic may be way off, and if so, the syntax doesn't really matter.
I have 10 queries. Each query returns month, supplier, and count(some metric). The queries use various tables, joins, etc. Not all month/supplier combinations exist in the output for each query. I would like to combine these into a single data set that can be exported and pivoted on in Excel.
I'd like the output to look like this:
Month | Supplier | Metric1 |Metric2 |..| Metric 10
2018-01 | Supp1 | _value_ | _value_ |...| _value_ |
2018-01 | Supp2 | NULL | _value_ |...| NULL
What is the best / easiest / most efficient way to accomplish this?
I've tried various methods to accomplish the above, but I can't seem to get the syntax quite right. I wanted to make a very simple test case and build upon it, but I only have select privileges on the db, so I am unable to test it out. I was able to create a query that at least doesn't result in any squiggly red error lines, but applying the same logic to the bigger problem doesn't work.
This is what I've got:
create table test1(name varchar(20),credit int);
insert into test1 (name, credit) values ('Ed',1),('Ann',1),('Jim',1),('Ed',1),('Ann',1);
create table test2 (name varchar(10), debit int);
insert into test2 (name, debit) values ('Ann',1),('Sue',1),('Sue',1),('Sue',1);
select
coalesce(a.name, b.name) as name,
cred,
deb
from
(select name, count(credit) as cred
from test1
group by name) a
full outer join
(select name, count(debit) as deb
from test2
group by name) b on
a.name =b.name;
Am I headed down the right path?
UPDATE: Based on Gordon's input, I tried this on the first two queries:
select Month, Supp,
sum(case when which = 1 then metric end) as Exceptions,
sum(case when which = 2 then metric end) as BackOrders
from (
(
select Month, Supp, metric, 1 as which
from (
select (convert(char(4),E.PostDateTime,120)+'-'+convert(char(2),E.PostDateTime,101)) as Month, E.TradingPartner as Supp, count(distinct(E.excNum)) as metric
from db..TrexcMangr E
where (E.DSHERep in ('AVR','BTB') OR E.ReleasedBy in ('AVR','BTB')) AND year(E.PostDateTime) >= '2018'
) a
)
union all
(
select Month, Supp, metric, 2 as which
from (
select (convert(char(4),T.UpdatedDateTime,120)+'-'+convert(char(2),T.UpdatedDateTime,101)) as Month, P.Supplier as Supp, count(*) as metric
from db1..trordertext T
inner join mdid_Tran..trOrderPO P on P.PONum = T.RefNum
where T.TextType = 'BO' AND (T.CreatedBy in ('AVR','BTB') OR T.UpdatedBy in ('AVR','BTB')) AND year(UpdatedDateTime) >=2018
) b
)
) q
group by Month, Supp
... but I'm getting a group by error.
One method uses union all and group by:
select month, supplier,
sum(case when which = 1 then metric end) as metric_01,
sum(case when which = 2 then metric end) as metric_02,
. . .
from ((select Month, Supplier, Metric, 1 as which
from (<query1>) q
. . .
) union all
(select Month, Supplier, Metric, 2 as which
from (<query2>) q
. . .
) union all
. . .
) q
group by month, supplier;
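Here is the same union all plus group by pattern applied to the small test1/test2 tables from the question, as a runnable sketch with Python's built-in sqlite3 module. The function name combine_metrics is hypothetical, and SQLite is used only because it needs no server. Names that appear in just one of the two queries come back with NULL (None in Python) for the other metric, which is what the full outer join in the question was trying to achieve.

```python
import sqlite3

def combine_metrics():
    """Stack two per-name aggregates with a 'which' tag, then pivot with CASE."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table test1 (name varchar(20), credit int);
        insert into test1 (name, credit) values ('Ed',1),('Ann',1),('Jim',1),('Ed',1),('Ann',1);
        create table test2 (name varchar(10), debit int);
        insert into test2 (name, debit) values ('Ann',1),('Sue',1),('Sue',1),('Sue',1);
    """)
    sql = """
        select name,
               sum(case when which = 1 then metric end) as cred,
               sum(case when which = 2 then metric end) as deb
        from (select name, count(credit) as metric, 1 as which
              from test1 group by name
              union all
              select name, count(debit), 2
              from test2 group by name) q
        group by name
        order by name
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

for row in combine_metrics():
    print(row)
```

Each inner query carries its own group by, which is the piece missing from the attempt in the question's update.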
SELECT
CalendarMonthStart,
Supp,
SUM(CASE WHEN metric_id = 1 THEN metric END) as Exceptions,
SUM(CASE WHEN metric_id = 2 THEN metric END) as BackOrders
FROM
(
SELECT
DATEADD(month, DATEDIFF(month, 0, E.PostDateTime), 0) AS CalendarMonthStart,
E.TradingPartner AS Supp,
COUNT(DISTINCT(E.excNum)) AS metric,
1 AS metric_id
FROM
db..TrexcMangr E
WHERE
( E.DSHERep in ('AVR','BTB')
OR E.ReleasedBy in ('AVR','BTB')
)
AND E.PostDateTime >= '2018-01-01'
GROUP BY
DATEADD(month, DATEDIFF(month, 0, E.PostDateTime), 0),
E.TradingPartner
UNION ALL
SELECT
DATEADD(month, DATEDIFF(month, 0, T.UpdatedDateTime), 0) AS CalendarMonthStart,
P.Supplier AS Supp,
COUNT(*) AS metric,
2 AS metric_id
FROM
db1..trordertext T
INNER JOIN
mdid_Tran..trOrderPO P
ON P.PONum = T.RefNum
WHERE
( T.CreatedBy in ('AVR','BTB')
OR T.UpdatedBy in ('AVR','BTB')
)
AND T.TextType = 'BO'
AND T.UpdatedDateTime >= '2018-01-01'
GROUP BY
DATEADD(month, DATEDIFF(month, 0, T.UpdatedDateTime), 0),
P.Supplier
)
combined
GROUP BY
CalendarMonthStart,
Supp

SQL Query in CRM Report

A "Case" in CRM has a field called "Status" with four options.
I'm trying to build a report in CRM that fills a table with every week of the year (each row is a different week), and then counts the number of cases that have each Status option (the columns would be each of the Status options).
The table would look like this
        Status 1  Status 2  Status 3
Week 1  3         55        4
Week 2  5         23        5
Week 3  14        11        33
So far I have the following:
SELECT
SUM(case WHEN status = 1 then 1 else 0 end) Status1,
SUM(case WHEN status = 2 then 1 else 0 end) Status2,
SUM(case WHEN status = 3 then 1 else 0 end) Status3,
SUM(case WHEN status = 4 then 1 else 0 end) Status4,
SUM(case WHEN status = 5 then 1 else 0 end) Status5
FROM [DB].[dbo].[Contact]
Which gives me the following:
Status 1  Status 2  Status 3
2         43        53
Now I need to somehow split this into 52 rows for the past year and filter these results by date (columns in the Contact table). I'm a bit new to SQL queries and CRM - any help here would be much appreciated.
Here is a SQLFiddle with my progress and sample data: http://sqlfiddle.com/#!2/85b19/1
Sounds like you want to group by a range. The trick is to create a new field that represents each range (in your case, one per week) and group by that.
Since it also seems like you want an infinite range of dates, marc_s has a good summary for how to do the group by trick with dates in a generic way: SQL group by frequency within a date range
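To show what "a field that represents each range" means in practice, here is a minimal sketch in plain Python that derives an ISO (year, week) key from each date and counts statuses per key. The function name weekly_status_counts, the (date, status) row layout and the sample rows are invented for the example and are not taken from the CRM schema. Weeks with no cases do not appear in the result.

```python
from collections import Counter
from datetime import date

def weekly_status_counts(cases, statuses=(1, 2, 3)):
    """cases: list of (date, status). Returns [(year, week, n_status1, n_status2, ...)]."""
    counts = Counter()
    for day, status in cases:
        iso = day.isocalendar()               # (ISO year, ISO week number, weekday)
        counts[(iso[0], iso[1], status)] += 1
    weeks = sorted({(year, week) for (year, week, _) in counts})
    return [
        (year, week) + tuple(counts[(year, week, s)] for s in statuses)
        for (year, week) in weeks
    ]

# Hypothetical sample rows: (date of the case, status code)
CASES = [
    (date(2013, 1, 1), 1),
    (date(2013, 1, 2), 2),
    (date(2013, 1, 2), 2),
    (date(2013, 1, 8), 3),
]

for row in weekly_status_counts(CASES):
    print(row)
```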
So, let's break this down:
You want to make a report that shows, for each contact, a breakdown, week by week, of the number of cases registered to that contact, which is divided into three columns, one for each StateCode.
If this is the case, then you would need to have 52 date records (or so) for each contact. For calendar like requests, it's always good to have a separate calendar table that lets you query from it. Dan Guzman has a blog entry that creates a useful calendar table which I'll use in the query.
WITH WeekNumbers AS
(
SELECT
FirstDateOfWeek,
-- order by first date of week, grouping calendar year to produce week numbers
WeekNumber = row_number() OVER (PARTITION BY CalendarYear ORDER BY FirstDateOfWeek)
FROM
master.dbo.Calendar -- created from script
GROUP BY
FirstDateOfWeek,
CalendarYear
), Calendar AS
(
SELECT
WeekNumber =
(
SELECT
WeekNumber
FROM
WeekNumbers WN
WHERE
C.FirstDateOfWeek = WN.FirstDateOfWeek
),
*
FROM
master.dbo.Calendar C
WHERE
CalendarDate BETWEEN '1/1/2012' AND getutcdate()
)
SELECT
C.FullName,
----include the below if the data is necessary
--Cl.WeekNumber,
--Cl.CalendarYear,
--Cl.FirstDateOfWeek,
--Cl.LastDateOfWeek,
'Week: ' + CAST(Cl.WeekNumber AS VARCHAR(20))
+ ', Year: ' + CAST(Cl.CalendarYear AS VARCHAR(20)) WeekNumber
FROM
CRM.dbo.Contact C
-- use a cartesian join to produce a table list
CROSS JOIN
(
SELECT
DISTINCT WeekNumber,
CalendarYear,
FirstDateOfWeek,
LastDateOfWeek
FROM
Calendar
) Cl
ORDER BY
C.FullName,
Cl.WeekNumber
This is different from the solution Ben linked to because Marc's query only returns weeks where there is a matching value, whereas you may or may not want to see even the weeks where there is no activity.
Once you have your core tables of contacts split out week by week as in the above (or altered for your specific time period), you can simply add a subquery for each StateCode to see the breakdown in columns as in the final query below.
WITH WeekNumbers AS
(
SELECT
FirstDateOfWeek,
WeekNumber = row_number() OVER (PARTITION BY CalendarYear ORDER BY FirstDateOfWeek)
FROM
master.dbo.Calendar
GROUP BY
FirstDateOfWeek,
CalendarYear
), Calendar AS
(
SELECT
WeekNumber =
(
SELECT
WeekNumber
FROM
WeekNumbers WN
WHERE
C.FirstDateOfWeek = WN.FirstDateOfWeek
),
*
FROM
master.dbo.Calendar C
WHERE
CalendarDate BETWEEN '1/1/2012' AND getutcdate()
)
SELECT
C.FullName,
--Cl.WeekNumber,
--Cl.CalendarYear,
--Cl.FirstDateOfWeek,
--Cl.LastDateOfWeek,
'Week: ' + CAST(Cl.WeekNumber AS VARCHAR(20)) +', Year: ' + CAST(Cl.CalendarYear AS VARCHAR(20)) WeekNumber,
(
SELECT
count(*)
FROM
CRM.dbo.Incident I
INNER JOIN CRM.dbo.StringMap SM ON
I.StateCode = SM.AttributeValue
INNER JOIN
(
SELECT
DISTINCT ME.Name,
ME.ObjectTypeCode
FROM
CRM.MetadataSchema.Entity ME
) E ON
SM.ObjectTypeCode = E.ObjectTypeCode
WHERE
I.ModifiedOn >= Cl.FirstDateOfWeek
AND I.ModifiedOn < dateadd(day, 1, Cl.LastDateOfWeek)
AND E.Name = 'incident'
AND SM.AttributeName = 'statecode'
AND SM.LangId = 1033
AND I.CustomerId = C.ContactId
AND SM.Value = 'Active'
) ActiveCases,
(
SELECT
count(*)
FROM
CRM.dbo.Incident I
INNER JOIN CRM.dbo.StringMap SM ON
I.StateCode = SM.AttributeValue
INNER JOIN
(
SELECT
DISTINCT ME.Name,
ME.ObjectTypeCode
FROM
CRM.MetadataSchema.Entity ME
) E ON
SM.ObjectTypeCode = E.ObjectTypeCode
WHERE
I.ModifiedOn >= Cl.FirstDateOfWeek
AND I.ModifiedOn < dateadd(day, 1, Cl.LastDateOfWeek)
AND E.Name = 'incident'
AND SM.AttributeName = 'statecode'
AND SM.LangId = 1033
AND I.CustomerId = C.ContactId
AND SM.Value = 'Resolved'
) ResolvedCases,
(
SELECT
count(*)
FROM
CRM.dbo.Incident I
INNER JOIN CRM.dbo.StringMap SM ON
I.StateCode = SM.AttributeValue
INNER JOIN
(
SELECT
DISTINCT ME.Name,
ME.ObjectTypeCode
FROM
CRM.MetadataSchema.Entity ME
) E ON
SM.ObjectTypeCode = E.ObjectTypeCode
WHERE
I.ModifiedOn >= Cl.FirstDateOfWeek
AND I.ModifiedOn < dateadd(day, 1, Cl.LastDateOfWeek)
AND E.Name = 'incident'
AND SM.AttributeName = 'statecode'
AND SM.LangId = 1033
AND I.CustomerId = C.ContactId
AND SM.Value = 'Canceled'
) CancelledCases
FROM
CRM.dbo.Contact C
CROSS JOIN
(
SELECT
DISTINCT WeekNumber,
CalendarYear,
FirstDateOfWeek,
LastDateOfWeek
FROM
Calendar
) Cl
ORDER BY
C.FullName,
Cl.WeekNumber