Summarize days per month based on date ranges - sql

Being primarily a C# developer, I'm scratching my head when trying to create a pure T-SQL based solution to a problem involving summarizing days/month given a set of date ranges.
I have a set of data looking something like this:
UserID Department StartDate EndDate
====== ========== ========== ==========
1 A 2011-01-02 2011-01-05
1 A 2011-01-20 2011-01-25
1 A 2011-02-25 2011-03-05
1 B 2011-01-21 2011-01-22
2 A 2011-01-01 2011-01-20
3 C 2011-01-01 2011-02-03
The date ranges are non-overlapping, may span several months, there may exist several ranges for a specific user and department within a single month.
What I would like to do is to summarize the number of days (inclusive) per user, department, year and month, like this (with reservations for any math errors in my example...):
UserID Department Year Month Days
====== ========== ==== ===== ====
1 A 2011 01 10
1 A 2011 02 4
1 A 2011 03 5
1 B 2011 01 2
2 A 2011 01 20
3 C 2011 01 31
3 C 2011 02 3
This data is going into a new table used by reporting tools.
I hope the problem description is clear enough, this is my first posting here, be gentle :-)
Thanks in advance!

Working sample
-- sample data in a temp table
declare #t table (UserID int, Department char(1), StartDate datetime, EndDate datetime)
insert #t select
1 ,'A', '2011-01-02','2011-01-05'union all select
1 ,'A', '2011-01-20','2011-01-25'union all select
1 ,'A', '2011-02-25','2011-03-05'union all select
1 ,'B', '2011-01-21','2011-01-22'union all select
2 ,'A', '2011-01-01','2011-01-20'union all select
3 ,'C', '2011-01-01','2011-02-03'
-- the query you need is below this line
select UserID, Department,
YEAR(StartDate+v.number) Year,
MONTH(StartDate+v.number) Month, COUNT(*) Days
from #t t
inner join master..spt_values v
on v.type='P' and v.number <= DATEDIFF(d, startdate, enddate)
group by UserID, Department, YEAR(StartDate+v.number), MONTH(StartDate+v.number)
order by UserID, Department, Year, Month

Related

SQL Server multiple partitions by month, day and hour

In SQL Server I have a table like below:
processName initDateTime
processA 2020-06-15 13:31:15.330
processB 2020-06-20 10:00:30.000
processA 2020-06-20 13:31:15.330
...
and so on
I need to group by processName and for each processName I need to get the number of records by month (#byMonth), day (#byDay) and hour (#byHour).
What is the best way to do it? Something as below? What would be the SQL query?
Possible results:
processName Month Day Hour #byMonth #byDay #byHour #total(by process)
processA January 15 17 4 3 2 7
processA January 15 20 4 3 1 7
processA January 20 05 4 2 3 7
processA January 20 13 4 2 1 7
processA March 04 05 3 2 3 7
processA March 04 17 3 2 2 7
processA March 15 05 3 3 3 7
...and so on for the rest of processes name
I think that you want aggregation and window functions:
select
processName,
month(initDateTime),
day(initDateTime),
datepart(hour, initDateTime),
sum(count(*)) over(partition by processName, year(initDateTime), month(initDateTime)) byMonth,
sum(count(*)) over(partition by processName, year(initDateTime), month(initDateTime), day(initDateTime)) byDay,
count(*) byHour
from mytable
group by
processName,
year(initDateTime),
month(initDateTime),
day(initDateTime),
datepart(hour, initDateTime)
Wherever possible, I like to return dates as dates to the caller, so that they can also treat them as dates for things such as sorting, converting to local time, or even making sure that the language shown is relevant. So if it were me, i would do the following:
-- sample data
CREATE TABLE #T (processName VARCHAR(50), initDateTime DATETIME)
INSERT #T (processName, initDateTime)
VALUES
('processA', '2020-06-15 13:31:15.330'),
('processB', '2020-06-20 10:00:30.000'),
('processA', '2020-06-20 13:31:15.330')
SELECT t.processName,
i.InitHour,
ByMonth = SUM(COUNT(*)) OVER(PARTITION BY i.InitMonth),
ByDay = SUM(COUNT(*)) OVER(PARTITION BY i.InitDay),
ByHour = COUNT(*)
FROM #T AS t
CROSS APPLY
( SELECT InitHour = DATEADD(HOUR, DATEDIFF(HOUR, 0, initDateTime), 0),
InitDay = DATEADD(DAY, DATEDIFF(DAY, 0, initDateTime), 0),
InitMonth = DATEADD(MONTH, DATEDIFF(MONTH, 0, initDateTime), 0)
) AS i
GROUP BY t.processName, i.InitHour, i.InitDay, i.InitMonth;
Which returns:
processName InitHour ByMonth ByDay ByHour
--------------------------------------------------------------
processA 2020-06-15 13:00:00 3 1 1
processA 2020-06-20 13:00:00 3 2 1
processB 2020-06-20 10:00:00 3 2 1
If you need the day number, month name etc in SQL, you can get these using DATEPART or DATENAME, but as above, this is really better handled in the presentation layer, so you can deal with locales, or specific user settings.

Dynamic 12 month rolling total for each previous month, going back 12 months

I have the following sql Code (where clause just to limit rows currently)
select
month,
monthname,
year,
count(distinct case when a.dim_service_type_id_desc like '%Direct Payment%' then a.DIM_PERSON_ID else null end) as No_dp,
count(distinct a.DIM_PERSON_ID) as no_ppl
from
SERVICE_PROVISIONS a
inner join date_tbl d on CONVERT(VARCHAR(35),a.start_dttm,112) = d.dim_date_id
where
a.dim_person_id >0
and year = 2018
group by
month,
monthname,
year
my output is this
month monthname year No_dp no_ppl
1 January 2018 142 1604
2 February 2018 111 1526
3 March 2018 133 1636
4 April 2018 1107 3829
5 May 2018 140 1575
6 June 2018 131 1389
7 July 2018 200 893
8 August 2018 2 73
9 September 2018 1 32
10 October 2018 2 21
11 November 2018 2 21
12 December 2018 2 19
So my question is - the customer wants to see how many services were open (using start date and end date) during the previous 12 months (not how many were started, but how many were current and not ended). This is fine when using the current month, however they want to show this also for the previous 12 months as a rolling dynamic figure.
So for example this month in July they want to see how many services were open during the last 12 months. Last month June, they want to see how many services were open during the 12 months previous to June and so on for the previous 12 months.
The table needs to have the month name for the last 12 months and in a column show the number of services that were open in the previous 12 months next to that month.
I hope that makes sense, sorry if it doesn't, feel free to ask questions and I will try to clarify.
The output needs to look something like the current output table, but it is currently only showing how many services were started within that month, which isn't what we want.
The date table is a reference table which has different date formats etc. It can be used or added to if needed.
I've had to make several assumptions about your data. Hopefully the query I'll show in a minute will be easy for you to adjust if any of these are wrong:
I am guessing by its name that start_dttm is a datetime or datetime2 column.
I assume there is a corresponding column called end_dttm that gives the end date/time of a service, and that a null in this column would indicate that a service has not yet ended.
My best guess as to what it means for a service to be "open" in a given month is that it began sometime either within or prior to that month, and has not ended by the time that month is over.
I assume from your original query that multiple services having the same dim_person_id do not represent distinct services.
Since I don't know what's in your date_tbl, I'll show an example that doesn't require it. Consider the following query:
select
BeginDate = dateadd(month, -1, dateadd(day, 1, eomonth(getdate(), -Offset.X))),
EndDate = dateadd(day, 1, eomonth(getdate(), -Offset.X))
from
(values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) Offset(X)
This will give you 12 records, representing the current month and each of the 11 preceding months. Note that my EndDate here is not actually the last day of the month, but the first day of the following month. I've done this because of assumption 1 above; since your service dates may have time components, I'll determine whether they fall in a given month by checking if their dates are strictly earlier than the start of the following month. Here's what that query gives me:
BeginDate EndDate
2018-07-01 2018-08-01
2018-06-01 2018-07-01
2018-05-01 2018-06-01
2018-04-01 2018-05-01
2018-03-01 2018-04-01
2018-02-01 2018-03-01
2018-01-01 2018-02-01
2017-12-01 2018-01-01
2017-11-01 2017-12-01
2017-10-01 2017-11-01
2017-09-01 2017-10-01
2017-08-01 2017-09-01
Now I'll join the above result set to your SERVICE_PROVISIONS data, looking for records in each month that have dim_person_id > 0 (from your original query) and which satisfy assumption 3 above.
-- Some sample data (assumptions 1 & 2)
declare #SERVICE_PROVISIONS table (dim_person_id bigint, start_dttm datetime, end_dttm datetime);
insert #SERVICE_PROVISIONS values
(1, '20180101', '20180315'),
(1, '20180101', '20180315'),
(2, '20171215', '20180520');
-- The CTE defines the months we'll report on, as described earlier.
with MonthsCTE as
(
select
BeginDate = dateadd(month, -1, dateadd(day, 1, eomonth(getdate(), -Offset.X))),
EndDate = dateadd(day, 1, eomonth(getdate(), -Offset.X))
from
(values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) Offset(X)
)
-- This query matches the months from the CTE against the applicable services.
select
[Month] = datepart(month, M.BeginDate),
[MonthName] = datename(month, M.BeginDate),
[Year] = datepart(year, M.BeginDate),
ServicesOpen = count(distinct S.dim_person_id) -- Assumption 4
from
MonthsCTE M
left join #SERVICE_PROVISIONS S on
S.dim_person_id > 0 and
S.start_dttm < M.EndDate and -- Assumption 3
(
S.end_dttm >= M.EndDate or
S.end_dttm is null -- Assumption 2
)
group by
M.BeginDate,
M.EndDate
order by
M.BeginDate;
Note that I moved the dim_person_id > 0 from the WHERE clause to the JOIN so that each of the 12 months will still appear in the result set even if there were no services open during that time. Results:
Month MonthName Year ServicesOpen
8 August 2017 0
9 September 2017 0
10 October 2017 0
11 November 2017 0
12 December 2017 1
1 January 2018 2
2 February 2018 2
3 March 2018 1
4 April 2018 1
5 May 2018 0
6 June 2018 0
7 July 2018 0
something a bit like this - if you can write a query to get the value you want for a row in your ootput, then use cross apply to link to that query. Counting records that have an open record before the month, but no close record before the month seems feasible
SELECT IQ. *, OA.SERVICE_PROVISIONS FROM (select
month,
monthname,
year,
a.dim_person_id dim_person_id,
count(distinct case when a.dim_service_type_id_desc like '%Direct Payment%' then a.DIM_PERSON_ID else null end) as No_dp,
count(distinct a.DIM_PERSON_ID) as no_ppl
from
SERVICE_PROVISIONS a
inner join date_tbl d on CONVERT(VARCHAR(35),a.start_dttm,112) = d.dim_date_id
where
a.dim_person_id >0
and year = 2018
group by
month,
monthname,
year) IQ
CROSS APPLY
(SELECT count(0) OpenThings FROM SERVICE_PROVISIONS SP1 WHERE
(sp1.startdate < DATEFROMPARTS(IQ.year,iq.month,1)
AND
sp1.enddate is null or sp1.enddate > DATEFROMPARTS(IQ.year,iq.month,1)) and sp1.dim_person_id = iq.dim_person_id
) AS OA

Output number of occurrences of id in a table

PK Date ID
=== =========== ===
1 07/04/2017 22
2 07/05/2017 22
3 07/07/2017 03
4 07/08/2017 04
5 07/09/2017 22
6 07/09/2017 22
7 07/10/2017 05
8 07/11/2017 03
9 07/11/2017 03
10 07/11/2017 03
I want to count the number of ID occurred in a given week/month, something like this.
ID Count
22 3 --> count as 1 only in the same date occurred twice one 07/09/2017
03 2 --> same as above, increment only one regardless how many times it occurred in a same date
04 1
05 1
I'm trying to implement this in a perl file, to output/print it in a csv file, I have no idea on what query will I execute.
Seems like a simple case of count distinct and group by:
SELECT Id, COUNT(DISTINCT [Date]) As [Count]
FROM TableName
WHERE [Date] >= #StartDate
AND [Date] <= #EndDate
GROUP BY Id
ORDER BY [Count] DESC
You can use COUNT with DISTINCT e.g.:
SELECT ID, COUNT(DISTINCT Date)
FROM table
GROUP BY ID;
You can read more abot how to get month from a date in get month from a date (it also works for year).
Your query will be :
select DATEPART(mm,Date) AS month, COUNT(ID) AS count from table group by month
Hope that helped you.

How to aggregate 7 days in SQL

I was trying to aggregate a 7 days data for FY13 (starts on 10/1/2012 and ends on 9/30/2013) in SQL Server but so far no luck yet. Could someone please take a look. Below is my example data.
DATE BREAD MILK
10/1/12 1 3
10/2/12 2 4
10/3/12 2 3
10/4/12 0 4
10/5/12 4 0
10/6/12 2 1
10/7/12 1 3
10/8/12 2 4
10/9/12 2 3
10/10/12 0 4
10/11/12 4 0
10/12/12 2 1
10/13/12 2 1
So, my desired output would be like:
DATE BREAD MILK
10/1/12 1 3
10/2/12 2 4
10/3/12 2 3
10/4/12 0 4
10/5/12 4 0
10/6/12 2 1
Total 11 15
10/7/12 1 3
10/8/12 2 4
10/9/12 2 3
10/10/12 0 4
10/11/12 4 0
10/12/12 2 1
10/13/12 2 1
Total 13 16
--------through 9/30/2013
Please note, since FY13 starts on 10/1/2012 and ends on 9/30/2012, the first week of FY13 is 6 days instead of 7 days.
I am using SQL server 2008.
You could add a new computed column for the date values to group them by week and sum the other columns, something like this:
SELECT DATEPART(ww, DATEADD(d,-2,[DATE])) AS WEEK_NO,
SUM(Bread) AS Bread_Total, SUM(Milk) as Milk_Total
FROM YOUR_TABLE
GROUP BY DATEPART(ww, DATEADD(d,-2,[DATE]))
Note: I used DATEADD and subtracted 2 days to set the first day of the week to Monday based on your dates. You can modify this if required.
Use option with GROUP BY ROLLUP operator
SELECT CASE WHEN DATE IS NULL THEN 'Total' ELSE CONVERT(nvarchar(10), DATE, 101) END AS DATE,
SUM(BREAD) AS BREAD, SUM(MILK) AS MILK
FROM dbo.test54
GROUP BY ROLLUP(DATE),(DATENAME(week, DATE))
Demo on SQLFiddle
Result:
DATE BREAD MILK
10/01/2012 1 3
10/02/2012 2 4
10/03/2012 2 3
10/04/2012 0 4
10/05/2012 4 0
10/06/2012 2 1
Total 11 15
10/07/2012 1 3
10/08/2012 4 7
10/10/2012 0 4
10/11/2012 4 0
10/12/2012 2 1
10/13/2012 2 1
Total 13 16
You are looking for a rollup. In this case, you will need at least one more column to group by to do your rollup on, the easiest way to do that is to add a computed column that groups them into weeks by date.
Take a lookg at: Summarizing Data Using ROLLUP
Here is the general idea of how it could be done:
You need a derived column for each row to determine which fiscal week that record belongs to. In general you could subtract that record's date from 10/1, get the number of days that have elapsed, divide by 7, and floor the result.
Then you can GROUP BY that derived column and use the SUM aggregate function.
The biggest wrinkle is that 6 day week you start with. You may have to add some logic to make sure that the weeks start on Sunday or whatever day you use but this should get you started.
The WITH ROLLUP suggestions above can help; you'll need to save the data and transform it as you need.
The biggest thing you'll need to be able to do is identify your weeks properly. If you don't have those loaded into tables already so you can identify them, you can build them on the fly. Here's one way to do that:
CREATE TABLE #fy (fyear int, fstart datetime, fend datetime);
CREATE TABLE #fylist(fyyear int, fydate DATETIME, fyweek int);
INSERT INTO #fy
SELECT 2012, '2011-10-01', '2012-09-30'
UNION ALL
SELECT 2013, '2012-10-01', '2013-09-30';
INSERT INTO #fylist
( fyyear, fydate )
SELECT fyear, DATEADD(DAY, Number, DATEADD(DAY, -1, fy.fstart)) AS fydate
FROM Common.NUMBERS
CROSS APPLY (SELECT * FROM #fy WHERE fyear = 2013) fy
WHERE fy.fend >= DATEADD(DAY, Number, DATEADD(DAY, -1, fy.fstart));
WITH weekcalc AS
(
SELECT DISTINCT DATEPART(YEAR, fydate) yr, DATEPART(week, fydate) dt
FROM #fylist
),
ridcalc AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY yr, dt) AS rid, yr, dt
FROM weekcalc
)
UPDATE #fylist
SET fyweek = rid
FROM #fylist
JOIN ridcalc
ON DATEPART(YEAR, fydate) = yr
AND DATEPART(week, fydate) = dt;
SELECT list.fyyear, list.fyweek, p.[date], COUNT(bread) AS Bread, COUNT(Milk) AS Milk
FROM products p
JOIN #fylist list
ON p.[date] = list.fydate
GROUP BY list.fyyear, list.fyweek, p.[date] WITH ROLLUP;
The Common.Numbers reference above is a simple numbers table that I use for this sort of thing (goes from 1 to 1M). You could also build that on the fly as needed.

Generate year to date by month report in SQL [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Running total by grouped records in table
I am trying to put together an SQL statement that returns the SUM of a value by month, but on a year to date basis. In other words, for the month of March, I am looking to get the sum of a value for the months of January, February, and March.
I can easily do a group by to get a total for each month by itself, and potentially calculate the year to date value I need in my application from this data by looping through the results set. However, I was hoping to have some of this work handled with my SQL statement.
Has anyone ever tackled this type of problem with an SQL statement, and if so, what is the trick that I am missing?
My current sql statement for monthly data is similar to the following:
Select month, year, sum(value) from mytable group by month, year
If I include a where clause on the month, and only group by the year, I can get the result for a single month that I am looking for:
select year, sum(value) from mytable where month <= selectedMonth group by year
However, this requires me to have a particular month pre-selected or to utilize 12 different SQL statements to generate one clean result set.
Any guidance that can be provided would be greatly appreciated!
Update: The data is stored on an IBM iSeries.
declare #Q as table
(
mmonth INT,
value int
)
insert into #Q
values
(1,10),
(1,12),
(2,45),
(3,23)
select sum(January) as UpToJanuary,
sum(February)as UpToFebruary,
sum(March) as UpToMarch from (
select
case when mmonth<=1 then sum(value) end as [January] ,
case when mmonth<=2 then sum(value) end as [February],
case when mmonth<=3 then sum(value) end as [March]
from #Q
group by mmonth
) t
Produces:
UpToJanuary UpToFebruary UpToMarch
22 67 90
You get the idea, right?
NOTE: This could be done easier with PIVOT tables but I don't know if you are using SQL Server or not.
As far as I know DB2 does support windowing functions although I don't know if this is also supported on the iSeries version.
If windowing functions are supported (I believe IBM calls them OLAP functions) then the following should return what you want (provided I understood your question correctly)
select month,
year,
value,
sum(value) over (partition by year order by month asc) as sum_to_date
from mytable
order by year, month
create table mon
(
[y] int not null,
[m] int not null,
[value] int not null,
primary key (y,m))
select a.y, a.m, a.value, sum(b.value)
from mon a, mon b
where a.y = b.y and a.m >= b.m
group by a.y, a.m, a.value
2011 1 120 120
2011 2 130 250
2011 3 500 750
2011 4 10 760
2011 5 140 900
2011 6 100 1000
2011 7 110 1110
2011 8 90 1200
2011 9 70 1270
2011 10 150 1420
2011 11 170 1590
2011 12 600 2190
You should try to join the table to itself by month-behind-a-month condition and generate a synthetic month-group code to group by as follows:
select
sum(value),
year,
up_to_month
from (
select a.value,
a.year,
b.month as up_to_month
from table as a join table as b on a.year = b.year and b.month => a.month
)
group by up_to_month, year
gives that:
db2 => select * from my.rep
VALUE YEAR MONTH
----------- ----------- -----------
100 2011 1
200 2011 2
300 2011 3
400 2011 4
db2 -t -f rep.sql
1 YEAR UP_TO_MONTH
----------- ----------- -----------
100 2011 1
300 2011 2
600 2011 3
1000 2011 4