SQL: sum date ranges that overlap within a GROUP BY

I have a table with columns: name, start date (a date value) and finish date (a date value). I want to group by name and add up the date ranges so that I get the total time, counting overlapping periods only once. For example, given the table
name | start date | finish date
===============================
a | 20/10/2015 | 22/10/2015
a | 21/10/2015 | 22/10/2015
a | 26/10/2015 | 27/10/2015
If I group by name, the three rows aggregate into one. If I simply add the DATEDIFF in days per row I get 4; if I calculate the DATEDIFF between the MIN start date and the MAX finish date I get 7. The right answer is 3, because the second row overlaps the first and that time should be counted only once.

Thanks for your comments below. I have used a completely different approach. First I build a calendar CTE, a, covering every date between the earliest start date and the latest finish date in the table (the CTE dt holds those two bounds). You may use an existing calendar table from your database if you have one. Then in CTE b I CROSS JOIN the table with the calendar CTE to get the dates that fall inside each date range. It does not matter how many overlapping ranges there are, because the GROUP BY [name], sd clause includes each date only once per name. All that remains is to count the individual dates per name in CTE c:
MS SQL Server 2008 Schema Setup:
CREATE TABLE Table1
([name] varchar(1), [start date] datetime, [finish date] datetime)
;
INSERT INTO Table1
([name], [start date], [finish date])
VALUES
('a', '2015-10-20 00:00:00', '2015-10-22 00:00:00'),
('a', '2015-10-21 00:00:00', '2015-10-22 00:00:00'),
('a', '2015-10-21 00:00:00', '2015-10-23 00:00:00'),
('a', '2015-10-26 00:00:00', '2015-10-27 00:00:00')
;
Query 1:
with dt as(
select min([start date]) as sd, max([finish date]) as fd from Table1
),
a as (
select sd from dt
union all
select dateadd(day, 1, a.sd)
FROM a cross join dt
where a.sd < fd
),
b as(
select [name], sd
from table1 cross join a where a.sd between [start date] and [finish date]
group by [name], sd
),
c as (
select [name], count(*) days from b group by [name]
)
select * from c
option (maxrecursion 0)
Results:
| name | days |
|------|------|
| a | 6 |
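For readers who want to check the result outside the database, the same calendar idea can be sketched in Python. This is only an illustration under my own assumptions (the function name covered_days and the tuple layout are made up here, not part of the answer): every range is expanded into its individual days, endpoints inclusive, and a set keeps each day once per name.

```python
from datetime import date, timedelta

# Same sample rows as the schema setup above: (name, start date, finish date).
rows = [
    ("a", date(2015, 10, 20), date(2015, 10, 22)),
    ("a", date(2015, 10, 21), date(2015, 10, 22)),
    ("a", date(2015, 10, 21), date(2015, 10, 23)),
    ("a", date(2015, 10, 26), date(2015, 10, 27)),
]

def covered_days(rows):
    """Count distinct calendar days covered per name, endpoints inclusive."""
    days = {}
    for name, start, finish in rows:
        seen = days.setdefault(name, set())
        d = start
        while d <= finish:
            seen.add(d)  # a set stores each day once, however many ranges cover it
            d += timedelta(days=1)
    return {name: len(seen) for name, seen in days.items()}

print(covered_days(rows))  # {'a': 6}
```

Like the query, this counts both endpoints of every range, which is why the four sample rows give 6 (20, 21, 22, 23, 26 and 27 October).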

Related

Find missing months in SQL

So this post remains unanswered and not useful
Finding missing month from my table
This link Get Missing Month from table requires a lookup table... which is not my first choice.
I have a table with Financial Periods and a reference number. Each reference number has a series of financial periods which may start anywhere and end anywhere. The test is simply that between the start and the end there is no gap, i.e. every financial period between the smallest and largest dates must be present when grouped by reference number.
A financial period is a month.
So... in this example below, Reference Number A is missing May 2016.
REF MONTH
A 2016-04-01
A 2016-06-01
A 2016-07-01
B 2016-03-01
B 2016-04-01
B 2016-05-01
C 2022-05-01
-- Find the boundaries of each ref
select REF
, MIN(Month) as smallest
, MAX(Month) as largest
from myTable
group by REF
-- But how to find missing items?
SQL Server 2019.
Clearly a Calendar Table would make this a small task (among many others)
Here is an alternative using the window function lead() over()
Example
Declare @YourTable Table ([REF] varchar(50),[MONTH] date)
Insert Into @YourTable Values
('A','2016-04-01')
,('A','2016-06-01')
,('A','2016-07-01')
,('B','2016-03-01')
,('B','2016-04-01')
,('B','2016-05-01')
,('C','2022-05-01')
;with cte as (
Select *
,Missing = datediff(MONTH,[Month],lead([Month],1) over (partition by Ref order by [Month]))-1
From @YourTable
)
Select * from cte where Missing>0
Results
REF MONTH Missing
A 2016-04-01 1
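To make the lead() logic easier to follow, here is a rough Python equivalent. It is a sketch only; the function name month_gaps and the tuple layout are my own assumptions. Each month is compared with the next month of the same REF, and the number of whole months skipped in between is reported.

```python
from datetime import date
from itertools import groupby

# Same sample data as above: (REF, MONTH).
rows = [
    ("A", date(2016, 4, 1)), ("A", date(2016, 6, 1)), ("A", date(2016, 7, 1)),
    ("B", date(2016, 3, 1)), ("B", date(2016, 4, 1)), ("B", date(2016, 5, 1)),
    ("C", date(2022, 5, 1)),
]

def month_gaps(rows):
    """Return (ref, month, missing) for each month followed by a gap."""
    gaps = []
    for ref, group in groupby(sorted(rows), key=lambda r: r[0]):
        months = [m for _, m in group]
        # zip pairs each month with its successor, like lead() over the partition
        for cur, nxt in zip(months, months[1:]):
            missing = (nxt.year - cur.year) * 12 + (nxt.month - cur.month) - 1
            if missing > 0:
                gaps.append((ref, cur, missing))
    return gaps

print(month_gaps(rows))  # [('A', datetime.date(2016, 4, 1), 1)]
```

The last month of each REF has no successor, so it never produces a row, which matches the NULL that lead() returns there.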
I added one more row of input to demonstrate the solution better.
with forecast (
REF,
[MONTH]
) as (
select REF
, [MONTH]
from (
values
('A', {d '2016-04-01'})
, ('A', {d '2016-06-01'})
, ('A', {d '2016-07-01'})
, ('B', {d '2016-03-01'})
, ('B', {d '2016-04-01'})
, ('B', {d '2016-05-01'})
, ('B', {d '2016-09-01'})
, ('C', {d '2022-05-01'})
) x (REF, [MONTH])
),
-- define the date ranges
daterange as (
select REF
, min([MONTH]) as dtmin
, max([MONTH]) as dtmax
from forecast
group by REF
),
-- get all of the month start dates in the range
dt (
REF,
[MONTH]
) as (
select REF
, dtmin
from daterange dr
union all
select dt.REF
, dateadd(month, 1, [MONTH])
from dt dt
inner join daterange dr on dr.REF = dt.REF
where dateadd(month, 1, [MONTH]) <= dr.dtmax
)
-- find the missing months
select REF
, [MONTH]
from dt
except
select REF
, [MONTH]
from forecast
order by 1, 2
-- or list all of the months for each REF
--select REF
--, [MONTH]
--from dt
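The generate-then-EXCEPT idea can also be sketched in Python, which may help when checking the expected output by hand. The names add_month and missing_months are hypothetical helpers of mine, and I assume every MONTH value is the first day of a month, as in the sample data.

```python
from datetime import date

# Same rows as the forecast CTE above, including the extra B row.
rows = [
    ("A", date(2016, 4, 1)), ("A", date(2016, 6, 1)), ("A", date(2016, 7, 1)),
    ("B", date(2016, 3, 1)), ("B", date(2016, 4, 1)), ("B", date(2016, 5, 1)),
    ("B", date(2016, 9, 1)), ("C", date(2022, 5, 1)),
]

def add_month(d):
    """First day of the month after d (d is assumed to be a month start)."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)

def missing_months(rows):
    """Walk from the smallest to the largest month per REF and keep the absent ones."""
    by_ref = {}
    for ref, month in rows:
        by_ref.setdefault(ref, set()).add(month)
    missing = []
    for ref in sorted(by_ref):
        present = by_ref[ref]
        m, last = min(present), max(present)
        while m <= last:
            if m not in present:  # plays the role of EXCEPT
                missing.append((ref, m))
            m = add_month(m)
    return missing

for ref, month in missing_months(rows):
    print(ref, month)
```

With the sample rows this lists A 2016-05-01 and B 2016-06-01, 2016-07-01 and 2016-08-01; C has a single month and therefore no gap.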

Calculate weekly hours of operation in T-SQL from overlapping date/time data?

I am trying to calculate the number of hours of operation per week for each facility in a region. The part I am struggling with is that there are multiple programs each day that overlap which contribute to the total hours.
Here is a sample of the table I am working with:
location | program | date | start_time | end_time
=================================================
a | 1 | 09-22-21 | 14:45:00 | 15:45:00
a | 2 | 09-22-21 | 15:30:00 | 16:30:00
b | 88 | 09-22-21 | 10:45:00 | 12:45:00
b | 89 | 09-22-21 | 10:45:00 | 14:45:00
I am hoping to get:
location | hours of operation
=============================
a | 1.75
b | 4
I've tried using SUM DATEDIFF with some WHERE statements but couldn't get them to work. What I have found is how to identify the overlapping ranges (Detect overlapping date ranges from the same table), but not how to sum the difference to get the desired outcome of total non-overlapping hours of operation.
I believe you are trying to identify the total hours of operation for each location, and because some programs overlap, the overlapping time must be counted only once. To do this, I generate a tally table of every possible 15-minute increment in a day and then count the increments during which at least one program is operating.
Identify Total Hours of Operation per Date
DROP TABLE IF EXISTS #OperationSchedule
CREATE TABLE #OperationSchedule (ID INT IDENTITY(1,1),Location CHAR(1),Program INT,OpDate DATE,OpStart TIME(0),OpEnd TIME(0))
INSERT INTO #OperationSchedule
VALUES ('a',1,'09-22-21','14:45:00','15:45:00')
,('a',2,'09-22-21','15:30:00','16:30:00')
,('b',88,'09-22-21','10:45:00','12:45:00')
,('b',89,'09-22-21','10:45:00','14:45:00');
/*1 row per 15 minute increment in a day*/
;WITH cte_TimeIncrement AS (
SELECT StartTime = CAST('00:00' AS TIME(0))
UNION ALL
SELECT DATEADD(minute,15,StartTime)
FROM cte_TimeIncrement
WHERE StartTime < '23:45'
),
/*1 row per date in data*/
cte_DistinctDate AS (
SELECT OpDate
FROM #OperationSchedule
GROUP BY OpDate
),
/*Cross join to generate 1 row for each time increment*/
cte_DatetimeIncrement AS (
SELECT *
FROM cte_DistinctDate
CROSS JOIN cte_TimeIncrement
)
/*Join and count each time interval that has a match to identify times when location is operating*/
SELECT Location
,A.OpDate
,HoursOfOperation = CAST(COUNT(DISTINCT StartTime) * 15/60.0 AS Decimal(4,2))
FROM cte_DatetimeIncrement AS A
INNER JOIN #OperationSchedule AS B
ON A.OpDate = B.OpDate
AND A.StartTime >= B.OpStart
AND A.StartTime < B.OpEnd
GROUP BY Location,A.OpDate
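The tally approach is easy to check with a small Python sketch. The names below (SLOT, minutes, hours_of_operation) are mine, not from the answer, and I read 09-22-21 as 22 September 2021. A slot counts when its start lies in the half-open interval from OpStart up to, but not including, OpEnd, the same condition as the join above.

```python
from datetime import date, time

SLOT = 15  # slot length in minutes, as in the tally CTE

# (location, program, date, start, end); 09-22-21 assumed to mean 2021-09-22.
schedule = [
    ("a", 1, date(2021, 9, 22), time(14, 45), time(15, 45)),
    ("a", 2, date(2021, 9, 22), time(15, 30), time(16, 30)),
    ("b", 88, date(2021, 9, 22), time(10, 45), time(12, 45)),
    ("b", 89, date(2021, 9, 22), time(10, 45), time(14, 45)),
]

def minutes(t):
    """Minutes since midnight."""
    return t.hour * 60 + t.minute

def hours_of_operation(schedule):
    """Count distinct 15-minute slots with at least one program running."""
    slots = {}
    for loc, _program, op_date, start, end in schedule:
        first = -(-minutes(start) // SLOT) * SLOT  # round start up to a slot boundary
        for s in range(first, minutes(end), SLOT):
            slots.setdefault((loc, op_date), set()).add(s)
    return {key: len(s) * SLOT / 60 for key, s in slots.items()}

print(hours_of_operation(schedule))
```

For the sample rows this gives 1.75 hours for location a and 4.0 hours for location b. Programs that do not start or end on a 15-minute boundary lose or gain part of a slot, which is the rounding limitation of this method.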
Here is an alternative method without having to round to nearest 15 minute increments:
Declare @OperationSchedule table (
ID int Identity(1, 1)
, Location char(1)
, Program int
, OpDate date
, OpStart time(0)
, OpEnd time(0)
);
Insert Into @OperationSchedule (Location, Program, OpDate, OpStart, OpEnd)
Values ('a', 1, '09-22-21', '14:45:00', '15:45:00')
, ('a', 2, '09-22-21', '15:30:00', '16:30:00')
, ('b', 88, '09-22-21', '10:45:00', '12:45:00')
, ('b', 89, '09-22-21', '10:45:00', '14:45:00')
, ('c', 23, '09-22-21', '12:45:00', '13:45:00')
, ('c', 24, '09-22-21', '14:45:00', '15:15:00')
, ('3', 48, '09-22-21', '09:05:00', '13:55:00')
, ('3', 49, '09-22-21', '14:25:00', '15:38:00')
;
With overlappedData
As (
Select *
, overlap_op = lead(os.OpStart, 1, os.OpEnd) Over(Partition By os.Location Order By os.ID)
From @OperationSchedule os
)
)
Select od.Location
, start_date = min(od.OpStart)
, end_date = max(iif(od.OpEnd < od.overlap_op, od.OpEnd, od.overlap_op))
, hours_of_operation = sum(datediff(minute, od.OpStart, iif(od.OpEnd < od.overlap_op, od.OpEnd, od.overlap_op)) / 60.0)
From overlappedData od
Group By
od.Location;
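As a cross-check at minute precision, here is a Python sketch of a plain sort-and-merge of intervals. Note that this is a different technique from the lead()-based query above, and the helper names hm and merged_minutes are my own. Intervals are sorted by start time and overlapping ones are fused before their lengths are added up.

```python
def hm(text):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, mins = text.split(":")
    return int(hours) * 60 + int(mins)

# (location, start, end) for a single day, same values as the table variable above.
schedule = [
    ("a", hm("14:45"), hm("15:45")), ("a", hm("15:30"), hm("16:30")),
    ("b", hm("10:45"), hm("12:45")), ("b", hm("10:45"), hm("14:45")),
    ("c", hm("12:45"), hm("13:45")), ("c", hm("14:45"), hm("15:15")),
    ("3", hm("09:05"), hm("13:55")), ("3", hm("14:25"), hm("15:38")),
]

def merged_minutes(schedule):
    """Total minutes covered per location, counting overlaps once."""
    by_loc = {}
    for loc, start, end in schedule:
        by_loc.setdefault(loc, []).append((start, end))
    totals = {}
    for loc, spans in by_loc.items():
        total = 0
        cur_start = cur_end = None
        for start, end in sorted(spans):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    total += cur_end - cur_start  # close the previous merged block
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)  # overlapping or touching: extend the block
        if cur_end is not None:
            total += cur_end - cur_start
        totals[loc] = total
    return totals

print(merged_minutes(schedule))  # {'a': 105, 'b': 240, 'c': 90, '3': 363}
```

Dividing by 60 gives 1.75, 4, 1.5 and 6.05 hours. Because the merge does not depend on row order or on only two programs overlapping, it also handles three or more overlapping programs at one location.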

Calculate inactive customers from single table

I have a table with the fields Customer No., Posting date and Order_ID. I want to find the total number of inactive customers for the last 12 months on a monthly basis, meaning customers who placed an order more than 12 months ago and have since become inactive. I want to calculate this every month to understand how the number of inactive customers grows month by month.
If I run the query in July, it should go back 365 days from the previous month end and give the total number of inactive customers. I want to do this month by month.
I am in learning stage please help.
Thanks for your time in advance.
to get the customers
SELECT DISTINCT a.CustomerNo
FROM YourTable a
WHERE NOT EXISTS
(SELECT 0 FROM YourTable b WHere a.CustomerNo = b.CustomerNo
and b.PostingDate >
dateadd(day,-365 -datepart(day,getdate()),getdate())
)
to get a count
SELECT count(DISTINCT a.CustomerNo) as InnactiveCount
FROM YourTable a
WHERE NOT EXISTS
(SELECT 0 FROM YourTable b WHere a.CustomerNo = b.CustomerNo
and b.PostingDate >
dateadd(day,-365 -datepart(day,getdate()),getdate())
)
generate a 'months' table by CTE, then look for inactive in those months
;WITH month_gen as (SELECT dateadd(day,-0 -datepart(day,getdate()),getdate()) eom, 1 as x
UNION ALL
SELECT dateadd(day,-datepart(day,eom),eom) eom, x + 1 x FROM month_gen where x < 12
)
SELECT CONVERT(varchar(7), month_gen.eom, 102) ym, count(DISTINCT a.CustomerNo) innactiveCount FROM YourTable a
cross join month_gen
WHERE NOT EXISTS(SELECT 0 FROM YourTable b WHere a.CustomerNo = b.CustomerNo and
YEAR(b.PostingDate) = YEAR(eom) and
MONTH(b.PostingDate) = MONTH(eom)
)
GROUP BY CONVERT(varchar(7), month_gen.eom, 102)
if that gets you anywhere, maybe a final step is to filter out anything getting 'counted' before it was ever active i.e. don't count 'new' customers before they became active
Try below query. To achieve your goal you need calendar table (which I defined with CTE). Below query counts inactivity for the first day of a month:
declare @tbl table (custNumber int, postDate date, orderId int);
insert into @tbl values
(1, '2017-01-01', 123),
(2, '2017-02-01', 124),
(3, '2017-02-01', 125),
(1, '2018-02-02', 126),
(2, '2018-05-01', 127),
(3, '2018-06-01', 128);
;with cte as (
select cast('2018-01-01' as date) dt
union all
select dateadd(month, 1, dt) from cte
where dt < '2018-12-01'
)
-- t1: orders placed at least a year before dt; t2: orders by the same customer within the last year
select dt, sum(case when t2.custNumber is null then 1 else 0 end) inactiveCount
from cte c
left join @tbl t1 on dateadd(year, -1, c.dt) >= t1.postDate
left join @tbl t2 on t2.postDate > dateadd(year, -1, c.dt) and t2.postDate <= c.dt and t1.custNumber = t2.custNumber
group by dt
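To see what the calendar query is aiming for, here is a small Python sketch of the same rule. It is an illustration under my own assumptions: the function name inactive_counts is made up, and it counts distinct customers, so it can differ from the SQL above when a customer has several old orders or when a month has no old orders at all. A customer is inactive on the first of a month if they ordered on or before the same date one year earlier and have not ordered since.

```python
from datetime import date

# (custNumber, postDate), same sample rows as above.
orders = [
    (1, date(2017, 1, 1)), (2, date(2017, 2, 1)), (3, date(2017, 2, 1)),
    (1, date(2018, 2, 2)), (2, date(2018, 5, 1)), (3, date(2018, 6, 1)),
]

def inactive_counts(orders, year):
    """Inactive customers on the first day of each month of the given year."""
    result = []
    for month in range(1, 13):
        dt = date(year, month, 1)
        cutoff = dt.replace(year=dt.year - 1)  # same day one year earlier
        old = {c for c, d in orders if d <= cutoff}
        recent = {c for c, d in orders if cutoff < d <= dt}
        result.append((dt, len(old - recent)))  # ordered long ago, nothing since
    return result

for dt, count in inactive_counts(orders, 2018):
    print(dt, count)
```

For the sample rows the counts for January to December 2018 are 1, 3, 2, 2, 1 and then 0 for the remaining months, as each customer's 2018 order brings them back into the active set.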

Check Missing Time Interval In SQL in Minutes

I have a SQL statement as below and I want to find the minutes in a time interval for which D.DatalogValue has no row at all (the gap does not show up as a NULL or zero value either). For the sample below, the output should show 2016-06-01 00:32:29 as the missing CreatedDate.
SELECT
A.DefID, A.ObjID,
C.ObjName, C.Dev_ID,
A.Pro_ID, A.ArrayIndex,
A.DefType, A.TObjID, A.DimeId, A.DefId,
D.DatalogValue, D.PanelDt, D.CreatedDate
FROM
Table A, Table C, Table D
WHERE
A.ObjID = C.ObjID
AND C.ObjID = '2627'
AND A.DefID = D.DefID
AND D.CreatedDate BETWEEN '2016-06-01' AND '2016-06-02'
ORDER BY
C.ObjID,C.ObjName;
Sample data:
Create Date DatalogValue
-------------------------------------
2016-06-01 00:29:29 0.01
2016-06-01 00:30:29 0.02
2016-06-01 00:31:29 0.03
2016-06-01 00:33:29 0.04
Using the solution provided I have come up with a SQL statement, but it still does not show the result I want. I am not sure which part I am doing wrong; my code is below:
DECLARE @StartDate DATETIME = '2016-07-01';
DECLARE @EndDate DATETIME = '2016-07-31';
WITH Check_Dates AS (
SELECT @StartDate [Date]
UNION ALL
SELECT DATEADD(MINUTE, 1, [Date]) FROM Check_Dates
WHERE [Date] < DATEADD(DAY, 1, @EndDate)
)
SELECT
FORMAT(d.Date, 'yyyy-MM-dd HH:mm') [Created Date]
FROM Check_Dates d
WHERE
NOT EXISTS(
SELECT
Format(D.CreatedDate, 'yyyy-MM-dd HH:mm')as created_dt
FROM TABLE A
,TABLE C
,TABLE D
WHERE A.ObjID=C.ObjID
AND C.ObjID IN('3915')
AND A.DefID=D.DefID
AND D.CreatedDate BETWEEN '2016-07-01' AND '2016-08-01'
)
OPTION (MAXRECURSION 0);
A solution is to use a CTE to create a list of DATETIMEs then LEFT JOIN these onto your original query. You can also create a pair of tables instead (as mentioned in the comments) - google DimDate and/or DimTime.
Something like (untested):
DECLARE @StartDate DATETIME = '2016-06-01';
DECLARE @EndDate DATETIME = '2016-06-02';
WITH Dates AS (
SELECT @StartDate [Date]
UNION ALL
SELECT DATEADD(SECOND, 1, [Date]) FROM Dates
WHERE [Date] < DATEADD(DAY, 1, @EndDate)
)
SELECT
d.Date [Created Date]
,COALESCE(Qry.DatalogValue, 0) DatalogValue
FROM Dates d
LEFT JOIN (
Your query goes here
) Qry
ON d.Date = Qry.CreatedDate
OPTION (MAXRECURSION 0)
Your solution seems very risky to me. Are you sure seconds should be compared? I would truncate to minutes. I suggest a more robust solution:
WITH Dates AS
( --Your dates and values
SELECT * FROM (VALUES
('2016-06-01 00:29:29', 0.01),
('2016-06-01 00:30:29', 0.02),
('2016-06-01 00:31:29', 0.03),
('2016-06-01 00:33:29', 0.04)--,('2016-06-01 01:00:28', 0.05)
) T(CreateDate, CatalogValues)
), Minute10 AS --Generate numbers from 0-999999
(
SELECT * FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(0)) T1(Value)
), Minute1000 AS
(
SELECT M1.Value FROM Minute10 M1 CROSS JOIN Minute10 M2 CROSS JOIN Minute10
), Minute1000000 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1 Value
FROM Minute1000
CROSS JOIN Minute1000 M2
), RangeValues AS --for simplicity, min and max values from dates
(
SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, MIN(CreateDate)), 0) MinDate,
DATEADD(MINUTE, DATEDIFF(MINUTE, 0, MAX(CreateDate)), 0) MaxDate
FROM Dates
)
SELECT TOP(1+DATEDIFF(MINUTE, (SELECT MinDate FROM RangeValues), (SELECT MaxDate FROM RangeValues)))
DATEADD(MINUTE,Value,MinDate) ExpectedDate, CreateDate, CatalogValues
FROM Minute1000000
CROSS APPLY (SELECT MinDate FROM RangeValues) T
LEFT JOIN Dates ON DATEADD(MINUTE,Value,MinDate)=DATEADD(MINUTE, DATEDIFF(MINUTE, 0, CreateDate), 0)
Note that all dates are truncated to minutes. You can simplify the query by removing the number generation part (the numbers can be kept in a utility table; 1440 values if one day is all you need). Min and Max can be precalculated.
This results in the following output (it can handle ranges up to minDate+999999 minutes and can easily be extended):
ExpectedDate CreateDate CatalogValues
2016-06-01 00:29:00.000 2016-06-01 00:29:29 0.01
2016-06-01 00:30:00.000 2016-06-01 00:30:29 0.02
2016-06-01 00:31:00.000 2016-06-01 00:31:29 0.03
2016-06-01 00:32:00.000 NULL NULL
2016-06-01 00:33:00.000 2016-06-01 00:33:29 0.04
Explanation:
Dates is just the source table. The tables Minute10..Minute1000000 generate the numbers from 0 to 999999 (10 cross joined three times = 10^3 = 1000, and 1000 cross joined with itself = 1000000). Records from the last table are numbered with ROW_NUMBER() to get sequential values. Don't worry, TOP prevents all 1000000 values from being evaluated. RangeValues contains the MIN and MAX dates, for simplicity.
Algorithm:
Since you need one record per minute from the MIN date to the MAX date, you evaluate TOP DATEDIFF(MINUTE,MIN,MAX)+1 records (+1 to avoid a fencepost error). All required tables are joined (CROSS APPLY adds the MinDate column to every record), and the expected date is calculated as the MIN date plus the sequential value in minutes. The last join, the LEFT one, matches the date generated for every minute against the source table. If there is a match, the record is appended (joined); if there is no match, NULL is appended. Note that `DATEADD(MINUTE, DATEDIFF(MINUTE, 0, @someDate), 0)` truncates the seconds from a date.
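The algorithm above can be sketched in a few lines of Python, which makes the truncate-then-compare step explicit. The function name missing_minutes is my own, and the sketch only reports the gaps rather than the full LEFT JOIN output.

```python
from datetime import datetime, timedelta

# CreateDate values from the sample data.
readings = [
    datetime(2016, 6, 1, 0, 29, 29),
    datetime(2016, 6, 1, 0, 30, 29),
    datetime(2016, 6, 1, 0, 31, 29),
    datetime(2016, 6, 1, 0, 33, 29),
]

def missing_minutes(stamps):
    """Minutes between the first and last reading that have no reading at all."""
    # Truncate to the minute, the same effect as the DATEADD/DATEDIFF trick.
    present = {s.replace(second=0, microsecond=0) for s in stamps}
    t, last = min(present), max(present)
    gaps = []
    while t <= last:  # inclusive range, hence the +1 in the TOP expression
        if t not in present:
            gaps.append(t)
        t += timedelta(minutes=1)
    return gaps

print(missing_minutes(readings))  # [datetime.datetime(2016, 6, 1, 0, 32)]
```

Comparing truncated minutes rather than full timestamps is what keeps the check from depending on the readings always arriving at second 29.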

Get average of last 7 days

I'm attacking a problem where I have a value for each date in a range of dates. I would like to consolidate the rows in my table by averaging them and reassigning the date column to be relative to the last 7 days. My SQL experience is lacking and I could use some help. Thanks for giving this a look!!
E.g.
7 rows with dates and values.
UniqueId Date Value
........ .... .....
a 2014-03-20 2
a 2014-03-21 2
a 2014-03-22 3
a 2014-03-23 5
a 2014-03-24 1
a 2014-03-25 0
a 2014-03-26 1
Resulting row
UniqueId Date AvgValue
........ .... ........
a 2014-03-26 2
First off, I am not even sure this is possible. I am trying to attack a problem with the data at hand. I thought of using a window frame with a partition to roll the dates into one date with the averaged result, but I am not exactly sure how to say that in SQL.
I am taking the following as sample data:
CREATE TABLE some_data1 (unique_id text, date date, value integer);
INSERT INTO some_data1 (unique_id, date, value) VALUES
( 'a', '2014-03-20', 2),
( 'a', '2014-03-21', 2),
( 'a', '2014-03-22', 3),
( 'a', '2014-03-23', 5),
( 'a', '2014-03-24', 1),
( 'a', '2014-03-25', 0),
( 'a', '2014-03-26', 1),
( 'b', '2014-03-01', 1),
( 'b', '2014-03-02', 1),
( 'b', '2014-03-03', 1),
( 'b', '2014-03-04', 1),
( 'b', '2014-03-05', 1),
( 'b', '2014-03-06', 1),
( 'b', '2014-03-07', 1)
OPTION A: PostgreSQL, using a common table expression (WITH)
with cte as (
select unique_id
,max(date) date
from some_data1
group by unique_id
)
select max(sd.unique_id),max(sd.date),avg(sd.value)
from some_data1 sd inner join cte using(unique_id)
-- keep only the latest date and the 6 days before it (7 days in total)
where sd.date between cte.date - 6 and cte.date
group by cte.unique_id
OPTION B: To work in both PostgreSQL and MySQL
select max(sd.unique_id)
,max(sd.date)
,avg(sd.value)
from (
select unique_id
,max(date) date
from some_data1
group by unique_id
) cte inner join some_data1 sd using(unique_id)
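-- keep only the latest date and the 6 days before it (7 days in total)
where sd.date between cte.date - interval '6' day and cte.date
group by cte.unique_id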
Maybe something along the lines of SELECT AVG(Value) AS 'AvgValue' FROM tableName WHERE Date BETWEEN dateStart AND dateEnd That will get you the average between those dates and you have dateEnd already so you could use that result to create the row you're looking for.
For PostgreSQL a window function might be what you want:
DROP TABLE IF EXISTS some_data;
CREATE TABLE some_data (unique_id text, date date, value integer);
INSERT INTO some_data (unique_id, date, value) VALUES
( 'a', '2014-03-20', 2),
( 'a', '2014-03-21', 2),
( 'a', '2014-03-22', 3),
( 'a', '2014-03-23', 5),
( 'a', '2014-03-24', 1),
( 'a', '2014-03-25', 0),
( 'a', '2014-03-26', 1),
( 'a', '2014-03-27', 3);
WITH avgs AS (
SELECT unique_id, date,
avg(value) OVER w AS week_avg,
count(value) OVER w AS num_days
FROM some_data
WINDOW w AS (
PARTITION BY unique_id
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW))
SELECT unique_id, date, week_avg
FROM avgs
WHERE num_days=7
Result:
unique_id | date | week_avg
-----------+------------+--------------------
a | 2014-03-26 | 2.0000000000000000
a | 2014-03-27 | 2.1428571428571429
Questions include:
What happens if a day from the preceding six days is missing? Do we want to add it and count it as zero?
What happens if you add a day? Is the result of the code above what you want (a rolling 7-day average)?
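To make the behaviour of the ROWS BETWEEN 6 PRECEDING AND CURRENT ROW frame concrete, here is a Python sketch of the same rolling average for a single unique_id. The function name rolling_week_avg is mine, and like the query it counts rows rather than calendar days, so a missing day simply stretches the window further back.

```python
from collections import deque
from datetime import date

# (date, value) rows for unique_id 'a', same as the sample above.
rows = [
    (date(2014, 3, 20), 2), (date(2014, 3, 21), 2), (date(2014, 3, 22), 3),
    (date(2014, 3, 23), 5), (date(2014, 3, 24), 1), (date(2014, 3, 25), 0),
    (date(2014, 3, 26), 1), (date(2014, 3, 27), 3),
]

def rolling_week_avg(rows, window=7):
    """Average of the current row and the 6 preceding rows, once 7 rows exist."""
    out = []
    buf = deque(maxlen=window)  # oldest value drops out automatically
    for day, value in sorted(rows):
        buf.append(value)
        if len(buf) == window:  # same filter as WHERE num_days=7
            out.append((day, sum(buf) / window))
    return out

for day, avg in rolling_week_avg(rows):
    print(day, avg)
```

This prints an average of 2.0 for 2014-03-26 and 15/7 (about 2.142857) for 2014-03-27, the same two rows as the query result above.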
For SQL Server, you can follow the below approach. Try this
1. For weekly value's average
SET DATEFIRST 4
;WITH CTE AS
(
SELECT *,
DATEPART(WEEK,[DATE])WK,
--Find last day in that week
ROW_NUMBER() OVER(PARTITION BY UNIQUEID,DATEPART(WEEK,[DATE]) ORDER BY [DATE] DESC) RNO,
-- Find average value of that week
AVG(VALUE) OVER(PARTITION BY UNIQUEID,DATEPART(WEEK,[DATE])) AVGVALUE
FROM DATETAB
)
SELECT UNIQUEID,[DATE],AVGVALUE
FROM CTE
WHERE RNO=1
2. For last 7 days value's average
DECLARE @DATE DATE = '2014-03-26'
;WITH CTE AS
(
SELECT UNIQUEID,[DATE],VALUE,@DATE CURRENTDATE
FROM DATETAB
-- 6 days back plus the current day = 7 days, since BETWEEN is inclusive
WHERE [DATE] BETWEEN DATEADD(DAY,-6,@DATE) AND @DATE
)
SELECT UNIQUEID,CURRENTDATE [DATE],AVG(VALUE) AVGVALUE
FROM CTE
GROUP BY UNIQUEID,CURRENTDATE