It's not as simple as creating intervals of time that are N minutes long. One record might be 10:04, and the other 10:17 where N is 15.
Perhaps a user-function will work, maybe a CTE. It could require multiple joins on the same source table.
I'm looking for the most "elegant" solution. Maybe there's a feature in SQL I didn't know about which makes this easy.
Here is a reference scenario to make answers more consistent with each other:
create table Comparisons (
DateField DateTime NOT NULL,
Amount int not null, -- default to 5
)
insert into Comparisons (DateField) values ('2000-01-01 10:04'),('2000-01-01 10:17'),
('2000-01-01 12:01'),('2000-01-01 11:54'),('2000-01-01 03:02'),('2000-01-01 03:05'),
('2000-01-01 05:02'),('2000-01-01 05:05'),('2000-01-01 05:19')
output expected:
min: .. 10:04, max: .. 10:17, sum: 10
min: .. 11:54, max: .. 12:01, sum: 10
min: .. 03:02, max: .. 03:05, sum: 10
min: .. 05:02, max: .. 05:19, sum: 15 [optional]
The last output is optional, but if an elegant solution has that as a side-effect, it's acceptable. If an elegant solution can't achieve that optional last output, it won't be a deal breaker.
I believe this produces the results you want:
DECLARE #Comparisons TABLE (i DATETIME, amt INT NOT NULL DEFAULT(5));
INSERT #Comparisons (i) VALUES ('2016-01-01 10:04:00.000')
, ('2016-01-01 10:17:00.000')
, ('2016-01-01 10:25:00.000')
, ('2016-01-01 10:37:00.000')
, ('2016-01-01 10:44:00.000')
, ('2016-01-01 11:52:00.000')
, ('2016-01-01 11:59:00.000')
, ('2016-01-01 12:10:00.000')
, ('2016-01-01 12:22:00.000')
, ('2016-01-01 13:00:00.000')
, ('2016-01-01 09:00:00.000');
DECLARE #N INT = 15;
WITH T AS (
SELECT i
, amt
, CASE WHEN DATEDIFF(MINUTE, previ, i) <= #N THEN 0 ELSE 1 END RN1
, CASE WHEN DATEDIFF(MINUTE, i, nexti) > #N THEN 1 ELSE 0 END RN2
FROM #Comparisons t
OUTER APPLY (SELECT MAX(i) FROM #Comparisons WHERE i < t.i)x(previ)
OUTER APPLY (SELECT MIN(i) FROM #Comparisons WHERE i > t.i)y(nexti)
)
, T2 AS (
SELECT CASE RN1 WHEN 1 THEN i ELSE (SELECT MAX(i) FROM T WHERE RN1 = 1 AND i < T1.i) END mintime
, CASE WHEN RN2 = 1 THEN i ELSE ISNULL((SELECT MIN(i) FROM T WHERE RN2 = 1 AND i > T1.i), i) END maxtime
, amt
FROM T T1
)
SELECT mintime, maxtime, sum(amt) total
FROM T2
GROUP BY mintime, maxtime
ORDER BY mintime;
It's probably a little clunkier than it could be, but it's basically just grouping anything within an #N-minute chain.
It looks like you want to group records based on gaps between them of at least <N> minutes.
In SQL Server 2012+, you would use lag() to identify when groups start and cumulative sum to identify the groups:
select min(datefield), max(datefield), count(*) as num, sum(amount)
from (select c.*,
sum(case when prev_datefield < dateadd(minute, -N, datefield)
then 1 else 0
end) over (order by datefield) as grp
from (select c.*,
lag(datefield) over (order by datefield) as prev_datefield
from Comparisons c
) c
) c
group by grp;
In earlier versions you can use correlated subqueries or apply for the same functionality (albeit at much worse performance).
Intervals could be used, if adjacent intervals are checked. This would require multiplying the source table records by 3
Pseudo-code:
select *
from Comparisons C, {-1, 0, 1} M
group by (datediff(mi, C.DateField, 0) / N) + M
The problem with this approach is how to eliminate the extra results. I suspect this is a deadend approach but someone else might see value in it.
Update: This approach would not work with the 4th expected output [min: .. 05:02, max: .. 05:19, sum: 15]
Related
There's a table with three columns: start date, end date and task duration in hours. For example, something like that:
Id
StartDate
EndDate
Duration
1
07-11-2022
15-11-2022
40
2
02-09-2022
02-11-2022
122
3
10-10-2022
05-11-2022
52
And I want to get a table like that:
Id
Month
HoursPerMonth
1
11
40
2
09
56
2
10
62
2
11
4
3
10
42
3
11
10
Briefly, I wanted to know, how many working hours is in each month between start and end dates. Proportionally. How can I achieve that by MS SQL Query? Data is quite big so the query speed is important enough. Thanks in advance!
I've tried DATEDIFF and EOMONTH, but that solution doesn't work with tasks > 2 months. And I'm sure that this solution is bad decision. I hope, that it can be done more elegant way.
Here is an option using an ad-hoc tally/calendar table
Not sure I'm agree with your desired results
Select ID
,Month = month(D)
,HoursPerMonth = (sum(1.0) / (1+max(datediff(DAY,StartDate,EndDate)))) * max(Duration)
From YourTable A
Join (
Select Top 75000 D=dateadd(day,Row_Number() Over (Order By (Select NULL)),0)
From master..spt_values n1, master..spt_values n2
) B on D between StartDate and EndDate
Group By ID,month(D)
Order by ID,Month
Results
This answer uses CTE recursion.
This part just sets up a temp table with the OP's example data.
DECLARE #source
TABLE (
SOURCE_ID INT
,STARTDATE DATE
,ENDDATE DATE
,DURATION INT
)
;
INSERT
INTO
#source
VALUES
(1, '20221107', '20221115', 40 )
,(2, '20220902', '20221102', 122 )
,(3, '20221010', '20221105', 52 )
;
This part is the query based on the above data. The recursive CTE breaks the time period into months. The second CTE does the math. The final selection does some more math and presents the results the way you want to seem them.
WITH CTE AS (
SELECT
SRC.SOURCE_ID
,SRC.STARTDATE
,SRC.ENDDATE
,SRC.STARTDATE AS 'INTERIM_START_DATE'
,CASE WHEN EOMONTH(SRC.STARTDATE) < SRC.ENDDATE
THEN EOMONTH(SRC.STARTDATE)
ELSE SRC.ENDDATE
END AS 'INTERIM_END_DATE'
,SRC.DURATION
FROM
#source SRC
UNION ALL
SELECT
CTE.SOURCE_ID
,CTE.STARTDATE
,CTE.ENDDATE
,CASE WHEN EOMONTH(CTE.INTERIM_START_DATE) < CTE.ENDDATE
THEN DATEADD( DAY, 1, EOMONTH(CTE.INTERIM_START_DATE) )
ELSE CTE.STARTDATE
END
,CASE WHEN EOMONTH(CTE.INTERIM_START_DATE, 1) < CTE.ENDDATE
THEN EOMONTH(CTE.INTERIM_START_DATE, 1)
ELSE CTE.ENDDATE
END
,CTE.DURATION
FROM
CTE
WHERE
CTE.INTERIM_END_DATE < CTE.ENDDATE
)
, CTE2 AS (
SELECT
CTE.SOURCE_ID
,CTE.STARTDATE
,CTE.ENDDATE
,CTE.INTERIM_START_DATE
,CTE.INTERIM_END_DATE
,CAST( DATEDIFF( DAY, CTE.INTERIM_START_DATE, CTE.INTERIM_END_DATE ) + 1 AS FLOAT ) AS 'MNTH_DAYS'
,CAST( DATEDIFF( DAY, CTE.STARTDATE, CTE.ENDDATE ) + 1 AS FLOAT ) AS 'TTL_DAYS'
,CAST( CTE.DURATION AS FLOAT ) AS 'DURATION'
FROM
CTE
)
SELECT
CTE2.SOURCE_ID AS 'Id'
,MONTH( CTE2.INTERIM_START_DATE ) AS 'Month'
,ROUND( CTE2.MNTH_DAYS/CTE2.TTL_DAYS * CTE2.DURATION, 0 ) AS 'HoursPerMonth'
FROM
CTE2
ORDER BY
CTE2.SOURCE_ID
,CTE2.INTERIM_END_DATE
;
My results agree with Mr. Cappelletti's, not the OP's. Perhaps some tweaking regarding the definition of a "Day" is needed. I don't know.
If time between start and end date is large (more than 100 months) you may want to specify OPTION (MAXRECURSION 0) at the end.
I need to group data together that are related to each other by overlapping timespans based on the records start and end times. SQL-fiddle here: http://sqlfiddle.com/#!18/87e4b/1/0
The current query I have built is giving incorrect results. Callid 3 should give a callCount of 4. It does not because record 6 is not included since it does not overlap with 3, but should be included because it does overlap with one of the other related records. So I believe a recursive CTE may be in need but I am unsure how to write this.
Schema:
CREATE TABLE Calls
([callid] int, [src] varchar(10), [start] datetime, [end] datetime, [conf] varchar(5));
INSERT INTO Calls
([callid],[src],[start],[end],[conf])
VALUES
('1','5555550001','2019-07-09 10:00:00', '2019-07-09 10:10:00', '111'),
('2','5555550002','2019-07-09 10:00:01', '2019-07-09 10:11:00', '111'),
('3','5555550011','2019-07-09 11:00:00', '2019-07-09 11:10:00', '111'),
('4','5555550012','2019-07-09 11:00:01', '2019-07-09 11:11:00', '111'),
('5','5555550013','2019-07-09 11:01:00', '2019-07-09 11:15:00', '111'),
('6','5555550014','2019-07-09 11:12:00', '2019-07-09 11:16:00', '111'),
('7','5555550014','2019-07-09 15:00:00', '2019-07-09 15:01:00', '111');
Current query:
SELECT
detail_record.callid,
detail_record.conf,
MIN(related_record.start) AS sessionStart,
MAX(related_record.[end]) As sessionEnd,
COUNT(related_record.callid) AS callCount
FROM
Calls AS detail_record
INNER JOIN
Calls AS related_record
ON related_record.conf = detail_record.conf
AND ((related_record.start >= detail_record.start
AND related_record.start < detail_record.[end])
OR (related_record.[end] > detail_record.start
AND related_record.[end] <= detail_record.[end])
OR (related_record.start <= detail_record.start
AND related_record.[end] >= detail_record.[end])
)
WHERE
detail_record.start > '1/1/2019'
AND detail_record.conf = '111'
GROUP BY
detail_record.callid,
detail_record.start,
detail_record.conf
HAVING
MIN(related_record.start) >= detail_record.start
ORDER BY sessionStart DESC
Expected Results:
callid conf sessionStart sessionEnd callCount
7 111 2019-07-09T15:00:00Z 2019-07-09T15:01:00Z 1
3 111 2019-07-09T11:00:00Z 2019-07-09T11:15:00Z 4
1 111 2019-07-09T10:00:00Z 2019-07-09T10:11:00Z 2
This is a gaps-and-islands problem. It does not require a recursive CTE. You can use window functions:
select min(callid), conf, grouping, min([start]), max([end]), count(*)
from (select c.*,
sum(case when prev_end < [start] then 1 else 0 end) over (order by start) as grouping
from (select c.*,
max([end]) over (partition by conf order by [start] rows between unbounded preceding and 1 preceding) as prev_end
from calls c
) c
) c
group by conf, grouping;
The innermost subquery calculates the previous end. The middle subquery compares this to the current start, to determine when groups of adjacent rows are the beginning of a new group. A cumulative sum then determines the grouping.
And, the outer query aggregates to summarize information about each group.
Here is a db<>fiddle.
I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group them and count by attr column row wise and also create additional columns in to show their counts per day and percentages as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using group by but unable to find out how to even seperate them to multiple columns. I tried to generate day1 percentage with
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this also is not giving me correct answer, I'm getting all zeroes for percentage and count as 1. Any help is appreciated. I'm trying to do this in Redshift which follows postgresql syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to create a day by day if you feel the need
I am trying to enhance the query #johnHC btw if you needs for 7days then you have to those days in case when
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs for calculating the values I am using window functions which is a bit shorter and more readable I think. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A counting the rows per day and counting the rows per day AND attr
B for more readability I convert the date into numbers. Here I take the difference between current date of the row and the maximum date available in the table. So I get a counter from 0 (first day) up to n - 1 (last day)
C calculating the percentage and rounding
D pivot by filter the day numbers. The COALESCE avoids the NULL values and switched them into 0. To add more days you can multiply these columns.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.
My goal is to go through my dataset, compare each ITEM_NO/LOC day-by-day, and identify days where the VAL has changed from the day before. Right now, I do that by sorting, creating a column of row numbers, joining the table to itself offset by a row, and then only picking rows where VAL has changed.
Each month has about half a billion records. In total there's around 2.7 billion records. The data is stored in DB2 BLU. The table already has indices for ITEM_NO, LOC, and ARCV_DATE. I only have select access to the table.
I think the big bottleneck is the order by in the select statement given that n is so large. One idea I had was to try to do the sorting month-by-month and then union each of the months together.
Here's what I have so far:
with x as (
select ITEM_NO, LOC, ARCV_DATE, VAL, ROW_NUMBER() over (order by ITEM_NO, LOC, ARCV_DATE) as RN
from MY_SCHEMA.MY_TABLE a
where
ARCV_DATE >= '2017-06-01'
and ARCV_DATE < '2017-07-01'
)
SELECT
x.ITEM_NO,
x.LOC,
y.ARCV_DATE as CHANGE_DATE,
y.VAL,
x.VAL as OLD_VAL
FROM x
INNER JOIN x AS y
ON x.rn = y.rn + 1
WHERE
x.VAL <> y.VAL
and x.ITEM_NO = y.ITEM_NO
and x.LOC = y.LOC
What could I do to improve performance on this for such a dataset?
Without any write access your options are very limited because the query isn't that complex. You could try avoiding the join altogether by using LAG() OVER() such as this:
SELECT
*
FROM (
SELECT
ITEM_NO
, LOC
, ARCV_DATE
, VAL
, LAG(ARCV_DATE, 1) OVER (PARTITION BY ITEM_NO, LOC ORDER BY ARCV_DATE DESC) AS CHANGE_DATE
, LAG(VAL, 1) OVER (PARTITION BY ITEM_NO, LOC ORDER BY ARCV_DATE DESC) AS OLD_VAL
FROM MY_SCHEMA.MY_TABLE
WHERE ARCV_DATE >= '2017-06-01'
AND ARCV_DATE < '2017-07-01'
) d
WHERE ( VAL <> OLD_VAL OR OLD_VAL IS NULL )
But tuning this further could require adding or changing indexes.
SELECT currentval.ITEM,
currentval.LOC
currentval.ARCV_DATE currentdate
prevval.ARCV_DATE Previousdate
currentval.val currentval
prevval.val Previousval
FROM MY_SCHEMA.MY_TABLE currentval JOIN
MY_SCHEMA.MY_TABLE prevval ON
currentval.ITEM_NO = prevval.ITEM_NO
WHERE currentval.loc = prevval.loc
AND currentval.val <> prevval.val
AND currentval.ARCV_DATE = prevval.ARCV_DATE+1
AND currentval.ARCV_DATE >= '2017-06-01'
AND prevval.ARCV_DATE < '2017-07-01'
Assuming that values will change from one day to next day. This query will retrieve the values that changes from previous day to current day.
AND currentval.ARCV_DATE = prevval.ARCV_DATE+1
I'm not sure if this has been asked before because I'm having trouble even asking it myself. I think the best way to explain my dilemma is to use an example.
Say I've rated my happiness on a scale of 1-10 every day for 10 years and I have the results in a big table where I have a single date correspond to a single integer value of my happiness rating. I say, though, that I only care about my happiness over 60 day periods on average (this may seem weird but this is a simplified example). So I wrap up this information to a table where I now have a start date field, an end date field, and an average rating field where the start days are every day from the first day to the last over all 10 years, but the end dates are exactly 60 days later. To be clear, these 60 day periods are overlapping (one would share 59 days with the next one, 58 with the next, and so on).
Next I pick a threshold rating, say 5, where I want to categorize everything below it into a "bad" category and everything above into a "good" category. I could easily add another field and use a case structure to give every 60-day range a "good" or "bad" flag.
Then to sum it up, I want to display the total periods of "good" and "bad" from maximum beginning to maximum end date. This is where I'm stuck. I could group by the good/bad category and then just take min(start date) and max(end date), but then if, say, the ranges go from good to bad to good then to bad again, output would show overlapping ranges of good and bad. In the aforementioned situation, I would want to show four different ranges.
I realize this may seem clearer to me that it would to someone else so if you need clarification just ask.
Thank you
---EDIT---
Here's an example of what the before would look like:
StartDate| EndDate| MoodRating
------------+------------+------------
1/1/1991 |3/1/1991 | 7
1/2/1991 |3/2/1991 | 7
1/3/1991 |3/3/1991 | 4
1/4/1991 |3/4/1991 | 4
1/5/1991 |3/5/1991 | 7
1/6/1991 |3/6/1991 | 7
1/7/1991 |3/7/1991 | 4
1/8/1991 |3/8/1991 | 4
1/9/1991 |3/9/1991 | 4
And the after:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/2/1991 |good
1/3/1991|3/4/1991 |bad
1/5/1991|3/6/1991 |good
1/7/1991|3/9/1991 |bad
Currently my query with the group by rating would show:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/6/1991 |good
1/3/1991|3/9/1991 |bad
This is something along the lines of
select min(StartDate), max(EndDate), Good_Bad
from sourcetable
group by Good_Bad
While Jason A Long's answer may be correct - I can't read it or figure it out, so I figured I would post my own answer. Assuming that this isn't a process that you're going to be constantly running, the CURSOR's performance hit shouldn't matter. But (at least to me) this solution is very readable and can be easily modified.
In a nutshell - we insert the first record from your source table into our results table. Next, we grab the next record and see if the mood score is the same as the previous record. If it is, we simply update the previous record's end date with the current record's end date (extending the range). If not, we insert a new record. Rinse, repeat. Simple.
Here is your setup and some sample data:
DECLARE #MoodRanges TABLE (StartDate DATE, EndDate DATE, MoodRating int)
INSERT INTO #MoodRanges
VALUES
('1/1/1991','3/1/1991', 7),
('1/2/1991','3/2/1991', 7),
('1/3/1991','3/3/1991', 4),
('1/4/1991','3/4/1991', 4),
('1/5/1991','3/5/1991', 7),
('1/6/1991','3/6/1991', 7),
('1/7/1991','3/7/1991', 4),
('1/8/1991','3/8/1991', 4),
('1/9/1991','3/9/1991', 4)
Next, we can create a table to store our results, as well as some variable placeholders for our cursor:
DECLARE #MoodResults TABLE(ID INT IDENTITY(1, 1), StartDate DATE, EndDate DATE, MoodScore varchar(50))
DECLARE #CurrentStartDate DATE, #CurrentEndDate DATE, #CurrentMoodScore INT,
#PreviousStartDate DATE, #PreviousEndDate DATE, #PreviousMoodScore INT
Now we put all of the sample data into our CURSOR:
DECLARE MoodCursor CURSOR FOR
SELECT StartDate, EndDate, MoodRating
FROM #MoodRanges
OPEN MoodCursor
FETCH NEXT FROM MoodCursor INTO #CurrentStartDate, #CurrentEndDate, #CurrentMoodScore
WHILE ##FETCH_STATUS = 0
BEGIN
IF #PreviousStartDate IS NOT NULL
BEGIN
IF (#PreviousMoodScore >= 5 AND #CurrentMoodScore >= 5)
OR (#PreviousMoodScore < 5 AND #CurrentMoodScore < 5)
BEGIN
UPDATE #MoodResults
SET EndDate = #CurrentEndDate
WHERE ID = (SELECT MAX(ID) FROM #MoodResults)
END
ELSE
BEGIN
INSERT INTO
#MoodResults
VALUES
(#CurrentStartDate, #CurrentEndDate, CASE WHEN #CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
END
END
ELSE
BEGIN
INSERT INTO
#MoodResults
VALUES
(#CurrentStartDate, #CurrentEndDate, CASE WHEN #CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
END
SET #PreviousStartDate = #CurrentStartDate
SET #PreviousEndDate = #CurrentEndDate
SET #PreviousMoodScore = #CurrentMoodScore
FETCH NEXT FROM MoodCursor INTO #CurrentStartDate, #CurrentEndDate, #CurrentMoodScore
END
CLOSE MoodCursor
DEALLOCATE MoodCursor
And here are the results:
SELECT * FROM #MoodResults
ID StartDate EndDate MoodScore
----------- ---------- ---------- --------------------------------------------------
1 1991-01-01 1991-03-02 GOOD
2 1991-01-03 1991-03-04 BAD
3 1991-01-05 1991-03-06 GOOD
4 1991-01-07 1991-03-09 BAD
Is this what you're looking for?
IF OBJECT_ID('tempdb..#MyDailyMood', 'U') IS NOT NULL
DROP TABLE #MyDailyMood;
CREATE TABLE #MyDailyMood (
TheDate DATE NOT NULL,
MoodLevel INT NOT NULL
);
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Calendar (dt) AS (
SELECT TOP (DATEDIFF(dd, '2007-01-01', '2017-01-01'))
DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '2007-01-01')
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT #MyDailyMood (TheDate, MoodLevel)
SELECT
c.dt,
ABS(CHECKSUM(NEWID()) % 10) + 1
FROM
cte_Calendar c;
--==========================================================
WITH
cte_AddRN AS (
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS (
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
)
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup;
Post OP update solution...
WITH
cte_AddRN AS ( -- Add a row number to each row that resets to 1 ever 60 rows.
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS ( -- Use DENSE_RANK to create groups based on the RN added above.
-- How it works: RN set the row number 1 - 60 then repeats itself
-- but we dont want ever 60th row grouped together. We want blocks of 60 consecutive rows grouped together
-- DENSE_RANK accompolishes this by ranking within all the "1's", "2's"... and so on.
-- verify with the following query... SELECT * FROM cte_AssignGroups ag ORDER BY ag.TheDate
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
),
cte_AggRange AS ( -- This is just a straight forward aggregation/rollup. It produces the results similar to the sample data you posed in your edit.
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END,
ag.DateGroup
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup
),
cte_CompactGroup AS ( -- This time we're using dense rank to group all of the consecutive "Good" and "Bad" values so that they can be further aggregated below.
SELECT
ar.BegOfRange, ar.EndOfRange, ar.AverageMoodLevel, ar.GorB, ar.DateGroup,
DenseGroup = ar.DateGroup - DENSE_RANK() OVER (PARTITION BY ar.GorB ORDER BY ar.BegOfRange)
FROM
cte_AggRange ar
)
-- The final aggregation step...
SELECT
BegOfRange = MIN(cg.BegOfRange),
EndOfRange = MAX(cg.EndOfRange),
cg.GorB
FROM
cte_CompactGroup cg
GROUP BY
cg.DenseGroup,
cg.GorB
ORDER BY
BegOfRange;