Fill missing rows using RECURSIVE CTE in SQL SERVER - sql-server-2012

Input
id, date, value
1, '2020-01-01', 100
1, '2020-01-03', 200
1, '2020-01-05', 500
Output expected
1, '2020-01-01', 100
1, '2020-01-02', 100
1, '2020-01-03', 200
1, '2020-01-04', 200
1, '2020-01-05', 500
I do not want to use any calendar table joined along with this.
I want to achieve this with Recursive CTE

Sample data
create table data
(
d date,
i int
);
insert into data (d, i) values
('2020-01-01', 100),
('2020-01-03', 200),
('2020-01-05', 500);
Solution
with maxDate as
(
select max(d.d) as maxDate
from data d
),
rcte as
(
select d.d, d.i
from data d
union all
select dateadd(day, 1, r.d), r.i
from rcte r
cross join maxDate md
where r.d < md.maxDate
)
select '1' as [id],
r.d as [date],
max(r.i) as [value]
from rcte r
group by r.d
order by r.d;
Result
id date value
--- ----------- -----
1 2020-01-01 100
1 2020-01-02 100
1 2020-01-03 200
1 2020-01-04 200
1 2020-01-05 500
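One thing to watch in the query above: max(r.i) picks the largest value among all earlier rows, which equals the most recent value only when values never decrease over time (as in the sample data). As a cross-check, here is a minimal Python sketch of the intended "carry the last known value forward" behaviour. The function name fill_missing_days and the tuple layout are illustrative assumptions, not part of the original answer.

```python
from datetime import date, timedelta


def fill_missing_days(rows):
    """Return one (day, value) pair per calendar day, carrying the last known value forward."""
    known = dict(rows)                      # day -> value for the days that exist
    if not known:
        return []
    day, last_day = min(known), max(known)
    filled, current = [], None
    while day <= last_day:
        current = known.get(day, current)   # keep the previous value on a missing day
        filled.append((day, current))
        day += timedelta(days=1)
    return filled


rows = [(date(2020, 1, 1), 100), (date(2020, 1, 3), 200), (date(2020, 1, 5), 500)]
for day, value in fill_missing_days(rows):
    print(1, day.isoformat(), value)
```

With a decreasing series such as 100 followed by 50, this sketch returns 50 on the later date, whereas a max-based aggregate would return 100.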

Related

How to make a query showing purchases of a client on the same day, but only if those were made in different stores (Oracle)?

I want to show cases of clients with at least 2 purchases on the same day. But I only want to count those purchases that were made in different stores.
So far I have:
Select Purchase.PurClientId, Purchase.PurDate, Purchase.PurId
from Purchase
join
(
Select count(Purchase.PurId),
Purchase.PurClientId,
to_date(Purchase.PurDate)
from Purchases
group by Purchase.PurClientId,
to_date(Purchase.PurDate)
having count (Purchase.PurId) >=2
) k
on k.PurClientId=Purchase.PurClientId
But I have no clue how to make it count purchases only if they were made in different stores. The column that identifies the shop is Purchase.PurShopId.
Thanks for help!
You can use:
SELECT PurId,
PurDate,
PurClientId,
PurShopId
FROM (
SELECT p.*,
COUNT(DISTINCT PurShopId) OVER (
PARTITION BY PurClientId, TRUNC(PurDate)
) AS num_stores
FROM Purchase p
)
WHERE num_stores >= 2;
Or
SELECT *
FROM Purchase p
WHERE EXISTS(
SELECT 1
FROM Purchase x
WHERE p.purclientid = x.purclientid
AND p.purshopid != x.purshopid
AND TRUNC(p.purdate) = TRUNC(x.purdate)
);
Which, for the sample data:
CREATE TABLE purchase (
purid PRIMARY KEY,
purdate,
purclientid,
PurShopId
) AS
SELECT 1, DATE '2021-01-01', 1, 1 FROM DUAL UNION ALL
SELECT 2, DATE '2021-01-02', 1, 1 FROM DUAL UNION ALL
SELECT 3, DATE '2021-01-02', 1, 2 FROM DUAL UNION ALL
SELECT 4, DATE '2021-01-03', 1, 1 FROM DUAL UNION ALL
SELECT 5, DATE '2021-01-03', 1, 1 FROM DUAL UNION ALL
SELECT 6, DATE '2021-01-04', 1, 2 FROM DUAL;
Both output:
PURID PURDATE PURCLIENTID PURSHOPID
2 2021-01-02 00:00:00 1 1
3 2021-01-02 00:00:00 1 2
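To make the rule explicit, here is a minimal Python sketch of the same filter: collect the distinct shops per client and day, then keep the purchases whose client/day pair has at least two shops. The function name and tuple layout are illustrative assumptions, and the dates are assumed to carry no time part (which is what TRUNC achieves in the queries).

```python
from collections import defaultdict
from datetime import date


def purchases_in_multiple_stores(purchases):
    """Keep (pur_id, pur_date, client_id, shop_id) rows whose client/day has >= 2 distinct shops."""
    shops = defaultdict(set)
    for _, pur_date, client_id, shop_id in purchases:
        shops[(client_id, pur_date)].add(shop_id)      # distinct shops per client and day
    return [p for p in purchases if len(shops[(p[2], p[1])]) >= 2]


sample = [
    (1, date(2021, 1, 1), 1, 1),
    (2, date(2021, 1, 2), 1, 1),
    (3, date(2021, 1, 2), 1, 2),
    (4, date(2021, 1, 3), 1, 1),   # two purchases on this day, but in the same shop
    (5, date(2021, 1, 3), 1, 1),
    (6, date(2021, 1, 4), 1, 2),
]
for row in purchases_in_multiple_stores(sample):
    print(row)
```

Only purchases 2 and 3 survive; purchases 4 and 5 share a day but also share a shop, so they are dropped.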

Select hours as columns from Oracle table

I am working with an Oracle database table that is structured like this:
TRANS_DATE TRANS_HOUR_ENDING TRANS_HOUR_SUFFIX READING
1/1/2021 1 1 100
1/1/2021 2 1 105
... ... ... ...
1/1/2021 24 1 115
The TRANS_HOUR_SUFFIX is only used to track hourly readings on days where daylight saving time ends (when there could be 2 hours with the same TRANS_HOUR_ENDING value). This column is the bane of this database's design; however, I need to select this data in a certain way. We need a report that columnizes this data based on the hour. Therefore, it would be structured like this (the last row shows a day on which DST ends):
TRANS_DATE HOUR_1 HOUR_2_1 HOUR_2_2 ... HOUR_24
1/1/2021 100 105 0 ... 115
1/2/2021 112 108 0 ... 135
... ... ... ... ... ...
11/7/2021 117 108 107 ... 121
I have done something like this before with a PIVOT, however in this case I'm having trouble determining what I should do to account for the suffix. When DST ending happens, we have to account for this hour. I know that we can do this by selecting each hourly value individually with decode or case statements, but that is some messy code. Is there a cleaner way to do this?
You can include multiple source columns in the pivot for() and in() clauses, so you could do:
select *
from (
select trans_date,
trans_hour_ending,
trans_hour_suffix,
reading
from your_table
)
pivot (max(reading) for (trans_hour_ending, trans_hour_suffix)
in ((1, 1) as hour_1, (2, 1) as hour_2_1, (2, 2) as hour_2_2, (3, 1) as hour_3,
-- snip
(23, 1) as hour_23, (24, 1) as hour_24))
order by trans_date;
where every hour n has an (n, 1) tuple, and the DST-relevant hour has an extra (2, 2) tuple.
If you don't have rows for every hour - which you don't appear to have from the very brief sample data, at least for suffix 2 on non-DST days - then you will get null results for those, but you can replace them with zeros:
select trans_date,
coalesce(hour_1, 0) as hour_1,
coalesce(hour_2_1, 0) as hour_2_1,
coalesce(hour_2_2, 0) as hour_2_2,
coalesce(hour_3, 0) as hour_3,
-- snip
coalesce(hour_23, 0) as hour_23,
coalesce(hour_24, 0) as hour_24
from (
select trans_date,
trans_hour_ending,
trans_hour_suffix,
reading
from your_table
)
pivot (max(reading) for (trans_hour_ending, trans_hour_suffix)
in ((1, 1) as hour_1, (2, 1) as hour_2_1, (2, 2) as hour_2_2, (3, 1) as hour_3,
-- snip
(23, 1) as hour_23, (24, 1) as hour_24))
order by trans_date;
which with slightly expanded sample data gets:
TRANS_DATE HOUR_1 HOUR_2_1 HOUR_2_2 HOUR_3 HOUR_23 HOUR_24
---------- ---------- ---------- ---------- ---------- ---------- ----------
2021-01-01 100 105 0 0 0 115
2021-01-02 112 108 0 0 0 135
2021-11-07 117 108 107 0 0 121
Which is a bit long-winded when you have to include all 25 columns everywhere; but to avoid that you'd have to do a dynamic pivot.
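To illustrate what the multi-column pivot does, here is a minimal Python sketch: each (hour, suffix) pair maps to a column name, the maximum reading is kept per date and column, and missing columns are filled with 0 afterwards (the coalesce step). The COLUMNS mapping is abbreviated to four entries and, like the function name, is an assumption for illustration only.

```python
from datetime import date

# Abbreviated mapping of (trans_hour_ending, trans_hour_suffix) -> output column.
COLUMNS = {(1, 1): "hour_1", (2, 1): "hour_2_1", (2, 2): "hour_2_2", (24, 1): "hour_24"}


def pivot_readings(rows, columns=COLUMNS):
    """Pivot (trans_date, hour, suffix, reading) rows into one record per date."""
    pivoted = {}
    for trans_date, hour, suffix, reading in rows:
        record = pivoted.setdefault(trans_date, {})
        name = columns.get((hour, suffix))
        if name is not None:                               # pairs not listed are ignored
            record[name] = max(record.get(name, reading), reading)
    # Fill columns that had no source row with 0, like COALESCE(col, 0).
    return {
        d: {name: rec.get(name, 0) for name in columns.values()}
        for d, rec in sorted(pivoted.items())
    }


rows = [
    (date(2021, 1, 1), 1, 1, 100), (date(2021, 1, 1), 2, 1, 105), (date(2021, 1, 1), 24, 1, 115),
    (date(2021, 11, 7), 1, 1, 117), (date(2021, 11, 7), 2, 1, 108),
    (date(2021, 11, 7), 2, 2, 107), (date(2021, 11, 7), 24, 1, 121),
]
for trans_date, record in pivot_readings(rows).items():
    print(trans_date, record)
```

Generating the COLUMNS mapping in a loop is the Python counterpart of a dynamic pivot: the column list is built by code instead of being typed out.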
Like I said in my comment, if you can format it with an additional row, I would recommend just having a row for the extra hour. Every other day would look normal. The query to do it would look like this:
CREATE TABLE READINGS
(
TRANS_DATE DATE,
TRANS_HOUR INTEGER,
TRANS_SUFFIX INTEGER,
READING INTEGER
);
INSERT INTO readings
SELECT TO_DATE('01/01/2021', 'MM/DD/YYYY'), 1, 1, 100 FROM DUAL UNION ALL
SELECT TO_DATE('01/01/2021', 'MM/DD/YYYY'), 2, 1, 100 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 1, 1, 200 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 1, 2, 300 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 2, 1, 500 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 2, 2, 350 FROM DUAL;
SELECT TRANS_DATE||DECODE(MAX(TRANS_SUFFIX) OVER (PARTITION BY TRANS_DATE), 1, NULL, 2, ' - '||TRANS_SUFFIX) AS TRANS_DATE,
HOUR_1, HOUR_2, /*...*/ HOUR_24
FROM readings
PIVOT (MAX(READING) FOR TRANS_HOUR IN (1 AS HOUR_1, 2 AS HOUR_2, /*...*/ 24 AS HOUR_24));
This would result in the following results (Sorry, I can't get dbfiddle to work):
TRANS_DATE     HOUR_1 HOUR_2 HOUR_24
-------------  ------ ------ -------
01-JAN-21      100    100    -
07-NOV-21 - 1  200    500    -
07-NOV-21 - 2  300    350    -

Complex query analyzing historical records

I am using Oracle and trying to retrieve the total number of days a person was out of the office during the year. I have 2 tables involved:
Statuses
1 - Active
2 - Out of the Office
3 - Other
ScheduleHistory
RecordID - primary key
PersonID
PreviousStatusID
NextStatusID
DateChanged
I can easily find when the person went on vacation and when they came back, using
SELECT DateChanged FROM ScheduleHistory WHERE PersonID=111 AND NextStatusID = 2
and
SELECT DateChanged FROM ScheduleHistory WHERE PersonID=111 AND PreviousStatusID = 2
But if a person went on vacation more than once, how can I calculate the total number of days that person was out of the office? Is it possible to do this programmatically, given only PersonID?
Here is some sample data:
RecordID PersonID PreviousStatusID NextStatusID DateChanged
-----------------------------------------------------------------------------
1 111 1 2 03/11/2020
2 111 2 1 03/13/2020
3 111 1 3 04/01/2020
4 111 3 1 04/07/2020
5 111 1 2 06/03/2020
6 111 2 1 06/05/2020
7 111 1 2 09/14/2020
8 111 2 1 09/17/2020
So from the data above, for the year 2020 for PersonID 111 the query should return 7
Try this:
with aux1 AS (
SELECT
a.*,
to_date(datechanged, 'MM/DD/YYYY') - LAG(to_date(datechanged, 'MM/DD/YYYY')) OVER(
PARTITION BY personid
ORDER BY
recordid
) lag_date
FROM
ScheduleHistory a
)
SELECT
personid,
SUM(lag_date) tot_days_ooo
FROM
aux1
WHERE
previousstatusid = 2
GROUP BY
personid;
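The idea behind the LAG query is: for every row that ends an out-of-office period (PreviousStatusID = 2), add the number of days since that person's previous status change. Here is a minimal Python sketch of that logic. The function name and tuple layout are illustrative assumptions, and like the query it assumes that RecordID order matches chronological order.

```python
from datetime import date


def days_out_of_office(history, out_status=2):
    """history rows: (record_id, person_id, previous_status, next_status, date_changed)."""
    totals = {}
    previous_change = {}                      # person -> date of their previous row
    for _, person, prev_status, _, changed in sorted(history):   # ordered by record_id
        if prev_status == out_status and person in previous_change:
            days = (changed - previous_change[person]).days
            totals[person] = totals.get(person, 0) + days
        previous_change[person] = changed
    return totals


history = [
    (1, 111, 1, 2, date(2020, 3, 11)), (2, 111, 2, 1, date(2020, 3, 13)),
    (3, 111, 1, 3, date(2020, 4, 1)), (4, 111, 3, 1, date(2020, 4, 7)),
    (5, 111, 1, 2, date(2020, 6, 3)), (6, 111, 2, 1, date(2020, 6, 5)),
    (7, 111, 1, 2, date(2020, 9, 14)), (8, 111, 2, 1, date(2020, 9, 17)),
]
print(days_out_of_office(history))
```

For the sample data this adds 2 + 2 + 3 days for person 111, matching the expected total of 7.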
If you want total days (or weekdays) for each year (and to account for periods when it goes over the year boundary) then:
WITH date_ranges ( personid, status, start_date, end_date ) AS (
SELECT personid,
nextstatusid,
datechanged,
LEAD(datechanged, 1, datechanged) OVER(
PARTITION BY personid
ORDER BY datechanged
)
FROM table_name
),
split_year_ranges ( personid, year, start_date, end_date, max_date ) AS (
SELECT personid,
TRUNC( start_date, 'YY' ),
start_date,
LEAST(
end_date,
ADD_MONTHS( TRUNC( start_date, 'YY' ), 12 )
),
end_date
FROM date_ranges
WHERE status = 2
UNION ALL
SELECT personid,
end_date,
end_date,
LEAST( max_date, ADD_MONTHS( end_date, 12 ) ),
max_date
FROM split_year_ranges
WHERE end_date < max_date
)
SELECT personid,
EXTRACT( YEAR FROM year) AS year,
SUM( end_date - start_date ) AS total_days,
SUM(
( TRUNC( end_date, 'IW' ) - TRUNC( start_date, 'IW' ) ) * 5 / 7
+ LEAST( end_date - TRUNC( end_date, 'IW' ), 5 )
- LEAST( start_date - TRUNC( start_date, 'IW' ), 5 )
) AS total_weekdays
FROM split_year_ranges
GROUP BY personid, year
ORDER BY personid, year
Which, for the sample data:
CREATE TABLE table_name ( RecordID, PersonID, PreviousStatusID, NextStatusID, DateChanged ) AS
SELECT 1, 111, 1, 2, DATE '2020-03-11' FROM DUAL UNION ALL
SELECT 2, 111, 2, 1, DATE '2020-03-13' FROM DUAL UNION ALL
SELECT 3, 111, 1, 3, DATE '2020-04-01' FROM DUAL UNION ALL
SELECT 4, 111, 3, 1, DATE '2020-04-07' FROM DUAL UNION ALL
SELECT 5, 111, 1, 2, DATE '2020-06-03' FROM DUAL UNION ALL
SELECT 6, 111, 2, 1, DATE '2020-06-05' FROM DUAL UNION ALL
SELECT 7, 111, 1, 2, DATE '2020-09-14' FROM DUAL UNION ALL
SELECT 8, 111, 2, 1, DATE '2020-09-17' FROM DUAL UNION ALL
SELECT 9, 222, 1, 2, DATE '2019-12-31' FROM DUAL UNION ALL
SELECT 10, 222, 2, 2, DATE '2020-12-01' FROM DUAL UNION ALL
SELECT 11, 222, 2, 2, DATE '2021-01-02' FROM DUAL;
Outputs:
PERSONID YEAR TOTAL_DAYS TOTAL_WEEKDAYS
111 2020 7 7
222 2019 1 1
222 2020 366 262
222 2021 1 1
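The total_weekdays expression is compact, so it may help to see it outside SQL. Below is a minimal Python sketch of the same arithmetic, where week_start plays the role of TRUNC(d, 'IW') (the Monday of the ISO week) and the range is half-open, [start, end). The function names are illustrative assumptions; a brute-force counter is included only to check the formula.

```python
from datetime import date, timedelta


def week_start(d):
    """Monday of the ISO week containing d (the role of TRUNC(d, 'IW'))."""
    return d - timedelta(days=d.weekday())


def weekdays_between(start, end):
    """Count Monday-to-Friday days in [start, end), assuming start <= end."""
    whole_weeks = (week_start(end) - week_start(start)).days * 5 // 7
    return whole_weeks + min(end.weekday(), 5) - min(start.weekday(), 5)


def weekdays_brute_force(start, end):
    """Reference implementation: test every day in [start, end)."""
    return sum(
        1 for n in range((end - start).days)
        if (start + timedelta(days=n)).weekday() < 5
    )


print(weekdays_between(date(2020, 1, 1), date(2021, 1, 1)))
```

The first term counts five weekdays per whole week between the two Mondays; the two capped offsets then add the weekdays already elapsed in the end week and subtract those already elapsed in the start week.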
Provided no vacation crosses a year boundary
with grps as (
SELECT sh.*,
row_number() over (partition by PersonID, NextStatusID order by DateChanged) grp
FROM ScheduleHistory sh
WHERE NextStatusID in (1,2) and 3 not in (NextStatusID, PreviousStatusID)
), durations as (
SELECT PersonID, min(DateChanged) DateChanged, max(DateChanged) - min(DateChanged) duration
FROM grps
GROUP BY PersonID, grp
)
SELECT PersonID, sum(duration) days_out
FROM durations
GROUP BY PersonID;
year_span is used to split an interval that spans two years into two separate records
H1 adds a row number per PersonID to get the right sequence for each person
H2 gets the period for each status change and extracts the first day of the year in which the interval ends
H3 splits records that span two years and calculates the right start and end dates for each interval
H calculates the days elapsed in each interval for each year
the final query sums up the records to produce the output
EDIT
If you need workdays instead of total days, you should not use total_days/7*5, because it is a poor approximation and in some cases gives odd results.
I have posted a solution that jumps from Fridays to Mondays here
with
statuses (sid, sdescr) as (
select 1, 'Active' from dual union all
select 2, 'Out of the Office' from dual union all
select 3, 'Other' from dual
),
ScheduleHistory(RecordID, PersonID, PreviousStatusID, NextStatusID , DateChanged) as (
select 1, 111, 1, 2, date '2020-03-11' from dual union all
select 2, 111, 2, 1, date '2020-03-13' from dual union all
select 3, 111, 1, 3, date '2020-04-01' from dual union all
select 4, 111, 3, 1, date '2020-04-07' from dual union all
select 5, 111, 1, 2, date '2020-06-03' from dual union all
select 6, 111, 2, 1, date '2020-06-05' from dual union all
select 7, 111, 1, 2, date '2020-09-14' from dual union all
select 8, 111, 2, 1, date '2020-09-17' from dual union all
SELECT 9, 222, 1, 2, date '2019-12-31' from dual UNION ALL
SELECT 10, 222, 2, 2, date '2020-12-01' from dual UNION ALL
SELECT 11, 222, 2, 2, date '2021-01-02' from dual
),
year_span (n) as (
select 1 from dual union all
select 2 from dual
),
H1 AS (
SELECT ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY RecordID) PID, H.*
FROM ScheduleHistory H
),
H2 as (
SELECT
H1.*, H2.DateChanged DateChanged2,
EXTRACT(YEAR FROM H2.DateChanged) - EXTRACT(YEAR FROM H1.DateChanged) + 1 Y,
trunc(H2.DateChanged,'YEAR') Y2
FROM H1 H1
LEFT JOIN H1 H2 ON H1.PID = H2.PID-1 AND H1.PersonID = H2.PersonID
),
H3 AS (
SELECT Y, N, H2.PID, H2.RecordID, H2.PersonID, H2.NextStatusID,
CASE WHEN Y=1 THEN H2.DateChanged ELSE CASE WHEN N=1 THEN H2.DateChanged ELSE Y2 END END D1,
CASE WHEN Y=1 THEN H2.DateChanged2 ELSE CASE WHEN N=1 THEN Y2 ELSE H2.DateChanged2 END END D2
FROM H2
JOIN year_span N ON N.N <=Y
),
H AS (
SELECT PersonID, NextStatusID, EXTRACT(year FROM d1) Y, d2-d1 D
FROM H3
)
select PersonID, sdescr Status, Y, sum(d) d
from H
join statuses s on NextStatusID = s.sid
group by PersonID, sdescr, Y
order by PersonID, sdescr, Y
output
PersonID Status Y d
111 Active 2020 177
111 Other 2020 6
111 Out of the Office 2020 7
222 Out of the Office 2019 1
222 Out of the Office 2020 366
222 Out of the Office 2021 1
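The core step in this answer is cutting an interval at the year boundary so that each piece is counted in its own year. Here is a minimal Python sketch of that step. The name split_by_year is an illustrative assumption, and unlike the two-row year_span helper, this loop keeps cutting for as many years as the interval covers.

```python
from datetime import date


def split_by_year(start, end):
    """Split the half-open interval [start, end) at every 1 January.

    Returns a list of (year, days) pieces.
    """
    pieces = []
    while start < end:
        next_new_year = date(start.year + 1, 1, 1)
        piece_end = min(end, next_new_year)
        pieces.append((start.year, (piece_end - start).days))
        start = piece_end
    return pieces


# Person 222 from the sample data: two consecutive out-of-office periods.
print(split_by_year(date(2019, 12, 31), date(2020, 12, 1)))
print(split_by_year(date(2020, 12, 1), date(2021, 1, 2)))
```

Summing the pieces per year for person 222 gives 1 day in 2019, 335 + 31 = 366 days in 2020, and 1 day in 2021, which agrees with the output above.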

Determine contiguous date intervals

I have the following table structure:
id int -- more like a group id, not unique in the table
AddedOn datetime -- when the record was added
For a specific id there is at most one record each day. I have to write a query that returns contiguous (at day level) date intervals for each id.
The expected result structure is:
id int
StartDate datetime
EndDate datetime
Note that the time part of AddedOn is available but it is not important here.
To make it clearer, here is some input data:
with data as
(
select * from
(
values
(0, getdate()), --dummy record used to infer column types
(1, '20150101'),
(1, '20150102'),
(1, '20150104'),
(1, '20150105'),
(1, '20150106'),
(2, '20150101'),
(2, '20150102'),
(2, '20150103'),
(2, '20150104'),
(2, '20150106'),
(2, '20150107'),
(3, '20150101'),
(3, '20150103'),
(3, '20150105'),
(3, '20150106'),
(3, '20150108'),
(3, '20150109'),
(3, '20150110')
) as d(id, AddedOn)
where id > 0 -- exclude dummy record
)
select * from data
And the expected result:
id StartDate EndDate
1 2015-01-01 2015-01-02
1 2015-01-04 2015-01-06
2 2015-01-01 2015-01-04
2 2015-01-06 2015-01-07
3 2015-01-01 2015-01-01
3 2015-01-03 2015-01-03
3 2015-01-05 2015-01-06
3 2015-01-08 2015-01-10
Although it looks like a common problem I couldn't find a similar enough question. Also I'm getting closer to a solution and I will post it when (and if) it works but I feel that there should be a more elegant one.
Here's an answer without any fancy joins, simply using GROUP BY and ROW_NUMBER, which is not only simple but also more efficient.
WITH CTE_dayOfYear
AS
(
SELECT id,
AddedOn,
DATEDIFF(DAY,'20000101',AddedOn) dyID,
ROW_NUMBER() OVER (ORDER BY ID,AddedOn) row_num
FROM data
)
SELECT ID,
MIN(AddedOn) StartDate,
MAX(AddedOn) EndDate,
dyID-row_num AS groupID
FROM CTE_dayOfYear
GROUP BY ID,dyID - row_num
ORDER BY ID,2,3
The logic is that dyID is based on the date, so it has gaps, while row_num has no gaps. Every time there is a gap in dyID, the difference between dyID and row_num changes, so I simply use that difference as my groupID.
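The same "day number minus row number" trick can be tried outside the database. Here is a minimal Python sketch, assuming rows are (id, date) pairs: date.toordinal() stands in for dyID, a global enumerate index stands in for row_num, and consecutive rows with the same (id, difference) form one island. The function name islands is an illustrative assumption.

```python
from datetime import date
from itertools import groupby


def islands(rows):
    """Return (id, start, end) for every run of consecutive days per id."""
    ordered = sorted(set(rows))                       # order by id, then date
    keyed = [(gid, d, d.toordinal() - n) for n, (gid, d) in enumerate(ordered)]
    result = []
    for (gid, _), run in groupby(keyed, key=lambda r: (r[0], r[2])):
        days = [d for _, d, _ in run]
        result.append((gid, days[0], days[-1]))       # min and max of the island
    return result


SAMPLE = {1: [1, 2, 4, 5, 6], 2: [1, 2, 3, 4, 6, 7], 3: [1, 3, 5, 6, 8, 9, 10]}
rows = [(gid, date(2015, 1, day)) for gid, days in SAMPLE.items() for day in days]
for gid, start, end in islands(rows):
    print(gid, start, end)
```

Within one id the difference never decreases, so rows of the same island are always adjacent after sorting, which is why a single pass with groupby is enough.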
In SQL Server 2008 it is a bit of a pain without the LEAD and LAG functions:
WITH data
AS ( SELECT * ,
ROW_NUMBER() OVER ( ORDER BY id, AddedOn ) AS rn
FROM ( VALUES ( 0, GETDATE()), --dummy record used to infer column types
( 1, '20150101'), ( 1, '20150102'), ( 1, '20150104'),
( 1, '20150105'), ( 1, '20150106'), ( 2, '20150101'),
( 2, '20150102'), ( 2, '20150103'), ( 2, '20150104'),
( 2, '20150106'), ( 2, '20150107'), ( 3, '20150101'),
( 3, '20150103'), ( 3, '20150105'), ( 3, '20150106'),
( 3, '20150108'), ( 3, '20150109'), ( 3, '20150110') )
AS d ( id, AddedOn )
WHERE id > 0 -- exclude dummy record
),
diff
AS ( SELECT d1.* ,
CASE WHEN ISNULL(DATEDIFF(dd, d2.AddedOn, d1.AddedOn),
1) = 1 THEN 0
ELSE 1
END AS diff
FROM data d1
LEFT JOIN data d2 ON d1.id = d2.id
AND d1.rn = d2.rn + 1
),
parts
AS ( SELECT * ,
( SELECT SUM(diff)
FROM diff d2
WHERE d2.rn <= d1.rn
) AS p
FROM diff d1
)
SELECT id ,
MIN(AddedOn) AS StartDate ,
MAX(AddedOn) AS EndDate
FROM parts
GROUP BY id ,
p
Output:
id StartDate EndDate
1 2015-01-01 00:00:00.000 2015-01-02 00:00:00.000
1 2015-01-04 00:00:00.000 2015-01-06 00:00:00.000
2 2015-01-01 00:00:00.000 2015-01-04 00:00:00.000
2 2015-01-06 00:00:00.000 2015-01-07 00:00:00.000
3 2015-01-01 00:00:00.000 2015-01-01 00:00:00.000
3 2015-01-03 00:00:00.000 2015-01-03 00:00:00.000
3 2015-01-05 00:00:00.000 2015-01-06 00:00:00.000
3 2015-01-08 00:00:00.000 2015-01-10 00:00:00.000
Walkthrough:
diff
This CTE returns data:
1 2015-01-01 00:00:00.000 1 0
1 2015-01-02 00:00:00.000 2 0
1 2015-01-04 00:00:00.000 3 1
1 2015-01-05 00:00:00.000 4 0
1 2015-01-06 00:00:00.000 5 0
You join the table to itself to get the previous row. Then you calculate the difference in days between the current row and the previous row; if the result is 1 day, pick 0, otherwise pick 1 (a row with no previous row for the same id also gets 0).
parts
This CTE takes the result of the previous step and computes a cumulative sum of the new column (the sum of all its values from the first row up to the current row), which gives you partitions to group by:
1 2015-01-01 00:00:00.000 1 0 0
1 2015-01-02 00:00:00.000 2 0 0
1 2015-01-04 00:00:00.000 3 1 1
1 2015-01-05 00:00:00.000 4 0 1
1 2015-01-06 00:00:00.000 5 0 1
2 2015-01-01 00:00:00.000 6 0 1
2 2015-01-02 00:00:00.000 7 0 1
2 2015-01-03 00:00:00.000 8 0 1
2 2015-01-04 00:00:00.000 9 0 1
2 2015-01-06 00:00:00.000 10 1 2
2 2015-01-07 00:00:00.000 11 0 2
3 2015-01-01 00:00:00.000 12 0 2
3 2015-01-03 00:00:00.000 13 1 3
The last step is just grouping by id and the new column and picking the min and max dates.
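The walkthrough can be replayed in a few lines of Python. This is a minimal sketch, assuming rows are (id, date) pairs: gap_flags mirrors the diff CTE, the cumulative sum mirrors the parts CTE, and the final grouping uses (id, part) as the key. The function names are illustrative assumptions.

```python
from datetime import date
from itertools import accumulate


def gap_flags(ordered):
    """1 where the previous row has the same id but is not exactly one day earlier,
    else 0 (so the first row of each id gets 0)."""
    flags = []
    for i, (gid, day) in enumerate(ordered):
        prev = ordered[i - 1] if i else None
        follows_gap = prev is not None and prev[0] == gid and (day - prev[1]).days != 1
        flags.append(1 if follows_gap else 0)
    return flags


def islands_by_running_sum(rows):
    """Return (id, start, end) per island, grouping by id and the running sum of flags."""
    ordered = sorted(rows)
    parts = list(accumulate(gap_flags(ordered)))      # cumulative sum = partition number
    grouped = {}
    for (gid, day), part in zip(ordered, parts):
        start, end = grouped.get((gid, part), (day, day))
        grouped[(gid, part)] = (min(start, day), max(end, day))
    return [(gid, start, end) for (gid, _), (start, end) in sorted(grouped.items())]


SAMPLE = {1: [1, 2, 4, 5, 6], 2: [1, 2, 3, 4, 6, 7], 3: [1, 3, 5, 6, 8, 9, 10]}
rows = [(gid, date(2015, 1, day)) for gid, days in SAMPLE.items() for day in days]
for island in islands_by_running_sum(rows):
    print(island)
```

Because the running sum is not reset between ids, the partition number alone is not unique per island; grouping by the id as well is what keeps different ids apart.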
I took the "Islands Solution #3 from SQL MVP Deep Dives" solution from https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/ and applied to your test data:
with
data as
(
select * from
(
values
(0, getdate()), --dummy record used to infer column types
(1, '20150101'),
(1, '20150102'),
(1, '20150104'),
(1, '20150105'),
(1, '20150106'),
(2, '20150101'),
(2, '20150102'),
(2, '20150103'),
(2, '20150104'),
(2, '20150106'),
(2, '20150107'),
(3, '20150101'),
(3, '20150103'),
(3, '20150105'),
(3, '20150106'),
(3, '20150108'),
(3, '20150109'),
(3, '20150110')
) as d(id, AddedOn)
where id > 0 -- exclude dummy record
)
,CTE_Seq
AS
(
SELECT
ID
,SeqNo
,SeqNo - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SeqNo) AS rn
FROM
data
CROSS APPLY
(
SELECT DATEDIFF(day, '20150101', AddedOn) AS SeqNo
) AS CA
)
SELECT
ID
,DATEADD(day, MIN(SeqNo), '20150101') AS StartDate
,DATEADD(day, MAX(SeqNo), '20150101') AS EndDate
FROM CTE_Seq
GROUP BY ID, rn
ORDER BY ID, StartDate;
Result set
ID StartDate EndDate
1 2015-01-01 00:00:00.000 2015-01-02 00:00:00.000
1 2015-01-04 00:00:00.000 2015-01-06 00:00:00.000
2 2015-01-01 00:00:00.000 2015-01-04 00:00:00.000
2 2015-01-06 00:00:00.000 2015-01-07 00:00:00.000
3 2015-01-01 00:00:00.000 2015-01-01 00:00:00.000
3 2015-01-03 00:00:00.000 2015-01-03 00:00:00.000
3 2015-01-05 00:00:00.000 2015-01-06 00:00:00.000
3 2015-01-08 00:00:00.000 2015-01-10 00:00:00.000
I'd recommend you to examine the intermediate results of CTE_Seq to understand how it actually works. Just put
select * from CTE_Seq
instead of the final SELECT ... GROUP BY .... You'll get this result set:
ID SeqNo rn
1 0 -1
1 1 -1
1 3 0
1 4 0
1 5 0
2 0 -1
2 1 -1
2 2 -1
2 3 -1
2 5 0
2 6 0
3 0 -1
3 2 0
3 4 1
3 5 1
3 7 2
3 8 2
3 9 2
Each date is converted into a sequence number by DATEDIFF(day, '20150101', AddedOn). ROW_NUMBER() generates a set of sequential numbers without gaps, so when these numbers are subtracted from a sequence with gaps the difference jumps/changes. The difference stays the same until the next gap, so in the final SELECT GROUP BY ID, rn brings all rows from the same island together.
Here is a simple solution that does not use analytics. I tend not to use analytics because I work with many different DBMSs, many of which don't (yet) have them implemented, and even those that do have different syntaxes. I just have the habit of writing generic code whenever possible.
with
Data( ID, AddedOn )as(
select 1, convert( date, '20150101' ) union all
select 1, '20150102' union all
select 1, '20150104' union all
select 1, '20150105' union all
select 1, '20150106' union all
select 2, '20150101' union all
select 2, '20150102' union all
select 2, '20150103' union all
select 2, '20150104' union all
select 2, '20150106' union all
select 2, '20150107' union all
select 3, '20150101' union all
select 3, '20150103' union all
select 3, '20150105' union all
select 3, '20150106' union all
select 3, '20150108' union all
select 3, '20150109' union all
select 3, '20150110'
)
select d.ID, d.AddedOn StartDate, IsNull( d1.AddedOn, '99991231' ) EndDate
from Data d
left join Data d1
on d1.ID = d.ID
and d1.AddedOn =(
select Min( AddedOn )
from data
where ID = d.ID
and AddedOn > d.AddedOn );
In your situation I assume that ID and AddedOn form a composite PK and so are indexed. Thus, the query will run impressively fast even on very large tables.
Also, I used the outer join because it seemed like the last AddedOn date of each ID should be seen in the StartDate column. Instead of NULL I used a common MaxDate value. The NULL could work just as well as a "this is the latest StartDate row" flag.
Here is the output for ID=1:
ID StartDate EndDate
----------- ---------- ----------
1 2015-01-01 2015-01-02
1 2015-01-02 2015-01-04
1 2015-01-04 2015-01-05
1 2015-01-05 2015-01-06
1 2015-01-06 9999-12-31
I'd like to post my own solution too because it's yet another approach:
with data as
(
...
),
temp as
(
select d.id
,d.AddedOn
,dprev.AddedOn as PrevAddedOn
,dnext.AddedOn as NextAddedOn
FROM data d
left JOIN
data dprev on dprev.id = d.id
and dprev.AddedOn = dateadd(d, -1, d.AddedOn)
left JOIN
data dnext on dnext.id = d.id
and dnext.AddedOn = dateadd(d, 1, d.AddedOn)
),
starts AS
(
select id
,AddedOn
from temp
where PrevAddedOn is NULL
),
ends as
(
select id
,AddedOn
from temp
where NextAddedon is NULL
)
SELECT s.id as id
,s.AddedOn as StartDate
,(select min(e.AddedOn) from ends e where e.id = s.id and e.AddedOn >= s.AddedOn) as EndDate
from starts s
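For comparison, the starts/ends idea in this last query translates almost directly into Python. This is a minimal sketch, assuming rows are (id, date) pairs with no time part: a start is a day whose previous day is absent, an end is a day whose next day is absent, and each start is paired with the earliest end on or after it. The function name is an illustrative assumption.

```python
from datetime import date, timedelta


def islands_from_starts_and_ends(rows):
    """Return (id, start, end) by pairing island starts with their nearest end."""
    present = set(rows)
    one_day = timedelta(days=1)
    starts = sorted((g, d) for g, d in present if (g, d - one_day) not in present)
    ends = sorted((g, d) for g, d in present if (g, d + one_day) not in present)
    return [
        (g, s, min(e for eg, e in ends if eg == g and e >= s))   # earliest end >= start
        for g, s in starts
    ]


SAMPLE = {1: [1, 2, 4, 5, 6], 3: [1, 3, 5, 6, 8, 9, 10]}
rows = [(gid, date(2015, 1, day)) for gid, days in SAMPLE.items() for day in days]
for island in islands_from_starts_and_ends(rows):
    print(island)
```

The inner min scans all ends for each start, much like the correlated subquery does, so this form is easy to read but does more work than the row-number approaches on large inputs.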

How do I get running total for distinct values in a column

I'd like to group by date range as in the example below
Date ItemNo Qty
==================================
1/1/2014 101 20
2/1/2014 102 10
3/1/2014 103 5
4/1/2014 104 10
1/1/2014 101 5
2/1/2014 101 10
3/1/2014 102 15
4/1/2014 104 20
I want to get the daily balance by summing the Qty up to that day, grouped by ItemNo, as below
Date ItemNo Qty
==================================
1/1/2014 101 25
2/1/2014 101 35
2/1/2014 102 10
3/1/2014 102 25
3/1/2014 103 5
4/1/2014 104 30
I know I can solve the problem by using cursors but I need another solution
thanks
So just use SUM:
SELECT Date, ItemNo, SUM(Qty)
FROM table
GROUP BY Date, ItemNo
Please read up on aggregate functions and SUM.
Edit
I took your comment and did this:
SELECT a.Date, a.ItemNo, tmp.Qty + a.Qty AS Qty
FROM table a
-- CROSS APPLY lets the subquery refer to the outer row; it fetches the previous row for the same item
CROSS APPLY (SELECT TOP 1 t.Qty FROM table t
             WHERE t.ItemNo = a.ItemNo AND t.Date < a.Date
             ORDER BY t.Date DESC) tmp
I'm checking it now, so it might need some tweaks, but I wanted to post it straight away so you'll have the general idea.
Here is your sample table
SELECT * INTO #TEMP
FROM
(
SELECT '1/1/2014' [DATE], 101 [ItemNo], 20 QTY
UNION ALL
SELECT '2/1/2014', 102, 10
UNION ALL
SELECT '3/1/2014', 103, 5
UNION ALL
SELECT '4/1/2014', 104, 10
UNION ALL
SELECT '1/1/2014', 101, 5
UNION ALL
SELECT '2/1/2014', 101, 10
UNION ALL
SELECT '3/1/2014', 102, 15
UNION ALL
SELECT '4/1/2014', 104, 20
)TAB
Use ROW_NUMBER to number each item's dates, and do the sum inside a CTE:
;WITH CTE1 AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY [ItemNo] ORDER BY CAST([DATE] AS DATE))RNO,
[DATE],[ItemNo],SUM(Qty)Qty
FROM #TEMP
GROUP BY [DATE],[ItemNo]
)
SELECT A.RNO,[DATE],[ItemNo],
CASE WHEN RNO=1 THEN Qty
ELSE (SELECT SUM(b.Qty)
FROM CTE1 b
WHERE A.ItemNo=B.ItemNo AND B.RNO<=A.RNO)
END QTY
FROM CTE1 A
ORDER BY A.itemno,CAST(A.[DATE] AS DATE);
Here's a solution using a recursive common table expression.
Not sure if it will be faster or not than the answer by Sarath Avanavu, but you can try!
Sample data:
CREATE TABLE #t ([Date] DATETIME, ItemNo INT, QTY INT)
INSERT #t
( Date, ItemNo, QTY )
SELECT '1/1/2014', 101, 20
UNION ALL SELECT '2/1/2014', 102, 10
UNION ALL SELECT '3/1/2014', 103, 5
UNION ALL SELECT '4/1/2014', 104, 10
UNION ALL SELECT '1/1/2014', 101, 5
UNION ALL SELECT '2/1/2014', 101, 10
UNION ALL SELECT '3/1/2014', 102, 15
UNION ALL SELECT '4/1/2014', 104, 20
Query:
;WITH dSum AS (
SELECT [Date], ItemNo, SUM(QTY) AS QTY
FROM #t AS t
GROUP BY [Date], [ItemNo]
), dSumRN AS (
SELECT [Date], ItemNo, QTY, ROW_NUMBER() OVER(PARTITION BY ItemNo ORDER BY [Date]) AS rn
FROM dSum
), cte AS (
SELECT [Date], ItemNo, QTY, rn
FROM dSumRN
WHERE rn = 1
UNION ALL SELECT
dSumRN.[Date], dSumRN.ItemNo, cte.QTY + dSumRN.QTY AS QTY, cte.rn + 1 AS rn
FROM cte
JOIN dSumRN ON cte.ItemNo = dSumRN.ItemNo AND cte.rn + 1 = dSumRN.rn
)
SELECT [Date], [ItemNo], QTY FROM cte
ORDER BY [Date], [ItemNo]
OPTION (MAXRECURSION 1000) -- maximum this can be set to is 32767
Easiest code below for your query:
select Date,itemno,
(select sum(Qty) from #temp where date<=T.date and itemno=T.itemno)
from #temp T
group by Date,itemno order by date
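All of the answers above compute the same thing: first the total Qty per item and date, then a running sum of those totals within each item. Here is a minimal Python sketch of those two steps. The function name is an illustrative assumption, and the sample dates are read as day/month/year, which is also an assumption (the ordering is the same under either reading).

```python
from collections import defaultdict
from datetime import date


def running_totals(rows):
    """rows: (day, item_no, qty). Returns (day, item_no, running_qty) per item and day."""
    daily = defaultdict(int)
    for day, item, qty in rows:
        daily[(item, day)] += qty                     # step 1: total per item and day
    running = defaultdict(int)
    result = []
    for item, day in sorted(daily):                   # step 2: running sum within each item
        running[item] += daily[(item, day)]
        result.append((day, item, running[item]))
    return result


rows = [
    (date(2014, 1, 1), 101, 20), (date(2014, 1, 2), 102, 10),
    (date(2014, 1, 3), 103, 5), (date(2014, 1, 4), 104, 10),
    (date(2014, 1, 1), 101, 5), (date(2014, 1, 2), 101, 10),
    (date(2014, 1, 3), 102, 15), (date(2014, 1, 4), 104, 20),
]
for day, item, qty in running_totals(rows):
    print(day, item, qty)
```

The output is ordered by item and then by date, the same order as the expected result in the question.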