SQL cumulative based on condition

I need to compute a cumulative total based on a condition: if the day is a holiday, I don't want to add it to the cumulative total.
If the row is the first row in date order and that day is a holiday, it should take the DaywisePlan value. For all subsequent rows, if IsHoliday equals zero, accumulate DaywisePlan into the total. If IsHoliday equals one, accumulate the value of DaywisePlan on the next row in date order where IsHoliday equals zero.
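Date | DaywisePlan | IsHoliday | ExpectedOutput
:-------- | ----------: | --------: | -------------:
7/1/2022 | 34 | 1 | 34
7/2/2022 | 34 | 1 | 34
7/3/2022 | 34 | 0 | 34
7/4/2022 | 34 | 0 | 68
7/5/2022 | 34 | 0 | 102
7/6/2022 | 34 | 0 | 136
7/7/2022 | 34 | 1 | 136
7/8/2022 | 34 | 1 | 136
7/9/2022 | 34 | 0 | 170
7/10/2022 | 34 | 0 | 204
7/11/2022 | 34 | 1 | 204
7/12/2022 | 34 | 0 | 238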

I can't think of a way to do it in a single query, but with a CTE it is quite easy using the window functions SUM and FIRST_VALUE.
If you have more months and want a separate SUM for each month, you need to PARTITION both window functions by month.
WITH CTE AS
(SELECT
[Date], [DaywisePlan], [IsHoliday],
FIRST_VALUE([DaywisePlan]) OVER(PARTITION BY [IsHoliday] ORDER By [Date]) [First],
SUM(CASE WHEN [IsHoliday] = 0 THEN [DaywisePlan] ELSe 0 END) OVER(ORDER By [Date]) as [Sum]
FROM tab1)
SELECT [Date], [DaywisePlan], [IsHoliday]
,CASE WHEN [Sum] = 0 AND [IsHoliday] = 1 THEN [Sum]+ [first] ELSe [Sum] END as [Sum] FROM CTE
Date | DaywisePlan | IsHoliday | Sum
:---------------------- | ----------: | --------: | --:
2022-07-01 02:00:00.000 | 34 | 1 | 34
2022-07-02 02:00:00.000 | 34 | 1 | 34
2022-07-03 02:00:00.000 | 34 | 0 | 34
2022-07-04 02:00:00.000 | 34 | 0 | 68
2022-07-05 02:00:00.000 | 34 | 0 | 102
2022-07-06 02:00:00.000 | 34 | 0 | 136
2022-07-07 02:00:00.000 | 34 | 1 | 136
2022-07-08 02:00:00.000 | 34 | 1 | 136
2022-07-09 02:00:00.000 | 34 | 0 | 170
2022-07-10 02:00:00.000 | 34 | 0 | 204
2022-07-11 02:00:00.000 | 34 | 1 | 204
2022-07-12 02:00:00.000 | 34 | 0 | 238
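To experiment with the CTE above outside SQL Server, here is a minimal sketch that runs the same SUM/FIRST_VALUE logic through Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later). The table name tab1 comes from the answer; the lower-case column names and ISO date strings are my own choices, so that text order matches date order.

```python
import sqlite3

# (day, daywise_plan, is_holiday) from the question, with ISO dates so that
# sorting the text also sorts by date.
SAMPLE = [
    ("2022-07-01", 34, 1), ("2022-07-02", 34, 1), ("2022-07-03", 34, 0),
    ("2022-07-04", 34, 0), ("2022-07-05", 34, 0), ("2022-07-06", 34, 0),
    ("2022-07-07", 34, 1), ("2022-07-08", 34, 1), ("2022-07-09", 34, 0),
    ("2022-07-10", 34, 0), ("2022-07-11", 34, 1), ("2022-07-12", 34, 0),
]

QUERY = """
WITH cte AS (
    SELECT day, daywise_plan, is_holiday,
           -- first DaywisePlan within each IsHoliday group, in date order
           FIRST_VALUE(daywise_plan) OVER (PARTITION BY is_holiday ORDER BY day) AS first_plan,
           -- running total that only adds non-holiday rows
           SUM(CASE WHEN is_holiday = 0 THEN daywise_plan ELSE 0 END)
               OVER (ORDER BY day) AS running
    FROM tab1
)
SELECT day,
       -- leading holidays (running total still 0) fall back to the first holiday's plan
       CASE WHEN running = 0 AND is_holiday = 1 THEN running + first_plan
            ELSE running END AS total
FROM cte
ORDER BY day
"""


def cumulative_totals(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab1 (day TEXT PRIMARY KEY, daywise_plan INTEGER, is_holiday INTEGER)")
    con.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", rows)
    totals = [total for _day, total in con.execute(QUERY)]
    con.close()
    return totals


print(cumulative_totals(SAMPLE))
```

The printed list should match the ExpectedOutput column from the question.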

In the comments I proposed logic based on the first version of your expected results.
The expected results in the question as currently posed do not match that logic. Instead they seem to match this logic:
Do not accumulate DaywisePlan until arriving at the first row, in ascending date order, where IsHoliday equals zero. For that row and all subsequent rows, if IsHoliday equals zero, accumulate DaywisePlan into the total.
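The inferred rule is easy to check against the expected column with a few lines of plain Python. This is only a model of the rule, not SQL. It assumes the rows are already sorted by date, and that rows before the first non-holiday report their own DaywisePlan, which is what the expected output shows for 7/1 and 7/2.

```python
def expected_output(rows):
    """Model of the inferred rule; `rows` is a list of (daywise_plan, is_holiday) sorted by date."""
    totals = []
    running = 0
    seen_working_day = False
    for plan, is_holiday in rows:
        if not is_holiday:
            # from the first non-holiday on, only non-holidays add to the total
            seen_working_day = True
            running += plan
        # leading holidays (before any working day) report their own plan
        totals.append(running if seen_working_day else plan)
    return totals


sample = [(34, 1), (34, 1), (34, 0), (34, 0), (34, 0), (34, 0),
          (34, 1), (34, 1), (34, 0), (34, 0), (34, 1), (34, 0)]
print(expected_output(sample))
```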
You have also used an ambiguous date format, which I infer (given the nature of your question) to be 'month/day/year', but the values would also be valid as 'day/month/year'. Here the interpretation happens to make no difference to the ordering, but you should make a habit of using unambiguous date formats like 'yyyyMMdd'.
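A quick way to see the ambiguity is to parse the same string both ways. This sketch uses Python's datetime.strptime rather than SQL Server's SET DATEFORMAT, purely as an illustration; it also confirms that, for 7/1/2022 through 7/12/2022, both readings sort in the same order.

```python
from datetime import datetime

ambiguous = "7/12/2022"
as_mdy = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # month/day/year: 12 July 2022
as_dmy = datetime.strptime(ambiguous, "%d/%m/%Y").date()  # day/month/year: 7 December 2022
compact = datetime.strptime("20220712", "%Y%m%d").date()  # yyyyMMdd: only one reading

# For the question's dates, both readings happen to sort the same way.
days = [f"7/{d}/2022" for d in range(1, 13)]
order_mdy = sorted(days, key=lambda s: datetime.strptime(s, "%m/%d/%Y"))
order_dmy = sorted(days, key=lambda s: datetime.strptime(s, "%d/%m/%Y"))

print(as_mdy, as_dmy, compact, order_mdy == order_dmy)
```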
In any case, here is a query that produces the original expected results, and another query that produces the new expected results. I have used similar CTEs for both to make the logic (and the difference between them) a little easier to read.
create table #mytable
(
[Date] date primary key,
DaywisePlan int,
IsHoliday bit,
ExpectedOutput int
);
set dateformat mdy;
-- original dataset
insert #mytable values
('7/1/2022', 34, 1, 34 ),
('7/2/2022', 34, 1, 34 ),
('7/3/2022', 34, 0, 68 ),
('7/4/2022', 34, 0, 102 ),
('7/5/2022', 34, 0, 136 ),
('7/6/2022', 34, 0, 170 ),
('7/7/2022', 34, 1, 170 ),
('7/8/2022', 34, 1, 170 ),
('7/9/2022', 34, 0, 204 ),
('7/10/2022', 34, 0, 238 ),
('7/11/2022', 34, 1, 238 ),
('7/12/2022', 34, 0, 272 );
-- logic producing original dataset
with working as
(
select [date],
DaywisePlan,
IsHoliday,
ExpectedOutput,
FullAccum = sum(DayWisePlan)
over (order by [date] rows unbounded preceding),
HoldayAccum = sum
(
iif(isHoliday = 1 and [date] != t.mindate, DayWisePlan, 0)
) over (order by [date] rows unbounded preceding)
from #mytable
cross join (select min([date]) from #myTable) t(mindate)
)
select [date],
daywiseplan,
isholiday,
expectedoutput,
CalculatedOutput = FullAccum - HoldayAccum
from working;
-- edited dataset
delete from #mytable;
insert #mytable values
('7/1/2022', 34, 1, 34 ),
('7/2/2022', 34, 1, 34 ),
('7/3/2022', 34, 0, 34 ),
('7/4/2022', 34, 0, 68 ),
('7/5/2022', 34, 0, 102 ),
('7/6/2022', 34, 0, 136 ),
('7/7/2022', 34, 1, 136 ),
('7/8/2022', 34, 1, 136 ),
('7/9/2022', 34, 0, 170 ),
('7/10/2022', 34, 0, 204 ),
('7/11/2022', 34, 1, 204 ),
('7/12/2022', 34, 0, 238 );
-- logic to produce edited dataset
with working as
(
select [date],
DaywisePlan,
IsHoliday,
ExpectedOutput,
firstNonHoliday = (select min([date]) from #myTable where IsHoliday = 0),
FullAccum = sum(DayWisePlan)
over (order by [date] rows unbounded preceding),
HoldayAccum = sum
(
iif(isHoliday = 1, DayWisePlan, 0)
) over (order by [date] rows unbounded preceding)
from #mytable
)
select [date],
daywiseplan,
isholiday,
expectedoutput,
CalculatedOutput = iif([date] < firstNonHoliday, daywiseplan, FullAccum - HoldayAccum)
from working;
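For a self-contained way to compare the two queries above, here is a sketch that ports both to SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). Assumptions: iif() is replaced by CASE, the derived-table column list t(mindate) is written as a column alias inside the subquery instead, and since the two datasets differ only in ExpectedOutput, one table serves both queries.

```python
import sqlite3

DAYS = [f"2022-07-{d:02d}" for d in range(1, 13)]
IS_HOLIDAY = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]

# First query: full running total minus holidays, except on the very first date.
ORIGINAL_LOGIC = """
WITH working AS (
    SELECT day,
           SUM(daywise_plan) OVER (ORDER BY day ROWS UNBOUNDED PRECEDING) AS full_accum,
           SUM(CASE WHEN is_holiday = 1 AND day <> t.mindate THEN daywise_plan ELSE 0 END)
               OVER (ORDER BY day ROWS UNBOUNDED PRECEDING) AS holiday_accum
    FROM mytable
    CROSS JOIN (SELECT MIN(day) AS mindate FROM mytable) AS t
)
SELECT full_accum - holiday_accum FROM working ORDER BY day
"""

# Second query: leading holidays show their own plan, later rows skip holidays.
EDITED_LOGIC = """
WITH working AS (
    SELECT day, daywise_plan,
           (SELECT MIN(day) FROM mytable WHERE is_holiday = 0) AS first_non_holiday,
           SUM(daywise_plan) OVER (ORDER BY day ROWS UNBOUNDED PRECEDING) AS full_accum,
           SUM(CASE WHEN is_holiday = 1 THEN daywise_plan ELSE 0 END)
               OVER (ORDER BY day ROWS UNBOUNDED PRECEDING) AS holiday_accum
    FROM mytable
)
SELECT CASE WHEN day < first_non_holiday THEN daywise_plan
            ELSE full_accum - holiday_accum END
FROM working ORDER BY day
"""


def run(query):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (day TEXT PRIMARY KEY, daywise_plan INTEGER, is_holiday INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, 34, ?)", zip(DAYS, IS_HOLIDAY))
    values = [row[0] for row in con.execute(query)]
    con.close()
    return values


print(run(ORIGINAL_LOGIC))
print(run(EDITED_LOGIC))
```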
If you just mean to say "ignore any holidays after the first non-holiday" then the logic can be significantly simplified (keeping the CTE for comparative purposes):
with working as
(
select [date],
DaywisePlan,
IsHoliday,
ExpectedOutput,
firstNonHoliday = (select min([date]) from #myTable where IsHoliday = 0),
FullAccum = sum(iif(isHoliday = 0, DayWisePlan, 0))
over(order by date rows unbounded preceding)
from #mytable
)
select [date],
daywiseplan,
isholiday,
expectedoutput,
CalculatedOutput = iif([date] <= firstNonHoliday, dayWisePlan, fullaccum)
from working;


Timespan calculation

I have a table like this:
#Row ID Status1 Status2 TimeStatusChange
------------------------------------------
1 24 0 0 2020-09-02 09:18:02.233
2 48 0 0 2020-09-02 09:18:58.540
3 24 1 0 2020-09-02 09:19:47.233
4 24 0 0 2020-09-02 09:19:47.587
5 48 0 1 2020-09-02 09:22:53.923
6 36 1 0 2020-09-02 09:24:14.343
7 48 0 0 2020-09-02 09:24:49.670
8 24 1 0 2020-09-02 09:38:37.820
and would like to know how to calculate the sum of the time spans for all status changes (Status1 or Status2) from 0 to 1 (or 1 to 0), grouped by ID.
In this example, for ID 24 and Status1 from 0 to 1, it would be the difference between the TimeStatusChange of #Row 3 and #Row 1 plus the difference between the TimeStatusChange of #Row 8 and #Row 4, roughly 21 minutes.
The perfect output would look like this:
ID Change TimeSpanInMinutes
----------------------------------------
24 Status1_from_0_1 20
36 .....
Although I have some experience with PL/SQL, I am not getting anywhere.
Sample data
I added a couple of rows to get some more result data and to validate the scenario where there are successive rows with the same status for a given ID.
declare @data table
(
ID int,
Status1 int,
Stamp datetime
)
insert into @data (ID, Status1, Stamp) values
(48, 1, '2020-09-02 09:00:00.000'), --added row
(24, 0, '2020-09-02 09:18:02.233'),
(48, 0, '2020-09-02 09:18:58.540'),
(24, 1, '2020-09-02 09:19:47.233'),
(24, 0, '2020-09-02 09:19:47.587'),
(48, 0, '2020-09-02 09:22:53.923'),
(36, 1, '2020-09-02 09:24:14.343'),
(48, 0, '2020-09-02 09:24:49.670'),
(24, 1, '2020-09-02 09:38:37.820'),
(48, 1, '2020-09-02 10:00:00.000'); --added row
Solution
The query uses a common table expression (CTE, cte_data) to fetch the previous record for the same ID (regardless of its status value) with the lag() function. Consecutive rows with the same value as the previous row are then removed by the where clause outside the CTE.
with cte_data as
(
select d.ID,
d.Status1,
d.Stamp,
lag(d.Status1) over(partition by d.ID order by d.Stamp) as Status1Prev,
lag(d.Stamp) over(partition by d.ID order by d.Stamp) as StampPrev
from @data d
)
select d.ID,
d.Status1Prev as Status1From,
d.Status1 as Status1To,
sum(datediff(MI, d.StampPrev, d.Stamp)) as StampDiffSumM, --minutes
convert(time(3), dateadd(MS, sum(datediff(MS, d.StampPrev, d.Stamp)), '1900-01-01 00:00:00.000')) as StampDiffSumF --formatted
from cte_data d
where d.Status1 <> d.Status1Prev
and d.Status1Prev is not null
group by d.ID, d.Status1Prev, d.Status1
order by d.ID;
Result
ID Status1From Status1To StampDiffSumM StampDiffSumF
----------- ----------- ----------- ------------- ----------------
24 0 1 20 00:20:35.233
24 1 0 0 00:00:00.353
48 0 1 36 00:35:10.330
48 1 0 18 00:18:58.540
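The LAG approach can be reproduced in SQLite from Python as a sanity check. This is a sketch under assumptions: SQLite has no datetime type or DATEDIFF, so stamps are ISO text and each gap is computed from julianday() in milliseconds, and table/column names are lower-cased (window functions need SQLite 3.25+). Small differences from the SQL Server result are expected: SQL Server's datetime stores times in steps of roughly 3.33 ms, which is likely why the answer shows .353 where this gives 354 ms.

```python
import sqlite3

# Sample rows from the answer: (ID, Status1, Stamp).
DATA = [
    (48, 1, "2020-09-02 09:00:00.000"),
    (24, 0, "2020-09-02 09:18:02.233"),
    (48, 0, "2020-09-02 09:18:58.540"),
    (24, 1, "2020-09-02 09:19:47.233"),
    (24, 0, "2020-09-02 09:19:47.587"),
    (48, 0, "2020-09-02 09:22:53.923"),
    (36, 1, "2020-09-02 09:24:14.343"),
    (48, 0, "2020-09-02 09:24:49.670"),
    (24, 1, "2020-09-02 09:38:37.820"),
    (48, 1, "2020-09-02 10:00:00.000"),
]

QUERY = """
WITH cte_data AS (
    SELECT id, status1, stamp,
           LAG(status1) OVER (PARTITION BY id ORDER BY stamp) AS status1_prev,
           LAG(stamp)   OVER (PARTITION BY id ORDER BY stamp) AS stamp_prev
    FROM data
)
SELECT id, status1_prev, status1,
       -- julianday() gives fractional days; convert each gap to whole milliseconds
       SUM(CAST(ROUND((julianday(stamp) - julianday(stamp_prev)) * 86400000) AS INTEGER)) AS diff_ms
FROM cte_data
WHERE status1 <> status1_prev  -- NULL for the first row per id, so that row is dropped too
GROUP BY id, status1_prev, status1
ORDER BY id, status1_prev
"""


def status_change_spans(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (id INTEGER, status1 INTEGER, stamp TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    spans = {(i, frm, to): ms for i, frm, to, ms in con.execute(QUERY)}
    con.close()
    return spans


for (id_, frm, to), ms in status_change_spans(DATA).items():
    print(f"ID {id_}: {frm} -> {to}: {ms / 60000:.2f} minutes")
```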

SQL Gaps/Islands Question - Determine if someone has worked for X years without a Y days break

I'm working on a problem for a company in Japan. The government has rules such as these for people on a work visa:
You cannot work for more than 3 years at a company without taking 30 days off
You cannot work for the same staffing company for more than 5 years without taking 6 months off
So we want to figure out if anyone will be violating either rule in the next 30/60/90 days.
Sample data (list of contracts):
if object_id('tempdb..#sampleDates') is not null drop table #sampleDates
create table #sampleDates (UserId int, CompanyID int, WorkPeriodStart datetime, WorkPeriodEnd datetime)
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27809, 972, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27853, 484, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27856, 172, '2019-10-10', '2020-10-10')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2015-01-01', '2015-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2016-01-01', '2017-02-28')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2017-01-01', '2017-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2018-01-01', '2018-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2019-01-01', '2020-01-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27857, 1234, '2020-01-01', '2020-12-31')
insert #sampleDates (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values (27897, 179, '2019-10-10', '2020-10-10')
My first issue is possibly overlapping dates. I am close to a solution for that already, but until I know how to solve the working-X-years/Y-days-off issue, I'm not sure what the output of my CTE or temp table should look like.
I don't expect anyone to do the work for me, but I want to find an article that can tell me:
How can I determine if someone has taken any breaks in the time period, and for how long (gaps between date ranges)?
How can I figure if they will have worked for 3/5 years without a 30/180 days break in the next 30/60/90 days?
This seemed so simple until I started coding the procedure.
Thanks for any help in advance.
EDIT:
For what it's worth, here's my second working attempt at eliminating overlapping dates (the first version used a dense_rank approach, and it worked until I screwed something up, so I went with something simpler):
;with CJ as (
select UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd from #sampleDates c
)
select
c.CompanyID,
c.WorkPeriodStart,
min(t1.WorkPeriodEnd) as EndDate
from CJ c
inner join CJ t1 on c.WorkPeriodStart <= t1.WorkPeriodEnd and c.UserId = t1.UserId and c.CompanyID = t1.CompanyID
and not exists(select * from CJ t2 where t1.UserId = t2.UserId and t1.CompanyID = t2.CompanyID and t1.WorkPeriodEnd >= t2.WorkPeriodStart AND t1.WorkPeriodEnd < t2.WorkPeriodEnd)
where not exists(select * from CJ c2 where c.UserId = c2.UserId and c.CompanyID = c2.CompanyID and c.WorkPeriodStart > c2.WorkPeriodStart AND c.WorkPeriodStart <= c2.WorkPeriodEnd)
group by c.UserId, c.CompanyID, c.WorkPeriodStart
order by c.UserId, c.WorkPeriodStart
Disclaimer: This is an incomplete answer.
I can continue later, but this shows how to compute the islands. Identifying the offending ones from there shouldn't be that complicated.
See the augmented example below. I added rows for user 27897, who has three islands: 0, 1, and 2.
create table t (UserId int, CompanyID int, WorkPeriodStart date, WorkPeriodEnd date);
insert t (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd) values
(27809, 972, '2019-10-10', '2020-10-10'),
(27853, 484, '2019-10-10', '2020-10-10'),
(27856, 172, '2019-10-10', '2020-10-10'),
(27857, 1234, '2015-01-01', '2015-12-31'),
(27857, 1234, '2016-01-01', '2017-02-28'),
(27857, 1234, '2017-01-01', '2017-12-31'),
(27857, 1234, '2018-01-01', '2018-12-31'),
(27857, 1234, '2019-01-01', '2020-01-31'),
(27857, 1234, '2020-01-01', '2020-12-31'),
(27897, 179, '2015-05-28', '2015-09-30'),
(27897, 179, '2017-03-11', '2017-04-30'),
(27897, 188, '2017-02-20', '2017-07-07'),
(27897, 179, '2019-10-10', '2020-10-10');
With this data, the query that computes the island for each row can look like:
select *,
sum(hop) over(partition by UserId order by WorkPeriodStart) as island
from (
select *,
case when WorkPeriodStart > dateadd(day, 1, max(WorkPeriodEnd)
over(partition by UserId
order by WorkPeriodStart
rows between unbounded preceding and 1 preceding))
then 1 else 0 end as hop
from t
) x
order by UserId, WorkPeriodStart
Result:
UserId CompanyID WorkPeriodStart WorkPeriodEnd hop island
------ --------- --------------- ------------- --- ------
27809 972 2019-10-10 2020-10-10 0 0
27853 484 2019-10-10 2020-10-10 0 0
27856 172 2019-10-10 2020-10-10 0 0
27857 1234 2015-01-01 2015-12-31 0 0
27857 1234 2016-01-01 2017-02-28 0 0
27857 1234 2017-01-01 2017-12-31 0 0
27857 1234 2018-01-01 2018-12-31 0 0
27857 1234 2019-01-01 2020-01-31 0 0
27857 1234 2020-01-01 2020-12-31 0 0
27897 179 2015-05-28 2015-09-30 0 0
27897 188 2017-02-20 2017-07-07 1 1
27897 179 2017-03-11 2017-04-30 0 1
27897 179 2019-10-10 2020-10-10 1 2
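The hop/island query ports to SQLite almost unchanged, which makes it easy to experiment with from Python (SQLite 3.25+ for window functions). In this sketch dates are ISO text, so MAX and > compare correctly as strings, dateadd(day, 1, ...) becomes date(..., '+1 day'), and only part of the sample is loaded. For the first row of each user the frame is empty, MAX returns NULL, the comparison is not true, and hop is 0.

```python
import sqlite3

# A subset of the answer's augmented sample: (UserId, CompanyID, WorkPeriodStart, WorkPeriodEnd).
CONTRACTS = [
    (27857, 1234, "2015-01-01", "2015-12-31"),
    (27857, 1234, "2016-01-01", "2017-02-28"),
    (27857, 1234, "2017-01-01", "2017-12-31"),
    (27897, 179, "2015-05-28", "2015-09-30"),
    (27897, 179, "2017-03-11", "2017-04-30"),
    (27897, 188, "2017-02-20", "2017-07-07"),
    (27897, 179, "2019-10-10", "2020-10-10"),
]

QUERY = """
SELECT UserId, WorkPeriodStart, hop,
       -- running count of hops = island number
       SUM(hop) OVER (PARTITION BY UserId ORDER BY WorkPeriodStart) AS island
FROM (
    SELECT *,
           -- hop = 1 when this contract starts more than a day after every earlier one ended
           CASE WHEN WorkPeriodStart > date(MAX(WorkPeriodEnd) OVER (
                    PARTITION BY UserId ORDER BY WorkPeriodStart
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), '+1 day')
                THEN 1 ELSE 0 END AS hop
    FROM t
) AS x
ORDER BY UserId, WorkPeriodStart
"""


def islands(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (UserId INTEGER, CompanyID INTEGER, WorkPeriodStart TEXT, WorkPeriodEnd TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in islands(CONTRACTS):
    print(row)
```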
Now, we can augment this query to get the "worked days" for each island, and the "days off" before each island, by doing:
select *,
datediff(day, s, e) + 1 as worked,
datediff(day, lag(e) over(partition by UserId order by island), s) as prev_days_off
from (
select UserId, island, min(WorkPeriodStart) as s, max(WorkPeriodEnd) as e
from (
select *,
sum(hop) over(partition by UserId order by WorkPeriodStart) as island
from (
select *,
case when WorkPeriodStart > dateadd(day, 1, max(WorkPeriodEnd)
over(partition by UserId
order by WorkPeriodStart
rows between unbounded preceding and 1 preceding))
then 1 else 0 end as hop
from t
) x
) y
group by UserId, island
) x
order by UserId, island
Result:
UserId island s e worked prev_days_off
------ ------ ---------- ---------- ------ -------------
27809 0 2019-10-10 2020-10-10 367 <null>
27853 0 2019-10-10 2020-10-10 367 <null>
27856 0 2019-10-10 2020-10-10 367 <null>
27857 0 2015-01-01 2020-12-31 2192 <null>
27897 0 2015-05-28 2015-09-30 126 <null>
27897 1 2017-02-20 2017-07-07 138 509
27897 2 2019-10-10 2020-10-10 367 825
This result is much closer to what you need, and this data can be used to filter rows according to your criteria.
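Here is a sketch in SQLite (run from Python, SQLite 3.25+ for window functions) that rebuilds the same s/e/worked/prev_days_off columns and then applies one hypothetical filter, islands with more than 3 * 365 worked days, to show how criteria could be applied. datediff becomes a julianday() difference, and the threshold is only an illustration, not the visa rule itself. As in the answer, prev_days_off is a plain date difference, so if end dates are inclusive the number of days actually off is one less.

```python
import sqlite3

CONTRACTS = [
    (27809, 972, "2019-10-10", "2020-10-10"),
    (27853, 484, "2019-10-10", "2020-10-10"),
    (27856, 172, "2019-10-10", "2020-10-10"),
    (27857, 1234, "2015-01-01", "2015-12-31"),
    (27857, 1234, "2016-01-01", "2017-02-28"),
    (27857, 1234, "2017-01-01", "2017-12-31"),
    (27857, 1234, "2018-01-01", "2018-12-31"),
    (27857, 1234, "2019-01-01", "2020-01-31"),
    (27857, 1234, "2020-01-01", "2020-12-31"),
    (27897, 179, "2015-05-28", "2015-09-30"),
    (27897, 179, "2017-03-11", "2017-04-30"),
    (27897, 188, "2017-02-20", "2017-07-07"),
    (27897, 179, "2019-10-10", "2020-10-10"),
]

QUERY = """
WITH hops AS (
    SELECT UserId, WorkPeriodStart, WorkPeriodEnd,
           CASE WHEN WorkPeriodStart > date(MAX(WorkPeriodEnd) OVER (
                    PARTITION BY UserId ORDER BY WorkPeriodStart
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), '+1 day')
                THEN 1 ELSE 0 END AS hop
    FROM t
),
islands AS (
    SELECT *, SUM(hop) OVER (PARTITION BY UserId ORDER BY WorkPeriodStart) AS island
    FROM hops
),
per_island AS (
    SELECT UserId, island, MIN(WorkPeriodStart) AS s, MAX(WorkPeriodEnd) AS e
    FROM islands
    GROUP BY UserId, island
)
SELECT UserId, island, s, e,
       CAST(julianday(e) - julianday(s) AS INTEGER) + 1 AS worked,
       CAST(julianday(s) - julianday(LAG(e) OVER (PARTITION BY UserId ORDER BY island)) AS INTEGER)
           AS prev_days_off
FROM per_island
ORDER BY UserId, island
"""


def island_summary(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (UserId INTEGER, CompanyID INTEGER, WorkPeriodStart TEXT, WorkPeriodEnd TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


summary = island_summary(CONTRACTS)
# Hypothetical filter: islands longer than three 365-day years.
too_long = [(user, island) for user, island, _s, _e, worked, _off in summary if worked > 3 * 365]
print(too_long)
```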
This script merges any overlapping work periods and then calculates the total days worked within the previous 3- and 5-year periods. It then compares each total with the maximum number of working days allowed in that period: by UserId and CompanyId for the 3-year limit, and by UserId alone for the 5-year limit. (Is this a correct interpretation of the rules in your question?)
Finally, it adds 30, 60 and 90 days to each total to see whether the larger value would exceed the respective limit. Given the different grouping rules, this would be cleaner as two queries (avoiding duplicate UserId rows for the 5-year rule), but the result is still a flag against any offending UserId.
In the example below you can see UserId = 27857 only violating the 5-year rule at present, but also violating the 3-year rule should they stay on for another 60 days. In addition, UserId = 27858 is currently okay, but will violate the 5-year rule in 60 days.
I have made some assumptions about how you define a year and whether your WorkPeriodEnd values are inclusive, so do check that your required logic is properly applied.
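The merge-then-count idea can also be sketched in plain Python, which may help when checking the SQL by hand. This is an illustration under stated assumptions, not a port of the script: WorkPeriodEnd is treated as inclusive (the script's datediff does not add the final day), a "year" means the same calendar date N years earlier, and the function names, contract history and "today" below are hypothetical.

```python
from datetime import date, timedelta


def merge_periods(periods):
    """Merge overlapping or back-to-back (start, end) pairs; end dates are treated as inclusive."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def days_worked(periods, window_start, window_end):
    """Count worked days inside [window_start, window_end], both ends inclusive."""
    total = 0
    for start, end in merge_periods(periods):
        lo, hi = max(start, window_start), min(end, window_end)
        if lo <= hi:
            total += (hi - lo).days + 1
    return total


def years_before(d, years):
    """Same calendar date `years` earlier; 29 February falls back to the 28th."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def projected_violations(periods, today, years, limit, extra_days=(0, 30, 60, 90)):
    """Like the script: add each extra-day count to today's total and compare with the limit."""
    worked = days_worked(periods, years_before(today, years), today)
    return {extra: worked + extra > limit for extra in extra_days}


# Hypothetical contract history and "today", not taken from the question's data.
history = [(date(2017, 6, 1), date(2018, 12, 31)), (date(2018, 6, 1), date(2020, 4, 30))]
print(projected_violations(history, date(2020, 6, 1), 3, 365 * 3 - 30))
```

Like the script, this simply adds the extra days to the current total; it does not slide the window forward, so days dropping off the start of the window are ignored.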
Script
if object_id('tempdb..#sampleDates') is not null drop table #sampleDates
create table #sampleDates (UserId int, CompanyId int, WorkPeriodStart datetime, WorkPeriodEnd datetime)
insert #sampleDates values
(27809, 972, '2019-10-10', '2020-10-10')
,(27853, 484, '2019-10-10', '2020-10-10')
,(27856, 172, '2019-10-10', '2020-10-10')
,(27857, 1234, '2015-01-01', '2015-12-31')
,(27857, 1234, '2016-01-01', '2017-02-28')
,(27857, 1234, '2017-01-01', '2017-12-31')
,(27857, 1234, '2018-01-01', '2018-12-31')
,(27857, 1234, '2019-01-01', '2020-01-31')
,(27857, 1234, '2020-01-01', '2020-05-31')
,(27858, 1234, '2015-01-01', '2015-12-31')
,(27858, 1234, '2016-01-01', '2017-02-28')
,(27858, 1234, '2017-01-01', '2017-12-31')
,(27858, 1234, '2018-01-01', '2018-12-31')
,(27858, 1234, '2019-09-01', '2020-01-31')
,(27858, 1234, '2020-01-01', '2020-08-31')
,(27859, 12345, '2015-01-01', '2015-12-31')
,(27859, 12346, '2016-01-01', '2017-02-28')
,(27859, 12347, '2017-01-01', '2017-12-31')
,(27859, 12348, '2018-01-01', '2018-12-31')
,(27859, 12349, '2019-01-01', '2020-01-31')
,(27859, 12340, '2020-01-01', '2020-12-31')
,(27897, 179, '2019-10-10', '2020-10-10')
;
declare @3YearsAgo date = dateadd(year,-3,getdate());
declare @3YearWorkingDays int = (365*3)-30;
declare @5YearsAgo date = dateadd(year,-5,getdate());
declare @5YearWorkingDays int = (365*5)-(365/2);
with p as
(
select UserId
,CompanyId
,min(WorkPeriodStart) as WorkPeriodStart
,max(WorkPeriodEnd) as WorkPeriodEnd
from(select l.*,
sum(case when dateadd(day,1,l.PrevEnd) < l.WorkPeriodStart then 1 else 0 end) over (partition by l.UserId, l.CompanyId order by l.WorkPeriodStart rows unbounded preceding) as grp
from(select d.*,
lag(d.WorkPeriodEnd) over (partition by d.UserId, d.CompanyId order by d.WorkPeriodEnd) as PrevEnd
from #sampleDates as d
) as l
) as g
group by grp
,UserId
,CompanyId
)
,d as
(
select UserId
,CompanyId
,sum(case when @3YearsAgo < WorkPeriodEnd
then datediff(day
,case when @3YearsAgo between WorkPeriodStart and WorkPeriodEnd then @3YearsAgo else WorkPeriodStart end
,WorkPeriodEnd
)
else 0
end
) as WorkingDays3YearsToToday
,sum(case when @5YearsAgo < WorkPeriodEnd
then datediff(day
,case when @5YearsAgo between WorkPeriodStart and WorkPeriodEnd then @5YearsAgo else WorkPeriodStart end
,WorkPeriodEnd
)
else 0
end
) as WorkingDays5YearsToToday
from p
group by UserId
,CompanyId
)
select UserId
,CompanyId
,@3YearWorkingDays as Limit3Year
,@5YearWorkingDays as Limit5Year
,WorkingDays3YearsToToday
,WorkingDays5YearsToToday
,case when WorkingDays3YearsToToday > @3YearWorkingDays then 1 else 0 end as Violation3YearNow
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) > @5YearWorkingDays then 1 else 0 end as Violation5YearNow
,case when WorkingDays3YearsToToday + 30 > @3YearWorkingDays then 1 else 0 end as Violation3Year30Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 30 > @5YearWorkingDays then 1 else 0 end as Violation5Year30Day
,case when WorkingDays3YearsToToday + 60 > @3YearWorkingDays then 1 else 0 end as Violation3Year60Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 60 > @5YearWorkingDays then 1 else 0 end as Violation5Year60Day
,case when WorkingDays3YearsToToday + 90 > @3YearWorkingDays then 1 else 0 end as Violation3Year90Day
,case when sum(WorkingDays5YearsToToday) over (partition by UserId) + 90 > @5YearWorkingDays then 1 else 0 end as Violation5Year90Day
from d
order by UserId
,CompanyId;
Output
+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| UserId | CompanyId | Limit3Year | Limit5Year | WorkingDays3YearsToToday | WorkingDays5YearsToToday | Violation3YearNow | Violation5YearNow | Violation3Year30Day | Violation5Year30Day | Violation3Year60Day | Violation5Year60Day | Violation3Year90Day | Violation5Year90Day |
+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| 27809 | 972 | 1065 | 1643 | 366 | 366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 27853 | 484 | 1065 | 1643 | 366 | 366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 27856 | 172 | 1065 | 1643 | 366 | 366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 27857 | 1234 | 1065 | 1643 | 1029 | 1760 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 27858 | 1234 | 1065 | 1643 | 877 | 1608 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 27859 | 12340 | 1065 | 1643 | 365 | 365 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12345 | 1065 | 1643 | 0 | 147 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12346 | 1065 | 1643 | 0 | 424 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12347 | 1065 | 1643 | 147 | 364 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12348 | 1065 | 1643 | 364 | 364 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27859 | 12349 | 1065 | 1643 | 395 | 395 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 27897 | 179 | 1065 | 1643 | 366 | 366 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+--------+-----------+------------+------------+--------------------------+--------------------------+-------------------+-------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
Here is what I ended up with.
<UselessExplanation>
The issues I kept facing were:
Handling any and all date-range overlaps and counting only the days within the contract date ranges.
The client is STILL using SQL 2008, so I need some old(er)-school T-SQL.
Ensuring that the break times (times between contracts) are calculated accurately.
So I came up with my own solution, which is probably dumb, since it needs to generate a record in memory for every workday/candidate combination. I don't expect the contracts table to go beyond the 5-10k record range, which is the only reason I'm going in this direction.
I created a calendar table with every date from 1/1/1980 to 12/31/2050.
I then left joined the contract ranges against the calendar table by CandidateId; these are the dates worked.
Any date in the calendar table that does not fall within a contract range is a break day.
</UselessExplanation>
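The calendar idea can be tried end-to-end in SQLite from Python. Two swaps are mine: the WHILE-loop calendar table is replaced by a recursive CTE covering only the reporting window, and the contracts table is cut down to hypothetical CandidateId/WorkPeriodStart/WorkPeriodEnd columns (no Deleted flag). SQL Server 2008 also supports recursive CTEs, but there a calendar longer than 100 days would need an OPTION (MAXRECURSION ...) hint.

```python
import sqlite3

# Hypothetical contracts: (CandidateId, WorkPeriodStart, WorkPeriodEnd), ends inclusive.
CONTRACTS = [
    (1, "2020-01-01", "2020-01-10"),
    (1, "2020-01-05", "2020-01-10"),  # overlaps the row above
    (1, "2020-01-21", "2020-01-31"),
    (2, "2020-01-15", "2020-02-15"),
]

QUERY = """
WITH RECURSIVE calendar(CalendarDate) AS (
    SELECT :first_day
    UNION ALL
    SELECT date(CalendarDate, '+1 day') FROM calendar WHERE CalendarDate < :last_day
),
poss AS (  -- every candidate paired with every day in the window
    SELECT cand.CandidateId, cal.CalendarDate
    FROM calendar AS cal
    CROSS JOIN (SELECT DISTINCT CandidateId FROM contracts) AS cand
),
worked AS (  -- DISTINCT collapses overlapping contracts to one row per day
    SELECT DISTINCT c.CandidateId, cal.CalendarDate
    FROM contracts AS c
    JOIN calendar AS cal ON cal.CalendarDate BETWEEN c.WorkPeriodStart AND c.WorkPeriodEnd
)
SELECT poss.CandidateId,
       COUNT(worked.CandidateId) AS worked_days,
       SUM(CASE WHEN worked.CandidateId IS NULL THEN 1 ELSE 0 END) AS break_days
FROM poss
LEFT JOIN worked
       ON worked.CandidateId = poss.CandidateId AND worked.CalendarDate = poss.CalendarDate
GROUP BY poss.CandidateId
ORDER BY poss.CandidateId
"""


def work_and_break_days(contracts, first_day, last_day):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE contracts (CandidateId INTEGER, WorkPeriodStart TEXT, WorkPeriodEnd TEXT)")
    con.executemany("INSERT INTO contracts VALUES (?, ?, ?)", contracts)
    rows = con.execute(QUERY, {"first_day": first_day, "last_day": last_day}).fetchall()
    con.close()
    return rows


print(work_and_break_days(CONTRACTS, "2020-01-01", "2020-01-31"))
```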
Calendar table
if object_id('CalendarTable') is not null drop table CalendarTable
go
create table CalendarTable (pk int identity, CalendarDate date )
declare @StartDate date = cast('1980-01-01' as date)
declare @EndDate date = cast('2050-12-31' as date)
while @StartDate <= @EndDate
begin
insert into CalendarTable ( CalendarDate ) values ( @StartDate )
set @StartDate = dateadd(dd, 1, @StartDate)
end
go
Query for 5 year violations (working 5 years without a 6 month cool off period)
declare @enddate date = dateadd(dd, 30, getdate())
declare @beginDate date = dateadd(dd, -180, dateadd(year, -5, getdate()))
select poss.CandidateId,
min(work.CalendarDate) as FirstWorkDate,
count(work.CandidateId) as workedDays,
sum(case when work.CandidateId is null then 1 else 0 end) as breakDays,
case when count(work.CandidateId) > (365*5) and sum(case when work.CandidateId is null then 1 else 0 end) < (365/2) then 1 else 0 end as Year5Violation,
case when count(work.CandidateId) > (365*5) and sum(case when work.CandidateId is null then 1 else 0 end) < (365/2) then DATEADD(year, 5, min(work.CalendarDate)) else null end as ViolationDate
from
(
select cand.CandidateId, cal.CalendarDate
from CalendarTable cal
join (select distinct c.CandidateId from contracts c where c.WorkPeriodStart is not null and c.WorkPeriodEnd is not null and c.Deleted = 0) cand on 1 = 1
where cal.CalendarDate between @beginDate and @enddate
) as poss
left join
(
select distinct c.CandidateId, cal.CalendarDate
from contracts c
join CalendarTable cal on cal.CalendarDate between c.WorkPeriodStart and c.WorkPeriodEnd
where c.WorkPeriodStart is not null and c.WorkPeriodEnd is not null and c.Deleted = 0
) as work on work.CandidateId = poss.CandidateId and work.CalendarDate = poss.CalendarDate
group by poss.CandidateId

Last Changed Date

ID DATE AMT
A 20180401 110
A 20180301 110
A 20180201 100
A 20171010 90
B 20181001 90
B 20180901 90
B 20180707 80
My Output should be
ID DATE AMT Result
A 20180401 110 20180201
A 20180301 110 20180201
A 20180201 100 20171010
A 20171010 90 null
B 20181001 90 20180707
B 20180901 90 20180707
B 20180707 80 null
So for the Result column I need the date of the most recent earlier value, within the same ID, that differs from the current value.
For example, take the first record: its AMT value is 110, the next record also has 110, and the record after that has 100, which differs from the current value, so I need that record's date.
I have used
LAST_VALUE ( DATE) OVER ( PARTITION BY ID, AMT ORDER BY ID ) AS LASTVALUE
but that only gives me the date for the records with the same amount.
This is the result of
LAST_VALUE ( DATE) OVER ( PARTITION BY ID, AMT ORDER BY ID ) AS LASTVALUE2
ID;DAT;AMT;LASTVALUE2
A;Mar 1, 2018;130;Mar 1, 2018
A;Feb 1, 2018;110;Jan 1, 2018
A;Jan 1, 2018;110;Jan 1, 2018
A;Nov 1, 2017;140;Nov 1, 2017
B;Jun 1, 2018;110;Apr 1, 2018
B;May 1, 2018;110;Apr 1, 2018
B;Apr 1, 2018;110;Apr 1, 2018
B;Mar 1, 2018;130;Mar 1, 2018
And this is the result after using LAG:
ID;DAT;AMT;PREV_DIFF_VALUE
A;Nov 1, 2017;140;?
A;Jan 1, 2018;110;Nov 1, 2017
A;Feb 1, 2018;110;Jan 1, 2018
A;Mar 1, 2018;130;Feb 1, 2018
B;Mar 1, 2018;130;?
B;Apr 1, 2018;110;Mar 1, 2018
B;May 1, 2018;110;Apr 1, 2018
B;Jun 1, 2018;110;May 1, 2018
The third record should be Nov 1 2017
Thanks in advance
This is tricky. I think this does what you want:
select t.*,
max(case when amt <> next_amt then date end) over (partition by id order by date rows between unbounded preceding and 1 preceding) as result
from (select t.*,
lead(amt) over (partition by id order by date) as next_amt
from t
) t;
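Here is a sketch of the LEAD-plus-running-MAX idea run through SQLite from Python (SQLite 3.25+ for window functions), assuming the columns are id, dt and amt, with dt stored as yyyymmdd text so that it sorts chronologically. LEAD flags the last row of each run of equal amounts; for every row, the latest flagged date strictly before it is the date of the previous, different amount.

```python
import sqlite3

ROWS = [
    ("A", "20180401", 110), ("A", "20180301", 110), ("A", "20180201", 100), ("A", "20171010", 90),
    ("B", "20181001", 90), ("B", "20180901", 90), ("B", "20180707", 80),
]

QUERY = """
SELECT id, dt, amt,
       -- latest earlier date on which the amount was about to change
       MAX(CASE WHEN amt <> next_amt THEN dt END) OVER (
           PARTITION BY id ORDER BY dt
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS result
FROM (
    SELECT t.*, LEAD(amt) OVER (PARTITION BY id ORDER BY dt) AS next_amt
    FROM t
) AS x
ORDER BY id, dt DESC
"""


def last_changed(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT, dt TEXT, amt INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in last_changed(ROWS):
    print(row)
```

The result column should reproduce the question's desired output, including NULL for the earliest row of each ID.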
Try:
SELECT s1.ID
, FORMAT(s1.theDate,'MM-dd-yyyy') AS theDate
, s1.Amt
--, s1.PrevAmt
, CASE
WHEN Amt <> prevAmt
THEN FORMAT(
LAG(theDate) OVER ( PARTITION BY ID ORDER BY theDate )
,'MM-dd-yyyy' )
END AS prevDate
FROM (
SELECT ID, theDate, Amt
, LAG(AMT) OVER ( PARTITION BY ID ORDER BY theDate) AS prevAmt
FROM t1
) s1
ORDER BY ID, theDate DESC
This should give:
ID | theDate | Amt | prevDate
:- | :--------- | --: | :---------
A | 10-10-2017 | 90 | null
A | 04-04-2018 | 110 | null
A | 03-03-2018 | 110 | 02-02-2018
A | 02-02-2018 | 100 | 10-10-2017
B | 10-10-2018 | 90 | null
B | 09-09-2018 | 90 | 07-07-2018
B | 07-07-2018 | 80 | null
For rows that don't have a previous row to pull the date from, it will return a NULL in the prevDate field.

How can I selectively NULL out values from one column but not another?

If I select all from my dbo.targetsvssales table I get the following result:
Date | Sales | Targets
_____________________________
2017-01-01 10 10
2017-02-01 19 20
2017-03-01 31 30
2017-04-01 38 40
2017-05-01 49 50
2017-06-01 62 60
2017-07-01 70 70
2017-08-01 75 80
2017-09-01 88 90
2017-10-01 101 100
2017-11-01 105 110
2017-12-01 105 120
I would like to select only the sales data for dates before the current date, leaving the sales for future dates as NULL, while keeping the target values as they are. So the desired result from the select statement would be:
Date | Sales | Targets
_____________________________
2017-01-01 10 10
2017-02-01 19 20
2017-03-01 31 30
2017-04-01 38 40
2017-05-01 49 50
2017-06-01 62 60
2017-07-01 70 70
2017-08-01 75 80
2017-09-01 88 90
2017-10-01 101 100
2017-11-01 105 110
2017-12-01 NULL 120
This needs to work year-round, as well as on tables whose Date column has weekly or daily precision, so something that uses
WHERE DATE > GETDATE()
or something similar would be ideal. Any help or advice would be greatly appreciated.
Use CASE to define the result for this column:
CASE WHEN DATE <= GETDATE()
THEN Sales
END AS Sales
Note that I've reversed your logic and left out ELSE NULL, because NULL is the default when ELSE is omitted.
More about CASE: http://modern-sql.com/feature/case
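To see the CASE expression leave only future sales as NULL, here is a small SQLite sketch run from Python. SQLite has no GETDATE(), so the current date is passed in as a hypothetical :today parameter (2017-11-15 in the example); in SQL Server you would keep GETDATE().

```python
import sqlite3

ROWS = [
    ("2017-01-01", 10, 10), ("2017-02-01", 19, 20), ("2017-03-01", 31, 30),
    ("2017-04-01", 38, 40), ("2017-05-01", 49, 50), ("2017-06-01", 62, 60),
    ("2017-07-01", 70, 70), ("2017-08-01", 75, 80), ("2017-09-01", 88, 90),
    ("2017-10-01", 101, 100), ("2017-11-01", 105, 110), ("2017-12-01", 105, 120),
]

QUERY = """
SELECT d,
       CASE WHEN d <= :today THEN sales END AS sales,  -- no ELSE, so future rows get NULL
       targets
FROM targetsvssales
ORDER BY d
"""


def sales_to_date(today):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE targetsvssales (d TEXT, sales INTEGER, targets INTEGER)")
    con.executemany("INSERT INTO targetsvssales VALUES (?, ?, ?)", ROWS)
    result = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return result


for row in sales_to_date("2017-11-15"):
    print(row)
```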
CASE will work for this problem:
CASE WHEN DATE<=GETDATE() THEN Sales ELSE NULL END
You can use a CASE WHEN expression:
select date,
CASE
WHEN Date <= GETDATE()
THEN Sales
else NULL
End as Sales,
targets
from dbo.targetsvssales
DECLARE @targetsvssales AS TABLE ([Date] DATE, Sales MONEY, Targets MONEY)
INSERT INTO @targetsvssales VALUES
('2017-01-01',10,10),
('2017-02-01',19,20),
('2017-03-01',31,30),
('2017-04-01',38,40),
('2017-05-01',49,50),
('2017-06-01',62,60),
('2017-07-01',70,70),
('2017-08-01',75,80),
('2017-09-01',88,90),
('2017-10-01',101,100),
('2017-11-01',105,110),
('2017-12-01',105,120)
SELECT
[Date]
,CASE WHEN [DATE] > GETDATE() THEN NULL ELSE Sales END AS Sales
,Targets
FROM @targetsvssales
Use this:
select Date, case when Date > getdate() then NULL else Sales end as Sales, Targets
from dbo.targetsvssales
Use ROW_NUMBER and CASE:
CREATE TABLE T (
MDate DATE,
Sales INT,
Targetes INT
);
INSERT INTO T VALUES
('2017-01-01', 10 , 10),
('2017-02-01', 19 , 20),
('2017-03-01', 31 , 30),
('2017-04-01', 38 , 40),
('2017-05-01', 49 , 50),
('2017-06-01', 62 , 60),
('2017-07-01', 70 , 70),
('2017-08-01', 75 , 80),
('2017-09-01', 88 , 90),
('2017-10-01', 101, 100),
('2017-11-01', 105, 110),
('2017-12-01', 105, 120);
WITH CTE AS (
SELECT *, ROW_NUMBER () OVER (PARTITION BY Sales ORDER BY MDate) RN
FROM T
)
SELECT MDate, CASE WHEN RN = 1 THEN Sales ELSE NULL END AS Sales, Targetes
FROM CTE;
If you really want to compare with GETDATE(), I'd suggest using IIF or CASE:
SELECT MDate, IIF(MDate <= GETDATE(), Sales, NULL) AS Sales, Targetes
FROM T;

SQL - running total when data already grouped

I am trying to compute a running total for some data and have seen the easy way to do it. However, I have already grouped some of the data, and this is throwing off my code. I currently have dates, payment types, and the totals they relate to.
What I have at the moment is:
create table #testdata
(
mdate date,
pmttype varchar(64),
totalpmtamt int
)
insert into #testdata
select getdate()-7, 'DD', 10
union
select getdate() -7, 'SO', 12
union
select getdate()-6, 'DD', 3
union
select getdate()-5, 'DD', 13
union
select getdate()-5, 'SO', 23
union
select getdate()-5, 'PO', 8
What I want to have is:
mdate | paymenttype | totalpmtamt | incrtotal
2016-08-29 | DD | 10 | 10
2016-08-29 | SO | 12 | 22
2016-08-30 | DD | 3 | 25
2016-08-31 | DD | 13 | 38
2016-08-31 | SO | 8 | 46
2016-08-31 | PO | 23 | 69
I've tried adapting other code I've found here into:
select t1.mdate,
t1.pmttype,
t1.totalpmtamt,
SUM(t2.totalpmtamt) as runningsum
from #testdata t1
join #testdata t2 on t1.mdate >= t2.mdate and t1.pmttype >= t2.pmttype
group by t1.mdate, t1.pmttype, t1.totalpmtamt
order by t1.mdate
but all I get is
mdate | paymenttype | totalpmtamt | incrtotal
2016-08-29 | DD | 10 | 10
2016-08-29 | SO | 12 | 22
2016-08-30 | DD | 3 | 13
2016-08-31 | DD | 13 | 26
2016-08-31 | SO | 8 | 34
2016-08-31 | PO | 23 | 69
Can anyone help please?
The ANSI standard way of doing a cumulative sum is:
select t.*, sum(totalpmtamt) over (order by mdate) as runningsum
from #testdata t
order by t.mdate;
Not all databases support this functionality.
If your database doesn't support that functionality, I would go for a correlated subquery:
select t.*,
(select sum(t2.totalpmtamt)
from #testdata t2
where t2.mdate <= t.mdate
) as runningsum
from #testdata t
order by t.mdate;
Use the query below to get the desired result (for SQL Server):
with cte_1
as
(SELECT *,ROW_NUMBER() OVER(order by mdate ) RNO
FROM #testdata)
SELECT mdate,pmttype,totalpmtamt,(select sum(c2.totalpmtamt)
from cte_1 c2
where c2.RNO <= c1.RNO
) as incrtotal
FROM cte_1 c1
Sounds like SQL Server.
DECLARE @testdata TABLE
(
mdate DATE ,
pmttype VARCHAR(64) ,
totalpmtamt INT
);
INSERT INTO @testdata
( mdate, pmttype, totalpmtamt )
VALUES ( GETDATE() - 7, 'DD', 10 ),
( GETDATE() - 7, 'SO', 12 ),
( GETDATE() - 6, 'DD', 3 ),
( GETDATE() - 5, 'DD', 13 ),
( GETDATE() - 5, 'SO', 23 ),
( GETDATE() - 5, 'PO', 8 );
SELECT *,
SUM(totalpmtamt) OVER ( ORDER BY mdate ROWS UNBOUNDED PRECEDING )
AS RunningTotal
FROM @testdata t;
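The difference between SUM(...) OVER (ORDER BY mdate) and the ROWS UNBOUNDED PRECEDING version only shows up when several rows share a date: with ORDER BY and no frame clause, the default frame is RANGE, so same-date rows get the same total. This SQLite sketch (run from Python, SQLite 3.25+) uses fixed dates instead of GETDATE() - n, amounts from the question's insert statements, and a hypothetical seq column as a tie-breaker, because without one the order of same-date rows, and therefore the per-row running total, is not guaranteed.

```python
import sqlite3

# (seq, mdate, pmttype, totalpmtamt); seq is a hypothetical tie-breaker column.
ROWS = [
    (1, "2016-08-29", "DD", 10),
    (2, "2016-08-29", "SO", 12),
    (3, "2016-08-30", "DD", 3),
    (4, "2016-08-31", "DD", 13),
    (5, "2016-08-31", "SO", 23),
    (6, "2016-08-31", "PO", 8),
]

QUERY = """
SELECT mdate, pmttype, totalpmtamt,
       -- default frame with ORDER BY is RANGE: rows sharing an mdate get the same total
       SUM(totalpmtamt) OVER (ORDER BY mdate) AS range_total,
       -- ROWS frame with a tie-breaker: one increment per row
       SUM(totalpmtamt) OVER (ORDER BY mdate, seq ROWS UNBOUNDED PRECEDING) AS rows_total
FROM testdata
ORDER BY mdate, seq
"""


def running_totals():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testdata (seq INTEGER, mdate TEXT, pmttype TEXT, totalpmtamt INTEGER)")
    con.executemany("INSERT INTO testdata VALUES (?, ?, ?, ?)", ROWS)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in running_totals():
    print(row)
```

The range_total column repeats 22 and 69 for same-date rows, while rows_total increases on every row.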