SQL query to merge and sum time-periods

I have a database table containing time-periods and amounts. Think of them as contracts with a duration and a price per day:
start | end | amount_per_day
2013-01-01 | 2013-01-31 | 100
2013-02-01 | 2013-06-30 | 200
2013-01-01 | 2013-06-30 | 100
2013-05-01 | 2013-05-15 | 50
2013-05-16 | 2013-05-31 | 50
I would like to make a query that will display the totals for each period, i.e.:
From 2013-01-01 to 2013-01-31, the first and third contract are active, so the total amount per day is 200. From 2013-02-01 to 2013-04-30, the second and third row are active, so the total is 300. From 2013-05-01 to 2013-05-15 the second, third and fourth row are active, so the total is 350. From 2013-05-16 to 2013-05-31 the second, third and fifth row are active, so the total is again 350. Finally, from 2013-06-01 to 2013-06-30 only the second and third are active, so the total is back to 300.
start | end | total_amount_per_day
2013-01-01 | 2013-01-31 | 200
2013-02-01 | 2013-04-30 | 300
2013-05-01 | 2013-05-31 | 350
2013-06-01 | 2013-06-30 | 300
(It is not necessary to detect that the intervals 2013-05-01 -> 2013-05-15 and 2013-05-16 -> 2013-05-31 have the same totals and merge them, but it would be nice).
I would prefer a portable solution, but if that is not possible, a SQL Server-specific one will work, too.
I can make small changes to the structure of the table, so if it would make the query simpler to e.g. notate the time-periods with the end-date exclusive (so the first period would be start = 2013-01-01, end = 2013-02-01) feel free to make such suggestions.

I'll start with the full query and then break it down and explain it. This is SQL Server specific, but with minor tweaks it could be adapted to any DBMS that supports analytic (window) functions.
WITH Data AS
( SELECT Start, [End], Amount_Per_Day
FROM (VALUES
('20130101', '20130131', 100),
('20130201', '20130630', 200),
('20130101', '20130630', 100),
('20130501', '20130515', 50),
('20130516', '20130531', 50)
) t (Start, [End], Amount_Per_Day)
), Numbers AS
( SELECT Number
FROM Master..spt_values
WHERE Type = 'P'
), DailyData AS
( SELECT [Date] = DATEADD(DAY, Number, Start),
[AmountPerDay] = SUM(Amount_Per_Day)
FROM Data
INNER JOIN Numbers
ON Number BETWEEN 0 AND DATEDIFF(DAY, Start, [End])
GROUP BY DATEADD(DAY, Number, Start)
), GroupedData AS
( SELECT [Date],
AmountPerDay,
[GroupByValue] = DATEADD(DAY, -ROW_NUMBER() OVER(PARTITION BY AmountPerDay ORDER BY [Date]), [Date])
FROM DailyData
)
SELECT [Start] = MIN([Date]),
[End] = MAX([Date]),
AmountPerDay
FROM GroupedData
GROUP BY AmountPerDay, GroupByValue
ORDER BY [Start], [End];
The Data CTE is just your sample data.
The Numbers CTE is just a sequence of numbers from 0 to 2047. (If a start and end date are more than 2047 days apart, this will fail and will need adapting slightly.)
The next CTE, DailyData, simply uses the numbers to expand your ranges into their individual dates, so
20130101, 20130131, 100
Becomes
20130101, 100
20130102, 100
20130103, 100
....
20130131, 100
Then it is just a case of grouping the data by the amount per day with the help of the ROW_NUMBER function to find when it changes and define ranges of similar amounts per day, then getting the MIN and MAX date for each range.
I always struggle to explain or demonstrate the exact workings of this method of grouping ranges. If it doesn't make sense, it is perhaps easiest to see for yourself: put SELECT * FROM DailyData at the end to look at the raw, unaggregated data.
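For readers who find the grouping step easier to follow outside SQL, here is a minimal Python sketch of the same three steps (expand to days, subtract a per-amount row number from each date, then take MIN and MAX per group). It is only an illustration of the idea, not a translation of the query; the function name merge_periods and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (start, end, amount_per_day)
contracts = [
    (date(2013, 1, 1), date(2013, 1, 31), 100),
    (date(2013, 2, 1), date(2013, 6, 30), 200),
    (date(2013, 1, 1), date(2013, 6, 30), 100),
    (date(2013, 5, 1), date(2013, 5, 15), 50),
    (date(2013, 5, 16), date(2013, 5, 31), 50),
]

def merge_periods(rows):
    # Step 1 (DailyData): expand each range into days and sum the amounts per day.
    daily = defaultdict(int)
    for start, end, amount in rows:
        for offset in range((end - start).days + 1):
            daily[start + timedelta(days=offset)] += amount

    # Step 2 (GroupedData): number the days within each amount, then subtract
    # that number from the date. Consecutive days end up with the same key.
    counters = defaultdict(int)
    keyed = []
    for day in sorted(daily):
        amount = daily[day]
        counters[amount] += 1
        keyed.append((amount, day - timedelta(days=counters[amount]), day))

    # Step 3: MIN and MAX date per (amount, group key).
    keyed.sort()
    result = []
    for (amount, _), members in groupby(keyed, key=lambda r: (r[0], r[1])):
        days = [m[2] for m in members]
        result.append((min(days), max(days), amount))
    return sorted(result)

merged = merge_periods(contracts)
for start, end, amount in merged:
    print(start, end, amount)
```

Because the key is built from the total per day, the two 50-per-day contracts in May collapse into one 350 row, which is the "nice to have" merge mentioned in the question.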

Related

SQL - Calculate relative amounts within a year from date segments

I am currently coding an existing Payroll system and I have the below problem. I need to count the Vacation days taken of one employee in one year in order to transfer them to the next. The days can be either complete, or hours in a day (e.g. 6 hour vacation from default 8 hour working day)
However the existing functionality only stores the aforementioned data in a table with columns like this.
EmployeeID | StartDate | EndDate | Hours
1 | 01-02-2018 | 04-02-2018 | 24
1 | 08-03-2018 | 08-03-2018 | 4
2 | 30-12-2017 | 04-01-2018 | 48
3 | 30-12-2018 | 04-01-2019 | 48
Now the issue is that I want to limit the dates to the previous year only. Since it is now 2019, I need vacations only from 2018, meaning records with different start and end years need special handling.
The result table should look like this
EmployeeID | HoursPreviousYear
1 | 28
2 | 32
3 | 16
I am already aware of some helpful SQL functions such as DATEDIFF() or YEAR(), but since each record is different, I would probably need to use a cursor and iterate over the table. Then, to pass the results to a different table, I would have to create it in the query and return it.
To be honest I am baffled...
I never had to use cursors before and as far as I can see, I am not sure even if I can return a table as a result (which I also need to use in a join later on). I am not sure if it is worth to continue struggling with it, but it seems that there should be an easier way.
My other option was to change the behavior of the Save button, to save 2 different records, with no overlapping years, but I cannot since we are having legacy data...
There are obviously some edge cases where this isn't thorough enough, but it should get you started.
This assumes 8 hours taken per day off, totally fails to account for date ranges that span a weekend or holiday, and wouldn't account for someone taking, say three full days off followed by a half day.
DECLARE @Year int = 2018;
SELECT
EmployeeID,
SUM(CASE WHEN StartDate < DATEFROMPARTS(@Year,1,1)
THEN DATEDIFF(DAY,DATEFROMPARTS(@Year-1,12,31),EndDate)*8
WHEN EndDate > DATEFROMPARTS(@Year,12,31)
THEN DATEDIFF(DAY,StartDate,DATEFROMPARTS(@Year+1,1,1))*8
ELSE [Hours]
END) AS HoursPreviousYear
FROM
#table
GROUP BY
EmployeeID;
+------------+-------------------+
| EmployeeID | HoursPreviousYear |
+------------+-------------------+
| 1 | 28 |
| 2 | 32 |
| 3 | 16 |
+------------+-------------------+
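The clipping logic can also be sketched in a few lines of Python, which may help when checking edge cases by hand. This is a rough illustration under the same 8-hours-per-day assumption as the answer; the helper name hours_in_year is made up for the example, and the dates are read as dd-mm-yyyy.

```python
from collections import defaultdict
from datetime import date

HOURS_PER_DAY = 8  # assumption carried over from the answer above

def hours_in_year(start, end, hours, year):
    first, last = date(year, 1, 1), date(year, 12, 31)
    if start >= first and end <= last:
        return hours  # entirely inside the year: trust the stored hours
    clipped_start = max(start, first)
    clipped_end = min(end, last)
    if clipped_start > clipped_end:
        return 0  # the record does not touch the year at all
    # Count the days that fall inside the year, both ends inclusive.
    return ((clipped_end - clipped_start).days + 1) * HOURS_PER_DAY

# Rows from the question: (EmployeeID, StartDate, EndDate, Hours)
rows = [
    (1, date(2018, 2, 1), date(2018, 2, 4), 24),
    (1, date(2018, 3, 8), date(2018, 3, 8), 4),
    (2, date(2017, 12, 30), date(2018, 1, 4), 48),
    (3, date(2018, 12, 30), date(2019, 1, 4), 48),
]

totals = defaultdict(int)
for employee, start, end, hours in rows:
    totals[employee] += hours_in_year(start, end, hours, 2018)

print(dict(totals))
```

Unlike the query, the sketch returns 0 for a record that lies completely outside the year; the query has no such filter, so that case would need a WHERE clause there.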
You can use DATEDIFF to work out how many days of each record fall before the start or after the end of the year, and deduct the corresponding hours from the total, as shown in the following query:
SELECT EmployeeID,
SUM(Hours) - (SUM(StDiff)+SUM(EndDiff))*8 HoursPreviousYear
FROM
(
SELECT EmployeeID,
CONVERT(DATE, StartDate , 103) StartDate,
CONVERT(DATE, EndDate , 103) EndDate,
Hours,
CASE
WHEN YEAR(CONVERT(DATE, StartDate , 103)) = 2018 THEN 0
ELSE DATEDIFF(DD,CONVERT(DATE, StartDate , 103),CONVERT(DATE, '01-01-2018' , 103))
END StDiff,
CASE
WHEN YEAR(CONVERT(DATE, EndDate , 103)) = 2018 THEN 0
ELSE DATEDIFF(DD,CONVERT(DATE, '31-12-2018' , 103),CONVERT(DATE, EndDate , 103))
END EndDiff
FROM your_table
WHERE YEAR(CONVERT(DATE, StartDate , 103)) <= 2018
AND YEAR(CONVERT(DATE, EndDate , 103)) >= 2018
)A
GROUP BY EmployeeID

Selecting the most recent date

I have data structured like this:
ID | Enrolment_Date | Appointment1_Date | Appointment2_Date | .... | Appointment150_Date |
112 01/01/2015 01/02/2015 01/03/2018 01/08/2018
113 01/06/2018 01/07/2018 NULL NULL
114 01/04/2018 01/05/2018 01/06/2018 NULL
I need a new variable which counts the number of months between the enrolment_date and the most recent appointment. The challenge is that all individuals have a different number of appointments.
Update: I agree with the comments that this is poor table design and it needs to be reformatted. Could proposed solutions please include suggested code on how to transform the table?
Since the OP is currently stuck with this bad design, I will point out a temporary solution. As others have suggested, you really must change the structure here. For now, this will suffice:
SELECT '['+ NAME + '],' FROM sys.columns WHERE OBJECT_ID = OBJECT_ID ('TableA') -- find all columns, last one probably max appointment date
SELECT ID,
Enrolment_Date,
CASE WHEN Appointment150_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment150_Date)
WHEN Appointment149_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment149_Date)
WHEN Appointment148_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment148_Date)
WHEN Appointment147_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment147_Date)
WHEN Appointment146_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment146_Date)
WHEN Appointment145_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment145_Date)
WHEN Appointment144_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment144_Date) -- and so on
END AS NumberOfMonths
FROM TableA
This is a very ugly temporary solution and should be considered as such.
You will need to restructure your data, the given structure is poor database design. Create two separate tables - one called users and one called appointments. The users table contains the user id, enrollment date and any other specific user information. Each row in the appointments table contains the user's unique id and a specific appointment date. Structuring your tables like this will make it easier to write a query to get days/months since last appointment.
For example:
Users Table:
ID, Enrollment_Date
1, 2018-01-01
2, 2018-03-02
3, 2018-05-02
Appointments Table:
ID, Appointment_Date
1, 2018-01-02
1, 2018-02-02
1, 2018-02-10
2, 2018-05-01
You would then be able to write a query that joins the two tables together and calculates the difference between the enrollment date and the max value of the appointment date (the most recent appointment).
It is better if you can create two tables.
Enrolment Table (dbo.Enrolments)
ID | EnrolmentDate
1 | 2018-08-30
2 | 2018-08-31
Appointments Table (dbo.Appointments)
ID | EnrolmentID | AppointmentDate
1 | 1 | 2018-09-02
2 | 1 | 2018-09-03
3 | 2 | 2018-09-01
4 | 2 | 2018-09-03
Then you can try something like this.
If you want the count of months from the enrolment date to the final appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MAX(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
And if you want the count of months from the enrolment date to the nearest (earliest) appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MIN(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
Try this on sqlfiddle
You have a lousy data structure, as others have noted. You really want a table with one row per appointment. After all, what happens after the 150th appointment?
select t.id, t.Enrolment_Date,
datediff(month, t.Enrolment_Date, m.max_Appointment_Date) as months_diff
from t cross apply
(select max(Appointment_Date) as max_Appointment_Date
from (values (Appointment1_Date),
(Appointment2_Date),
. . .
(Appointment150_Date)
) v(Appointment_Date)
) m;
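The idea behind the VALUES trick (turn the wide row into a list, ignore the NULLs, take the maximum) can be sketched in Python as follows. This is only an illustration: the function names are invented for the example, the sample dates are read as dd/mm/yyyy (an assumption, since the question does not say), and months_between mimics how DATEDIFF(MONTH, ...) counts month boundaries rather than elapsed months.

```python
from datetime import date

def months_between(start, end):
    # Counts month boundaries crossed, like SQL Server's DATEDIFF(MONTH, ...).
    return (end.year - start.year) * 12 + (end.month - start.month)

def months_to_latest(enrolment, appointments):
    # NULL appointment columns are represented as None and skipped,
    # just as MAX() ignores NULLs in the query above.
    seen = [a for a in appointments if a is not None]
    if not seen:
        return None
    return months_between(enrolment, max(seen))

# Wide rows from the question: ID -> (Enrolment_Date, [Appointment dates...])
wide_rows = {
    112: (date(2015, 1, 1), [date(2015, 2, 1), date(2018, 3, 1), date(2018, 8, 1)]),
    113: (date(2018, 6, 1), [date(2018, 7, 1), None, None]),
    114: (date(2018, 4, 1), [date(2018, 5, 1), date(2018, 6, 1), None]),
}

result = {
    row_id: months_to_latest(enrolment, appointments)
    for row_id, (enrolment, appointments) in wide_rows.items()
}
print(result)
```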

Return Prorated Amount From Range Of Dates

I found a script that, given a range of start and end dates, creates new rows based on that range. The problem I am running into is that each record has an AMOUNT field that I need to prorate properly across the date range.
CREATE TABLE #TempData (Company VARCHAR(6), InvoiceDate DATE, StartPeriod DATE, EndPeriod DATE, SchoolDistrict VARCHAR(100), Amount NUMERIC(10,2))
INSERT INTO #TempData (Company,InvoiceDate,StartPeriod,EndPeriod,SchoolDistrict,Amount)
SELECT '000123','1/1/2016','12/1/2015','12/31/2015','School District 123',140 UNION ALL
SELECT '000123','12/1/2016','6/15/2015','11/30/2015','School District 123',500
;WITH Recurse AS (
SELECT Company,InvoiceDate, StartPeriod
,CAST(DATEADD(DAY,-1,DATEADD(MONTH,DATEDIFF(MONTH,0,StartPeriod)+1,0)) AS DATE) EOM,EndPeriod
,SchoolDistrict,Amount
FROM #TempData
UNION ALL
SELECT Company,InvoiceDate
,CAST(DATEADD(MONTH,DATEDIFF(MONTH,0,StartPeriod)+1,0) AS DATE) StartPeriod
,CAST(DATEADD(DAY,-1,DATEADD(MONTH,DATEDIFF(MONTH,0,StartPeriod)+2,0)) AS DATE)
,EndPeriod
,SchoolDistrict,Amount
FROM Recurse
WHERE EOM<EndPeriod
)
SELECT Company,InvoiceDate,StartPeriod
,CASE WHEN EndPeriod<EOM THEN EndPeriod ELSE EOM END EndPeriod
,SchoolDistrict,Amount
FROM Recurse
DROP TABLE #TempData
My Output looks like this:
Company InvoiceDate StartPeriod EndPeriod SchoolDistrict Amount
000123 2016-01-01 2015-12-01 2015-12-31 School District 123 140.00
000123 2016-12-01 2015-06-15 2015-06-30 School District 123 500.00
000123 2016-12-01 2015-07-01 2015-07-31 School District 123 500.00
000123 2016-12-01 2015-08-01 2015-08-31 School District 123 500.00
000123 2016-12-01 2015-09-01 2015-09-30 School District 123 500.00
000123 2016-12-01 2015-10-01 2015-10-31 School District 123 500.00
000123 2016-12-01 2015-11-01 2015-11-30 School District 123 500.00
The first record returned needs no prorating because it covers only one month. For the other records, I need help prorating the AMOUNT of 500 properly over the 6 rows returned.
NOTE Update: Full months get an equal distribution; any StartPeriod or EndPeriod month that is not a full period gets a partial, prorated distribution.
Here is a chain of expressions derived from the original input dates and amount. You can readily feed this into your Recurse method, although I recommend one of the other methods for generating the months, such as a numbers table, especially if the dates can range over many years.
For the partial months it calculates a fraction based on the number of days covered in that month. The divisor is the total number of days in that month. Sometimes accountants treat a month as having 30 days so you'll have to decide if this is appropriate.
The full amount is split across the full months, weighted equally regardless of length, plus the two partials weighted by their individual proportions of their respective months. The full month amount is computed first and that result is rounded; the partial months depend on that calculation and note my comment at the end regarding the consequences of rounding to the penny. The final results need to take some care to distribute the last penny correctly so that the sum is correct.
with Expr1 as (
select *,
StartPeriod as RangeStart, EndPeriod as RangeEnd,
case when datediff(month, StartPeriod, EndPeriod) < 1 then null else
datediff(month, StartPeriod, EndPeriod) + 1
- case when datepart(day, StartPeriod) <> 1
then 1 else 0 end
- case when month(EndPeriod) = month(dateadd(day, 1, EndPeriod))
then 1 else 0 end
end as WholeMonths,
case when datepart(day, StartPeriod) <> 1
then 1 else 0 end as IsPartialStart,
case when month(EndPeriod) = month(dateadd(day, 1, EndPeriod))
then 1 else 0 end as IsPartialEnd,
datepart(day, StartPeriod) as StartPartialComplement,
datepart(day, EndPeriod) as EndPartialOffset,
datepart(day,
dateadd(day, -1, dateadd(month, datediff(month, 0, StartPeriod) + 1, 0))
) as StartPartialDaysInMonth,
datepart(day,
dateadd(day, -1, dateadd(month, datediff(month, 0, EndPeriod) + 1, 0))
) as EndPartialDaysInMonth
from #TempData
),
Expr2 as (
select *,
case when IsPartialStart = 1
then StartPartialDaysInMonth - StartPartialComplement + 1
else 0 end as StartPartialDays,
case when IsPartialEnd = 1
then EndPartialOffset else 0 end as EndPartialDays
from Expr1
),
Expr3 as (
select *,
cast(round(Amount / (
WholeMonths
+ StartPartialDays / cast(StartPartialDaysInMonth as float)
+ EndPartialDays / cast(EndPartialDaysInMonth as float)
), 2) as numeric(10, 2)) as WholeMonthAllocation,
StartPartialDays / cast(StartPartialDaysInMonth as float) as StartPartialFraction,
EndPartialDays / cast(EndPartialDaysInMonth as float) as EndPartialFraction
from Expr2
),
Expr4 as (
select *,
cast(case when IsPartialEnd = 0
then Amount - WholeMonthAllocation * WholeMonths
else StartPartialFraction * WholeMonthAllocation
end as numeric(10, 2)) as StartPartialAmount,
cast(case when IsPartialEnd = 0 then 0
else Amount
- WholeMonthAllocation * WholeMonths
- StartPartialFraction * WholeMonthAllocation
end as numeric(10, 2)) as EndPartialAmount
from Expr3
),
...
From those values you can determine which amount should end up in the final result after you've created all the extra rows. This expression will do the trick by incorporating your original query. (Since SQL Fiddle has been down, I haven't been able to test any of this.)
... /* all of the above */
Recurse AS (
SELECT
RangeStart, RangeEnd, IsPartialStart, IsPartialEnd,
StartPartialAmount, EndPartialAmount, WholeMonthAllocation,
Company, InvoiceDate, StartPeriod,
CAST(DATEADD(DAY,-1,DATEADD(MONTH,DATEDIFF(MONTH,0,StartPeriod)+1,0)) AS DATE) EOM,
EndPeriod, SchoolDistrict,
case
when datediff(month, RangeStart, RangeEnd) = 0 then Amount
when IsPartialStart = 1 then StartPartialAmount
else WholeMonthAllocation
end as Amount
FROM Expr4
UNION ALL
SELECT
RangeStart, RangeEnd, IsPartialStart, IsPartialEnd,
StartPartialAmount, EndPartialAmount, WholeMonthAllocation,
Company, InvoiceDate,
CAST(DATEADD(MONTH,DATEDIFF(MONTH,0,StartPeriod)+1,0) AS DATE) AS StartPeriod,
CAST(DATEADD(DAY,-1,DATEADD(MONTH,DATEDIFF(MONTH,0,StartPeriod)+2,0)) AS DATE) EOM,
EndPeriod, SchoolDistrict,
case
-- final month is when StartPeriod is one month before RangeEnd.
-- remember this is recursive!
when IsPartialEnd = 1 and datediff(month, StartPeriod, RangeEnd) = 1
then EndPartialAmount
else WholeMonthAllocation
end as Amount
FROM Recurse
WHERE EOM < EndPeriod
)
SELECT
Company, InvoiceDate, StartPeriod,
CASE WHEN EndPeriod < EOM THEN EndPeriod ELSE EOM END EndPeriod,
SchoolDistrict, Amount
FROM Recurse
I've added/aliased RangeStart and RangeEnd values to avoid confusion with StartPeriod and EndPeriod which you're using in both your temp table and output query. The Range- values represent the start and end of the full span and the Period- values are the computed values that break out the individual periods. Adapt as you see fit.
Edit #1: I realized that I had not handled the case where start and end fall in the same month: perhaps there's a cleaner way to do this whole thing. I just ended up nulling the WholeMonths expression to avoid a possible divide by zero. The case expression at the end catches this condition and just returns the original Amount value. Although you probably don't have to worry about dealing with start and end dates getting reversed I went ahead and roped them all together with the same < 1 test.
Edit #2: Once I had a place to try this out your test case showed that the rounding was losing a penny and was getting picked up by the final partial month calculation even when it was actually one of the whole months. So I had to adjust to look for the case where there is no final partial month. That's in Expr4. I also spotted several of the minor syntax errors that you noted.
The recursive query allows for seeing the months in order and simplifies the logic a little bit. The anchor is always going to be the start month and so none of the final month logic applies and similarly for the other half of the query. If you end up switching this out with a regular join against a numbers table you'd want to use an expression like this instead:
case
when datediff(month, RangeStart, RangeEnd) = 0
then Amount
when IsPartialStart = 1 and is first month...
then StartPartialAmount
when IsPartialEnd = 1 and is final month...
then EndPartialAmount
else WholeMonthAllocation
end as Amount
Edit #3: Also be aware that this method is not appropriate when dealing with very small amounts where the rounding is going to skew the results. Examples:
$0.13 divided January 02 to December 01 gives [.01, .01, .01, .01, .01, .01, .01, .01, .01, .01, .01, .02]
$0.08 divided January 02 to December 01 gives [.01, .01, .01, .01, .01, .01, .01, .01, .01, .01, .01, -.03]
$0.08 divided January 31 to December 31 gives [-.03, .01, .01, .01, .01, .01, .01, .01, .01, .01, .01, .01]
$0.05 divided January 31 to November 30 gives [.05, .00, .00, .00, .00, .00, .00, .00, .00, .00, .00]
$0.05 divided January 31 to December 01 gives [.00, .00, .00, .00, .00, .00, .00, .00, .00, .00, .00, .05]
$0.30 divided January 02 to March 1 gives [.15, .15, .00]
It's an interesting problem because it requires both expanding the number of rows, and because of rounding problems that can be detected and corrected within the query.
First some fiddly date calcs are required to work out how many days in each month fall within the StartPeriod and EndPeriod.
Then an Estimate is calculated for each month as a simple proportion, but rounding errors mean that the sum of these Estimates may not add up to the total invoice Amount. Window functions are then used to calculate the total rounding error, so that the last payment can be adjusted.
As an aside, instead of generating a row for each month by using a recursive CTE, I recommend using a join with a simple "numbers" view. For more info see this question about number tables
-- I use the #tempdata table mentioned in the question
; WITH numbers AS ( -- A fast way to get a sequence of integers starting at 0
SELECT TOP(10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 as n
FROM sys.all_columns a CROSS JOIN sys.all_columns b
),
data_with_pk AS ( -- Add a primary key so that we know how to sort output
SELECT ROW_NUMBER() OVER (ORDER BY company, invoicedate) AS InvoiceId, *
FROM #tempdata
),
step1 AS ( -- Calc first and last day of each month in which payment is due
SELECT data_with_pk.*,
CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, StartPeriod) + numbers.n, 0)
AS DATE) AS StartOfMonth,
CAST(DATEADD(DAY, -1,
DATEADD(MONTH, DATEDIFF(MONTH,0,StartPeriod) + numbers.n + 1, 0))
AS DATE) AS EndOfMonth
FROM data_with_pk
-- This join is a simpler way to generate multiple rows than using a recursive CTE
JOIN numbers ON numbers.n <= DATEDIFF(MONTH, StartPeriod, EndPeriod)
),
step2 AS ( -- Calc block of days in each month which fall within whole period
SELECT *,
CASE WHEN StartPeriod > StartOfMonth THEN StartPeriod ELSE StartOfMonth END
AS StartOfBlock,
CASE WHEN EndPeriod < EndOfMonth THEN EndPeriod ELSE EndOfMonth END
AS EndOfBlock
FROM step1
),
step3 AS ( -- Whole months count as 30 days for the purpose of calculating proportions
SELECT *,
CASE WHEN StartOfBlock = StartOfMonth AND EndOfBlock = EndOfMonth
THEN 30
ELSE DATEDIFF(DAY, StartOfBlock, EndOfBlock) + 1 END AS DaysInBlock
FROM step2
),
step3b AS (
SELECT *,
SUM(DaysInBlock) OVER (PARTITION BY InvoiceId) AS DaysInPeriod
FROM step3
),
step4 AS ( -- Calc proportion of whole amount due in this block
SELECT *,
CAST(Amount * DaysInBlock / DaysInPeriod AS NUMERIC(10,2)) AS Estimate
FROM step3b
),
step5 AS ( -- Calc running total of estimates
SELECT *,
SUM(Estimate) OVER (PARTITION BY InvoiceId ORDER BY EndOfBlock) AS RunningEstimate
FROM step4
),
step6 AS ( -- Adjust last estimate to ensure final Prorata total is equal to Amount
SELECT *,
CASE WHEN EndOfBlock = EndPeriod
THEN Estimate + amount - RunningEstimate
ELSE Estimate end AS Prorata
FROM step5
),
step7 AS ( -- Just for illustration to prove that payments sum to the Invoice Amount
SELECT *,
SUM(Prorata) OVER (PARTITION BY InvoiceId ORDER BY EndOfBlock) AS RunningProrata
FROM step6
)
SELECT InvoiceId, InvoiceDate, StartPeriod, EndPeriod, Amount, DaysInBlock, EndOfBlock,
Estimate, RunningEstimate, Prorata, RunningProrata
FROM step7
ORDER BY InvoiceId, EndOfBlock
You can see the "Estimate" and "RunningEstimate" columns in the result set below end up being $0.01 out, but are corrected in the "Prorata" column.
+-----------+-------------+-------------+------------+--------+-------------+------------+----------+-----------------+---------+----------------+
| InvoiceId | InvoiceDate | StartPeriod | EndPeriod | Amount | DaysInBlock | EndOfBlock | Estimate | RunningEstimate | Prorata | RunningProrata |
+-----------+-------------+-------------+------------+--------+-------------+------------+----------+-----------------+---------+----------------+
| 1 | 2016-01-01 | 2015-12-01 | 2015-12-31 | 140.00 | 30 | 2015-12-31 | 140.00 | 140.00 | 140.00 | 140.00 |
| 2 | 2016-12-01 | 2015-06-15 | 2015-11-30 | 500.00 | 16 | 2015-06-30 | 48.19 | 48.19 | 48.19 | 48.19 |
| 2 | 2016-12-01 | 2015-06-15 | 2015-11-30 | 500.00 | 30 | 2015-07-31 | 90.36 | 138.55 | 90.36 | 138.55 |
| 2 | 2016-12-01 | 2015-06-15 | 2015-11-30 | 500.00 | 30 | 2015-08-31 | 90.36 | 228.91 | 90.36 | 228.91 |
| 2 | 2016-12-01 | 2015-06-15 | 2015-11-30 | 500.00 | 30 | 2015-09-30 | 90.36 | 319.27 | 90.36 | 319.27 |
| 2 | 2016-12-01 | 2015-06-15 | 2015-11-30 | 500.00 | 30 | 2015-10-31 | 90.36 | 409.63 | 90.36 | 409.63 |
| 2 | 2016-12-01 | 2015-06-15 | 2015-11-30 | 500.00 | 30 | 2015-11-30 | 90.36 | 499.99 | 90.37 | 500.00 |
+-----------+-------------+-------------+------------+--------+-------------+------------+----------+-----------------+---------+----------------+
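The estimate-then-correct idea from step4 to step6 is small enough to check by hand. Below is a minimal Python sketch of just that part, assuming the days per block have already been worked out (16 days for the partial June, 30 for each whole month, as in the table above). The function name prorate is hypothetical, and ROUND_HALF_UP is an assumption about how the CAST rounds; it reproduces the figures shown for this invoice.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def prorate(amount, days_per_block):
    # Assumes at least one block. Each block first gets a rounded estimate
    # in proportion to its share of the total days.
    total_days = sum(days_per_block)
    estimates = [
        (amount * days / total_days).quantize(CENT, rounding=ROUND_HALF_UP)
        for days in days_per_block
    ]
    # Whatever the rounding lost or gained is pushed onto the final block,
    # so the shares always add up to the original amount.
    estimates[-1] += amount - sum(estimates)
    return estimates

shares = prorate(Decimal("500.00"), [16, 30, 30, 30, 30, 30])
print(shares)
print(sum(shares))
```

With very small amounts the correction can dominate the final block, which is the same caveat raised in Edit #3 of the previous answer.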

Assign a counter in SQL Server to records with sequential dates, and only increment when dates not sequential

I am trying to assign a Trip # to records for Customers with sequential days, and increment the Trip ID if they have a break in sequential days, and come later in the month for example. The data structure looks like this:
CustomerID Date
1 2014-01-01
1 2014-01-02
1 2014-01-04
2 2014-01-01
2 2014-01-05
2 2014-01-06
2 2014-01-08
The desired output based upon the above example dataset would be:
CustomerID Date Trip
1 2014-01-01 1
1 2014-01-02 1
1 2014-01-04 2
2 2014-01-01 1
2 2014-01-05 2
2 2014-01-06 2
2 2014-01-08 3
So if the Dates for that Customer are back-to-back, it is considered the same Trip, and has the same Trip #. Is there a way to do this in SQL Server? I am using MSSQL 2012.
My initial thoughts are to use the LAG, ROW_NUMBER, or OVER/PARTITION BY function, or even a Recursive Table Variable Function. I can paste some code, but in all honesty, my code isn't working so far. If this is a simple query, but I am just not thinking about it correctly, that would be great.
Thank you in advance.
Since Date is a DATE (i.e. it has no time part), you could for example use DENSE_RANK() over Date minus ROW_NUMBER() days, which gives a constant value for consecutive days. Something like:
WITH cte AS (
SELECT CustomerID, Date,
DATEADD(DAY,
-ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Date),
Date) dt
FROM trips
)
SELECT CustomerID, Date,
DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY dt) AS Trip
FROM cte;
An SQLfiddle to test with.
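To see why subtracting the row number works, here is a short Python sketch of the same idea run against the sample data from the question. It is an illustration rather than a port of the query: the function name assign_trips is made up, and it assumes each customer has at most one row per date (with duplicates, the row number would drift and split a trip).

```python
from collections import defaultdict
from datetime import date, timedelta

def assign_trips(visits):
    by_customer = defaultdict(list)
    for customer, day in visits:
        by_customer[customer].append(day)

    result = []
    for customer in sorted(by_customer):
        trip = 0
        previous_key = None
        days = sorted(by_customer[customer])
        for row_number, day in enumerate(days, start=1):
            # Date minus row number stays constant while days are consecutive
            # and jumps forward after every gap.
            key = day - timedelta(days=row_number)
            if key != previous_key:
                trip += 1  # a new key means a new trip, like DENSE_RANK
                previous_key = key
            result.append((customer, day, trip))
    return result

visits = [
    (1, date(2014, 1, 1)), (1, date(2014, 1, 2)), (1, date(2014, 1, 4)),
    (2, date(2014, 1, 1)), (2, date(2014, 1, 5)), (2, date(2014, 1, 6)),
    (2, date(2014, 1, 8)),
]

trips = assign_trips(visits)
for customer, day, trip in trips:
    print(customer, day, trip)
```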

SQL Server: Finding date given EndDate and # Days, excluding days from specific date ranges

I have a TableA in a database similar to the following:
Id | Status | Start | End
1 | Illness | 2013-04-02 | 2013-04-23
2 | Illness | 2013-05-05 | 2014-01-01
3 | Vacation | 2014-02-01 | 2014-03-01
4 | Illness | 2014-03-08 | 2014-03-09
5 | Vacation | 2014-05-05 | NULL
Imagine it's keeping track of a specific user's "Away" days. Given the following Inputs:
SomeEndDate (Date),
NumDays (Integer)
I want to find the SomeStartDate (Date) that is Numdays non-illness days from EndDate. In other words, say I am given a SomeEndDate value '2014-03-10' and a NumDays value of 60; the matching SomeStartDate would be:
2014-03-10 to 2014-03-09 = 1
2014-03-08 to 2014-01-01 = 57
2013-05-05 to 2013-05-03 = 2
So, at 60 non-illness days, we get a SomeStartDate of '2013-05-03'. Is there any easy way to accomplish this in SQL? I imagine I could loop over each day, check whether or not it falls into one of the illness ranges, and increment a counter if not (exiting the loop once the counter reaches NumDays)... but that seems wildly inefficient. Appreciate any help.
Make a Calendar table that has a list of all the dates you will ever care about.
SELECT MIN([Date])
FROM (
SELECT TOP(@NumDays) c.[Date]
FROM Calendar c
WHERE c.[Date] < @SomeEndDate
AND NOT EXISTS (
SELECT 1
FROM TableA a
WHERE c.[Date] BETWEEN a.Start AND a.[End]
AND a.Status = 'Illness'
)
ORDER BY c.[Date] DESC -- newest first, so TOP keeps the days closest to the end date
) t
The Calendar table method lets you also easily exclude holidays, weekends, etc.
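The calendar approach boils down to "walk backwards from the end date and count only the days that are not inside an illness range". A minimal Python sketch of that walk is below. The function name find_start_date and the tiny data set are invented for the example, illness ranges are assumed to be closed (no NULL end dates) and inclusive at both ends, and the end date itself is not counted.

```python
from datetime import date, timedelta

def find_start_date(end_date, num_days, illness_ranges):
    def is_ill(day):
        return any(start <= day <= end for start, end in illness_ranges)

    day = end_date
    counted = 0
    while counted < num_days:
        day -= timedelta(days=1)  # step back one calendar day
        if not is_ill(day):
            counted += 1          # only non-illness days count
    return day

# Hypothetical data: one two-day illness right before the end date.
illness = [(date(2014, 3, 8), date(2014, 3, 9))]
start = find_start_date(date(2014, 3, 10), 3, illness)
print(start)
```

Here 9 and 8 March are skipped as illness days, so the three counted days are 7, 6 and 5 March. A loop like this is fine for a single lookup; the set-based calendar query is the better fit when this has to run for many rows.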
SQL Server 2012:
Try this solution:
DECLARE @NumDays INT = 70, @SomeEndDate DATE = '2014-03-10';
SELECT
[RangeStop],
CASE
WHEN RunningTotal_NumOfDays <= @NumDays THEN [RangeStart]
WHEN RunningTotal_NumOfDays - Current_NumOfDays <= @NumDays THEN DATEADD(DAY, -(@NumDays - (RunningTotal_NumOfDays - Current_NumOfDays))+1, [RangeStop])
END AS [RangeStart]
FROM (
SELECT
y.*,
DATEDIFF(DAY, y.RangeStart, y.RangeStop) AS Current_NumOfDays,
SUM( DATEDIFF(DAY, y.RangeStart, y.RangeStop) ) OVER(ORDER BY y.RangeStart DESC) AS RunningTotal_NumOfDays
FROM (
SELECT LEAD(x.[End]) OVER(ORDER BY x.[End] DESC) AS RangeStart, -- It's previous date because of "ORDER BY x.[End] DESC"
x.[Start] AS RangeStop
FROM (
SELECT @SomeEndDate AS [Start], '9999-12-31' AS [End]
UNION ALL
SELECT x.[Start], x.[End]
FROM #MyTable AS x
WHERE x.[Status] = 'Illness'
AND x.[End] <= @SomeEndDate
) x
) y
) z
WHERE RunningTotal_NumOfDays - Current_NumOfDays <= @NumDays;
/*
Output:
RangeStop RangeStart
---------- ----------
2014-03-10 2014-03-09
2014-03-08 2014-01-01
2013-05-05 2013-05-03
*/
Note #1: LEAD(End) will return the previous End date (previous because of ORDER BY End DESC)
Note #2: DATEDIFF(DAY, RangeStart, RangeStop) computes the number of days between the current start (alias RangeStop) and the "previous" end (alias RangeStart) => Current_NumOfDays
Note #3: SUM( Current_NumOfDays ) computes a running total thus: 1 + 66 + (3)
Note #4: I've used @NumDays = 70 (not 60)