Work out how many days it took from one status to another: SQL

Please feast your eyes on this current structure of our DB.
Our DBA is away for the next two weeks and I have very limited SQL knowledge; I usually stay with the UI and middle tier.
We need to write a query that calculates the average period (in days) that all commissions have taken to transition from ‘Verified’ to ‘Paid’ for a single dealer. The current statuses are:
Created
Verified
Rejected
Awaiting Payment
Paid
Refunded
I think this query needs to aim directly at the Commission History Table?
I'm not sure how I would go about writing such a query, given that my SQL knowledge is limited.
Any help would be great.

Here's a method to achieve what you're after, although it might not be the most efficient. It seems to me that this is more of a one-off query than something you will run frequently enough to impact database performance.
Test Table Setup:
CREATE TABLE Commission
(
CommissionId INT,
DealerId INT
)
CREATE TABLE CommissionHistory
(
CommissionId INT,
ActionDate DATETIME,
NewPaymentStatusId INT
)
Insert Dummy Data - 5 Commissions for 1 Dealer:
INSERT INTO dbo.Commission
( CommissionId ,
DealerId
)
VALUES ( 1 , 1 ),
( 2 , 1 ),
( 3 , 1 ),
( 4 , 1 ),
( 5 , 1 )
INSERT INTO dbo.CommissionHistory
( CommissionId ,
ActionDate ,
NewPaymentStatusId
)
VALUES ( 1 , GETDATE() -25, 1 ),
( 1 , GETDATE() -21, 2 ),
( 1 , GETDATE() -18, 3 ),
( 1 , GETDATE() -16, 4 ),
( 1 , GETDATE() -5, 5 ),
( 2 , GETDATE() -10, 1 ),
( 2 , GETDATE() -9, 2 ),
( 2 , GETDATE() -8, 3 ),
( 2 , GETDATE() -7, 4 ),
( 2 , GETDATE() -6, 5 ),
( 3 , GETDATE() -10, 1 ),
( 3 , GETDATE() -8, 2 ),
( 3 , GETDATE() -6, 3 ),
( 3 , GETDATE() -4, 4 ),
( 3 , GETDATE() -2, 5 ),
( 3 , GETDATE() -25, 6 ),
( 4 , GETDATE() -10, 1 ),
( 4 , GETDATE() -7, 2 ),
( 4 , GETDATE() -6, 3 ),
( 4 , GETDATE() -4, 4 ),
( 4 , GETDATE() -1, 5 ),
( 5 , GETDATE() -1, 1 ),
( 5 , GETDATE() -1, 2 )
So with the dummy data, commissions 1, 2 & 4 are classified as valid records as they have statuses 2 and 5. Commission 3 is excluded as it is refunded, and 5 is excluded as it's not paid.
To generate the averages I wrote the below query:
-- set the required dealer id
DECLARE @DealerId INT = 1
-- return all CommissionId's in to a temp table that have statuses 2 and 5, but not 6
SELECT DISTINCT CommissionId
INTO #DealerCommissions
FROM dbo.CommissionHistory t1
WHERE CommissionId IN (SELECT CommissionId
FROM dbo.Commission
WHERE DealerId = @DealerId)
AND NOT EXISTS (SELECT CommissionId
FROM dbo.CommissionHistory t2
WHERE t2.NewPaymentStatusId = 6 AND t2.CommissionId = t1.CommissionId)
AND EXISTS (SELECT CommissionId
FROM dbo.CommissionHistory t2
WHERE t2.NewPaymentStatusId = 2 AND t2.CommissionId = t1.CommissionId)
AND EXISTS (SELECT CommissionId
FROM dbo.CommissionHistory t2
WHERE t2.NewPaymentStatusId = 5 AND t2.CommissionId = t1.CommissionId)
-- use the temp table to return the average difference between the Verified (2) and Paid (5) dates
;WITH cte AS (
SELECT CommissionId FROM #DealerCommissions
)
SELECT AVG(CAST(DaysToCompletion AS DECIMAL(10,2)))
FROM (
SELECT DATEDIFF(DAY, MIN(ch.ActionDate), MAX(ch.ActionDate)) DaysToCompletion
FROM cte
INNER JOIN dbo.CommissionHistory ch ON ch.CommissionId = cte.CommissionId
AND ch.NewPaymentStatusId IN (2, 5) -- only Verified and Paid rows; otherwise the Created date would be the MIN
GROUP BY ch.CommissionId
) AS averageDays
-- remove temp table
DROP TABLE #DealerCommissions

For every commission in the history table you can take the latest Verified date and the earliest Paid date, assuming the paid date is always later than the verified date. Then join to the commission table and group by dealer id to get the average duration in days.
with comm as(
select
commissionid,
max(case NewPaymentStatus when 'Verified' then ActionDate else null end) as verified_date,
min(case NewPaymentStatus when 'Paid' then ActionDate else null end) as paid_date
-- max/min are used just in case the same status is recorded more than once
from
CommissionHistory
group by
commissionid
)
select
c.DealerId,
avg(datediff(day,comm.verified_date,comm.paid_date) * 1.0) -- * 1.0 avoids an integer (truncated) average
from
comm
inner join
commission c
on c.commissionid = comm.commissionid
where
datediff(day,comm.verified_date,comm.paid_date)>0
-- to get rid of commissions paid before, or on the same day as, the verified date
group by
c.DealerId
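As a cross-check of the logic outside the database, here is a minimal Python sketch of the same calculation. It assumes the status ids used in the dummy data (2 = Verified, 5 = Paid, 6 = Refunded) and measures from the first Verified date to the last Paid date; the function name and the tuple layout are made up for illustration and are not part of the real schema.

```python
from datetime import date, timedelta

VERIFIED, PAID, REFUNDED = 2, 5, 6  # status ids assumed from the dummy data


def average_days_verified_to_paid(history):
    """history: iterable of (commission_id, action_date, status_id) tuples."""
    by_commission = {}
    for commission_id, action_date, status_id in history:
        by_commission.setdefault(commission_id, []).append((action_date, status_id))

    durations = []
    for events in by_commission.values():
        statuses = {status for _, status in events}
        # Skip commissions that were refunded or never reached both statuses.
        if REFUNDED in statuses or not {VERIFIED, PAID} <= statuses:
            continue
        verified = min(d for d, s in events if s == VERIFIED)
        paid = max(d for d, s in events if s == PAID)
        durations.append((paid - verified).days)

    return sum(durations) / len(durations) if durations else None


today = date(2024, 1, 31)  # arbitrary fixed date so the example is repeatable
history = [
    (1, today - timedelta(days=21), VERIFIED), (1, today - timedelta(days=5), PAID),
    (2, today - timedelta(days=9), VERIFIED), (2, today - timedelta(days=6), PAID),
    (4, today - timedelta(days=7), VERIFIED), (4, today - timedelta(days=1), PAID),
    (5, today - timedelta(days=1), VERIFIED),  # never paid, so it is ignored
]
print(average_days_verified_to_paid(history))  # (16 + 3 + 6) / 3
```

Running a sketch like this against a small export of the history table is one way to confirm that a SQL result is in the right range before trusting it.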


Optimal method for inserting missing dates into a partitioned Snowflake table?

The data is currently in the following format:
account_id  sale_month  revenue_new  revenue_expansion  revenue_churn
000001      2022-01-01  100          0                  0
000001      2022-03-01  0            200                0
000001      2022-06-01  0            0                  -300
I would like the data to be in the following format:
account_id  sale_month  revenue_opening  revenue_new  revenue_expansion  revenue_churn  revenue_closing
000001      2022-01-01  0                100          0                  0              100
000001      2022-02-01  100              0            0                  0              100
000001      2022-03-01  100              0            200                0              300
000001      2022-04-01  300              0            0                  0              300
000001      2022-05-01  300              0            0                  0              300
000001      2022-06-01  300              0            0                  -300           0
I see this occurring in four steps:
1. Partitioning by account_id and ordering by sale_month
2. Inserting missing dates within these groups as new records into the table
3. Calculating revenue_closing
4. Calculating revenue_opening using a window function
It is step 1 and 2 that have me stumped. I am not sure how to write an insert statement that operates within an ordered partition and knows to only insert records for dates that do not exist.
I suppose I could always create a dim_dates table and left join revenue_table to that, but that approach strikes me as being clunky.
Any / all help is appreciated!
I agree that creating dim_dates and using window functions is the right approach here. However, it can also be solved with a recursive CTE.
Here is my version of the SQL using a recursive CTE.
with sales_data (account_id, sale_month, revenue_new, revenue_expansion, revenue_churn) as /* sample data */
(
select * from
(
values ('000001', '2022-01-01', 100, 0, 0)
, ('000001', '2022-03-01', 0, 200, 0)
, ('000001', '2022-06-01', 0, 0, -300)
-- added two more records for validation
, ('000002', '2022-01-01', 100, 0, 0)
, ('000002', '2022-04-01', 0, 400, 0)
, ('000002', '2022-07-01', 0, 0, -500)
)
), sales_month_range as /* getting date range for each account_id */
(
select account_id
, min(sale_month::date) as begin_date
, max(sale_month::date) as end_date
from sales_data
group by account_id
), rec_sales_month as /* recursive cte to fill missing dates and calculate opening and closing balance */
(
select sd.account_id
, sd.sale_month::date as sale_month
, 0 as revenue_opening
, sd.revenue_new
, sd.revenue_expansion
, sd.revenue_churn
, (
nvl(sd.revenue_new, 0)
+ nvl(sd.revenue_expansion, 0)
+ nvl(sd.revenue_churn, 0)
) as revenue_closing
, r.end_date as end_date
from sales_month_range r
inner join sales_data sd
on sd.account_id = r.account_id
and sd.sale_month::date = r.begin_date::date
where r.begin_date::date <= r.end_date::date
union all
select r.account_id
, dateadd(month, 1, r.sale_month::date) as sale_month
, r.revenue_closing as revenue_opening
, nvl(sd.revenue_new, 0) as revenue_new
, nvl(sd.revenue_expansion, 0) as revenue_expansion
, nvl(sd.revenue_churn, 0) as revenue_churn
, (
nvl(r.revenue_closing, 0)
+ nvl(sd.revenue_new, 0)
+ nvl(sd.revenue_expansion, 0)
+ nvl(sd.revenue_churn, 0)
) as revenue_closing
, r.end_date
from rec_sales_month r
left join sales_data sd
on sd.account_id = r.account_id
and dateadd(month, 1, r.sale_month::date) = sd.sale_month
where dateadd(month, 1, r.sale_month::date) <= r.end_date::date
)
select account_id
, sale_month
, revenue_opening
, revenue_new
, revenue_expansion
, revenue_churn
, revenue_closing
from rec_sales_month
--where account_id = '000001'
order by account_id, sale_month::date
sales_data: this CTE contains the sample data. I have added a few more records for another account_id, 000002, to validate the solution.
sales_month_range: this CTE gives the date range for each account, which is then used to fill in the missing dates.
rec_sales_month: this is the recursive CTE; it contains the main logic that fills in the missing dates and calculates the opening and closing balances.
For background on recursive CTEs, see the Snowflake documentation:
https://docs.snowflake.com/en/user-guide/queries-cte.html#recursive-ctes-and-hierarchical-data
Hope this helps!
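To make the gap-filling and balance roll-forward easier to follow, the same idea can be sketched in plain Python for a single account. This is only an illustration of the logic, not Snowflake code; the function names and the tuple layout are assumptions.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def fill_months(rows):
    """rows: list of (sale_month, new, expansion, churn) for ONE account."""
    by_month = {month: (new, exp, churn) for month, new, exp, churn in rows}
    month, last = min(by_month), max(by_month)
    opening = 0
    result = []
    while month <= last:
        new, exp, churn = by_month.get(month, (0, 0, 0))  # missing month -> zeros
        closing = opening + new + exp + churn
        result.append((month, opening, new, exp, churn, closing))
        opening = closing  # this month's closing is next month's opening
        month = next_month(month)
    return result


rows = [
    (date(2022, 1, 1), 100, 0, 0),
    (date(2022, 3, 1), 0, 200, 0),
    (date(2022, 6, 1), 0, 0, -300),
]
for row in fill_months(rows):
    print(row)
```

Each loop iteration plays the role of one recursion step in rec_sales_month: advance one month, pick up that month's figures if they exist, and carry the previous closing balance forward as the new opening balance.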

How can I divide hours to next working days in SQL?

I have a table that stores a start date and a number of hours. I also have a calendar table that serves as a reference for working days. My goal is to divide these hours across the working days.
For example:
ID Date Hour
1 20210504 40
I want it to be structured as
ID Date Hour
1 20210504 8
1 20210505 8
1 20210506 8
1 20210507 8
1 20210510 8
I managed to divide the hours with the code below, but couldn't restrict the result to working days.
WITH cte1 AS
(
select 1 AS ID, 20210504 AS Date, 40 AS Hours --just a test case
), working_days AS
(
select date from dateTable
),
cte2 AS
(
select ID, Date, Hours, IIF(Hours<=8, Hours, 8) AS dailyHours FROM cte1
UNION ALL
SELECT
cte2.ID,
cte2.Date + 1
,cte2.Hours - 8
,IIF(Hours<=8, Hours, 8)
FROM cte2
JOIN cte1 t ON cte2.ID = t.ID
WHERE cte2.HOURS > 8 AND cte2.Date + 1 IN (select * from working_days)
When I use it like this it only gives me this output with one day missing
ID Date Hour
1 20210504 8
1 20210505 8
1 20210506 8
1 20210507 8
To solve your problem you need to build your calendar properly,
adding a ROW_NUMBER to working_days so that the working days get a gap-free sequence.
declare @date_start date = '2021-05-01'
;WITH
cte1 AS (
SELECT * FROM
(VALUES
(1, '20210504', 40),
(2, '20210505', 55),
(3, '20210503', 44)
) X (ID, Date, Hour)
),
numbers as (
SELECT ROW_NUMBER() over (order by o.object_id) N
FROM sys.objects o
),
cal as (
SELECT cast(DATEADD(day, n, @date_start) as date) d, n-1 n
FROM numbers n
where n.n<32
),
working_days as (
select d, ROW_NUMBER() over (order by n) dn
from cal
where DATEPART(weekday, d) < 6 /* monday to friday in italy (country dependent) */
),
base as (
SELECT t.ID, t.Hour, w.d, w.dn
from cte1 t
join working_days w on w.d = t.date
)
SELECT t.ID, w.d, iif((8*n)<=Hour, 8, 8 + Hour - (8*n) ) h
FROM base t
join numbers m on m.n <= (t.Hour / 8.0) + 0.5
join working_days w on w.dn = t.dn + N -1
order by 1,2
You can use a recursive CTE. This should do the trick:
with cte as (
select id, date, 8 as hour, hour - 8 as total_hour -- total_hour holds the hours still to be allocated
from t
union all
select id, dateadd(day, 1, date),
(case when total_hour < 8 then total_hour else 8 end),
total_hour - 8
from cte
where total_hour > 0
)
select *
from cte;
Note: This assumes that hour is at least 8, just to avoid a case expression in the anchor part of the CTE. That can trivially be added.
Also, if there might be more than 100 days, you will need option (maxrecursion 0).
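For comparison, the allocation logic can be sketched in a few lines of Python. This sketch assumes that working days are simply Monday to Friday; a real solution would consult the calendar table instead, and the function name is hypothetical.

```python
from datetime import date, timedelta


def split_hours(start, total_hours, hours_per_day=8):
    """Spread total_hours over consecutive weekdays starting at start."""
    day, remaining = start, total_hours
    result = []
    while remaining > 0:
        if day.weekday() < 5:  # Monday=0 ... Friday=4; weekends are skipped
            worked = min(remaining, hours_per_day)
            result.append((day, worked))
            remaining -= worked
        day += timedelta(days=1)
    return result


for day, hours in split_hours(date(2021, 5, 4), 40):
    print(day.isoformat(), hours)
```

For the 40-hour example starting on 2021-05-04 this yields five rows of 8 hours, with the last one landing on Monday 2021-05-10 because the weekend is skipped.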

Rolling total in SQL that Resets to 0 when going above 90

First time post. Learning SQL over the past 6 months so help is appreciated. I have data structured as below:
DECLARE @tmp4 as TABLE (
AccountNumber int,
Date date,
DateRank int
)
INSERT INTO @tmp4
VALUES (001, '11/13/2018' , 1)
, (002, '12/19/2018', 2)
, (003, '1/23/2019' , 3)
, (004, '2/5/2019' , 4)
, (005, '3/10/2019' , 5)
, (006, '3/20/2019' , 6)
, (007, '4/8/2019' , 7)
, (008, '5/20/2019' , 8)
What I need to do with this data is calculate a rolling total that resets to 0 once a threshold of 90 days is reached. I have used the DateDiff function to calculate the DateDiffs between consecutive dates and have tried multiple things using LAG and other window functions but can't make it reset. The goal is to find "index visits" which can only occur once every 90 days. So my plan is to have a field that reads 0 on the first visit and resets to 0 for the next stay after 90 days is up from the first visit then only pull visits with a value of 0.
One solution I tried was correct for most sets but did not return the right values for the above set (rows 4 and 8 should start over as "index visits").
The results I would expect for this query would be:
Account | Date | DateRank | RollingTotal
001 |'11/13/2018' | 1 | 0
002 |'12/19/2018' | 2 | 36
003 |'1/23/2019' | 3 | 71
004 |'2/5/2019' | 4 | 84
005 |'3/10/2019' | 5 | 0 (not 117)
006 |'3/20/2019' | 6 | 10
007 |'4/8/2019' | 7 | 29
008 |'5/20/2019' | 8 | 71
Thanks for any help.
Here's the code I tried:
DECLARE @tmp2 as TABLE
(EmrNumber varchar(255)
, AdmitDateTime datetime
, DateRank int
, LagDateDiff int
, RunningTotal int
)
INSERT INTO @tmp2
SELECT tmp1.EmrNumber
, tmp1.AdmitDateTime
, tmp1.DateRank
--, LAG(tmp1.AdmitDateTime) OVER(PARTITION BY tmp1.EmrNumber ORDER BY tmp1.DateRank) as NextAdmitDate
, -DATEDIFF(DAY, tmp1.AdmitDateTime, LAG(tmp1.AdmitDateTime) OVER(PARTITION BY tmp1.EmrNumber ORDER BY tmp1.DateRank)) LagDateDiff
, IIF((SELECT SUM(sumt.total)
FROM (
SELECT -DATEDIFF(DAY, tmpsum.AdmitDateTime, LAG(tmpsum.AdmitDateTime) OVER(PARTITION BY tmpsum.EmrNumber ORDER BY tmpsum.DateRank)) total
FROM #tmp tmpsum
WHERE tmp1.EmrNumber = tmpsum.EmrNumber
AND tmpsum.AdmitDateTime <= tmp1.AdmitDateTime
) sumt) IS NULL, 0, (SELECT SUM(sumt.total)
FROM (
SELECT -DATEDIFF(DAY, tmpsum.AdmitDateTime, LAG(tmpsum.AdmitDateTime) OVER(PARTITION BY tmpsum.EmrNumber ORDER BY tmpsum.DateRank)) total
FROM #tmp tmpsum
WHERE tmp1.EmrNumber = tmpsum.EmrNumber
AND tmpsum.AdmitDateTime <= tmp1.AdmitDateTime
) sumt) ) as RunningTotal
FROM #tmp tmp1
SELECT *
, CASE WHEN LagDateDiff >90 THEN 0
WHEN RunningTotal = 0 THEN 0
ELSE LAG(LagDateDiff) OVER(PARTITION BY EmrNumber ORDER BY DateRank) + RunningTotal END AS RollingTotal
FROM @tmp2
You need a recursive query for this, because the running total has to be checked iteratively, row after row:
with cte as (
select
AccountNumber,
Date,
DateRank,
0 RollingTotal
from @tmp4
where DateRank = 1
union all
select
t.AccountNumber,
t.Date,
t.DateRank,
case when RollingTotal + datediff(day, c.Date, t.Date) > 90
then 0
else RollingTotal + datediff(day, c.Date, t.Date)
end
from cte c
inner join @tmp4 t on t.DateRank = c.DateRank + 1
)
select * from cte
The anchor of the CTE selects the first record (as indicated by DateRank). Then, the recursive part processes the rows one by one, and resets the running count when it crosses 90.
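The row-by-row nature of the problem is perhaps easiest to see in procedural form. The following Python sketch mirrors the recursive step for a single, already sorted list of visit dates; the function name and the threshold parameter are assumptions for illustration.

```python
from datetime import date


def rolling_totals(visit_dates, threshold=90):
    """Running day count per visit, reset to 0 once it exceeds threshold."""
    totals = []
    previous, running = None, 0
    for current in visit_dates:  # visit_dates must already be sorted
        if previous is None:
            running = 0
        else:
            running += (current - previous).days
            if running > threshold:
                running = 0  # this visit starts a new window
        totals.append(running)
        previous = current
    return totals


visits = [
    date(2018, 11, 13), date(2018, 12, 19), date(2019, 1, 23), date(2019, 2, 5),
    date(2019, 3, 10), date(2019, 3, 20), date(2019, 4, 8), date(2019, 5, 20),
]
print(rolling_totals(visits))  # [0, 36, 71, 84, 0, 10, 29, 71]
```

Because each value depends on the possibly reset value before it, a plain windowed SUM cannot express this, which is why the SQL version needs recursion.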

SQL Server query return next date with an event

Basic SQL question but I have a mind blank. I have a table with the following setup:
date eventType
-----------------------
01/01/2016 0
02/01/2016 0
03/01/2016 2
03/01/2016 2
04/01/2016 6
04/01/2016 6
04/01/2016 6
04/01/2016 6
05/01/2016 0
06/01/2016 ...
I want to return the "next set of events where eventType<>0"
So, if "today" was 01/01/2016, the query would return:
03/01/2016 2
03/01/2016 2
If "today" was 03/01/2016, the query would return:
04/01/2016 6
04/01/2016 6
04/01/2016 6
04/01/2016 6
Etc.
Many thanks
Hmmm. I think this may be a bit trickier than it seems. This does what you want for the data in the question:
select e.*
from events e cross join
(select top 1 eventType
from events
where date > getdate() and eventType <> 0
order by date
) as nexte
where e.date > getdate() and
e.eventType = nexte.eventType;
Or, perhaps a better fit:
select e.*
from events e cross join
(select top (1) *
from events
where date > getdate() and eventType <> 0
order by date
) as nexte
where e.date = nexte.date and
e.eventType = nexte.eventType;
Or, more simply:
select top (1) with ties e.*
from events e
where date > getdate() and eventType <> 0
order by date, eventType
I have a different solution, check this:
DECLARE @dtEventType DATE = '20160101'
DECLARE @table TABLE ( cDate DATE , eventType TINYINT )
INSERT INTO @table
VALUES( '20160101' , 0 )
, ( '20160102' , 0 )
, ( '20160103' , 2 )
, ( '20160103' , 2 )
, ( '20160104' , 6 )
, ( '20160104' , 6 )
, ( '20160104' , 6 )
, ( '20160104' , 6 )
, ( '20160105' , 0 )
SELECT *
FROM @table L
WHERE cDate = (
SELECT MIN( cDate ) AS mnDate
FROM @table
WHERE eventType <> 0
AND cDate > @dtEventType
)
But I like @GordonLiff's third solution.
Maybe this will work:
SELECT eventDate, eventType
FROM events
WHERE eventDate > GETDATE()+1 -- compare on the date part only; the time component of GETDATE() can lead to issues
-- an upper limit is needed here to avoid returning all future events
AND eventDate <= GETDATE()+2
AND eventType<>0
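The underlying rule (find the earliest later date that has a non-zero event, then return every non-zero event on that date) can be sketched in Python as follows. The dates follow the 20160101 to 20160105 sample used in one of the answers; the function name is hypothetical.

```python
from datetime import date


def next_events(events, today):
    """events: list of (event_date, event_type) tuples.

    Returns all non-zero events on the earliest date after today that has any.
    """
    future = [d for d, t in events if d > today and t != 0]
    if not future:
        return []
    next_date = min(future)
    return [(d, t) for d, t in events if d == next_date and t != 0]


events = [
    (date(2016, 1, 1), 0), (date(2016, 1, 2), 0),
    (date(2016, 1, 3), 2), (date(2016, 1, 3), 2),
    (date(2016, 1, 4), 6), (date(2016, 1, 4), 6),
    (date(2016, 1, 4), 6), (date(2016, 1, 4), 6),
    (date(2016, 1, 5), 0),
]
print(next_events(events, date(2016, 1, 1)))  # the two type-2 events on 3 Jan
print(next_events(events, date(2016, 1, 3)))  # the four type-6 events on 4 Jan
```

This is the same two-step shape as the MIN(cDate) subquery approach: one pass to find the date, one pass to fetch its rows.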

hard tsql problem - how many row values are in a sequential order

Let's say I have a table with
date,personid
1/1/2001 1
1/2/2001 3
1/3/2001 2
1/4/2001 2
1/5/2001 5
1/6/2001 5
1/7/2001 6
and I'm going to update either 1/2/2001 or 1/5/2001 with personid 2, but before I can update I have to make sure it passes a rule that says you can't have the same person three days in a row.
How can I solve this in an MSSQL stored procedure?
Update: it also needs to handle this layout, where I'd update 1/5/2001:
date,personid
1/1/2001 1
1/2/2001 3
1/3/2001 2
1/4/2001 2
1/5/2001 1
1/6/2001 2
1/7/2001 2
1/8/2001 5
1/9/2001 5
1/10/2001 6
I've assumed that date is unique let me know if that is not the case!
DECLARE @basedata TABLE ([date] DATE UNIQUE, personid INT)
INSERT INTO @basedata
SELECT GETDATE()+1, 2 union all
SELECT GETDATE()+2, 3 union all
SELECT GETDATE()+3, 2 union all
SELECT GETDATE()+4, 2 union all
SELECT GETDATE()+5, 5 union all
SELECT GETDATE()+6, 5 union all
SELECT GETDATE()+7, 6
DECLARE @date date = GETDATE()+5
DECLARE @personid int = 2
;WITH T AS
(
/* TOP ... ORDER BY is wrapped in derived tables because ORDER BY
is not allowed directly inside a UNION branch */
SELECT [date],personid
FROM (SELECT TOP 2 [date],personid
FROM @basedata
WHERE [date] < @date
ORDER BY [date] DESC) AS before_rows
UNION ALL
SELECT @date, @personid
UNION ALL
SELECT [date],personid
FROM (SELECT TOP 2 [date],personid
FROM @basedata
WHERE [date] > @date
ORDER BY [date]) AS after_rows
),T2 AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY [date]) -
ROW_NUMBER() OVER (PARTITION BY personid ORDER BY [date]) AS Grp
FROM T
)
SELECT COUNT(*) /*Will return a result if that date/personid
would cause a sequence of 3*/
FROM T2
GROUP BY personid,Grp
HAVING COUNT(*) >=3
There is a third case not listed, it is the between date case. I included it in the solution below.
The output is
PersonId TrackDate UnallowedBefore UnallowedAfter
----------- ---------- --------------- --------------
2 01/04/2001 01/02/2001 01/05/2001
5 01/06/2001 01/04/2001 01/07/2001
6 01/08/2001 01/08/2001 01/08/2001
USE tempdb
GO
IF OBJECT_ID('PersonDates') IS NOT NULL DROP TABLE PersonDates
CREATE TABLE PersonDates
(
PersonId int NOT NULL,
TrackDate datetime NOT NULL
)
INSERT INTO PersonDates
(
TrackDate,
PersonId
)
SELECT '1/1/2001', 1
UNION ALL
SELECT '1/2/2001', 3
UNION ALL
SELECT '1/3/2001', 2
UNION ALL
SELECT '1/4/2001', 2
UNION ALL
SELECT '1/5/2001', 5
UNION ALL
SELECT '1/6/2001', 5
UNION ALL
SELECT '1/7/2001', 6
UNION ALL
SELECT '1/8/2001', 2
UNION ALL
SELECT '1/9/2001', 6
SELECT
P.PersonId,
TrackDate = CONVERT(varchar(10), DATEADD(day, 1, P.TrackDate), 101),
T.UnallowedBefore,
T.UnallowedAfter
FROM
PersonDates P
CROSS APPLY
(
SELECT TOP 1
UnallowedAfter = CASE
WHEN DATEDIFF(day, P.TrackDate, TrackDate) = 1
THEN CONVERT(varchar(10), DATEADD(day, 1, TrackDate), 101)
ELSE CONVERT(varchar(10), DATEADD(day, -1, TrackDate), 101)
END,
UnallowedBefore = CASE
WHEN DATEDIFF(day, P.TrackDate, TrackDate) = 1
THEN CONVERT(varchar(10), DATEADD(day, -2, TrackDate), 101)
ELSE CONVERT(varchar(10), DATEADD(day, -1, TrackDate), 101)
END
FROM
PersonDates
WHERE
PersonId = P.PersonId
AND
DATEDIFF(day, P.TrackDate, TrackDate) IN (1,2)
) T
SET @TargetDate = '1/2/2001'
SELECT @ForwardCount = COUNT(*) FROM table WHERE ([date] BETWEEN @TargetDate AND DATEADD(dd, 2, @TargetDate)) AND PersonID = @PersonID
SELECT @BackwardCount = COUNT(*) FROM table WHERE ([date] BETWEEN DATEADD(dd, -2, @TargetDate) AND @TargetDate) AND PersonID = @PersonID
SELECT @BracketCount = COUNT(*) FROM table WHERE ([date] BETWEEN DATEADD(dd, -1, @TargetDate) AND DATEADD(dd, 1, @TargetDate)) AND PersonID = @PersonID
IF (@ForwardCount < 2) AND (@BackwardCount < 2) AND (@BracketCount < 2)
BEGIN
-- Do your update here
END
Here's my parametrised solution:
WITH nearby AS (
SELECT
date,
personid = CASE date WHEN @date THEN @personid ELSE personid END
FROM atable
WHERE date BETWEEN DATEADD(day, -@MaxInARow, @date)
AND DATEADD(day, @MaxInARow, @date)
),
nearbyGroups AS (
SELECT
*,
Grp = DATEDIFF(day, 0, date) -
ROW_NUMBER() OVER (PARTITION BY personid ORDER BY date)
FROM nearby
)
UPDATE atable
SET personid = @personid
WHERE date = @date
AND NOT EXISTS (
SELECT Grp
FROM nearbyGroups
GROUP BY Grp
HAVING COUNT(*) > @MaxInARow
)
@date represents the date for which the personid column should be updated. @personid is the new value to be stored. @MaxInARow is the maximum number of days in a row for which the same personid is allowed to be stored.
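The rule itself can be prototyped in Python before committing to a SQL form. The sketch below assumes one row per calendar date (as one of the answers also assumes) and counts the consecutive days around the target that would hold the same person after the update; the function and parameter names are made up for illustration.

```python
from datetime import date, timedelta


def would_exceed_run(schedule, target, person, max_in_a_row=2):
    """schedule: dict mapping date -> person id (one row per date, an assumption).

    Returns True if assigning person to target creates a run of consecutive
    days longer than max_in_a_row.
    """
    proposed = dict(schedule)
    proposed[target] = person

    run = 1
    day = target - timedelta(days=1)
    while proposed.get(day) == person:  # consecutive days before target
        run += 1
        day -= timedelta(days=1)
    day = target + timedelta(days=1)
    while proposed.get(day) == person:  # consecutive days after target
        run += 1
        day += timedelta(days=1)
    return run > max_in_a_row


people = [1, 3, 2, 2, 5, 5, 6]  # 1/1/2001 .. 1/7/2001 from the question
schedule = {date(2001, 1, 1) + timedelta(days=i): p for i, p in enumerate(people)}
print(would_exceed_run(schedule, date(2001, 1, 2), 2))  # True: 2, 2, 2 on Jan 2-4
print(would_exceed_run(schedule, date(2001, 1, 7), 2))  # False: Jan 6 belongs to 5
```

Counting outwards from the target date covers the "before", "after" and "between" cases in one pass, including the second layout in the question where the update joins two existing pairs.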