Use recursive CTE to handle date logic - sql

At work, one of my assignments is to calculate commission to the sales staff. One rule has been more challenging than the others.
Two sales teams A and B work together each selling different products. Team A can send leads to team B. The same customer might be send multiple times. The first time a customer (ex. lead 1)* is send a commission is paid to the salesperson in team A who created the lead. Now the customer is “locked” for the next 365 days (counting from the date lead 1 was created). Meaning that no one can get additional commission for that customer in that period by sending additional leads (ex. Lead 2 and 3 gets no commission). After the 365 days have expired. A new lead can be created and receive commission (ex. Lead 4). Then the customer is locked again for 365 days counting from the day lead 4 was created. Therefore, lead 5 gets no commission. The tricky part is to reset the date that the 365 days is counted from.
'* Reference to tables #LEADS and #DISERED result.
I have solved the problem in tSQL using a cursor, but I wonder if it was possible to use a recursive CTE instead. I have made several attempts the best one is pasted in below. The problem with my solution is, that I refer to the recursive table more than once. I have tried to fix that problem with nesting a CTE inside a CTE. That’s is not allowed. I have tried using a temporary table inside the CTE that is not allowed either. I tried several times to recode the recursive part of the CTE so that the recursive table is referenced only once, but then I am not able to get the logic to work.
I am using SQL 2008
IF OBJECT_ID('tempdb.dbo.#LEADS', 'U') IS NOT NULL
DROP TABLE #LEADS;
CREATE TABLE #LEADS (LEAD_ID INT, CUSTOMER_ID INT, LEAD_CREATED_DATE DATETIME, SALESPERSON_NAME varchar(20))
INSERT INTO #LEADS
VALUES (1, 1, '2013-09-01', 'Rasmus')
,(2, 1, '2013-11-01', 'Christian')
,(3, 1, '2014-01-01', 'Nadja')
,(4, 1, '2014-12-24', 'Roar')
,(5, 1, '2015-12-01', 'Kristian')
,(6, 2, '2014-01-05', 'Knud')
,(7, 2, '2015-01-02', 'Rasmus')
,(8, 2, '2015-01-08', 'Roar')
,(9, 2, '2016-02-05', 'Kristian')
,(10, 2, '2016-03-05', 'Casper')
SELECT *
FROM #LEADS;
IF OBJECT_ID('tempdb.dbo.#DISERED_RESULT', 'U') IS NOT NULL
DROP TABLE #DISERED_RESULT;
CREATE TABLE #DISERED_RESULT (LEAD_ID INT, DESIRED_COMMISION_RESULT CHAR(3))
INSERT INTO #DISERED_RESULT
VALUES (1, 'YES')
,(2, 'NO')
,(3, 'NO')
,(4, 'YES')
,(5, 'NO')
,(6, 'YES')
,(7, 'NO')
,(8, 'YES')
,(9, 'YES')
,(10, 'NO')
SELECT *
FROM #DISERED_RESULT;
WITH COMMISSION_CALCULATION AS
(
SELECT T1.*
,COMMISSION = 'YES'
,MIN_LEAD_CREATED_DATE AS COMMISSION_DATE
FROM #LEADS AS T1
INNER JOIN (
SELECT A.CUSTOMER_ID
,MIN(A.LEAD_CREATED_DATE) AS MIN_LEAD_CREATED_DATE
FROM #LEADS AS A
GROUP BY A.CUSTOMER_ID
) AS T2 ON T1.CUSTOMER_ID = T2.CUSTOMER_ID AND T1.LEAD_CREATED_DATE = T2.MIN_LEAD_CREATED_DATE
UNION ALL
SELECT T10.LEAD_ID
,T10.CUSTOMER_ID
,T10.LEAD_CREATED_DATE
,T10.SALESPERSON_NAME
,T10.COMMISSION
,T10.COMMISSION_DATE
FROM (SELECT ROW_NUMBER() OVER(PARTITION BY T5.CUSTOMER_ID ORDER BY T5.LEAD_CREATED_DATE ASC) AS RN
,T5.*
,T6.MAX_COMMISSION_DATE
,DATEDIFF(DAY, T6.MAX_COMMISSION_DATE, T5.LEAD_CREATED_DATE) AS ANTAL_DAGE_SIDEN_SIDSTE_COMMISSION
,CASE
WHEN DATEDIFF(DAY, T6.MAX_COMMISSION_DATE, T5.LEAD_CREATED_DATE) > 365 THEN 'YES'
ELSE 'NO'
END AS COMMISSION
,CASE
WHEN DATEDIFF(DAY, T6.MAX_COMMISSION_DATE, T5.LEAD_CREATED_DATE) > 365 THEN T5.LEAD_CREATED_DATE
ELSE NULL
END AS COMMISSION_DATE
FROM #LEADS AS T5
INNER JOIN (SELECT T4.CUSTOMER_ID
,MAX(T4.COMMISSION_DATE) AS MAX_COMMISSION_DATE
FROM COMMISSION_CALCULATION AS T4
GROUP BY T4.CUSTOMER_ID) AS T6 ON T5.CUSTOMER_ID = T6.CUSTOMER_ID
WHERE T5.LEAD_ID NOT IN (SELECT LEAD_ID FROM COMMISSION_CALCULATION)
) AS T10
WHERE RN = 1
)
SELECT *
FROM COMMISSION_CALCULATION;

I have made some assumptions where your description does not fully make sense as written, but the below achieves your desired result:
if object_id('tempdb.dbo.#leads', 'u') is not null
drop table #leads;
create table #leads (lead_id int, customer_id int, lead_created_date datetime, salesperson_name varchar(20))
insert into #leads
values (1, 1, '2013-09-01', 'rasmus')
,(2, 1, '2013-11-01', 'christian')
,(3, 1, '2014-01-01', 'nadja')
,(4, 1, '2014-12-24', 'roar')
,(5, 1, '2015-12-01', 'kristian')
,(6, 2, '2014-01-05', 'knud')
,(7, 2, '2015-01-02', 'rasmus')
,(8, 2, '2015-01-08', 'roar')
,(9, 2, '2016-02-05', 'kristian')
,(10, 2, '2016-03-05', 'casper')
if object_id('tempdb.dbo.#disered_result', 'u') is not null
drop table #disered_result;
create table #disered_result (lead_id int, desired_commision_result char(3))
insert into #disered_result
values (1, 'yes'),(2, 'no'),(3, 'no'),(4, 'yes'),(5, 'no'),(6, 'yes'),(7, 'no'),(8, 'yes'),(9, 'yes'),(10, 'no')
with rownum
as
(
select row_number() over (order by customer_id, lead_created_date) as rn -- This is to ensure an incremantal ordering id
,lead_id
,customer_id
,lead_created_date
,salesperson_name
from #leads
)
,cte
as
(
select rn
,lead_id
,customer_id
,lead_created_date
,salesperson_name
,'yes' as commission_result
,lead_created_date as commission_window_start
from rownum
where rn = 1
union all
select r.rn
,r.lead_id
,r.customer_id
,r.lead_created_date
,r.salesperson_name
,case when r.customer_id <> c.customer_id -- If the customer ID has changed, we are at a new commission window.
then 'yes'
else case when r.lead_created_date > dateadd(d,365,c.commission_window_start) -- This assumes the window is 365 days and not one year (ie. Leap years don't add a day)
then 'yes'
else 'no'
end
end as commission_result
,case when r.customer_id <> c.customer_id
then r.lead_created_date
else case when r.lead_created_date > dateadd(d,365,c.commission_window_start) -- This assumes the window is 365 days and not one year (ie. Leap years don't add a day)
then r.lead_created_date
else c.commission_window_start
end
end as commission_window_start
from rownum r
inner join cte c
on(r.rn = c.rn+1)
)
select lead_id
,commission_result
from cte
order by customer_id
,lead_created_date;

Related

sql that finds records within 3 days of a condition being met

I am trying to find all records that exist within a date range prior to an event occurring. In my table below, I want to pull all records that are 3 days or less from when the switch field changes from 0 to 1, ordered by date, partitioned by product. My solution does not work, it includes the first record when it should skip as it's outside the 3 day window. I am scanning a table with millions of records, is there a way to reduce the complexity/cost while maintaining my desired results?
http://sqlfiddle.com/#!18/eebe7
CREATE TABLE productlist
([product] varchar(13), [switch] int, [switchday] date)
;
INSERT INTO productlist
([product], [switch], [switchday])
VALUES
('a', 0, '2019-12-28'),
('a', 0, '2020-01-02'),
('a', 1, '2020-01-03'),
('a', 0, '2020-01-06'),
('a', 0, '2020-01-07'),
('a', 1, '2020-01-09'),
('a', 1, '2020-01-10'),
('a', 1, '2020-01-11'),
('b', 1, '2020-01-01'),
('b', 0, '2020-01-02'),
('b', 0, '2020-01-03'),
('b', 1, '2020-01-04')
;
my solution:
with switches as (
SELECT
*,
case when lead(switch) over (partition by product order by switchday)=1
and switch=0 then 'first day switch'
else null end as leadswitch
from productlist
),
switchdays as (
select * from switches
where leadswitch='first day switch'
)
select pl.*
,'lead'
from productlist pl
left join switchdays ss
on pl.product=ss.product
and pl.switchday = ss.switchday
and datediff(day, pl.switchday, ss.switchday)<=3
where pl.switch=0
desired output, capturing records that occur within 3 days of a switch going from 0 to 1, for each product, ordered by date:
product switch switchday
a 0 2020-01-02 lead
a 0 2020-01-06 lead
a 0 2020-01-07 lead
b 0 2020-01-02 lead
b 0 2020-01-03 lead
If I understand correctly, you can just use lead() twice:
select pl.*
from (select pl.*,
lead(switch) over (partition by product order by switchday) as next_switch_1,
lead(switch, 2) over (partition by product order by switchday) as next_switch_2
from productlist pl
) pl
where switch = 0 and
1 in (next_switch_1, next_switch_2);
Here is a db<>fiddle.
EDIT (based on comment):
select pl.*
from (select pl.*,
min(case when switch = 1 then switchdate end) over (partition by product order by switchdate desc) as next_switch_1_day
from productlist pl
) pl
where switch = 0 and
next_switch_one_day <= dateadd(day, 2, switchdate);

SQL: select data before first occurence of a certain value

For example, I have order data come from customers, like this
test = spark.createDataFrame([
(0, 1, 1, "2018-06-03"),
(1, 1, 1, "2018-06-04"),
(2, 1, 3, "2018-06-04"),
(3, 1, 2, "2018-06-05"),
(4, 1, 1, "2018-06-06"),
(5, 2, 3, "2018-06-01"),
(6, 2, 1, "2018-06-01"),
(7, 3, 1, "2018-06-02"),
(8, 3, 1, "2018-06-02"),
(9, 3, 1, "2018-06-05")
])\
.toDF("order_id", "customer_id", "order_status", "created_at")
test.show()
Each order has its own status, 1 means newly created but not finished, 3 means it's payed and finished.
Now, I want to do analysis for order comes from
new customers (who has not made purchase before)
old customers (who has finished purchase before)
so I want to add a feature to the above the data, turn into like this
The logic is for every customer, every order created before first order with status 3 (include itself) is counted as come from new customer, and every order after that is counted as old customer.
Or put it into another way, select the data before the first occurance of value 3 (for each customer's order, sort by date asc)
How can I do this, in SQL?
I searched around but didn't find good solution. If in Python, I think maybe I'll simply do some loop to get the values.
This is tested for SQLite:
SELECT order_id, customer_id, order_status, created_at,
CASE
WHEN order_id > (SELECT MIN(order_id) FROM orders WHERE customer_id = o.customer_id AND order_status = 3) THEN 'old'
ELSE 'new'
END AS customer_status
FROM orders o
You can do this using window functions in Spark:
select t.*,
(case when created_at > min(case when status = 3 then created_at end) over (partition by customer_id)
then 'old'
else 'new'
end) as customer_status
from test t;
Note that this assigns "new" to customers with no order with status "3".
You can also write this using join and group by:
select t.*,
coalesce(t3.customer_status, 'old') as customer_status
from test t left join
(select t.customer_id, min(created_at) as min_created_at,
'new' as customer_status
from t
where status = 3
group by t.customer_id
) t3
on t.customer_id = t3.customer_id and
t.created_at <= t3.min_created_at;

Get the aggregated result of a GROUP BY for each value on WHERE clause in TSQL

I have a table in SQL Server with the following format
MType (Integer), MDate (Datetime), Status (SmallInt)
1, 10-05-2018, 1
1, 15-05-2018, 1
2, 25-3-2018, 0
3, 12-01-2018, 1
....
I want to get the MIN MDate for specific MTypes for future dates. In case there isn't one, then the MType should be returned but with NULL value.
Here is what I have done until now:
SELECT m.MType,
MIN(m.MDate)
FROM MyTypes m
WHERE m.MType IN ( 1, 2, 3, 4)
AND m.MDate > GETDATE()
AND m.Status = 1
GROUP BY m.MType
Obviously, the above will return only the following:
1, 10-05-2018
Since there are any other rows with future date and status equals to 1.
However, the results I want are:
1, 10-05-2018
2, NULL
3, NULL
4, NULL //this is missing in general from the table. No MType with value 4
The table is big, so performance is something to take into account. Any ideas how to proceed?
One way is to join the table to itself and filter the date in the ON clause.
SELECT a.Mtype, MIN(b.MDate)
FROM MyTypes a
LEFT JOIN MyTypes b
ON a.MType = b.MType
AND b.MDate > GETDATE()
AND b.Status = 1
WHERE a.MType IN ( 1, 2, 3)
GROUP BY a.MType
Here's a Demo.
I don't know what is logic behind but it seems to use of look-up tables
SELECT a.MType, l.MDate
FROM
(
values (1),(2),(3),(4)
)a (MType)
LEFT JOIN (
SELECT m.MType,
MIN(m.MDate) MDate
FROM MyTypes m
WHERE m.MDate > GETDATE()
AND m.Status = 1
GROUP BY m.MType
)l on l.MType = a.MType
Use a windows function and a union to a numbers table:
declare #t table (MType int, MDate datetime, [Status] smallint)
Insert into #t values (1, convert(date, '10-05-2018', 103), 1)
,(1, convert(date, '15-05-2018', 103), 1)
,(2, convert(date, '25-03-2018', 103), 0)
,(3, convert(date, '12-01-2018', 103), 1)
Select DISTINCT Mtype
, min(iiF(MDate>getdate() and status = 1, MDate, NUll)) over (Partition By Mtype) as MDate
from ( SELECT TOP 10000 row_number() over(order by t1.number) as MType
, '1900-01-01' as MDate, 0 as [Status]
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
union
Select Mtype, MDate, [Status] from #t
) x
where MType in (1,2,3,4)
order by x.MType

SQL find average time difference between rows for a given category

I browsed SO but could not quite find the exact answer or maybe it was for a different language.
Let's say I have a table, where each row is a record of a trade:
trade_id customer trade_date
1 A 2013-05-01 00:00:00
2 B 2013-05-01 10:00:00
3 A 2013-05-02 00:00:00
4 A 2013-05-05 00:00:00
5 B 2013-05-06 12:00:00
I would like to have the average time between trades, in days or fraction of days, for each customer, and the number of days since last trade. So for instance for customer A, time between trades 1 and 3 is 1 day and between trades 3 and 4 is 3 days, for an average of 2. So the end table would look like something like this (assuming today it's the 2013-05-10):
customer avg_time_btw_trades time_since_last_trade
A 2.0 5.0
B 5.08 3.5
If a customer has only got 1 trade I guess NULL is fine as output.
Not even sure SQL is the best way to do this (I am working with SQL server), but any help is appreciated!
SELECT
customer,
DATEDIFF(second, MIN(trade_date), MAX(trade_date)) / (NULLIF(COUNT(*), 1) - 1) / 86400.0,
DATEDIFF(second, MAX(trade_date), GETDATE() ) / 86400.0
FROM
yourTable
GROUP BY
customer
http://sqlfiddle.com/#!6/eb46e/7
EDIT: Added final field that I didn't notice, apologies.
The following SQL script uses your data and gives the expected results.
DECLARE #temp TABLE
( trade_id INT,
customer CHAR(1),
trade_date DATETIME );
INSERT INTO #temp VALUES (1, 'A', '20130501');
INSERT INTO #temp VALUES (2, 'B', '20130501 10:00');
INSERT INTO #temp VALUES (3, 'A', '20130502');
INSERT INTO #temp VALUES (4, 'A', '20130505');
INSERT INTO #temp VALUES (5, 'B', '20130506 12:00');
DECLARE #getdate DATETIME
-- SET #getdate = getdate();
SET #getdate = '20130510';
SELECT s.customer
, AVG(s.days_btw_trades) AS avg_time_between_trades
, CAST(DATEDIFF(hour, MAX(s.trade_date), #getdate) AS float)
/ 24.0 AS time_since_last_trade
FROM (
SELECT CAST(DATEDIFF(HOUR, t2.trade_date, t.trade_date) AS float)
/ 24.0 AS days_btw_trades
, t.customer
, t.trade_date
FROM #temp t
LEFT JOIN #temp t2 ON t2.customer = t.customer
AND t2.trade_date = ( SELECT MAX(t3.trade_date)
FROM #temp t3
WHERE t3.customer = t.customer
AND t3.trade_date < t.trade_date)
) s
GROUP BY s.customer
You need a date difference between every trade and average them.
select
a.customer
,avg(datediff(a.trade_date, b.trade_date))
,datediff(now(),max(a.trade_date))
from yourTable a, yourTable b
where a.customer = b.customer
and b.trade_date = (
select max(trade_date)
from yourTable c
where c.customer = a.customer
and a.trade_date > c.trade_date)
#gets the one earlier date for every trade
group by a.customer
Just for grins I added a solution that would use CTE's. You could probably use a temp table if the first query is too large. I used #MatBailie creation script for the table:
CREATE TABLE customer_trades (
id INT IDENTITY(1,1),
customer_id INT,
trade_date DATETIME,
PRIMARY KEY (id),
INDEX ix_user_trades (customer_id, trade_date)
)
INSERT INTO
customer_trades (
customer_id,
trade_date
)
VALUES
(1, '2013-05-01 00:00:00'),
(2, '2013-05-01 10:00:00'),
(1, '2013-05-02 00:00:00'),
(1, '2013-05-05 00:00:00'),
(2, '2013-05-06 12:00:00')
;
;WITH CTE as(
select customer_id, trade_date, datediff(hour,trade_date,ISNULL(LEAD(trade_date,1) over (partition by customer_id order by trade_date),GETDATE())) Trade_diff
from customer_trades
)
, CTE2 as
(SELECT customer_id, trade_diff, LAST_VALUE(trade_diff) OVER(Partition by customer_id order by trade_date) Curr_Trade from CTE)
SELECT Customer_id, AVG(trade_diff) AV, Max(Curr_Trade) Curr_Trade
FROM CTE2
GROUP BY customer_id

Count previous consecutive rows in SQL Server

I have attendance data list which is showing below. Now I am trying to find data by a specific date range (01/05/2016 – 07/05/2016) with total Present Column, Total Present Column will be calculated from previous present data (P). Suppose today is 04/05/2016. If a person has 01,02,03,04 status ‘p’ then it will show date 04-05-2016 total present 4.
Could you help me to find total present from this result set.
You can check this example, which have logic to calculate previous sum value.
declare #t table (employeeid int, datecol date, status varchar(2) )
insert into #t values (10001, '01-05-2016', 'P'),
(10001, '02-05-2016', 'P'),
(10001, '03-05-2016', 'P'),
(10001, '04-05-2016', 'P'),
(10001, '05-05-2016', 'A'),
(10001, '06-05-2016', 'P'),
(10001, '07-05-2016', 'P'),
(10001, '08-05-2016', 'L'),
(10002, '07-05-2016', 'P'),
(10002, '08-05-2016', 'L')
--select * from #t
select * ,
SUM(case when status = 'P' then 1 else 0 end) OVER (PARTITION BY employeeid ORDER BY employeeid, datecol
ROWS BETWEEN UNBOUNDED PRECEDING
AND current row)
from
#t
Another twist of the same thing via cte (as you written SQLSERVER2012, this below solution only work in Sqlserver 2012 and above)
;with cte as
(
select employeeid , datecol , ROW_NUMBER() over(partition by employeeid order by employeeid, datecol) rowno
from
#t where status = 'P'
)
select t.*, cte.rowno ,
case when ( isnull(cte.rowno, 0) = 0)
then LAG(cte.rowno) OVER (ORDER BY t.employeeid, t.datecol)
else cte.rowno
end LagValue
from #t t left join cte on t.employeeid = cte.employeeid and t.datecol = cte.datecol
order by t.employeeid, t.datecol
You could use a subquery to calculate TotalPresent for each row:
SELECT
main.EmployeeID,
main.[Date],
main.[Status],
(
SELECT SUM(CASE WHEN t.[Status] = 'P' THEN 1 ELSE 0 END)
FROM [TableName] t
WHERE t.EmployeeID = main.EmployeeID AND t.[Date] <= main.[Date]
) as TotalPresent
FROM [TableName] main
ORDER BY
main.EmployeeID,
main.[Date]
Here I used subquery to count the sum of records that have the same EmployeeID and date is less or equal to the date of current row. If status of the record is 'P', then 1 is added to the sum, otherwise 0, which counts only records that have status P.
Interesting question, this should work:
select *
, (select count(retail) from p g
where g.date <= p.date and g.id = p.id and retail = 'P')
from p
order by ID, Date;
So I believe I understand correctly. You would like to count the occurences of P per ID datewise.
This makes a lot of sense. That is why the first occurrence of ID2 was L and the Total is 0. This query will count P status for each occurrence, pause at non-P for each ID.
Here is an example