I have a dataset like the one below and would like to know the various ways to answer this question: what percentage of orders were within 20 minutes of each other?
CustomerId  Order_#  Order_Date
123         000112   12/25/2011 10:30
123         000113   12/25/2011 10:35
123         000114   12/25/2011 10:45
123         000115   12/25/2011 10:55
456         000113   12/25/2011 10:35
456         000113   1/25/2011 10:30
789         000117   9/25/2011 2:00
Result set should look like this:
3/7 ≈ 0.43 (about 43%)
My approach was to first do a self join on the table to get a count of the rows that fall within the 20 minutes, but I am struggling to remove the duplicate rows.
Anyway, I look forward to seeing some crafty answers.
Thank you.
You can use lead() and lag():
select avg( case when prev_order_date > order_date - interval '20 minute' or
next_order_date < order_date + interval '20 minute'
then 1.0 else 0
end) as ratio_within_20_minutes
from (select t.*,
lag(order_date) over (partition by customer_id order by order_date) as prev_order_date,
lead(order_date) over (partition by customer_id order by order_date) as next_order_date
from t
) t;
Note that date/time functions vary a lot among databases. This uses Standard SQL syntax for the comparisons. The exact syntax probably varies, depending on your database.
If you want this per customer then add group by customer_id to the query and customer_id to the select.
EDIT:
In SQL Server, this would be:
select avg( case when prev_order_date > dateadd(minute, -20, order_date) or
next_order_date < dateadd(minute, 20, order_date)
then 1.0 else 0
end) as ratio_within_20_minutes
from (select t.*,
lag(order_date) over (partition by customer_id order by order_date) as prev_order_date,
lead(order_date) over (partition by customer_id order by order_date) as next_order_date
from t
) t;
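The lag()/lead() idea can be checked against the sample rows with a minimal sketch. It assumes SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in database, ISO-formatted timestamps, and a column named order_no in place of Order_#; the 20-minute offsets use SQLite's datetime() modifiers instead of interval or dateadd. On these rows it counts 4 of the 7 orders, because every order of customer 123 has a neighbour within 20 minutes.

```python
import sqlite3

# Sample rows from the question, with dates rewritten in ISO format (assumption)
orders = [
    (123, "000112", "2011-12-25 10:30:00"),
    (123, "000113", "2011-12-25 10:35:00"),
    (123, "000114", "2011-12-25 10:45:00"),
    (123, "000115", "2011-12-25 10:55:00"),
    (456, "000113", "2011-12-25 10:35:00"),
    (456, "000113", "2011-01-25 10:30:00"),
    (789, "000117", "2011-09-25 02:00:00"),
]

QUERY = """
select avg(case when prev_order_date > datetime(order_date, '-20 minutes')
                  or next_order_date < datetime(order_date, '+20 minutes')
                then 1.0 else 0 end) as ratio_within_20_minutes
from (select t.*,
             lag(order_date)  over (partition by customer_id order by order_date) as prev_order_date,
             lead(order_date) over (partition by customer_id order by order_date) as next_order_date
      from t) t
"""


def ratio_within_20_minutes(rows):
    """Load the rows into an in-memory table and return the ratio."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (customer_id integer, order_no text, order_date text)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    (ratio,) = con.execute(QUERY).fetchone()
    con.close()
    return ratio


ratio = ratio_within_20_minutes(orders)
print(round(ratio, 4))
```

An order with no previous or next row gets NULL from lag() or lead(); the comparison then evaluates to NULL and the case expression falls through to 0.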
Here is my fingerprint log table, [dbo].[tFPLog]:
CardID Date Time TransactionCode
100 2020-09-01 08:00 IN
100 2020-09-01 17:00 OUT
100 2020-09-01 17:10 OUT
200 2020-09-01 16:00 IN
200 2020-09-02 02:00 OUT
200 2020-09-02 02:15 OUT
100 2020-09-02 07:00 IN
100 2020-09-02 16:00 OUT
200 2020-09-02 09:55 IN
200 2020-09-02 10:00 IN
200 2020-09-02 21:00 OUT
Conditions
Assume employees clock IN and OUT on the same day or on the next day.
There can be multiple IN and OUT records for the same day/next day, so I need the first IN and the last OUT.
Duration = (LastOutTime - FirstInTime)
The current result I get using the query:
WITH CTE AS(
SELECT CardID,
[Date] AS DateIn,
MIN(CASE TransactionCode WHEN 'In' THEN [time] ELSE '23:59:59.999' END) AS TimeIn, --'23:59:59.999' as we are after the MIN, and NULL is the lowest value
[Date] AS DateOut,
MAX(CASE TransactionCode WHEN 'Out' THEN [time] END) AS TimeOut
FROM YourTable
GROUP BY CardID, [Date])
SELECT C.DateIn,
C.TimeIn,
C.DateOut,
C.TimeOut,
DATEADD(MINUTE,DATEDIFF(MINUTE,C.TimeIn,C.TimeOut),CONVERT(time(0),'00:00:00')) AS Duration
FROM CTE C;
=====The Current Result======
CardID DateIN TimeIN DateOUT TimeOUT Duration
100 2020-09-01 08:00 2020-09-01 17:10 09:10
200 2020-09-01 16:00 ? ? ?
100 2020-09-02 07:00 2020-09-02 16:00 09:00
200 2020-09-02 09:55 2020-09-02 21:00 11:05
=====The Result Needed=====
I want this result.
CardID DateIN TimeIN DateOUT TimeOUT Duration
100 2020-09-01 08:00 2020-09-01 17:10 09:10
200 2020-09-01 16:00 2020-09-02 02:15 10:15
100 2020-09-02 07:00 2020-09-02 16:00 09:00
200 2020-09-02 09:55 2020-09-02 21:00 11:05
How can I get the DateOUT and TimeOUT when they fall on the next day, while keeping the first IN and last OUT condition? Please help, thank you in advance.
It seems like you are overcomplicating the problem. Just use some conditional aggregation, and then get the difference in minutes:
WITH CTE AS(
SELECT CardID,
[Date] AS DateIn,
MIN(CASE TransactionCode WHEN 'In' THEN [time] ELSE '23:59:59.999' END) AS TimeIn, --'23:59:59.999' as we are after the MIN, and NULL is the lowest value
[Date] AS DateOut,
MAX(CASE TransactionCode WHEN 'Out' THEN [time] END) AS TimeOut
FROM YourTable
GROUP BY CardID, [Date])
SELECT C.DateIn,
C.TimeIn,
C.DateOut,
C.TimeOut,
DATEADD(MINUTE,DATEDIFF(MINUTE,C.TimeIn,C.TimeOut),CONVERT(time(0),'00:00:00')) AS Duration
FROM CTE C;
This assumes that [date] is a date and [time] is a time (because, after all, that is what they are called...).
Side note: it seems somewhat redundant to have both a DateIn and a DateOut column when they will always have the same value. You might as well just have a single [Date] column.
Or perhaps, you are actually after this?
WITH CTE AS(
SELECT CardID,
[Date] AS DateIn,
[Time] AS TimeIn,
LEAD([Date]) OVER (PARTITION BY CardID ORDER BY [Date], [Time]) AS DateOut,
LEAD([Time]) OVER (PARTITION BY CardID ORDER BY [Date], [Time]) AS TimeOut,
TransactionCode
FROM dbo.YourTable)
SELECT C.DateIn,
C.TimeIn,
C.DateOut,
C.TimeOut
FROM CTE C
WHERE TransactionCode = 'IN';
Note that if that is the case, you would actually be better off storing the [date] and [time] values in a single datetime/datetime2 column rather than in separate ones, as the values are clearly not independent of each other.
Based on the (hopefully) final goal posts:
WITH VTE AS(
SELECT *
FROM (VALUES(100,CONVERT(date,'20200901'),CONVERT(time(0),'08:00:00'),'IN'),
(100,CONVERT(date,'20200901'),CONVERT(time(0),'17:00:00'),'OUT'),
(100,CONVERT(date,'20200901'),CONVERT(time(0),'17:10:00'),'OUT'),
(200,CONVERT(date,'20200901'),CONVERT(time(0),'16:00:00'),'IN'),
(200,CONVERT(date,'20200902'),CONVERT(time(0),'02:00:00'),'OUT'),
(200,CONVERT(date,'20200902'),CONVERT(time(0),'02:15:00'),'OUT'),
(100,CONVERT(date,'20200902'),CONVERT(time(0),'07:00:00'),'IN'),
(100,CONVERT(date,'20200902'),CONVERT(time(0),'16:00:00'),'OUT'),
(200,CONVERT(date,'20200902'),CONVERT(time(0),'09:55:00'),'IN'),
(200,CONVERT(date,'20200902'),CONVERT(time(0),'10:00:00'),'IN'),
(200,CONVERT(date,'20200902'),CONVERT(time(0),'21:00:00'),'OUT'))V(CardID,[Date],[Time],TransactionCode)),
Changes AS(
SELECT CardID,
DATEADD(MINUTE,DATEDIFF(MINUTE, '00:00:00',[time]),CONVERT(datetime2(0),[date])) AS Dt2, --Way easier to work with later
TransactionCode,
CASE TransactionCode WHEN LEAD(TransactionCode) OVER (PARTITION BY CardID ORDER BY [Date],[Time]) THEN 0 ELSE 1 END AS CodeChange
FROM VTE V),
Groups AS(
SELECT CardID,
dt2,
TransactionCode,
ISNULL(SUM(CodeChange) OVER (PARTITION BY CardID ORDER BY dt2 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS Grp
FROM Changes),
MinMax AS(
SELECT CardID,
TransactionCode,
CASE TransactionCode WHEN 'IN' THEN MIN(dt2) WHEN 'Out' THEN MAX(dt2) END AS GrpDt2
FROM Groups
GROUP BY CardID,
TransactionCode,
Grp),
--And now original Logic
CTE AS(
SELECT CardID,
GrpDt2 AS DatetimeIn,
LEAD([GrpDt2]) OVER (PARTITION BY CardID ORDER BY GrpDt2) AS DateTimeOut,
TransactionCode
FROM MinMax)
SELECT C.CardID,
CONVERT(date,DatetimeIn) AS DateIn,
CONVERT(time(0),DatetimeIn) AS TimeIn,
CONVERT(date,DatetimeOut) AS DateOut,
CONVERT(time(0),DatetimeOut) AS TimeOut,
DATEADD(MINUTE, DATEDIFF(MINUTE,DatetimeIn, DateTimeOut), CONVERT(time(0),'00:00:00')) AS Duration
FROM CTE C
WHERE TransactionCode = 'IN';
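The grouping logic of the final query (collapse consecutive rows with the same code, keep the first time of each IN run and the last time of each OUT run, then pair each IN with the OUT that follows it) can be sketched outside the database. This is only an illustration in plain Python under those assumptions, not a replacement for the T-SQL; the names punches and shifts are hypothetical.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (CardID, "Date Time", TransactionCode)
punches = [
    (100, "2020-09-01 08:00", "IN"),
    (100, "2020-09-01 17:00", "OUT"),
    (100, "2020-09-01 17:10", "OUT"),
    (200, "2020-09-01 16:00", "IN"),
    (200, "2020-09-02 02:00", "OUT"),
    (200, "2020-09-02 02:15", "OUT"),
    (100, "2020-09-02 07:00", "IN"),
    (100, "2020-09-02 16:00", "OUT"),
    (200, "2020-09-02 09:55", "IN"),
    (200, "2020-09-02 10:00", "IN"),
    (200, "2020-09-02 21:00", "OUT"),
]


def shifts(rows):
    """Return (card, first_in, last_out, duration) for each IN/OUT pair."""
    parsed = sorted(
        (card, datetime.strptime(ts, "%Y-%m-%d %H:%M"), code)
        for card, ts, code in rows
    )
    result = []
    for card, card_rows in groupby(parsed, key=lambda r: r[0]):
        # One mark per run of identical codes: first time for IN, last for OUT
        marks = []
        for code, run in groupby(card_rows, key=lambda r: r[2]):
            times = [r[1] for r in run]
            marks.append((code, times[0] if code == "IN" else times[-1]))
        # Pair each IN mark with the OUT mark that immediately follows it
        for (code_a, t_in), (code_b, t_out) in zip(marks, marks[1:]):
            if code_a == "IN" and code_b == "OUT":
                result.append((card, t_in, t_out, t_out - t_in))
    return result


for card, t_in, t_out, duration in shifts(punches):
    print(card, t_in, t_out, duration)
```

Working on a combined datetime rather than on separate date and time values is what lets the overnight shift of card 200 come out as a positive duration.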
Please suggest a good SQL query to find the start and end date of each period during which the stock stays the same.
Imagine I have data in a table like the one below.
Sample_table
transaction_date stock
2018-12-01 10
2018-12-02 10
2018-12-03 20
2018-12-04 20
2018-12-05 20
2018-12-06 20
2018-12-07 20
2018-12-08 10
2018-12-09 10
2018-12-10 30
Expected result should be
Start_date end_date stock
2018-12-01 2018-12-02 10
2018-12-03 2018-12-07 20
2018-12-08 2018-12-09 10
2018-12-10 null 30
This is a gaps-and-islands problem. You may use row_number and group by for this.
select t.stock, min(transaction_date), max(transaction_date)
from (
select row_number() over (order by transaction_date) -
row_number() over (partition by stock order by transaction_date) grp,
transaction_date,
stock
from data
) t
group by t.grp, t.stock
In the DBFIDDLE DEMO I also handle the null value of the last group, but the main idea of finding consecutive rows is built on the above query.
You may check this for an explanation of this solution.
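To see why the difference of the two row numbers identifies each run, the query can be executed on the sample data. This is a minimal sketch assuming SQLite (via Python's sqlite3 module, SQLite 3.25 or later) and dates stored as ISO text; it does not reproduce the null end date of the last group.

```python
import sqlite3

# Sample rows from the question
stock_rows = [
    ("2018-12-01", 10), ("2018-12-02", 10),
    ("2018-12-03", 20), ("2018-12-04", 20), ("2018-12-05", 20),
    ("2018-12-06", 20), ("2018-12-07", 20),
    ("2018-12-08", 10), ("2018-12-09", 10),
    ("2018-12-10", 30),
]

QUERY = """
select t.stock,
       min(transaction_date) as start_date,
       max(transaction_date) as end_date
from (select row_number() over (order by transaction_date) -
             row_number() over (partition by stock order by transaction_date) as grp,
             transaction_date,
             stock
      from sample_table) t
group by t.grp, t.stock
order by start_date
"""


def islands(rows):
    """Return one (stock, start_date, end_date) row per consecutive run."""
    con = sqlite3.connect(":memory:")
    con.execute("create table sample_table (transaction_date text, stock integer)")
    con.executemany("insert into sample_table values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


runs = islands(stock_rows)
for run in runs:
    print(run)
```

Within one run both row numbers increase by one per row, so their difference stays constant; when another stock value interrupts the sequence, the overall row number keeps advancing while the per-stock one does not, so the next run of the same stock gets a different grp.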
You can try the query below, which uses row_number():
select stock,min(transaction_date) as start_date,
case when min(transaction_date)=max(transaction_date) then null else max(transaction_date) end as end_date
from
(
select *,row_number() over(order by transaction_date)-
row_number() over(partition by stock order by transaction_date) as rn
from t1
)A group by stock,rn
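Try to use GROUP BY with MIN and MAX (note that this only gives the expected result when each stock value occurs in a single consecutive run, which is not the case for stock 10 in the sample data):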
SELECT
stock,
MIN(transaction_date) Start_date,
CASE WHEN COUNT(*)>1 THEN MAX(transaction_date) END end_date
FROM Sample_table
GROUP BY stock
ORDER BY stock
You can try with LEAD, LAG functions as below:
select currentStockDate as startDate,
LEAD(currentStockDate,1) over(order by currentStockDate) as EndDate, -- start date of the next run
currentStock
from
(select *
from
(select
LAG(transaction_date,1) over(order by transaction_date) as prevStockDate,
transaction_date as CurrentstockDate,
LAG(stock,1) over(order by transaction_date) as prevStock,
stock as currentStock
from sample_table) as t
where (prevStock <> currentStock) or (prevStock is null)
) as t2
For each customer, I am trying to retrieve the records that are within 45 days of the most recent submit_date.
customer submit_date salary
A 2019-12-31 10000
B 2019-01-01 12000
A 2017-11-02 11000
A 2019-03-03 3000
B 2019-03-04 5500
C 2019-01-05 6750
D 2019-02-06 12256
E 2019-01-07 11345
F 2019-01-08 12345
Window functions come to the rescue:
SELECT customer, submit_date, salary
FROM (SELECT customer, submit_date, salary,
max(submit_date) OVER (PARTITION BY customer) AS latest_date
FROM thetable) AS q
WHERE submit_date >= latest_date - 45;
I am inclined to try:
select t.*
from t
where t.submit_date >= (select max(t2.submit_date) - interval '45 day'
from t t2
);
I think this can very much take advantage of an index on (submit_date).
If you want this relative to each customer, use a correlation clause:
select t.*
from t
where t.submit_date >= (select max(t2.submit_date) - interval '45 day'
from t t2
where t2.customer = t.customer
);
This wants an index on (customer, submit_date).
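As a quick check that the window-function query and the correlated subquery select the same rows per customer, here is a minimal sketch assuming SQLite (via Python's sqlite3 module, SQLite 3.25 or later) with dates stored as ISO text. The 45-day offset is written with SQLite's date() modifier, since "- 45" and "interval '45 day'" are PostgreSQL-style syntax.

```python
import sqlite3

# Sample rows from the question
salaries = [
    ("A", "2019-12-31", 10000), ("B", "2019-01-01", 12000),
    ("A", "2017-11-02", 11000), ("A", "2019-03-03", 3000),
    ("B", "2019-03-04", 5500),  ("C", "2019-01-05", 6750),
    ("D", "2019-02-06", 12256), ("E", "2019-01-07", 11345),
    ("F", "2019-01-08", 12345),
]

WINDOW_QUERY = """
select customer, submit_date, salary
from (select customer, submit_date, salary,
             max(submit_date) over (partition by customer) as latest_date
      from t) q
where submit_date >= date(latest_date, '-45 days')
order by customer, submit_date
"""

CORRELATED_QUERY = """
select t.customer, t.submit_date, t.salary
from t
where t.submit_date >= (select date(max(t2.submit_date), '-45 days')
                        from t t2
                        where t2.customer = t.customer)
order by t.customer, t.submit_date
"""


def run(query, rows):
    """Load the rows into an in-memory table and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (customer text, submit_date text, salary integer)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


window_rows = run(WINDOW_QUERY, salaries)
correlated_rows = run(CORRELATED_QUERY, salaries)
for row in window_rows:
    print(row)
```

On the sample data only customers A and B have older rows, and all of them fall outside the 45-day window, so each customer keeps exactly one row.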
I have the following code, which works:
select format_date('%Y%m', date) as yyyymm,
(sum(sum(val)) over (order by min(date)) /
sum(count(*)) over (order by min(date))
) as running_avg
from t
group by yyyymm
order by yyyymm;
Returns
yyyymm Score
201712 25.57931742
201801 24.69794466
201802 24.23110781
201803 23.85651947
201804 23.66164799
201805 23.43029053
201806 23.17074628
201807 23.09766588
201808 23.08902284
I am now trying to add an additional group by column, for department. The query runs, but the results are inaccurate. Can anyone see what I am doing incorrectly?
select format_date('%Y%m', date) as yyyymm, department,
(sum(sum(val)) over (order by min(date)) /
sum(count(*)) over (order by min(date))
) as running_avg
from t
group by yyyymm, department
order by yyyymm;
Returns
yyyymm department Score
201712 HR 6.704365079
201712 F&B 8.550338502
201712 Marketing 8.550338502
201712 I.T. 9.857502908
201712 Security 9.551491994
201712 Contractors 9.411654456
201712 Executive Office 9.637075283
201712 Property Services 9.45905826
201712 Corporate 9.57458477
201712 Legal 9.700320268
You need to add department to the partition by:
select department, format_date('%Y%m', date) as yyyymm,
(sum(sum(val)) over (partition by department order by min(date)) /
sum(count(*)) over (partition by department order by min(date))
) as running_avg
from t
group by yyyymm, department
order by department, yyyymm;
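The effect of partitioning by department can be illustrated on a tiny made-up dataset. This is a minimal sketch assuming SQLite (via Python's sqlite3 module, SQLite 3.25 or later) in place of BigQuery, so strftime() stands in for format_date(), and the monthly sums and counts are computed in a subquery instead of nesting aggregates inside the window functions. The rows in scores are hypothetical.

```python
import sqlite3

# Hypothetical rows: (department, date, val)
scores = [
    ("HR", "2017-12-01", 10.0),
    ("HR", "2017-12-15", 20.0),
    ("HR", "2018-01-10", 30.0),
    ("IT", "2017-12-05", 4.0),
    ("IT", "2018-01-20", 8.0),
]

QUERY = """
select department, yyyymm,
       sum(total) over (partition by department order by yyyymm) /
       sum(n)     over (partition by department order by yyyymm) as running_avg
from (select department,
             strftime('%Y%m', date) as yyyymm,
             sum(val) as total,
             count(*) as n
      from t
      group by department, strftime('%Y%m', date)) m
order by department, yyyymm
"""


def running_avg_by_department(rows):
    """Return (department, yyyymm, running_avg) rows, one per month."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (department text, date text, val real)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


averages = running_avg_by_department(scores)
for row in averages:
    print(row)
```

Without the partition by clause, the running totals would accumulate across every department's rows in date order, which is why the unpartitioned query returned values that did not match any single department.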