Get the next and prev row data manipulations in SQL Server

I have data in the following format in a table:
Id EmployeeCode JobNumber TransferNo FromDate Todate
--------------------------------------------------------------------------
1 127 1.0 0 01-Mar-19 10-Mar-19
2 127 1.0 NULL 11-Mar-19 15-Mar-19
3 127 J-1 1 16-Mar-19 NULL
4 136 1.0 0 01-Mar-19 15-Mar-19
5 136 J-1 1 16-Mar-19 20-Mar-19
6 136 1.0 2 21-Mar-19 NULL
And I want the result like this:
Id EmployeeCode JobNumber TransferNo FromDate Todate
--------------------------------------------------------------------------
2 127 1.0 NULL 01-Mar-19 15-Mar-19
3 127 J-1 1 16-Mar-19 NULL
4 136 1.0 0 01-Mar-19 15-Mar-19
5 136 J-1 1 16-Mar-19 20-Mar-19
6 136 1.0 2 21-Mar-19 NULL
The idea is:
If the job number is the same across consecutive rows, return a single row with the max Id, the min FromDate, and the max ToDate. For example, for employee 127 the first and second job numbers are the same while the third differs, so the first and second rows collapse into one row (min FromDate, max ToDate) and the third row is returned as is.
If each job number differs from the next one, all rows are returned.
For example, for employee 136 the first job number differs from the second, and the second from the third, so all three rows are returned.

You can group by JobNumber and EmployeeCode and use the MAX/MIN aggregate functions to get the dates you want.

I doubt you will get this result from a simple set-based query.
So my advice: declare a cursor on SELECT DISTINCT EmployeeCode .... Within that cursor, select all rows for that EmployeeCode, work through this set to figure out your values, and construct a result set from that.
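A minimal skeleton of that cursor approach might look like the following. This is only a sketch: the table name EmployeeJobs and the shape of the #Result temp table are assumptions, and the per-employee collapsing logic is left as a placeholder.

```sql
-- Hypothetical sketch only; EmployeeJobs and the #Result shape are assumed.
create table #Result (
    Id int, EmployeeCode int, JobNumber varchar(10),
    TransferNo int null, FromDate date, ToDate date null
);

declare @EmployeeCode int;

declare emp_cursor cursor fast_forward for
    select distinct EmployeeCode from EmployeeJobs;

open emp_cursor;
fetch next from emp_cursor into @EmployeeCode;

while @@FETCH_STATUS = 0
begin
    -- Work through this employee's rows here (ordered by Id),
    -- collapsing consecutive rows that share a JobNumber, and
    -- insert the constructed rows into #Result.
    fetch next from emp_cursor into @EmployeeCode;
end;

close emp_cursor;
deallocate emp_cursor;

select * from #Result order by Id;
```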

This is an example of a gaps-and-islands problem. The solution here is to define the "islands" by their starts, so the process is:
determine when a new grouping begins (i.e. no overlap with the previous row)
take a cumulative sum of the starts to get the grouping value
aggregate
This looks like:
select max(id), EmployeeCode, JobNumber,
       min(fromdate), max(todate)
from (select t.*,
             sum(case when fromdate = dateadd(day, 1, prev_todate) then 0 else 1 end) over
                 (partition by EmployeeCode, JobNumber order by id) as grouping
      from (select t.*,
                   lag(todate) over (partition by EmployeeCode, JobNumber order by id) as prev_todate
            from t
           ) t
     ) t
group by grouping, EmployeeCode, JobNumber;
It is unclear what the logic is for TransferNo. The simplest solution is just min() or max(), but that will not return NULL.
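For anyone wanting to try this out, here is a minimal setup matching the question's sample data; the table name t is taken from the query above, and the column types are assumptions:

```sql
-- Assumed setup matching the question's sample rows; table name t comes from the query.
create table t (
    Id int primary key,
    EmployeeCode int,
    JobNumber varchar(10),
    TransferNo int null,
    FromDate date,
    ToDate date null
);

insert into t (Id, EmployeeCode, JobNumber, TransferNo, FromDate, ToDate) values
(1, 127, '1.0', 0,    '2019-03-01', '2019-03-10'),
(2, 127, '1.0', null, '2019-03-11', '2019-03-15'),
(3, 127, 'J-1', 1,    '2019-03-16', null),
(4, 136, '1.0', 0,    '2019-03-01', '2019-03-15'),
(5, 136, 'J-1', 1,    '2019-03-16', '2019-03-20'),
(6, 136, '1.0', 2,    '2019-03-21', null);
```

With this data, the query should collapse employee 127's first two rows (max id 2, FromDate 01-Mar, ToDate 15-Mar) and return the remaining rows unchanged.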


How to effectively get the max data in each sub-group and the min data among the big groups?

First, I want to get the max date in each sub-group.
Group A = action 1 & 2
Group B = action 3 & 4
actionName action actionBy actiontime
---------------------------------------------
999        1      Tom      2022-07-15 09:18:00
999        1      Tom      2022-07-15 15:21:00
999        2      Peter    2022-07-15 14:06:00
999        2      Peter    2022-07-15 14:08:00
999        3      Sally    2022-07-15 14:20:00
999        3      Mary     2022-07-15 14:22:00
999        4      Mary     2022-07-15 14:25:00
In this example:
The max time of group A is "1 | Tom | 2022-07-15 15:21:00".
The max time of group B is "4 | Mary | 2022-07-15 14:25:00".
The final answer is "1 | Tom | 2022-07-15 14:25:00", which is the minimum data among the groups.
I have a method how to get the max date in each group like the following code.
with cte1 as (
    select actionName,
           actiontime,
           actionBy,
           row_number() over (partition by actionName order by actiontime desc) as rn
    from actionDetails
    where action in ('1', '2')
    union
    select actionName,
           actiontime,
           actionBy,
           row_number() over (partition by actionName order by actiontime desc) as rn
    from actionDetails
    where action in ('3', '4')
)
select *
from cte1
where rn = 1
ActionName is not a PK. This gets the max data in each group.
But I don't know an efficient way to get the minimum data between group A and group B. Could you give me some ideas?
I know one of the methods is another self-join. However, I don't think that is the best solution.
First of all, you can simplify your query by putting the action groups into the partition clause. Use a CASE expression to get one group for actions 1 and 2 and another for actions 3 and 4.
Then, after getting the maximum dates per actionname and action group, you want the minimum of these per actionname. This means you want a second CTE building on the first one:
with max_per_group as
(
    select top(1) with ties
           actionname,
           actiontime,
           actionby
    from actiondetails
    where action in (1, 2, 3, 4)
    order by row_number() over (partition by actionname,
                                             case when action <= 2 then 1 else 2 end
                                order by actiontime desc)
),
min_of_max as
(
    select top(1) with ties
           actionname,
           actiontime,
           actionby
    from max_per_group
    order by row_number() over (partition by actionname order by actiontime)
)
select actionname, actiontime, actionby
from min_of_max
order by actionname;
As you see, instead of computing a row number and then having to limit rows based on it in the next query, I limit the rows right away by putting the row numbering into the ORDER BY clause and applying TOP(1) WITH TIES to get all rows numbered 1. I like this a tad better, because the CTE already produces the rows I want to work with rather than only marking them in a bigger data set. But that's personal preference, I guess.
Disclaimer:
In my query I assume that the column action is numeric. If the column is a string instead, because it can hold values that are not numbers, then work with strings:
where action in ('1', '2', '3', '4')
partition by actionname, case when action in ('1', '2') then 1 else 2 end
If on the other hand the column is a string, but there are only numbers in that column, fix your table instead.
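For comparison, the same CASE-based grouping also works with the conventional row-number-then-filter pattern described in the question. This is a sketch under the same assumptions (an actiondetails table with a numeric action column):

```sql
-- Same CASE-based grouping, but with the rn-column-plus-filter pattern instead.
with max_per_group as
(
    select actionname,
           actiontime,
           actionby,
           row_number() over (partition by actionname,
                                           case when action <= 2 then 1 else 2 end
                              order by actiontime desc) as rn
    from actiondetails
    where action in (1, 2, 3, 4)
)
select top(1) with ties
       actionname,
       actiontime,
       actionby
from max_per_group
where rn = 1   -- keep only the max row of each action group
order by row_number() over (partition by actionname order by actiontime);
```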

T-SQL filtering records based on dates and time difference with other records

I have a table for which I have to perform a rather complex filter: first a filter by date is applied, but then records from the previous and next days should be included if their time difference to the previous or next record (depending on whether the date is less than or greater than the filter date) does not exceed 8 hours.
For those adjacent days, the selection should stop at the first record that does not satisfy this condition.
This is what my raw data looks like:
Id Desc         EntryDate
---------------------------------------
1  Event type 1 2021-03-12 21:55:00.000
2  Event type 1 2021-03-12 01:10:00.000
3  Event type 1 2021-03-11 20:17:00.000
4  Event type 1 2021-03-11 05:04:00.000
5  Event type 1 2021-03-10 23:58:00.000
6  Event type 1 2021-03-10 11:01:00.000
7  Event type 1 2021-03-10 10:00:00.000
In this example set, if my filter date is '2021-03-11', my expected result set should be all records from that day plus adjacent records from 03-12 and 03-10 that satisfy the 8-hour condition. Note how the record with Id 7 is not included because the record with Id 6 does not comply:
Id EntryDate
--------------------------
2  2021-03-12 01:10:00.000
3  2021-03-11 20:17:00.000
4  2021-03-11 05:04:00.000
5  2021-03-10 23:58:00.000
I need advice on how to write this complex query.
This is a variant of gaps-and-islands. Define the difference from the previous row, and then define groups based on those differences:
with e as (
      select t.*
      from (select t.*,
                   sum(case when prev_entrydate > dateadd(hour, -8, entrydate) then 0 else 1 end)
                       over (order by entrydate) as grp
            from (select t.*,
                         lag(entrydate) over (order by entrydate) as prev_entrydate
                  from t
                 ) t
           ) t
     )
select e.*
from e
where e.grp in (select e2.grp
                from e e2
                where cast(e2.entrydate as date) = @filterdate
               );
Note: I'm not sure exactly how filter date is applied. This assumes that it is any events on the entire day, which means that there might be multiple groups. If there is only one group (say the first group on the day), the query can be simplified a bit from a performance perspective.
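Here is a minimal setup of the question's sample data to test against; the table name t is taken from the query, and the column types are assumptions:

```sql
-- Assumed setup matching the question's sample rows; table name t comes from the query.
create table t (
    Id int primary key,
    [Desc] varchar(50),
    EntryDate datetime
);

insert into t (Id, [Desc], EntryDate) values
(1, 'Event type 1', '2021-03-12 21:55:00'),
(2, 'Event type 1', '2021-03-12 01:10:00'),
(3, 'Event type 1', '2021-03-11 20:17:00'),
(4, 'Event type 1', '2021-03-11 05:04:00'),
(5, 'Event type 1', '2021-03-10 23:58:00'),
(6, 'Event type 1', '2021-03-10 11:01:00'),
(7, 'Event type 1', '2021-03-10 10:00:00');
```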
declare @DateTime datetime = '2021-03-11';

select *
from t
where t.EntryDate between dateadd(hour, -8, @DateTime) and dateadd(hour, 32, @DateTime)

SQL How to calculate Average time between Order Purchases? (do sql calculations based on next and previous row)

I have a simple table that contains the customer email, their order count (so whether this is their 1st order, 3rd, 5th, etc.), the date that order was created, the value of that order, and the total order count for that customer.
Here is what my table looks like
Email Order Date Value Total
r2n1w#gmail.com 1 12/1/2016 85 5
r2n1w#gmail.com 2 2/6/2017 125 5
r2n1w#gmail.com 3 2/17/2017 75 5
r2n1w#gmail.com 4 3/2/2017 65 5
r2n1w#gmail.com 5 3/20/2017 130 5
ation#gmail.com 1 2/12/2018 150 1
ylove#gmail.com 1 6/15/2018 36 3
ylove#gmail.com 2 7/16/2018 41 3
ylove#gmail.com 3 1/21/2019 140 3
keria#gmail.com 1 8/10/2018 54 2
keria#gmail.com 2 11/16/2018 65 2
What I want to do is calculate the average time between purchases for each customer. So let's take customer ylove. Their first purchase is on 6/15/18. The next one is 7/16/18, so that's 31 days, and the next purchase is on 1/21/2019, which is 189 days later. The average time between orders would be 110 days.
But I have no idea how to make SQL look at the next row and calculate based on that, but then restart when it reaches a new customer.
Here is my query to get that table:
SELECT
F.CustomerEmail
,F.OrderCountBase
,F.Date_Created
,F.Total
,F.TotalOrdersBase
FROM #FullBase F
ORDER BY f.CustomerEmail
If anyone can give me some suggestions, that would be greatly appreciated.
And then maybe I can calculate value differences (in percentage). So for example, ylove spent $36 on their first order, $41 on their second which is a 13% increase. Then their second order was $140 which is a 341% increase. So on average, this customer increased their purchase order value by 177%. Unrelated to SQL, but is this the correct way of calculating a metric like this?
Looking at your sample, you could try using the difference between the min and max dates divided by the total:
select email, datediff(day, min(Order_Date), max(Order_Date)) / (max(total) - 1) as avg_days
from your_table
group by email
and to also handle customers with only one order:
select email,
       case when max(total) - 1 > 0
            then datediff(day, min(Order_Date), max(Order_Date)) / (max(total) - 1)
            else datediff(day, min(Order_Date), max(Order_Date))
       end as avg_days
from your_table
group by email
The simplest formulation is:
select email,
       datediff(day, min(Order_Date), max(Order_Date)) / nullif(max(total) - 1, 0) as avg_days
from t
group by email;
You can see why this is the case. Consider three orders with od1, od2, and od3 as the order dates. The average gap is:
( (od2 - od1) + (od3 - od2) ) / 2
Check the arithmetic:
--> ( od2 - od1 + od3 - od2 ) / 2
--> ( od3 - od1 ) / 2
This pretty obviously generalizes to more orders.
Hence the max() minus min().
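As a quick sanity check against the ylove example above (orders on 2018-06-15, 2018-07-16, and 2019-01-21), the max-minus-min shortcut reproduces the hand-computed 110-day average:

```sql
-- 31 days + 189 days = 220 days across 3 orders -> 220 / (3 - 1) = 110
select datediff(day, '2018-06-15', '2019-01-21') / 2 as avg_days;  -- 110
```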

Snapshot Table Status Change

I am trying to write a SQL query (in Amazon Redshift) that counts the number of times a customer goes from not meeting the criteria to meeting it, i.e. when a 1 occurs on the date after a 0.
I'm struggling to figure out the logic to do this.
ID Snapshot_date Meets Criteria
55 1/1/2018 0
55 1/5/2018 1
55 1/10/2018 1
55 1/15/2018 1
55 1/20/2018 0
55 1/25/2018 1
Use lag to get the previous value, check for the condition, and count.
select id, count(*)
from (select id, snapshot_date, meets_criteria,
             lag(meets_criteria, 1) over (partition by id order by snapshot_date) as prev_m_c
      from tbl
     ) t
where prev_m_c = 0 and meets_criteria = 1
group by id
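For the sample data above, id 55 goes from 0 to 1 twice (on 1/5 and 1/25), so the expected count is 2. A minimal setup sketch, assuming the table is named tbl as in the query (syntax shown generically; Redshift accepts the same shape):

```sql
-- Assumed setup matching the question's sample data; table name tbl comes from the query.
create table tbl (
    id int,
    snapshot_date date,
    meets_criteria int
);

insert into tbl (id, snapshot_date, meets_criteria) values
(55, '2018-01-01', 0),
(55, '2018-01-05', 1),  -- 0 -> 1 transition
(55, '2018-01-10', 1),
(55, '2018-01-15', 1),
(55, '2018-01-20', 0),
(55, '2018-01-25', 1);  -- 0 -> 1 transition: expected count for id 55 is 2
```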

Running Total sum minus different condition

I looked at some SQL Server running total examples, but I can't manage anything like this.
I have a table with the columns id, name, type of operation, date, and value.
I want to calculate a balance for each record. The balance should be calculated like this:
The starting balance is 0; if the operation type is IN the value is added, and if it is OUT the value is subtracted. Each record should take the previous record's balance and then add or subtract its value depending on the operation type.
This running total should be ordered by date (not Id).
For example, if the table looks like this:
ID Name Op_Type Date Value
1 box Out 2017-05-13 15
2 table In 2017-04-31 65
3 box2 In 2017-05-31 65
then the result should look like this:
ID Name Op_Type Date Value Balance
2 table In 2017-04-31 65 65
1 box Out 2017-05-13 15 50
3 box2 In 2017-05-31 65 115
The result of this code:
select *,
       sum(case when Op_Type = 'Out' then -Value else Value end) over (order by [Date]) as Balance
from YourTable
is:
ID Date Type Value Balance
143 2016-12-31 In 980 664.75
89 2016-12-31 Out 300 664.75
90 2016-12-31 Out 80 664.75
But I expect the following result:
ID Date Type Value Balance
143 2016-12-31 In 980 980
89 2016-12-31 Out 300 680
90 2016-12-31 Out 80 600
The problem with the answer by Prdp is that SUM(...) OVER (ORDER BY ...) by default uses the RANGE option instead of ROWS.
This is why you see unexpected results when dates are not unique; that is how the default RANGE option works.
To get the results that you expect, spell it out explicitly:
SELECT *,
       SUM(CASE WHEN Op_Type = 'Out' THEN -Value ELSE Value END)
           OVER (ORDER BY [Date], Op_Type, ID
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Balance
FROM YourTable
ORDER BY [Date], Op_Type, ID;
I also added Op_Type to the ORDER BY so that positive ('In') values are added first when there are several rows with the same date, and ID to make the results stable in those cases.
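To see the RANGE-vs-ROWS difference in isolation, here is a small sketch (the table and column names are made up for illustration) that sums the same values with both framings. With the default RANGE, rows that tie on the ORDER BY key are peers, and every peer receives the total of the whole peer group:

```sql
-- Hypothetical demo; all names here are made up for illustration.
create table demo (d date, v int);
insert into demo (d, v) values
('2016-12-31', 980),
('2016-12-31', -300),
('2016-12-31', -80);

select d, v,
       sum(v) over (order by d) as range_balance,              -- default RANGE: all three ties get 600
       sum(v) over (order by d
                    rows between unbounded preceding
                             and current row) as rows_balance  -- ROWS: a row-by-row running sum
from demo;
```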