In T-SQL How Can I Select Up To The 5 Most Recent Rows, Grouped By An Identifier, If They Contain A Specific Value?

Long title.
I am using T-SQL and attempting to find all accounts whose most recent transactions are ACHFAIL, and to determine how many in a row they have had, up to 5.
I already wrote a huge, insanely convoluted query to group and count all accounts that have had x ACHFAILs in a row. Now the requirement is the simpler "only count the most recent transactions".
Below is what I have so far, but I cannot wrap my head around the next step to take. I was trying to simplify my task by only counting up to 5, but if I could provide an accurate count of all the ACHFAIL attempts in a row, that would be more ideal.
WITH grouped
AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY TRANSACTIONS.deal_id
ORDER BY TRANSACTIONS.deal_id, tran_date DESC) AS row_num
,TRANSACTIONS.tran_code
,TRANSACTIONS.tran_date
,TRANSACTIONS.deal_id
FROM TRANSACTIONS
)
SELECT TOP 1000 * FROM grouped
which returns rows such as:
row_num tran_code tran_date deal_id
1 ACHFAIL 2014-08-05 09:20:38.000 {01xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
2 ACHCLEAR 2014-08-04 16:27:17.473 {01xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
1 ACHCLEAR 2014-09-09 15:14:48.337 {02xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
2 ACHCLEAR 2014-09-08 14:23:00.737 {02xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
1 ACHFAIL 2014-07-18 14:35:38.037 {03xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
2 ACHFAIL 2014-07-18 13:58:52.000 {03xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
3 ACHCLEAR 2014-07-17 14:48:58.617 {03xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
4 ACHFAIL 2014-07-16 15:04:28.023 {03xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
01xxxxxx has 1 ACHFAIL
02xxxxxx has 0 ACHFAIL
03xxxxxx has 2 ACHFAIL

You are halfway there. With any sort of "consecutive rows" problem, you will need a recursive CTE (that's TEMP2 below):
;WITH
TEMP1 AS
(
SELECT tran_code,
deal_id,
ROW_NUMBER() OVER (PARTITION BY deal_id ORDER BY tran_date DESC) AS tran_rank
FROM TRANSACTIONS
),
TEMP2 AS
(
SELECT tran_code,
deal_id,
tran_rank
FROM TEMP1
WHERE tran_rank = 1 -- last transaction for a deal
AND tran_code = 'ACHFAIL' -- failed transactions only
UNION ALL
SELECT curr.tran_code,
curr.deal_id,
curr.tran_rank
FROM TEMP1 curr
INNER JOIN TEMP2 prev ON curr.deal_id = prev.deal_id -- transaction must be for the same deal
AND curr.tran_rank = prev.tran_rank + 1 -- must be consecutive
WHERE curr.tran_code = 'ACHFAIL' -- must have failed
AND curr.tran_rank <= 5 -- up to 5 only
)
SELECT t.deal_id,
ISNULL(MAX(tran_rank),0) AS FailCount
FROM TRANSACTIONS t
LEFT JOIN TEMP2 t2 ON t.deal_id = t2.deal_id
GROUP BY t.deal_id
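Against the sample rows in the question, this should return a FailCount of 1 for the 01xxxxxx deal, 0 for the 02xxxxxx deal, and 2 for the 03xxxxxx deal, matching the counts listed in the question.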

If I understand correctly, you want the number of fails in the five most recent transactions for each deal. That would be something like:
WITH grouped AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.deal_id ORDER BY tran_date DESC
) AS seqnum
FROM TRANSACTIONS t
)
SELECT deal_id, sum(case when tran_code = 'ACHFAIL' then 1 else 0 end) as NuMFails
FROM grouped
WHERE seqnum <= 5
GROUP BY deal_id;
The CTE enumerates the rows. The where clause takes the 5 most recent rows for each deal. The group by then aggregates by deal_id.
Note that you do not need to include the partition by column(s) in the order by when you use over.
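For example, the window function in the question's CTE can be trimmed to the following, since the PARTITION BY already restarts the numbering for each deal_id:
ROW_NUMBER() OVER (PARTITION BY deal_id ORDER BY tran_date DESC) AS row_num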

Related

How to filter out rows based on information in other columns and rows

I have a pricing policy table that determines the price a customer is given based on qty purchased as per below.
debtor_code  stock_code        min_quantity  default_price  contract_price
2393         GRBAG100GTALL-50  0             295            236
2393         GRBAG100GTALL-50  5             295            265.5
2393         GRBAG100GTALL-50  10            295            221.25
The pricing offered is based on the cheapest contract_price available at the lowest qty, meaning that the second row is obsolete: the first row already offers a cheaper price at a lower min_quantity, so it overrides the second row.
How can I use a SQL Server query to filter out obsolete rows like this, where an earlier row supersedes them by having a cheaper contract_price at a lower min_quantity?
The result should look like:
debtor_code  stock_code        min_quantity  default_price  contract_price
2393         GRBAG100GTALL-50  0             295            236
2393         GRBAG100GTALL-50  10            295            221.25
Use LEAD() to find the next tier's contract_price and compare it with the current tier's. Set a flag and filter accordingly in the final query.
This is based on the assumption that the price at a higher tier (a higher min_quantity value) should be cheaper than at the current tier.
with cte as
(
select *,
case when lead(contract_price) over (partition by debtor_code
order by min_quantity) < contract_price
then 1
else 0
end as flag
from pricing
)
select *
from cte
where flag = 0
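With the sample data, only the min_quantity 5 tier (265.5) gets flag = 1, because the next tier's 221.25 is cheaper; filtering it out leaves the min_quantity 0 and 10 rows, matching the expected result.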
EDIT:
The following query uses a recursive CTE to walk the tiers in min_quantity order, comparing each row's contract_price with the cheapest price seen so far to determine whether the row is valid:
with cte as
(
select *, rn = row_number() over (partition by debtor_code, stock_code
order by min_quantity)
from pricing
),
rcte as
(
select debtor_code, stock_code, rn, min_quantity, default_price,
contract_price,
valid_price = contract_price, valid = 1
from cte
where rn = 1
union all
select c.debtor_code, c.stock_code, c.rn, c.min_quantity, c.default_price,
c.contract_price,
valid_price = case when c.contract_price < r.valid_price
then c.contract_price
else r.valid_price
end,
valid = case when c.contract_price < r.valid_price
then 1
else 0
end
from rcte r
inner join cte c on c.debtor_code = r.debtor_code -- stay within the same debtor/stock group
and c.stock_code = r.stock_code
and c.rn = r.rn + 1 -- next tier
)
select *
from rcte
where valid = 1
Edit 2
A much simplified solution: first find the running minimum contract_price in min_quantity order, then compare the current contract_price with it. If the current price equals that minimum, the row is valid.
select *
from
(
select *, valid_price = min(contract_price)
over (partition by debtor_code, stock_code
order by min_quantity)
from pricing
) p
where contract_price <= valid_price
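With the sample data, the running minimum per row is 236, 236, 221.25, so the 265.5 tier fails the contract_price <= valid_price test and is dropped, which matches the expected output.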

Oracle SQL - select last 3 rows after a specific row

Below is my data:
My requirement is to get the first 3 consecutive approvals. So from above data, ID 4, 5 and 6 are the rows that I need to select. ID 1 and 2 are not eligible, because ID 3 is a rejection and hence breaks the consecutive condition of actions. Basically, I am looking for the last rejection in the list and then finding the 3 consecutive approvals after that.
Also, if there are no rejections in the chain of actions then the first 3 actions should be the result. For below data:
So my output should be ID 11, 12 and 13.
And if there are less than 3 approvals, then the output should be the list of approvals. For below data:
output should be ID 21 and 22.
Is there any way to achieve this with SQL query only - i.e. no PL-SQL code?
Here is one method that uses window functions:
Find the first row where there are three approvals.
Find the minimum action_at among the rows with three approvals.
Filter.
Keep the three rows you want.
This version uses fetch, which is available in Oracle 12c and later:
select t.*
from (select t.*,
min(case when has_approval_3 = 3 then action_at end) over () as first_action_at
from (select t.*,
sum(case when action = 'APPROVAL' then 1 else 0 end) over (order by action_at rows between current row and 2 following) as has_approval_3
from t
) t
) t
where action = 'APPROVAL' and
(action_at >= first_action_at or first_action_at is null)
order by action_at
fetch first 3 rows only;
You can use a scalar MAX subquery and the ROW_NUMBER analytic function as follows:
SELECT * FROM
( SELECT
Y.*,
ROW_NUMBER() OVER(ORDER BY Y.ACTION_AT) AS RN
FROM YOUR_TABLE Y
WHERE Y.ACTION = 'APPROVE'
AND Y.ACTION_AT >= COALESCE(
(SELECT MAX(YIN.ACTION_AT)
FROM YOUR_TABLE YIN
WHERE YIN.ACTION = 'REJECT'
), Y.ACTION_AT) )
WHERE RN <= 3;
Cheers!!

Need sum of a column from a filter condition for each row

I need to get the total sum of Defect between the MAIN_DT column date and the 365 days (a year) preceding it, if any, within a single Id.
The value needs to be populated for each row.
I have tried the queries below, and also tried to use CSUM, but it's not working:
1) select sum(Defect) as "sum",Id,MAIN_DT
from check_diff
where MAIN_DT between ADD_MONTHS(MAIN_DT,-12) and MAIN_DT group by 2,3;
2)select Defect,
Type1,
Type2,
Id,
MAIN_DT,
ADD_MONTHS(TIM_MAIN_DT,-12) year_old,
CSUM(Defect,MAIN_DT)
from check_diff
where
MAIN_DT between ADD_MONTHS(MAIN_DT,-12) and MAIN_DT group by id;
The expected output is as below:
Defect Type1 Type2 Id main_dt sum
1 a a 1 3/10/2017 1
99 a a 1 4/10/2018 99
0 a b 1 7/26/2018 99
1 a b 1 11/21/2018 100
1 a c 2 12/20/2018 1
Teradata doesn't support RANGE for cumulative sums, but you can rewrite it using a correlated scalar subquery:
select Defect, Id, MAIN_DT,
( select sum(Defect)
from check_diff as t2
where t2.Id = t1.Id
and t2.MAIN_DT > ADD_MONTHS(t1.MAIN_DT,-12)
and t2.MAIN_DT <= t1.MAIN_DT
) as "sum"
from check_diff as t1
Performance might be bad depending on the overall number of rows and the number of rows per ID.
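For comparison, on a platform that does support value-based RANGE frames (Oracle, for example), the same rolling sum could be written as a single window function. This is just a sketch of the standard-SQL form, not Teradata syntax, and note that the frame below includes the boundary date, whereas the subquery above excludes it:
select Defect, Id, MAIN_DT,
sum(Defect) over (partition by Id
order by MAIN_DT
range between interval '12' month preceding and current row) as "sum"
from check_diff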

Joining next Sequential Row

I am planning an SQL statement right now and would need someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever-flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
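In other words, something roughly like this (just a sketch of the idea above, assuming the id sequence really has no gaps):
SELECT AVG(CAST(ABS(t1.stat - t2.stat) AS FLOAT)) AS mean_gap
FROM tbstats t1
JOIN tbstats t2
ON t1.id + 1 = t2.id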
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be computed by taking the difference between the last value and the first value and dividing by one less than the number of elements; the signed gaps telescope, so their sum is simply the last value minus the first:
select sum(case when seqnum = num then stat else - stat end) * 1.0 / (max(num) - 1)
from (select period, stat, row_number() over (order by period) as seqnum,
count(*) over () as num
from tbstats
) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
You can also achieve this by using a join:
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM tbstats t1
LEFT JOIN tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x

Find the longest sequence of consecutive increasing numbers in SQL

For this example say I have a table with two fields, AREA varchar(30) and OrderNumber INT.
The table has the following data
AREA | OrderNumber
Fontana | 32
Fontana | 42
Fontana | 76
Fontana | 12
Fontana | 3
Fontana | 99
RC | 32
RC | 1
RC | 8
RC | 9
RC | 4
The results I would like to return are, for each area, the length of the longest run of consecutive increasing values. For Fontana it is 3 (32, 42, 76). For RC it is 2 (8, 9).
AREA | LongestLength
Fontana | 3
RC | 2
How would I do this on MS Sql 2005?
One way is to use a recursive CTE that steps over each row. If the row meets the criteria (increasing order number for the same area), you increase the chain length by one. If it doesn't, you start a new chain:
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1
)
, recurse as
(
select rn
, area
, OrderNumber
, 1 as ChainLength
from numbered
where rn = 1
union all
select cur.rn
, cur.area
, cur.OrderNumber
, case
when cur.area = prev.area
and cur.OrderNumber > prev.OrderNumber
then prev.ChainLength + 1
else 1
end
from recurse prev
join numbered cur
on prev.rn + 1 = cur.rn
)
select area
, max(ChainLength)
from recurse
group by
area
Live example at SQL Fiddle.
An alternative way is to use a query to find "breaks", that is, rows that end a sequence of increasing order numbers for the same area. The number of rows between breaks is the length.
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1 t1
)
-- Select rows that break an increasing chain
, breaks as
(
select row_number() over (order by cur.rn) rn2
, cur.rn
, cur.Area
from numbered cur
left join
numbered prev
on cur.rn = prev.rn + 1
where cur.OrderNumber <= prev.OrderNumber
or cur.Area <> prev.Area
or prev.Area is null
)
-- Add a final break after the last row
, breaks2 as
(
select *
from breaks
union all
select count(*) + 1
, max(rn) + 1
, null
from breaks
)
select series_start.area
, max(series_end.rn - series_start.rn)
from breaks2 series_start
join breaks2 series_end
on series_end.rn2 = series_start.rn2 + 1
group by
series_start.area
Live example at SQL Fiddle.
You do not explain why RC's longest sequence does not include 1 while Fontana's does include 32. I take it, the 1 is excluded because it is a decrease: it comes after 32. The Fontana's 32, however, is the first ever item in the group, and I've got two ideas how to explain why it is considered an increase. That's either exactly because it's the group's first item or because it is also positive (i.e. as if coming after 0 and, therefore, an increase).
For the purpose of this answer, I'm assuming the latter, i.e. a group's first item is an increase if it is positive. The below script implements the following idea:
Enumerate the rows in every AREA group in the order of the eventtime column you nearly forgot to mention.
Join the enumerated set to itself to link every row with its predecessor.
Get the sign of the difference between the row and its preceding value (defaulting the latter to 0). At this point the problem turns into a gaps-and-islands one.
Partition every AREA group by the signs determined in #3 and enumerate every subgroup's rows.
Find the difference between the row numbers from #1 and those found in #4. That would be a criterion to identify individual streaks (together with AREA).
Finally, group the results by AREA, the sign from #3 and the result from #5, count the rows and get the maximum count per AREA.
I implemented the above like this:
WITH enumerated AS (
SELECT
*,
row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
FROM atable
),
signed AS (
SELECT
this.eventtime,
this.AREA,
this.row,
sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
FROM enumerated AS this
LEFT JOIN enumerated AS last
ON this.AREA = last.AREA
AND this.row = last.row + 1
),
partitioned AS (
SELECT
AREA,
sgn,
grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
FROM signed
)
SELECT DISTINCT
AREA,
LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
AREA,
grp
;
A SQL Fiddle demo can be found here.
You can do some math with ROW_NUMBER() to figure out where you have consecutive items.
Here's the code sample:
;WITH rownums AS
(
SELECT [area],
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [ordernumber]) AS rid1,
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [eventtime]) AS rid2
FROM SomeTable
),
differences AS
(
SELECT [area],
[calc] = rid1 - rid2
FROM rownums
),
summation AS
(
SELECT [area], [calc], COUNT(*) AS lengths
FROM differences
GROUP BY [area], [calc]
)
SELECT summation.[area], MAX(lengths) AS LongestLength
FROM differences
JOIN summation
ON differences.[calc] = summation.[calc]
AND differences.[area] = summation.[area]
GROUP BY summation.[area]
So if I do one set of row numbers ordered by my ordernumber and another set of row numbers by my event time, the difference between those two numbers will always be the same, so long as their order is the same.
You can then get a count grouped by those differences and then pull the largest count to get what you need.
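To illustrate with the Fontana rows (assuming they arrive in the eventtime order listed in the question):
OrderNumber  rid1 (by ordernumber)  rid2 (by eventtime)  calc
32           3                      1                    2
42           4                      2                    2
76           5                      3                    2
12           2                      4                    -2
3            1                      5                    -4
99           6                      6                    0
The calc value 2 appears three times, corresponding to the increasing run 32, 42, 76, so the largest count for Fontana is 3.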