If the difference between two sequences is bigger than 30, deduct bigger sequence - sql

I'm having a hard time writing a query over a sequence of numbers: whenever the difference between two consecutive numbers is 30 or more, the sequence should restart at 1 from that number. The table also has a Status column, which must be kept intact:
+----+--------+--------+
| Id | Number | Status |
+----+--------+--------+
| 1 | 1 | OK |
| 2 | 1 | Failed |
| 3 | 2 | Failed |
| 4 | 3 | OK |
| 5 | 4 | OK |
| 6 | 36 | Failed |
| 7 | 39 | OK |
| 8 | 47 | OK |
| 9 | 80 | Failed |
| 10 | 110 | Failed |
| 11 | 111 | OK |
| 12 | 150 | Failed |
| 13 | 165 | OK |
+----+--------+--------+
It should turn it into this one:
+----+--------+--------+
| Id | Number | Status |
+----+--------+--------+
| 1 | 1 | OK |
| 2 | 1 | Failed |
| 3 | 2 | Failed |
| 4 | 3 | OK |
| 5 | 4 | OK |
| 6 | 1 | Failed |
| 7 | 4 | OK |
| 8 | 12 | OK |
| 9 | 1 | Failed |
| 10 | 1 | Failed |
| 11 | 2 | OK |
| 12 | 1 | Failed |
| 13 | 16 | OK |
+----+--------+--------+
Thanks for your attention, I will be available to clear any doubt regarding my problem! :)
EDIT: Sample of this table here: http://sqlfiddle.com/#!6/ded5af

With this test case:
create table #data (id int identity, Number int, Status varchar(20));
insert #data(number, status) values
( 1,'OK')
,( 1,'Failed')
,( 2,'Failed')
,( 3,'OK')
,( 4,'OK')
,( 4,'OK') -- to be deleted, ensures IDs are not sequential
,(36,'Failed') -- to be deleted, ensures IDs are not sequential
,(36,'Failed')
,(39,'OK')
,(47,'OK')
,(80,'Failed')
,(110,'Failed')
,(111,'OK')
,(150,'Failed')
,(165,'OK')
;
delete #data where id between 6 and 7;
This SQL:
with renumbered as (
select rn = row_number() over (order by id), data.*
from #data data
),
paired as (
select
this.*,
startNewGroup = case when this.number - prev.number >= 30
or prev.id is null then 1 else 0 end
from renumbered this
left join renumbered prev on prev.rn = this.rn -1
),
groups as (
select Id,Number, GroupNo = Number from paired where startNewGroup = 1
)
select
Id
,Number = 1 + Number - (
select top 1 GroupNo
from groups where groups.id <= paired.id
order by GroupNo desc)
,status
from paired
;
yields as desired:
Id Number status
----------- ----------- --------------------
1 1 OK
2 1 Failed
3 2 Failed
4 3 OK
5 4 OK
8 1 Failed
9 4 OK
10 12 OK
11 1 Failed
12 1 Failed
13 2 OK
14 1 Failed
15 16 OK
Update: using the new LAG() function allows somewhat simpler SQL without a self-join early on:
with renumbered as (
select
data.*
,gap = number - lag(number, 1) over (order by number)
from #data data
),
paired as (
select
*,
startNewGroup = case when gap >= 30 or gap is null then 1 else 0 end
from renumbered
),
groups as (
select Id,Number, GroupNo = Number from paired where startNewGroup = 1
)
select
Id
,Number = 1 + Number - ( select top 1 GroupNo
from groups
where groups.id <= paired.id
order by GroupNo desc
)
,status
from paired
;
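For comparison, the same flag-then-group idea can be sketched with window functions only. This is a minimal sketch, not the answer's own code: it assumes SQLite (driven from Python's sqlite3 module, SQLite 3.25 or later for window functions) instead of SQL Server, and the names data, flagged, grouped, new_group, grp and renumber are made up for illustration.

```python
import sqlite3

# Sample rows (Id, Number, Status) copied from the question.
ROWS = [
    (1, 1, "OK"), (2, 1, "Failed"), (3, 2, "Failed"), (4, 3, "OK"),
    (5, 4, "OK"), (6, 36, "Failed"), (7, 39, "OK"), (8, 47, "OK"),
    (9, 80, "Failed"), (10, 110, "Failed"), (11, 111, "OK"),
    (12, 150, "Failed"), (13, 165, "OK"),
]

QUERY = """
WITH flagged AS (
    -- 1 when the gap to the previous row (by id) is 30 or more, else 0
    SELECT id, number, status,
           CASE WHEN number - LAG(number) OVER (ORDER BY id) >= 30
                THEN 1 ELSE 0 END AS new_group
    FROM data
),
grouped AS (
    -- running sum of the flags gives every row a group number
    SELECT id, number, status,
           SUM(new_group) OVER (ORDER BY id) AS grp
    FROM flagged
)
SELECT id,
       -- subtract the first number of the group so each group restarts at 1
       number - FIRST_VALUE(number) OVER (PARTITION BY grp ORDER BY id) + 1
           AS renumber,
       status
FROM grouped
ORDER BY id
"""


def renumber(rows):
    """Load rows into an in-memory table and return (id, renumber, status)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, number INTEGER, status TEXT)"
    )
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in renumber(ROWS):
        print(row)
```

Ordering by id throughout means non-sequential ids need no extra renumbering step; only their relative order matters.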

I don't deserve credit for this answer, but I think this is even shorter:
with gapped as
( select id, number, gap = number - lag(number, 1) over (order by id)
from #data data
)
select Id, status,
ReNumber = Number + 1 - isnull( (select top 1 gapped.Number
from gapped
where gapped.id <= data.id
and gap >= 30
order by gapped.id desc), 1)
from #data data;

This is simply Pieter Geerkens's answer slightly simplified. I removed some intermediate results and columns:
with renumbered as (
select data.*, gap = number - lag(number, 1) over (order by number)
from #data data
),
paired as (
select *
from renumbered
where gap >= 30 or gap is null
)
select Id, Number = 1 + Number - (select top 1 Number
from paired
where paired.id <= renumbered.id
order by Number desc)
, status
from renumbered;
It should have been a comment, but it's too long for that and wouldn't be understandable.

If your IDs are not in sequential order, you might need to add another CTE before this one and join the recursive CTE on ROW_NUMBER() instead of ID:
WITH cte AS
( SELECT
Id, [Number], [Status],
0 AS Diff,
[Number] AS [NewNumber]
FROM
Table1
WHERE Id = 1
UNION ALL
SELECT
t1.Id, t1.[Number], t1.[Status],
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN t1.Number - 1 ELSE Diff END,
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN 1 ELSE t1.[Number] - Diff END
FROM Table1 t1
JOIN cte ON cte.Id + 1 = t1.Id
)
SELECT Id, [NewNumber], [Status]
FROM cte
SQL Fiddle
Here is another SQL Fiddle with an example of what you would do if the ID is not sequential.
SQL Fiddle 2
In case SQL Fiddle stops working, here is the query:
--Order table to make sure there is a sequence to follow
WITH OrderedSequence AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY Id) RnId,
Id,
[Number],
[Status]
FROM
Sequence
),
RecursiveCte AS
( SELECT
Id, [Number], [Status],
0 AS Diff,
[Number] AS [NewNumber],
RnId
FROM
OrderedSequence
WHERE RnId = 1
UNION ALL
SELECT
t1.Id, t1.[Number], t1.[Status],
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN t1.Number - 1 ELSE Diff END,
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN 1 ELSE t1.[Number] - Diff END,
t1.RnId
FROM OrderedSequence t1
JOIN RecursiveCte cte ON cte.RnId + 1 = t1.RnId
)
SELECT Id, [NewNumber], [Status]
FROM RecursiveCte

I tried to optimize the queries here, since it took 1h20m to process my data. I had it down to 30s after some further research.
WITH AuxTable AS
( SELECT
id,
number,
status,
relevantId = CASE WHEN
number = 1 OR
((number - LAG(number, 1) OVER (ORDER BY id)) > 29)
THEN id
ELSE NULL
END,
deduct = CASE WHEN
((number - LAG(number, 1) OVER (ORDER BY id)) > 29)
THEN number - 1
ELSE 0
END
FROM #data data
)
,AuxTable2 AS
(
SELECT
id,
number,
status,
AT.deduct,
MAX(AT.relevantId) OVER (ORDER BY AT.id ROWS UNBOUNDED PRECEDING ) AS lastRelevantId
FROM AuxTable AT
)
SELECT
id,
number,
status,
number - MAX(deduct) OVER(PARTITION BY lastRelevantId ORDER BY id ROWS UNBOUNDED PRECEDING ) AS ReNumber
FROM AuxTable2
I think this runs faster, but it's not shorter.

Related

query SQL table for the same data in column for 3 times in a row

I have a table
Id, Response
1, Yes
2, Yes
3, No
4, No
5, Yes
6, No
7, No
8, No
I would like to be able to query the table and check for the response of No and if it occurs 3 times in a row return a value.
So I am trying
select count(response) where response = no
order by id
Basically, if there are 3 responses of No in a row, I want to trigger something else. So I need to query the table each time an entry is made and return a value if the last 3 entries are No.
I only care whether the latest 3 values are No. For example, if the last 4 entries were No, No, No, Yes, I don't care, because there is a Yes among the last 3;
the last 3 values all have to be No.
I don't know which RDBMS you use, but you can try something like that:
select count(*)
from
(select id,
response
from your_table
order by id desc
limit 3) t
where t.response = 'No';
Here is a solution in BigQuery. You may need to tweak the syntax for your SQL dialect:
SELECT
* ,
SUM( CASE WHEN response ="No" THEN 1 ELSE 0 END )
OVER (ORDER BY id RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM dataset
For each row, it returns the number of "No" responses among that row and the two before it, which I think is what you want.
The key part is the window frame RANGE BETWEEN 2 PRECEDING AND CURRENT ROW. The CASE expression maps each "No" to 1 and anything else to 0, so the windowed SUM reaches 3 exactly when the current row and the two before it are all "No".
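The sliding-window count can be tried on the question's sample data. This is a minimal sketch under assumptions: it uses SQLite through Python's sqlite3 module (SQLite 3.25 or later) instead of BigQuery, it uses a ROWS frame instead of RANGE so that gaps in id do not matter, and the names responses, no_in_last_3, rolling_no_counts and latest_three_are_no are hypothetical.

```python
import sqlite3

# Sample rows (Id, Response) copied from the question.
RESPONSES = [
    (1, "Yes"), (2, "Yes"), (3, "No"), (4, "No"),
    (5, "Yes"), (6, "No"), (7, "No"), (8, "No"),
]

QUERY = """
SELECT id, response,
       -- count of 'No' among the current row and the two rows before it
       SUM(CASE WHEN response = 'No' THEN 1 ELSE 0 END)
           OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
           AS no_in_last_3
FROM responses
ORDER BY id
"""


def rolling_no_counts(rows):
    """Return (id, response, no_in_last_3) for every row, ordered by id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE responses (id INTEGER PRIMARY KEY, response TEXT)")
    con.executemany("INSERT INTO responses VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


def latest_three_are_no(rows):
    """True only when the three most recent entries are all 'No'."""
    counts = rolling_no_counts(rows)
    return bool(counts) and counts[-1][2] == 3


if __name__ == "__main__":
    for row in rolling_no_counts(RESPONSES):
        print(row)
    print(latest_three_are_no(RESPONSES))
```

Since the question only cares about the latest entries, checking the count on the last row is enough; earlier runs of three do not trigger anything here.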
I would use two lag()s:
select t.*
from (select t.*,
lag(id, 2) over (order by id) as prev2_id,
lag(id, 2) over (partition by response order by id) as prev2_id_response
from t
) t
where response = 'No' and prev2_id = prev2_id_response;
The first lag() determines the id "2 back". The second determines the id "2 back" for the same response. If the response is the same for those three rows, then these are the same.
This returns each occurrence of "no" where this occurs. You can use exists if you just want to know if this ever occurs.
This can be done with window functions and a derived table or CTE term. The following takes you through how it can be done, step by step:
Full Example with data
WITH cte1 AS (
SELECT x.*
, CASE WHEN COALESCE(LAG(response) OVER (ORDER BY id), 'NA') <> response THEN 1 ELSE 0 END AS edge
FROM xlogs AS x
)
, cte2 AS (
SELECT x.*
, SUM(edge) OVER (ORDER BY id) AS xgroup
FROM cte1 AS x
)
, cte3 AS (
SELECT x.*
, ROW_NUMBER() OVER (PARTITION BY xgroup ORDER BY id) AS xposition
FROM cte2 AS x
)
, cte4 AS (
SELECT x.*
, CASE WHEN xposition >= 3 AND response = 'No' THEN 1 END AS xtrigger
FROM cte3 AS x
)
, cte5 AS (
SELECT x.*
FROM cte4 AS x
ORDER BY id DESC
LIMIT 1
)
SELECT *
FROM cte5
WHERE xtrigger = 1
;
The result of cte4 provides useful detail about the logic:
+----+----------+------+--------+-----------+----------+
| id | response | edge | xgroup | xposition | xtrigger |
+----+----------+------+--------+-----------+----------+
| 1 | Yes | 1 | 1 | 1 | NULL |
| 2 | Yes | 0 | 1 | 2 | NULL |
| 3 | No | 1 | 2 | 1 | NULL |
| 4 | No | 0 | 2 | 2 | NULL |
| 5 | Yes | 1 | 3 | 1 | NULL |
| 6 | No | 1 | 4 | 1 | NULL |
| 7 | No | 0 | 4 | 2 | NULL |
| 8 | No | 0 | 4 | 3 | 1 |
+----+----------+------+--------+-----------+----------+

How to make sure the sql result is continued range?

I have table like:
id | low_number | high_number
-------------------------------
1 | 12 | 32
-------------------------------
2 | 13 | 33
-------------------------------
3 | 15 | 36
-------------------------------
4 | 33 | 50
-------------------------------
5 | 35 | 52
...
-------------------------------
17 | 52 | 80
I want to get result like:
id | low_number | high_number
-------------------------------
1 | 12 | 32
-------------------------------
4 | 33 | 50
-------------------------------
17 | 52 | 80
That is because each selected row's low_number is bigger than the high_number of the previously selected row.
How can I write SQL to get this result? I use PostgreSQL.
This seems like a recursive CTE problem. You want to choose the first row (by id) and then choose the next row based on that.
The idea is to cycle through the rows, one at a time. Then when the condition is met, transition to that row. And so on.
As a query, this looks like:
with recursive tt as (
select id, low_number, high_number, row_number() over (order by id) as seqnum
from t
),
cte as (
select id, low_number, high_number, seqnum, true as is_change, id as grouping_id
from tt
where seqnum = 1
union all
select tt.id, tt.low_number, tt.high_number, tt.seqnum, tt.low_number > t.high_number,
(case when tt.low_number > t.high_number then tt.id else cte.grouping_id end)
from cte join
t
on cte.grouping_id = t.id join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte
where is_change;
Here is a db<>fiddle.
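Use the window function LAG() to get a value from the previous row. Note that this compares each row only with the row immediately before it, not with the last row that was kept, e.g.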
WITH j AS (
SELECT
id,low_number,high_number,
LAG(high_number) OVER (ORDER BY id) AS prev_high_number
FROM t)
SELECT id,low_number,high_number FROM j
WHERE low_number > prev_high_number OR prev_high_number IS NULL;
Demo: db<>fiddle
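To see the "compare against the last kept row" behaviour on the visible sample rows, here is a minimal sketch. It is an assumption-laden illustration and not either answer's code: it runs on SQLite through Python's sqlite3 module instead of PostgreSQL, the elided rows between id 5 and id 17 are simply left out, and the names picked, nxt and pick_ranges are hypothetical.

```python
import sqlite3

# Only the rows visible in the question; the elided rows are omitted.
ROWS = [
    (1, 12, 32), (2, 13, 33), (3, 15, 36),
    (4, 33, 50), (5, 35, 52), (17, 52, 80),
]

QUERY = """
WITH RECURSIVE picked(id, low_number, high_number) AS (
    -- start from the first row by id
    SELECT id, low_number, high_number
    FROM t
    WHERE id = (SELECT MIN(id) FROM t)
    UNION ALL
    -- then jump to the next row whose low_number exceeds
    -- the high_number of the row picked last
    SELECT t.id, t.low_number, t.high_number
    FROM picked
    JOIN t ON t.id = (SELECT MIN(nxt.id)
                      FROM t AS nxt
                      WHERE nxt.id > picked.id
                        AND nxt.low_number > picked.high_number)
)
SELECT id, low_number, high_number
FROM picked
ORDER BY id
"""


def pick_ranges(rows):
    """Return the chain of rows where each low_number beats the last kept high_number."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, low_number INTEGER, high_number INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pick_ranges(ROWS):
        print(row)
```

The recursion stops on its own once no later row qualifies, because the subquery then yields NULL and the join produces no further rows.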

SQL How to filter table with values having more than one unique value of another column

I have data table Customers that looks like this:
ID | Sequence No |
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 1 |
2 | 1 |
3 | 1 |
3 | 2 |
I would like to filter the table so that only IDs with more than 1 distinct count of Sequence No remain.
Expected output:
ID | Sequence No |
1 | 1 |
1 | 2 |
1 | 3 |
3 | 1 |
3 | 2 |
I tried
select ID, Sequence No
from Customers
where count(distinct Sequence No) > 1
order by ID
but I'm getting an error. How can I solve this?
You can get the desired result by using the below query. This is similar to what you were trying -
Sample Table & Data
Create table #Data
(Id int, [Sequence No] int)
Insert into #Data
values
(1 , 1 ),
(1 , 2 ),
(1 , 3 ),
(2 , 1 ),
(2 , 1 ),
(2 , 1 ),
(3 , 1 ),
(3 , 2 )
Query
Select * from #Data
where ID in(
select ID
from #Data
Group by ID
Having count(distinct [Sequence No]) > 1
)
Using analytic functions, we can try:
WITH cte AS (
SELECT *, MIN([Sequence No]) OVER (PARTITION BY ID) min_seq,
MAX([Sequence No]) OVER (PARTITION BY ID) max_seq
FROM Customers
)
SELECT ID, [Sequence No]
FROM cte
WHERE min_seq <> max_seq
ORDER BY ID, [Sequence No];
Demo
We are checking for a distinct count of sequence number by asserting that the minimum and maximum sequence numbers are not the same for a given ID. The above query could benefit from the following index:
CREATE INDEX idx ON Customers (ID, [Sequence No]);
This would let the min and max values be looked up faster.
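Both approaches can be checked side by side on the sample data. This is a minimal sketch, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later) instead of SQL Server; the column is renamed to the hypothetical sequence_no to avoid quoting, and filter_ids is a made-up helper.

```python
import sqlite3

# Sample rows (ID, Sequence No) copied from the question.
CUSTOMERS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 1), (2, 1), (3, 1), (3, 2)]

# Variant 1: subquery with GROUP BY / HAVING COUNT(DISTINCT ...).
HAVING_QUERY = """
SELECT id, sequence_no
FROM customers
WHERE id IN (SELECT id
             FROM customers
             GROUP BY id
             HAVING COUNT(DISTINCT sequence_no) > 1)
ORDER BY id, sequence_no
"""

# Variant 2: window MIN/MAX per id; they differ only when there are
# at least two distinct values.
MINMAX_QUERY = """
SELECT id, sequence_no
FROM (SELECT id, sequence_no,
             MIN(sequence_no) OVER (PARTITION BY id) AS min_seq,
             MAX(sequence_no) OVER (PARTITION BY id) AS max_seq
      FROM customers)
WHERE min_seq <> max_seq
ORDER BY id, sequence_no
"""


def filter_ids(query, rows):
    """Run one of the queries above against an in-memory copy of the rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER, sequence_no INTEGER)")
    con.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(filter_ids(HAVING_QUERY, CUSTOMERS))
    print(filter_ids(MINMAX_QUERY, CUSTOMERS))
```

The original attempt fails because an aggregate such as COUNT cannot appear directly in WHERE; it has to go into HAVING or a windowed expression.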

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
SQL Fiddle
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
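The two steps above can be tried on the clarified case where RowNumber has repeats and gaps. This is a minimal sketch under assumptions: it runs on SQLite through Python's sqlite3 module (SQLite 3.25 or later) instead of SQL Server 2012, CASE replaces IIF, and the ten sample rows are hypothetical, built from the "1,1,2,2,3,4" and "1,2,4,6" sequences mentioned in the question.

```python
import sqlite3

# Hypothetical rows (ID, RowNumber): first group 1,1,2,2,3,4,
# second group 1,2,4,6, as described in the clarified requirement.
ROWS = [
    (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 4),
    (7, 1), (8, 2), (9, 4), (10, 6),
]

QUERY = """
WITH t1 AS (
    -- step 1: fetch the previous row's RowNumber
    SELECT id, rownumber,
           LAG(rownumber) OVER (ORDER BY id) AS prev_rownumber
    FROM yourtable
),
t2 AS (
    -- flag a new group on the first row or when the number drops
    SELECT id, rownumber,
           CASE WHEN prev_rownumber IS NULL OR prev_rownumber > rownumber
                THEN 1 ELSE 0 END AS new_group
    FROM t1
)
-- step 2: a running sum of the flags is the group number
SELECT id, rownumber,
       SUM(new_group) OVER (ORDER BY id
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS grp
FROM t2
ORDER BY id
"""


def assign_groups(rows):
    """Return (id, rownumber, grp) for every row, ordered by id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, rownumber INTEGER)")
    con.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in assign_groups(ROWS):
        print(row)
```

Repeated values such as 1,1 stay in the same group because only a strictly lower RowNumber starts a new one.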
If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t
Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
Demo on SQLFiddle
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.

How can I calculate the remaining amount per row?

I have a table that I want to find for each row id the amount remaining from the total. However, the order of amounts is in an ascending order.
id amount
1 3
2 2
3 1
4 5
The results should look like this:
id remainder
1 10
2 8
3 5
4 0
Any thoughts on how to accomplish this? I'm guessing that the OVER clause is the way to go, but I can't quite piece it together. Thanks.
Since you didn't specify your RDBMS, I will just assume it's Postgresql ;-)
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl;
Output:
| ID | AMOUNT | REMAINDER |
---------------------------
| 3 | 1 | 10 |
| 2 | 2 | 8 |
| 1 | 3 | 5 |
| 4 | 5 | 0 |
How it works: http://www.sqlfiddle.com/#!1/c446a/5
It works in SQL Server 2012 too: http://www.sqlfiddle.com/#!6/c446a/1
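The "grand total minus running total" expression can also be reproduced locally. This is a minimal sketch, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later) instead of PostgreSQL or SQL Server; the explicit ROWS frame and the id tie-breaker are additions of this sketch so that equal amounts would not be lumped together, and remainders is a hypothetical helper.

```python
import sqlite3

# Sample rows (id, amount) copied from the question.
ROWS = [(1, 3), (2, 2), (3, 1), (4, 5)]

QUERY = """
SELECT id, amount,
       -- grand total minus the running total in ascending amount order
       SUM(amount) OVER ()
       - SUM(amount) OVER (ORDER BY amount, id
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS remainder
FROM tbl
ORDER BY amount, id
"""


def remainders(rows):
    """Return (id, amount, remainder) ordered by ascending amount."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, amount INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in remainders(ROWS):
        print(row)
```

With the sample data the total is 11, so the remainders after taking the amounts 1, 2, 3 and 5 in turn are 10, 8, 5 and 0, matching the output table above.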
Thinking of solution for SQL Server 2008...
Btw, is your ID just a mere row number? If it is, just do this:
select
row_number() over(order by amount) as rn
, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl
order by rn;
Output:
| RN | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
But if you really need the ID intact and move the smallest amount on top, do this:
with a as
(
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder,
row_number() over(order by id) as id_sort,
row_number() over(order by amount) as amount_sort
from tbl
)
select a.id, sort.remainder
from a
join a sort on sort.amount_sort = a.id_sort
order by a.id_sort;
Output:
| ID | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
See query progression here: http://www.sqlfiddle.com/#!6/c446a/11
I just want to offer a simpler way to do this in descending order:
select id, sum(amount) over (order by id desc) as Remainder
from t
This will work in Oracle, SQL Server 2012, and Postgres.
The general solution requires a self join:
select t.id, coalesce(sum(tafter.amount), 0) as Remainder
from t left outer join
t tafter
on t.id < tafter.id
group by t.id
SQL Server 2008 answer. I can't provide an SQL Fiddle, because it seems to strip the begin keyword, resulting in syntax errors. I tested this on my machine though:
create function RunningTotalGuarded()
returns @ReturnTable table(
Id int,
Amount int not null,
RunningTotal int not null,
RN int identity(1,1) not null primary key clustered
)
as
begin
insert into @ReturnTable(id, amount, RunningTotal)
select id, amount, 0 from tbl order by amount;
declare @RunningTotal numeric(16,4) = 0;
declare @rn_check int = 0;
update @ReturnTable
set
@rn_check = @rn_check + 1
,@RunningTotal =
case when rn = @rn_check then
@RunningTotal + Amount
else
1 / 0
end
,RunningTotal = @RunningTotal;
return;
end;
To achieve your desired output:
with a as
(
select *, sum(amount) over() - RunningTotal as remainder
, row_number() over(order by id) as id_order
from RunningTotalGuarded()
)
select a.id, amount_order.remainder
from a
inner join a amount_order on amount_order.rn = a.id_order;
Rationale for guarded running total: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
Choose the lesser evil ;-)