SQL Pattern Length in New Column - sql

I have a (very very large) table of similar format to the following:
+--------+-------+
| id | value |
+--------+-------+
| 1 | 5 |
| 2 | 6 |
| 3 | 6 |
| 4 | 4 |
| 5 | 3 |
| 6 | 2 |
| 7 | 4 |
| 8 | 5 |
+--------+-------+
What I'd like to be able to do is return the pattern length of the value column increasing or decreasing in a third column (with pattern being negative for decreasing and positive for increasing), while ignoring IDs where there is no change. The pattern should reset to 1 or -1 when the pattern is broken.
I've not explained that well at all, so with the table above, ideally the result would be:
+--------+-------+---------+
| id | value | pattern |
+--------+-------+---------+
| 1 | 5 | 0/NULL |
| 2 | 6 | 1 |
| 3 | 6 | 1 |
| 4 | 4 | -1 |
| 5 | 3 | -2 |
| 6 | 2 | -3 |
| 7 | 4 | 1 |
| 8 | 5 | 2 |
+--------+-------+---------+
I did some research and came across pattern matching, but it turns out either the version of SQL I'm using (it's the version used by/on Amazon Redshift , which according to them is 'based on' PostgreSQL 8.0.2 http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html)) doesn't support it, or I'm being very silly.
So, is this something that is even possible with SQL, and if so how should I go about it? Many thanks.

In SQL Server 2012, you can do this with lead() and lag() and cumulative sum.
Something that comes quite close is this:
select t.*, sum(nextinc) over (order by id) as pattern
from (select t.*,
(case when lead(t.value) > t.value then 1
when lead(t.value) = t.value then 0
else -1 end) as nextinc,
(case when lag(t.value) > t.value then 1 else 0 end) as previnc
from table t
) t;
However, the pattern goes up and down in increments of 1 instead of starting over. So, we need to find the pattern breaks. The following defines the breaks in the pattern and then increments pattern for for sequences of increasing/decreasing values:
select t.*,
sum(nextinc) over (partition by grp order by id) as pattern
from (select t.*,
sum(case when (prev_value <= value and value <= next_value) or
(prev_value >= value and value >= next_value)
then 0 else 1
end) over (order by id) as grp
from (select t.*, lead(t.value) over (order by id) as next_value,
lag(t.value) over (order by id) as prev_value,
(case when lead(t.value) over (order by id) > t.value then 1
when lead(t.value) over (order by id) = t.value then 0
else -1 end) as nextinc
from table t
) t
) t

For the given example, the following seems to do the job:
SELECT
S3.id
, S3.value
, S3.pattern
, SUM(minusNullPlus) OVER (PARTITION BY sequenceID ORDER BY id) calculated
FROM
(SELECT
S2.*
, SUM(newSequence) OVER (ORDER BY id) sequenceID
FROM
(SELECT
S1.*
, CASE
WHEN minusNullPlus = LAG(minusNullPlus, 1, NULL) OVER (ORDER BY id)
OR
minusNullPlus = 0
OR
(minusNullPlus = 1
AND
value - LAG(value, 1, NULL) OVER (ORDER BY id) = 1
)
OR
(minusNullPlus = -1
AND
value - LAG(value, 1, NULL) OVER (ORDER BY id) = -1
)
THEN 0
ELSE 1
END newSequence
FROM
(SELECT
id
, value
, CASE
WHEN value > LAG(value, 1, NULL) OVER (ORDER BY id) THEN 1
WHEN value < LAG(value, 1, NULL) OVER (ORDER BY id) THEN -1
WHEN value = LAG(value, 1, NULL) OVER (ORDER BY id) THEN 0
ELSE 0
END minusNullPlus
, CASE
WHEN value - LAG(value, 1, NULL) OVER (ORDER BY id) = 0 THEN 0
ELSE 1
END change
, pattern
FROM SomeTable
) S1
) S2
) S3
ORDER BY id
;
See it in action: SQL Fiddle
It uses some additional data to check against - please verify the respective patterns to be actually in line with your expectations/requirements.
NB: The suggested solution relies on some of the particularities of the provided sample data (and its expansion in above SQL Fiddle).
Please comment, if and as adjustment / further detail is required.

Related

query SQL table for the same data in column for 3 times in a row

I have a table
Id, Response
1, Yes
2, Yes
3, No
4, No
5, Yes
6, No
7, No
8, No
I would like to be able to query the table and check for the response of No and if it occurs 3 times in a row return a value.
So I am trying
select count(response) where response = no
order by id
Basically, the theory goes, if there are 3 responses of No, I want to trigger something else to happen. So I need to query the table each time an entry is made, and if the last 3 entries are no then return value.
I only want to know if the latest values are 3 no. for example if the last 4 entries were no, no, no, yes - I don't care as there is a yes value
so the last 3 values have to be no
I don't know which RDBMS you use, but you can try something like that:
select count(*)
from
(select id,
response
from your_table
order by id desc
limit 3) t
where t.response = 'No';
Here is a solution in Bigquery. You may need to tweak the syntax for you SQL base:
SELECT
* ,
SUM( CASE WHEN response ="No" THEN 1 ELSE 0 END )
OVER (ORDER BY id RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM dataset
It returns output like this:
Which I think is what you want.
The key part is the window functions using RANGE BETWEEN 2 PRECEDING AND CURRENT ROW. The case statement is checking if the current row and the 2 before are "No". If they are return a 1. So when three in a row occur this will SUM to 3.
I would use two lag()s:
select t.*
from (select t.*,
lag(id, 2) over (order by id) as prev2_id,
lag(id, 2) over (order by id) as prev2_id_response
from t
) t
where response = 'no' and prev2_id = prev2_id_response;
The first lag() determines the id "2 back". The second determines the id "2 back" for the same response. If the response is the same for those three rows, then these are the same.
This returns each occurrence of "no" where this occurs. You can use exists if you just want to know if this ever occurs.
This can be done with window functions and a derived table or CTE term. The following takes you through how it can be done, step by step:
Full Example with data
WITH cte1 AS (
SELECT x.*
, CASE WHEN COALESCE(LAG(response) OVER (ORDER BY id), 'NA') <> response THEN 1 ELSE 0 END AS edge
FROM xlogs AS x
)
, cte2 AS (
SELECT x.*
, SUM(edge) OVER (ORDER BY id) AS xgroup
FROM cte1 AS x
)
, cte3 AS (
SELECT x.*
, ROW_NUMBER() OVER (PARTITION BY xgroup ORDER BY id) AS xposition
FROM cte2 AS x
)
, cte4 AS (
SELECT x.*
, CASE WHEN xposition >= 3 AND response = 'No' THEN 1 END AS xtrigger
FROM cte3 AS x
)
, cte5 AS (
SELECT x.*
FROM cte4 AS x
ORDER BY id DESC
LIMIT 1
)
SELECT *
FROM cte5
WHERE response = 'No'
;
The result of cte4 provides useful detail about the logic:
+----+----------+------+--------+-----------+----------+
| id | response | edge | xgroup | xposition | xtrigger |
+----+----------+------+--------+-----------+----------+
| 1 | Yes | 1 | 1 | 1 | NULL |
| 2 | Yes | 0 | 1 | 2 | NULL |
| 3 | No | 1 | 2 | 1 | NULL |
| 4 | No | 0 | 2 | 2 | NULL |
| 5 | Yes | 1 | 3 | 1 | NULL |
| 6 | No | 1 | 4 | 1 | NULL |
| 7 | No | 0 | 4 | 2 | NULL |
| 8 | No | 0 | 4 | 3 | 1 |
+----+----------+------+--------+-----------+----------+

Dynamic update in SQL based on previous value

I am trying to explore the dynamic update.
My actual source table is:
Expected result of the source table after update :
The query i tried :
WITH t AS
(
SELECT key,
Begin_POS,
Length,
(Begin_POS+ Length) as res
from tab
)
SELECT src_column_id,
Length,res,
COALESCE(Length + lag(res) OVER (ORDER BY src_column_id),1) AS PRE_VS
from t
Can you assist what should be my approach like ?
I think that’s a window sum:
select
t.*,
1 + coalesce(
sum(length) over(
order by key
rows between unbounded preceding and 1 preceding
),
0
) new_begin_pos
from mytable t
You can use SUM() window function like this:
select
[key],
sum(length) over (order by [key]) - length + begin_pos begin_pos,
length
from tab
If you want to update the table:
with cte as (
select *, sum(length) over (order by [key]) - length + begin_pos new_begin_pos
from tab
)
update cte
set begin_pos = new_begin_pos
See the demo.
Results:
> key | begin_pos | length
> --: | --------: | -----:
> 1 | 1 | 1
> 2 | 2 | 9
> 3 | 11 | 3
> 4 | 14 | 7
> 5 | 21 | 3
> 6 | 24 | 6
> 7 | 30 | 16

Complex SQL query by pattern with timestamps

I have the following info in my SQLite database:
ID | timestamp | val
1 | 1577644027 | 0
2 | 1577644028 | 0
3 | 1577644029 | 1
4 | 1577644030 | 1
5 | 1577644031 | 2
6 | 1577644032 | 2
7 | 1577644033 | 3
8 | 1577644034 | 2
9 | 1577644035 | 1
10 | 1577644036 | 0
11 | 1577644037 | 1
12 | 1577644038 | 1
13 | 1577644039 | 1
14 | 1577644040 | 0
I want to perform a query that returns the elements that compose an episode. An episode is a set of ordered registers that comply the following requirements:
The first element is greater than zero.
The previous element of the first one is zero.
The last element is greater than zero.
The next element of the last one is zero.
The expected result of the query on this example would be something like this:
[
[{"id":3, tmstamp:1577644029, value:1}
{"id":4, tmstamp:1577644030, value:1}
{"id":5, tmstamp:1577644031, value:2}
{"id":6, tmstamp:1577644032, value:2}
{"id":7, tmstamp:1577644033, value:3}
{"id":8, tmstamp:1577644034, value:2}
{"id":9, tmstamp:1577644035, value:1}],
[{"id":11, tmstamp:1577644037, value:1}
{"id":12, tmstamp:1577644038, value:1}
{"id":13, tmstamp:1577644039, value:1}]
]
Currently, I am avoiding this query and I am using an auxiliary table to store the initial and end timestamp of episodes, but this is only because I do not know how to perform this query.
Threfore, my question is quite straightforward: does anyone know how can I perform this query in order to obtain something similar to the stated ouput?
This answer assumes that the "before" and "after" conditions are not really important. That is, an episode can be the first row in the table.
You can identify the episodes by counting the number of 0s before each row. Then filter out the 0 values:
select t.*,
dense_rank() over (order by grp) as episode
from (select t.*,
sum(case when val = 0 then 1 else 0 end) over (order by timestamp) as grp
from t
) t
where val <> 0;
If this is not the case, then lag() and lead() and a cumulative sum can handle the previous value being 0:
select t.*,
sum(case when prev_val = 0 and val > 0 then 1 else 0 end) over (order by timestamp) as episode
from (select t.*,
lag(val) over (order by timestamp) as prev_val,
lead(val) over (order by timestamp) as next_val
from t
) t
where val <> 0;
If you want the result as JSON objects then you must use the JSON1 Extension functions of SQLite:
with cte as (
select *, sum(val = 0) over (order by timestamp) grp
from tablename
)
select
json_group_array(
json_object('id', id, 'timestamp', timestamp, 'val', val)
) result
from cte
where val > 0
group by grp
See the demo.
Results:
| result |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [{"id":3,"timestamp":1577644029,"val":1},{"id":4,"timestamp":1577644030,"val":1},{"id":5,"timestamp":1577644031,"val":2},{"id":6,"timestamp":1577644032,"val":2},{"id":7,"timestamp":1577644033,"val":3},{"id":8,"timestamp":1577644034,"val":2},{"id":9,"timestamp":1577644035,"val":1}] |
| [{"id":11,"timestamp":1577644037,"val":1},{"id":12,"timestamp":1577644038,"val":1},{"id":13,"timestamp":1577644039,"val":1}] |

If the difference between two sequences is bigger than 30, deduct bigger sequence

I'm having a hard time trying to make a query that gets a lot of numbers, a sequence of numbers, and if the difference between two of them is bigger than 30, then the sequence resets from this number. So, I have the following table, which has another column other than the number one, which should be maintained intact:
+----+--------+--------+
| Id | Number | Status |
+----+--------+--------+
| 1 | 1 | OK |
| 2 | 1 | Failed |
| 3 | 2 | Failed |
| 4 | 3 | OK |
| 5 | 4 | OK |
| 6 | 36 | Failed |
| 7 | 39 | OK |
| 8 | 47 | OK |
| 9 | 80 | Failed |
| 10 | 110 | Failed |
| 11 | 111 | OK |
| 12 | 150 | Failed |
| 13 | 165 | OK |
+----+--------+--------+
It should turn it into this one:
+----+--------+--------+
| Id | Number | Status |
+----+--------+--------+
| 1 | 1 | OK |
| 2 | 1 | Failed |
| 3 | 2 | Failed |
| 4 | 3 | OK |
| 5 | 4 | OK |
| 6 | 1 | Failed |
| 7 | 4 | OK |
| 8 | 12 | OK |
| 9 | 1 | Failed |
| 10 | 1 | Failed |
| 11 | 2 | OK |
| 12 | 1 | Failed |
| 13 | 16 | OK |
+----+--------+--------+
Thanks for your attention, I will be available to clear any doubt regarding my problem! :)
EDIT: Sample of this table here: http://sqlfiddle.com/#!6/ded5af
With this test case:
declare #data table (id int identity, Number int, Status varchar(20));
insert #data(number, status) values
( 1,'OK')
,( 1,'Failed')
,( 2,'Failed')
,( 3,'OK')
,( 4,'OK')
,( 4,'OK') -- to be deleted, ensures IDs are not sequential
,(36,'Failed') -- to be deleted, ensures IDs are not sequential
,(36,'Failed')
,(39,'OK')
,(47,'OK')
,(80,'Failed')
,(110,'Failed')
,(111,'OK')
,(150,'Failed')
,(165,'OK')
;
delete #data where id between 6 and 7;
This SQL:
with renumbered as (
select rn = row_number() over (order by id), data.*
from #data data
),
paired as (
select
this.*,
startNewGroup = case when this.number - prev.number >= 30
or prev.id is null then 1 else 0 end
from renumbered this
left join renumbered prev on prev.rn = this.rn -1
),
groups as (
select Id,Number, GroupNo = Number from paired where startNewGroup = 1
)
select
Id
,Number = 1 + Number - (
select top 1 GroupNo
from groups where groups.id <= paired.id
order by GroupNo desc)
,status
from paired
;
yields as desired:
Id Number status
----------- ----------- --------------------
1 1 OK
2 1 Failed
3 2 Failed
4 3 OK
5 4 OK
8 1 Failed
9 4 OK
10 12 OK
11 1 Failed
12 1 Failed
13 2 OK
14 1 Failed
15 16 OK
Update: using the new LAG() function allows somewhat simpler SQL without a self-join early on:
with renumbered as (
select
data.*
,gap = number - lag(number, 1) over (order by number)
from #data data
),
paired as (
select
*,
startNewGroup = case when gap >= 30 or gap is null then 1 else 0 end
from renumbered
),
groups as (
select Id,Number, GroupNo = Number from paired where startNewGroup = 1
)
select
Id
,Number = 1 + Number - ( select top 1 GroupNo
from groups
where groups.id <= paired.id
order by GroupNo desc
)
,status
from paired
;
I don't deserve answer but I think this is even shorter
with gapped as
( select id, number, gap = number - lag(number, 1) over (order by id)
from #data data
),
select Id, status
ReNumber = Number + 1 - isnull( (select top 1 gapped.Number
from gapped
where gapped.id <= data.id
and gap >= 30
order by gapped.id desc), 1)
from #data data;
This is simply Pieter Geerkens's answer slightly simplified. I removed some intermediate results and columns:
with renumbered as (
select data.*, gap = number - lag(number, 1) over (order by number)
from #data data
),
paired as (
select *
from renumbered
where gap >= 30 or gap is null
)
select Id, Number = 1 + Number - (select top 1 Number
from paired
where paired.id <= renumbered.id
order by Number desc)
, status
from renumbered;
It should have been a comment, but it's too long for that and wouldn't be understandable.
You might need to make another cte before this and use row_number instead of ID to join the recursive cte, if your ID's are not in sequential order
WITH cte AS
( SELECT
Id, [Number], [Status],
0 AS Diff,
[Number] AS [NewNumber]
FROM
Table1
WHERE Id = 1
UNION ALL
SELECT
t1.Id, t1.[Number], t1.[Status],
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN t1.Number - 1 ELSE Diff END,
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN 1 ELSE t1.[Number] - Diff END
FROM Table1 t1
JOIN cte ON cte.Id + 1 = t1.Id
)
SELECT Id, [NewNumber], [Status]
FROM cte
SQL Fiddle
Here is another SQL Fiddle with an example of what you would do if the ID is not sequential..
SQL Fiddle 2
In case sql fiddle stops working
--Order table to make sure there is a sequence to follow
WITH OrderedSequence AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY Id) RnId,
Id,
[Number],
[Status]
FROM
Sequence
),
RecursiveCte AS
( SELECT
Id, [Number], [Status],
0 AS Diff,
[Number] AS [NewNumber],
RnId
FROM
OrderedSequence
WHERE Id = 1
UNION ALL
SELECT
t1.Id, t1.[Number], t1.[Status],
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN t1.Number - 1 ELSE Diff END,
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN 1 ELSE t1.[Number] - Diff END,
t1.RnId
FROM OrderedSequence t1
JOIN RecursiveCte cte ON cte.RnId + 1 = t1.RnId
)
SELECT Id, [NewNumber], [Status]
FROM RecursiveCte
I tried to optimize the queries here, since it took 1h20m to process my data. I had it down to 30s after some further research.
WITH AuxTable AS
( SELECT
id,
number,
status,
relevantId = CASE WHEN
number = 1 OR
((number - LAG(number, 1) OVER (ORDER BY id)) > 29)
THEN id
ELSE NULL
END,
deduct = CASE WHEN
((number - LAG(number, 1) OVER (ORDER BY id)) > 29)
THEN number - 1
ELSE 0
END
FROM #data data
)
,AuxTable2 AS
(
SELECT
id,
number,
status,
AT.deduct,
MAX(AT.relevantId) OVER (ORDER BY AT.id ROWS UNBOUNDED PRECEDING ) AS lastRelevantId
FROM AuxTable AT
)
SELECT
id,
number,
status,
number - MAX(deduct) OVER(PARTITION BY lastRelevantId ORDER BY id ROWS UNBOUNDED PRECEDING ) AS ReNumber,
FROM AuxTable2
I think this runs faster, but it's not shorter.

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
SQL Fiddle
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t
Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
Demo on SQLFiddle
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.