Complex SQL query by pattern with timestamps - sql

I have the following info in my SQLite database:
ID | timestamp | val
1 | 1577644027 | 0
2 | 1577644028 | 0
3 | 1577644029 | 1
4 | 1577644030 | 1
5 | 1577644031 | 2
6 | 1577644032 | 2
7 | 1577644033 | 3
8 | 1577644034 | 2
9 | 1577644035 | 1
10 | 1577644036 | 0
11 | 1577644037 | 1
12 | 1577644038 | 1
13 | 1577644039 | 1
14 | 1577644040 | 0
I want to perform a query that returns the elements that compose an episode. An episode is a set of consecutive rows that meets the following requirements:
The first element is greater than zero.
The previous element of the first one is zero.
The last element is greater than zero.
The next element of the last one is zero.
The expected result of the query on this example would be something like this:
[
[{"id":3, tmstamp:1577644029, value:1}
{"id":4, tmstamp:1577644030, value:1}
{"id":5, tmstamp:1577644031, value:2}
{"id":6, tmstamp:1577644032, value:2}
{"id":7, tmstamp:1577644033, value:3}
{"id":8, tmstamp:1577644034, value:2}
{"id":9, tmstamp:1577644035, value:1}],
[{"id":11, tmstamp:1577644037, value:1}
{"id":12, tmstamp:1577644038, value:1}
{"id":13, tmstamp:1577644039, value:1}]
]
Currently I am avoiding this query: I use an auxiliary table that stores the start and end timestamps of each episode, but only because I do not know how to write the query. Therefore, my question is quite straightforward: does anyone know how I can perform this query in order to obtain something similar to the stated output?

This answer assumes that the "before" and "after" conditions are not really important. That is, an episode can be the first row in the table.
You can identify the episodes by counting the number of 0s before each row. Then filter out the 0 values:
select t.*,
       dense_rank() over (order by grp) as episode
from (select t.*,
             sum(case when val = 0 then 1 else 0 end) over (order by timestamp) as grp
      from t
     ) t
where val <> 0;
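A quick way to sanity-check the query is to run it from Python against an in-memory SQLite database (table and column names match the question; window functions require SQLite 3.25+):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, timestamp INTEGER, val INTEGER)")
vals = [0, 0, 1, 1, 2, 2, 3, 2, 1, 0, 1, 1, 1, 0]
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(i + 1, 1577644026 + i + 1, v) for i, v in enumerate(vals)])

# Rows sharing the same count of preceding zeros belong to the same
# episode; dense_rank() renumbers those groups 1, 2, ...
episodes = con.execute("""
    SELECT id, val, DENSE_RANK() OVER (ORDER BY grp) AS episode
    FROM (SELECT t.*,
                 SUM(CASE WHEN val = 0 THEN 1 ELSE 0 END)
                     OVER (ORDER BY timestamp) AS grp
          FROM t) s
    WHERE val <> 0
""").fetchall()
print(episodes)
# → ids 3-9 get episode 1, ids 11-13 get episode 2
```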
If this is not the case, then lag() and lead() and a cumulative sum can handle the previous value being 0:
select t.*,
       sum(case when prev_val = 0 and val > 0 then 1 else 0 end) over (order by timestamp) as episode
from (select t.*,
             lag(val) over (order by timestamp) as prev_val,
             lead(val) over (order by timestamp) as next_val
      from t
     ) t
where val <> 0;
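The lag() variant can be checked the same way; the episode counter bumps each time a non-zero value directly follows a zero (SQLite 3.25+, same sample data as the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, timestamp INTEGER, val INTEGER)")
vals = [0, 0, 1, 1, 2, 2, 3, 2, 1, 0, 1, 1, 1, 0]
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(i + 1, 1577644026 + i + 1, v) for i, v in enumerate(vals)])

# The episode counter increments whenever a non-zero value follows a zero.
rows = con.execute("""
    SELECT id, val,
           SUM(CASE WHEN prev_val = 0 AND val > 0 THEN 1 ELSE 0 END)
               OVER (ORDER BY timestamp) AS episode
    FROM (SELECT t.*,
                 LAG(val) OVER (ORDER BY timestamp) AS prev_val
          FROM t) s
    WHERE val <> 0
""").fetchall()
print(rows)
# → ids 3-9 get episode 1, ids 11-13 get episode 2
```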

If you want the result as JSON objects, you can use the JSON1 extension functions of SQLite:
with cte as (
  select *, sum(val = 0) over (order by timestamp) as grp
  from tablename
)
select json_group_array(
         json_object('id', id, 'timestamp', timestamp, 'val', val)
       ) as result
from cte
where val > 0
group by grp;
Results:
| result |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [{"id":3,"timestamp":1577644029,"val":1},{"id":4,"timestamp":1577644030,"val":1},{"id":5,"timestamp":1577644031,"val":2},{"id":6,"timestamp":1577644032,"val":2},{"id":7,"timestamp":1577644033,"val":3},{"id":8,"timestamp":1577644034,"val":2},{"id":9,"timestamp":1577644035,"val":1}] |
| [{"id":11,"timestamp":1577644037,"val":1},{"id":12,"timestamp":1577644038,"val":1},{"id":13,"timestamp":1577644039,"val":1}] |
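The JSON1 functions ship with modern SQLite builds, so the whole query can be exercised from Python's sqlite3 module (same sample data as the question):

```python
import sqlite3
import json

con = sqlite3.connect(":memory:")  # JSON1 is built into modern SQLite builds
con.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, timestamp INTEGER, val INTEGER)")
vals = [0, 0, 1, 1, 2, 2, 3, 2, 1, 0, 1, 1, 1, 0]
con.executemany("INSERT INTO tablename VALUES (?, ?, ?)",
                [(i + 1, 1577644026 + i + 1, v) for i, v in enumerate(vals)])

rows = con.execute("""
    WITH cte AS (
        SELECT *, SUM(val = 0) OVER (ORDER BY timestamp) AS grp
        FROM tablename
    )
    SELECT json_group_array(
               json_object('id', id, 'timestamp', timestamp, 'val', val)
           ) AS result
    FROM cte
    WHERE val > 0
    GROUP BY grp
""").fetchall()

# One JSON array per episode
episodes = [json.loads(r[0]) for r in rows]
print([[obj["id"] for obj in ep] for ep in episodes])
```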

Related

query SQL table for the same data in column for 3 times in a row

I have a table
Id, Response
1, Yes
2, Yes
3, No
4, No
5, Yes
6, No
7, No
8, No
I would like to query the table, check for the response "No", and if it occurs 3 times in a row, return a value.
So I am trying
select count(response) where response = no
order by id
Basically, the theory goes: if there are 3 responses of No, I want to trigger something else to happen. So I need to query the table each time an entry is made, and if the last 3 entries are No, return a value.
I only want to know if the latest 3 values are No. For example, if the last 4 entries were No, No, No, Yes, I don't care, as there is a Yes value; the last 3 values have to be No.
I don't know which RDBMS you use, but you can try something like this:
select count(*)
from
(select id,
response
from your_table
order by id desc
limit 3) t
where t.response = 'No';
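This portable approach (just ORDER BY, LIMIT and COUNT) can be verified against the sample data in SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, response TEXT)")
con.executemany("INSERT INTO your_table VALUES (?, ?)",
                list(enumerate(["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"], start=1)))

# Count the 'No's among the three most recent rows; a count of 3 means
# the last three answers were all 'No'.
(n,) = con.execute("""
    SELECT COUNT(*)
    FROM (SELECT id, response FROM your_table ORDER BY id DESC LIMIT 3) t
    WHERE t.response = 'No'
""").fetchone()
print(n == 3)  # → True: ids 6, 7 and 8 are all 'No'
```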
Here is a solution in BigQuery. You may need to tweak the syntax for your SQL dialect:
SELECT *,
       SUM(CASE WHEN response = "No" THEN 1 ELSE 0 END)
           OVER (ORDER BY id RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM dataset
It returns each row together with the running count, which I think is what you want.
The key part is the window function using RANGE BETWEEN 2 PRECEDING AND CURRENT ROW. The CASE expression checks whether each of the current row and the two before it is "No", returning 1 for each match, so when three in a row occur the SUM reaches 3.
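The same idea runs in SQLite (3.25+), using ROWS for the frame (equivalent here since ids are unique and consecutive) and single quotes for the string literal; a small sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, response TEXT)")
con.executemany("INSERT INTO dataset VALUES (?, ?)",
                list(enumerate(["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"], start=1)))

# Running count of 'No' over the current row and the two before it.
rows = con.execute("""
    SELECT id,
           SUM(CASE WHEN response = 'No' THEN 1 ELSE 0 END)
               OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS no_run
    FROM dataset
""").fetchall()
print(rows)  # → [(1, 0), (2, 0), (3, 1), (4, 2), (5, 2), (6, 2), (7, 2), (8, 3)]
```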
I would use two lag()s:
select t.*
from (select t.*,
             lag(id, 2) over (order by id) as prev2_id,
             lag(id, 2) over (partition by response order by id) as prev2_id_response
      from t
     ) t
where response = 'No' and prev2_id = prev2_id_response;
The first lag() determines the id "2 back". The second determines the id "2 back" among rows with the same response. If the response is the same for all three rows, the two ids are equal.
This returns each occurrence of "no" where this occurs. You can use exists if you just want to know if this ever occurs.
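A sketch of the two-lag approach in SQLite, using the question's data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, response TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                list(enumerate(["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"], start=1)))

# prev2_id looks two rows back overall; prev2_id_response looks two rows
# back among rows with the same response.  They coincide only when the
# current row and the two before it share one response.
rows = con.execute("""
    SELECT id
    FROM (SELECT t.*,
                 LAG(id, 2) OVER (ORDER BY id) AS prev2_id,
                 LAG(id, 2) OVER (PARTITION BY response ORDER BY id)
                     AS prev2_id_response
          FROM t) s
    WHERE response = 'No' AND prev2_id = prev2_id_response
""").fetchall()
print(rows)  # → [(8,)]: only id 8 closes a run of three consecutive 'No's
```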
This can be done with window functions and a derived table or CTE term. The following takes you through how it can be done, step by step:
Full Example with data
WITH cte1 AS (
SELECT x.*
, CASE WHEN COALESCE(LAG(response) OVER (ORDER BY id), 'NA') <> response THEN 1 ELSE 0 END AS edge
FROM xlogs AS x
)
, cte2 AS (
SELECT x.*
, SUM(edge) OVER (ORDER BY id) AS xgroup
FROM cte1 AS x
)
, cte3 AS (
SELECT x.*
, ROW_NUMBER() OVER (PARTITION BY xgroup ORDER BY id) AS xposition
FROM cte2 AS x
)
, cte4 AS (
SELECT x.*
, CASE WHEN xposition >= 3 AND response = 'No' THEN 1 END AS xtrigger
FROM cte3 AS x
)
, cte5 AS (
SELECT x.*
FROM cte4 AS x
ORDER BY id DESC
LIMIT 1
)
SELECT *
FROM cte5
WHERE response = 'No'
;
The result of cte4 provides useful detail about the logic:
+----+----------+------+--------+-----------+----------+
| id | response | edge | xgroup | xposition | xtrigger |
+----+----------+------+--------+-----------+----------+
| 1 | Yes | 1 | 1 | 1 | NULL |
| 2 | Yes | 0 | 1 | 2 | NULL |
| 3 | No | 1 | 2 | 1 | NULL |
| 4 | No | 0 | 2 | 2 | NULL |
| 5 | Yes | 1 | 3 | 1 | NULL |
| 6 | No | 1 | 4 | 1 | NULL |
| 7 | No | 0 | 4 | 2 | NULL |
| 8 | No | 0 | 4 | 3 | 1 |
+----+----------+------+--------+-----------+----------+
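The whole CTE chain also runs unchanged in SQLite (3.25+), so the stepwise logic can be verified end to end; the final query returns a row only when the latest entry ends a run of at least three 'No's:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE xlogs (id INTEGER PRIMARY KEY, response TEXT)")
con.executemany("INSERT INTO xlogs VALUES (?, ?)",
                list(enumerate(["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"], start=1)))

result = con.execute("""
    WITH cte1 AS (
        SELECT x.*,
               CASE WHEN COALESCE(LAG(response) OVER (ORDER BY id), 'NA') <> response
                    THEN 1 ELSE 0 END AS edge
        FROM xlogs AS x
    ), cte2 AS (
        SELECT x.*, SUM(edge) OVER (ORDER BY id) AS xgroup FROM cte1 AS x
    ), cte3 AS (
        SELECT x.*, ROW_NUMBER() OVER (PARTITION BY xgroup ORDER BY id) AS xposition
        FROM cte2 AS x
    ), cte4 AS (
        SELECT x.*, CASE WHEN xposition >= 3 AND response = 'No' THEN 1 END AS xtrigger
        FROM cte3 AS x
    ), cte5 AS (
        SELECT x.* FROM cte4 AS x ORDER BY id DESC LIMIT 1
    )
    SELECT * FROM cte5 WHERE response = 'No'
""").fetchall()
print(result)  # → [(8, 'No', 0, 4, 3, 1)]
```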

Sql assign unique key to groups having particular pattern

Hi I was trying to group data based on a particular pattern.
I have a table with two column as below,
Name rollingsum
A 5
A 10
A 0
A 5
A 0
B 6
B 0
I need to generate a key column that increments only after a rollingsum of 0 is encountered, as given below:
Name rollingsum key
A 5 1
A 10 1
A 0 1
A 5 2
A 0 2
B 6 3
B 0 3
I am using Postgres. I tried to increment a variable in a CASE expression, as below:
Declare a int;
a:=1;
........etc
Case when rollingsum = 0 then a:=a+1 else a end as key
But I am getting an error near ':='.
Thanks in advance for all help
You need an ordering column, because the results depend on the ordering of the rows -- and SQL tables represent unordered sets.
Then do a cumulative sum of the zero counts from the end of the data. That sum runs in reverse order, so subtract it from the total:
select t.*,
       (1 + sum( (rollingsum = 0)::int ) over () -
        sum( (rollingsum = 0)::int ) over (order by ordercol desc)
       ) as key
from t;
Assuming that you have a column called id to order the rows, here is one option using a cumulative count and a window frame:
select name, rollingsum,
1 + count(*) filter(where rollingsum = 0) over(
order by id
rows between unbounded preceding and 1 preceding
) as key
from mytable
Demo on DB Fiddle:
name | rollingsum | key
:--- | ---------: | --:
A | 5 | 1
A | 10 | 1
A | 0 | 1
A | 5 | 2
A | 0 | 2
B | 6 | 3
B | 0 | 3
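The filtered-count query is not Postgres-only: SQLite also supports the FILTER clause on window functions (since 3.30), so the result table above can be reproduced directly:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # FILTER on window functions needs SQLite 3.30+
con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, rollingsum INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                [(1, 'A', 5), (2, 'A', 10), (3, 'A', 0), (4, 'A', 5),
                 (5, 'A', 0), (6, 'B', 6), (7, 'B', 0)])

# Count the zeros strictly before each row; adding 1 gives the key.
rows = con.execute("""
    SELECT name, rollingsum,
           1 + COUNT(*) FILTER (WHERE rollingsum = 0) OVER (
                   ORDER BY id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS key
    FROM mytable
""").fetchall()
print(rows)
# → [('A', 5, 1), ('A', 10, 1), ('A', 0, 1), ('A', 5, 2),
#    ('A', 0, 2), ('B', 6, 3), ('B', 0, 3)]
```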

SQL select all rows per group after a condition is met

I would like to select all rows for each group after the last time a condition is met for that group. This related question has an answer using correlated subqueries.
In my case I will have millions of categories and hundreds of millions/billions of rows. Is there a way to achieve the same results using a more performant query?
Here is an example. The condition is all rows (per group) after the last 0 in the conditional column.
category | timestamp | condition
--------------------------------------
A | 1 | 0
A | 2 | 1
A | 3 | 0
A | 4 | 1
A | 5 | 1
B | 1 | 0
B | 2 | 1
B | 3 | 1
The result I would like to achieve is
category | timestamp | condition
--------------------------------------
A | 4 | 1
A | 5 | 1
B | 2 | 1
B | 3 | 1
If you want everything after the last 0, you can use window functions:
select t.*
from (select t.*,
max(case when condition = 0 then timestamp end) over (partition by category) as max_timestamp_0
from t
) t
where timestamp > max_timestamp_0 or
max_timestamp_0 is null;
With an index on (category, condition, timestamp), the correlated subquery version might also perform quite well:
select t.*
from t
where t.timestamp > all (select t2.timestamp
from t t2
where t2.category = t.category and
t2.condition = 0
);
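A sketch of the window-function version against the question's sample data (SQLite syntax; the correlated version behaves the same way):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (category TEXT, timestamp INTEGER, condition INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [('A', 1, 0), ('A', 2, 1), ('A', 3, 0), ('A', 4, 1), ('A', 5, 1),
                 ('B', 1, 0), ('B', 2, 1), ('B', 3, 1)])

# Keep rows that come after the last 0 of their category (all rows if
# the category never had a 0).
rows = con.execute("""
    SELECT category, timestamp, condition
    FROM (SELECT t.*,
                 MAX(CASE WHEN condition = 0 THEN timestamp END)
                     OVER (PARTITION BY category) AS max_timestamp_0
          FROM t) s
    WHERE timestamp > max_timestamp_0 OR max_timestamp_0 IS NULL
    ORDER BY category, timestamp
""").fetchall()
print(rows)  # → [('A', 4, 1), ('A', 5, 1), ('B', 2, 1), ('B', 3, 1)]
```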
You might want to try window functions:
select category, timestamp, condition
from (
select
t.*,
min(condition) over(partition by category order by timestamp desc) min_cond
from mytable t
) t
where min_cond = 1
The window min() with the order by clause computes the minimum value of condition over the current and following rows of the same category: we can use it as a filter to eliminate rows for which there is a more recent row with a 0.
Compared to the correlated subquery approach, the upside of using window functions is that it reduces the number of scans needed on the table. Of course this computation also has a cost, so you'll need to assess both solutions against your sample data.
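The min()-over-descending-order trick can likewise be checked in SQLite against the sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (category TEXT, timestamp INTEGER, condition INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                [('A', 1, 0), ('A', 2, 1), ('A', 3, 0), ('A', 4, 1), ('A', 5, 1),
                 ('B', 1, 0), ('B', 2, 1), ('B', 3, 1)])

# With the window ordered descending, min_cond covers the current row and
# all later ones; it is 1 only when no 0 follows.
rows = con.execute("""
    SELECT category, timestamp, condition
    FROM (SELECT t.*,
                 MIN(condition) OVER (PARTITION BY category
                                      ORDER BY timestamp DESC) AS min_cond
          FROM mytable t) s
    WHERE min_cond = 1
    ORDER BY category, timestamp
""").fetchall()
print(rows)  # → [('A', 4, 1), ('A', 5, 1), ('B', 2, 1), ('B', 3, 1)]
```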

in SQL, how to remove distinct column values (not rows, as usually done)

I have a production case, for a supply chain. We have devices that are moved around in warehouses, and I need to find the previous warehouse locations.
I have a table like this:
+--------+------------+--------+--------+--------+
| device | current_WH | prev_1 | prev_2 | prev_3 |
+--------+------------+--------+--------+--------+
| 1 | AB | KK | KK | KK |
| 2 | DE | DE | DE | NQ |
| 3 | FF | MM | ST | ST |
+--------+------------+--------+--------+--------+
I need to find the distinct values of current_WH and the "prev" columns. So I'm not flattening rows, but narrowing columns. I need to get this:
+--------+------------+--------+--------+--------+
| device | current_WH | prev_1 | prev_2 | prev_3 |
+--------+------------+--------+--------+--------+
| 1 | AB | KK | blank | blank |
| 2 | DE | NQ | blank | blank |
| 3 | FF | MM | ST | blank |
+--------+------------+--------+--------+--------+
I'll figure out nulls or blanks later. But for now I need one row for each device that shows the current WH and previous locations. There could be any number - not always the same.
If I do "distinct" that flattens rows. Doing a distinct and group by doesn't achieve the requirement.
Any help is appreciated. Thanks!
You can unpivot the columns into rows, which makes it easier to compare each value against the current_WH value, and then pivot back to recover the original schema:
Unpivot the columns into rows, adding a grp column that records each column's position so the expected layout can be rebuilt later.
Use the LAG function to get the previous occurrence of a value, to compare against the current one.
Use SUM with CASE WHEN as a window function to accumulate the number of repeats.
If the cumulative count is greater than 0, the name was repeated.
It looks like this:
with cteUnion as (
  SELECT device, current_WH, 0 AS grp FROM T
  UNION ALL
  SELECT device, prev_1, 1 FROM T
  UNION ALL
  SELECT device, prev_2, 2 FROM T
  UNION ALL
  SELECT device, prev_3, 3 FROM T
), cte1 as (
  SELECT *,
         LAG(current_WH) OVER (PARTITION BY device, current_WH ORDER BY grp) AS previousVal
  FROM cteUnion
), cteResult as (
  SELECT *,
         (CASE WHEN SUM(CASE WHEN previousVal = current_WH THEN 1 ELSE 0 END)
                        OVER (PARTITION BY device ORDER BY grp) > 0
               THEN 'blank' ELSE current_WH END) AS val
  FROM cte1
)
SELECT device,
       MAX(CASE WHEN grp = 0 THEN val END) AS current_WH,
       MAX(CASE WHEN grp = 1 THEN val END) AS prev_1,
       MAX(CASE WHEN grp = 2 THEN val END) AS prev_2,
       MAX(CASE WHEN grp = 3 THEN val END) AS prev_3
FROM cteResult
GROUP BY device;
NOTE: the grp column values depend on your column order.

SQL Pattern Length in New Column

I have a (very very large) table of similar format to the following:
+--------+-------+
| id | value |
+--------+-------+
| 1 | 5 |
| 2 | 6 |
| 3 | 6 |
| 4 | 4 |
| 5 | 3 |
| 6 | 2 |
| 7 | 4 |
| 8 | 5 |
+--------+-------+
What I'd like to be able to do is return the pattern length of the value column increasing or decreasing in a third column (with pattern being negative for decreasing and positive for increasing), while ignoring IDs where there is no change. The pattern should reset to 1 or -1 when the pattern is broken.
I've not explained that well at all, so with the table above, ideally the result would be:
+--------+-------+---------+
| id | value | pattern |
+--------+-------+---------+
| 1 | 5 | 0/NULL |
| 2 | 6 | 1 |
| 3 | 6 | 1 |
| 4 | 4 | -1 |
| 5 | 3 | -2 |
| 6 | 2 | -3 |
| 7 | 4 | 1 |
| 8 | 5 | 2 |
+--------+-------+---------+
I did some research and came across pattern matching, but it turns out either the version of SQL I'm using (the one used by Amazon Redshift, which according to them is 'based on' PostgreSQL 8.0.2: http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html) doesn't support it, or I'm being very silly.
So, is this something that is even possible with SQL, and if so how should I go about it? Many thanks.
You can do this with lead() and lag() and a cumulative sum.
Something that comes quite close is this:
select t.*, sum(nextinc) over (order by id) as pattern
from (select t.*,
             (case when lead(t.value) over (order by id) > t.value then 1
                   when lead(t.value) over (order by id) = t.value then 0
                   else -1
              end) as nextinc,
             (case when lag(t.value) over (order by id) > t.value then 1 else 0 end) as previnc
      from t
     ) t;
However, the pattern goes up and down in increments of 1 instead of starting over. So, we need to find the pattern breaks. The following defines the breaks in the pattern and then increments pattern for sequences of increasing/decreasing values:
select t.*,
       sum(nextinc) over (partition by grp order by id) as pattern
from (select t.*,
             sum(case when (prev_value <= value and value <= next_value) or
                           (prev_value >= value and value >= next_value)
                 then 0 else 1
                 end) over (order by id) as grp
      from (select t.*,
                   lead(t.value) over (order by id) as next_value,
                   lag(t.value) over (order by id) as prev_value,
                   (case when lead(t.value) over (order by id) > t.value then 1
                         when lead(t.value) over (order by id) = t.value then 0
                         else -1
                    end) as nextinc
            from t
           ) t
     ) t;
For the given example, the following seems to do the job:
SELECT
S3.id
, S3.value
, S3.pattern
, SUM(minusNullPlus) OVER (PARTITION BY sequenceID ORDER BY id) calculated
FROM
(SELECT
S2.*
, SUM(newSequence) OVER (ORDER BY id) sequenceID
FROM
(SELECT
S1.*
, CASE
WHEN minusNullPlus = LAG(minusNullPlus, 1, NULL) OVER (ORDER BY id)
OR
minusNullPlus = 0
OR
(minusNullPlus = 1
AND
value - LAG(value, 1, NULL) OVER (ORDER BY id) = 1
)
OR
(minusNullPlus = -1
AND
value - LAG(value, 1, NULL) OVER (ORDER BY id) = -1
)
THEN 0
ELSE 1
END newSequence
FROM
(SELECT
id
, value
, CASE
WHEN value > LAG(value, 1, NULL) OVER (ORDER BY id) THEN 1
WHEN value < LAG(value, 1, NULL) OVER (ORDER BY id) THEN -1
WHEN value = LAG(value, 1, NULL) OVER (ORDER BY id) THEN 0
ELSE 0
END minusNullPlus
, CASE
WHEN value - LAG(value, 1, NULL) OVER (ORDER BY id) = 0 THEN 0
ELSE 1
END change
, pattern
FROM SomeTable
) S1
) S2
) S3
ORDER BY id
;
See it in action: SQL Fiddle
It uses some additional data to check against - please verify that the respective patterns are actually in line with your expectations/requirements.
NB: The suggested solution relies on some of the particularities of the provided sample data (and its expansion in above SQL Fiddle).
Please comment, if and as adjustment / further detail is required.