Vertica/SQL: Getting rows immediately preceding events

Consider a simple query
select * from tbl where status = 'MELTDOWN'
I would like to now create a table that in addition to including these rows, also includes the previous p rows and the subsequent n rows, so that I can get a sense as to what happens in the surrounding time of these MELTDOWNs. Any hints?

You can do this with window functions by getting the seqnum of the meltdown rows. I prefer to do this with lag()/lead() ignore nulls, but Vertica doesn't support that. I think this is the equivalent with first_value()/last_value():
with t as (
select t.*, row_number() over (order by id) as seqnum
from tbl t
),
tt as (
select t.*,
last_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between unbounded preceding and current row) as prev_meltdown_seqnum,
first_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between current row and unbounded following) as next_meltdown_seqnum
from t
)
select tt.*
from tt
where seqnum between prev_meltdown_seqnum and prev_meltdown_seqnum + 7 or
seqnum between next_meltdown_seqnum - 5 and next_meltdown_seqnum;
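To make the two window expressions concrete, here is a minimal Python sketch of the same idea: for every row, find the nearest event at or before it and the nearest event at or after it, then keep the rows that fall inside either window. The function name `surrounding`, its parameters `p` and `n`, and the sample list are hypothetical, chosen only for illustration.

```python
def surrounding(statuses, p, n, event="meltdown"):
    """Return indices of rows within p rows before or n rows after any event row."""
    size = len(statuses)

    # prev_event[i]: index of the latest event at or before row i
    # (plays the role of last_value(... ignore nulls) up to the current row).
    prev_event = [None] * size
    last = None
    for i, status in enumerate(statuses):
        if status == event:
            last = i
        prev_event[i] = last

    # next_event[i]: index of the earliest event at or after row i
    # (plays the role of first_value(... ignore nulls) from the current row on).
    next_event = [None] * size
    nxt = None
    for i in range(size - 1, -1, -1):
        if statuses[i] == event:
            nxt = i
        next_event[i] = nxt

    keep = []
    for i in range(size):
        after = prev_event[i] is not None and i <= prev_event[i] + n
        before = next_event[i] is not None and i >= next_event[i] - p
        if after or before:
            keep.append(i)
    return keep


statuses = ["ok", "ok", "ok", "meltdown", "ok", "ok", "ok", "ok", "meltdown", "ok"]
print(surrounding(statuses, p=1, n=2))  # [2, 3, 4, 5, 7, 8, 9]
```

Rows before the first event have no previous event and rows after the last event have no next event, which is why both comparisons guard against `None`, just as the SQL comparisons evaluate to NULL in those cases.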

WITH
grouped AS
(
SELECT
SUM(
CASE WHEN status = 'Meltdown' THEN 1 ELSE 0 END
)
OVER (
ORDER BY timeStamp
)
AS GroupID,
tbl.*
FROM
tbl
),
sorted AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp ASC ) AS incPos,
ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp DESC) AS decPos,
MAX(GroupID) OVER () AS LastGroup,
grouped.*
FROM
grouped
)
SELECT
sorted.*
FROM
sorted
WHERE
(incPos <= 8 AND GroupID > 0 ) -- Meltdown and the 7 events following it
OR (decPos <= 6 AND GroupID <> LastGroup) -- and the 6 events preceding a Meltdown
ORDER BY
timeStamp
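The grouped/sorted query above can be tried on a small data set. The sketch below runs it through Python's standard-library sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (needed for window functions). The table contents, the helper name `rows_around_events`, and the small window sizes are made up for the demonstration; the SQL follows the same structure as the answer.

```python
import sqlite3

QUERY = """
WITH grouped AS (
    SELECT SUM(CASE WHEN status = 'Meltdown' THEN 1 ELSE 0 END)
               OVER (ORDER BY timeStamp) AS GroupID,
           tbl.*
    FROM tbl
),
sorted AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp ASC)  AS incPos,
           ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp DESC) AS decPos,
           MAX(GroupID) OVER () AS LastGroup,
           grouped.*
    FROM grouped
)
SELECT timeStamp
FROM sorted
WHERE (incPos <= ? AND GroupID > 0)          -- the event and the rows following it
   OR (decPos <= ? AND GroupID <> LastGroup) -- the rows preceding the next event
ORDER BY timeStamp
"""


def rows_around_events(statuses, before, after):
    """Return the timeStamp values kept for `before` rows ahead of and `after` rows behind each event."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (timeStamp INTEGER, status TEXT)")
    # Hypothetical data: timeStamp is simply the 1-based position in the list.
    con.executemany("INSERT INTO tbl VALUES (?, ?)", list(enumerate(statuses, start=1)))
    kept = [row[0] for row in con.execute(QUERY, (after + 1, before))]
    con.close()
    return kept


statuses = ["ok", "ok", "ok", "Meltdown", "ok", "ok", "ok", "Meltdown", "ok"]
print(rows_around_events(statuses, before=1, after=1))  # [3, 4, 5, 7, 8, 9]
```

The running SUM gives every row the number of events seen so far, so each event starts a new group; counting from the start of a group reaches the rows after the event, and counting from the end reaches the rows before the next one.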

Related

Get Earliest Date corresponding to the latest occurrence of a recurring name

I have a table with Name and Date columns. I want to get the earliest date when the current name appeared. For example:
Name    Date
X       30-Jan-2021
X       29-Jan-2021
X       28-Jan-2021
Y       27-Jan-2021
Y       26-Jan-2021
Y       25-Jan-2021
Y       24-Jan-2021
X       23-Jan-2021
X       22-Jan-2021
Now when I try to get the earliest date on which the current name (X) started to appear, I want 28-Jan-2021, but my SQL query gives 22-Jan-2021 because that is when X appeared for the very first time.
Update: This was the query I was using:
Select min(Date) from myTable where Name='X'
I am using older SQL Server 2008 (in the process of upgrading), so do not have access to LEAD/LAG functions.
The solutions suggested below do work as intended. Thanks.
This is a type of gaps-and-islands problem.
There are many solutions. Here is one that is optimized for your case:
Use LEAD/LAG to identify the first row in each grouping
Filter to only those rows
Number those rows and take the first one
WITH StartPoints AS (
SELECT *,
IsStart = CASE WHEN Name <> LEAD(Name, 1, '') OVER (ORDER BY Date DESC) THEN 1 END
FROM YourTable
),
Numbered AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date DESC)
FROM StartPoints
WHERE IsStart = 1 AND Name = 'X'
)
SELECT
Name, Date
FROM Numbered
WHERE rn = 1;
For SQL Server 2008 or earlier (which I strongly suggest you upgrade from), you can use a self-join with row-numbering to simulate LEAD/LAG
WITH RowNumbered AS (
SELECT *,
AllRn = ROW_NUMBER() OVER (ORDER BY Date ASC)
FROM YourTable
),
StartPoints AS (
SELECT r1.*,
IsStart = CASE WHEN r1.Name <> ISNULL(r2.Name, '') THEN 1 END
FROM RowNumbered r1
LEFT JOIN RowNumbered r2 ON r2.AllRn = r1.AllRn - 1
),
Numbered AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date DESC)
FROM StartPoints
WHERE IsStart = 1
)
SELECT
Name, Date
FROM Numbered
WHERE rn = 1;
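As a quick check of the self-join approach, the sketch below loads the sample rows from the question into an in-memory SQLite database and runs an adapted version of the query. This is an adaptation, not the SQL Server code itself: the `alias = expression` syntax and `ISNULL` are replaced with `AS` and `COALESCE`, the dates are stored as ISO strings so they sort correctly, and an `ORDER BY Name` is added to make the output order deterministic. It assumes SQLite 3.25 or newer for `ROW_NUMBER()`.

```python
import sqlite3

SAMPLE = [
    ("X", "2021-01-30"), ("X", "2021-01-29"), ("X", "2021-01-28"),
    ("Y", "2021-01-27"), ("Y", "2021-01-26"), ("Y", "2021-01-25"), ("Y", "2021-01-24"),
    ("X", "2021-01-23"), ("X", "2021-01-22"),
]

QUERY = """
WITH RowNumbered AS (
    SELECT Name, Date, ROW_NUMBER() OVER (ORDER BY Date ASC) AS AllRn
    FROM YourTable
),
StartPoints AS (
    -- A row starts a run when the previous row (by date) has a different name.
    SELECT r1.Name, r1.Date,
           CASE WHEN r1.Name <> COALESCE(r2.Name, '') THEN 1 END AS IsStart
    FROM RowNumbered r1
    LEFT JOIN RowNumbered r2 ON r2.AllRn = r1.AllRn - 1
),
Numbered AS (
    SELECT Name, Date,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date DESC) AS rn
    FROM StartPoints
    WHERE IsStart = 1
)
SELECT Name, Date FROM Numbered WHERE rn = 1 ORDER BY Name
"""


def latest_run_starts(rows):
    """Return (Name, Date) for the start of the most recent run of each name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (Name TEXT, Date TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(latest_run_starts(SAMPLE))  # [('X', '2021-01-28'), ('Y', '2021-01-24')]
```

For X this returns 28-Jan-2021 rather than 22-Jan-2021, which is the result the question asks for.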
This is a gaps-and-islands problem. Based on the sample data, this will work:
WITH Groups AS(
SELECT YT.[Name],
YT.[Date],
ROW_NUMBER() OVER (ORDER BY YT.Date DESC) -
ROW_NUMBER() OVER (PARTITION BY YT.[Name] ORDER BY Date DESC) AS Grp
FROM dbo.YourTable YT),
FirstGroup AS(
SELECT TOP (1) WITH TIES
G.[Name],
G.[Date]
FROM Groups G
WHERE [Name] = 'X'
ORDER BY Grp ASC)
SELECT MIN(FG.[Date]) AS Mi
FROM FirstGroup FG;
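The Grp column relies on the difference between two row numbers staying constant within a run of identical names. A minimal Python sketch of that arithmetic, using the names from the sample data in newest-first order (the function name `island_keys` is hypothetical):

```python
from collections import defaultdict


def island_keys(names):
    """For each row, return overall row number minus per-name row number."""
    seen = defaultdict(int)  # per-name row counter, like PARTITION BY Name
    keys = []
    for overall_rn, name in enumerate(names, start=1):
        seen[name] += 1
        keys.append(overall_rn - seen[name])
    return keys


# Sample data from the question, ordered by Date DESC.
names = ["X", "X", "X", "Y", "Y", "Y", "Y", "X", "X"]
print(island_keys(names))  # [0, 0, 0, 3, 3, 3, 3, 4, 4]
```

The most recent run of X gets key 0 and the older run gets key 4, so ordering by the key ascending and keeping ties selects the newest island. The key is only meaningful together with the name, since two different names can share the same difference.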
If I understood correctly, you want to know when X disappeared and then reappeared. In that case you can search for gaps in the dates within each name.
Here is an example of how to detect that:
SELECT name
,DATE
FROM (
SELECT *
,DATEDIFF(day, lead(DATE) OVER (
PARTITION BY name ORDER BY DATE DESC
), DATE) DIF
FROM YourTable
) a
WHERE DIF > 1

get first occurrence of last changed value of a column for each unique id

How can I get the first occurrence of the last changed value of the "sval" column?
For id = 22, 71 is the last changed value, so I want to fetch the first occurrence of 71.
Likewise, for id = 25, 74 is the last changed value, so I want to fetch the first occurrence of 74.
https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=c980809154d41f2accc9f14d569b48f1
data: (image in the original post, with the rows I want to fetch highlighted)
try:
with LastValue as (
select t.sval
from test t
order by t.date desc
limit 1
)
select t.*
from test t
where t.sval = (select sval from LastValue)
and t.date > (select max(tt.date) from test tt where tt.sval <> (select sval from LastValue))
order by t.date asc
limit 1;
Actually, I don't want the first occurrence per sval group. I want the first occurrence of whatever sval last changed to, so in our example the highlighted rows should be returned for ids (22, 25).
WITH
cte1 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY `date` DESC) rn1,
ROW_NUMBER() OVER (PARTITION BY id, sval ORDER BY `date` DESC) rn2
FROM test ),
cte2 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY `date` ASC) rn3
FROM cte1
WHERE rn1 = rn2 )
SELECT id, date, sval
FROM cte2
WHERE rn3 = 1;
https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=a25569690e4b35a55b0bee13856eb724
One method for doing this uses lag() to check for a difference and then chooses the last point where there is a difference:
select t.*
from (select t.*,
row_number() over (partition by id order by date desc) as seqnum
from (select t.*,
lag(sval) over (partition by id order by date) as prev_sval
from test t
) t
where prev_sval is null or prev_sval <> sval
) t
where seqnum = 1;
Very importantly: This returns the last time there was a change even when the value returns to an already seen value for the id. That is how I interpret your question.
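The lag()-based logic can be sketched procedurally as follows. The rows below are invented for illustration (the original data was only shown as an image), and the function name `last_change` is hypothetical; the point is that a row counts as a change when the previous value for the same id is missing or different, and the latest such row wins.

```python
from itertools import groupby


def last_change(rows):
    """rows: (id, date, sval) tuples. Return {id: (date, sval)} for the last change per id."""
    result = {}
    # Sorting the tuples orders by id first, then by date within each id.
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        prev = None
        for _id, day, sval in group:
            if prev is None or prev != sval:  # lag(sval) is null or differs from sval
                result[key] = (day, sval)     # later changes overwrite earlier ones
            prev = sval
    return result


# Hypothetical data: id 25 returns to a value (74) it has already had.
rows = [
    (22, "2022-01-01", 70), (22, "2022-01-02", 70),
    (22, "2022-01-03", 71), (22, "2022-01-04", 71),
    (25, "2022-01-01", 74), (25, "2022-01-02", 73),
    (25, "2022-01-03", 74), (25, "2022-01-04", 74),
]
print(last_change(rows))  # {22: ('2022-01-03', 71), 25: ('2022-01-03', 74)}
```

For id 25 the result is the row from 2022-01-03, not the first ever occurrence of 74, which matches the interpretation stated above.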

How to retrieve MAX Turntime of Top Two earliest date?

How would I construct a query to return the MAX TurnTime per ID over the first 2 rounds? A round is defined as running from the minimum Beginning_Date to the minimum End_Date of an ID, and neither of those dates may be reused when calculating the Turn Time of the second round.
You can use row_number() . . . twice:
select d.*
from (select d.*,
row_number() over (partition by id order by turn_time desc) as seqnum_turntime
from (select d.*,
row_number() over (partition by id order by beginning_date) as seqnum_round
from data d
) d
where seqnum_round <= 2
) d
where seqnum_turntime = 1;
The innermost subquery gets the first two rounds. The outer subquery gets the maximum.
You could also pick the first two rounds with a correlated subquery instead of a nested row_number():
select top (1) with ties d.*
from data d
where d.beginning_date <= (select d2.beginning_date
from data d2
where d2.id = d.id
order by d2.beginning_date
offset 1 row fetch first 1 row only
)
order by row_number() over (partition by id order by turn_time desc);
SELECT
ID
,turn_time
,beginning_date
,end_date
FROM
(
SELECT
ID
,MAX(turn_time) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS turn_time --Maximum turn time of the current row and preceding row
,MIN(BeginningDate) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS beginning_date --Minimum begin date over current row and preceding row (could also use LAG)
,end_date
,ROW_NUMBER() OVER (PARTITION BY Id ORDER BY BeginningDate) AS Turn_Number
FROM
<whatever your table is>
) turn_summary
WHERE
Turn_Number = 2

sparksql get previous not null value in a column

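I am trying to get the previous row's value of a column, skipping nulls, in Spark SQL.
Since ignore nulls is not available, I cannot write this directly:
lag( val ignore nulls) over ()
The alternative is to assign each row to a group using a running count of non-null values and then take the max of the value within each group: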
SELECT id, val, val_partition, MAX(val) over (partition by val_partition)
FROM (
SELECT
id,
val,
sum(case when val is null then 0 else 1 end) over (order by id rows unbounded preceding) as val_partition
FROM base
)
Is there a more optimized way to do this?
Your method is fine, but can be written more concisely as:
SELECT id, val, grp,
MAX(val) over (PARTITION BY grp)
FROM (SELECT b.*,
COUNT(val) over (ORDER BY id) as grp
FROM base b
) b;
Note as well that the windowing clause is not needed if id is unique.
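To see why a running count of non-null values works as a grouping key, here is a minimal Python sketch of the same computation. The function name `fill_forward` and the sample list are hypothetical; the list is assumed to be already ordered by id.

```python
def fill_forward(vals):
    """Carry the most recent non-null value forward, mirroring COUNT(val) OVER (ORDER BY id)."""
    grp = 0       # running count of non-null values seen so far
    latest = {}   # the single non-null value that opens each group
    out = []
    for v in vals:
        if v is not None:
            grp += 1          # a non-null value starts a new group
            latest[grp] = v
        out.append(latest.get(grp))  # MAX(val) OVER (PARTITION BY grp)
    return out


print(fill_forward([None, 5, None, None, 7, None]))  # [None, 5, 5, 5, 7, 7]
```

Each group contains exactly one non-null value followed by the nulls after it, so the MAX over the group is simply that value. Rows before the first non-null value fall into group 0 and stay null.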

combine row groups within time range

I am facing a problem in a query and have been stuck for quite some time now. Here is the situation: I have a table with certain records in it, each delimited by ValidFrom and ValidTo. This table tracks data changes of another table: every time the underlying data changes, the last valid record is terminated and an insert is performed. Here is a SQL Fiddle example:
http://sqlfiddle.com/#!18/15c7f/4/0
What I am trying to achieve is to group records with identical flags within a timespan into one record. In my fiddle example, I would expect the first two records to be combined into one record with ValidFrom 2017-01-01 and ValidTo 2017-01-10.
Anyway, I am severely stuck. I have tried numerous approaches I found here and in another forum, without success. One approach is included in the fiddle: evaluate the row number ordered by date and subtract the row number partitioned by the flag columns. Nothing has worked so far.
Any help would be highly appreciated.
Try this query:
select keycol, min(validfrom), max(validto), flag1, flag2 from
(
select *,
sum(iscontinuation) over (partition by keycol order by validfrom rows between UNBOUNDED PRECEDING AND CURRENT ROW) [GroupingCol]
from (
select *,
case when
lag(validto) over (partition by keycol order by validfrom) = dateadd(day, -1, validfrom) and
lag(flag1) over (partition by keycol order by validfrom) = flag1 and
lag(flag2) over (partition by keycol order by validfrom) = flag2 then 0
else 1 end [IsContinuation]
from t
) a
) b group by keycol, flag1, flag2, groupingcol
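The query flags each row that does not continue the previous one, turns the running sum of those flags into a group number, and aggregates per group. A minimal Python sketch of the same merging rule is shown below; the rows are hypothetical (only the first two mirror the expected result described in the question), and the function name `merge_periods` is made up for illustration.

```python
from datetime import date, timedelta


def merge_periods(rows):
    """rows: (keycol, validfrom, validto, flag1, flag2). Merge adjacent rows with equal flags."""
    merged = []
    for key, start, end, f1, f2 in sorted(rows):  # order by keycol, then validfrom
        if merged:
            pkey, pstart, pend, pf1, pf2 = merged[-1]
            continues = (
                (pkey, pf1, pf2) == (key, f1, f2)
                and pend + timedelta(days=1) == start  # previous validto is the day before validfrom
            )
            if continues:
                # Extend the current group: keep its first validfrom, take the new validto.
                merged[-1] = (pkey, pstart, end, pf1, pf2)
                continue
        merged.append((key, start, end, f1, f2))  # start a new group
    return merged


rows = [
    (1, date(2017, 1, 1), date(2017, 1, 5), 0, 1),
    (1, date(2017, 1, 6), date(2017, 1, 10), 0, 1),
    (1, date(2017, 1, 11), date(2017, 1, 20), 1, 1),
]
for period in merge_periods(rows):
    print(period)
```

With this input the first two rows collapse into one period from 2017-01-01 to 2017-01-10, and the third row stays separate because flag1 changes.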
You need to use MIN and MAX with PARTITION BY over your group. I got some help from Michal's query. This can handle multiple flag columns as well:
With CTE2 as(
select *,
(case when flag1 = lag(flag1) over (order by (select null)) and
flag2 = lag(flag2) over (order by (select null)) then -1 else 0 end) +
row_number() over (order by (select null)) [GroupingCol]
from t)
Select KeyCol,Flag1,Flag2,ValidFrom,ValidTo From (
Select KeyCol,Flag1,Flag2,
min(ValidFrom) over (partition by KeyCol,Flag1,Flag2,GroupingCol) ValidFrom ,
Max(ValidTo) over (partition by KeyCol,Flag1,Flag2,GroupingCol) ValidTo,
Row_number() over (partition by KeyCol,Flag1,Flag2,GroupingCol order by keycol,ValidFrom) RN
From CTE2) A where RN=1