Spark SQL: get previous non-null value in a column

I am trying to get the previous non-null value of a column in Spark SQL, i.e. the equivalent of:
lag(val IGNORE NULLS) OVER ()
Since IGNORE NULLS is not available, the alternative would be to build a running partition and take the max of the value:
SELECT id, val, val_partition, MAX(val) over (partition by val_partition)
FROM (SELECT id, val,
             sum(case when val is null then 0 else 1 end) over (order by id rows unbounded preceding) as val_partition
      FROM base
     )
Is there a more optimized way to do this?

Your method is fine, but can be written more concisely as:
SELECT id, val, grp,
       MAX(val) over (PARTITION BY grp)
FROM (SELECT b.*,
             COUNT(val) over (ORDER BY id) as grp
      FROM base b
     ) b;
Note as well that the windowing clause is not needed if id is unique.
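Depending on the Spark version, the workaround may not be needed at all: Spark's last() aggregate takes an optional ignore-nulls flag, so (as a sketch, to be verified against your version) the most recent non-null value can be read directly:
-- Sketch: last(val, true) keeps only non-null values, so over a frame ending
-- at the current row it returns the most recent non-null val.
-- Assumes a Spark version where last() accepts the ignore-nulls flag.
SELECT id, val,
       last(val, true) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_non_null_val
FROM base;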

Related

postgresql cumsum by condition

I have a table and need to calculate a cumulative sum, grouped by id, for every row with type = 'end', but my attempt does not produce the expected output. Can anyone see the problem?
This is a little tricky. One method is to assign a grouping by reverse counting the ends. Then use dense_rank():
select t.*,
       dense_rank() over (order by grp desc) as result
from (select t.*,
             count(*) filter (where type = 'end') over (order by created desc) as grp
      from t
     ) t;
You can also do this without a subquery:
select t.*,
       (count(*) filter (where type = 'end') over () -
        count(*) filter (where type = 'end') over (order by created desc) +
        1
       ) as result
from t;
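As a quick sanity check, here is a self-contained sketch with made-up rows (only the id, created and type column names are taken from the question; the real table and expected output were not shown):
-- Made-up sample data for illustration only.
with t(id, created, type) as (
    values (1, timestamp '2021-01-01 10:00', 'start'),
           (1, timestamp '2021-01-01 10:05', 'end'),
           (2, timestamp '2021-01-01 10:10', 'start'),
           (2, timestamp '2021-01-01 10:20', 'end')
)
select t.*,
       dense_rank() over (order by grp desc) as result
from (select t.*,
             count(*) filter (where type = 'end') over (order by created desc) as grp
      from t
     ) t
order by created;
The earliest rows carry the largest grp and therefore get result = 1, and the numbering increases by one after each 'end' row.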

Oracle LEAD - return next matching column value

I have the below data in one table, and I want to get the next OUT value from the OUT column, so I used the LEAD function in the query below.
SELECT ROW_NUMBER, TIMESTAMP, IN, OUT, LEAD(OUT) OVER (PARTITION BY NULL ORDER BY TIMESTAMP) AS NEXT_OUT
FROM MYTABLE;
It gives the data shown in the NEXT_OUT column below, but I need the matching next value in a sequential way, as in the DESIRED column. Please let me know how I can achieve this with the Oracle LEAD function.
Thanks.
Assign row numbers to all INs and OUTs separately, sort the results by placing them in a single column, and calculate LEADs:
WITH cte AS (
SELECT t.*
, CASE WHEN "IN" IS NOT NULL THEN COUNT("IN") OVER (ORDER BY "TIMESTAMP") END AS rn1
, CASE WHEN "OUT" IS NOT NULL THEN COUNT("OUT") OVER (ORDER BY "TIMESTAMP") END AS rn2
FROM t
)
SELECT cte.*
, LEAD("OUT") OVER (ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST) AS NEXT_OUT
FROM cte
ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST
Demo on db<>fiddle
Enumerate the "in"s and the "out"s and use that information for matching.
select tin.*, tout."OUT" as next_out
from (select t.*,
             count("IN") over (order by "TIMESTAMP") as seqnum_in
      from t
     ) tin left join
     (select t.*,
             count("OUT") over (order by "TIMESTAMP") as seqnum_out
      from t
     ) tout
     on tin."IN" is not null and
        tout."OUT" is not null and
        tin.seqnum_in = tout.seqnum_out;

listagg(distinct column) over()

Any idea of alternatives to listagg(distinct column) over () that are supported, i.e. something that does NOT require grouping by the rest of the columns? I have 20+ of them.
You can use a subquery with row_number() to identify the first value to include in the listagg(), such as:
select listagg(case when seqnum = 1 then col end) within group (order by col) over (partition by ?)
from (select t.*, row_number() over (partition by col order by col) as seqnum
      from t
     ) t
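As a side note worth verifying against your Oracle version: 19c reportedly added DISTINCT support to the aggregate form of listagg(), which is an option whenever a GROUP BY on a single key is acceptable. The names below are placeholders, not columns from the question:
-- Aggregate (not analytic) form; assumes Oracle 19c or later.
select some_key,
       listagg(distinct some_col, ',') within group (order by some_col) as vals
from t
group by some_key;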

Vertica/SQL: Getting rows immediately preceding events

Consider a simple query
select * from tbl where status = 'MELTDOWN'
I would now like to create a table that, in addition to these rows, also includes the previous p rows and the subsequent n rows, so that I can get a sense of what happens in the time surrounding these MELTDOWNs. Any hints?
You can do this with window functions by getting the seqnum of the meltdown rows. I prefer to do this with lag()/lead() ignore nulls, but Vertica doesn't support that. I think this is the equivalent with first_value()/last_value():
with t as (
      select tbl.*, row_number() over (order by id) as seqnum
      from tbl
     ),
     tt as (
      select t.*,
             last_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between unbounded preceding and current row) as prev_meltdown_seqnum,
             first_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between current row and unbounded following) as next_meltdown_seqnum
      from t
     )
select tt.*
from tt
where seqnum between prev_meltdown_seqnum and prev_meltdown_seqnum + 7 or
      seqnum between next_meltdown_seqnum - 5 and next_meltdown_seqnum;
A second approach computes a running count of meltdowns as a group id, then keeps rows by their position within each group:
WITH
grouped AS
(
SELECT
SUM(
CASE WHEN status = 'Meltdown' THEN 1 ELSE 0 END
)
OVER (
ORDER BY timeStamp
)
AS GroupID,
tbl.*
FROM
tbl
),
sorted AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp ASC ) AS incPos,
ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp DESC) AS decPos,
MAX(GroupID) OVER () AS LastGroup,
grouped.*
FROM
grouped
)
SELECT
sorted.*
FROM
sorted
WHERE
(incPos <= 8 AND GroupID > 0 ) -- Meltdown and the 7 events following it
OR (decPos <= 6 AND GroupID <> LastGroup) -- and the 6 events preceding a Meltdown
ORDER BY
timeStamp

combine row groups within time range

I am facing a problem with a query and have been stuck for quite some time now. Here is the situation: I have a table with records whose validity is delimited by ValidFrom and ValidTo. This table tracks data changes of another table: every time the underlying data changes, the last valid record is terminated and an insert is performed. Here is a SQL Fiddle example:
http://sqlfiddle.com/#!18/15c7f/4/0
What I am trying to achieve is to group records with identical flags within a timespan into one record. In my fiddle example, I would expect the first two records to be combined into one record with ValidFrom 2017-01-01 and ValidTo 2017-01-10.
Anyway, I am severely stuck; I have tried numerous approaches found here and in other forums, but without success. One approach is included in the fiddle: evaluate the row number ordered by date and subtract the row number partitioned by the flag columns, etc., but nothing works out.
Any help would be highly appreciated.
Try this query:
select keycol, min(validfrom), max(validto), flag1, flag2
from (select *,
             sum(iscontinuation) over (partition by keycol order by validfrom rows between UNBOUNDED PRECEDING AND CURRENT ROW) [GroupingCol]
      from (select *,
                   case when lag(validto) over (partition by keycol order by validfrom) = dateadd(day, -1, validfrom) and
                             lag(flag1) over (partition by keycol order by validfrom) = flag1 and
                             lag(flag2) over (partition by keycol order by validfrom) = flag2
                        then 0 else 1 end [IsContinuation]
            from t
           ) a
     ) b
group by keycol, flag1, flag2, groupingcol
You need to use MIN/MAX with PARTITION BY your group (I got some help from Michal's query). This can handle multiple flag columns as well:
With CTE2 as(
select *,
(case when flag1 = lag(flag1) over (order by (select null)) and
flag2 = lag(flag2) over (order by (select null)) then -1 else 0 end) +
row_number() over (order by (select null)) [GroupingCol]
from t)
Select KeyCol,Flag1,Flag2,ValidFrom,ValidTo From (
Select KeyCol,Flag1,Flag2,
min(ValidFrom) over (partition by KeyCol,Flag1,Flag2,GroupingCol) ValidFrom ,
Max(ValidTo) over (partition by KeyCol,Flag1,Flag2,GroupingCol) ValidTo,
Row_number() over (partition by KeyCol,Flag1,Flag2,GroupingCol order by keycol,ValidFrom) RN
From CTE2) A where RN=1
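If the validity periods are always contiguous, which the change-tracking description suggests, the standard difference-of-row-numbers trick is another option. This is only a sketch reusing the column names from the answers above; unlike the first answer, it does not check date adjacency:
-- Rows whose flags do not change keep a constant difference between the two
-- row numbers, so that difference identifies each run of identical flags.
select KeyCol, Flag1, Flag2,
       min(ValidFrom) as ValidFrom,
       max(ValidTo) as ValidTo
from (select t.*,
             row_number() over (partition by KeyCol order by ValidFrom)
           - row_number() over (partition by KeyCol, Flag1, Flag2 order by ValidFrom) as grp
      from t
     ) x
group by KeyCol, Flag1, Flag2, grp;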