How to use SQL LAG function with condition - sql

I have a table with the following rows:
tipoProtocolo numeroProtocolo dataReferencia dataAssinatura dataVencimento
------------- --------------- -------------- -------------- --------------
1 47676 NULL 20150112 20151231
1 47676 20151231 20150209 NULL
1 47676 NULL 20150224 NULL
1 47676 NULL 20151005 NULL
1 47676 NULL 20151021 NULL
1 47676 NULL 20151026 NULL
1 47676 NULL 20151120 NULL
I've implemented a piece of code that copies the value of the dataVencimento column from the previous row into the dataReferencia column. However, I would like to check whether dataVencimento in the previous row is NULL. If it is, I need to copy the value of the dataReferencia column from the previous row instead.
Here is my SQL code:
SELECT tipoProtocolo, numeroProtocolo,
LAG(dataVencimento, 1) OVER(
PARTITION BY numeroProtocolo, tipoProtocolo
ORDER BY dataAssinatura
) dataReferencia,
dataAssinatura, dataVencimento
FROM cte_ContratoAditivo

What you want is lag() with ignore nulls. Unfortunately, SQL Server does not support this before SQL Server 2022.
If dataVencimento only increases from row to row, you can use a cumulative max:
select . . .,
max(dataVencimento) over (
partition by numeroProtocolo, tipoProtocolo
order by dataAssinatura
rows between unbounded preceding and 1 preceding
) as dataReferencia
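A minimal sketch of the cumulative-max idea, run through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions) against the sample rows from the question. The table name cte_ContratoAditivo is created here only for the demo, and the dates are assumed to be stored as 'YYYYMMDD' text, which sorts in date order.

```python
import sqlite3

# Sample rows from the question:
# (tipoProtocolo, numeroProtocolo, dataAssinatura, dataVencimento).
# Dates stay as 'YYYYMMDD' text, which sorts in date order.
ROWS = [
    (1, 47676, "20150112", "20151231"),
    (1, 47676, "20150209", None),
    (1, 47676, "20150224", None),
    (1, 47676, "20151005", None),
    (1, 47676, "20151021", None),
    (1, 47676, "20151026", None),
    (1, 47676, "20151120", None),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cte_ContratoAditivo (tipoProtocolo INTEGER, "
    "numeroProtocolo INTEGER, dataAssinatura TEXT, dataVencimento TEXT)"
)
con.executemany("INSERT INTO cte_ContratoAditivo VALUES (?, ?, ?, ?)", ROWS)

# MAX ignores NULLs, and the frame stops one row before the current row,
# so each row sees the largest dataVencimento among earlier rows only.
QUERY = """
SELECT dataAssinatura,
       MAX(dataVencimento) OVER (
           PARTITION BY numeroProtocolo, tipoProtocolo
           ORDER BY dataAssinatura
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS dataReferencia
FROM cte_ContratoAditivo
ORDER BY dataAssinatura
"""
result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The first row has no earlier rows, so its frame is empty and MAX returns NULL, just as LAG() would. The result only matches "last non-NULL value" while dataVencimento never decreases.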
If this is not the case, you can use two levels of aggregation:
select ca.*,
       max(dataVencimento) over (
           partition by numeroProtocolo, tipoProtocolo, grp
       ) as dataReferencia
from (select ca.*,
             count(dataVencimento) over (
                 partition by numeroProtocolo, tipoProtocolo
                 order by dataAssinatura
             ) as grp
      from cte_ContratoAditivo ca
     ) ca;
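The subquery counts the non-NULL dataVencimento values seen so far, which assigns a group number to the rows: each group starts at a row with a non-NULL value. The outer query then spreads that value over the entire group. Because the count includes the current row, a row with its own non-NULL dataVencimento gets that value rather than the previous one; apply LAG() to the result in a further outer query if you need strictly the previous value.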
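To check the group-number trick, here is a sketch on SQLite through Python's sqlite3. It adds one LAG() level on top of the grouping so that each row gets the last non-NULL dataVencimento from strictly earlier rows, which is what the question's LAG() aimed for; that extra level is an addition for this demo. The 47676 rows are from the question; the 99999 rows are hypothetical and include an expiry date that goes down, where a cumulative MAX would give the wrong answer.

```python
import sqlite3

# 47676 rows come from the question; 99999 rows are hypothetical and
# include a dataVencimento that decreases (20161231, then 20160630).
ROWS = [
    (1, 47676, "20150112", "20151231"),
    (1, 47676, "20150209", None),
    (1, 47676, "20150224", None),
    (1, 47676, "20151005", None),
    (1, 47676, "20151021", None),
    (1, 47676, "20151026", None),
    (1, 47676, "20151120", None),
    (1, 99999, "20160101", "20161231"),
    (1, 99999, "20160201", None),
    (1, 99999, "20160301", "20160630"),
    (1, 99999, "20160401", None),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cte_ContratoAditivo (tipoProtocolo INTEGER, "
    "numeroProtocolo INTEGER, dataAssinatura TEXT, dataVencimento TEXT)"
)
con.executemany("INSERT INTO cte_ContratoAditivo VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
WITH grouped AS (
    SELECT t.*,
           -- COUNT skips NULLs, so the group number goes up only on rows
           -- with a dataVencimento and each group starts at such a row
           COUNT(dataVencimento) OVER (
               PARTITION BY numeroProtocolo, tipoProtocolo
               ORDER BY dataAssinatura
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp
    FROM cte_ContratoAditivo t
),
filled AS (
    SELECT g.*,
           -- at most one non-NULL value per group, spread over the group
           MAX(dataVencimento) OVER (
               PARTITION BY numeroProtocolo, tipoProtocolo, grp
           ) AS lastVencimento
    FROM grouped g
)
SELECT numeroProtocolo,
       dataAssinatura,
       -- shift by one row so a row never picks up its own dataVencimento
       LAG(lastVencimento) OVER (
           PARTITION BY numeroProtocolo, tipoProtocolo
           ORDER BY dataAssinatura
       ) AS dataReferencia
FROM filled
ORDER BY numeroProtocolo, dataAssinatura
"""
result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

For the hypothetical 99999 rows, the row signed on 20160401 gets 20160630, whereas a running MAX would still return 20161231.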

As the OP didn't respond, I've gone with both the literal answer and my guess at what they mean. The first is the literal answer: if the prior row's value is NULL, use the one before it:
WITH VTE AS (
SELECT *
FROM (VALUES(1,47676,CONVERT(date,NULL),CONVERT(date,'20150112'),CONVERT(date,'20151231')),
(1,47676,CONVERT(date,'20151231'),CONVERT(date,'20150209'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20150224'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151005'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151021'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151026'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151120'),CONVERT(date,NULL))) V(tipoProtocolo,numeroProtocolo,dataReferencia,dataAssinatura,dataVencimento)),
CTE AS(
SELECT V.tipoProtocolo,
V.numeroProtocolo,
V.dataReferencia,
V.dataAssinatura,
V.dataVencimento,
LAG(dataVencimento) OVER (PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura) AS dataReferencia1,
LAG(dataVencimento,2) OVER (PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura) AS dataReferencia2
FROM VTE V)
SELECT C.tipoProtocolo,
C.numeroProtocolo,
C.dataReferencia,
C.dataAssinatura,
C.dataVencimento,
ISNULL(C.dataReferencia1,C.dataReferencia2) AS dataReferencia
FROM CTE C;
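A small sketch of the literal reading on SQLite through Python's sqlite3, using COALESCE in place of T-SQL's ISNULL (SQLite has no ISNULL function). It uses the question's rows and shows the limitation of looking back only two rows: from the third row after the last non-NULL value onward, the result is NULL again.

```python
import sqlite3

# Sample rows from the question:
# (tipoProtocolo, numeroProtocolo, dataAssinatura, dataVencimento).
ROWS = [
    (1, 47676, "20150112", "20151231"),
    (1, 47676, "20150209", None),
    (1, 47676, "20150224", None),
    (1, 47676, "20151005", None),
    (1, 47676, "20151021", None),
    (1, 47676, "20151026", None),
    (1, 47676, "20151120", None),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cte_ContratoAditivo (tipoProtocolo INTEGER, "
    "numeroProtocolo INTEGER, dataAssinatura TEXT, dataVencimento TEXT)"
)
con.executemany("INSERT INTO cte_ContratoAditivo VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
SELECT dataAssinatura,
       -- previous row's value, falling back to the row before that
       COALESCE(
           LAG(dataVencimento, 1) OVER (
               PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura),
           LAG(dataVencimento, 2) OVER (
               PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura)
       ) AS dataReferencia
FROM cte_ContratoAditivo
ORDER BY dataAssinatura
"""
result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```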
The other is what I suspect the OP really means: they want the last non-NULL value. If so, this is a "classic" gaps-and-islands problem:
WITH VTE AS (
SELECT *
FROM (VALUES(1,47676,CONVERT(date,NULL),CONVERT(date,'20150112'),CONVERT(date,'20151231')),
(1,47676,CONVERT(date,'20151231'),CONVERT(date,'20150209'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20150224'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151005'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151021'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151026'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151120'),CONVERT(date,NULL))) V(tipoProtocolo,numeroProtocolo,dataReferencia,dataAssinatura,dataVencimento)),
Grps AS(
SELECT V.tipoProtocolo,
V.numeroProtocolo,
V.dataReferencia,
V.dataAssinatura,
V.dataVencimento,
COUNT(dataVencimento) OVER (PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM VTE V)
SELECT G.tipoProtocolo,
G.numeroProtocolo,
G.dataReferencia,
G.dataAssinatura,
G.dataVencimento,
MAX(dataVencimento) OVER (PARTITION BY G.numeroProtocolo, G.tipoProtocolo, G.Grp) AS dataReferencia,
G.Grp
FROM Grps G
ORDER BY dataAssinatura;
I will note that it seems odd that you call the column with the LAG expression dataReferencia, even though the expression is on dataVencimento (and there is already a column called dataReferencia).

Related

How to retrieve MAX Turntime of Top Two earliest date?

How would I construct a query to get the MAX TurnTime per ID over the first 2 rounds? A round runs from the minimum Beginning_Date to the minimum End_Date of an ID, and neither date may be reused when calculating the Turn Time of the second round.
You can use row_number() . . . twice:
select d.*
from (select d.*,
             row_number() over (partition by id order by turn_time desc) as seqnum_turntime
      from (select d.*,
                   row_number() over (partition by id order by beginning_date) as seqnum_round
            from data d
           ) d
      where seqnum_round <= 2
     ) d
where seqnum_turntime = 1;
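The innermost subquery numbers the rounds per id, starting with the earliest. The middle level keeps the first two rounds and ranks them by turn time, and the outer query keeps the one with the maximum.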
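A runnable sketch of the nested ROW_NUMBER() approach on SQLite through Python's sqlite3. The table name data, its columns, and the rows are hypothetical stand-ins, since the question's data is not shown; rounds are numbered by ascending beginning_date so that the first two rounds are the earliest ones.

```python
import sqlite3

# Hypothetical rounds: (id, beginning_date, end_date, turn_time).
ROWS = [
    (1, "2020-01-01", "2020-01-05", 4),
    (1, "2020-01-06", "2020-01-16", 10),
    (1, "2020-01-20", "2020-02-19", 30),  # third round, must be ignored
    (2, "2020-03-01", "2020-03-08", 7),
    (2, "2020-03-10", "2020-03-12", 2),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE data (id INTEGER, beginning_date TEXT, end_date TEXT, turn_time INTEGER)"
)
con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
SELECT id, beginning_date, turn_time
FROM (SELECT d.*,
             -- rank the two kept rounds by turn time, largest first
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY turn_time DESC) AS seqnum_turntime
      FROM (SELECT d.*,
                   -- number the rounds from the earliest beginning_date
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY beginning_date) AS seqnum_round
            FROM data d
           ) d
      WHERE seqnum_round <= 2
     ) d
WHERE seqnum_turntime = 1
ORDER BY id
"""
result = con.execute(QUERY).fetchall()
print(result)
```

The WHERE in the middle level runs before that level's ROW_NUMBER(), so the third round of id 1 (turn time 30) never competes.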
You could also express this with a correlated subquery and top (1) with ties:
select top (1) with ties d.*
from data d
where d.beginning_date <= (select d2.beginning_date
                           from data d2
                           where d2.id = d.id
                           order by d2.beginning_date
                           offset 1 row fetch first 1 row only
                          )
order by row_number() over (partition by id order by turn_time desc);
SELECT
ID
,turn_time
,beginning_date
,end_date
FROM
(
SELECT
ID
,MAX(turn_time) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS turn_time --Maximum turn time of the current row and preceding row
,MIN(BeginningDate) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS beginning_date --Minimum begin date over current row and preceding row (could also use LAG)
,end_date
,ROW_NUMBER() OVER (PARTITION BY Id ORDER BY BeginningDate) AS Turn_Number
FROM
<whatever your table is>
) turn_summary
WHERE
Turn_Number = 2
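The sliding-window answer can be tried on SQLite through Python's sqlite3 as well. This sketch uses hypothetical rounds (the question's data is not shown) and renames the window-function aliases to max_turn_time and first_beginning_date so they cannot be confused with the underlying columns; the placeholder table becomes a hypothetical table named data.

```python
import sqlite3

# Hypothetical rounds: (id, beginning_date, end_date, turn_time).
ROWS = [
    (1, "2020-01-01", "2020-01-05", 4),
    (1, "2020-01-06", "2020-01-16", 10),
    (1, "2020-01-20", "2020-02-19", 30),
    (2, "2020-03-01", "2020-03-08", 7),
    (2, "2020-03-10", "2020-03-12", 2),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE data (id INTEGER, beginning_date TEXT, end_date TEXT, turn_time INTEGER)"
)
con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
SELECT id, max_turn_time, first_beginning_date, end_date
FROM (SELECT id,
             -- larger turn time of this round and the one before it
             MAX(turn_time) OVER (PARTITION BY id ORDER BY beginning_date
                                  ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS max_turn_time,
             -- start of the earlier of the two rounds
             MIN(beginning_date) OVER (PARTITION BY id ORDER BY beginning_date
                                       ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS first_beginning_date,
             end_date,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY beginning_date) AS turn_number
      FROM data
     ) turn_summary
WHERE turn_number = 2
ORDER BY id
"""
result = con.execute(QUERY).fetchall()
print(result)
```

Filtering on turn_number = 2 keeps the row whose two-row window covers exactly the first two rounds; an id with only one round drops out.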

Can I create a field that is conditionally assigned by date in SQL?

I'm dealing with some subscription data. When the user upgrades/downgrades, the system overwrites the level of the subscription with the new value. I am trying to assign the historical values when the user has upgraded. My data set looks like the following where one user can upgrade or downgrade multiple times.
I am trying to get what is outlined in the "desired value" column.
Essentially, any transactions that happened before an upgrade should be assigned the "original_product" captured on the upgrade transaction, and transactions that happen after it should be assigned the "new_product" value.
I've tried joining the data to itself, but I can't find a way to avoid getting multiple rows for each invoice.
You can use window functions:
select t.*,
coalesce(last_value(case when event = 'Upgrade' then new_product end ignore nulls) over (partition by sub_id order by created),
first_value(original_product ignore nulls) over (partition by sub_id order by created)
) as desired_value
from t;
This gets the most recent new_product from an "Upgrade" row. If that doesn't exist, then it gets the overall original_product.
I think you want first_value():
select
t.*,
coalesce(
first_value(new_product ignore nulls) over(
order by created desc
rows between current row and unbounded following
),
first_value(original_product ignore nulls) over(
order by created
rows between current row and unbounded following
)
) desired_value
from mytable t
The idea is to first try to get the most recent non-null new_product value among the preceding rows (current row included). If there is no such row, we look up the first non-null original_product in the current or following rows.
In theory, you would also need a partition by clause containing the column that identifies the user. Your data shows no such column, though, so I left it out.
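SQLite has no IGNORE NULLS, so this sketch (Python's sqlite3) swaps in the count-based grouping trick from the first question to emulate the two FIRST_VALUE(... IGNORE NULLS) lookups. It uses the simplified six-row data set from the BigQuery answer and, like the answer above, assumes a single user, so there is no partition column.

```python
import sqlite3

# Simplified rows from the BigQuery answer: (created, original_product, new_product).
ROWS = [
    (1, None, None),
    (2, None, None),
    (3, "Level 1", "Level 2"),
    (4, None, None),
    (5, "Level 2", "Level 1"),
    (6, None, None),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (created INTEGER, original_product TEXT, new_product TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

QUERY = """
WITH g AS (
    SELECT t.*,
           -- oldest first, the group changes at every non-NULL new_product,
           -- so each group holds the latest new_product at or before the row
           COUNT(new_product) OVER (
               ORDER BY created ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp_new,
           -- newest first, each group holds the nearest original_product
           -- at or after the row
           COUNT(original_product) OVER (
               ORDER BY created DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp_orig
    FROM t
)
SELECT created,
       COALESCE(
           MAX(new_product) OVER (PARTITION BY grp_new),
           MAX(original_product) OVER (PARTITION BY grp_orig)
       ) AS desired_value
FROM g
ORDER BY created
"""
result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Each group contains at most one non-NULL value, so MAX() simply returns it; rows before the first upgrade land in group 0 for new_product and fall back to the next original_product.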
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
IFNULL(
FIRST_VALUE(original_product IGNORE NULLS) OVER(original_product_lookup),
FIRST_VALUE(new_product IGNORE NULLS) OVER(new_product_lookup)
) AS desired_value
FROM `project.dataset.table`
WINDOW
original_product_lookup AS (ORDER BY created ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING),
new_product_lookup AS (ORDER BY created DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
You can test and play with the above using simplified data from your question (only the relevant data points), as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 created, NULL original_product, NULL new_product UNION ALL
SELECT 2, NULL, NULL UNION ALL
SELECT 3, 'Level 1', 'Level 2' UNION ALL
SELECT 4, NULL, NULL UNION ALL
SELECT 5, 'Level 2', 'Level 1' UNION ALL
SELECT 6, NULL, NULL
)
SELECT *,
IFNULL(
FIRST_VALUE(original_product IGNORE NULLS) OVER(original_product_lookup),
FIRST_VALUE(new_product IGNORE NULLS) OVER(new_product_lookup)
) AS desired_value
FROM `project.dataset.table`
WINDOW
original_product_lookup AS (ORDER BY created ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING),
new_product_lookup AS (ORDER BY created DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
ORDER BY created
with the result:
Row created original_product new_product desired_value
1 1 null null Level 1
2 2 null null Level 1
3 3 Level 1 Level 2 Level 2
4 4 null null Level 2
5 5 Level 2 Level 1 Level 1
6 6 null null Level 1
I was able to solve it with a combination of the answers:
SELECT e.*,
coalesce(
last_value(case when event IN ('Upgrade', 'Downgrade', 'Crossgrade') then new_product end ignore nulls) over (partition by subscription order by created),
first_value(original_product ignore nulls) over(
partition by subscription
order by created
rows between current row and unbounded following
)
) desired_value
FROM e

how can I calculate totals within a query, depending on the date?

Below you can see my query, which gives the following result:
select t.actual_date,
t.id_key,
t.attendance_status,
t.money_step,
sum(t.money_step) over (partition by t.id_key order by t.actual_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)as accumulated
from example t
order by t.id_key, t.actual_date
I want the "accumulated" column to add up the value of "money_step" for each id_key.
If attendance_status is '15' for the second time for an id, the running total should start again from the beginning. For ID_KEY = 1 it should look like this:
accumulated:
Row 1: 20
Row 2: 80
Row 3: 100
Row 4: 120
How can I do this in the query? Can someone help me?
I understand that you want to reset the sum after the first row where attendance_status has value 15. One option uses a conditional max() to define the subgroup:
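select
    actual_date,
    id_key,
    attendance_status,
    money_step,
    sum(money_step) over (
        partition by id_key, grp
        order by actual_date
        rows between unbounded preceding and current row
    ) as accumulated
from (
    select
        e.*,
        -- 0 up to and including the first 15, then 1; coalesce() puts the first
        -- row (empty frame, so max() is NULL) in the same group as its successors
        coalesce(max(case when attendance_status = 15 then 1 else 0 end) over(
            partition by id_key
            order by actual_date
            rows between unbounded preceding and 1 preceding
        ), 0) as grp
    from example e
) e
order by id_key, actual_date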
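A sketch of the reset-after-first-15 logic on SQLite through Python's sqlite3. The rows are hypothetical (the question's full data is not shown). Without the COALESCE around the conditional max(), the first row of each id_key would get a NULL group and the second row would start a new running total.

```python
import sqlite3

# Hypothetical rows: (id_key, actual_date, attendance_status, money_step).
ROWS = [
    (1, "2021-01-01", 10, 20),
    (1, "2021-01-02", 15, 60),
    (1, "2021-01-03", 10, 20),
    (1, "2021-01-04", 15, 20),
    (1, "2021-01-05", 10, 5),
    (2, "2021-01-01", 15, 7),
    (2, "2021-01-02", 10, 3),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE example (id_key INTEGER, actual_date TEXT, "
    "attendance_status INTEGER, money_step INTEGER)"
)
con.executemany("INSERT INTO example VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
SELECT id_key, actual_date,
       SUM(money_step) OVER (
           PARTITION BY id_key, grp
           ORDER BY actual_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS accumulated
FROM (SELECT e.*,
             -- 1 once an earlier row of the same id_key had status 15, else 0
             COALESCE(MAX(CASE WHEN attendance_status = 15 THEN 1 ELSE 0 END) OVER (
                 PARTITION BY id_key
                 ORDER BY actual_date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
             ), 0) AS grp
      FROM example e
     ) e
ORDER BY id_key, actual_date
"""
result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Only the first 15 triggers a reset here (the group flag never goes above 1), which matches the reading in the answer above.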

Oracle LEAD - return next matching column value

I have the data below in one table.
I want to get the next OUT value from the OUT column, so I used the LEAD function in the query below:
SELECT ROW_NUMBER,TIMESTAMP,IN,OUT,LEAD(OUT) OVER (PARTITION BY NULL ORDER BY TIMESTAMP) AS NEXT_OUT
FROM MYTABLE;
It produces the values in the NEXT_OUT column.
But I need the matching next value in sequence, as in the DESIRED column. How can I achieve this with Oracle's LEAD function?
Number the INs and the OUTs separately, sort the rows by placing those numbers in a single column, and then calculate LEAD:
WITH cte AS (
SELECT t.*
, CASE WHEN "IN" IS NOT NULL THEN COUNT("IN") OVER (ORDER BY "TIMESTAMP") END AS rn1
, CASE WHEN "OUT" IS NOT NULL THEN COUNT("OUT") OVER (ORDER BY "TIMESTAMP") END AS rn2
FROM t
)
SELECT cte.*
, LEAD("OUT") OVER (ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST) AS NEXT_OUT
FROM cte
ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST
Enumerate the "in"s and the "out"s and use those numbers for matching:
select tin.*, tout."OUT" as next_out
from (select t.*,
             count("IN") over (order by "TIMESTAMP") as seqnum_in
      from t
     ) tin left join
     (select t.*,
             count("OUT") over (order by "TIMESTAMP") as seqnum_out
      from t
     ) tout
     on tin."IN" is not null and
        tout."OUT" is not null and
        tin.seqnum_in = tout.seqnum_out;
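The enumerate-and-match idea can be tried on SQLite through Python's sqlite3. This is a sketch with hypothetical events; the columns are renamed to ts, in_val, and out_val to sidestep reserved words, and it assumes each row carries either an IN or an OUT, with the k-th OUT answering the k-th IN.

```python
import sqlite3

# Hypothetical events: (ts, in_val, out_val); each row is either an IN or an OUT.
ROWS = [
    (1, "A", None),
    (2, "B", None),
    (3, None, "X"),
    (4, "C", None),
    (5, None, "Y"),
    (6, None, "Z"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (ts INTEGER, in_val TEXT, out_val TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT tin.ts, tin.in_val, tout.out_val AS next_out
FROM (SELECT t.*,
             -- the k-th IN gets seqnum_in = k
             COUNT(in_val) OVER (ORDER BY ts) AS seqnum_in
      FROM t) tin
LEFT JOIN
     (SELECT t.*,
             -- the k-th OUT gets seqnum_out = k
             COUNT(out_val) OVER (ORDER BY ts) AS seqnum_out
      FROM t) tout
  ON tin.in_val IS NOT NULL
 AND tout.out_val IS NOT NULL
 AND tin.seqnum_in = tout.seqnum_out
ORDER BY tin.ts
"""
result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

A plain LEAD(out_val) would pair A with NULL (the next row is B), whereas the numbering pairs A with X, B with Y, and C with Z.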

Vertica/SQL: Getting rows immediately proceeding events

Consider a simple query
select * from tbl where status = 'MELTDOWN'
I would like to create a table that, in addition to these rows, also includes the previous p rows and the subsequent n rows, so that I can get a sense of what happens around these MELTDOWNs. Any hints?
You can do this with window functions by getting the sequence number (seqnum) of the meltdown rows. I prefer to do this with lag()/lead() ignore nulls, but Vertica doesn't support that. I think this is the equivalent with first_value()/last_value():
with t as (
      select tbl.*, row_number() over (order by id) as seqnum
      from tbl
     ),
     tt as (
      select t.*,
             last_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between unbounded preceding and current row) as prev_melt_seqnum,
             first_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between current row and unbounded following) as next_melt_seqnum
      from t
     )
select tt.*
from tt
where seqnum between prev_melt_seqnum and prev_melt_seqnum + 7 or
      seqnum between next_melt_seqnum - 5 and next_melt_seqnum;
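Because seqnum only increases, a running MAX (or MIN) over the CASE expression finds the nearest meltdown at or before (or at or after) each row without needing IGNORE NULLS. This is a swapped-in variant of the query above, sketched on SQLite through Python's sqlite3 with a hypothetical ten-row log, p = 1 row before and n = 2 rows after each meltdown, passed as named parameters.

```python
import sqlite3

# Hypothetical log: ids 1..10, with meltdowns at ids 4 and 9.
ROWS = [(i, "meltdown" if i in (4, 9) else "ok") for i in range(1, 11)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, status TEXT)")
con.executemany("INSERT INTO tbl VALUES (?, ?)", ROWS)

QUERY = """
WITH t AS (
    SELECT tbl.*, ROW_NUMBER() OVER (ORDER BY id) AS seqnum
    FROM tbl
),
tt AS (
    SELECT t.*,
           -- seqnum only grows, so the running MAX of meltdown seqnums is the
           -- latest meltdown at or before this row
           MAX(CASE WHEN status = 'meltdown' THEN seqnum END) OVER (
               ORDER BY seqnum ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS prev_melt_seqnum,
           -- and the MIN over the rest is the next meltdown at or after it
           MIN(CASE WHEN status = 'meltdown' THEN seqnum END) OVER (
               ORDER BY seqnum ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS next_melt_seqnum
    FROM t
)
SELECT id
FROM tt
WHERE seqnum BETWEEN prev_melt_seqnum AND prev_melt_seqnum + :n
   OR seqnum BETWEEN next_melt_seqnum - :p AND next_melt_seqnum
ORDER BY id
"""
# p rows before each meltdown and n rows after it (small values for the demo)
result = [row[0] for row in con.execute(QUERY, {"p": 1, "n": 2})]
print(result)
```

Rows with no earlier (or later) meltdown get NULL bounds, so their BETWEEN test is not true and only the other condition can select them.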
WITH
grouped AS
(
SELECT
SUM(
CASE WHEN status = 'Meltdown' THEN 1 ELSE 0 END
)
OVER (
ORDER BY timeStamp
)
AS GroupID,
tbl.*
FROM
tbl
),
sorted AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp ASC ) AS incPos,
ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp DESC) AS decPos,
MAX(GroupID) OVER () AS LastGroup,
grouped.*
FROM
grouped
)
SELECT
sorted.*
FROM
sorted
WHERE
(incPos <= 8 AND GroupID > 0 ) -- Meltdown and the 7 events following it
OR (decPos <= 6 AND GroupID <> LastGroup) -- and the 6 events preceding a Meltdown
ORDER BY
timeStamp