How to use the SQL LAG function with a condition
I have a table with the following rows:
tipoProtocolo numeroProtocolo dataReferencia dataAssinatura dataVencimento
------------- --------------- -------------- -------------- --------------
1 47676 NULL 20150112 20151231
1 47676 20151231 20150209 NULL
1 47676 NULL 20150224 NULL
1 47676 NULL 20151005 NULL
1 47676 NULL 20151021 NULL
1 47676 NULL 20151026 NULL
1 47676 NULL 20151120 NULL
I've implemented a piece of code that copies the value of the dataVencimento column from the previous row into the dataReferencia column (red arrow in the image). However, I would like to check whether dataVencimento in the previous row is NULL. If it is, I need to copy the value of dataReferencia from the previous row instead (blue arrow in the image).
Here is my SQL code:
SELECT tipoProtocolo, numeroProtocolo,
LAG(dataVencimento, 1) OVER(
PARTITION BY numeroProtocolo, tipoProtocolo
ORDER BY dataAssinatura
) dataReferencia,
dataAssinatura, dataVencimento
FROM cte_ContratoAditivo
What you want is lag(ignore nulls). Unfortunately, SQL Server does not support this before SQL Server 2022.
If the dates are increasing, you can use a cumulative max:
select . . .,
max(dataVencimento) over (
partition by numeroProtocolo, tipoProtocolo
order by dataAssinatura
rows between unbounded preceding and 1 preceding
) as dataReferencia
If this is not the case, you can use two levels of aggregation:
select ca.*,
       max(dataVencimento) over (
           partition by numeroProtocolo, tipoProtocolo, grp
       ) as dataReferencia
from (select ca.*,
             count(dataVencimento) over (
                 partition by numeroProtocolo, tipoProtocolo
                 order by dataAssinatura
             ) as grp
      from cte_ContratoAditivo ca
     ) ca;
The subquery counts the non-NULL values seen so far. This really just assigns a group number to the rows: each non-NULL value starts a new group. The outer query then spreads that value over the entire group.
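The difference between the cumulative max and the two-level query only shows up when the values are not increasing. The following is a minimal sketch of both, run against SQLite through Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is assumed for window functions). The table t, its columns ord and val, and the sample values are hypothetical, chosen so that a later value sorts lower than an earlier one. Both queries include the current row in the window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: ord is the ordering column, val is the sparse column.
conn.execute("CREATE TABLE t (ord INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "B"), (2, None), (3, "A"), (4, None), (5, None)],
)

# Cumulative max: only correct while the non-NULL values keep increasing.
cumulative_max = conn.execute("""
    SELECT ord, MAX(val) OVER (ORDER BY ord) AS filled
    FROM t
    ORDER BY ord
""").fetchall()

# Two levels: COUNT(val) numbers the groups, MAX spreads each group's value.
grouped = conn.execute("""
    SELECT ord, grp, MAX(val) OVER (PARTITION BY grp) AS filled
    FROM (SELECT ord, val, COUNT(val) OVER (ORDER BY ord) AS grp
          FROM t)
    ORDER BY ord
""").fetchall()

print(cumulative_max)  # rows 3 to 5 still show 'B'
print(grouped)         # rows 3 to 5 show 'A', the last non-NULL value
```

With this data the cumulative max keeps returning 'B' after 'A' appears, while the grouped version carries 'A' forward.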
As the OP didn't respond, I've gone with both the literal and the guessed answer. The first is the literal answer: if the value from the prior row is NULL, use the one from the row before that:
WITH VTE AS (
SELECT *
FROM (VALUES(1,47676,CONVERT(date,NULL),CONVERT(date,'20150112'),CONVERT(date,'20151231')),
(1,47676,CONVERT(date,'20151231'),CONVERT(date,'20150209'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20150224'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151005'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151021'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151026'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151120'),CONVERT(date,NULL))) V(tipoProtocolo,numeroProtocolo,dataReferencia,dataAssinatura,dataVencimento)),
CTE AS(
SELECT V.tipoProtocolo,
V.numeroProtocolo,
V.dataReferencia,
V.dataAssinatura,
V.dataVencimento,
LAG(dataVencimento) OVER (PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura) AS dataReferencia1,
LAG(dataVencimento,2) OVER (PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura) AS dataReferencia2
FROM VTE V)
SELECT C.tipoProtocolo,
C.numeroProtocolo,
C.dataReferencia,
C.dataAssinatura,
C.dataVencimento,
ISNULL(C.dataReferencia1,C.dataReferencia2) AS dataReferencia
FROM CTE C;
The other is what I suspect the OP really means: they want the last non-NULL value. If this is the case, this is a "classic" gaps and islands problem:
WITH VTE AS (
SELECT *
FROM (VALUES(1,47676,CONVERT(date,NULL),CONVERT(date,'20150112'),CONVERT(date,'20151231')),
(1,47676,CONVERT(date,'20151231'),CONVERT(date,'20150209'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20150224'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151005'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151021'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151026'),CONVERT(date,NULL)),
(1,47676,CONVERT(date,NULL),CONVERT(date,'20151120'),CONVERT(date,NULL))) V(tipoProtocolo,numeroProtocolo,dataReferencia,dataAssinatura,dataVencimento)),
Grps AS(
SELECT V.tipoProtocolo,
V.numeroProtocolo,
V.dataReferencia,
V.dataAssinatura,
V.dataVencimento,
COUNT(dataVencimento) OVER (PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM VTE V)
SELECT G.tipoProtocolo,
G.numeroProtocolo,
G.dataReferencia,
G.dataAssinatura,
G.dataVencimento,
MAX(G.dataVencimento) OVER (PARTITION BY G.numeroProtocolo, G.tipoProtocolo, G.Grp) AS dataReferencia,
G.Grp
FROM Grps G
ORDER BY dataAssinatura;
I will note that it seems odd that you call the column with the LAG expression dataReferencia, even though the expression is on dataVencimento (and there is already a column called dataReferencia).
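To see how the two readings differ on the rows from the question, here is a small sketch run against SQLite through Python's sqlite3 module (a stand-in for SQL Server, assuming SQLite 3.25 or later). COALESCE replaces ISNULL, the dates are stored as yyyymmdd text, and the existing dataReferencia column is left out; those are simplifications for the sketch, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VTE (tipoProtocolo INTEGER, numeroProtocolo INTEGER, "
    "dataAssinatura TEXT, dataVencimento TEXT)"
)
# The seven rows from the question (dataAssinatura, dataVencimento).
conn.executemany(
    "INSERT INTO VTE VALUES (1, 47676, ?, ?)",
    [
        ("20150112", "20151231"),
        ("20150209", None),
        ("20150224", None),
        ("20151005", None),
        ("20151021", None),
        ("20151026", None),
        ("20151120", None),
    ],
)

# Literal reading: the previous row's value, else the value two rows back.
literal = [row[0] for row in conn.execute("""
    SELECT COALESCE(
               LAG(dataVencimento) OVER (
                   PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura),
               LAG(dataVencimento, 2) OVER (
                   PARTITION BY numeroProtocolo, tipoProtocolo ORDER BY dataAssinatura)
           ) AS dataReferencia
    FROM VTE
    ORDER BY dataAssinatura
""")]

# Gaps and islands: carry the last non-NULL value through each group.
last_non_null = [row[0] for row in conn.execute("""
    SELECT MAX(dataVencimento) OVER (
               PARTITION BY numeroProtocolo, tipoProtocolo, Grp
           ) AS dataReferencia
    FROM (SELECT *,
                 COUNT(dataVencimento) OVER (
                     PARTITION BY numeroProtocolo, tipoProtocolo
                     ORDER BY dataAssinatura
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                 ) AS Grp
          FROM VTE)
    ORDER BY dataAssinatura
""")]

print(literal)
print(last_non_null)
```

The literal version only reaches two rows back, so it runs out of values from the fourth row onward, while the gaps-and-islands version fills every row of the group, including the row that holds the value itself.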
Related
How to retrieve MAX Turntime of Top Two earliest date?
How would I construct a query to get the MAX TurnTime per ID over the first 2 rounds? A round is defined as the minimum Beginning_Date to the minimum End_Date of an ID, without reusing either of the dates for the second round's Turn Time calculation.
You can use row_number() . . . twice:
select d.*
from (select d.*,
             row_number() over (partition by id order by turn_time desc) as seqnum_turntime
      from (select d.*,
                   row_number() over (partition by id order by beginning_date) as seqnum_round
            from data d
           ) d
      where seqnum_round <= 2
     ) d
where seqnum_turntime = 1;
The innermost subquery gets the first two rounds. The outer subquery gets the maximum.
You could also express this with top (1) with ties instead of nested subqueries:
select top (1) with ties d.*
from data d
where d.beginning_date <= (select d2.beginning_date
                           from data d2
                           where d2.id = d.id
                           order by d2.beginning_date
                           offset 1 row fetch first 1 row only
                          )
order by row_number() over (partition by id order by turn_time desc);
SELECT ID,
       turn_time,
       beginning_date,
       end_date
FROM (
    SELECT ID,
           -- Maximum turn time of the current row and the preceding row
           MAX(turn_time) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS turn_time,
           -- Minimum begin date over the current row and the preceding row (could also use LAG)
           MIN(BeginningDate) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS beginning_date,
           end_date,
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY BeginningDate) AS Turn_Number
    FROM <whatever your table is>
) turn_summary
WHERE Turn_Number = 2
Can I create a field that is conditionally assigned by date in SQL?
I'm dealing with some subscription data. When a user upgrades or downgrades, the system overwrites the level of the subscription with the new value. I am trying to assign the historical values when the user has upgraded. My data set looks like the following, where one user can upgrade or downgrade multiple times. I am trying to get what is outlined in the "desired value" column. Essentially, any transactions that happened before an upgrade should be assigned the "original_product" captured on the upgrade transaction, and transactions that happen after it should be assigned the "new_product" value. I've tried joining the data to itself, but I can't find a way to avoid getting multiple rows for each invoice.
You can use window functions:
select t.*,
       coalesce(last_value(case when event = 'Upgrade' then new_product end ignore nulls) over (partition by sub_id order by created),
                first_value(original_product ignore nulls) over (partition by sub_id order by created)
               ) as desired_value
from t;
This gets the most recent new_product from an "Upgrade" row. If that doesn't exist, then it gets the overall original_product.
I think you want first_value():
select t.*,
       coalesce(
           first_value(new_product ignore nulls) over (
               order by created desc
               rows between unbounded preceding and current row
           ),
           first_value(original_product ignore nulls) over (
               order by created
               rows between current row and unbounded following
           )
       ) desired_value
from mytable t
The idea is to first try to get the first non-null new_product value on the preceding rows (current row included). If there is no such row, then we look up the first non-null original_product in the following rows.
In theory, you would also need a partition by clause containing the column that represents the user. Your data shows no sign of such a column, though, so I left it out.
Below is for BigQuery Standard SQL:
#standardSQL
SELECT *,
  IFNULL(
    FIRST_VALUE(original_product IGNORE NULLS) OVER(original_product_lookup),
    FIRST_VALUE(new_product IGNORE NULLS) OVER(new_product_lookup)
  ) AS desired_value
FROM `project.dataset.table`
WINDOW
  original_product_lookup AS (ORDER BY created ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING),
  new_product_lookup AS (ORDER BY created DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
You can test and play with the above using simplified data from your question (only the relevant data points), as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 created, NULL original_product, NULL new_product UNION ALL
  SELECT 2, NULL, NULL UNION ALL
  SELECT 3, 'Level 1', 'Level 2' UNION ALL
  SELECT 4, NULL, NULL UNION ALL
  SELECT 5, 'Level 2', 'Level 1' UNION ALL
  SELECT 6, NULL, NULL
)
SELECT *,
  IFNULL(
    FIRST_VALUE(original_product IGNORE NULLS) OVER(original_product_lookup),
    FIRST_VALUE(new_product IGNORE NULLS) OVER(new_product_lookup)
  ) AS desired_value
FROM `project.dataset.table`
WINDOW
  original_product_lookup AS (ORDER BY created ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING),
  new_product_lookup AS (ORDER BY created DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
ORDER BY created
with result:
Row  created  original_product  new_product  desired_value
1    1        null              null         Level 1
2    2        null              null         Level 1
3    3        Level 1           Level 2      Level 2
4    4        null              null         Level 2
5    5        Level 2           Level 1      Level 1
6    6        null              null         Level 1
I was able to solve it with a combination of the answers:
SELECT e.*,
       coalesce(
           last_value(case when (event = 'Upgrade' OR event = 'Downgrade' OR event = 'Crossgrade') then new_product end ignore nulls) over (partition by subscription order by created),
           first_value(original_product ignore nulls) over (
               order by created
               rows between current row and unbounded following
           )
       ) desired_value
FROM e
How can I calculate totals within a query, depending on the date?
Below you can see my query, which gives the following result:
select t.actual_date,
       t.id_key,
       t.attendance_status,
       t.money_step,
       sum(t.money_step) over (partition by t.id_key order by t.actual_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as accumulated
from example t
order by t.id_key, t.actual_date
I want the "accumulated" column to add up the value of "money_step" for each id_key. When attendance_status is '15' for the second time for an ID, the counter should start adding up from the beginning again. For ID_KEY = 1 it should look like this:
accumulated: Row 1: 20, Row 2: 80, Row 3: 100, Row 4: 120
How can I do this in the query? Can someone help me?
I understand that you want to reset the sum after the first row where attendance_status has the value 15. One option uses a conditional max() to define the subgroup:
select actual_date,
       id_key,
       attendance_status,
       money_step,
       sum(money_step) over (
           partition by id_key, grp
           order by actual_date
       ) as accumulated
from (
    select e.*,
           -- coalesce() keeps the first row, whose frame is empty, in group 0
           coalesce(max(case when attendance_status = 15 then 1 else 0 end) over (
               partition by id_key
               order by actual_date
               rows between unbounded preceding and 1 preceding
           ), 0) grp
    from example e
) e
order by id_key, actual_date
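As a rough check of the reset logic, the sketch below runs the same idea against SQLite through Python's sqlite3 module (a stand-in for the asker's database, assuming SQLite 3.25 or later). The four rows are hypothetical, not the asker's data: the second row carries attendance_status 15, so the sum should restart on the third row. The flag is wrapped in COALESCE so that the first row, whose frame is empty, lands in group 0 rather than in a NULL group of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (actual_date TEXT, id_key INTEGER, "
    "attendance_status INTEGER, money_step INTEGER)"
)
# Hypothetical rows for a single id_key; status 15 appears on the second row.
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01", 1, 10, 20),
        ("2020-01-02", 1, 15, 60),
        ("2020-01-03", 1, 10, 20),
        ("2020-01-04", 1, 10, 20),
    ],
)

accumulated = conn.execute("""
    SELECT actual_date, grp,
           SUM(money_step) OVER (
               PARTITION BY id_key, grp ORDER BY actual_date
           ) AS accumulated
    FROM (SELECT e.*,
                 -- 1 once an earlier row had status 15, otherwise 0
                 COALESCE(MAX(CASE WHEN attendance_status = 15 THEN 1 ELSE 0 END) OVER (
                     PARTITION BY id_key ORDER BY actual_date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                 ), 0) AS grp
          FROM example e) e
    ORDER BY id_key, actual_date
""").fetchall()

for row in accumulated:
    print(row)
```

The running total reaches 80 on the row with status 15 and then restarts at 20 on the next row.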
Oracle LEAD - return next matching column value
I have the data below in one table, and I want to get the next OUT value from the OUT column, so I used the LEAD function in the query below:
SELECT ROW_NUMBER, TIMESTAMP, IN, OUT,
       LEAD(OUT) OVER (PARTITION BY NULL ORDER BY TIMESTAMP) AS NEXT_OUT
FROM MYTABLE;
It returns the data shown in the NEXT_OUT column, but I need the next matching value in sequence, as in the DESIRED column. How can I achieve this with the Oracle LEAD function?
Assign row numbers to all INs and OUTs separately, sort the results by placing them in a single column, and calculate the LEADs:
WITH cte AS (
    SELECT t.*,
           CASE WHEN "IN" IS NOT NULL THEN COUNT("IN") OVER (ORDER BY "TIMESTAMP") END AS rn1,
           CASE WHEN "OUT" IS NOT NULL THEN COUNT("OUT") OVER (ORDER BY "TIMESTAMP") END AS rn2
    FROM t
)
SELECT cte.*,
       LEAD("OUT") OVER (ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST) AS NEXT_OUT
FROM cte
ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST
Enumerate the "in"s and the "out"s and use that information for matching:
select tin.*, tout."OUT" as next_out
from (select t.*, count("IN") over (order by "TIMESTAMP") as seqnum_in
      from t
     ) tin left join
     (select t.*, count("OUT") over (order by "TIMESTAMP") as seqnum_out
      from t
     ) tout
     on tin."IN" is not null and
        tout."OUT" is not null and
        tin.seqnum_in = tout.seqnum_out;
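The enumerate-and-join idea can be tried out with the sketch below, which uses SQLite through Python's sqlite3 module instead of Oracle (SQLite 3.25 or later is assumed). The column names ts, in_val and out_val are hypothetical replacements for TIMESTAMP, IN and OUT, which avoids quoting reserved words, and the six rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts INTEGER, in_val TEXT, out_val TEXT)")
# Hypothetical events: three INs (A, B, C) and three OUTs (X, Y, Z).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "A", None),
        (2, None, "X"),
        (3, "B", None),
        (4, "C", None),
        (5, None, "Y"),
        (6, None, "Z"),
    ],
)

# Number the INs and the OUTs separately, then pair the n-th IN with the n-th OUT.
next_out = conn.execute("""
    SELECT tin.ts, tin.in_val, tout.out_val AS next_out
    FROM (SELECT t.*, COUNT(in_val) OVER (ORDER BY ts) AS seqnum_in
          FROM t) tin
    LEFT JOIN
         (SELECT t.*, COUNT(out_val) OVER (ORDER BY ts) AS seqnum_out
          FROM t) tout
      ON tin.in_val IS NOT NULL
     AND tout.out_val IS NOT NULL
     AND tin.seqnum_in = tout.seqnum_out
    ORDER BY tin.ts
""").fetchall()

for row in next_out:
    print(row)
```

Rows that carry no IN value stay in the result with a NULL next_out because of the left join.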
Vertica/SQL: Getting rows immediately proceeding events
Consider a simple query:
select * from tbl where status = 'MELTDOWN'
I would now like to create a table that, in addition to these rows, also includes the previous p rows and the subsequent n rows, so that I can get a sense of what happens around these MELTDOWNs. Any hints?
You can do this with window functions by getting the seqnum of the meltdown rows. I would prefer to do this with lag()/lead() ignore nulls, but Vertica doesn't support that. I think this is the equivalent with first_value()/last_value():
with t as (
      select t.*, row_number() over (order by id) as seqnum
      from tbl t
     ),
     tt as (
      select t.*,
             last_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between unbounded preceding and current row) as prev_meltdown_seqnum,
             first_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between current row and unbounded following) as next_meltdown_seqnum
      from t
     )
select tt.*
from tt
where seqnum between prev_meltdown_seqnum and prev_meltdown_seqnum + 7 or
      seqnum between next_meltdown_seqnum - 5 and next_meltdown_seqnum;
WITH grouped AS (
    SELECT SUM(CASE WHEN status = 'Meltdown' THEN 1 ELSE 0 END) OVER (ORDER BY timeStamp) AS GroupID,
           tbl.*
    FROM tbl
),
sorted AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp ASC ) AS incPos,
           ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp DESC) AS decPos,
           MAX(GroupID) OVER () AS LastGroup,
           grouped.*
    FROM grouped
)
SELECT sorted.*
FROM sorted
WHERE (incPos <= 8 AND GroupID > 0)           -- Meltdown and the 7 events following it
   OR (decPos <= 6 AND GroupID <> LastGroup)  -- and the 6 events preceding a Meltdown
ORDER BY timeStamp
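For experimenting without a Vertica instance, here is a sketch of the "previous/next meltdown sequence number" idea in SQLite through Python's sqlite3 module (SQLite 3.25 or later is assumed). SQLite has no IGNORE NULLS, so this swaps in MAX over the preceding rows and MIN over the following rows, which works because seqnum is increasing. The table contents and the parameters p and n (rows before and after) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, status TEXT)")
# Hypothetical log: ten events, with a single MELTDOWN at id 5.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(i, "MELTDOWN" if i == 5 else "OK") for i in range(1, 11)],
)

# Keep rows at most :n rows after the previous meltdown
# or at most :p rows before the next one.
surrounding = [row[0] for row in conn.execute("""
    WITH t AS (
        SELECT tbl.*, ROW_NUMBER() OVER (ORDER BY id) AS seqnum
        FROM tbl
    ),
    tt AS (
        SELECT t.*,
               MAX(CASE WHEN status = 'MELTDOWN' THEN seqnum END) OVER (
                   ORDER BY seqnum
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS prev_meltdown_seqnum,
               MIN(CASE WHEN status = 'MELTDOWN' THEN seqnum END) OVER (
                   ORDER BY seqnum
                   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
               ) AS next_meltdown_seqnum
        FROM t
    )
    SELECT id
    FROM tt
    WHERE seqnum - prev_meltdown_seqnum <= :n
       OR next_meltdown_seqnum - seqnum <= :p
    ORDER BY id
""", {"n": 2, "p": 1})]

print(surrounding)
```

With p = 1 and n = 2 this keeps the meltdown row, the one row before it and the two rows after it. Rows with no earlier or later meltdown compare against NULL and drop out of that branch of the condition.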