Update with lag - sql

I would like to set the ACTIVE value of a table as follows:
If FLAG=E => ACTIVE=1 and for any subsequent FLAG values, until FLAG=H
If FLAG=H => ACTIVE=0 and for any subsequent FLAG values, until FLAG=E
and so on and so forth.
Example
ID | FLAG | ACTIVE
---+------+-------
1 | E | 1
2 | V | 1
3 | H | 0
4 | V | 0
5 | E | 1
6 | S | 1
7 | V | 1
8 | D | 1
9 | H | 0
The values are ordered by date.
For simplicity, I added an ID column to get the column order.
Question
What would the SQL update statement be?
Note:
The business rule can also be expressed as follows:
If, for a given row, the count of E rows up to that row minus the count of H rows up to that row is 1, then ACTIVE is 1 for this row, and 0 otherwise.

You can get the active value with the last_value() analytic function:
select id, flag,
last_value(case when flag = 'E' then 1 when flag = 'H' then 0 end) ignore nulls
over (order by id) as active
from your_table;
As a demo:
create table your_table (id, flag) as
select 1, 'E' from dual
union all select 2, 'V' from dual
union all select 3, 'H' from dual
union all select 4, 'V' from dual
union all select 5, 'E' from dual
union all select 6, 'S' from dual
union all select 7, 'V' from dual
union all select 8, 'D' from dual
union all select 9, 'H' from dual;
select id, flag,
last_value(case when flag = 'E' then 1 when flag = 'H' then 0 end) ignore nulls
over (order by id) as active
from your_table;
ID F ACTIVE
---------- - ----------
1 E 1
2 V 1
3 H 0
4 V 0
5 E 1
6 S 1
7 V 1
8 D 1
9 H 0
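To see why this works, the carry-forward behaviour of last_value(...) ignore nulls can be modelled outside the database. The Python sketch below is only an illustration of the logic, not how Oracle implements it; the function name carry_forward_active is made up for this example.

```python
from typing import List, Optional


def carry_forward_active(flags: List[str]) -> List[Optional[int]]:
    """Model last_value(case ... end) ignore nulls over (order by id).

    'E' maps to 1, 'H' maps to 0, and every other flag maps to null, so it
    inherits the most recent non-null value. Rows before the first E or H
    stay None, just as the SQL would return NULL for them.
    """
    active: List[Optional[int]] = []
    current: Optional[int] = None
    for flag in flags:
        if flag == "E":
            current = 1
        elif flag == "H":
            current = 0
        # Any other flag leaves `current` untouched (the "ignore nulls" part).
        active.append(current)
    return active


# Flags from the question, already sorted by id.
flags = ["E", "V", "H", "V", "E", "S", "V", "D", "H"]
print(carry_forward_active(flags))  # [1, 1, 0, 0, 1, 1, 1, 1, 0]
```

Note that a table whose first rows are neither E nor H would get a null ACTIVE for those rows; wrap the expression in nvl(..., 0) if you want 0 there instead.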
You can use the same thing for an update, though a merge is probably going to be simpler:
alter table your_table add active number;
merge into your_table
using (
select id,
last_value(case when flag = 'E' then 1 when flag = 'H' then 0 end) ignore nulls
over (order by id) as active
from your_table
) tmp
on (your_table.id = tmp.id)
when matched then update set active = tmp.active;
9 rows merged.
select * from your_table;
ID F ACTIVE
---------- - ----------
1 E 1
2 V 1
3 H 0
4 V 0
5 E 1
6 S 1
7 V 1
8 D 1
9 H 0
You said your real data is actually ordered by a date, and I guess there are multiple flags for each of multiple IDs, so something like this is probably more realistic:
create table your_table (id, flag_time, flag) as
select 1, timestamp '2018-07-04 00:00:00', 'E' from dual
union all select 1, timestamp '2018-07-04 00:00:01', 'V' from dual
union all select 1, timestamp '2018-07-04 00:00:02', 'H' from dual
union all select 1, timestamp '2018-07-04 00:00:03', 'V' from dual
union all select 1, timestamp '2018-07-04 00:00:04', 'E' from dual
union all select 1, timestamp '2018-07-04 00:00:05', 'S' from dual
union all select 1, timestamp '2018-07-04 00:00:06', 'V' from dual
union all select 1, timestamp '2018-07-04 00:00:07', 'D' from dual
union all select 1, timestamp '2018-07-04 00:00:08', 'H' from dual;
alter table your_table add active number;
merge into your_table
using (
select id, flag_time,
last_value(case when flag = 'E' then 1 when flag = 'H' then 0 end) ignore nulls
over (partition by id order by flag_time) as active
from your_table
) tmp
on (your_table.id = tmp.id and your_table.flag_time = tmp.flag_time)
when matched then update set active = tmp.active;
select * from your_table;
ID FLAG_TIME F ACTIVE
---------- ----------------------- - ----------
1 2018-07-04 00:00:00.000 E 1
1 2018-07-04 00:00:01.000 V 1
1 2018-07-04 00:00:02.000 H 0
1 2018-07-04 00:00:03.000 V 0
1 2018-07-04 00:00:04.000 E 1
1 2018-07-04 00:00:05.000 S 1
1 2018-07-04 00:00:06.000 V 1
1 2018-07-04 00:00:07.000 D 1
1 2018-07-04 00:00:08.000 H 0
The main difference is the partition by id and changing the ordering to use flag_time - or whatever your real columns are called.
There is potentially an issue if two flags can share a time; with a timestamp column that's hopefully very unlikely, but with a date the precision of the column may allow it. There isn't much you can do about that though, except maybe get into some logic to break ties by assuming flags should arrive in a certain order, and give them a weighting based on that. Rather off-topic though.

Related

Oracle how to return rows based on condition?

I am trying to return id and name based on the flag column. If an id has rows with flag = 1, my query should return only those rows. If it has no row with flag = 1, it should return its rows with flag = 0. What is the best way to do this? Here is some sample data:
id name flag
5 aa 1
5 bb 0
6 cc 1
10 dd 0
11 ee 1
11 ee 0
Expected output is :
id name flag
5 aa 1
6 cc 1
10 dd 0
11 ee 1
Assuming the flag column contains only 0 or 1, select the rows whose flag equals the maximum flag value for their id:
select id, name, flag
from (
select id, name, flag, max(flag) over (partition by id) as m
from your_table
) x
where x.flag = x.m
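As a rough cross-check of that idea, here is the same filter modelled in Python on the sample data from the question. This is just a sketch of the logic (the helper name rows_with_max_flag is hypothetical), not a replacement for the query.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]  # (id, name, flag)


def rows_with_max_flag(rows: List[Row]) -> List[Row]:
    """Keep the rows whose flag equals the highest flag seen for their id."""
    # First pass: the equivalent of max(flag) over (partition by id).
    max_flag: Dict[int, int] = {}
    for row_id, _name, flag in rows:
        max_flag[row_id] = max(flag, max_flag.get(row_id, flag))
    # Second pass: the equivalent of "where x.flag = x.m".
    return [row for row in rows if row[2] == max_flag[row[0]]]


sample = [
    (5, "aa", 1),
    (5, "bb", 0),
    (6, "cc", 1),
    (10, "dd", 0),
    (11, "ee", 1),
    (11, "ee", 0),
]
print(rows_with_max_flag(sample))
```

If an id has several rows sharing the maximum flag, all of them are kept, which is also what the query does.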
You can use the keep dense_rank aggregate function to achieve that, like below.
with t (id, name, flag) as (
select 5 , 'aa', 1 from dual union all
select 5 , 'bb', 0 from dual union all
select 6 , 'cc', 1 from dual union all
select 10, 'dd', 0 from dual union all
select 11, 'ee', 1 from dual union all
select 11, 'ee', 0 from dual
)
select id
, max(name)keep(dense_rank last order by id, flag) name
, max(flag)keep(dense_rank last order by id, flag) flag
from t
where flag in (0, 1)
group by id
order by id
;

Select rows when a value appears multiple times

I have a table like this one:
+------+------+
| ID | Cust |
+------+------+
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 3 | B |
+------+------+
I would like to get the IDs that have at least two A rows and at least two B rows. So in my example, the query should return only ID 1.
Thanks!
In MySQL:
SELECT id
FROM test
GROUP BY id
HAVING GROUP_CONCAT(cust ORDER BY cust SEPARATOR '') LIKE '%AA%BB%'
In Oracle
WITH cte AS ( SELECT id, LISTAGG(cust, '') WITHIN GROUP (ORDER BY cust) custs
FROM test
GROUP BY id )
SELECT id
FROM cte
WHERE custs LIKE '%AA%BB%'
I would just use two levels of aggregation:
select id
from (select id, cust, count(*) as cnt
from t
where cust in ('A', 'B')
group by id, cust
) ic
group by id
having count(*) = 2 and -- both customers are in the result set
min(cnt) >= 2 -- and there are at least two instances
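For anyone who wants to try the two-level aggregation without an Oracle instance, the sketch below runs the same statement against an in-memory SQLite database through Python's sqlite3 module. That is an assumption on my part: the query only uses plain GROUP BY and HAVING, so it should behave the same way, but SQLite is not Oracle. The table name t and the sample rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, cust text)")
conn.executemany(
    "insert into t (id, cust) values (?, ?)",
    [
        (1, "A"), (1, "A"), (1, "B"), (1, "B"),
        (2, "A"), (2, "A"), (2, "A"), (2, "B"),
        (3, "A"), (3, "B"), (3, "B"),
    ],
)

query = """
select id
from (select id, cust, count(*) as cnt
      from t
      where cust in ('A', 'B')
      group by id, cust
     ) ic
group by id
having count(*) = 2      -- both customers are present for this id
   and min(cnt) >= 2     -- and each appears at least twice
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)  # [1]
```

The inner query yields one row per (id, cust) pair, so the outer count(*) = 2 checks that both A and B exist, and min(cnt) >= 2 checks that neither appears fewer than two times.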
This is one option; lines #1 - 13 represent sample data. The query you might be interested in begins at line #14.
SQL> with test (id, cust) as
2 (select 1, 'a' from dual union all
3 select 1, 'a' from dual union all
4 select 1, 'b' from dual union all
5 select 1, 'b' from dual union all
6 select 2, 'a' from dual union all
7 select 2, 'a' from dual union all
8 select 2, 'a' from dual union all
9 select 2, 'b' from dual union all
10 select 3, 'a' from dual union all
11 select 3, 'b' from dual union all
12 select 3, 'b' from dual
13 )
14 select id
15 from (select
16 id,
17 sum(case when cust = 'a' then 1 else 0 end) suma,
18 sum(case when cust = 'b' then 1 else 0 end) sumb
19 from test
20 group by id
21 )
22 where suma >= 2
23 and sumb >= 2;
ID
----------
1
SQL>
You can use group by and having for the relevant Cust values ('A', 'B'), and then query the result twice.
(I chose a with clause to avoid repeating the subquery and so it can be cached.)
with more_than_2 as
(
select Id, Cust, count(*) c
from tab
where Cust in ('A', 'B')
group by Id, Cust
having count(*) >= 2
)
select distinct Id
from tab
where exists ( select 1 from more_than_2 where more_than_2.Id = tab.Id and more_than_2.Cust = 'A')
and exists ( select 1 from more_than_2 where more_than_2.Id = tab.Id and more_than_2.Cust = 'B')
What you want is a perfect candidate for match_recognize. Here you go:
select id from t
match_recognize
(
partition by id
order by cust
pattern (A{2,} B{2,})
define A as cust = 'A',
B as cust = 'B'
)

SQL how to change columns value in next 2 records

I have a table which contains two columns, one named "ID" and the other "FLG". I need to add another column, "FLG1", that is based on column "FLG".
The logic is that if a record is "Y" in this table, then the next 2 records should both be "N" in "FLG1"; if it is "N", it is simply kept as "N" in "FLG1".
For example:
For the record with ID=2 and FLG=Y, FLG1 should be "Y", but for the next 2 records (ID=3,4), FLG1 should be "N", even though their FLG is "Y".
The expected result is as below:
I have tried many approaches for days without success. I don't want a stored procedure; I just want a SQL query for the implementation.
Here is the script for the data:
select 1 as ID, 'N' as FLG from dual union all
select 2 as ID, 'Y' as FLG from dual union all
select 3 as ID, 'Y' as FLG from dual union all
select 4 as ID, 'Y' as FLG from dual union all
select 5 as ID, 'Y' as FLG from dual union all
select 6 as ID, 'Y' as FLG from dual union all
select 7 as ID, 'Y' as FLG from dual union all
select 8 as ID, 'Y' as FLG from dual union all
select 9 as ID, 'N' as FLG from dual union all
select 10 as ID, 'Y' as FLG from dual union all
select 11 as ID, 'N' as FLG from dual union all
select 12 as ID, 'Y' as FLG from dual union all
select 13 as ID, 'N' as FLG from dual union all
select 14 as ID, 'N' as FLG from dual union all
select 15 as ID, 'N' as FLG from dual union all
select 16 as ID, 'Y' as FLG from dual union all
select 17 as ID, 'N' as FLG from dual union all
select 18 as ID, 'N' as FLG from dual union all
select 19 as ID, 'N' as FLG from dual union all
select 20 as ID, 'N' as FLG from dual union all
select 21 as ID, 'N' as FLG from dual union all
select 22 as ID, 'Y' as FLG from dual union all
select 23 as ID, 'Y' as FLG from dual
You need to iterate over the rows sequentially and calculate whether to show a row before you can calculate whether to show the row following it; this means you need a recursive (or hierarchical) query.
You need to split the rows up into triplets of sequential rows such that each triplet starts with a FLG = 'Y' row and then all rows that are not in such a triplet or are in the second or third rows of the triplet will have a FLG1 value of N.
Like this:
WITH find_triplets ( id, flg, flg_count ) AS (
SELECT id,
flg,
DECODE( flg, 'Y', 1, 0 )
FROM table_name
WHERE id = 1
UNION ALL
SELECT t.id,
t.flg,
CASE f.flg_count
WHEN 0
THEN DECODE( t.flg, 'Y', 1, 0 )
ELSE MOD( f.flg_count + 1, 3 )
END
FROM find_triplets f
INNER JOIN table_name t
ON ( t.id = f.id + 1 )
)
SELECT id,
flg,
CASE
WHEN flg = 'Y' AND flg_count = 1
THEN 'Y'
ELSE 'N'
END as flg1
FROM find_triplets
ORDER BY id
Which, for your sample data, outputs:
ID | FLG | FLG1
-: | :-- | :---
1 | N | N
2 | Y | Y
3 | Y | N
4 | Y | N
5 | Y | Y
6 | Y | N
7 | Y | N
8 | Y | Y
9 | N | N
10 | Y | N
11 | N | N
12 | Y | Y
13 | N | N
14 | N | N
15 | N | N
16 | Y | Y
17 | N | N
18 | N | N
19 | N | N
20 | N | N
21 | N | N
22 | Y | Y
23 | Y | N
Use `LAG` to see the previous two rows. Then use `CASE WHEN` to decide what to show.
select
id,
flg,
case
when lag(flg) over(order by id) = 'Y' or lag(flg, 2) over(order by id) = 'Y' then 'N'
else flg
end as flg1
from mytable
order by id;
Your description is wrong. You want to iterate through your rows for which you'd use a recursive query in SQL.
Number your rows first, because there can always be gaps in a table's IDs. Then use a recursive query to loop through the rows. I think this is the most straightforward way to approach this.
with numbered as (select t.*, row_number() over (order by id) as rn from mytable t)
, cte (id, flg, flg1, prev_flg1, rn) as
(
select id, flg, flg, null, rn from numbered where rn = 1
union all
select
t.id,
t.flg,
case
when cte.flg1 = 'Y' or cte.prev_flg1 = 'Y' then 'N'
else t.flg
end,
cte.flg1,
t.rn
from cte
join numbered t on t.rn = cte.rn + 1
)
select id, flg, flg1
from cte
order by id;
In my misinterpretation of the question, this is a pretty simple gaps-and-islands problem. See the edit below for an improved answer. I would suggest using the difference of row numbers to define the islands. The definition of the flag is then just checking the row number on each group of 'Y' values:
select id, flg,
(case when flg = 'Y' and
mod(row_number() over (partition by flg, seqnum - seqnum_2 order by id), 3) = 1
then 'Y'
else 'N'
end) as flg1
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by flg order by id) as seqnum_2
from t
) t
order by id;
If you want to update the flag, I would recommend using merge.
Note: I would also expect this to be faster (perhaps much faster) than a recursive CTE approach.
EDIT:
Alex makes a really good point. I think this requires a recursive CTE. If you have a large amount of data, it might be possible to optimize it by splitting the data into groups where you have multiple 'N's in a row. Your question doesn't mention data size.
I would approach this as:
with tt as (
select t.*, row_number() over (order by id) as seqnum
from t
),
cte (seqnum, id, flg, flg1, counter) as (
select seqnum, id, flg, flg,
(case when flg = 'Y' then 1 else 0 end)
from tt
where seqnum = 1
union all
select tt.seqnum, tt.id, tt.flg,
(case when cte.counter in (1, 2) then 'N'
when tt.flg = 'Y' then 'Y'
else 'N'
end),
(case when cte.counter in (1, 2) then cte.counter + 1
when tt.flg = 'Y' then 1
else 0
end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte;
Basically, this walks through the data and finds the first 'Y'. At that point, it sets a counter to 1. In the next two rows, the counter is incremented, regardless of the value of the flag. Then it goes back to looking for a 'Y' to repeat the process.
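The counter logic described above can be sketched as a plain loop. This Python version is only meant to make the state transitions easy to follow and to check them against the sample data; the function name suppress_after_yes is invented for the example, and it assumes the flags are already in ID order and contain only 'Y' or 'N'.

```python
from typing import List


def suppress_after_yes(flags: List[str]) -> List[str]:
    """Return FLG1 for each row: a kept 'Y' forces the next two rows to 'N'."""
    result: List[str] = []
    # Counter carried over from the previous row:
    # 0 = looking for a 'Y'; 1 or 2 = the next row must still be suppressed.
    counter = 0
    for flg in flags:
        if counter in (1, 2):
            result.append("N")   # inside the two-row window after a kept 'Y'
            counter += 1
        elif flg == "Y":
            result.append("Y")   # a new 'Y' is kept and restarts the window
            counter = 1
        else:
            result.append("N")
            counter = 0
    return result


# FLG values for IDs 1..23 from the question.
sample = list("NYYYYYYYNYNYNNNYNNNNNYY")
print("".join(suppress_after_yes(sample)))  # NYNNYNNYNNNYNNNYNNNNNYN
```

Each row depends on the state left by the previous row, which is why a single pass of independent lag() calls over FLG does not give the same answer.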
Amusingly, this seems like a pretty simple operation to implement using a Turing machine. Usually, it is not obvious how to implement such things.
Interestingly, if you put all the flags in a string, regular expressions solve the problem very simply:
select flgs,
substr(regexp_replace(flgs, 'Y(..|.$|$)', 'YNN'), 1, length(flgs)) as flg1s
from (select listagg(flg, '') within group (order by id) as flgs
from t
) t;
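The regular-expression trick is easy to try outside the database as well. The sketch below uses Python's re module, which is an assumption: its engine is not the one Oracle uses, although for this particular pattern the alternatives are tried in the same order. The function name is hypothetical.

```python
import re


def suppress_after_yes_regex(flgs: str) -> str:
    """Rewrite each 'Y' plus up to two following characters as 'YNN'."""
    # 'Y..' covers the normal case; 'Y.$' and 'Y$' cover a 'Y' that sits in
    # the last two positions of the string.
    replaced = re.sub(r"Y(..|.$|$)", "YNN", flgs)
    # A 'Y' near the end makes the result longer than the input, so trim it,
    # which is what the substr(..., 1, length(flgs)) call does in the query.
    return replaced[: len(flgs)]


# FLG values for IDs 1..23 from the question, concatenated in ID order.
print(suppress_after_yes_regex("NYYYYYYYNYNYNNNYNNNNNYY"))
# NYNNYNNYNNNYNNNYNNNNNYN
```

Because the replacement consumes the two characters after each kept 'Y', the scan resumes after the suppressed rows, which gives the same sequential behaviour as the recursive query.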

Get last value from a certain group (Oracle)

I have something like this
Date Group ID
11/01 'A' 1
12/01 'A' 2
13/01 'B' 3
14/01 'B' 4
What I basically want is to get, for example, the latest ID from group 'A'
Date Group ID LatestID_from_GROUP_A_ordered_by_recent_date
11/01 'A' 1 2
12/01 'A' 2 2
13/01 'B' 3 2
14/01 'B' 4 2
or at least something like this
Date Group ID LatestID_from_GROUP_A_ordered_by_recent_date
11/01 'A' 1 null
12/01 'A' 2 null
13/01 'B' 3 2
14/01 'B' 4 2
How about this:
with demo (somedate, somegroup, id) as
( select date '2018-01-11', 'A', 1 from dual union all
select date '2018-01-12', 'A', 2 from dual union all
select date '2018-01-13', 'B', 3 from dual union all
select date '2018-01-14', 'B', 4 from dual union all
select date '2018-01-15', 'A', 5 from dual -- example from comments
)
select somedate, somegroup, id
, ( select max(id) keep (dense_rank last order by somedate)
from demo
where somegroup = 'A' ) as last_a
from demo;
SOMEDATE SOMEGROUP ID LAST_A
----------- --------- ---------- ----------
11/01/2018 A 1 5
12/01/2018 A 2 5
13/01/2018 B 3 5
14/01/2018 B 4 5
15/01/2018 A 5 5
Note the max(id) is only a tiebreaker in the event of multiple rows with the last date.
Gordon was almost there.
You want to create a window over your whole query, but only pick the biggest value of 'A':
select
t.*,
max(case when group = 'A' then id end) over (partition by 1) as latest_from_a
from t
'partition by 1' will create a window of your complete result set because it only groups by a single static value: 1.
The logic seems to be:
select t.*,
max(case when group = 'A' then id end) over (order by date) as latest_from_a
from t;
The above gets the cumulative maximum up to each date. If you want the overall maximum:
select t.*,
max(case when group = 'A' then id end) over () as latest_from_a
from t;
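The difference between the two window definitions is easier to see on the question's four rows. The following Python sketch models both variants; it assumes the rows are already sorted by date and that dates are unique, and the function names are made up for this illustration.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, int]  # (date, group, id), already sorted by date


def running_max_a(rows: List[Row]) -> List[Optional[int]]:
    """Model max(case when group = 'A' then id end) over (order by date)."""
    out: List[Optional[int]] = []
    best: Optional[int] = None
    for _date, group, row_id in rows:
        # Only group 'A' rows contribute; other rows see the value so far.
        if group == "A" and (best is None or row_id > best):
            best = row_id
        out.append(best)
    return out


def overall_max_a(rows: List[Row]) -> List[Optional[int]]:
    """Model the same expression with an empty over () window."""
    ids = [row_id for _date, group, row_id in rows if group == "A"]
    best = max(ids) if ids else None
    return [best] * len(rows)


rows = [("11/01", "A", 1), ("12/01", "A", 2), ("13/01", "B", 3), ("14/01", "B", 4)]
print(running_max_a(rows))  # [1, 2, 2, 2]
print(overall_max_a(rows))  # [2, 2, 2, 2]
```

Both variants pick the largest ID in group 'A', which matches "latest by date" only when IDs grow with the date; if that does not hold, the keep (dense_rank last order by ...) form shown earlier is the safer choice.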

counting most recent consecutive rows with like data using tabibitosan

My project is using an Oracle SQL database. I have a historical table that appends task status on a weekly basis, and am attempting to query the number of weeks a task that is currently off track has been off track. Here's an example excerpt from my source historical table:
ID WEEK ON_TRACK
1 1 N
1 2 Y
1 3 N
1 4 N
1 5 N
2 1 N
2 2 N
2 3 Y
2 4 Y
2 5 N
3 1 N
3 2 N
3 3 Y
3 4 Y
3 5 Y
I'm looking to return the count of consecutive "N" values in ON_TRACK starting backwards from the latest append. For the above example data, I'd like the query to return:
ID WKS_OFF_TRACK
1 3
2 1
3 0
I've done some research, and it looks like the Tabibitosan method is the most logical approach, and I've found ample examples to give the max consecutive values that match one criterion, but I'm having trouble tweaking them to return the most recent consecutive values that match two criteria (ID and ON_TRACK).
Here's what I have so far
--this step creates a temp table with unique IDs for each weekly append to the historical table, and a 1 (if ON_TRACK = N) or 0 (if ON_TRACK = Y). This results in the expected info.
WITH HIST_TBL AS (
SELECT DISTINCT(ID),
CASE ON_TRACK
WHEN 'N' THEN 1
ELSE 0
END AS OFF_TRACK,
WEEK
FROM SOURCE_HISTORICAL_TBL
ORDER BY ID,WEEK DESC)
-- end of temp table
--this is where I'm struggling: I want one line per project number, and the sum of the latest string of 1s (weeks the task has been off track), until a 0 is reached.
SELECT ID,
SUM(OFF_TRACK) AS WKS_OFF_TRACK
FROM (SELECT WEEK,
ID,
OFF_TRACK,
ROW_NUMBER() OVER (ORDER BY WEEK DESC) - ROW_NUMBER() OVER
(PARTITION BY ID,OFF_TRACK ORDER BY WEEK DESC) GRP
FROM HIST_TBL)
GROUP BY ID, GRP
ORDER BY ID;
This code results in a cumulative sum of all weeks each project has been off track, which for my example data would be:
ID WKS_OFF_TRACK
1 4
2 3
3 2
Any ideas where I'm going wrong?
Here is one method that assumes people were "on track" at some point in time:
select sht.id, count(*)
from SOURCE_HISTORICAL_TBL sht
where sht.week > (select max(sht2.week)
from SOURCE_HISTORICAL_TBL sht2
where sht2.id = sht.id and sht2.on_track = 'Y'
)
group by sht.id;
Otherwise, you need one more condition:
select sht.id, count(*)
from SOURCE_HISTORICAL_TBL sht
where sht.week > (select max(sht2.week)
from SOURCE_HISTORICAL_TBL sht2
where sht2.id = sht.id and sht2.on_track = 'Y'
) or
not exists (select 1
from SOURCE_HISTORICAL_TBL sht2
where sht2.id = sht.id and sht2.on_track = 'Y'
)
group by sht.id;
You can also phrase these as analytic functions:
select id,
sum(case when week > max_week_y or max_week_y is null then 1 else 0 end) as max_off_track
from (select sht.*,
max(case when on_track = 'Y' then week end) over (partition by id) as max_week_y
from SOURCE_HISTORICAL_TBL sht
) sht
group by id;
Note that this version will return 0s for people currently on track.
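To sanity-check the rule these queries implement (count the rows later than each ID's last on-track week, or all rows if it was never on track), here is a small Python model run on the question's data. It is a sketch of the logic only, and the function name weeks_off_track is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, str]  # (id, week, on_track)


def weeks_off_track(rows: List[Row]) -> Dict[int, int]:
    """Count, per id, the rows that come after that id's last 'Y' week."""
    # Equivalent of max(case when on_track = 'Y' then week end) per id.
    last_on_track: Dict[int, int] = {}
    for row_id, week, on_track in rows:
        if on_track == "Y":
            last_on_track[row_id] = max(week, last_on_track.get(row_id, week))

    counts: Dict[int, int] = {}
    for row_id, week, _on_track in rows:
        counts.setdefault(row_id, 0)  # ids currently on track still get a 0
        # An id with no 'Y' at all counts every one of its rows.
        if row_id not in last_on_track or week > last_on_track[row_id]:
            counts[row_id] += 1
    return counts


rows = [
    (1, 1, "N"), (1, 2, "Y"), (1, 3, "N"), (1, 4, "N"), (1, 5, "N"),
    (2, 1, "N"), (2, 2, "N"), (2, 3, "Y"), (2, 4, "Y"), (2, 5, "N"),
    (3, 1, "N"), (3, 2, "N"), (3, 3, "Y"), (3, 4, "Y"), (3, 5, "Y"),
]
print(weeks_off_track(rows))  # {1: 3, 2: 1, 3: 0}
```

This counts rows rather than subtracting week numbers, so it still gives the number of recorded off-track weeks if a week is missing from the history.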
You can do it in a single table scan:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE SOURCE_HISTORICAL_TBL ( ID, WEEK, ON_TRACK ) AS
SELECT 1, 1, 'N' FROM DUAL UNION ALL
SELECT 1, 2, 'Y' FROM DUAL UNION ALL
SELECT 1, 3, 'N' FROM DUAL UNION ALL
SELECT 1, 4, 'N' FROM DUAL UNION ALL
SELECT 1, 5, 'N' FROM DUAL UNION ALL
SELECT 2, 1, 'N' FROM DUAL UNION ALL
SELECT 2, 2, 'N' FROM DUAL UNION ALL
SELECT 2, 3, 'Y' FROM DUAL UNION ALL
SELECT 2, 4, 'Y' FROM DUAL UNION ALL
SELECT 2, 5, 'N' FROM DUAL UNION ALL
SELECT 3, 1, 'N' FROM DUAL UNION ALL
SELECT 3, 2, 'N' FROM DUAL UNION ALL
SELECT 3, 3, 'Y' FROM DUAL UNION ALL
SELECT 3, 4, 'Y' FROM DUAL UNION ALL
SELECT 3, 5, 'Y' FROM DUAL UNION ALL
SELECT 4, 1, 'N' FROM DUAL UNION ALL
SELECT 5, 1, 'Y' FROM DUAL;
Query 1:
SELECT ID,
GREATEST(
COALESCE( MAX( CASE ON_TRACK WHEN 'N' THEN WEEK END ), 0 )
- COALESCE( MAX( CASE ON_TRACK WHEN 'Y' THEN WEEK END ), 0 ),
0
) AS weeks
FROM SOURCE_HISTORICAL_TBL
GROUP BY id
ORDER BY id
Results:
| ID | WEEKS |
|----|-------|
| 1 | 3 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1 |
| 5 | 0 |