Oracle SQL query: finding the last time data was changed

I want to retrieve the number of days elapsed since the data in a specific column last changed. For example:
TABLE_X contains
ID PDATE DATA1 DATA2
A 10-Jan-2013 5 10
A 9-Jan-2013 5 10
A 8-Jan-2013 5 11
A 7-Jan-2013 5 11
A 6-Jan-2013 14 12
A 5-Jan-2013 14 12
B 10-Jan-2013 3 15
B 9-Jan-2013 3 15
B 8-Jan-2013 9 15
B 7-Jan-2013 9 15
B 6-Jan-2013 14 15
B 5-Jan-2013 14 8
I have simplified the table for the purposes of this example.
The result should be :
ID DATA1_LASTUPDATE DATA2_LASTUPDATE
A 4 2
B 2 5
which says,
- data1 of A last update is 4 days ago,
- data2 of A last update is 2 days ago,
- data1 of B last update is 2 days ago,
- data2 of B last update is 5 days ago.
The query below works, but it takes too long to complete when I apply it to the real table, which has lots of records, and add 2 more data columns to find their latest update days.
I use the LEAD function for this purpose.
Are there any alternatives to speed up the query?
with qdata1 as
(
select ID, pdate from
(
select a.*, row_number() over (partition by ID order by pdate desc) rnum from
(
select a.*,
lead(data1,1,0) over (partition by ID order by pdate desc) - data1 as data1_diff
from table_x a
) a
where data1_diff <> 0
)
where rnum=1
),
qdata2 as
(
select ID, pdate from
(
select a.*, row_number() over (partition by ID order by pdate desc) rnum from
(
select a.*,
lead(data2,1,0) over (partition by ID order by pdate desc) - data2 as data2_diff
from table_x a
) a
where data2_diff <> 0
)
where rnum=1
)
select a.ID,
trunc(sysdate) - b.pdate data1_lastupdate,
trunc(sysdate) - c.pdate data2_lastupdate
from table_master a, qdata1 b, qdata2 c
where a.ID=b.ID(+)
and a.ID=c.ID(+)
Thanks a lot.

You can avoid the multiple hits on the table and the joins by doing both lag (or lead) calculations together:
with t as (
select id, pdate, data1, data2,
lag(data1) over (partition by id order by pdate) as lag_data1,
lag(data2) over (partition by id order by pdate) as lag_data2
from table_x
),
u as (
select t.*,
case when lag_data1 is null or lag_data1 != data1 then pdate end as pdate1,
case when lag_data2 is null or lag_data2 != data2 then pdate end as pdate2
from t
),
v as (
select u.*,
rank() over (partition by id order by pdate1 desc nulls last) as rn1,
rank() over (partition by id order by pdate2 desc nulls last) as rn2
from u
)
select v.id,
max(trunc(sysdate) - (case when rn1 = 1 then pdate1 end))
as data1_last_update,
max(trunc(sysdate) - (case when rn2 = 1 then pdate2 end))
as data2_last_update
from v
group by v.id
order by v.id;
I'm assuming that you meant your data to be for Jun-2014, not Jan-2013; and that you're comparing the most recent change dates with the current date. With the data adjusted to use 10-Jun-2014 etc., this gives:
ID DATA1_LAST_UPDATE DATA2_LAST_UPDATE
-- ----------------- -----------------
A 4 2
B 2 5
The first CTE (t) gets the actual table data and adds two extra columns, one for each of the data columns, using lag (which is the same as lead ordered by descending dates).
The second CTE (u) adds two date columns that are only set when the data columns are changed (or when they are first set, just in case they have never changed). So if a row has data1 the same as the previous row, its pdate1 will be blank. You could combine the first two by repeating the lag calculation but I've left it split out to make it a bit clearer.
The third CTE (v) assigns a ranking to those pdate columns such that the most recent is ranked first.
And the final query works out the difference from the current date to the highest-ranked (i.e. most recent) change for each of the data columns.
SQL Fiddle, including all the CTEs run individually so you can see what they are doing.
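As a minimal sketch of the single-pass idea above, the following runs a condensed variant of the query against the question's sample data using Python's built-in sqlite3 module (an assumption made purely so the example is runnable; it needs SQLite 3.25 or later for window functions). It replaces the rank step with a MAX over the conditional change dates, and it uses a fixed reference date of 2013-01-11 in place of trunc(sysdate), which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_x (id TEXT, pdate TEXT, data1 INTEGER, data2 INTEGER)"
)
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?, ?)",
    [
        ("A", "2013-01-10", 5, 10), ("A", "2013-01-09", 5, 10),
        ("A", "2013-01-08", 5, 11), ("A", "2013-01-07", 5, 11),
        ("A", "2013-01-06", 14, 12), ("A", "2013-01-05", 14, 12),
        ("B", "2013-01-10", 3, 15), ("B", "2013-01-09", 3, 15),
        ("B", "2013-01-08", 9, 15), ("B", "2013-01-07", 9, 15),
        ("B", "2013-01-06", 14, 15), ("B", "2013-01-05", 14, 8),
    ],
)

# One scan of table_x: LAG exposes the previous value of each column,
# and MAX over the conditional dates keeps the most recent change date.
QUERY = """
WITH t AS (
    SELECT id, pdate, data1, data2,
           LAG(data1) OVER (PARTITION BY id ORDER BY pdate) AS lag_data1,
           LAG(data2) OVER (PARTITION BY id ORDER BY pdate) AS lag_data2
    FROM table_x
)
SELECT id,
       CAST(julianday(:today) - julianday(MAX(
           CASE WHEN lag_data1 IS NULL OR lag_data1 <> data1 THEN pdate END
       )) AS INTEGER) AS data1_last_update,
       CAST(julianday(:today) - julianday(MAX(
           CASE WHEN lag_data2 IS NULL OR lag_data2 <> data2 THEN pdate END
       )) AS INTEGER) AS data2_last_update
FROM t
GROUP BY id
ORDER BY id
"""

# "today" is a fixed, assumed reference date standing in for trunc(sysdate).
result = conn.execute(QUERY, {"today": "2013-01-11"}).fetchall()
print(result)
```

With the assumed reference date, this should reproduce the figures from the question (4 and 2 days for A, 2 and 5 days for B).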

Your query wasn't returning the right results for me (maybe I missed something), but I got the correct results with the query below (you can check this SQLFiddle demo):
with ranked as (
select ID,
data1,
data2,
rank() over(partition by id order by pdate desc) r
from table_x
)
select id,
sum(DATA1_LASTUPDATE) DATA1_LASTUPDATE,
sum(DATA2_LASTUPDATE) DATA2_LASTUPDATE
from (
-- here I get when data1 was updated
select id,
count(1) DATA1_LASTUPDATE,
0 DATA2_LASTUPDATE
from ranked
start with r = 1
CONNECT BY PRIOR id = id
and PRIOR data1 = data1
and PRIOR r = r - 1
group by id
union
-- here I get when data2 was updated
select id,
0 DATA1_LASTUPDATE,
count(1) DATA2_LASTUPDATE
from ranked
start with r = 1
CONNECT BY PRIOR id = id
and PRIOR data2 = data2
and PRIOR r = r - 1
group by id
)
group by id

Related

Is there a way to collapse ordered rows by terminal values with postgres window clause

I have a table foo:
some_fk some_field some_date_field
1 A 1990-01-01
1 B 1990-01-02
1 C 1990-03-01
1 X 1990-04-01
2 B 1990-01-01
2 B 1990-01-05
2 Z 1991-04-11
2 C 1992-01-01
2 B 1992-02-01
2 Y 1992-03-01
3 C 1990-01-01
some_field has 6 possible values: [A,B,C,X,Y,Z]
Where [A,B,C] signify opening or continuation events and [X,Y,Z] signify closing events. How do I get each span of time between the first opening event and closing event of each span, partitioned by some_fk, as shown in the table below:
some_fk some_date_field_start some_date_field_end
1 1990-01-01 1990-04-01
2 1990-01-01 1991-04-11
2 1992-01-01 1992-03-01
3 1990-01-01 NULL
*Note that a non-terminated time span ends with NULL
I do have a solution that involves 3 common table expressions, but I'm wondering if there is a (better/more elegant/canonical) way to do this in PostgreSQL without nested queries.
My approach was something like:
WITH ranked AS (
SELECT
RANK() OVER (PARTITION BY some_fk ORDER BY some_date_field) AS "rank",
some_fk,
some_field,
some_date_field
FROM foo
), openers AS (
SELECT * FROM ranked WHERE some_field IN ('A','B','C')
), closers AS (
SELECT
*,
LAG("rank") OVER (PARTITION BY some_fk ORDER BY "rank") AS rank_lag
FROM ranked WHERE some_field IN ('X','Y','Z')
)
SELECT DISTINCT
openers.some_fk,
FIRST_VALUE(openers.some_date_field) OVER (PARTITION BY openers.some_fk, closers."rank" ORDER BY openers."rank")
AS some_date_field_start,
closers.some_date_field AS some_date_field_end
FROM openers
JOIN closers
ON openers.some_fk = closers.some_fk
WHERE openers."rank" BETWEEN COALESCE(closers.rank_lag, 0) AND closers."rank"
... but I feel there must be a better way.
Thanks in advance for the help.
Another approach is to create a grouping ID by creating a running sum of the closing events. Then in an outer SQL you can Group By and pick min() and max() dates.
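Select some_fk,
min(some_date_field) as some_date_field_start,
-- NULL when the span has no closing event yet
max(Case When some_field in ('X','Y','Z') Then some_date_field End) as some_date_field_end
From (
Select some_fk, some_field, some_date_field,
-- the first row of each partition has an empty frame, so default its sum to 0
Coalesce(Sum(Case When some_field in ('X','Y','Z') Then 1 Else 0 End)
Over (Partition By some_fk Order By some_date_field
Rows Between Unbounded Preceding And 1 Preceding), 0)
as some_grouping
From foo
) g
Group By some_fk, some_grouping
Order By some_fk, some_grouping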
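The running-sum grouping can be tried end to end with a small sketch. It assumes SQLite (3.25 or later) through Python's sqlite3 module rather than PostgreSQL, purely to keep the example self-contained. It also assumes two details for illustration: the sum is wrapped in COALESCE so the first row of each partition lands in group 0, and the end date is taken only from closing events so that an open span yields NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (some_fk INTEGER, some_field TEXT, some_date_field TEXT)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [
        (1, "A", "1990-01-01"), (1, "B", "1990-01-02"),
        (1, "C", "1990-03-01"), (1, "X", "1990-04-01"),
        (2, "B", "1990-01-01"), (2, "B", "1990-01-05"),
        (2, "Z", "1991-04-11"), (2, "C", "1992-01-01"),
        (2, "B", "1992-02-01"), (2, "Y", "1992-03-01"),
        (3, "C", "1990-01-01"),
    ],
)

# some_grouping counts the closing events strictly before each row,
# so every row up to and including a closer shares one group id.
SPANS = """
SELECT some_fk,
       MIN(some_date_field) AS some_date_field_start,
       MAX(CASE WHEN some_field IN ('X', 'Y', 'Z')
                THEN some_date_field END) AS some_date_field_end
FROM (
    SELECT some_fk, some_field, some_date_field,
           COALESCE(
               SUM(CASE WHEN some_field IN ('X', 'Y', 'Z') THEN 1 ELSE 0 END)
                   OVER (PARTITION BY some_fk ORDER BY some_date_field
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
               0) AS some_grouping
    FROM foo
) AS g
GROUP BY some_fk, some_grouping
ORDER BY some_fk, some_grouping
"""

spans = conn.execute(SPANS).fetchall()
for row in spans:
    print(row)
```

On the sample rows from the question this should yield the four spans listed there, with None (NULL) as the end of the span for some_fk = 3.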
This seems a little simpler at least to me.
The basis of the query is to use LAG to determine if the previous record was a closure.
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
This lets you filter down to the 4 records from your expected results, with the first 2 columns included. Your mistake was to put the WHERE clause directly on that query; what you want is to use the query as-is in a subquery and write the WHERE in the main query. From that point, you have several ways to finish the query.
Here is a version using a scalar subquery:
SELECT some_fk, some_date_field AS some_date_field_start,
(
SELECT MIN(some_date_field)
FROM foo
WHERE some_fk = F.some_fk AND some_date_field > F.some_date_field AND some_field IN ('X','Y','Z')
) AS some_date_field_end
FROM (
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
) F
WHERE some_field IN ('A','B','C')
AND COALESCE(previous_some_field,'Z') IN ('X','Y','Z')
Here is another version using a CROSS JOIN LATERAL:
SELECT some_fk, some_date_field AS some_date_field_start, some_date_field_end
FROM (
SELECT *,
LAG(some_field) OVER (PARTITION BY some_fk ORDER BY some_date_field) Previous_some_field
FROM foo
) F1
CROSS JOIN LATERAL (
SELECT MIN(some_date_field) AS some_date_field_end
FROM foo
WHERE some_fk = F1.some_fk AND some_date_field > F1.some_date_field AND some_field IN ('X','Y','Z')
) F2
WHERE some_field IN ('A','B','C')
AND COALESCE(previous_some_field,'Z') IN ('X','Y','Z')

How can I obtain the minimum date for a value that is equal to the maximum date?

I am trying to obtain the minimum date on which the value is equal to the value at its maximum date. So far, I'm able to obtain the value at its maximum date, but I can't seem to obtain the minimum date from which that value remains the same.
Here is what I got so far and the query result:
select a.id, a.end_date, a.value
from database1 as a
inner join (
select id, max(end_date) as end_date
from database1
group by id
) as b on a.id = b.id and a.end_date = b.end_date
where value is not null
order by id, end_date
This query returns the most recent record, but I want the record with the earliest end date for which the value is still the same as in the most recent record.
In the following sample table, I'd like to get the row where id = 3, as it has the minimum end date for which the value remains the same:
id end_date value
1 02/12/22 5
2 02/13/22 5
3 02/14/22 4
4 02/15/22 4
Another option that just approaches the problem somewhat as described for the sample data as shown - Get the value of the maximum date and then the minimum id row that has that value:
select top(1) t.*
from (
select top(1) Max(end_date)d, [value]
from t
group by [value]
order by d desc
)d
join t on t.[value] = d.[value]
order by t.id;
DB<>Fiddle
I'm most likely overthinking this as a Gaps & Island problem, but you can do:
select min(end_date) as first_date
from (
select *, sum(inc) over (order by end_date desc) as grp
from (
select *,
case when value <> lag(value) over (order by end_date desc) then 1 else 0 end as inc
from t
) x
) y
where grp = 0
Result:
first_date
----------
2022-02-14
See running example at SQL Fiddle.
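The gaps-and-islands query above can be checked with a short sketch. It assumes SQLite (3.25 or later) via Python's sqlite3 module instead of SQL Server, and it stores the sample dates as ISO strings so that they sort correctly as text; both choices are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, end_date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2022-02-12", 5),
        (2, "2022-02-13", 5),
        (3, "2022-02-14", 4),
        (4, "2022-02-15", 4),
    ],
)

# Walking from the newest row backwards, inc marks every change of value;
# the running sum stays 0 only for the streak that ends at the newest row.
QUERY = """
SELECT MIN(end_date) AS first_date
FROM (
    SELECT *, SUM(inc) OVER (ORDER BY end_date DESC) AS grp
    FROM (
        SELECT *,
               CASE WHEN value <> LAG(value) OVER (ORDER BY end_date DESC)
                    THEN 1 ELSE 0 END AS inc
        FROM t
    ) x
) y
WHERE grp = 0
"""

first_date = conn.execute(QUERY).fetchone()[0]
print(first_date)
```

For the newest row, LAG returns NULL, so the comparison is not true and inc falls through to 0, which is what keeps that row inside group 0.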
with data as (
select *,
row_number() over (partition by value order by end_date) as rn,
last_value(value) over (order by end_date rows between unbounded preceding and unbounded following) as lv
from T
)
select * from data
where value = lv and rn = 1
This isn't looking strictly for streaks of consecutive days. Any date that happened to have the same value as on the final date would be in contention.

Remove all non contiguous records with identical fields

I have a table with columns like
ID RecordID DateInserted
1 10 now + 1
2 10 now + 2
3 4 now + 3
4 10 now + 4
5 10 now + 5
I would like to remove all non-contiguous duplicates of the RecordID column when the rows are sorted by DateInserted.
In my example I would like to remove records 4 and 5, because between 2 and 4 there is a record with a different RecordID.
Is there a way to do it with one query?
You can use window functions. One method is to count the changes in value that occur up to each row and just take the rows with one change:
select t.*
from (select t.*,
sum(case when prev_recordid = recordid then 0 else 1 end) over (order by dateinserted) as grp_num
from (select t.*,
lag(recordid) over (order by dateinserted) as prev_recordid
from t
) t
) t
where grp_num = 1;
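Here is a sketch of the change-counting idea applied to the sample rows from the question. It assumes SQLite (3.25 or later) through Python's sqlite3 module and integer stand-ins for the "now + n" timestamps. It also adds one hypothetical step that is not in the query above: each row's group number is compared with the first group number seen for its RecordID, so that every RecordID keeps only its first contiguous run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, recordid INTEGER, dateinserted INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 2), (3, 4, 3), (4, 10, 4), (5, 10, 5)],
)

QUERY = """
WITH flagged AS (
    -- 1 whenever recordid differs from the previous row (or there is none)
    SELECT id, recordid, dateinserted,
           CASE WHEN LAG(recordid) OVER (ORDER BY dateinserted) = recordid
                THEN 0 ELSE 1 END AS is_change
    FROM t
),
grouped AS (
    -- running count of changes: one number per contiguous run
    SELECT *, SUM(is_change) OVER (ORDER BY dateinserted) AS grp_num
    FROM flagged
),
ranked AS (
    -- assumed extra step: remember the first run of each recordid
    SELECT *, MIN(grp_num) OVER (PARTITION BY recordid) AS first_grp
    FROM grouped
)
SELECT id FROM ranked WHERE grp_num = first_grp ORDER BY dateinserted
"""

kept_ids = [row[0] for row in conn.execute(QUERY)]
print(kept_ids)
```

On the sample data the runs are numbered 1, 1, 2, 3, 3, so rows 4 and 5 belong to a later run of RecordID 10 and are the ones filtered out.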
One way would be to "flag" all the rows where it is not the first time this RecordID appeared and the prior row contained a different RecordID. Then you just exclude any row beyond that point for that RecordID.
;WITH cte AS
(
SELECT ID, RecordID, DateInserted,
dr = DENSE_RANK() OVER (PARTITION BY RecordID ORDER BY DateInserted),
prior = COALESCE(LAG(RecordID,1) OVER (ORDER BY DateInserted), RecordID)
FROM dbo.table_name
),
FlaggedRows AS
(
SELECT RecordID, dr
FROM cte
WHERE dr > 1 AND prior <> RecordID
)
SELECT cte.ID, cte.RecordID, cte.DateInserted
FROM cte
LEFT OUTER JOIN FlaggedRows AS f
ON cte.RecordID = f.RecordID
WHERE cte.dr < COALESCE(f.dr, cte.dr + 1)
ORDER BY cte.DateInserted;
If you want to actually delete the rows from the source (remove will typically be inferred as removing from the result), then change the SELECT at the end to:
DELETE cte
FROM cte
INNER JOIN FlaggedRows f
ON cte.RecordID = f.RecordID
WHERE cte.dr >= f.dr;

SQL: order by two columns and get the first rows with equal values in the 2nd column

I have a table sorted by columns 1 and 2. I need to get the first row from the top and all succeeding rows for as long as their value in the 2nd column is the same as in the first row.
For example, I have this data sample:
select * from sample
order by ID desc, date desc
ID Date
--- ----
45 NULL
44 NULL
40 01/01/10
35 NULL
32 04/05/08
I need to get the first two rows (with id in (45, 44)), because the 2nd row also has Date = NULL.
If I had this data sample instead:
ID Date
--- ----
45 NULL
44 NULL
40 NULL
35 NULL
32 04/05/08
I would need to get the first 4 rows (with id in (45, 44, 40, 35)).
I can't come up with a query that solves this. I considered using row_number() and rank(), but I couldn't adapt them to my purpose.
Thanks a lot for any help!
Based on your description, you can do something like this:
with t as (<your query here>)
select t.*
from t cross join
(select t.*
from t
order by id desc
limit 1
) tt
order by (case when t.date = tt.date or (t.date is null and tt.date is null) then 1 else 2 end),
t.id desc;
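One way to actually cut the result off at the first differing row is to count the mismatches seen so far and keep only rows where that count is still zero. The sketch below is an assumption-laden illustration, not the query above: it uses SQLite (3.25 or later) via Python's sqlite3 module, relies on SQLite's IS operator as a NULL-safe comparison, and introduces hypothetical names (top_date, breaks, leading_run).

```python
import sqlite3

LEADING_RUN = """
WITH marked AS (
    -- top_date is the Date of the first row when ordered by id DESC
    SELECT id, date,
           FIRST_VALUE(date) OVER (ORDER BY id DESC) AS top_date
    FROM sample
),
running AS (
    -- breaks counts how many rows so far differ from the top row;
    -- IS treats NULL as equal to NULL in SQLite
    SELECT id,
           SUM(CASE WHEN date IS top_date THEN 0 ELSE 1 END)
               OVER (ORDER BY id DESC) AS breaks
    FROM marked
)
SELECT id FROM running WHERE breaks = 0 ORDER BY id DESC
"""


def leading_run(rows):
    """Return the ids of the leading rows that share the top row's date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)
    return [row[0] for row in conn.execute(LEADING_RUN)]


first = leading_run(
    [(45, None), (44, None), (40, "2010-01-01"), (35, None), (32, "2008-04-05")]
)
second = leading_run(
    [(45, None), (44, None), (40, None), (35, None), (32, "2008-04-05")]
)
print(first, second)
```

In other databases the NULL-safe comparison would need a different spelling, for example IS NOT DISTINCT FROM in PostgreSQL.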
Well, I concocted something like this, but it doesn't look elegant.
select *
from (
select *,
sum(rank_date) over (partition by rank_date order by ID desc) as sm
from (
select *
,rank() over(order by DATE desc nulls first) rank_date
,row_number() over(order by ID desc) rank_id
from sample
) ss
) s
where sm = rank_id

Join two queries from the same table - SELECT DISTINCT?

I have two tables linked by an AUTO_KEY field. From one table I'm retrieving the number (id); from the other I get several statuses per number (id), and each status has a date associated with it.
I need to restrict the results to the maximum/latest date for each number (id), along with the corresponding status.
SELECT
OPERATION.NUMBER,
STATUS.STATUS,
Max(STATUS.DATE)
FROM
STATUS,
OPERATION
WHERE
OPERATION.AUTO_KEY = STATUS.AUTO_KEY
From here
Number Status Date
-----------------------------
1 A 10/20/13
1 B 10/15/13
2 A 10/10/13
2 AX 10/05/13
2 AD 10/03/13
3 DD 10/03/13
The outcome should be
Number Status Date
-----------------------------
1 A 10/20/13
2 A 10/10/13
3 DD 10/03/13
Thanks in advance
You can use a CTE with the ROW_NUMBER() function. Also, please use an explicit table JOIN instead of FROM STATUS, OPERATION.
;With CTE AS (
SELECT O.NUMBER, S.STATUS, S.DATE,
ROW_NUMBER() OVER (PARTITION BY O.NUMBER ORDER BY S.DATE DESC) RN
FROM STATUS S JOIN OPERATION O
ON O.AUTO_KEY = S.AUTO_KEY
)
SELECT NUMBER, STATUS, DATE
FROM CTE
WHERE RN = 1
ORDER BY NUMBER
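A small sketch of the ROW_NUMBER() approach, numbering the statuses separately within each number so that rn = 1 is the latest status per number. It assumes SQLite (3.25 or later) via Python's sqlite3 module, ISO date strings, the column names CNUMBER and CDATE, and made-up AUTO_KEY values (101 to 103); none of these come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE operation (auto_key INTEGER, cnumber INTEGER)")
conn.execute("CREATE TABLE status (auto_key INTEGER, status TEXT, cdate TEXT)")
# Hypothetical keys: one operation row per number.
conn.executemany(
    "INSERT INTO operation VALUES (?, ?)", [(101, 1), (102, 2), (103, 3)]
)
conn.executemany(
    "INSERT INTO status VALUES (?, ?, ?)",
    [
        (101, "A", "2013-10-20"), (101, "B", "2013-10-15"),
        (102, "A", "2013-10-10"), (102, "AX", "2013-10-05"),
        (102, "AD", "2013-10-03"), (103, "DD", "2013-10-03"),
    ],
)

# PARTITION BY restarts the numbering for every cnumber, so rn = 1
# picks the row with the latest cdate within each number.
QUERY = """
WITH cte AS (
    SELECT o.cnumber, s.status, s.cdate,
           ROW_NUMBER() OVER (PARTITION BY o.cnumber
                              ORDER BY s.cdate DESC) AS rn
    FROM status s
    JOIN operation o ON o.auto_key = s.auto_key
)
SELECT cnumber, status, cdate
FROM cte
WHERE rn = 1
ORDER BY cnumber
"""

latest = conn.execute(QUERY).fetchall()
print(latest)
```

If two statuses of the same number could share the latest date, ROW_NUMBER() keeps an arbitrary one of them; RANK() would keep both.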
SELECT O.CNUMBER,
S.STATUS,
S.CDATE
FROM STATUS S,
OPERATION O
WHERE O.AUTO_KEY = S.AUTO_KEY
AND S.CDATE = (
-- latest date for the same number as the outer row
SELECT MAX(S2.CDATE)
FROM STATUS S2,
OPERATION O2
WHERE O2.AUTO_KEY = S2.AUTO_KEY
AND O2.CNUMBER = O.CNUMBER )