Window Function - date ranges - sql

I'm trying to calculate the duration between different statuses, which is working for the most part.
I have this table:
[table image]
For id = 102, I was able to calculate the duration of each status:
with ab as (
select id,
status,
max(updated_time) as end_time,
min(updated_time) as updated_time
from Table
group by id, status
)
select *,
lead(updated_time) over (partition by id order by updated_time) - updated_time as duration,
extract(epoch from duration) as duration_seconds
from ab
Output for id = 102:
[output image]
But for id = 101, the status moved from 'IN_PROGRESS' to 'BLOCKED' and back to 'IN_PROGRESS'.
Here I need the result below so that I can get the correct IN_PROGRESS duration:
Expected:
[expected-result image]

One way to do this is to track every time the STATUS changes for a given ID, sorted by VERSION. The query below provides the desired output. Rather than aiming for brevity, I thought showing the transformations in multiple steps would be more helpful. The UNIXTIME column can easily be converted to a human-readable timestamp with the date functions of the specific database being used. The sample table definition and data file used are also shared below.
Query
WITH VW_STATUS_CHANGE AS
(
-- Flag (1) every row whose STATUS differs from the previous row of the same ID
SELECT ID, STATUS, LAG(STATUS) OVER (PARTITION BY ID ORDER BY VERSION) LAG_STATUS, VERSION, UNIXTIME,
CASE WHEN LAG(STATUS) OVER (PARTITION BY ID ORDER BY VERSION) <> STATUS THEN 1 ELSE 0 END STATUS_CHANGE
FROM STACKOVERFLOWSQL
),
VW_CREATE_SYNTHETIC_PARTITION AS
(
-- Running total of the flags: within each ID, every uninterrupted run of one STATUS gets its own number
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME, STATUS_CHANGE,
SUM(STATUS_CHANGE) OVER (ORDER BY ID, VERSION) AS ROWNUMBER
FROM VW_STATUS_CHANGE
),
VW_RESULTS_INTERMEDIATE AS
(
-- First and last UNIXTIME of each run (ID, STATUS, ROWNUMBER)
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME, STATUS_CHANGE,
FIRST_VALUE(UNIXTIME) OVER (PARTITION BY ID, STATUS, ROWNUMBER ORDER BY VERSION) AS TIME_FIRST_VALUE,
FIRST_VALUE(UNIXTIME) OVER (PARTITION BY ID, STATUS, ROWNUMBER ORDER BY VERSION DESC) AS TIME_LAST_VALUE
FROM VW_CREATE_SYNTHETIC_PARTITION
)
-- One row per run
SELECT DISTINCT ID, STATUS, TIME_FIRST_VALUE, TIME_LAST_VALUE
FROM VW_RESULTS_INTERMEDIATE
ORDER BY TIME_FIRST_VALUE
AWS Athena table used, along with the sample data:
CREATE EXTERNAL TABLE STACKOVERFLOWSQL (
ID INT,
STATUS STRING,
VERSION INT,
UNIXTIME INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ','
)
STORED AS TEXTFILE
LOCATION 's3://<S3BUCKETNAME>/'
TBLPROPERTIES ('skip.header.line.count' = '1');
Dataset Used:
ID,STATUS,VERSION,UNIXTIME
101,NOT_ASSIGNED,1,1668124141
101,IN_PROGRESS,2,1668124143
101,IN_PROGRESS,3,1668124146
101,IN_PROGRESS,4,1668124150
101,IN_PROGRESS,5,1668124155
101,BLOCKED,6,1668124161
101,BLOCKED,7,1668124168
101,IN_PROGRESS,8,1668124176
101,IN_PROGRESS,9,1668124185
101,IN_PROGRESS,10,1668124195
101,COMPLETED,11,1668124206
105,NOT_ASSIGNED,1,1668124207
105,IN_PROGRESS,2,1668124209
105,IN_PROGRESS,3,1668124212
105,IN_PROGRESS,4,1668124216
105,IN_PROGRESS,5,1668124221
105,IN_PROGRESS,6,1668124227
105,COMPLETED,7,1668124234
Result from the View
ID STATUS TIME_FIRST_VALUE TIME_LAST_VALUE
101 NOT_ASSIGNED 1668124141 1668124141
101 IN_PROGRESS 1668124143 1668124155
101 BLOCKED 1668124161 1668124168
101 IN_PROGRESS 1668124176 1668124195
101 COMPLETED 1668124206 1668124206
105 NOT_ASSIGNED 1668124207 1668124207
105 IN_PROGRESS 1668124209 1668124227
105 COMPLETED 1668124234 1668124234
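For anyone who wants to try the technique locally, here is a minimal sketch of the same gaps-and-islands idea using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). It is a variation on the Athena query, not a copy of it: the running sum is partitioned by ID, and MIN/MAX with GROUP BY stand in for FIRST_VALUE plus DISTINCT. That gives the same first and last times here because UNIXTIME increases with VERSION. The table name status_log is hypothetical.

```python
import sqlite3

# Sample rows from the answer: (id, status, version, unixtime)
ROWS = [
    (101, "NOT_ASSIGNED", 1, 1668124141),
    (101, "IN_PROGRESS", 2, 1668124143),
    (101, "IN_PROGRESS", 3, 1668124146),
    (101, "IN_PROGRESS", 4, 1668124150),
    (101, "IN_PROGRESS", 5, 1668124155),
    (101, "BLOCKED", 6, 1668124161),
    (101, "BLOCKED", 7, 1668124168),
    (101, "IN_PROGRESS", 8, 1668124176),
    (101, "IN_PROGRESS", 9, 1668124185),
    (101, "IN_PROGRESS", 10, 1668124195),
    (101, "COMPLETED", 11, 1668124206),
    (105, "NOT_ASSIGNED", 1, 1668124207),
    (105, "IN_PROGRESS", 2, 1668124209),
    (105, "IN_PROGRESS", 3, 1668124212),
    (105, "IN_PROGRESS", 4, 1668124216),
    (105, "IN_PROGRESS", 5, 1668124221),
    (105, "IN_PROGRESS", 6, 1668124227),
    (105, "COMPLETED", 7, 1668124234),
]

QUERY = """
WITH flagged AS (
    -- 1 when the status differs from the previous row of the same id
    SELECT id, status, version, unixtime,
           CASE WHEN LAG(status) OVER (PARTITION BY id ORDER BY version) <> status
                THEN 1 ELSE 0 END AS status_change
    FROM status_log
),
islands AS (
    -- running count of changes per id = island (run) number
    SELECT id, status, unixtime,
           SUM(status_change) OVER (PARTITION BY id ORDER BY version) AS island
    FROM flagged
)
SELECT id, status, MIN(unixtime) AS time_first, MAX(unixtime) AS time_last
FROM islands
GROUP BY id, island, status
ORDER BY id, time_first
"""


def status_islands(rows):
    """Return (id, status, time_first, time_last), one tuple per uninterrupted run."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE status_log (id INTEGER, status TEXT, version INTEGER, unixtime INTEGER)"
    )
    con.executemany("INSERT INTO status_log VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in status_islands(ROWS):
        print(row)
```

Each returned tuple is one uninterrupted run of a status, so the two IN_PROGRESS runs of id 101 stay separate.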

Related

How to select the best item in each group?

I have a table reports:
id  file_name
1   jan.xml
2   jan.csv
3   feb.csv
In plain words: there are reports for each month. Each report can be in XML or CSV format, and there can be one or two reports per month, each in a different format.
I want to select the reports for all months, picking only one file per month. The XML format is preferred.
So, the expected output is:
id  file_name
1   jan.xml
3   feb.csv
Explanation: the file jan.csv was excluded since there is a preferred report for that month: jan.xml.
As mentioned in the comments, your data structure has a number of challenges. It really needs a ReportDate column (or something along those lines) of type date/datetime, so you know which month each report belongs to. That would also give you something to sort by when you get your data back. Aside from those much-needed improvements, you can get the desired results from your sample data with something like this:
create table SomeFileTable
(
id int
, file_name varchar(10)
)
insert SomeFileTable
select 1, 'jan.xml' union all
select 2, 'jan.csv' union all
select 3, 'feb.csv'
select s.id
, s.file_name
from
(
select *
, FileName = parsename(file_name, 2)
, FileExtension = parsename(file_name, 1)
, RowNum = ROW_NUMBER() over(partition by parsename(file_name, 2) order by case parsename(file_name, 1) when 'xml' then 1 else 2 end)
from SomeFileTable
) s
where s.RowNum = 1
--Ideally you would order the results, but your data has nothing that works as a reliable sort order, since the dates are only implied by the file name
You can use a window function that ranks your rows, partitioning on the month and ordering by the format, both derived from the file_name field:
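WITH ranked_reports AS (
SELECT
id,
file_name,
ROW_NUMBER() OVER(
PARTITION BY LEFT(file_name, 3)     -- month prefix, e.g. 'jan'
ORDER BY RIGHT(file_name, 3) DESC   -- 'xml' sorts after 'csv', so DESC puts XML first
) AS rn                             -- avoid RANK as an alias: it is reserved in some databases
FROM
reports
)
SELECT
id,
file_name
FROM
ranked_reports
WHERE
rn = 1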
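A minimal sketch of the same ranking, run through Python's sqlite3 module (SQLite 3.25+ for window functions). SQLite has no LEFT/RIGHT, so substr(file_name, 1, 3) and substr(file_name, -3) are swapped in; like the query above, it assumes a three-letter month prefix and a three-letter extension.

```python
import sqlite3

# Sample rows from the question: (id, file_name)
REPORTS = [(1, "jan.xml"), (2, "jan.csv"), (3, "feb.csv")]

QUERY = """
WITH ranked_reports AS (
    SELECT id, file_name,
           ROW_NUMBER() OVER (
               PARTITION BY substr(file_name, 1, 3)  -- month prefix, e.g. 'jan'
               ORDER BY substr(file_name, -3) DESC   -- 'xml' > 'csv', so XML ranks first
           ) AS rn
    FROM reports
)
SELECT id, file_name
FROM ranked_reports
WHERE rn = 1
ORDER BY id
"""


def pick_reports(rows):
    """Return one (id, file_name) per month, preferring the XML file."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE reports (id INTEGER, file_name TEXT)")
    con.executemany("INSERT INTO reports VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(pick_reports(REPORTS))  # [(1, 'jan.xml'), (3, 'feb.csv')]
```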

SQL: How to select the oldest date row

I have a report that looks something like this:
numberOrder  timestamp                   id  status
12           2021-06-23-14.00.00.232425  11  done
13           2021-06-30-18.00.00.224525  22  done
14           2021-07-01-01.00.00.224525  23  done
This is produced with the following SQL:
SELECT numberOrder, timestamp, id, status
from order
where status = 'done'
I would like the report to show just the oldest row:
numberOrder  timestamp                   id  status
12           2021-06-23-14.00.00.232425  11  done
SELECT numberOrder, timestamp, id, status
from order
WHERE timestamp = (select TOP 1 timestamp FROM order by timestamp)
and status = 'done'
Any ideas? I tried to use min(). Any help is appreciated.
Also, any ideas for the case where I don't find any row with status done? Then I want to find the row with status cancel instead.
I would like the report to show just the cancel row if we don't find any done status:
numberOrder  timestamp                   id  status
12           2021-06-23-14.00.00.232425  11  cancel
I'm admittedly unfamiliar with DB2, but I would suggest the following to order the rows by timestamp and fetch the first (oldest) row:
select numberOrder, timestamp, id, status
from order
where status = 'done'
order by timestamp
fetch first 1 rows only
Try the following query.
You can run the statement as is if you remove the block comment markers /* */.
With the markers removed, the statement runs against a base "table" MYTAB whose rows all have the cancel status. You can comment out the 'cancel' row and uncomment the 'done' row inside VALUES to get a base "table" whose rows all have the done status, or build the base "table" from the contents of your original table.
You don't have to edit the query itself to get either result.
The idea is to number the rows within each of the two status types. The first row (by timestamp) of each status is kept, the done row is ordered before the cancel one, and finally only the first row is returned.
WITH
/*
MYTAB0 (numberOrder, timestamp, id) AS
(
VALUES
(12, '2021-06-23-14.00.00.232425', 11)
, (13, '2021-06-30-18.00.00.224525', 22)
, (14, '2021-07-01-01.00.00.224525', 23)
)
, MYTAB AS
(
SELECT *
FROM MYTAB0,
(
VALUES
'cancel'
--'done'
) S (STATUS)
)
,
*/
T AS
(
SELECT T.*, ROW_NUMBER () OVER (PARTITION BY STATUS ORDER BY TIMESTAMP) AS RN_
FROM MYTAB T
WHERE STATUS IN ('done', 'cancel')
)
SELECT numberOrder, timestamp, id, status
FROM T
WHERE RN_ = 1
ORDER BY DECODE (status, 'done', 0, 1)
FETCH FIRST 1 ROW ONLY
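A minimal sketch of the same idea, run against SQLite from Python (SQLite 3.25+). SQLite has no DECODE or FETCH FIRST, so CASE and LIMIT 1 are swapped in. The table name orders and the column names number_order and ts are hypothetical renames (ORDER is a reserved word).

```python
import sqlite3

QUERY = """
WITH t AS (
    -- number the rows within each status, oldest first
    SELECT o.*, ROW_NUMBER() OVER (PARTITION BY status ORDER BY ts) AS rn_
    FROM orders o
    WHERE status IN ('done', 'cancel')
)
SELECT number_order, ts, id, status
FROM t
WHERE rn_ = 1
ORDER BY CASE status WHEN 'done' THEN 0 ELSE 1 END  -- 'done' wins over 'cancel'
LIMIT 1
"""


def first_done_else_cancel(rows):
    """Oldest 'done' row if any exists, else the oldest 'cancel' row, else None."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (number_order INTEGER, ts TEXT, id INTEGER, status TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    row = con.execute(QUERY).fetchone()
    con.close()
    return row


if __name__ == "__main__":
    sample = [
        (12, "2021-06-23-14.00.00.232425", 11, "cancel"),
        (13, "2021-06-30-18.00.00.224525", 22, "done"),
        (14, "2021-07-01-01.00.00.224525", 23, "done"),
    ]
    print(first_done_else_cancel(sample))
```

The fixed-width DB2 timestamp strings sort correctly as text, which is why they can stay as TEXT here.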

ORACLE: How to get earliest record of certain value when value alternates?

I'll simplify what I'm looking for here.
I have a table that stores an asset name, the date (job runs daily), and a value that is either 1 or 0 that indicates whether the asset is out of compliance.
I need to get the earliest date where the value is 0.
The problem I run into is that non-compliance can be intermittent: the same asset may show as in compliance, then out, and then in again. I want to retrieve the earliest date it was out of compliance this time.
Asset Date Compliant
NAME 2-FEB-18 0
NAME 1-FEB-18 0
NAME 31-JAN-18 1
NAME 30-JAN-18 0
In this example, I want to retrieve 1-FEB-18, and not 30-JAN-18.
I'm using a subquery into a temp table that retrieves the MIN(date) which would return 30-JAN-18. Thoughts?
Anonymized current subquery:
least_recent_created AS
(
SELECT t.date,t.ASSET, t.DATABASE_NAME FROM table t
WHERE t.date =
(
SELECT MIN(date)
FROM table2 t2
WHERE t.ASSET_ID = t2.ASSET_ID
AND t.DATABASE_NAME = t2.DATABASE_NAME
AND t2.compliant = 0
)
)
You want the earliest out-of-compliance date since the last date the asset was in compliance. If the asset was never in compliance, I assume you want the earliest date.
select t.asset, min(t.date)
from (select t.*,
max(case when t.compliant = 1 then t.date end) over (partition by t.asset) as max_compliant1_date
from t
) t
where t.compliant = 0 and
(t.date > t.max_compliant1_date or t.max_compliant1_date is null)
group by t.asset;
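A minimal sketch of this "after the last compliant date" approach, run against SQLite from Python (SQLite 3.25+). The table name compliance and the column name dt are hypothetical (DATE is a keyword), and dates are stored as ISO-8601 text so they sort chronologically.

```python
import sqlite3

# (asset, dt, compliant): the question's example plus a hypothetical never-compliant asset
SAMPLE = [
    ("NAME", "2018-02-02", 0),
    ("NAME", "2018-02-01", 0),
    ("NAME", "2018-01-31", 1),
    ("NAME", "2018-01-30", 0),
    ("OTHER", "2018-01-30", 0),
    ("OTHER", "2018-01-31", 0),
]

QUERY = """
SELECT asset, MIN(dt) AS first_out_of_compliance
FROM (
    SELECT t.*,
           -- latest date on which the asset was compliant (NULL if never)
           MAX(CASE WHEN compliant = 1 THEN dt END) OVER (PARTITION BY asset) AS max_compliant1_date
    FROM compliance t
) t
WHERE compliant = 0
  AND (dt > max_compliant1_date OR max_compliant1_date IS NULL)
GROUP BY asset
ORDER BY asset
"""


def current_outage_start(rows):
    """Map asset -> earliest out-of-compliance date since it was last compliant."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE compliance (asset TEXT, dt TEXT, compliant INTEGER)")
    con.executemany("INSERT INTO compliance VALUES (?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    print(current_outage_start(SAMPLE))  # {'NAME': '2018-02-01', 'OTHER': '2018-01-30'}
```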
You can use the following query:
SELECT "Asset", MAX("Date")
FROM (
SELECT "Asset", "Date", "Compliant",
CASE
WHEN "Compliant" = 0 AND
LAG("Compliant") OVER (PARTITION BY "Asset"
ORDER BY "Date") = 1 THEN "Date"
END AS OutOfComplianceDate
FROM mytable) t
WHERE OutOfComplianceDate IS NOT NULL
GROUP BY "Asset"
The inner query identifies 'Out-of-Compliance' dates, that is, dates where the current record has "Compliant" = 0 whereas the immediately preceding record has "Compliant" = 1.
The outer query returns the latest 'Out-of-Compliance' date per "Asset".
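A minimal sketch of this LAG-based variant, run against SQLite from Python (SQLite 3.25+). As before, the table name compliance and the column names asset, dt and compliant are hypothetical, with dates as ISO-8601 text.

```python
import sqlite3

SAMPLE = [
    ("NAME", "2018-02-02", 0),
    ("NAME", "2018-02-01", 0),
    ("NAME", "2018-01-31", 1),
    ("NAME", "2018-01-30", 0),
    ("OTHER", "2018-01-30", 0),  # hypothetical asset that was never compliant
]

QUERY = """
SELECT asset, MAX(ooc_date) AS latest_out_of_compliance_start
FROM (
    SELECT asset, dt, compliant,
           -- a 0 directly after a 1 marks the start of an out-of-compliance run
           CASE WHEN compliant = 0
                 AND LAG(compliant) OVER (PARTITION BY asset ORDER BY dt) = 1
                THEN dt END AS ooc_date
    FROM compliance
) t
WHERE ooc_date IS NOT NULL
GROUP BY asset
ORDER BY asset
"""


def latest_outage_start(rows):
    """Map asset -> start date of its most recent out-of-compliance run."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE compliance (asset TEXT, dt TEXT, compliant INTEGER)")
    con.executemany("INSERT INTO compliance VALUES (?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    print(latest_outage_start(SAMPLE))  # {'NAME': '2018-02-01'}
```

Because the CASE needs a preceding row with "Compliant" = 1, an asset that was never compliant (OTHER above) produces no row with this variant.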

Oracle SQL query to fetch data from log table

The data below is a tiny snapshot of a huge log table.
Please help me with a query to identify records such as the ones with TRAN_IDs 451140014 and 440102253.
The status of the record is getting updated from 'Actual' to 'Definite'.
As per the business rules of our application, this is NOT supposed to happen. I need to fetch the list of all records in this huge table where the status gets updated this way.
ROW_ID TRAN_ID TRAN_DATE CHG_TYPE DB_SESSION DB_OSUSER DB_HOST STAT_CD
500-XNEGXU 451327759 7/24/2015 11:35:26 AM Update SBLDATALOAD siebelp pas01 Actual
500-XNEGXU 451299279 7/24/2015 10:13:18 AM Update SBLDATALOAD siebelp pas01 Actual
500-XNEGXU 451140014 7/24/2015 1:04:36 AM Update SBLDATALOAD siebelp pas01 Definite
500-XNEGXU 440102253 6/23/2015 3:10:33 PM Update SBLDATALOAD convteam pas01 Actual
500-XNEGXU 426245149 5/8/2015 2:11:21 PM Update SBLDATALOAD convteam pas11 Actual
Edit:
Thanks a lot, Ponder, for your help. Here is a small modification of your query that returns the results in a single row. This gives me the next transaction ID, which flipped the status from 'Actual' to 'Definite':
select row_id, tran_id, next_tran_id, tran_date, next_tran_date, stat_cd
from (
select abc.*,
lag(tran_id) over (partition by row_id order by tran_id desc) next_tran_id,
lag(tran_date) over (partition by row_id order by tran_id desc) next_tran_date,
case when stat_cd='Actual' and (lag(stat_cd) over (partition by row_id order by tran_id desc)) = 'Definite' then 1
end change
from abc )
where change = 1 order by row_id, tran_id
This query, using the lead() function, displays all rows where stat_cd is Definite, together with the row immediately before each of them in tran_id order:
select row_id, tran_id, tran_date, stat_cd
from (
select data.*,
case when stat_cd='Definite'
or (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
end change
from data )
where change = 1 order by row_id, tran_id
You may need to change over (order by tran_id) to over (partition by row_id order by tran_id) if your data is organized this way.
Edit: Modified query after additional information was provided:
select row_id, tran_id, tran_date, stat_cd
from (
select xyz.*,
case
when stat_cd='Actual'
and (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
when stat_cd='Definite'
and (lag(stat_cd) over (order by tran_id)) = 'Actual' then 2
end change
from xyz)
where change is not null
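A minimal sketch of detecting the 'Actual' to 'Definite' flip with LAG partitioned by row_id, run against SQLite from Python (SQLite 3.25+). The table name tran_log is hypothetical. The extra row OTHER-ROW is also hypothetical; it sits between the sample tran_ids and shows why partitioning by row_id matters, since without it the window would pair rows from different records.

```python
import sqlite3

# (row_id, tran_id, stat_cd) from the question, plus one hypothetical row
ROWS = [
    ("500-XNEGXU", 451327759, "Actual"),
    ("500-XNEGXU", 451299279, "Actual"),
    ("500-XNEGXU", 451140014, "Definite"),
    ("500-XNEGXU", 440102253, "Actual"),
    ("500-XNEGXU", 426245149, "Actual"),
    ("OTHER-ROW", 440102254, "Definite"),
]

QUERY = """
SELECT row_id, prev_tran_id, tran_id
FROM (
    SELECT row_id, tran_id, stat_cd,
           LAG(tran_id) OVER (PARTITION BY row_id ORDER BY tran_id) AS prev_tran_id,
           LAG(stat_cd) OVER (PARTITION BY row_id ORDER BY tran_id) AS prev_stat_cd
    FROM tran_log
) t
WHERE prev_stat_cd = 'Actual' AND stat_cd = 'Definite'
ORDER BY row_id, tran_id
"""


def actual_to_definite(rows):
    """Return (row_id, last 'Actual' tran_id, first 'Definite' tran_id) per flip."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tran_log (row_id TEXT, tran_id INTEGER, stat_cd TEXT)")
    con.executemany("INSERT INTO tran_log VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(actual_to_definite(ROWS))  # [('500-XNEGXU', 440102253, 451140014)]
```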

SQL needed for getting latest records based on Status?

I have a table TRX that has multiple TRXID values for a given SRCID. A sample data set is shown below.
TRXID STATUS TIMESTAMP SRCID
839 EN 30-OCT-14 11.08.13.597000000 AM B0D35D0168G
1189 MO 30-OCT-14 11.13.19.554000000 AM B0D35D0168G
1549 CA 30-OCT-14 12.13.42.246000000 PM B0D35D0168G
1666 EN 30-OCT-14 02.43.22.271000000 PM A0D77E2168G
2221 CA 30-OCT-14 05.49.16.712000000 PM A0D77E2168G
2244 EN 31-OCT-14 11.21.18.146000000 AM A0D77E2168G ...
I want to get all SRCIDs whose latest status, based on the latest TIMESTAMP only, is 'CA'.
So, e.g., if we ran the query on the above data set, we would only get 'B0D35D0168G' as a result.
This will work in Oracle:
SELECT srcid FROM (
SELECT srcid, status, ROW_NUMBER() OVER ( PARTITION BY srcid ORDER BY timestamp DESC ) AS rn
FROM trx
) WHERE status = 'CA' AND rn = 1;
It will work if you need to retrieve additional columns as well (e.g., if you need to know what the last value of timestamp is).
SELECT trxid, srcid, timestamp FROM (
SELECT trxid, srcid, timestamp, status, ROW_NUMBER() OVER ( PARTITION BY srcid ORDER BY timestamp DESC ) AS rn
FROM trx
) WHERE status = 'CA' AND rn = 1;
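A minimal sketch of the ROW_NUMBER approach, run against SQLite from Python (SQLite 3.25+). The column is renamed to ts (hypothetical), and the Oracle timestamps are rewritten as ISO-8601 text in 24-hour time so they sort chronologically. The truncated "..." rows of the sample are not included.

```python
import sqlite3

# (trxid, status, ts, srcid) from the question's visible sample rows
ROWS = [
    (839, "EN", "2014-10-30 11:08:13.597", "B0D35D0168G"),
    (1189, "MO", "2014-10-30 11:13:19.554", "B0D35D0168G"),
    (1549, "CA", "2014-10-30 12:13:42.246", "B0D35D0168G"),
    (1666, "EN", "2014-10-30 14:43:22.271", "A0D77E2168G"),
    (2221, "CA", "2014-10-30 17:49:16.712", "A0D77E2168G"),
    (2244, "EN", "2014-10-31 11:21:18.146", "A0D77E2168G"),
]

QUERY = """
SELECT srcid FROM (
    -- rn = 1 marks the latest row of each srcid
    SELECT srcid, status,
           ROW_NUMBER() OVER (PARTITION BY srcid ORDER BY ts DESC) AS rn
    FROM trx
) AS latest
WHERE status = 'CA' AND rn = 1
ORDER BY srcid
"""


def srcids_latest_ca(rows):
    """Return the srcids whose most recent row has status 'CA'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trx (trxid INTEGER, status TEXT, ts TEXT, srcid TEXT)")
    con.executemany("INSERT INTO trx VALUES (?, ?, ?, ?)", rows)
    result = [srcid for (srcid,) in con.execute(QUERY)]
    con.close()
    return result


if __name__ == "__main__":
    print(srcids_latest_ca(ROWS))  # ['B0D35D0168G']
```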
Here is the "most recent" pattern I've used quite successfully for several years now. If the key (srcid) and the date field form a composite index, it is really fast.
select *
from trx t1
where t1.status = 'CA'
and t1.timestamp = (
select max( t2.timestamp )
from trx t2
where t2.srcid = t1.srcid      -- latest row of the same SRCID
and t2.timestamp <= SysDate );
The last line of the subquery may be omitted if future dates are not possible. Either way, the query should perform only index seeks.
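A minimal sketch of the "most recent" correlated-subquery pattern in SQLite from Python, with the subquery correlated on srcid so the maximum timestamp is taken per SRCID. The SysDate line is omitted, as the answer allows when future dates are impossible. The column name ts and the index name trx_srcid_ts are hypothetical.

```python
import sqlite3

ROWS = [
    (839, "EN", "2014-10-30 11:08:13.597", "B0D35D0168G"),
    (1189, "MO", "2014-10-30 11:13:19.554", "B0D35D0168G"),
    (1549, "CA", "2014-10-30 12:13:42.246", "B0D35D0168G"),
    (1666, "EN", "2014-10-30 14:43:22.271", "A0D77E2168G"),
    (2221, "CA", "2014-10-30 17:49:16.712", "A0D77E2168G"),
    (2244, "EN", "2014-10-31 11:21:18.146", "A0D77E2168G"),
]

QUERY = """
SELECT t1.srcid, t1.trxid, t1.ts
FROM trx t1
WHERE t1.status = 'CA'
  AND t1.ts = (
      SELECT MAX(t2.ts)
      FROM trx t2
      WHERE t2.srcid = t1.srcid   -- latest row of the same srcid
  )
ORDER BY t1.srcid
"""


def latest_ca_rows(rows):
    """Return (srcid, trxid, ts) for srcids whose latest row is 'CA'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trx (trxid INTEGER, status TEXT, ts TEXT, srcid TEXT)")
    # composite (key, date) index so the MAX lookup can use an index seek
    con.execute("CREATE INDEX trx_srcid_ts ON trx (srcid, ts)")
    con.executemany("INSERT INTO trx VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_ca_rows(ROWS))  # [('B0D35D0168G', 1549, '2014-10-30 12:13:42.246')]
```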