SQL: How to select the oldest date row - sql

I have a report that looks something like this:
numberOrder  timestamp                   id  status
12           2021-06-23-14.00.00.232425  11  done
13           2021-06-30-18.00.00.224525  22  done
14           2021-07-01-01.00.00.224525  23  done
This is done with SQL :
SELECT numberOrder, timestamp, id, status
from order
where status = 'done'
I would like the report to show just the oldest row :
numberOrder  timestamp                   id  status
12           2021-06-23-14.00.00.232425  11  done
SELECT numberOrder, timestamp, id, status
from order
WHERE timestamp = (select TOP 1 timestamp FROM order by timestamp)
and status = 'done'
Any ideas? I tried to use min(). Any help is appreciated.
Also, if there is no row with status done, how can I fall back to status cancel?
In that case I would like the report to show just the cancel row:
numberOrder  timestamp                   id  status
12           2021-06-23-14.00.00.232425  11  cancel

I'm admittedly unfamiliar with DB2, but I would suggest the following: order the rows by timestamp and fetch the first (oldest) row.
select numberOrder, timestamp, id, status
from order
where status = 'done'
order by timestamp
fetch first 1 rows only

Try the following query.
As written, it reads from your own table, which is called MYTAB here. To try it on sample data instead, remove the group comment markers /* and */: the commented-out part builds a sample MYTAB from inline VALUES.
That sample contains rows with the cancel status only. To get a sample with done rows only, comment out the 'cancel' line and uncomment the 'done' line inside VALUES. You can also build the sample with whatever contents your original table has.
You don't have to edit the query itself to get both results.
The idea is to number the rows within each status, ordered by timestamp. The first row of each status is then sorted so that the done row comes before the cancel one, and finally only the first row is returned.
WITH
/*
MYTAB0 (numberOrder, timestamp, id) AS
(
VALUES
(12, '2021-06-23-14.00.00.232425', 11)
, (13, '2021-06-30-18.00.00.224525', 22)
, (14, '2021-07-01-01.00.00.224525', 23)
)
, MYTAB AS
(
SELECT *
FROM MYTAB0,
(
VALUES
'cancel'
--'done'
) S (STATUS)
)
,
*/
T AS
(
SELECT T.*, ROW_NUMBER () OVER (PARTITION BY STATUS ORDER BY TIMESTAMP) AS RN_
FROM MYTAB T
WHERE STATUS IN ('done', 'cancel')
)
SELECT numberOrder, timestamp, id, status
FROM T
WHERE RN_ = 1
ORDER BY DECODE (status, 'done', 0, 1)
FETCH FIRST 1 ROW ONLY
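
For anyone who wants to check the "done first, otherwise cancel" logic without a DB2 instance, here is a minimal sketch using Python's built-in sqlite3 module. It is a simplification, not the statement above: it skips ROW_NUMBER and sorts by a status priority and then by timestamp, which picks the same row. The table name orders, the column name ts, and the function name oldest_preferred are assumptions made for this sketch.

```python
import sqlite3

def oldest_preferred(rows):
    """Return the oldest 'done' row, or the oldest 'cancel' row if there is no 'done' row.

    rows: iterable of (numberOrder, ts, id, status) tuples.
    """
    con = sqlite3.connect(":memory:")
    # Hypothetical table and column names; "order" and "timestamp" are avoided
    # because they clash with SQL keywords or type names.
    con.execute("CREATE TABLE orders (numberOrder INT, ts TEXT, id INT, status TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    row = con.execute(
        """
        SELECT numberOrder, ts, id, status
        FROM orders
        WHERE status IN ('done', 'cancel')
        -- 'done' sorts before 'cancel'; within a status the oldest row comes first
        ORDER BY CASE status WHEN 'done' THEN 0 ELSE 1 END, ts
        LIMIT 1
        """
    ).fetchone()
    con.close()
    return row
```

The timestamps are stored as text here, which works because the 'YYYY-MM-DD-HH.MM.SS.ffffff' layout sorts chronologically as a string.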

Related

Window Function - date ranges

I'm trying to calculate the duration between different statuses, which is working for the most part.
I have this table
Table
For id = 102, I was able to calculate the duration of each status.
with ab as (
select id,
status,
max(updated_time) as end_time,
min(updated_time) as updated_time
from Table
group by id, status
)
select *,
lead(updated_time) over (partition by id order by updated_time) - updated_time as duration,
extract(epoch from duration) as duration_seconds
from ab
Output for id = 102
But for id = 101, the status moved from 'IN_PROGRESS' to 'BLOCKED' and back to 'IN_PROGRESS'.
Here I need the result below so that I can get the correct IN_PROGRESS duration:
Expected
One way to do this is to track every change of STATUS for a given ID, with rows sorted by VERSION. The query below produces the desired output. I favored multiple steps that show each transformation over brevity. The UNIXTIME column can easily be converted to a human-readable timestamp, using whatever function the specific database provides. The sample table definition and data file are shared below.
Query
WITH VW_STATUS_CHANGE AS
(
SELECT ID, STATUS, LAG(STATUS) OVER (PARTITION BY ID ORDER BY VERSION) LAG_STATUS, VERSION, UNIXTIME,
CASE WHEN LAG (STATUS) OVER (PARTITION BY ID ORDER BY VERSION) <> STATUS THEN 1 ELSE 0 END STATUS_CHANGE
FROM STACKOVERFLOWSQL
),
VW_CREATE_SYNTHETIC_PARTITION AS
(
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME,STATUS_CHANGE,
SUM(STATUS_CHANGE) OVER (ORDER BY ID, VERSION) AS ROWNUMBER
FROM VW_STATUS_CHANGE
) ,
VW_RESULTS_INTERMEDIATE AS
(
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME, STATUS_CHANGE,
"FIRST_VALUE"(UNIXTIME) OVER (
PARTITION BY "ID",
"STATUS", ROWNUMBER
ORDER BY
"VERSION"
) "TIME_FIRST_VALUE",
"FIRST_VALUE"(UNIXTIME) OVER (
PARTITION BY "ID",
"STATUS", ROWNUMBER
ORDER BY
"VERSION" DESC
) "TIME_LAST_VALUE"
FROM VW_CREATE_SYNTHETIC_PARTITION
ORDER BY ID, VERSION
)
SELECT DISTINCT ID, STATUS, TIME_FIRST_VALUE, TIME_LAST_VALUE
FROM VW_RESULTS_INTERMEDIATE
ORDER BY TIME_FIRST_VALUE
AWS Athena Table Used along with Sample data.
CREATE EXTERNAL TABLE STACKOVERFLOWSQL (
ID INT,
STATUS STRING,
VERSION INT,
UNIXTIME INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ','
)
STORED AS TEXTFILE
LOCATION 's3://<S3BUCKETNAME>/'
TBLPROPERTIES ('skip.header.line.count' = '1');
Dataset Used:
ID,STATUS,VERSION,UNIXTIME
101,NOT_ASSIGNED,1,1668124141
101,IN_PROGRESS,2,1668124143
101,IN_PROGRESS,3,1668124146
101,IN_PROGRESS,4,1668124150
101,IN_PROGRESS,5,1668124155
101,BLOCKED,6,1668124161
101,BLOCKED,7,1668124168
101,IN_PROGRESS,8,1668124176
101,IN_PROGRESS,9,1668124185
101,IN_PROGRESS,10,1668124195
101,COMPLETED,11,1668124206
105,NOT_ASSIGNED,1,1668124207
105,IN_PROGRESS,2,1668124209
105,IN_PROGRESS,3,1668124212
105,IN_PROGRESS,4,1668124216
105,IN_PROGRESS,5,1668124221
105,IN_PROGRESS,6,1668124227
105,COMPLETED,7,1668124234
Result from the View
ID STATUS TIME_FIRST_VALUE TIME_LAST_VALUE
101 NOT_ASSIGNED 1668124141 1668124141
101 IN_PROGRESS 1668124143 1668124155
101 BLOCKED 1668124161 1668124168
101 IN_PROGRESS 1668124176 1668124195
101 COMPLETED 1668124206 1668124206
105 NOT_ASSIGNED 1668124207 1668124207
105 IN_PROGRESS 1668124209 1668124227
105 COMPLETED 1668124234 1668124234
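
To make the "new run on every status change" idea easier to follow, here is a small Python sketch of the same grouping. It is not the Athena query itself, only a restatement of the logic; the function name status_runs is an assumption made for this sketch.

```python
from itertools import groupby

def status_runs(rows):
    """Collapse consecutive rows with the same (id, status) into one run.

    rows: iterable of (id, status, version, unixtime) tuples.
    Returns a list of (id, status, first_unixtime, last_unixtime) tuples.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by id, then version
    runs = []
    # groupby only merges adjacent rows with equal keys, so a status that
    # comes back later (IN_PROGRESS -> BLOCKED -> IN_PROGRESS) starts a new run.
    for (id_, status), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        times = [r[3] for r in group]
        runs.append((id_, status, times[0], times[-1]))
    return runs
```

Fed the rows for ID 101 from the dataset above, this returns two separate IN_PROGRESS runs, which is the property the plain GROUP BY id, status in the question loses.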

How do I do conditional logic between rows of a bigquery table?

I'm trying to write a query that goes through a table row by row, comparing the current row with the next one. If a condition is true, it should perform a calculation and output the result in a new column on the same table; if the condition is false, it should output a null value.
Consider the example above:
Row 8703 will be referred to as Row 1
Row 8704 will be referred to as Row 2
I would like to, if possible, compare Row 1 bookedEnd with Row 2 bookedStart. If they are of equal value (which in this case they are) I would like to subtract Row 2 actualStartdate from Row 1 actualEnddate and output the value in minutes in a separate column named 'difference' on Row 2.
If they are not of equal value (which is true for all other rows in the example above), I would like to output a null value.
For the above table the extra column named difference would have the row values of:
8701 - Null
8702 - Null
8703 - Null
8704 - 12
8705 - Null
Since you are writing to "Row 2", I use the LAG() function so you are comparing on the row you are writing.
with data as (select * from `project.dataset.table`),
lagged as (
select
*,
lag(bookedEnd,1) over(partition by roomID order by Row asc) as prev_bookedEnd,
lag(actualEnddate,1) over(partition by roomID order by Row asc) as prev_actualEnddate
from data
)
select
* except (prev_bookedEnd,prev_actualEnddate),
case when prev_bookedEnd = bookedStart then timestamp_diff(prev_actualEndDate,actualStartdate, minute) else null end as difference
from lagged
What you will want to do in this scenario is use the lead function
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#lead
it would look similar to
SELECT bookedEnd
, CASE WHEN bookedEnd = LEAD(bookedStart) OVER (PARTITION BY roomid ORDER BY Row) then XXXX END as actualStartdate
, CASE WHEN bookedEnd = LEAD(bookedStart) OVER (PARTITION BY roomid ORDER BY Row) then XXXX END as difference
SELECT
*,
IF( LAG(bookedEnd) OVER (PARTITION BY roomId ORDER BY bookedStart) = bookedStart,
TIMESTAMP_DIFF( actualStartdate,
LAG(actualEnddate) OVER (PARTITION BY roomId ORDER BY bookedStart),
MINUTE
),
NULL
) AS difference
FROM `project.dataset.table`
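
The original screenshot of the table is not available, so here is a hedged sketch of the LAG comparison on made-up rows, run through Python's built-in sqlite3 module instead of BigQuery. The sample values, the table name bookings, the column name row_id, and the function name booking_differences are all assumptions; only the column names bookedStart, bookedEnd, actualStartdate, actualEnddate and roomId come from the question. The minute difference is taken as the current row's actual start minus the previous row's actual end.

```python
import sqlite3

def booking_differences(rows):
    """rows: (row_id, roomId, bookedStart, bookedEnd, actualStartdate, actualEnddate)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE bookings (row_id INT, roomId TEXT, bookedStart TEXT, "
        "bookedEnd TEXT, actualStartdate TEXT, actualEnddate TEXT)"
    )
    con.executemany("INSERT INTO bookings VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT row_id,
               -- only compute a value when the previous booking ends
               -- exactly where this one starts; otherwise the CASE yields NULL
               CASE WHEN LAG(bookedEnd) OVER (PARTITION BY roomId ORDER BY bookedStart) = bookedStart
                    THEN (CAST(strftime('%s', actualStartdate) AS INTEGER)
                          - CAST(strftime('%s', LAG(actualEnddate) OVER
                                 (PARTITION BY roomId ORDER BY bookedStart)) AS INTEGER)) / 60
               END AS difference
        FROM bookings
        ORDER BY row_id
        """
    ).fetchall()
    con.close()
    return result

# Hypothetical sample rows, not the asker's data.
SAMPLE = [
    (8703, "A", "2020-01-01 09:00:00", "2020-01-01 10:00:00", "2020-01-01 09:02:00", "2020-01-01 09:58:00"),
    (8704, "A", "2020-01-01 10:00:00", "2020-01-01 11:00:00", "2020-01-01 10:10:00", "2020-01-01 11:00:00"),
    (8705, "A", "2020-01-01 12:00:00", "2020-01-01 13:00:00", "2020-01-01 12:00:00", "2020-01-01 13:00:00"),
]
print(booking_differences(SAMPLE))  # [(8703, None), (8704, 12), (8705, None)]
```

The first row of each room has no previous row, so LAG returns NULL, the comparison is not true, and the difference stays NULL without any special handling.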

How show the last status of a mobile number and old data in the same row ? using SQL

I work at a telecom, and part of my work is to check the last status of a specific mobile number along with its last de-active status. It's easy to get the active number by using the ACTIVE condition in the statement, but it's not easy to pick the last de-active status, because each number might have more than one de-active status or only one ACTIVE status. I use EXP_DATE as an indicator of the last de-active status. I want to show both the new data and the old data in one row, but I'm struggling with that. Below are my table and my expected result:
my expected result
The queries that I use on a daily basis:
select * from test where exp_date>sysdate; -- to get the active numbers
select * from test where exp_date<sysdate; -- to get the de-active numbers
You just need to do an outer join between one subquery containing the ACTIVE records and one containing the latest DE-ACTIVE record, as follows:
SELECT A.MSISDN,
A.NAME,
A.SUB_STATUS,
A.CREATED_DATE,
A.EXP_DATE,
D.MSISDN AS MSISDN_,
D.NAME AS OLD_NAME,
D.SUB_STATUS OLD_STATUS,
D.CREATED_DATE AS OLD_CREATED_DATE,
D.EXP_DATE AS OLD_EXP_DATE
FROM
(SELECT * FROM TEST
WHERE EXP_DATE > SYSDATE
AND SUB_STATUS = 'ACTIVE') A -- ACTIVE RECORD
-- USE CONDITION TO FETCH ACTIVE RECORD AS PER YOUR REQUIREMENT
FULL OUTER JOIN
(SELECT * FROM
(SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY T.MSISDN ORDER BY EXP_DATE DESC NULLS LAST) AS RN
FROM TEST T
WHERE T.EXP_DATE < SYSDATE
AND T.SUB_STATUS='DE-ACTIVE')
-- USE CONDITION TO FETCH DEACTIVE RECORD AS PER YOUR REQUIREMENT
WHERE RN = 1
) D
ON (A.MSISDN = D.MSISDN)
Cheers!!
Here is an overview of how to do this: one query to get a distinct list of all the phone numbers, left joined to a list of the most recent active record for each phone number, and left joined to a list of the most recent de-active record for each phone number.
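
A rough sketch of that overview, run through Python's built-in sqlite3 module rather than Oracle. The sample rows and the function name latest_statuses are made up for illustration; the column names follow the ones used in the join answer above. Treat it as one possible reading of the overview, not a tested Oracle statement.

```python
import sqlite3

def latest_statuses(rows):
    """rows: (msisdn, name, sub_status, created_date, exp_date) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE test (msisdn TEXT, name TEXT, sub_status TEXT, "
        "created_date TEXT, exp_date TEXT)"
    )
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        WITH ranked AS (
            -- rn = 1 marks the most recent record per number and status
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY msisdn, sub_status
                                      ORDER BY exp_date DESC) AS rn
            FROM test t
        ),
        numbers AS (SELECT DISTINCT msisdn FROM test)
        SELECT n.msisdn, a.name, a.exp_date, d.name, d.exp_date
        FROM numbers n
        LEFT JOIN ranked a
               ON a.msisdn = n.msisdn AND a.sub_status = 'ACTIVE' AND a.rn = 1
        LEFT JOIN ranked d
               ON d.msisdn = n.msisdn AND d.sub_status = 'DE-ACTIVE' AND d.rn = 1
        ORDER BY n.msisdn
        """
    ).fetchall()
    con.close()
    return result

# Hypothetical sample rows, not the asker's data.
SAMPLE = [
    ("111", "Ann", "DE-ACTIVE", "2015-01-01", "2016-01-01"),
    ("111", "Bob", "DE-ACTIVE", "2016-02-01", "2018-01-01"),
    ("111", "Cid", "ACTIVE", "2018-02-01", "2099-01-01"),
    ("222", "Dee", "ACTIVE", "2019-01-01", "2099-01-01"),
    ("333", "Eve", "DE-ACTIVE", "2014-01-01", "2015-06-01"),
]
```

Starting from the distinct list of numbers means a number with only an ACTIVE record, or only DE-ACTIVE records, still appears once, with NULLs on the missing side.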
How about conditional aggregation?
select msisdn,
max(case when sub_status = 'DE-ACTIVE' then created_date end) as deactive_date,
max(case when sub_status = 'ACTIVE' then exp_date end) as active_date
from test
group by msisdn

SQL Server iterating through time series data

I am using SQL Server and wondering if it is possible to iterate through time series data until specific condition is met and based on that label my data in other table?
For example, let's say I have a table like this:
Id Date Some_kind_of_event
+--+----------+------------------
1 |2018-01-01|dsdf...
1 |2018-01-06|sdfs...
1 |2018-01-29|fsdfs...
2 |2018-05-10|sdfs...
2 |2018-05-11|fgdf...
2 |2018-05-12|asda...
3 |2018-02-15|sgsd...
3 |2018-02-16|rgw...
3 |2018-02-17|sgs...
3 |2018-02-28|sgs...
What I want to get, is to calculate for each key the difference between two adjacent events and find out if there exists difference > 10 days between these two adjacent events. In case yes, I want to stop iterating for that specific key and put label 'inactive', otherwise 'active' in my other table. After we finish with one key, we start with another.
So for example id = 1 would get the label 'inactive', because there exist two adjacent dates whose difference is bigger than 10 days. The final result would look like this:
Id Label
+--+----------+
1 |inactive
2 |active
3 |inactive
Any ideas how to do that? Is it possible to do it with SQL?
When working with a DBMS you need to get away from the idea of thinking iteratively. Instead you need to try and think in sets. "Instead of thinking about what you want to do to a row, think about what you want to do to a column."
If I understand correctly, is this what you're after?
CREATE TABLE SomeEvent (ID int, EventDate date, EventName varchar(10));
INSERT INTO SomeEvent
VALUES (1,'20180101','dsdf...'),
(1,'20180106','sdfs...'),
(1,'20180129','fsdfs..'),
(2,'20180510','sdfs...'),
(2,'20180511','fgdf...'),
(2,'20180512','asda...'),
(3,'20180215','sgsd...'),
(3,'20180216','rgw....'),
(3,'20180217','sgs....'),
(3,'20180228','sgs....');
GO
WITH Gaps AS(
SELECT *,
DATEDIFF(DAY,LAG(EventDate) OVER (PARTITION BY ID ORDER BY EventDate),EventDate) AS EventGap
FROM SomeEvent)
SELECT ID,
CASE WHEN MAX(EventGap) > 10 THEN 'inactive' ELSE 'active' END AS Label
FROM Gaps
GROUP BY ID
ORDER BY ID;
GO
DROP TABLE SomeEvent;
GO
This assumes you are using SQL Server 2012+, as it uses the LAG function, and SQL Server 2008 has less than 12 months of any kind of support.
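
If you don't have a SQL Server instance at hand, the same set-based idea can be tried with Python's built-in sqlite3 module. This is only a sketch: SQLite has no DATEDIFF, so julianday() differences stand in for it, and the table name some_event, the column name event_date, and the function name label_ids are assumptions made for the sketch.

```python
import sqlite3

def label_ids(events, max_gap_days=10):
    """events: (id, 'YYYY-MM-DD') tuples. Returns [(id, label), ...] ordered by id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE some_event (id INT, event_date TEXT)")
    con.executemany("INSERT INTO some_event VALUES (?, ?)", events)
    rows = con.execute(
        """
        WITH gaps AS (
            -- days between each event and the previous event of the same id;
            -- NULL for the first event of an id
            SELECT id,
                   julianday(event_date)
                   - julianday(LAG(event_date) OVER (PARTITION BY id ORDER BY event_date)) AS gap
            FROM some_event
        )
        SELECT id,
               CASE WHEN MAX(gap) > ? THEN 'inactive' ELSE 'active' END AS label
        FROM gaps
        GROUP BY id
        ORDER BY id
        """,
        (max_gap_days,),
    ).fetchall()
    con.close()
    return rows
```

No row-by-row loop or early exit is needed: the largest gap per id already tells you whether any gap exceeded the limit.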
Try this. Note, replace #MyTable with your actual table.
WITH Diffs AS (
SELECT
Id
,DATEDIFF(DAY,[Date],LEAD([Date]) OVER (PARTITION BY [Id] ORDER BY [Date])) Diff
FROM #MyTable)
SELECT
Id
,CASE WHEN MAX(Diff) > 10 THEN 'Inactive' ELSE 'Active' END AS Label
FROM Diffs
GROUP BY Id
Just to share another approach (without a CTE).
SELECT
ID
, CASE WHEN SUM(TotalDays) = (MAX(CNT) - 1) THEN 'Active' ELSE 'Inactive' END Label
FROM (
SELECT
ID
, EventDate
, CASE WHEN DATEDIFF(DAY, EventDate, LEAD(EventDate) OVER(PARTITION BY ID ORDER BY EventDate)) <= 10 THEN 1 ELSE 0 END TotalDays
, COUNT(ID) OVER(PARTITION BY ID) CNT
FROM EventsTable
) D
GROUP BY ID
The method counts how many records each ID has, and computes TotalDays from the difference (in days) between the current date and the next date: if the difference is at most 10 days it gives 1, otherwise 0.
Then it compares the two: if the sum of TotalDays equals the number of records that the ID has (minus one), it prints Active, otherwise Inactive.

SQL Server 2008 query, time in each status

I'm wondering if anybody can help with a query I am working on. I'm trying to gather information for 'Time in each status' from my call activity table.
I need to set up 3 time ranges in days: <3 days, 4-5 days, 6+ days, returning the number of days each CallID is spending in each status.
The trouble I'm having is that I need to identify from the table below when there was a status change. This table records any activity on the call, e.g. changed customer details, and not just status changes.
Apologies if this is unclear, let me know if you need further details.
I'm using SQL Server 2008. Here is the table I'm using and related values:
CREATE TABLE Activity ( CallID varchar(30), Call_Date datetime, [User] varchar(30), Status varchar(10) );
INSERT INTO Activity VALUES (366,'2013/09/27 12:24:33',13,9);
INSERT INTO Activity VALUES (366,'2013/09/28 17:36:14',13,9);
INSERT INTO Activity VALUES (366,'2013/09/29 07:29:18',13,10);
INSERT INTO Activity VALUES (366,'2013/09/30 06:22:12',13,-1);
INSERT INTO Activity VALUES (367,'2013/09/27 12:13:16',9,6);
INSERT INTO Activity VALUES (367,'2013/09/27 12:25:03',9,6);
INSERT INTO Activity VALUES (367,'2013/09/29 12:25:29',9,6);
INSERT INTO Activity VALUES (367,'2013/09/30 12:45:55',9,7);
INSERT INTO Activity VALUES (367,'2013/10/01 12:46:04',9,8);
INSERT INTO Activity VALUES (367,'2013/10/02 15:12:27',9,-1);
INSERT INTO Activity VALUES (368,'2013/08/01 15:09:01',5,10);
INSERT INTO Activity VALUES (368,'2013/08/02 14:11:20',5,13);
INSERT INTO Activity VALUES (368,'2013/08/04 16:41:11',5,13);
INSERT INTO Activity VALUES (368,'2013/08/05 01:12:56',5,-1);
Desired Output 1: E.g. if CallID 35931 took 2 days to change from status 1 to status 2, 2 days would be added to the count in the <3 column
Status <3 Days 4-5 days 6+ Days
------ ------- -------- -------
1 10 3 1
2 8 1 2
3 5 3 1
I'm stuck in the first stage trying to identify the rows where there are status changes and ignoring the rest. I'm working on a subquery which selects the top date for each change of status. It's bringing back negative values. See here:
select CallID, T2.[status], Call_Date,
sum(datediff(dd, nextDate, [Call_Date]) - (datediff(wk, nextDate, [Call_Date]) * 2) -
case when datepart(wk, nextDate) = 1 then 1 else 0 end +
case when datepart(wk, [Call_Date]) = 7 then 1 else 0 end) as TotalDays
from (select *,
(select MAX( T0.[Call_Date])
from [Activity] T0
where T0.[Call_Date] > T1.[Call_Date] and
T0.CallID = T1.CallID
) as nextDate
from [Activity] T1
) T2
where T2.[status] <> '-1'
group by Call_Date, T2.[status], CallID
Thanks for your help in advance.
First of all, I think that you need only the rows with the minimum date for each CallID and status, as they show a status change. This can be done with a CTE and ROW_NUMBER.
Then you should join the results so that the same record holds both the old status date and the new status date. For the first status of each call, the old status columns will be NULL.
;WITH CallsCTE AS
(
SELECT CallId,
Call_Date,
Status,
ROW_NUMBER() OVER(PARTITION BY CallId, Status ORDER BY Call_Date) AS rn
FROM Activity
),
StatusChangesCTE AS
(
SELECT CallID,
Call_Date,
Status
FROM CallsCTE
WHERE rn = 1
)
SELECT Sold.*,
Snew.*
FROM StatusChangesCTE Snew
LEFT JOIN StatusChangesCTE Sold
ON Snew.CallID = Sold.CallID
AND Sold.Call_Date = (SELECT MAX(Call_Date) FROM StatusChangesCTE WHERE CallID = Sold.CallID AND Call_Date < Snew.Call_Date)
I think that you can find your way using the above, as you could use DateDiff on Snew.Call_Date and Sold.Call_Date to find the time needed for a status change.
Let me know if you need any more assistance.
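
As a rough check of the DateDiff step, here is a sketch that runs the same "first row per status, then join to the next change" idea on the sample data from the question, using Python's built-in sqlite3 module instead of SQL Server 2008. It avoids window functions, counts calendar-day boundaries the way DATEDIFF(dd, ...) does, and is not bucketed into the <3 / 4-5 / 6+ columns. The snake_case names and the function name days_in_status are assumptions made for the sketch.

```python
import sqlite3

def days_in_status(rows):
    """rows: (call_id, 'YYYY-MM-DD HH:MM:SS', status) tuples.

    Returns (call_id, status, days) for every status that was followed by another one.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE activity (call_id INT, call_date TEXT, status TEXT)")
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH changes AS (
            -- first time each call entered each status
            SELECT call_id, status, MIN(call_date) AS started
            FROM activity
            GROUP BY call_id, status
        )
        SELECT c.call_id, c.status,
               CAST(julianday(date(n.started)) - julianday(date(c.started)) AS INTEGER) AS days
        FROM changes c
        JOIN changes n
          ON n.call_id = c.call_id
         AND n.started = (SELECT MIN(x.started)
                          FROM changes x
                          WHERE x.call_id = c.call_id AND x.started > c.started)
        ORDER BY c.call_id, c.started
        """
    ).fetchall()
    con.close()
    return result

# Sample data from the question, with dates rewritten in ISO format.
SAMPLE = [
    (366, "2013-09-27 12:24:33", "9"), (366, "2013-09-28 17:36:14", "9"),
    (366, "2013-09-29 07:29:18", "10"), (366, "2013-09-30 06:22:12", "-1"),
    (367, "2013-09-27 12:13:16", "6"), (367, "2013-09-27 12:25:03", "6"),
    (367, "2013-09-29 12:25:29", "6"), (367, "2013-09-30 12:45:55", "7"),
    (367, "2013-10-01 12:46:04", "8"), (367, "2013-10-02 15:12:27", "-1"),
    (368, "2013-08-01 15:09:01", "10"), (368, "2013-08-02 14:11:20", "13"),
    (368, "2013-08-04 16:41:11", "13"), (368, "2013-08-05 01:12:56", "-1"),
]
```

Like the CTE above, this keeps only the first occurrence of each status per call, so a call that returns to an earlier status would need a different grouping. The closing status -1 drops out of the result because it has no later change to join to.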