The data below is a tiny snapshot of a huge log table.
Please help me with a query to identify records like the ones with TRAN_IDs 451140014 and 440102253.
The status of the record is being updated from 'Actual' to 'Definite'.
As per the business rules of our application this is NOT supposed to happen, so I need to fetch the list of all records in this huge table where the status is being updated this way.
ROW_ID TRAN_ID TRAN_DATE CHG_TYPE DB_SESSION DB_OSUSER DB_HOST STAT_CD
500-XNEGXU 451327759 7/24/2015 11:35:26 AM Update SBLDATALOAD siebelp pas01 Actual
500-XNEGXU 451299279 7/24/2015 10:13:18 AM Update SBLDATALOAD siebelp pas01 Actual
500-XNEGXU 451140014 7/24/2015 1:04:36 AM Update SBLDATALOAD siebelp pas01 Definite
500-XNEGXU 440102253 6/23/2015 3:10:33 PM Update SBLDATALOAD convteam pas01 Actual
500-XNEGXU 426245149 5/8/2015 2:11:21 PM Update SBLDATALOAD convteam pas11 Actual
Edit:
Thanks a lot, Ponder, for your help. Here is a small modification of your query that returns the results in a single row. It also gives me the next transaction ID, i.e. the one that flipped the status from 'Actual' to 'Definite'.
select row_id, tran_id, next_tran_id, tran_date, next_tran_date, stat_cd
from (
    select abc.*,
           lag(tran_id)   over (partition by row_id order by tran_id desc) next_tran_id,
           lag(tran_date) over (partition by row_id order by tran_id desc) next_tran_date,
           case when stat_cd = 'Actual'
                 and lag(stat_cd) over (partition by row_id order by tran_id desc) = 'Definite' then 1
           end change
    from abc
)
where change = 1
order by row_id, tran_id
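To check the logic outside Oracle, here is a minimal sketch that loads the five sample rows into an in-memory SQLite database through Python's standard sqlite3 module and runs a version of the query above in which every window is partitioned by row_id. It assumes SQLite 3.25 or later (needed for window functions), and the dates are rewritten as ISO text purely for this demo.

```python
import sqlite3

# The five sample rows from the question (only the columns the query needs).
ROWS = [
    ("500-XNEGXU", 451327759, "2015-07-24 11:35:26", "Actual"),
    ("500-XNEGXU", 451299279, "2015-07-24 10:13:18", "Actual"),
    ("500-XNEGXU", 451140014, "2015-07-24 01:04:36", "Definite"),
    ("500-XNEGXU", 440102253, "2015-06-23 15:10:33", "Actual"),
    ("500-XNEGXU", 426245149, "2015-05-08 14:11:21", "Actual"),
]

QUERY = """
select row_id, tran_id, next_tran_id, tran_date, next_tran_date, stat_cd
from (
    select abc.*,
           -- ordered by tran_id desc, lag() returns the next (newer) transaction
           lag(tran_id)   over (partition by row_id order by tran_id desc) as next_tran_id,
           lag(tran_date) over (partition by row_id order by tran_id desc) as next_tran_date,
           case when stat_cd = 'Actual'
                 and lag(stat_cd) over (partition by row_id order by tran_id desc) = 'Definite'
                then 1 end as change
    from abc
)
where change = 1
order by row_id, tran_id
"""


def find_flips(rows):
    """Return each 'Actual' row whose next transaction set 'Definite'."""
    con = sqlite3.connect(":memory:")
    con.execute("create table abc (row_id text, tran_id integer, tran_date text, stat_cd text)")
    con.executemany("insert into abc values (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(find_flips(ROWS))
```

The single returned row pairs TRAN_ID 440102253 (the last 'Actual' entry) with 451140014, the transaction that set 'Definite'.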
This query, using the function lead(), displays all rows where stat_cd is 'Definite', plus the row just before each of them in tran_id order:
select row_id, tran_id, tran_date, stat_cd
from (
select data.*,
case when stat_cd='Definite'
or (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
end change
from data )
where change = 1 order by row_id, tran_id
If the log contains more than one ROW_ID and changes should be detected per record, change over (order by tran_id) to over (partition by row_id order by tran_id).
Edit: modified query after additional information was provided:
select row_id, tran_id, tran_date, stat_cd
from (
select xyz.*,
case
when stat_cd='Actual'
and (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
when stat_cd='Definite'
and (lag(stat_cd) over (order by tran_id)) = 'Actual' then 2
end change
from xyz)
where change is not null
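As a quick sanity check, the sketch below runs the modified query, partitioned by row_id as suggested above, against the sample rows in an in-memory SQLite database (Python's sqlite3, SQLite 3.25+ assumed for lead()/lag()). The table name xyz comes from the query; the column subset is an assumption of the demo.

```python
import sqlite3

# Sample rows from the question: (row_id, tran_id, stat_cd).
ROWS = [
    ("500-XNEGXU", 451327759, "Actual"),
    ("500-XNEGXU", 451299279, "Actual"),
    ("500-XNEGXU", 451140014, "Definite"),
    ("500-XNEGXU", 440102253, "Actual"),
    ("500-XNEGXU", 426245149, "Actual"),
]

QUERY = """
select row_id, tran_id, stat_cd, change
from (
    select xyz.*,
           case
               -- 1: the last 'Actual' row right before a 'Definite' one
               when stat_cd = 'Actual'
                and lead(stat_cd) over (partition by row_id order by tran_id) = 'Definite' then 1
               -- 2: the 'Definite' row right after an 'Actual' one
               when stat_cd = 'Definite'
                and lag(stat_cd) over (partition by row_id order by tran_id) = 'Actual' then 2
           end as change
    from xyz
)
where change is not null
order by row_id, tran_id
"""


def flagged_changes(rows):
    """Return both sides of every 'Actual' -> 'Definite' transition."""
    con = sqlite3.connect(":memory:")
    con.execute("create table xyz (row_id text, tran_id integer, stat_cd text)")
    con.executemany("insert into xyz values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(flagged_changes(ROWS))
```

Both sides of the transition come back: 440102253 flagged 1 and 451140014 flagged 2.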
I'm trying to calculate the duration between different statuses, which works for the most part.
I have this table
Table
For id = 102, I was able to calculate the duration of each status:
with ab as (
select id,
status,
max(updated_time) as end_time,
min(updated_time) as updated_time
from Table
group by id, status
)
select *,
lead(updated_time) over (partition by id order by updated_time) - updated_time as duration,
extract(epoch from duration) as duration_seconds
from ab
Output for id = 102
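But for id = 101, the status moved from 'IN_PROGRESS' to 'BLOCKED' and back to 'IN_PROGRESS'. Because the CTE groups by id and status, both IN_PROGRESS periods are merged into a single row.
Here I need the result below so that I can get the correct IN_PROGRESS duration: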
Expected
One way to do this is to flag every change of STATUS for a given ID, ordered by VERSION, and then treat each run of unchanged statuses as its own group. The query below produces the desired output. Rather than aim for brevity, I thought showing the transformation in several steps would be more helpful. The UNIXTIME column can easily be converted to a human-readable timestamp with the date functions of whichever database you use. The sample table definition and data file are shared below as well.
Query
WITH VW_STATUS_CHANGE AS
(
SELECT ID, STATUS, LAG(STATUS) OVER (PARTITION BY ID ORDER BY VERSION) LAG_STATUS, VERSION, UNIXTIME,
CASE WHEN LAG (STATUS) OVER (PARTITION BY ID ORDER BY VERSION) <> STATUS THEN 1 ELSE 0 END STATUS_CHANGE
FROM STACKOVERFLOWSQL
),
VW_CREATE_SYNTHETIC_PARTITION AS
(
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME,STATUS_CHANGE,
SUM(STATUS_CHANGE) OVER (ORDER BY ID, VERSION) AS ROWNUMBER
FROM VW_STATUS_CHANGE
) ,
VW_RESULTS_INTERMEDIATE AS
(
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME, STATUS_CHANGE,
"FIRST_VALUE"(UNIXTIME) OVER (
PARTITION BY "ID",
"STATUS", ROWNUMBER
ORDER BY
"VERSION"
) "TIME_FIRST_VALUE",
"FIRST_VALUE"(UNIXTIME) OVER (
PARTITION BY "ID",
"STATUS", ROWNUMBER
ORDER BY
"VERSION" DESC
) "TIME_LAST_VALUE"
FROM VW_CREATE_SYNTHETIC_PARTITION
ORDER BY ID, VERSION
)
SELECT DISTINCT ID, STATUS, TIME_FIRST_VALUE, TIME_LAST_VALUE
FROM VW_RESULTS_INTERMEDIATE
ORDER BY TIME_FIRST_VALUE
AWS Athena table used, along with the sample data:
CREATE EXTERNAL TABLE STACKOVERFLOWSQL (
    ID INT,
    STATUS STRING,
    VERSION INT,
    UNIXTIME INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ','
)
STORED AS TEXTFILE
LOCATION 's3://<S3BUCKETNAME>/'
TBLPROPERTIES ('skip.header.line.count' = '1');
Dataset Used:
ID,STATUS,VERSION,UNIXTIME
101,NOT_ASSIGNED,1,1668124141
101,IN_PROGRESS,2,1668124143
101,IN_PROGRESS,3,1668124146
101,IN_PROGRESS,4,1668124150
101,IN_PROGRESS,5,1668124155
101,BLOCKED,6,1668124161
101,BLOCKED,7,1668124168
101,IN_PROGRESS,8,1668124176
101,IN_PROGRESS,9,1668124185
101,IN_PROGRESS,10,1668124195
101,COMPLETED,11,1668124206
105,NOT_ASSIGNED,1,1668124207
105,IN_PROGRESS,2,1668124209
105,IN_PROGRESS,3,1668124212
105,IN_PROGRESS,4,1668124216
105,IN_PROGRESS,5,1668124221
105,IN_PROGRESS,6,1668124227
105,COMPLETED,7,1668124234
Result of the query:
ID STATUS TIME_FIRST_VALUE TIME_LAST_VALUE
101 NOT_ASSIGNED 1668124141 1668124141
101 IN_PROGRESS 1668124143 1668124155
101 BLOCKED 1668124161 1668124168
101 IN_PROGRESS 1668124176 1668124195
101 COMPLETED 1668124206 1668124206
105 NOT_ASSIGNED 1668124207 1668124207
105 IN_PROGRESS 1668124209 1668124227
105 COMPLETED 1668124234 1668124234
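For readers without Athena, the same gaps-and-islands idea can be tried locally. This is a minimal sketch using Python's sqlite3 (SQLite 3.25+ assumed for window functions) with a condensed variant of the query, not the query above verbatim: the island number is a running count of status changes partitioned by ID, and each island is collapsed with GROUP BY instead of FIRST_VALUE plus DISTINCT. Using MIN/MAX of UNIXTIME assumes timestamps increase with VERSION, as they do in the sample.

```python
import sqlite3

# Dataset from the answer: (ID, STATUS, VERSION, UNIXTIME).
DATA = [
    (101, "NOT_ASSIGNED", 1, 1668124141), (101, "IN_PROGRESS", 2, 1668124143),
    (101, "IN_PROGRESS", 3, 1668124146), (101, "IN_PROGRESS", 4, 1668124150),
    (101, "IN_PROGRESS", 5, 1668124155), (101, "BLOCKED", 6, 1668124161),
    (101, "BLOCKED", 7, 1668124168), (101, "IN_PROGRESS", 8, 1668124176),
    (101, "IN_PROGRESS", 9, 1668124185), (101, "IN_PROGRESS", 10, 1668124195),
    (101, "COMPLETED", 11, 1668124206), (105, "NOT_ASSIGNED", 1, 1668124207),
    (105, "IN_PROGRESS", 2, 1668124209), (105, "IN_PROGRESS", 3, 1668124212),
    (105, "IN_PROGRESS", 4, 1668124216), (105, "IN_PROGRESS", 5, 1668124221),
    (105, "IN_PROGRESS", 6, 1668124227), (105, "COMPLETED", 7, 1668124234),
]

QUERY = """
with flagged as (
    select id, status, version, unixtime,
           -- 1 when the status differs from the previous version of the same id
           case when lag(status) over (partition by id order by version) <> status
                then 1 else 0 end as status_change
    from stackoverflowsql
),
islands as (
    -- running count of changes = island number within each id
    select *, sum(status_change) over (partition by id order by version) as grp
    from flagged
)
select id, status, min(unixtime) as time_first_value, max(unixtime) as time_last_value
from islands
group by id, status, grp
order by time_first_value
"""


def status_runs(data):
    """Return one (id, status, first_time, last_time) row per uninterrupted status run."""
    con = sqlite3.connect(":memory:")
    con.execute("create table stackoverflowsql (id integer, status text, version integer, unixtime integer)")
    con.executemany("insert into stackoverflowsql values (?, ?, ?, ?)", data)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in status_runs(DATA):
    print(row)
```

The two IN_PROGRESS periods of ID 101 come out as separate rows, matching the result table above.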
I wrote a query that uses two temp tables (CTEs) and then joins them into one result. However, I am seeing duplicate records in the student visit temp table (query below). How could it be modified to remove the duplicate records from the visit temp table?
with clientbridge as (Select *
from (Select visitorid, --Visid
roomnumber,
room_id,
profid,
student_id,
cohd.datekey,
RANK() over(PARTITION BY visitorid,student_id,profid ORDER BY cohd.datekey desc) as rn
from university.course_office_hour_bridge cohd
--where student_id = '9999999-aaaa-6634-bbbb-96fa18a9046e'
)
where rn = 1 --visitorid = '999999999999999999999999999999'---'1111111111111111111111111111111' --and pai.datekey is not null --- 00000000000000000000000000
),
-----------------Data Header Table
studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left join university.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
select
*
from clientbridge cb inner join studentvisit sv on sv.visid_visitorid = cb.visitorid
I think you may have a better shot by joining the two datasets in the same query where you rank the data; otherwise the rank computed in one query will be ignored within the results of the second query. Perhaps something like this:
;with studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left join university.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
,clientbridge as (
Select
sv.*,
cohd.visitorid, --Visid
roomnumber,
room_id,
profid,
student_id,
cohd.datekey as bridge_datekey, -- renamed because sv.* already has a datekey column
-- ROW_NUMBER rather than RANK: RANK gives tied rows the same rank, so several rows could still get rn = 1
ROW_NUMBER() over(PARTITION BY cohd.visitorid, cohd.student_id, cohd.profid ORDER BY cohd.datekey desc) as rn
from university.course_office_hour_bridge cohd
inner join studentvisit sv on sv.visid_visitorid = cohd.visitorid
)
select
*
from clientbridge WHERE rn=1
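Why rn = 1 can still return duplicates: RANK() gives every row that ties on the ORDER BY value the same rank, so if one visitor/student/professor combination has two bridge rows with the same latest datekey, both get rn = 1. ROW_NUMBER() numbers tied rows arbitrarily but uniquely. The sketch below shows the difference with Python's sqlite3 (SQLite 3.25+ assumed) and a hypothetical three-row bridge table; the values are made up for illustration.

```python
import sqlite3

# Hypothetical bridge rows: two of them share the latest datekey for the same key.
BRIDGE = [
    ("v1", "s1", "p1", "2022-10-05"),
    ("v1", "s1", "p1", "2022-10-05"),
    ("v1", "s1", "p1", "2022-10-01"),
]


def count_rn1(ranking_function):
    """Count rows that end up with rn = 1 under the given ranking window function."""
    con = sqlite3.connect(":memory:")
    con.execute("create table bridge (visitorid text, student_id text, profid text, datekey text)")
    con.executemany("insert into bridge values (?, ?, ?, ?)", BRIDGE)
    sql = f"""
        select count(*) from (
            select {ranking_function}() over (
                       partition by visitorid, student_id, profid
                       order by datekey desc) as rn
            from bridge
        ) where rn = 1
    """
    (n,) = con.execute(sql).fetchone()
    con.close()
    return n


print("rank:", count_rn1("rank"))              # 2: both tied rows get rn = 1
print("row_number:", count_rn1("row_number"))  # 1: exactly one row gets rn = 1
```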
I have written the query shown here. It combines three tables and returns rows where the at_ticket_num from appeal_tickets is duplicated but against a different at_system_ref value:
select top 100
t.t_reference, at.at_system_ref, at_ticket_num, a.a_case_ref
from
tickets t, appeal_tickets at, appeals_2 a
where
t.t_reference in ('AB123','AB234') -- filtering on these values so that I can see that it's working
and t.t_number = at.at_ticket_num
and at.at_system_ref = a.a_system_ref
and at.at_ticket_num IN (select at_ticket_num
from appeal_tickets
group by at_ticket_num
having count(distinct at_system_ref) > 1)
order by
t.t_reference desc
This is the output:
t_reference at_system_ref at_ticket_num a_case_ref
-------------------------------------------------------
AB123 30838974 23641583 1111979010
AB123 30838976 23641583 1111979010
AB234 30839149 23641520 1111977352
AB234 30839209 23641520 1111988003
I want to modify it so that it only returns records where t_reference is duplicated but against a different a_case_ref. So in the above case, only the records for AB234 would be returned.
Any help would be much appreciated.
It seems you want all ticket appeals that have more than one system reference and more than one case reference. You can join the tables, count the distinct values per ticket, and then keep only the tickets that meet both criteria.
select *
from
(
select
t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
count(distinct a.a_case_ref) over (partition by at.at_ticket_num) as caserefs
from tickets t
join appeal_tickets at on at.at_ticket_num = t.t_number
join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref;
Correction
It seems that SQL Server still doesn't support COUNT(DISTINCT ...) OVER (...). You can count distinct values in a correlated subquery, though. Replace
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
with
(
select count(distinct a2.a_system_ref)
from appeal_tickets at2
join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
where at2.at_ticket_num = t.t_number
) as sysrefs,
and rewrite the caserefs line the same way, counting a2.a_case_ref.
An alternative workaround is to use DENSE_RANK in two directions (found here: https://stackoverflow.com/a/53518204/2270762):
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref) +
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc) -
1 as sysrefs,
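The two-direction DENSE_RANK trick can be checked against the four output rows from the question. This sketch rebuilds just enough of the three tables in an in-memory SQLite database (Python's sqlite3, SQLite 3.25+ assumed; SQLite, like SQL Server, rejects COUNT(DISTINCT ...) OVER) and applies the trick to both sysrefs and caserefs. The table contents are reconstructed from that output, so any column not shown there is left out.

```python
import sqlite3

SETUP = """
create table tickets (t_reference text, t_number integer);
create table appeal_tickets (at_ticket_num integer, at_system_ref integer);
create table appeals_2 (a_system_ref integer, a_case_ref integer);
insert into tickets values ('AB123', 23641583), ('AB234', 23641520);
insert into appeal_tickets values
    (23641583, 30838974), (23641583, 30838976),
    (23641520, 30839149), (23641520, 30839209);
insert into appeals_2 values
    (30838974, 1111979010), (30838976, 1111979010),
    (30839149, 1111977352), (30839209, 1111988003);
"""

QUERY = """
select t_reference, at_system_ref, at_ticket_num, a_case_ref
from (
    select t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
           -- ascending rank + descending rank - 1 = number of distinct values in the partition
           dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref)
         + dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc)
         - 1 as sysrefs,
           dense_rank() over (partition by at.at_ticket_num order by a.a_case_ref)
         + dense_rank() over (partition by at.at_ticket_num order by a.a_case_ref desc)
         - 1 as caserefs
    from tickets t
    join appeal_tickets at on at.at_ticket_num = t.t_number
    join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t_reference, at_system_ref
"""


def duplicated_case_refs():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(duplicated_case_refs())
```

Only the AB234 rows survive. Note that, unlike COUNT(DISTINCT ...), the trick counts NULL as one more distinct value.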
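Another approach is to add a column that flags each t_reference whose minimum and maximum a_case_ref differ, then filter on it:
with data as (
    select t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
           case when min(a.a_case_ref) over (partition by t.t_reference)
                  <> max(a.a_case_ref) over (partition by t.t_reference)
                then 1 end as dup
    from tickets t
    join appeal_tickets at on at.at_ticket_num = t.t_number
    join appeals_2 a on a.a_system_ref = at.at_system_ref
    where at.at_ticket_num in (select at_ticket_num
                               from appeal_tickets
                               group by at_ticket_num
                               having count(distinct at_system_ref) > 1)
)
select * from data where dup = 1
order by t_reference desc;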
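The min/max flag can be tried on the same four rows from the question. A minimal sketch with Python's sqlite3 (SQLite 3.25+ assumed); the tables hold only the values visible in that output, which is an assumption about the real data.

```python
import sqlite3

SETUP = """
create table tickets (t_reference text, t_number integer);
create table appeal_tickets (at_ticket_num integer, at_system_ref integer);
create table appeals_2 (a_system_ref integer, a_case_ref integer);
insert into tickets values ('AB123', 23641583), ('AB234', 23641520);
insert into appeal_tickets values
    (23641583, 30838974), (23641583, 30838976),
    (23641520, 30839149), (23641520, 30839209);
insert into appeals_2 values
    (30838974, 1111979010), (30838976, 1111979010),
    (30839149, 1111977352), (30839209, 1111988003);
"""

QUERY = """
with data as (
    select t.t_reference, at.at_system_ref, a.a_case_ref,
           -- more than one distinct a_case_ref per t_reference <=> min differs from max
           case when min(a.a_case_ref) over (partition by t.t_reference)
                  <> max(a.a_case_ref) over (partition by t.t_reference)
                then 1 end as dup
    from tickets t
    join appeal_tickets at on at.at_ticket_num = t.t_number
    join appeals_2 a on a.a_system_ref = at.at_system_ref
)
select t_reference, at_system_ref, a_case_ref
from data
where dup = 1
order by at_system_ref
"""


def flagged_references():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(flagged_references())
```

AB123 drops out because both of its rows carry case reference 1111979010, so its minimum equals its maximum.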
I want to fill rows whose final_date is null with a specific value: the next non-null final_date that comes after them (rows with no later non-null value should stay null). It's hard to explain, but it should be easier to understand from the desired output:
This is my current table:
DATESTAMP            pressure  final_date
2021-02-19T21:19:35  10.12     null
2021-02-19T22:19:35  11.13     null
2021-02-19T23:19:35  10.43     null
2021-02-20T00:19:35  11.98     null
2021-02-20T01:19:35  10.21     null
2021-02-20T01:40:10  20.21     2021-02-20
2021-02-24T23:11:00  10.42     null
2021-02-25T00:11:00  10.51     null
2021-02-25T00:11:00  20.51     2021-02-25
2021-02-28T11:11:12  10.51     null
2021-02-28T12:11:12  10.52     null
This is my desired table after running the query:
DATESTAMP            pressure  final_date
2021-02-19T21:19:35  10.12     2021-02-20
2021-02-19T22:19:35  11.13     2021-02-20
2021-02-19T23:19:35  10.43     2021-02-20
2021-02-20T00:19:35  11.98     2021-02-20
2021-02-20T01:19:35  10.21     2021-02-20
2021-02-20T01:40:10  20.21     2021-02-20
2021-02-24T23:11:00  10.42     2021-02-25
2021-02-25T00:11:00  10.51     2021-02-25
2021-02-25T00:11:00  20.51     2021-02-25
2021-02-28T11:11:12  10.51     null
2021-02-28T12:11:12  10.52     null
It doesn't matter if I have to create a new column.
This is my query:
SELECT *, IF(final_date is null, LAG(final_date ) OVER (ORDER BY DATESTAMP DESC), final_date ) AS preceding FROM(
SELECT
* FROM my_table
ORDER BY DATESTAMP ASC)
ORDER BY DATESTAMP ASC
And this is the result I got from that query:
DATESTAMP            pressure  final_date  preceding
2021-02-19T21:19:35  10.12     null        null
2021-02-19T22:19:35  11.13     null        null
2021-02-19T23:19:35  10.43     null        null
2021-02-20T00:19:35  11.98     null        null
2021-02-20T01:19:35  10.21     null        2021-02-20
2021-02-20T01:40:10  20.21     2021-02-20  2021-02-20
2021-02-24T23:11:00  10.42     null        null
2021-02-25T00:11:00  10.51     null        2021-02-25
2021-02-25T00:11:00  20.51     2021-02-25  2021-02-25
2021-02-28T11:11:12  10.51     null        null
2021-02-28T12:11:12  10.52     null        null
Can someone help me?
Thanks!
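This looks like a cumulative minimum, taken from the latest DATESTAMP backwards. Assuming final_date never decreases as DATESTAMP increases (as in your sample), the smallest final_date at or after a row is the next non-null one:
SELECT t.*,
       MIN(final_date) OVER (ORDER BY DATESTAMP DESC) as imputed_final_date
FROM my_table t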
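To see the cumulative minimum fill the gaps, here is a minimal sketch that loads the sample rows into SQLite via Python's sqlite3 (SQLite 3.25+ assumed). Timestamps and dates are stored as ISO text, so string comparison orders them correctly; None stands for null.

```python
import sqlite3

# (DATESTAMP, pressure, final_date) from the question.
ROWS = [
    ("2021-02-19T21:19:35", 10.12, None),
    ("2021-02-19T22:19:35", 11.13, None),
    ("2021-02-19T23:19:35", 10.43, None),
    ("2021-02-20T00:19:35", 11.98, None),
    ("2021-02-20T01:19:35", 10.21, None),
    ("2021-02-20T01:40:10", 20.21, "2021-02-20"),
    ("2021-02-24T23:11:00", 10.42, None),
    ("2021-02-25T00:11:00", 10.51, None),
    ("2021-02-25T00:11:00", 20.51, "2021-02-25"),
    ("2021-02-28T11:11:12", 10.51, None),
    ("2021-02-28T12:11:12", 10.52, None),
]


def impute(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table my_table (datestamp text, pressure real, final_date text)")
    con.executemany("insert into my_table values (?, ?, ?)", rows)
    result = con.execute("""
        select t.datestamp, t.pressure,
               -- smallest final_date among rows with the same or a later datestamp;
               -- MIN ignores NULLs, so trailing rows with no later date stay NULL
               min(t.final_date) over (order by t.datestamp desc) as imputed_final_date
        from my_table t
        order by t.datestamp, t.pressure
    """).fetchall()
    con.close()
    return result


for row in impute(ROWS):
    print(row)
```

The imputed column matches the desired table: 2021-02-20 for the first six rows, 2021-02-25 for the next three, and null for the last two.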
I have a table where I'm trying to find a set of particular records. Here's what my table looks like...
tblA
ID VouchID Action Amount
1 177-17 Add 700
2 177-17 Update 1
3 198-01 Add 600
4 198-01 Update 620
What happens here is that if a record was canceled/deleted, the action is 'Update' and Amount is updated to 1. In other words, VouchID = 177-17 should not be counted/selected by this query.
What I'm hoping to do is select only the records that don't have a corresponding Update record with Amount = 1. My current query is:
Select distinct vouchID from tblA where Action='add'
However, this query does not take into account VouchIDs that have an 'Update' action. An Update can mean two different things. For VouchID 177-17, Amount = 1 on the 'Update' row means that the 'Add' action does not count; it's as if the record had been removed altogether (it's only kept for record keeping). For VouchID = 198-01, the Update row with Amount = 620 means that the amount was increased by 20 to 620, and I want to see that record in my end result.
Desired end result from above table:
ID VouchID Action Amount
3 198-01 Add 600
You could use LEAD (SQL Server 2012 and above):
WITH cte AS (
SELECT *, LEAD(Amount) OVER(PARTITION BY VouchID ORDER BY ID) AS next_amount
FROM tblA
)
SELECT *
FROM cte
WHERE (next_amount <> 1 OR next_amount IS NULL) AND Action='add';
EDIT
A non-recursive CTE can always be replaced with a simple subquery:
SELECT *
FROM (SELECT *,
LEAD(Amount) OVER(PARTITION BY VouchID ORDER BY ID) AS next_amount
FROM tblA) sub
WHERE (next_amount <> 1 OR next_amount IS NULL) AND Action='add';
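A quick way to try the LEAD query is an in-memory SQLite database via Python's sqlite3 (SQLite 3.25+ assumed). One assumption worth flagging: SQLite compares strings case-sensitively, so the sketch matches 'Add' exactly, whereas SQL Server's usual default collation is case-insensitive and accepts 'add'.

```python
import sqlite3

# tblA from the question: (ID, VouchID, Action, Amount).
TBLA = [
    (1, "177-17", "Add", 700),
    (2, "177-17", "Update", 1),
    (3, "198-01", "Add", 600),
    (4, "198-01", "Update", 620),
]


def active_adds(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table tblA (ID integer, VouchID text, Action text, Amount integer)")
    con.executemany("insert into tblA values (?, ?, ?, ?)", rows)
    result = con.execute("""
        select ID, VouchID, Action, Amount
        from (select *,
                     -- amount of the next row for the same voucher, NULL if there is none
                     lead(Amount) over (partition by VouchID order by ID) as next_amount
              from tblA) sub
        where (next_amount <> 1 or next_amount is null) and Action = 'Add'
    """).fetchall()
    con.close()
    return result


print(active_adds(TBLA))
```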
EDIT:
Using EXISTS:
SELECT *
FROM tblA t1
WHERE Action='add'
AND NOT EXISTS (SELECT 1
                FROM tblA t2
                WHERE t1.VouchID = t2.VouchID
                  AND t2.Action = 'Update'
                  AND t2.Amount = 1);
What I'm hoping to do here is only select records, that don't have a corresponding Update record with Amount = 1
Seems easy enough with NOT EXISTS():
Select distinct vouchID FROM MyTable t1 where Action='add'
AND NOT EXISTS(SELECT * FROM MyTable t2
WHERE Action='Update'
AND Amount=1
AND t2.VouchId=t1.VouchId)
Are you using SQL Server 2008 or later? If so, I would try something like:
SELECT
ID, vouchID, Action, Amount
FROM tblA s
WHERE
Action='add'
AND NOT EXISTS(Select 1 from tblA l where l.vouchID = s.vouchID and l.Action = 'Update' and l.Amount = 1);
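The NOT EXISTS version can be checked in SQLite as well (Python's sqlite3; 'Add' and 'Update' matched with their exact case, since SQLite string comparison is case-sensitive). Voucher 200-05 below is hypothetical, not from the question: it is amended first and canceled later, to show that NOT EXISTS looks at every Update row for the voucher rather than only the next one.

```python
import sqlite3

ROWS = [
    (1, "177-17", "Add", 700),
    (2, "177-17", "Update", 1),
    (3, "198-01", "Add", 600),
    (4, "198-01", "Update", 620),
    # Hypothetical voucher (not in the question): amended to 650, then canceled.
    (5, "200-05", "Add", 500),
    (6, "200-05", "Update", 650),
    (7, "200-05", "Update", 1),
]


def surviving_adds(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table tblA (ID integer, VouchID text, Action text, Amount integer)")
    con.executemany("insert into tblA values (?, ?, ?, ?)", rows)
    result = con.execute("""
        select s.ID, s.VouchID, s.Action, s.Amount
        from tblA s
        where s.Action = 'Add'
          -- drop the Add if any Update for the same voucher set Amount to 1
          and not exists (select 1 from tblA l
                          where l.VouchID = s.VouchID
                            and l.Action = 'Update'
                            and l.Amount = 1)
        order by s.ID
    """).fetchall()
    con.close()
    return result


print(surviving_adds(ROWS))
```

Only voucher 198-01 remains. The LEAD-based query earlier in this thread would keep 200-05, because it only compares the amount of the next row (650), not any later cancellation.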