SQL sequence numbers and start/end date analysis - sql

I am fairly new to SQL (SQL Server Management Studio 2016) and I only joined the site this morning, so this is my first post! I have been looking on the site for a solution to my issue. I found a few links and tried some of them, but none of them (I think) will work. I have a table that holds boiler service data. One address can have multiple dates/sequence numbers. I want to create a script that proves the start date of the latest sequence number is less than or equal to the end date of the previous sequence number. So, in my example, I'd want to take start_date from the row with the MAX seq_no and end_date from the row with the 2nd-highest seq_no, to make sure they haven't breached the timescale.
My sample data has been added as an image (just two addresses, but there are thousands in reality):
I have tried SQL to get the max seq_no for just the end date initially, but it keeps bringing back all the entries:
select max (seq_no) as SEQNO, end_date, cmpnt_ref, prty_id
FROM hgmpcych
where prty_id in ('ABBEY10_TD12','ABBEY12_TD12') and cmpnt_ref='Boiler' and cycle_no='5'
group by end_date,prty_id,cmpnt_ref,seq_no
order by prty_id
This will probably be quite basic, but I am still pretty new to SQL. Any hints, advice or tips would be very much appreciated.

You could use ROW_NUMBER() to number the rows in each group, then select only the rows numbered 1 or 2 (the two "latest" rows):
WITH
enumerated_hgmpcych AS
(
SELECT
seq_no, start_date, end_date, cmpnt_ref, prty_id,
ROW_NUMBER() OVER (PARTITION BY prty_id, cmpnt_ref
ORDER BY seq_no DESC
)
desc_seq_enumerator
FROM
hgmpcych
WHERE
prty_id in ('ABBEY10_TD12','ABBEY12_TD12')
AND cmpnt_ref='Boiler'
AND cycle_no='5'
)
SELECT
*
FROM
enumerated_hgmpcych
WHERE
desc_seq_enumerator <= 2
ORDER BY
prty_id,
cmpnt_ref,
seq_no
If you wanted to, you could collapse that to one row per group...
WITH
enumerated_hgmpcych AS
(
SELECT
seq_no, start_date, end_date, cmpnt_ref, prty_id,
ROW_NUMBER() OVER (PARTITION BY prty_id, cmpnt_ref
ORDER BY seq_no DESC
)
desc_seq_enumerator
FROM
hgmpcych
WHERE
prty_id in ('ABBEY10_TD12','ABBEY12_TD12')
AND cmpnt_ref='Boiler'
AND cycle_no='5'
)
SELECT
prty_id,
cmpnt_ref,
MAX(CASE WHEN desc_seq_enumerator = 1 THEN seq_no END) AS final_seq_no,
MAX(CASE WHEN desc_seq_enumerator = 1 THEN start_date END) AS final_start_date,
MAX(CASE WHEN desc_seq_enumerator = 1 THEN end_date END) AS final_end_date,
MAX(CASE WHEN desc_seq_enumerator = 2 THEN seq_no END) AS prev_seq_no,
MAX(CASE WHEN desc_seq_enumerator = 2 THEN start_date END) AS prev_start_date,
MAX(CASE WHEN desc_seq_enumerator = 2 THEN end_date END) AS prev_end_date
FROM
enumerated_hgmpcych
WHERE
desc_seq_enumerator <= 2
GROUP BY
prty_id,
cmpnt_ref
ORDER BY
prty_id,
cmpnt_ref
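The same ROW_NUMBER()-plus-conditional-aggregation idea can be tried end to end in SQLite (window functions need SQLite 3.25 or later) from Python. This is a sketch under assumptions: the sample rows are hypothetical (the real data was only posted as an image), dates are stored as ISO-8601 text, and `within_timescale` is an illustrative helper name. For each address it reports whether the latest sequence's start_date is on or before the previous sequence's end_date.

```python
import sqlite3

# Hypothetical sample rows (the real hgmpcych data was only shown as an image).
# Columns: prty_id, cmpnt_ref, cycle_no, seq_no, start_date, end_date
ROWS = [
    ("ABBEY10_TD12", "Boiler", "5", 1, "2016-01-01", "2016-12-31"),
    ("ABBEY10_TD12", "Boiler", "5", 2, "2016-12-15", "2017-12-14"),
    ("ABBEY12_TD12", "Boiler", "5", 1, "2016-03-01", "2017-02-28"),
    ("ABBEY12_TD12", "Boiler", "5", 2, "2017-03-10", "2018-03-09"),
]

QUERY = """
WITH enumerated_hgmpcych AS (
    SELECT prty_id, cmpnt_ref, seq_no, start_date, end_date,
           ROW_NUMBER() OVER (PARTITION BY prty_id, cmpnt_ref
                              ORDER BY seq_no DESC) AS desc_seq_enumerator
    FROM hgmpcych
    WHERE cmpnt_ref = 'Boiler' AND cycle_no = '5'
)
SELECT prty_id,
       MAX(CASE WHEN desc_seq_enumerator = 1 THEN start_date END) AS final_start_date,
       MAX(CASE WHEN desc_seq_enumerator = 2 THEN end_date END)   AS prev_end_date
FROM enumerated_hgmpcych
WHERE desc_seq_enumerator <= 2
GROUP BY prty_id, cmpnt_ref
ORDER BY prty_id
"""


def within_timescale(rows):
    """Map prty_id -> True if the latest start_date <= the previous end_date."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE hgmpcych (prty_id TEXT, cmpnt_ref TEXT, cycle_no TEXT,"
        " seq_no INTEGER, start_date TEXT, end_date TEXT)"
    )
    con.executemany("INSERT INTO hgmpcych VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = {}
    for prty_id, final_start, prev_end in con.execute(QUERY):
        # ISO-8601 text compares chronologically; an address with only one
        # sequence has no previous end date (None) and is reported as False.
        result[prty_id] = prev_end is not None and final_start <= prev_end
    con.close()
    return result


print(within_timescale(ROWS))  # {'ABBEY10_TD12': True, 'ABBEY12_TD12': False}
```

Treating a single-sequence address as False is a design choice for the sketch; you may prefer to report it separately.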

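If you use max(seq_no), then seq_no must not appear in the GROUP BY (note that because end_date is still in the GROUP BY, you will still get one row per distinct end_date for each address):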
select max (seq_no) as SEQNO, end_date, cmpnt_ref, prty_id
from hgmpcych
where prty_id in ('ABBEY10_TD12', 'ABBEY12_TD12') and
cmpnt_ref = 'Boiler' and cycle_no = '5'
group by end_date, prty_id, cmpnt_ref
order by prty_id;
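A tiny SQLite sketch run from Python (hypothetical rows) shows why: every column in GROUP BY splits the groups further, so grouping by seq_no puts each row in its own group and MAX(seq_no) just returns each row's own value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hgmpcych (prty_id TEXT, seq_no INTEGER)")
# Hypothetical rows: two sequences for one address, one for another.
con.executemany(
    "INSERT INTO hgmpcych VALUES (?, ?)",
    [("ABBEY10_TD12", 1), ("ABBEY10_TD12", 2), ("ABBEY12_TD12", 1)],
)


def max_seq(group_by):
    """Run MAX(seq_no) with the given GROUP BY list and return sorted rows."""
    sql = f"SELECT prty_id, MAX(seq_no) FROM hgmpcych GROUP BY {group_by}"
    return sorted(con.execute(sql).fetchall())


with_seq = max_seq("prty_id, seq_no")  # seq_no in GROUP BY: one group per row
without_seq = max_seq("prty_id")       # one group (and one max) per address
print(with_seq)
print(without_seq)
```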

Related

SQL query to pick up the previous row and the current row with the last update date condition

I have a table ANC_PER_ABS_ENTRIES that has the following detail -
PER_AB_ENTRY_ID  person_number  action_type  duration  START_DATE  END_DATE    LAST_UPD_DT
---------------  -------------  -----------  --------  ----------  ----------  -------------------
15               101            INSERT       3         01/10/2022  03/10/2022  2022-11-01T04:59:43
15               101            UPDATE       1         01/10/2022  01/10/2022  2022-11-02T10:59:43
This table is a history table: insert, update and delete actions are all tracked in it.
INSERT means the entry was added to the table.
UPDATE means some change was made to the entry.
DELETE means the record was deleted.
I want to create a query that picks up the changes in this table by comparing last_update_date with the run date (a parameter).
For example, for person_number 101 with per_ab_entry_id 15, the first action_type is INSERT, meaning the record was created on the 1st; it was then updated, changing end_date and duration.
So if I run the query below on the 1st after 04:59, the first row is picked up.
When I run it on the 2nd, only the second row is picked up.
What I want is this: if the same per_ab_entry_id was updated and the last_upd_dt of the update is >= the run date, then the INSERT row should also be extracted.
The output of the latest run should look like this:
PER_AB_ENTRY_ID  person_number  flag  duration  START_DATE  END_DATE    LAST_UPD_DT
---------------  -------------  ----  --------  ----------  ----------  -------------------
15               101            O     3         01/10/2022  03/10/2022  2022-11-01T04:59:43
15               101            u     1         01/10/2022  01/10/2022  2022-11-02T10:59:43
I have to run the query below such that last_update_date >= :process_date.
It works for the delete condition and everything except this case. How can it be tweaked so that, when the last_upd_dt of the latest record of a per_ab_entry_id is >= process_date, its previous row is also returned?
The query below does not work because the last_upd_dt of the first row is <= process_date:
with anc as
(
select person_number,
absence_type,
ABSENCE_STATUS,
approval_status_cd,
start_date,
end_date,
duration,
PER_AB_ENTRY_ID,
AUDIT_ACTION_TYPE_,
row_number() over (order by PER_AB_ENTRY_ID, LAST_UPD_DT) rn
from ANC_PER_ABS_ENTRIES
)
SELECT * FROM ANC
where RN = 1
or RN = 2 and UPPER(flag) = 'D'
and APPROVAL_STATUS_CD = 'Approved'
and last_update_date >=:process_date
ORder by PER_AB_ENTRY_ID, LAST_UPD_DT
I understand that you want to find entries that were updated recently, and display their whole change log (starting with the initial insert).
Here is one way to do it with window functions:
select a.*
from (
select a.*,
max(last_update_date) over(partition by per_ab_entry_id) max_update_date
from anc_per_abs_entries a
) a
where max_update_date >= :process_date
order by per_ab_entry_id, last_update_date
In the subquery, the window max computes the latest update among all rows that belong to the same entry; we can then use that value to filter in the outer query.
I was not sure about the filtering logic at the end of your SQL code, which is not described in the question text, so I left it out; you might need to reincorporate it.
Try creating a CTE from your table like this:
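The window-MAX filter can be checked in SQLite (3.25+ for window functions) from Python. A minimal sketch under assumptions: the rows are hypothetical (entry 15 is the question's example with the dd/mm dates assumed day-first and rewritten as ISO-8601 text; entry 16 is invented), and the column is named last_update_date as in the answer's query. `:process_date` is bound the same way as the Oracle bind variable.

```python
import sqlite3

# Hypothetical rows: entry 15 from the question, entry 16 last touched
# well before the run date.
ROWS = [
    (15, 101, "INSERT", 3, "2022-10-01", "2022-10-03", "2022-11-01T04:59:43"),
    (15, 101, "UPDATE", 1, "2022-10-01", "2022-10-01", "2022-11-02T10:59:43"),
    (16, 102, "INSERT", 2, "2022-09-05", "2022-09-06", "2022-10-20T08:00:00"),
]

QUERY = """
SELECT per_ab_entry_id, action_type
FROM (
    SELECT a.*,
           MAX(last_update_date) OVER (PARTITION BY per_ab_entry_id) AS max_update_date
    FROM anc_per_abs_entries a
) a
WHERE max_update_date >= :process_date
ORDER BY per_ab_entry_id, last_update_date
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE anc_per_abs_entries (per_ab_entry_id INTEGER, person_number INTEGER,"
    " action_type TEXT, duration INTEGER, start_date TEXT, end_date TEXT,"
    " last_update_date TEXT)"
)
con.executemany("INSERT INTO anc_per_abs_entries VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)


def changes_since(process_date):
    """Return the full change log of every entry updated on/after process_date."""
    return con.execute(QUERY, {"process_date": process_date}).fetchall()


print(changes_since("2022-11-02T00:00:00"))  # [(15, 'INSERT'), (15, 'UPDATE')]
```

Running it on the 2nd returns both rows of entry 15, including the INSERT from the 1st, because the filter is on the entry's latest update rather than on each row's own date.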
WITH
anc AS
(
Select
ROW_NUMBER() OVER(Partition By PER_AB_ENTRY_ID, PERSON_NUMBER Order By LAST_UPDATE_DATE) "RN",
Count(*) OVER(Partition By PER_AB_ENTRY_ID, PERSON_NUMBER) "MAX_RN",
-- no ORDER BY here: with ORDER BY, the window MAX becomes a running max (the row's own date)
a.*, Max(LAST_UPDATE_DATE) OVER(Partition By PER_AB_ENTRY_ID, PERSON_NUMBER) "MAX_UPD_DATE"
From anc_per_abs_entries a
)
Now you have the last (max) update date, along with the row number and the total number of rows per person and entry id.
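To see what the CTE's columns contain, here is a minimal SQLite sketch run from Python (hypothetical rows based on entry 15; SQLite 3.25+ assumed for window functions). It also shows a detail worth checking in the CTE: a window MAX with ORDER BY uses a frame that ends at the current row, so it is a running maximum, and the whole-partition maximum needs a window without ORDER BY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE anc_per_abs_entries (per_ab_entry_id INTEGER,"
    " person_number INTEGER, action_type TEXT, last_update_date TEXT)"
)
# Hypothetical rows based on entry 15 from the question.
con.executemany(
    "INSERT INTO anc_per_abs_entries VALUES (?, ?, ?, ?)",
    [
        (15, 101, "INSERT", "2022-11-01T04:59:43"),
        (15, 101, "UPDATE", "2022-11-02T10:59:43"),
    ],
)

rows = con.execute("""
    SELECT action_type,
           ROW_NUMBER() OVER (PARTITION BY per_ab_entry_id, person_number
                              ORDER BY last_update_date) AS rn,
           COUNT(*) OVER (PARTITION BY per_ab_entry_id, person_number) AS max_rn,
           -- whole-partition maximum: no ORDER BY in the window
           MAX(last_update_date) OVER (PARTITION BY per_ab_entry_id, person_number)
               AS max_upd_date,
           -- with ORDER BY the default frame ends at the current row: a running max
           MAX(last_update_date) OVER (PARTITION BY per_ab_entry_id, person_number
                                       ORDER BY last_update_date) AS running_max
    FROM anc_per_abs_entries
    ORDER BY rn
""").fetchall()

for row in rows:
    print(row)
```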
The main SQL should do the job:
SELECT
a.RN,
a.MAX_RN,
a.PER_AB_ENTRY_ID,
a.PERSON_NUMBER,
a.ACTION_TYPE,
a.DURATION,
a.START_DATE,
a.END_DATE,
a.LAST_UPDATE_DATE,
To_Char(a.MAX_UPD_DATE, 'dd.mm.yyyy hh24:mi:ss') "MAX_UPD_DATE"
FROM
anc a
WHERE
LAST_UPDATE_DATE <= :process_date
AND
(
(
TRUNC(a.MAX_UPD_DATE) = TRUNC(:process_date) And
a.MAX_UPD_DATE <= :process_date And
a.RN = a.MAX_RN
)
OR
(
TRUNC(a.MAX_UPD_DATE) = TRUNC(:process_date) And
a.MAX_UPD_DATE > :process_date And
a.RN = a.MAX_RN - 1
)
OR
(
TRUNC(a.MAX_UPD_DATE) != TRUNC(:process_date) And
a.MAX_UPD_DATE > :process_date And
a.RN IN(a.MAX_RN, a.MAX_RN - 1)
)
)
Now you can adjust each part of the OR conditions to suit your needs. For instance, I'm not sure what you want to select when the second group of conditions is satisfied: either what is in the code above, or maybe something like this:
...
OR
(
TRUNC(a.MAX_UPD_DATE) = TRUNC(:process_date) And
a.MAX_UPD_DATE > :process_date And
a.RN = CASE WHEN a.MAX_RN > 1 THEN a.MAX_RN - 1 ELSE a.MAX_RN END
)
...
... or whatever suits you.

I need to write a query to mark the previous record as “Not eligible” if a new record with the same POS Order ID comes in within 30 days

I have a requirement to write a query that retrieves records whose POS_ORDER_ID is repeated by a new record, arriving within 30 days, with status 'Canceled' or 'Discontinued', and marks the previous record with that POS_ORDER_ID as not eligible.
Table columns:
POS_ORDER_ID,
Status,
Order_date,
Error_description
A query using the MAX() and ROW_NUMBER() analytic functions might help you, such as:
with t as
(
select t.*,
row_number() over (partition by pos_order_id order by Order_date desc ) as rn,
max(Order_date) over (partition by pos_order_id) as mx
from tab t -- your original table
)
select pos_order_id, Status, Order_date, Error_description,
case when rn >1
and t.status in ('Canceled','Discontinued')
and mx - t.Order_date <= 30
then
'Not eligible'
end as "Extra Status"
from t
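A related sketch in SQLite from Python swaps in LEAD() to look at the next record for the same POS_ORDER_ID. This is one reading of the question (the newer record carries the 'Canceled'/'Discontinued' status), whereas the query above checks the status of the older row and compares it with the latest order date. The rows are hypothetical, `tab` follows the answer's placeholder table name, and julianday() is SQLite's way of getting a day difference (Oracle subtracts DATE values directly).

```python
import sqlite3

# Hypothetical rows (pos_order_id, status, order_date).
ROWS = [
    (1, "Active", "2023-01-01"),
    (1, "Canceled", "2023-01-20"),   # arrives 19 days later
    (2, "Active", "2023-01-01"),
    (2, "Canceled", "2023-03-15"),   # arrives 73 days later
]

QUERY = """
SELECT pos_order_id, status, order_date,
       CASE WHEN next_status IN ('Canceled', 'Discontinued')
             AND julianday(next_date) - julianday(order_date) <= 30
            THEN 'Not eligible' END AS extra_status
FROM (
    SELECT t.*,
           LEAD(status) OVER (PARTITION BY pos_order_id ORDER BY order_date) AS next_status,
           LEAD(order_date) OVER (PARTITION BY pos_order_id ORDER BY order_date) AS next_date
    FROM tab t
)
ORDER BY pos_order_id, order_date
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (pos_order_id INTEGER, status TEXT, order_date TEXT)")
con.executemany("INSERT INTO tab VALUES (?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Only the first row of order 1 is flagged: its successor is 'Canceled' and arrives within 30 days, while order 2's cancellation comes too late.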
Please use the queries below.
Select and validate:
select POS_ORDER_ID, Status, Order_date, Error_description, row_number()
over(partition by POS_ORDER_ID order by Order_date desc)
from table_name;
Update query
merge into table_name t1
using
(select row_id, POS_ORDER_ID, Status, Order_date, Error_description,
row_number() over(partition by POS_ORDER_ID order by Order_date desc) as rnk
from table_name) t2
on (t1.POS_ORDER_ID = t2.POS_ORDER_ID and t1.row_id = t2.row_id)
when matched then
update
-- the target column was missing from the original; Status is assumed here
set t1.Status = case when t2.rnk = 1 then 'Canceled' else 'Not Eligible' end;

SQL: how can I get the first and last date when they are in two columns on different rows (islands problem)?

I think this is called the islands problem; I've been searching online but haven't figured it out.
I have a table where I need to get the start date and end date (stored in different columns) of each range.
The table has 100,000 rows, and I want to group it down so the result looks like this:
I have created a SQL Fiddle: http://sqlfiddle.com/#!18/f4800/1
From what I've read online, I think I need to number the rows, so I have this now:
But I'm stuck on what my next step should be.
You need row_number() instead of dense_rank(), and then use the difference of the two sequences:
select [CodeID], min([DATE_START]) as DATE_START,
max(DATE_FINISH) as DATE_FINISH, state
from (select [CodeID],[DATE_START],[DATE_FINISH],[STATE],
row_number() over(partition by [CodeID] order by [DATE_START]) as seq1,
row_number() over(partition by [CodeID],[STATE] order by [DATE_START]) as seq2
from Row_State
--where codeid = 'code1'
) t
group by [CodeID], state, (seq1-seq2)
order by CodeID, DATE_START;
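The difference-of-row-numbers idea can be reproduced in SQLite (3.25+) from Python. A sketch with hypothetical Row_State rows: seq1 counts all rows of a CodeID, seq2 counts only rows with the same STATE, so seq1 - seq2 stays constant across each run of consecutive rows sharing a state and changes when the state changes.

```python
import sqlite3

# Hypothetical rows for Row_State: (CodeID, DATE_START, DATE_FINISH, STATE)
ROWS = [
    ("code1", "2020-01-01", "2020-01-31", "A"),
    ("code1", "2020-02-01", "2020-02-29", "A"),
    ("code1", "2020-03-01", "2020-03-31", "B"),
    ("code1", "2020-04-01", "2020-04-30", "A"),
    ("code2", "2020-01-01", "2020-01-31", "B"),
]

QUERY = """
SELECT CodeID, MIN(DATE_START) AS DATE_START, MAX(DATE_FINISH) AS DATE_FINISH, STATE
FROM (
    SELECT CodeID, DATE_START, DATE_FINISH, STATE,
           ROW_NUMBER() OVER (PARTITION BY CodeID ORDER BY DATE_START) AS seq1,
           ROW_NUMBER() OVER (PARTITION BY CodeID, STATE ORDER BY DATE_START) AS seq2
    FROM Row_State
) t
-- seq1 - seq2 is constant while consecutive rows share the same STATE
GROUP BY CodeID, STATE, seq1 - seq2
ORDER BY CodeID, DATE_START
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Row_State (CodeID TEXT, DATE_START TEXT, DATE_FINISH TEXT, STATE TEXT)")
con.executemany("INSERT INTO Row_State VALUES (?, ?, ?, ?)", ROWS)
islands = con.execute(QUERY).fetchall()
for row in islands:
    print(row)
```

The two separate 'A' runs of code1 stay separate islands, which a plain GROUP BY CodeID, STATE would merge.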
If you know that the final result will be tiled in time with no gaps, then you can also use lag() and lead() like this:
select CodeID, state, date_start,
dateadd(day, -1, lead(date_start) over (partition by CodeID order by date_start)) as day_end
from (select rs.*,
lag(state) over (partition by CodeID order by date_start) as prev_state
from row_state rs
) rs
where prev_state is null or prev_state <> state;
The only issue with this version is that it does not calculate the final end date correctly, because lead() returns NULL for the last island of each code. To handle that:
select CodeID, state, date_start,
coalesce(dateadd(day, -1, lead(date_start) over (partition by CodeID order by date_start)),
max_date_finish
) as day_end
from (select rs.*,
lag(state) over (partition by CodeID order by date_start) as prev_state,
max(date_finish) over (partition by CodeID) as max_date_finish
from row_state rs
) rs
where prev_state is null or prev_state <> state;
This could be faster than an approach that uses aggregation.
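The lag()/lead() variant can be tried in SQLite (3.25+) from Python as well, with the same hypothetical Row_State rows. SQLite has no dateadd(), so date(x, '-1 day') stands in for it. The WHERE clause keeps only the first row of each island, and because window functions in the outer SELECT are evaluated after that filter, lead() sees the start of the next island.

```python
import sqlite3

# Hypothetical rows for Row_State: (CodeID, DATE_START, DATE_FINISH, STATE)
ROWS = [
    ("code1", "2020-01-01", "2020-01-31", "A"),
    ("code1", "2020-02-01", "2020-02-29", "A"),
    ("code1", "2020-03-01", "2020-03-31", "B"),
    ("code1", "2020-04-01", "2020-04-30", "A"),
    ("code2", "2020-01-01", "2020-01-31", "B"),
]

QUERY = """
SELECT CodeID, STATE, DATE_START,
       -- day before the next island starts; the last island falls back to the max finish
       COALESCE(date(LEAD(DATE_START) OVER (PARTITION BY CodeID ORDER BY DATE_START),
                     '-1 day'),
                max_date_finish) AS day_end
FROM (
    SELECT rs.*,
           LAG(STATE) OVER (PARTITION BY CodeID ORDER BY DATE_START) AS prev_state,
           MAX(DATE_FINISH) OVER (PARTITION BY CodeID) AS max_date_finish
    FROM Row_State rs
) rs
WHERE prev_state IS NULL OR prev_state <> STATE
ORDER BY CodeID, DATE_START
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Row_State (CodeID TEXT, DATE_START TEXT, DATE_FINISH TEXT, STATE TEXT)")
con.executemany("INSERT INTO Row_State VALUES (?, ?, ?, ?)", ROWS)
islands = con.execute(QUERY).fetchall()
for row in islands:
    print(row)
```

As the answer notes, this only gives the right end dates when the ranges tile time with no gaps.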

How to return all the rows in the yellow census blocks?

The schema is like this: for the whole dataset, we order by machine_id first, then by ss2k. After that, for each machine, we need to find all rows that belong to a run of at least 5 consecutive rows with flag = 'census'. In this dataset, the result should be all the yellow rows.
Using the query below, I cannot return the last 4 rows of each yellow block:
drop table if exists qz_panel_census_228_rank;
create table qz_panel_census_228_rank as
select t.*
from (select t.*,
count(*) filter (where flag = 'census') over (partition by machine_id, date order by ss2k rows between current row and 4 following) as census_cnt5,
count(*) filter (where flag = 'census') over (partition by machine_id, date) as count_census,
row_number() over (partition by machine_id, date order by ss2k) as seqnum,
count(*) over (partition by machine_id, date) as cnt
from qz_panel_census_228 t
) t
where census_cnt5 = 5
group by 1,2,3,4,5,6,7,8,9,10,11
DISTRIBUTED BY (machine_id);
You were close, but you need to search in both directions:
select t.*
from (select t.*,
case when count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between 4 preceding and current row) = 5
or count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between current row and 4 following) = 5
then 1
else 0
end as in_block -- renamed: "flag" would clash with the existing flag column
from qz_panel_census_228 t
) t
where in_block = 1
Edit:
This approach will not work unless you add an extra count for each possible 5-row window (e.g. 3 preceding and 1 following, 2 preceding and 2 following, etc.), which results in ugly, inflexible code.
The common way to solve this gaps & islands problem is to assign consecutive rows to a common group first:
select *
from
(
select t2.*,
count(*) over (partition by machine_id, date, grp) as cnt
from
(
select t1.*
from (select t.*,
-- keep the same number for 'census' rows
sum(case when flag = 'census' then 0 else 1 end)
over (partition by machine_id, date
order by ss2k
rows unbounded preceding) as grp
from qz_panel_census_228 t
) t1
where flag = 'census' -- only census rows
) as t2
) t3
where cnt >= 5 -- only groups of at least 5 census rows
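The running-sum grouping can be checked in SQLite (3.25+) from Python. A sketch under assumptions: the table name `panel` and its rows are hypothetical, and it partitions only by machine_id (the answer also partitions by date). The counter grp increases only on non-census rows, so every run of consecutive census rows shares one grp value, and COUNT(*) over that group gives the run length.

```python
import sqlite3

# Hypothetical rows (machine_id, ss2k, flag). Machine 1: a run of 5 'census'
# rows, a break, then a run of 3. Machine 2: a break, then a run of 6.
flags = {
    1: ["census"] * 5 + ["other"] + ["census"] * 3,
    2: ["other"] + ["census"] * 6,
}
ROWS = [(m, i, f) for m, fl in flags.items() for i, f in enumerate(fl)]

QUERY = """
SELECT machine_id, ss2k
FROM (
    SELECT t1.*, COUNT(*) OVER (PARTITION BY machine_id, grp) AS cnt
    FROM (
        SELECT t.*,
               -- grp only increases on non-census rows
               SUM(CASE WHEN flag = 'census' THEN 0 ELSE 1 END)
                   OVER (PARTITION BY machine_id ORDER BY ss2k
                         ROWS UNBOUNDED PRECEDING) AS grp
        FROM panel t
    ) t1
    WHERE flag = 'census'  -- applied before COUNT(*) OVER at this level
) t2
WHERE cnt >= 5
ORDER BY machine_id, ss2k
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE panel (machine_id INTEGER, ss2k INTEGER, flag TEXT)")
con.executemany("INSERT INTO panel VALUES (?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
print(result)
```

The run of 3 on machine 1 is dropped, while all 6 rows of machine 2's run are kept, including the last four that the question's forward-looking window missed.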
Wow, there has to be a better way of doing this, but the only way I could figure out was to create blocks of consecutive 'census' values. This looks awful but might be a catalyst to a better idea.
with q1 as (
select
machine_id, recorded, ss2k, flag, date,
case
when flag = 'census' and
coalesce(lag(flag) over (partition by machine_id order by ss2k), '') != 'census'
then 1
else 0
end as block
from foo
),
q2 as (
select
machine_id, recorded, ss2k, flag, date,
sum (block) over (order by machine_id, ss2k) as group_id,
case when flag = 'census' then 1 else 0 end as census
from q1
),
q3 as (
select
machine_id, recorded, ss2k, flag, date, group_id,
sum (census) over (partition by group_id order by ss2k) as max_count
from q2
),
groups as (
select group_id
from q3
group by group_id
having max (max_count) >= 5
)
select
q2.machine_id, q2.recorded, q2.ss2k, q2.flag, q2.date
from
q2
join groups g on q2.group_id = g.group_id
where
q2.flag = 'census'
If you run each query in the WITH clause in isolation, I think you will see how this evolves.

Oracle SQL: Show entries from component tables once apiece

My objective is to produce a dataset that shows a boatload of data from, in total, just shy of 50 tables, all in the same Oracle SQL database schema. As far as the report I'm building is concerned, each table except the first consists of two elements:
A foreign-key identifier that matches a row on the first table
A date
There may be many rows on one of these tables corresponding to one case, and it will NOT be the same number of rows from table to table.
My objective is to have each row in the first table show up as many times as needed to display all the results from the other tables once. So, something like this (except on a lot more tables):
CASE_FILE_ID INITIATED_DATE INSPECTION_DATE PAYMENT_DATE ACTION_DATE
------------ -------------- --------------- ------------ -----------
1000         10-JUL-1986    14-JUL-1987     10-JUL-1986
1000                        14-JUL-1988     10-JUL-1987
1000                        14-JUL-1989     10-JUL-1988
1000                                        10-JUL-1989
My current SQL code (shrunk down to five tables, but the rest all follow the same format as T1-T4):
SELECT DISTINCT
A.CASE_FILE_ID,
T1.DATE AS INITIATED_DATE,
T2.DATE AS INSPECTION_DATE,
T3.DATE AS PAYMENT_DATE,
T4.DATE AS ACTION_DATE
FROM
RECORDS.CASE_FILE A
LEFT OUTER JOIN RECORDS.INITIATE T1 ON A.CASE_FILE_ID = T1.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.INSPECTION T2 ON A.CASE_FILE_ID = T2.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.PAYMENT T3 ON A.CASE_FILE_ID = T3.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.ACTION T4 ON A.CASE_FILE_ID = T4.CASE_FILE_ID
ORDER BY
A.CASE_FILE_ID
The problem is that this produces every distinct combination of dates. In the example above (where I added WHERE A.CASE_FILE_ID = '1000'), instead of four rows for case 1000 it shows twelve (1 initiated date * 3 inspection dates * 4 payment dates = 12 rows). As the number of tables increases, this quickly becomes prohibitive in both display and runtime.
What is the best way to get an output loosely akin to the ideal above, where any one date is only shown once? Failing that, is there a way to get it to only show as many lines for one CASE_FILE as it needs to show all the dates, even if some dates repeat within that?
There isn't a good way, but there are two ways. One method involves subqueries for each table and complex outer joins. The second involves subqueries and union all. Let's go with that one:
SELECT CASE_FILE_ID,
MAX(INITIATED_DATE) as INITIATED_DATE,
MAX(INSPECTION_DATE) as INSPECTION_DATE,
MAX(PAYMENT_DATE) as PAYMENT_DATE,
MAX(ACTION_DATE) as ACTION_DATE
FROM ((SELECT A.CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
1 as seqnum
FROM RECORDS.CASE_FILE A
) UNION ALL
(SELECT CASE_FILE_ID, DATE as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INITIATE
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, DATE as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INSPECTION
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
DATE as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.PAYMENT
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, DATE as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.ACTION
)
) a
GROUP BY CASE_FILE_ID, seqnum;
Hmmm, a closely related solution is easier to maintain:
SELECT CASE_FILE_ID,
MAX(CASE WHEN type = 'INITIATED' THEN DATE END) as INITIATED_DATE,
MAX(CASE WHEN type = 'INSPECTION' THEN DATE END) as INSPECTION_DATE,
MAX(CASE WHEN type = 'PAYMENT' THEN DATE END) as PAYMENT_DATE,
MAX(CASE WHEN type = 'ACTION' THEN DATE END) as ACTION_DATE
FROM ((SELECT A.CASE_FILE_ID, NULL as TYPE, NULL as DATE,
1 as seqnum
FROM RECORDS.CASE_FILE A
) UNION ALL
(SELECT CASE_FILE_ID, 'INITIATED', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INITIATE
) UNION ALL
(SELECT CASE_FILE_ID, 'INSPECTION', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INSPECTION
) UNION ALL
(SELECT CASE_FILE_ID, 'PAYMENT', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.PAYMENT
) UNION ALL
(SELECT CASE_FILE_ID, 'ACTION', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.ACTION
)
) a
GROUP BY CASE_FILE_ID, seqnum;
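The UNION ALL + ROW_NUMBER() + GROUP BY pattern can be tried in SQLite (3.25+) from Python. A sketch under assumptions: the schema is simplified and hypothetical (lowercase table names, a `dt` column standing in for the question's DATE column, a `kind` column for the type label, no ACTION table), and the dates are those in the question's ideal output rewritten as ISO-8601 text. Each event type numbers its own rows per case, so grouping by (case_file_id, seqnum) lines up the 1st, 2nd, 3rd... date of every type instead of cross-multiplying them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: one table per event type.
con.execute("CREATE TABLE case_file (case_file_id INTEGER)")
for name in ("initiate", "inspection", "payment"):
    con.execute(f"CREATE TABLE {name} (case_file_id INTEGER, dt TEXT)")
con.executemany("INSERT INTO case_file VALUES (?)", [(1000,), (2000,)])
con.executemany("INSERT INTO initiate VALUES (?, ?)", [(1000, "1986-07-10")])
con.executemany("INSERT INTO inspection VALUES (?, ?)",
                [(1000, d) for d in ("1987-07-14", "1988-07-14", "1989-07-14")])
con.executemany("INSERT INTO payment VALUES (?, ?)",
                [(1000, d) for d in ("1986-07-10", "1987-07-10", "1988-07-10", "1989-07-10")])

QUERY = """
SELECT case_file_id, seqnum,
       MAX(CASE WHEN kind = 'INITIATED'  THEN dt END) AS initiated_date,
       MAX(CASE WHEN kind = 'INSPECTION' THEN dt END) AS inspection_date,
       MAX(CASE WHEN kind = 'PAYMENT'    THEN dt END) AS payment_date
FROM (
    -- one placeholder row per case, so cases with no events still appear
    SELECT case_file_id, NULL AS kind, NULL AS dt, 1 AS seqnum FROM case_file
    UNION ALL
    SELECT case_file_id, 'INITIATED', dt,
           ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY dt) FROM initiate
    UNION ALL
    SELECT case_file_id, 'INSPECTION', dt,
           ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY dt) FROM inspection
    UNION ALL
    SELECT case_file_id, 'PAYMENT', dt,
           ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY dt) FROM payment
) a
GROUP BY case_file_id, seqnum
ORDER BY case_file_id, seqnum
"""

rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Case 1000 comes out as four rows rather than twelve, and case 2000, which has no events, still appears once thanks to the placeholder branch.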