Get Current and Previous level In Hive QL - hive

I have a table with below details. I need to get the Current level and Previous level.
ID | Level | start_dt            | End_dt
A  | 1     | 2018-03-12 18:39:10 | 2020-01-01 00:00:00
A  | 1     | 2018-01-17 13:21:26 | 2018-03-12 18:39:10
A  | 2     | 2018-01-14 13:21:17 | 2018-01-17 13:21:26
My End state table is as below:
ID, current_level, previous_level, upgrade/downgrade flag
I tried ranking based on END_dt desc. But it would rank my second row as 2 and that isn't the previous level. Can I handle this in a single query? Or a single hop?

Tricky part is the previous level, I don't think that's possible in 1 pass. Try something like this though
with setup AS (
select ID
, Level
, row_number() over (partition by ID order by End_dt desc) as row_num
, MIN(Level) over (Partition by ID order by End_dt desc ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) as previous_level -- with End_dt descending, the older row is the one that FOLLOWS the current row
from my_table
)
SELECT ID
, Level
, previous_level
, case when Level = previous_level THEN 'No Upgrade/Downgrade'
WHEN Level < previous_level THEN 'Upgrade'
WHEN Level > previous_level THEN 'Downgrade'
ELSE 'Unknown' END AS upgrade_downgrade_description
FROM setup
WHERE row_num = 1
;

You can use LAG to get the previous row's values (see the LAG reference docs).
create table table_1(ID string,Level int,start_dt timestamp,End_dt timestamp);
insert into table_1 values
('A',1,'2018-03-12 18:39:10','2020-01-01 00:00:00'),
('A',1,'2018-01-17 13:21:26','2018-03-12 18:39:10'),
('A',2,'2018-01-14 13:21:17','2018-01-17 13:21:26');
SQL:
select id,curr_level,prev_level,
case when curr_level=prev_level then 'No Ups - Downs'
when curr_level>prev_level then 'Downgrade'
when curr_level<prev_level then 'Upgrade'
when prev_level is null then 'No-Previous Level'
else 'Unknown state'
end upgrade_downgrade_description
from(
select table_1.id,
table_1.level as curr_level,
lag(table_1.level,1) over (partition by table_1.id order by table_1.end_dt desc) prev_level
from table_1) s;
Output:
id curr_level prev_level upgrade_downgrade_description
A 1 NULL No-Previous Level
A 1 1 No Ups - Downs
A 2 1 Downgrade
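If "previous level" means the most recent level that differs from the current one (which is how I read the question), a rough sketch could look like the one below. It runs the SQL through Python's sqlite3 module only so that it can be executed locally; SQLite 3.25+ is assumed for window functions, and the helper name `current_and_previous_level` is made up for this illustration. The same window functions exist in Hive, but I have not run this there.

```python
import sqlite3

# Sketch only: SQLite stands in for Hive. Assumption: "previous level" is the
# most recent level that is different from the current (latest) level.
QUERY = """
with ordered as (
    select id, level, end_dt,
           first_value(level) over (partition by id order by end_dt desc) as curr_level
    from table_1
),
changed as (
    -- keep only rows whose level differs from the current one, newest first
    select id, level,
           row_number() over (partition by id order by end_dt desc) as rn
    from ordered
    where level <> curr_level
)
select c.id, c.curr_level, p.level as prev_level
from (select distinct id, curr_level from ordered) c
left join changed p on p.id = c.id and p.rn = 1
order by c.id
"""

def current_and_previous_level(rows):
    """Return (id, current level, previous differing level or None) per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table table_1 (id text, level integer, start_dt text, end_dt text)")
    conn.executemany("insert into table_1 values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

sample = [
    ("A", 1, "2018-03-12 18:39:10", "2020-01-01 00:00:00"),
    ("A", 1, "2018-01-17 13:21:26", "2018-03-12 18:39:10"),
    ("A", 2, "2018-01-14 13:21:17", "2018-01-17 13:21:26"),
]
print(current_and_previous_level(sample))  # [('A', 1, 2)]
```

An ID that never changed level comes back with a NULL previous level, so the upgrade/downgrade flag can be derived with a CASE on the two columns.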

Related

SQL query to pick up the previous row and the current row with the last update date condition

I have a table ANC_PER_ABS_ENTRIES that has the following detail -
PER_AB_ENTRY_ID | person_number | action_type | duration | START_DATE | END_DATE   | LAST_UPD_DT
15              | 101           | INSERT      | 3        | 01/10/2022 | 03/10/2022 | 2022-11-01T04:59:43
15              | 101           | UPDATE      | 1        | 01/10/2022 | 01/10/2022 | 2022-11-02T10:59:43
This is a history table that tracks insert, update and delete actions.
INSERT means the entry was added to the table.
UPDATE means the entry was changed.
DELETE means the record was deleted.
I want to create a query that picks up the changes in this table by comparing last_update_date with the run_date parameter.
E.g., for person_number 101 with per_ab_entry_id 15, the action_type is first INSERT, meaning the record was created on the 1st; it is then updated, which changes end_date and duration.
So if I run the query below on the 1st after 4:59, the first row is picked.
When I run it on the 2nd, only the second row is picked.
What I want is that, when the same per_ab_entry_id was updated and the last_upd_dt of the update is >= run_date, the insert row is extracted as well.
The output of the latest run should look like this:
PER_AB_ENTRY_ID | person_number | flag | duration | START_DATE | END_DATE   | LAST_UPD_DT
15              | 101           | O    | 3        | 01/10/2022 | 03/10/2022 | 2022-11-01T04:59:43
15              | 101           | u    | 1        | 01/10/2022 | 01/10/2022 | 2022-11-02T10:59:43
I have to run the query below such that last_update_date >= :process_date.
It works for the delete condition and everything else except this case. How can it be tweaked so that, when the last_upd_dt of the latest record of a per_ab_entry_id is >= process_date, its previous row is also sent?
The query below does not work because the last_upd_dt of the first row is <= process_date.
with anc as
(
select person_number,
absence_type,
ABSENCE_STATUS,
approval_status_cd,
start_date,
end_date,
duration,
PER_AB_ENTRY_ID,
AUDIT_ACTION_TYPE_,
row_number() over (order by PER_AB_ENTRY_ID, LAST_UPD_DT) rn
from ANC_PER_ABS_ENTRIES
)
SELECT * FROM ANC
where RN = 1
or RN = 2 and UPPER(flag) = 'D'
and APPROVAL_STATUS_CD = 'Approved'
and last_update_date >=:process_date
ORder by PER_AB_ENTRY_ID, LAST_UPD_DT
I understand that you want to find entries that were updated recently, and display their whole change log (starting with the initial insert).
Here is one way to do it with window functions:
select a.*
from (
select a.*,
max(last_update_date) over(partition by per_ab_entry_id) max_update_date
from anc_per_abs_entries a
) a
where max_update_date >= :process_date
order by per_ab_entry_id, last_update_date
In the subquery, the window max computes the latest update among all rows that belong to the same entry; we can then use that information to filter in the outer query.
I was not sure about the filtering logic at the end of your SQL code, which is not described in the text of the question, so I left it out. You might need to reincorporate it.
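To see the window max filter in action, here is a small runnable sketch. It uses Python's sqlite3 module as a stand-in for the real database (SQLite 3.25+ assumed for window functions); the helper name `recent_change_log` and the third sample row (entry 16) are made up for the illustration, and the table is cut down to the columns that matter.

```python
import sqlite3

# Sketch only: SQLite stands in for the real database.
QUERY = """
select per_ab_entry_id, action_type, last_update_date
from (
    select a.*,
           max(last_update_date) over (partition by per_ab_entry_id) as max_update_date
    from anc_per_abs_entries a
) a
where max_update_date >= :process_date
order by per_ab_entry_id, last_update_date
"""

def recent_change_log(rows, process_date):
    """Return every row of each entry whose latest update is >= process_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table anc_per_abs_entries ("
        "per_ab_entry_id integer, person_number integer, action_type text, "
        "duration integer, last_update_date text)"
    )
    conn.executemany("insert into anc_per_abs_entries values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, {"process_date": process_date}).fetchall()
    conn.close()
    return result

history = [
    (15, 101, "INSERT", 3, "2022-11-01T04:59:43"),
    (15, 101, "UPDATE", 1, "2022-11-02T10:59:43"),
    (16, 102, "INSERT", 2, "2022-10-20T08:00:00"),  # hypothetical entry, not touched recently
]
print(recent_change_log(history, "2022-11-02T00:00:00"))
```

With a process date of 2 November, both rows of entry 15 come back (the insert and the update), while entry 16 is skipped because its latest update is older than the process date.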
Try creating a CTE from your table like this:
WITH
anc AS
(
Select
ROW_NUMBER() OVER(Partition By PER_AB_ENTRY_ID, PERSON_NUMBER Order By LAST_UPDATE_DATE) "RN",
Count(*) OVER(Partition By PER_AB_ENTRY_ID, PERSON_NUMBER) "MAX_RN",
a.*, Max(LAST_UPDATE_DATE) OVER(Partition By PER_AB_ENTRY_ID, PERSON_NUMBER) "MAX_UPD_DATE" -- no ORDER BY here, so this is the maximum over the whole partition, not a running maximum
From anc_per_abs_entries a
)
Now you have last (max) update date along with row numbers and total number of rows per person and entry id.
The main SQL should do the job:
SELECT
a.RN,
a.MAX_RN,
a.PER_AB_ENTRY_ID,
a.PERSON_NUMBER,
a.ACTION_TYPE,
a.DURATION,
a.START_DATE,
a.END_DATE,
a.LAST_UPDATE_DATE,
To_Char(a.MAX_UPD_DATE, 'dd.mm.yyyy hh24:mi:ss') "MAX_UPD_DATE"
FROM
anc a
WHERE
LAST_UPDATE_DATE <= :process_date
AND
(
(
TRUNC(a.MAX_UPD_DATE) = TRUNC(:process_date) And
a.MAX_UPD_DATE <= :process_date And
a.RN = a.MAX_RN
)
OR
(
TRUNC(a.MAX_UPD_DATE) = TRUNC(:process_date) And
a.MAX_UPD_DATE > :process_date And
a.RN = a.MAX_RN - 1
)
OR
(
TRUNC(a.MAX_UPD_DATE) != TRUNC(:process_date) And
a.MAX_UPD_DATE > :process_date And
a.RN IN(a.MAX_RN, a.MAX_RN - 1)
)
)
Now you can adjust each part of the OR conditions to suit your needs. For instance, I'm not sure what you want to select when the second group of conditions is satisfied. It is either what is in the code above, or maybe something like this:
...
OR
(
TRUNC(a.MAX_UPD_DATE) = TRUNC(:process_date) And
a.MAX_UPD_DATE > :process_date And
a.RN = CASE WHEN a.MAX_RN > 1 THEN a.MAX_RN - 1 ELSE a.MAX_RN END
)
...
... or whatever suits you.
Regards...

SQL - Find the min(date) since a category has its most recent value

I need some help with this problem.
Assuming I have following table:
contract_id | tariff_id | product_category | date (DD.MM.YYYY) | month (YYYYMM)
123456      | ABC       | small            | 01.01.2021        | 202101
123456      | ABC       | medium           | 01.02.2021        | 202102
123456      | DEF       | small            | 01.03.2021        | 202103
123456      | DEF       | small            | 01.04.2021        | 202104
123456      | ABC       | big              | 01.05.2021        | 202105
123456      | DEF       | small            | 01.06.2021        | 202106
123456      | DEF       | medium           | 02.06.2021        | 202106
123456      | DEF       | medium           | 01.07.2021        | 202107
The table is partitioned by month.
This is a part of my table containing multiple contract_ids.
For every contract_id, I'm trying to figure out since when it has had its most recent tariff_id, and since when it has had product_category = 'small' (if it doesn't have small as its product category, the value should be NULL).
The results will be written into a table which gets updated every month.
So for the table above my latest results should look like this:
contract_id | same_tariff_id_since | product_category_small_since
123456      | 01.06.2021           | NULL
I'm using Hive.
So far, I could only come up with this solution for same_tariff_id_since:
The problem is that it gives me absolute min(date) for the tariff_id and not the min(date) since the most recent tariff_id.
I think the code for product_category_small_since will have mostly the same logic.
My current code is:
SELECT q2.contract_id
, q3.tariff_id
, q2.date
FROM (
SELECT contract_id
, max(date_2) AS date
FROM (
SELECT contract_id
, date
, min(date) OVER (PARTITION BY tariff_id ORDER BY date) AS date_2
FROM given_table
)q1
WHERE date=date_2
GROUP BY contract_id
)q2
JOIN given_table AS q3
ON q2.contract_id=q3.contract_id
AND q2.date=q3.date
Thanks in advance.
One approach for solving this type of query is to do a grouping of the sequences you want to track. For the tariff_id sequence grouping, you want a new "sequence grouping id" for each time that the tariff id changes for a given contract id. Since the product_category can change independently, you need to do a sequence grouping id for that change as well.
Here's code to accomplish the task. This only returns the latest version of each contract and the specific columns you described in your latest results table. This was done against PostgreSQL 9.6, but the syntax and data types can probably be modified to be compatible with Hive.
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/8
select q2.contract_id
, to_char(min(q2."date (DD.MM.YYYY)")
over (partition by q2.contract_id, q2.contract_tariff_sequence_id), 'DD.MM.YYYY') as same_tariff_id_since
, to_char(min(case when q2.product_category = 'small' then q2."date (DD.MM.YYYY)" else null end)
over (partition by q2.contract_id, q2.contract_product_category_sequence_id), 'DD.MM.YYYY') as product_category_small_since
from(
select q1.*
, sum(case when q1.tariff_id = q1.prior_tariff_id then 0 else 1 end)
over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_tariff_sequence_id
, sum(case when q1.product_category = q1.prior_product_category then 0 else 1 end)
over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_product_category_sequence_id
from (
select *
, lag(tariff_id) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_tariff_id
, lag(product_category) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_product_category
, row_number() over (partition by contract_id order by "date (DD.MM.YYYY)" desc) latest_record_per_contract
from contract_tariffs
) q1
) q2
where latest_record_per_contract = 1
If you want to see all the rows and columns so you can examine how this works with the sequence grouping ids etc., you can modify the outer query slightly:
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/10
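For anyone who wants to step through the sequence grouping locally, here is a hedged sketch of the same idea run through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). The column name `dt`, the ISO date format and the helper name `tariff_and_small_since` are choices made for this illustration only; it is not the Hive version.

```python
import sqlite3

# Sketch only: SQLite stands in for PostgreSQL/Hive, and dates are ISO text so
# that they sort correctly as strings.
QUERY = """
with flagged as (
    select contract_id, tariff_id, product_category, dt,
           case when tariff_id = lag(tariff_id) over (partition by contract_id order by dt)
                then 0 else 1 end as tariff_changed,
           case when product_category = lag(product_category) over (partition by contract_id order by dt)
                then 0 else 1 end as category_changed,
           row_number() over (partition by contract_id order by dt desc) as rn_desc
    from contract_tariffs
),
grouped as (
    -- running sums of the change flags give one id per unbroken sequence
    select flagged.*,
           sum(tariff_changed) over (partition by contract_id order by dt
               rows between unbounded preceding and current row) as tariff_seq,
           sum(category_changed) over (partition by contract_id order by dt
               rows between unbounded preceding and current row) as category_seq
    from flagged
),
marked as (
    select contract_id, rn_desc,
           min(dt) over (partition by contract_id, tariff_seq) as same_tariff_id_since,
           min(case when product_category = 'small' then dt end)
               over (partition by contract_id, category_seq) as product_category_small_since
    from grouped
)
select contract_id, same_tariff_id_since, product_category_small_since
from marked
where rn_desc = 1
order by contract_id
"""

def tariff_and_small_since(rows):
    """Return (contract_id, same_tariff_id_since, product_category_small_since)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table contract_tariffs ("
        "contract_id integer, tariff_id text, product_category text, dt text)"
    )
    conn.executemany("insert into contract_tariffs values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

contracts = [
    (123456, "ABC", "small", "2021-01-01"),
    (123456, "ABC", "medium", "2021-02-01"),
    (123456, "DEF", "small", "2021-03-01"),
    (123456, "DEF", "small", "2021-04-01"),
    (123456, "ABC", "big", "2021-05-01"),
    (123456, "DEF", "small", "2021-06-01"),
    (123456, "DEF", "medium", "2021-06-02"),
    (123456, "DEF", "medium", "2021-07-01"),
]
print(tariff_and_small_since(contracts))  # [(123456, '2021-06-01', None)]
```

The latest tariff run (DEF) starts on 01.06.2021, and the latest product category run is medium, so the "small since" column stays NULL, which matches the expected result in the question.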
If this works for you, please mark as correct answer.

Find the true start end dates for customers that have multiple accounts in SQL Server 2014

I have a checking account table that contains columns Cust_id (customer id), Open_Date (start date), and Closed_Date (end date). There is one row for each account. A customer can open multiple accounts at any given point. I would like to know how long the person has been a customer.
eg 1:
CREATE TABLE [Cust]
(
[Cust_id] [varchar](10) NULL,
[Open_Date] [date] NULL,
[Closed_Date] [date] NULL
)
insert into [Cust] values ('a123', '10/01/2019', '10/15/2019')
insert into [Cust] values ('a123', '10/12/2019', '11/01/2019')
Ideally I would like to insert this into a table with just one row that says this person has been a customer from 10/01/2019 to 11/01/2019 (as he opened his second account before he closed his previous one).
Similarly eg 2:
insert into [Cust] values ('b245', '07/01/2019', '09/15/2019')
insert into [Cust] values ('b245', '10/12/2019', '12/01/2019')
I would like to see 2 rows in this case- one that shows he was a customer from 07/01 to 09/15 and then again from 10/12 to 12/01.
Can you point me to the best way to get this?
I would approach this as a gaps and islands problem. You want to group together adjacent rows whose periods overlap.
Here is one way to solve it using lag() and a cumulative sum(). Every time the open date is greater than the closed date of the previous record, a new group starts.
select
cust_id,
min(open_date) open_date,
max(closed_date) closed_date
from (
select
t.*,
sum(case when not open_date <= lag_closed_date then 1 else 0 end)
over(partition by cust_id order by open_date) grp
from (
select
t.*,
lag(closed_date) over (partition by cust_id order by open_date) lag_closed_date
from cust t
) t
) t
group by cust_id, grp
In this db fiddle with your sample data, the query produces:
cust_id | open_date | closed_date
:------ | :--------- | :----------
a123 | 2019-10-01 | 2019-11-01
b245 | 2019-07-01 | 2019-09-15
b245 | 2019-10-12 | 2019-12-01
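One case the lag() comparison does not cover is an account that sits entirely inside an earlier, longer one, followed by a third account that opens after the short one closed but while the long one is still open. A possible variation, sketched below, compares each open date with the running maximum of all earlier closed dates instead of only the previous row's. It runs through Python's sqlite3 module (SQLite 3.25+ assumed) rather than SQL Server; the customer c789 and the helper name `customer_periods` are made up for the illustration.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server, and dates are ISO text.
QUERY = """
select cust_id, min(open_date) as open_date, max(closed_date) as closed_date
from (
    select t.*,
           sum(case when open_date > prev_max_closed then 1 else 0 end)
               over (partition by cust_id order by open_date) as grp
    from (
        select c.*,
               -- latest closed date among all earlier accounts (NULL for the first one)
               max(closed_date) over (partition by cust_id order by open_date
                   rows between unbounded preceding and 1 preceding) as prev_max_closed
        from cust c
    ) t
) g
group by cust_id, grp
order by cust_id, min(open_date)
"""

def customer_periods(rows):
    """Merge overlapping account periods into continuous customer periods."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table cust (cust_id text, open_date text, closed_date text)")
    conn.executemany("insert into cust values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

accounts = [
    ("a123", "2019-10-01", "2019-10-15"),
    ("a123", "2019-10-12", "2019-11-01"),
    ("b245", "2019-07-01", "2019-09-15"),
    ("b245", "2019-10-12", "2019-12-01"),
    # hypothetical customer: two short accounts nested inside one long account
    ("c789", "2019-01-01", "2019-12-31"),
    ("c789", "2019-02-01", "2019-02-15"),
    ("c789", "2019-03-01", "2019-03-10"),
]
for period in customer_periods(accounts):
    print(period)
```

For a123 and b245 this gives the same three rows as the query above; c789 stays a single period from 2019-01-01 to 2019-12-31, whereas comparing with only the previous row would start a new group at the third account.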
I would solve this with recursion. While this is certainly very heavy, it should accommodate even the most complex account timings (assuming your data has such). However, if the sample data provided is as complex as you need to solve for, I highly recommend sticking with the solution provided above. It is much more concise and clear.
WITH x (cust_id, open_date, closed_date, lvl, grp) AS (
SELECT cust_id, open_date, closed_date, 1, 1
FROM (
SELECT cust_id
, open_date
, closed_date
, row_number()
OVER (PARTITION BY cust_id ORDER BY closed_date DESC, open_date) AS rn
FROM cust
) AS t
WHERE rn = 1
UNION ALL
SELECT cust_id, open_date, closed_date, lvl, grp
FROM (
SELECT c.cust_id
, c.open_date
, c.closed_date
, x.lvl + 1 AS lvl
, x.grp + CASE WHEN c.closed_date < x.open_date THEN 1 ELSE 0 END AS grp
, row_number() OVER (PARTITION BY c.cust_id ORDER BY c.closed_date DESC) AS rn
FROM cust c
JOIN x
ON x.cust_id = c.cust_id
AND c.open_date < x.open_date
) AS t
WHERE t.rn = 1
)
SELECT cust_id, min(open_date) AS first_open_date, max(closed_date) AS last_closed_date
FROM x
GROUP BY cust_id, grp
ORDER BY cust_id, grp
I would also add the caveat that I don't run on SQL Server, so there could be syntax differences that I didn't account for. Hopefully they are minor, if present.
You can try something like this:
select distinct
cust_id,
(select min(Open_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
),
(select max(Closed_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
)
from Cust as a
So, for every row, you select the minimal and maximal dates from all overlapping ranges; DISTINCT then filters out the duplicates.

Query Hierarchical Queries

I wanted to profile my data set to find the data discrepancies.
My sample data set:
id status stdate enddate
1 new 01-JUL-17 31-JUL-17
1 process 01-OCT-17 31-DEC-18
1 new 01-JAN-19 31-JAN-19--- issue
2 new 01-SEP-14 31-JAN-15
2 process 01-JUN-16 30-NOV-17
2 complete 01-DEC-17 31-DEC-18
....
....
I would like to find out how many of those IDs have a resulting status that is earlier in the sequence than the one before it. The order of the status sequence should be NEW-PROCESS-COMPLETE. So I want to report all IDs where the most recent status has reversed to an earlier status.
You can use the LAG() function to find the offending rows, as in:
with x (id, status, stdate, enddate,
prev_id, prev_status, prev_stdate, prev_enddate) as (
select
id,
status,
stdate,
enddate,
lag(id) over(partition by id order by stdate),
lag(status) over(partition by id order by stdate),
lag(stdate) over(partition by id order by stdate),
lag(enddate) over(partition by id order by stdate)
from my_table
)
select * from x
where status = 'new' and prev_status in ('process', 'complete')
or status = 'process' and prev_status = 'complete'
Note: I assume you need to compare only between rows of the same ID.
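If more statuses get added later, listing every bad pair in the WHERE clause becomes tedious. One alternative, sketched here under assumptions, is to map each status to a rank and flag any row whose rank is lower than the previous row's. It runs through Python's sqlite3 module (SQLite 3.25+ assumed) rather than Oracle, the dates are rewritten as ISO text so they sort correctly, and the helper name `status_regressions` is made up.

```python
import sqlite3

# Sketch only: SQLite stands in for Oracle. Assumed status order:
# new (1) -> process (2) -> complete (3).
QUERY = """
with ranked as (
    select id, status, stdate,
           case status when 'new' then 1 when 'process' then 2 when 'complete' then 3 end
               as status_rank
    from my_table
),
with_prev as (
    select id, status, stdate, status_rank,
           lag(status) over (partition by id order by stdate) as prev_status,
           lag(status_rank) over (partition by id order by stdate) as prev_rank
    from ranked
)
select id, status, prev_status
from with_prev
where status_rank < prev_rank
order by id, stdate
"""

def status_regressions(rows):
    """Return (id, status, previous status) for rows that moved backwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table my_table (id integer, status text, stdate text, enddate text)")
    conn.executemany("insert into my_table values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

statuses = [
    (1, "new", "2017-07-01", "2017-07-31"),
    (1, "process", "2017-10-01", "2018-12-31"),
    (1, "new", "2019-01-01", "2019-01-31"),
    (2, "new", "2014-09-01", "2015-01-31"),
    (2, "process", "2016-06-01", "2017-11-30"),
    (2, "complete", "2017-12-01", "2018-12-31"),
]
print(status_regressions(statuses))  # [(1, 'new', 'process')]
```

Only ID 1 is reported, because its last row goes from process back to new. The first row of each ID has no previous rank, so the comparison is NULL and the row is not flagged.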

Query for getting previous date in oracle in specific scenario

I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
Table B should show the number of days for which the balance has been negative and
the date on which it went negative.
So, for 6-Jan-2013, this table should show below data:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, the number of days should count from when the balance most recently went negative,
not from the old entry.
I need to write a SQL query to get the number of days passed and the start date from which the
balance has been negative.
I tried to formulate a query using the LAG analytic function, but I am not succeeding.
How should I find the first instance of a negative balance by traversing back with the LAG function?
I also gave the FIRST_VALUE function a try, but I could not work out how to partition on the negative values.
Any help or direction on this will be really helpful.
Here's a way to achieve this using analytic functions.
INSERT INTO tableb
WITH tablea_grouped1
AS (SELECT account_no,
balance,
as_on_date,
SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
OVER (PARTITION BY account_no ORDER BY as_on_date)
grp
FROM tablea),
tablea_grouped2
AS (SELECT account_no,
balance,
as_on_date,
grp,
LAST_VALUE (
balance)
OVER (
PARTITION BY account_no, grp
ORDER BY as_on_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
closing_balance
FROM tablea_grouped1
WHERE balance < 0
AND grp != 0 --keep this, if starting negative balance is to be ignored
)
SELECT account_no,
closing_balance,
MAX (as_on_date),
MAX (as_on_date) - MIN (as_on_date) + 1,
MIN (as_on_date)
FROM tablea_grouped2
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN (as_on_date);
First, SUM is used as an analytic function to assign a group number to consecutive balances less than 0.
The LAST_VALUE function is then used to find the last negative balance in each group.
Finally, the result is aggregated by group. MAX(date) gives the last date, MIN(date) gives the starting date, and the difference between the two, plus one, gives the number of days.
Demo at sqlfiddle.
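To try the grouping idea without an Oracle instance, here is a rough sketch run through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). It differs from the query above in a few labelled ways: dates are ISO text, the day count uses SQLite's julianday(), the closing balance is fetched with a join instead of LAST_VALUE, and a leading negative streak is kept (no grp != 0 filter). The helper name `negative_streaks` is made up.

```python
import sqlite3

# Sketch only: SQLite stands in for Oracle.
QUERY = """
with grouped as (
    -- the running count of non-negative balances stays constant within a negative streak
    select account_no, balance, as_on_date,
           sum(case when balance >= 0 then 1 else 0 end)
               over (partition by account_no order by as_on_date) as grp
    from tablea
),
streaks as (
    select account_no, grp,
           min(as_on_date) as start_date,
           max(as_on_date) as last_date
    from grouped
    where balance < 0
    group by account_no, grp
)
select s.account_no,
       t.balance,
       s.last_date,
       cast(julianday(s.last_date) - julianday(s.start_date) as integer) + 1 as days_passed,
       s.start_date
from streaks s
join tablea t on t.account_no = s.account_no and t.as_on_date = s.last_date
order by s.account_no, s.start_date
"""

def negative_streaks(rows):
    """Return one row per negative streak: closing balance, last date, days, start date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tablea (account_no integer, balance integer, as_on_date text)")
    conn.executemany("insert into tablea values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

balances = [
    (1001, -100, "2013-01-01"),
    (1001, -150, "2013-01-02"),
    (1001, 200, "2013-01-03"),
    (1001, -250, "2013-01-04"),
    (1001, -300, "2013-01-05"),
    (1001, -310, "2013-01-06"),
]
for streak in negative_streaks(balances):
    print(streak)
```

The last row returned for account 1001 is the streak from 4 Jan to 6 Jan with 3 days passed and a closing balance of -310, which is the row the question expects for table B.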
Try this, and use gone_negative to compute the required column value for the insert into the other table:
select temp.account_no,
temp.balance,
temp.prev_balance,
temp.on_date,
temp.prev_on_date,
case
WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
1
else
0
end as gone_negative
from (select account_no,
balance,
on_date,
lag(balance, 1, 0) OVER(partition by account_no ORDER BY on_date) prev_balance, -- order by date so that "previous" means the previous day
lag(on_date, 1) OVER(partition by account_no ORDER BY on_date) prev_on_date
from tblA
order by account_no) temp;
Hope this helps pal.
Here's one way to do it.
Select all records from my_table where the balance is positive.
Do a self-join to get all the records whose as_on_date is later than the current row's and whose balance is negative.
Once we have these, we cut off the rows where the difference in as_on_date between the current and the previous row is > 1. We then filter the results in an outer subquery.
The final select just groups the filtered rows and gets their min and max values.
Query:
SELECT
account_no,
min(case when row_number = 1 then balance end) as balance,
max(mt2_date) as As_on_date,
max(mt2_date) - mt1_date as Days_passed,
min(mt2_date) as Start_date
FROM
(
SELECT
*,
MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
FROM
(
SELECT
mt1.account_no,
mt2.balance,
mt1.as_on_date as mt1_date,
mt2.as_on_date as mt2_date,
case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
FROM
my_table mt1
JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
WHERE
MT1.balance > 0
order by
mt1.as_on_date,
mt2.as_on_date ) sub_query
) T
WHERE
min_break_date is null
OR mt2_date < min_break_date
GROUP BY
mt1_date,
account_no
SQLFIDDLE
I have added a few more rows in the fiddle, just to test it out.