Show next date (LEAD) based on simple condition - sql-server-2012

I have a table with a unique index on Contracts of Customers that live in Houses. I want to know how many days it takes for a House to be inhabited by the next person.
I am already quite far, but unfortunately my dataset has contracts with TYPE = 0, which are automatically generated by the system and should be ignored.
If I don't ignore these 'empty contracts' with TYPE = 0, the data says that ALL houses are inhabited again within 1 day.
I currently use the following code:
SELECT
CONTRACTID
,RENTALOBJECTID
,TYPE
,VALIDFROM
,VALIDTO
,LEAD(CONTRACTID) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS 'NextContractId'
,LEAD(VALIDFROM) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS 'NextValidFrom'
,LEAD(VALIDTO) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS 'NextValidTo'
FROM PMCCONTRACT
which gives the following result:
CONTRACTID  RENTALOBJECTID  TYPE  VALIDFROM  VALIDTO   NextContractId  NextValidFrom  NextValidTo
HC001       1               0     1/1/2015   1/1/2017  HC002           1/2/2017       8/1/2017
HC002       1               0     1/2/2017   8/1/2017  HC003           8/2/2017       NULL
HC003       1               3     8/2/2017   NULL      NULL            NULL           NULL
However I want the result to look like the following, where it ignores the Contracts where TYPE = 0.
CONTRACTID  RENTALOBJECTID  TYPE  VALIDFROM  VALIDTO   NextContractId  NextValidFrom  NextValidTo
HC001       1               3     1/1/2015   1/1/2017  HC003           8/2/2017       NULL
HC002       1               0     1/2/2017   8/1/2017  NULL            NULL           NULL
HC003       1               3     8/2/2017   NULL      NULL            NULL           NULL
As you can see, the time in days for RENTALOBJECTID = 1 to be inhabited again after CONTRACTID = HC001 is now more than a month.
Does anyone know how to do this in SQL Server 2012?
Kind regards,
Igor

Your example data is somewhat inconsistent, and you have not explained some aspects of the desired results, but this should basically do what you need.
The window is ordered by VALIDFROM descending, so the frame ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING contains all rows of the same RENTALOBJECTID with a higher VALIDFROM. Only rows with TYPE <> 0 get a non-NULL ConcatResult, and MIN picks out the one with the lowest VALIDFROM in that frame (i.e. the next one after the current row). The three concatenated column values are then pulled out of this result with SUBSTRING.
WITH PMC
AS (SELECT CONTRACTID,
RENTALOBJECTID,
TYPE,
VALIDFROM,
VALIDTO,
ConcatResult = MIN(CASE
WHEN TYPE <> 0
THEN FORMAT(VALIDFROM, 'yyyy-MM-dd')
+ FORMAT(ISNULL(VALIDTO, '1900-01-01'), 'yyyy-MM-dd')
+ CONTRACTID
END)
OVER (
PARTITION BY RENTALOBJECTID
ORDER BY VALIDFROM DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM PMCCONTRACT)
SELECT *,
NextContractId = SUBSTRING(ConcatResult, 21, 10),
NextValidFrom = CAST(SUBSTRING(ConcatResult, 1, 10) AS DATE),
NextValidTo = CAST(NULLIF(SUBSTRING(ConcatResult, 11, 10), '1900-01-01') AS DATE)
FROM PMC
ORDER BY RENTALOBJECTID,
VALIDFROM
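To experiment with the MIN-over-a-frame trick without a SQL Server instance, here is a minimal sketch that runs the same idea through Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (window-function support). The rows are a cleaned-up version of the question's data, with HC001 given TYPE = 3 and dates stored as ISO strings (an assumption, so that plain string concatenation sorts chronologically). IFNULL, || and SUBSTR stand in for the T-SQL ISNULL, FORMAT/+ and SUBSTRING calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PMCCONTRACT (CONTRACTID TEXT, RENTALOBJECTID INT, TYPE INT,
                          VALIDFROM TEXT, VALIDTO TEXT);
INSERT INTO PMCCONTRACT VALUES
  ('HC001', 1, 3, '2015-01-01', '2017-01-01'),
  ('HC002', 1, 0, '2017-01-02', '2017-08-01'),
  ('HC003', 1, 3, '2017-08-02', NULL);
""")

query = """
WITH PMC AS (
  SELECT CONTRACTID, RENTALOBJECTID, VALIDFROM,
         -- 10 chars VALIDFROM + 10 chars VALIDTO (or placeholder) + CONTRACTID
         MIN(CASE WHEN TYPE <> 0
                  THEN VALIDFROM || IFNULL(VALIDTO, '1900-01-01') || CONTRACTID
             END)
           OVER (PARTITION BY RENTALOBJECTID
                 ORDER BY VALIDFROM DESC
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS ConcatResult
  FROM PMCCONTRACT
)
SELECT CONTRACTID,
       SUBSTR(ConcatResult, 21)                            AS NextContractId,
       SUBSTR(ConcatResult, 1, 10)                         AS NextValidFrom,
       NULLIF(SUBSTR(ConcatResult, 11, 10), '1900-01-01')  AS NextValidTo
FROM PMC
ORDER BY RENTALOBJECTID, VALIDFROM
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data both HC001 and HC002 point at HC003: the frame skips TYPE = 0 rows as candidates for "next contract", but it does not suppress the lookup for a TYPE = 0 row itself.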

Related

Collapse multiple rows into a single row based upon a break condition

I have a simple-sounding requirement that has had me stumped for a day or so now, so it's time to seek help from the experts.
My requirement is simply to roll up multiple rows into a single row based upon a break condition: when any of the columns Employee ID, Allowance Plan, Allowance Amount or To Date changes, the row is to be kept, if that makes sense.
An example source data set is shown below:
and the target data after collapsing the rows should look like this:
As you can see I don't need any type of running totals calculating I just need to collapse the rows into a single record per from date/to date combination.
So far I have tried the following SQL using a GROUP BY and MIN function
select [Employee ID], [Allowance Plan],
min([From Date]), max([To Date]), [Allowance Amount]
from [dbo].[#AllowInfo]
group by [Employee ID], [Allowance Plan], [Allowance Amount]
but that just gives me a single row and does not take the break condition into account.
What do I need to do so that the records are rolled up (correct me if that is not the right terminology) correctly, taking the break condition into account?
Any help is appreciated.
Thank you.
Note that your test data does not really exercise the algorithm that well, e.g. you only have one employee and one plan. Also, as you described it, you would end up with 4 rows, as there is a change of To Date between 7->8, 8->9, 9->10 and 10->11.
But I can see what you are trying to do, so this should at least get you on the right track, and it returns the expected 3 rows. I have taken the end of a group to be where employee, plan or amount has changed, or where To Date is not null (or where we reach the end of the data).
CREATE TABLE #data
(
RowID INT,
EmployeeID INT,
AllowancePlan VARCHAR(30),
FromDate DATE,
ToDate DATE,
AllowanceAmount DECIMAL(12,2)
);
SET DATEFORMAT dmy; -- the date literals below are written day/month/year
INSERT INTO #data(RowID, EmployeeID, AllowancePlan, FromDate, ToDate, AllowanceAmount)
VALUES
(1,200690,'CarAllowance','30/03/2017', NULL, 1000.0),
(2,200690,'CarAllowance','01/08/2017', NULL, 1000.0),
(6,200690,'CarAllowance','23/04/2018', NULL, 1000.0),
(7,200690,'CarAllowance','30/03/2018', NULL, 1000.0),
(8,200690,'CarAllowance','21/06/2018', '01/04/2019', 1000.0),
(9,200690,'CarAllowance','04/11/2021', NULL, 1000.0),
(10,200690,'CarAllowance','30/03/2017', '13/05/2022', 1000.0),
(11,200690,'CarAllowance','14/05/2022', NULL, 850.0);
-- find where the break points are
WITH chg AS
(
SELECT *,
CASE WHEN LAG(EmployeeID, 1, -1) OVER(ORDER BY RowID) != EmployeeID
OR LAG(AllowancePlan, 1, 'X') OVER(ORDER BY RowID) != AllowancePlan
OR LAG(AllowanceAmount, 1, -1) OVER(ORDER BY RowID) != AllowanceAmount
OR LAG(ToDate, 1) OVER(ORDER BY RowID) IS NOT NULL
THEN 1 ELSE 0 END AS NewGroup
FROM #data
),
-- count the number of break points as we go to group the related rows
grp AS
(
SELECT chg.*,
ISNULL(
SUM(NewGroup)
OVER (ORDER BY RowID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
0) AS grpNum
FROM chg
)
SELECT MIN(grp.RowID) AS RowID,
MAX(grp.EmployeeID) AS EmployeeID,
MAX(grp.AllowancePlan) AS AllowancePlan,
MIN(grp.FromDate) AS FromDate,
MAX(grp.ToDate) AS ToDate,
MAX(grp.AllowanceAmount) AS AllowanceAmount
FROM grp
GROUP BY grpNum
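For readers who want to step through the two stages (flag the break points with LAG, then number the groups with a running SUM), here is a small sketch using Python's sqlite3 module, assuming SQLite 3.25 or newer. The table name data and its five rows are hypothetical and much smaller than the question's data; the break rule is the one described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (RowID INT, EmployeeID INT, AllowancePlan TEXT,
                   FromDate TEXT, ToDate TEXT, AllowanceAmount INT);
-- Hypothetical rows for illustration only
INSERT INTO data VALUES
  (1, 100, 'Car', '2017-03-30', NULL,         1000),
  (2, 100, 'Car', '2017-08-01', '2019-04-01', 1000),
  (3, 100, 'Car', '2021-11-04', NULL,         1000),
  (4, 100, 'Car', '2022-01-01', '2022-05-13', 1000),
  (5, 100, 'Car', '2022-05-14', NULL,          850);
""")

query = """
WITH chg AS (
  -- NewGroup = 1 when a key column changed or the previous row closed a period
  SELECT *,
         CASE WHEN LAG(EmployeeID, 1, -1)     OVER (ORDER BY RowID) != EmployeeID
                OR LAG(AllowancePlan, 1, 'X') OVER (ORDER BY RowID) != AllowancePlan
                OR LAG(AllowanceAmount, 1, -1) OVER (ORDER BY RowID) != AllowanceAmount
                OR LAG(ToDate)                OVER (ORDER BY RowID) IS NOT NULL
              THEN 1 ELSE 0 END AS NewGroup
  FROM data
),
grp AS (
  -- running count of break points = group number
  SELECT *,
         SUM(NewGroup) OVER (ORDER BY RowID
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grpNum
  FROM chg
)
SELECT MIN(RowID), MIN(FromDate), MAX(ToDate), MAX(AllowanceAmount)
FROM grp
GROUP BY grpNum
ORDER BY grpNum
"""
collapsed = conn.execute(query).fetchall()
for row in collapsed:
    print(row)
```

Selecting from grp without the final GROUP BY is a convenient way to inspect NewGroup and grpNum row by row when the grouping does not come out as expected.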
Another way is to give every row with an empty ToDate the next non-null ToDate, and then group on that:
select min(t.RowID) as RowID,
t.EmployeeID,
min(t.AllowancePlan) as AllowancePlan,
min(t.FromDate) as FromDate,
max(t.ToDate) as ToDate,
min(t.AllowanceAmount) as AllowanceAmount
from ( select t.RowID,
t.EmployeeID,
t.FromDate,
t.AllowancePlan,
t.AllowanceAmount,
case when t.ToDate is null then ( select top 1 t2.ToDate
from test t2
where t2.EmployeeID = t.EmployeeID
and t2.ToDate is not null
and t2.FromDate > t.FromDate -- t2.RowID > t.RowID
order by t2.RowID, t2.FromDate
)
else t.ToDate
end as todate
from test t
) t
group by t.EmployeeID, t.ToDate
order by t.EmployeeID, min(t.RowID)
See and test yourself in this DBFiddle
the result is
RowID  EmployeeID  AllowancePlan  FromDate    ToDate      AllowanceAmount
1      200690      CarAllowance   2017-03-30  2019-04-01  1000
9      200690      CarAllowance   2021-11-04  2022-05-13  1000
11     200690      CarAllowance   2022-05-14  (null)      850

SQL - Find the min(date) since a category has its most recent value

I need some help with this problem.
Assuming I have following table:
contract_id  tariff_id  product_category  date (DD.MM.YYYY)  month (YYYYMM)
123456       ABC        small             01.01.2021         202101
123456       ABC        medium            01.02.2021         202102
123456       DEF        small             01.03.2021         202103
123456       DEF        small             01.04.2021         202104
123456       ABC        big               01.05.2021         202105
123456       DEF        small             01.06.2021         202106
123456       DEF        medium            02.06.2021         202106
123456       DEF        medium            01.07.2021         202107
The table is partitioned by month.
This is a part of my table containing multiple contract_ids.
For every contract_id, I'm trying to figure out since when it has had its most recent tariff_id and since when it has had product_category = 'small' (if it doesn't have small as its product category, the value should be NULL).
The results will be written into a table which gets updated every month.
So for the table above my latest results should look like this:
contract_id  same_tariff_id_since  product_category_small_since
123456       01.06.2021            NULL
I'm using Hive.
So far, I have only come up with a solution for same_tariff_id_since.
The problem is that it gives me the absolute min(date) for the tariff_id and not the min(date) since the most recent tariff_id.
I think the code for product_category_small_since will follow mostly the same logic.
My current code is:
SELECT q2.contract_id
, q3.tariff_id
, q2.date
FROM (
SELECT contract_id
, max(date_2) AS date
FROM (
SELECT contract_id
, date
, min(date) OVER (PARTITION BY tariff_id ORDER BY date) AS date_2
FROM given_table
)q1
WHERE date=date_2
GROUP BY contract_id
)q2
JOIN given_table AS q3
ON q2.contract_id=q3.contract_id
AND q2.date=q3.date
Thanks in advance.
One approach for solving this type of query is to do a grouping of the sequences you want to track. For the tariff_id sequence grouping, you want a new "sequence grouping id" for each time that the tariff id changes for a given contract id. Since the product_category can change independently, you need to do a sequence grouping id for that change as well.
Here's code to accomplish the task. This only returns the latest version of each contract and the specific columns you described in your latest results table. This was done against PostgreSQL 9.6, but the syntax and data types can probably be modified to be compatible with Hive.
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/8
select q3.contract_id
, to_char(q3.same_tariff_id_since, 'DD.MM.YYYY') as same_tariff_id_since
, to_char(q3.product_category_small_since, 'DD.MM.YYYY') as product_category_small_since
from (
select q2.*
-- window functions are evaluated after WHERE, so compute the minimums
-- over the whole sequence here, before filtering to the latest record
, min(q2."date (DD.MM.YYYY)")
over (partition by q2.contract_id, q2.contract_tariff_sequence_id) as same_tariff_id_since
, min(case when q2.product_category = 'small' then q2."date (DD.MM.YYYY)" else null end)
over (partition by q2.contract_id, q2.contract_product_category_sequence_id) as product_category_small_since
from(
select q1.*
, sum(case when q1.tariff_id = q1.prior_tariff_id then 0 else 1 end)
over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_tariff_sequence_id
, sum(case when q1.product_category = q1.prior_product_category then 0 else 1 end)
over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_product_category_sequence_id
from (
select *
, lag(tariff_id) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_tariff_id
, lag(product_category) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_product_category
, row_number() over (partition by contract_id order by "date (DD.MM.YYYY)" desc) latest_record_per_contract
from contract_tariffs
) q1
) q2
) q3
where q3.latest_record_per_contract = 1
If you want to see all the rows and columns so you can examine how this works with the sequence grouping ids etc., you can modify the outer query slightly:
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/10
If this works for you, please mark as correct answer.
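As a way to check the sequence-grouping idea against the question's sample rows, here is a minimal sketch run through Python's sqlite3 module (assuming SQLite 3.25 or newer; it is not Hive and not the fiddle above). The column name dt is my shorthand for the date column, stored as an ISO string. The per-sequence minimum is computed in its own step before the latest row is selected, because a WHERE clause is applied before window functions are evaluated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract_tariffs (contract_id INT, tariff_id TEXT,
                               product_category TEXT, dt TEXT);
INSERT INTO contract_tariffs VALUES
  (123456, 'ABC', 'small',  '2021-01-01'),
  (123456, 'ABC', 'medium', '2021-02-01'),
  (123456, 'DEF', 'small',  '2021-03-01'),
  (123456, 'DEF', 'small',  '2021-04-01'),
  (123456, 'ABC', 'big',    '2021-05-01'),
  (123456, 'DEF', 'small',  '2021-06-01'),
  (123456, 'DEF', 'medium', '2021-06-02'),
  (123456, 'DEF', 'medium', '2021-07-01');
""")

query = """
WITH q1 AS (   -- previous values and a marker for the latest row
  SELECT *,
         LAG(tariff_id)        OVER (PARTITION BY contract_id ORDER BY dt) AS prior_tariff_id,
         LAG(product_category) OVER (PARTITION BY contract_id ORDER BY dt) AS prior_category,
         ROW_NUMBER()          OVER (PARTITION BY contract_id ORDER BY dt DESC) AS rn
  FROM contract_tariffs
),
q2 AS (        -- running count of changes = sequence id
  SELECT *,
         SUM(CASE WHEN tariff_id = prior_tariff_id THEN 0 ELSE 1 END)
           OVER (PARTITION BY contract_id ORDER BY dt
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tariff_seq,
         SUM(CASE WHEN product_category = prior_category THEN 0 ELSE 1 END)
           OVER (PARTITION BY contract_id ORDER BY dt
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS category_seq
  FROM q1
),
q3 AS (        -- first date of each sequence, computed before filtering
  SELECT *,
         MIN(dt) OVER (PARTITION BY contract_id, tariff_seq) AS same_tariff_id_since,
         MIN(CASE WHEN product_category = 'small' THEN dt END)
           OVER (PARTITION BY contract_id, category_seq) AS product_category_small_since
  FROM q2
)
SELECT contract_id, same_tariff_id_since, product_category_small_since
FROM q3
WHERE rn = 1
"""
latest = conn.execute(query).fetchall()
print(latest)
```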

Find minimum overlap of Each Status

I need to find the date ranges in which the status is Missing or Not Ready in all the groups (only the overlapping date ranges where every group has that status).
ID  Group  Eff_Date        Exp_Date        Status
1   1      1/1/18 10:00    3/4/18 15:23    Ready
1   1      3/4/18 15:24    7/12/18 13:54   Not Ready
1   1      7/12/18 13:55   11/22/19 11:20  Missing
1   1      11/22/19 11:21  9/25/20 1:12    Ready
1   1      9/25/20 1:13    12/31/99        Missing
1   2      1/1/16 10:00    2/2/17 17:20    Ready
1   2      2/2/17 17:21    5/25/18 1:23    Missing
1   2      5/25/18 1:24    9/2/18 4:15     Not Ready
1   2      9/2/18 4:16     6/3/21 7:04     Missing
1   2      6/3/21 7:04     12/31/99        Ready
Output for Not Ready (the dates where each group has Not Ready status):
5/25/18 1:24    7/12/18 13:54   Not Ready
Output for Missing (the dates where each group has Missing status):
9/25/20 1:13    6/3/21 7:04     Missing
Note: Each ID can have any number of groups. The database is Snowflake.
You can do this by unpivoting and counting. Assuming that the periods do not overlap for a given id:
with x as (
select eff_date as date, 1 as inc
from t
where status = 'Missing'
union all
select exp_date, -1 as inc
from t
where status = 'Missing'
)
select date, next_date, active_on_date
from (select date,
sum(sum(inc)) over (order by date) as active_on_date,
lead(date) over (order by date) as next_date
from x
group by date
) x
where active_on_date = (select count(distinct id) from t);
Note: This handles one status at a time, which is what this question is asking. If you want to handle all event types, then ask a new question with appropriate sample data, desired results, and explanation.
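The unpivot-and-count idea can be tried on a subset of the question's rows with Python's sqlite3 module (assuming SQLite 3.25 or newer; this is not Snowflake syntax). Two things in this sketch are my own assumptions rather than part of the answer: the Group column is renamed grp, and the running total is compared with the number of distinct groups per id, which is how I read "all the groups". Timestamps are stored as ISO strings, and only the Ready and Not Ready rows needed for the Not Ready case are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INT, grp INT, eff_date TEXT, exp_date TEXT, status TEXT);
INSERT INTO t VALUES
  (1, 1, '2018-01-01 10:00', '2018-03-04 15:23', 'Ready'),
  (1, 1, '2018-03-04 15:24', '2018-07-12 13:54', 'Not Ready'),
  (1, 2, '2016-01-01 10:00', '2017-02-02 17:20', 'Ready'),
  (1, 2, '2018-05-25 01:24', '2018-09-02 04:15', 'Not Ready');
""")

QUERY = """
WITH events AS (   -- +1 when a period with the status starts, -1 when it ends
  SELECT id, eff_date AS dt, 1 AS inc FROM t WHERE status = :status
  UNION ALL
  SELECT id, exp_date, -1 FROM t WHERE status = :status
),
deltas AS (        -- net change per timestamp
  SELECT id, dt, SUM(inc) AS delta FROM events GROUP BY id, dt
),
steps AS (         -- how many groups are in the status from dt until next_dt
  SELECT id, dt,
         SUM(delta) OVER (PARTITION BY id ORDER BY dt) AS active_groups,
         LEAD(dt)   OVER (PARTITION BY id ORDER BY dt) AS next_dt
  FROM deltas
)
SELECT id, dt, next_dt
FROM steps
WHERE active_groups = (SELECT COUNT(DISTINCT grp) FROM t WHERE t.id = steps.id)
ORDER BY id, dt
"""

def ranges_where_all_groups_have(status):
    """Return (id, start, end) ranges in which every group of the id has `status`."""
    return conn.execute(QUERY, {"status": status}).fetchall()

not_ready = ranges_where_all_groups_have("Not Ready")
print(not_ready)
```

As in the answer, this relies on the periods of a single group not overlapping each other; otherwise one group could be counted twice.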

Get Current and Previous level In Hive QL

I have a table with below details. I need to get the Current level and Previous level.
ID  Level  start_dt             End_dt
A   1      2018-03-12 18:39:10  2020-01-01 00:00:00
A   1      2018-01-17 13:21:26  2018-03-12 18:39:10
A   2      2018-01-14 13:21:17  2018-01-17 13:21:26
My End state table is as below:
ID, current_level, previous_level, upgrade/downgrade flag
I tried ranking based on END_dt desc. But it would rank my second row as 2 and that isn't the previous level. Can I handle this in a single query? Or a single hop?
The tricky part is the previous level; I don't think that's possible in one pass. Try something like this, though:
with setup AS (
select ID
, Level
, row_number() over (partition by ID order by End_dt desc) as row_num
, MIN(Level) over (Partition by ID order by End_dt desc ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) as previous_level
from my_table
)
SELECT ID
, Level
, previous_level
, case when Level = previous_level THEN 'No Upgrade/Downgrade'
WHEN Level < previous_level THEN 'Upgrade'
WHEN Level > previous_level THEN 'Downgrade'
ELSE 'Unknown' END AS upgrade_downgrade_description
FROM setup
WHERE row_num = 1
;
You can use LAG to get the previous row's values; see the reference docs for LAG.
create table table_1(ID string,Level int,start_dt timestamp,End_dt timestamp);
insert into table_1 values
('A',1,'2018-03-12 18:39:10','2020-01-01 00:00:00'),
('A',1,'2018-01-17 13:21:26','2018-03-12 18:39:10'),
('A',2,'2018-01-14 13:21:17','2018-01-17 13:21:26');
SQL:
select id,curr_level,prev_level,
case when curr_level=prev_level then 'No Ups - Downs'
when curr_level>prev_level then 'Downgrade'
when curr_level<prev_level then 'Upgrade'
when prev_level is null then 'No-Previous Level'
else 'Unknown state'
end upgrade_downgrade_description
from(
select table_1.id,
table_1.level as curr_level,
lag(table_1.level,1) over (partition by table_1.id order by table_1.end_dt desc) prev_level
from table_1) s;
Output:
id  curr_level  prev_level  upgrade_downgrade_description
A   1           NULL        No-Previous Level
A   1           1           No Ups - Downs
A   2           1           Downgrade
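Both queries above compare a row with the row right next to it, so for the sample data the neighbour of the latest row is another level-1 row. If "previous level" is meant as the most recent level that differs from the current one (my reading of the question, not something either answer states), one possible sketch is to tag every row with the current level and then take the newest row whose level is different. It is shown here with Python's sqlite3 module (SQLite 3.25 or newer assumed) rather than Hive, and it follows the first answer's convention that a lower level number means an upgrade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id TEXT, level INT, start_dt TEXT, end_dt TEXT);
INSERT INTO table_1 VALUES
  ('A', 1, '2018-03-12 18:39:10', '2020-01-01 00:00:00'),
  ('A', 1, '2018-01-17 13:21:26', '2018-03-12 18:39:10'),
  ('A', 2, '2018-01-14 13:21:17', '2018-01-17 13:21:26');
""")

query = """
WITH tagged AS (   -- level of the row with the latest end_dt, repeated on every row
  SELECT id, level, end_dt,
         FIRST_VALUE(level) OVER (PARTITION BY id ORDER BY end_dt DESC) AS current_level
  FROM table_1
),
older AS (         -- rank only the rows whose level differs from the current one
  SELECT id, current_level, level AS previous_level,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY end_dt DESC) AS rn
  FROM tagged
  WHERE level <> current_level
)
SELECT id, current_level, previous_level,
       CASE WHEN current_level < previous_level THEN 'Upgrade'
            ELSE 'Downgrade' END AS flag
FROM older
WHERE rn = 1
"""
levels = conn.execute(query).fetchall()
print(levels)
```

An id that has only ever had one level produces no row here; a LEFT JOIN back to the distinct ids would be needed to report those with a NULL previous level.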

Query for getting previous date in oracle in specific scenario

I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
Table B should show the number of days for which the balance has been negative and
the date on which it went negative.
So, for 6-Jan-2013, this table should show below data:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, the number of days should be counted from the most recent time the balance went negative,
not from an older entry.
I need to write a SQL query to get the number of days passed and the start date from which the
balance has been negative.
I tried to formulate a query using the LAG analytic function, but I am not succeeding.
How should I find the first instance of a negative balance by traversing back using the LAG function?
I also gave the FIRST_VALUE function a try, but I could not work out how to partition on the negative values.
Any help or direction on this would be really helpful.
Here's a way to achieve this using analytic functions.
INSERT INTO tableb
WITH tablea_grouped1
AS (SELECT account_no,
balance,
as_on_date,
SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
OVER (PARTITION BY account_no ORDER BY as_on_date)
grp
FROM tablea),
tablea_grouped2
AS (SELECT account_no,
balance,
as_on_date,
grp,
LAST_VALUE (
balance)
OVER (
PARTITION BY account_no, grp
ORDER BY as_on_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
closing_balance
FROM tablea_grouped1
WHERE balance < 0
AND grp != 0 --keep this, if starting negative balance is to be ignored
)
SELECT account_no,
closing_balance,
MAX (as_on_date),
MAX (as_on_date) - MIN (as_on_date) + 1,
MIN (as_on_date)
FROM tablea_grouped2
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN (as_on_date);
First, SUM is used as an analytic function to assign a group number to each run of consecutive balances less than 0.
The LAST_VALUE function is then used to find the last negative balance in each group.
Finally, the result is aggregated per group: MAX(as_on_date) gives the last date, MIN(as_on_date) gives the starting date, and the difference between the two, plus 1, gives the number of days.
Demo at sqlfiddle.
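The grouping step can be reproduced on the question's six rows with Python's sqlite3 module. This is a sketch under assumptions, not the Oracle statement itself: SQLite 3.25 or newer, dates stored as ISO strings, no INSERT INTO tableb, and julianday() used for the date arithmetic that Oracle does natively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (account_no INT, balance INT, as_on_date TEXT);
INSERT INTO tablea VALUES
  (1001, -100, '2013-01-01'),
  (1001, -150, '2013-01-02'),
  (1001,  200, '2013-01-03'),
  (1001, -250, '2013-01-04'),
  (1001, -300, '2013-01-05'),
  (1001, -310, '2013-01-06');
""")

query = """
WITH grouped AS (  -- grp increases by 1 at every non-negative balance
  SELECT account_no, balance, as_on_date,
         SUM(CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
           OVER (PARTITION BY account_no ORDER BY as_on_date) AS grp
  FROM tablea
),
streaks AS (       -- negative rows only; grp 0 (negative from the start) is ignored
  SELECT account_no, grp, as_on_date,
         LAST_VALUE(balance)
           OVER (PARTITION BY account_no, grp
                 ORDER BY as_on_date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS closing_balance
  FROM grouped
  WHERE balance < 0 AND grp != 0
)
SELECT account_no,
       closing_balance,
       MAX(as_on_date) AS as_on_date,
       CAST(julianday(MAX(as_on_date)) - julianday(MIN(as_on_date)) + 1 AS INTEGER) AS days_passed,
       MIN(as_on_date) AS start_date
FROM streaks
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN(as_on_date)
"""
streak_rows = conn.execute(query).fetchall()
print(streak_rows)
```

Dropping the grp != 0 condition would also report the initial negative run of 1-Jan to 2-Jan, as the comment in the Oracle query indicates.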
Try this, and use gone_negative to compute the required column value for the insert into the other table:
select temp.account_no,
temp.balance,
temp.prev_balance,
temp.on_date,
temp.prev_on_date,
case
WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
1
else
0
end as gone_negative
from (select account_no,
balance,
on_date,
lag(balance, 1, 0) OVER(partition by account_no ORDER BY on_date) prev_balance,
lag(on_date, 1) OVER(partition by account_no ORDER BY on_date) prev_on_date
from tblA
order by account_no) temp;
Hope this helps pal.
Here's one way to do it:
Select all records from my_table where the balance is positive.
Do a self-join to get all the records whose as_on_date is greater than that of the current row and whose amounts are negative.
Once we have these, we cut off the rows where the difference in as_on_date between the current and the previous row is > 1. We then filter the results in an outer subquery.
The final SELECT just groups the filtered rows and gets their min and max values.
Query:
SELECT
account_no,
min(case when row_number = 1 then balance end) as balance,
max(mt2_date) as As_on_date,
max(mt2_date) - mt1_date as Days_passed,
min(mt2_date) as Start_date
FROM
(
SELECT
*,
MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
FROM
(
SELECT
mt1.account_no,
mt2.balance,
mt1.as_on_date as mt1_date,
mt2.as_on_date as mt2_date,
case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
FROM
my_table mt1
JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
WHERE
MT1.balance > 0
order by
mt1.as_on_date,
mt2.as_on_date ) sub_query
) T
WHERE
min_break_date is null
OR mt2_date < min_break_date
GROUP BY
mt1_date,
account_no
SQLFIDDLE
I have added a few more rows in the fiddle, just to test it out.