I need to find the date ranges where the status is Missing/Not Ready in all the groups (only the overlapping date ranges where every group has a status of Missing/Not Ready).
'''
ID  Group  Eff_Date        Exp_Date        Status
1   1      1/1/18 10:00    3/4/18 15:23    Ready
1   1      3/4/18 15:24    7/12/18 13:54   Not Ready
1   1      7/12/18 13:55   11/22/19 11:20  Missing
1   1      11/22/19 11:21  9/25/20 1:12    Ready
1   1      9/25/20 1:13    12/31/99        Missing
1   2      1/1/16 10:00    2/2/17 17:20    Ready
1   2      2/2/17 17:21    5/25/18 1:23    Missing
1   2      5/25/18 1:24    9/2/18 4:15     Not Ready
1   2      9/2/18 4:16     6/3/21 7:04     Missing
1   2      6/3/21 7:04     12/31/99        Ready

Output for Not Ready (the dates where each group has Not Ready status):
5/25/18 1:24    7/12/18 13:54   Not Ready

Output for Missing (the dates where each group has Missing status):
9/25/20 1:13    6/3/21 7:04     Missing
'''
Note: Each ID can have any number of groups. The database is Snowflake.
You can do this by unpivoting and counting. Assuming that the periods do not overlap for a given id:
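with x as (
      -- +1 whenever a 'Missing' period starts
      select eff_date as date, 1 as inc
      from t
      where status = 'Missing'
      union all
      -- -1 whenever it ends (the sample data names this column exp_date)
      select exp_date, -1 as inc
      from t
      where status = 'Missing'
     )
select date, next_date, active_on_date
from (select date,
             sum(sum(inc)) over (order by date) as active_on_date,  -- periods active from this date on
             lead(date) over (order by date) as next_date
      from x
      group by date
     ) x
-- keep only the stretches where every series is active at once;
-- count the group column here instead if the overlap must hold across groups
where active_on_date = (select count(distinct id) from t);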
Note: This handles one status at a time, which is what this question is asking. If you want to handle all event types, then ask a new question with appropriate sample data, desired results, and explanation.
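To check the counting idea outside the database, here is a minimal Python sketch of the same +1/-1 sweep. It is not the Snowflake query itself: the function name `overlap_all_groups`, the helper `ts`, and the `OPEN_END` sentinel (standing in for the open-ended 12/31/99) are assumptions made for illustration, and it counts groups rather than ids because the question asks for overlap across groups.

```python
from collections import defaultdict
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # assumption: stands in for the open-ended 12/31/99


def ts(text):
    """Parse the M/D/YY H:MM timestamps used in the sample data."""
    return datetime.strptime(text, "%m/%d/%y %H:%M")


def overlap_all_groups(rows, status):
    """Return (start, end) ranges during which every group has `status`.

    rows: list of (group, eff_date, exp_date, status) tuples.
    """
    n_groups = len({group for group, _, _, _ in rows})
    delta = defaultdict(int)  # net change in active periods per timestamp
    for _, start, end, row_status in rows:
        if row_status == status:
            delta[start] += 1
            delta[end] -= 1
    ranges, active = [], 0
    points = sorted(delta)
    for point, next_point in zip(points, points[1:]):
        active += delta[point]      # running sum, like sum(sum(inc)) over (order by date)
        if active == n_groups:      # every group is in `status` at the same time
            ranges.append((point, next_point))
    return ranges


rows = [
    (1, ts("1/1/18 10:00"), ts("3/4/18 15:23"), "Ready"),
    (1, ts("3/4/18 15:24"), ts("7/12/18 13:54"), "Not Ready"),
    (1, ts("7/12/18 13:55"), ts("11/22/19 11:20"), "Missing"),
    (1, ts("11/22/19 11:21"), ts("9/25/20 1:12"), "Ready"),
    (1, ts("9/25/20 1:13"), OPEN_END, "Missing"),
    (2, ts("1/1/16 10:00"), ts("2/2/17 17:20"), "Ready"),
    (2, ts("2/2/17 17:21"), ts("5/25/18 1:23"), "Missing"),
    (2, ts("5/25/18 1:24"), ts("9/2/18 4:15"), "Not Ready"),
    (2, ts("9/2/18 4:16"), ts("6/3/21 7:04"), "Missing"),
    (2, ts("6/3/21 7:04"), OPEN_END, "Ready"),
]

for status in ("Not Ready", "Missing"):
    for start, end in overlap_all_groups(rows, status):
        print(status, start, end)
```

On the sample rows the Not Ready result is the single range 5/25/18 1:24 to 7/12/18 13:54 from the question. For Missing, the sweep also reports 9/2/18 4:16 to 11/22/19 11:20, where both groups are Missing as well, in addition to the 9/25/20 1:13 to 6/3/21 7:04 range listed in the question.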
Related
I need some help with this problem.
Assuming I have following table:
contract_id | tariff_id | product_category | date (DD.MM.YYYY) | month (YYYYMM)
123456      | ABC       | small            | 01.01.2021        | 202101
123456      | ABC       | medium           | 01.02.2021        | 202102
123456      | DEF       | small            | 01.03.2021        | 202103
123456      | DEF       | small            | 01.04.2021        | 202104
123456      | ABC       | big              | 01.05.2021        | 202105
123456      | DEF       | small            | 01.06.2021        | 202106
123456      | DEF       | medium           | 02.06.2021        | 202106
123456      | DEF       | medium           | 01.07.2021        | 202107
The table is partitioned by month.
This is a part of my table containing multiple contract_ids.
I'm trying to figure out for every contract_id, since when it has its most recent tariff_id and since when it has the product_category_id='small' (if it doesn't have small as product category, the value should then be Null).
The results will be written into a table which gets updated every month.
So for the table above my latest results should look like this:
contract_id | same_tariff_id_since | product_category_small_since
123456      | 01.06.2021           | NULL
I'm using Hive.
So far, I could only come up with this solution for same_tariff_id_since:
The problem is that it gives me absolute min(date) for the tariff_id and not the min(date) since the most recent tariff_id.
I think the code for product_category_small_since will have mostly the same logic.
My current code is:
SELECT q2.contract_id
     , q3.tariff_id
     , q2.date
FROM (
    SELECT contract_id
         , max(date_2) AS date
    FROM (
        SELECT contract_id
             , date
             , min(date) OVER (PARTITION BY tariff_id ORDER BY date) AS date_2
        FROM given_table
    ) q1
    WHERE date = date_2
    GROUP BY contract_id
) q2
JOIN given_table AS q3
  ON q2.contract_id = q3.contract_id
 AND q2.date = q3.date
Thanks in advance.
One approach for solving this type of query is to do a grouping of the sequences you want to track. For the tariff_id sequence grouping, you want a new "sequence grouping id" for each time that the tariff id changes for a given contract id. Since the product_category can change independently, you need to do a sequence grouping id for that change as well.
Here's code to accomplish the task. This only returns the latest version of each contract and the specific columns you described in your latest results table. This was done against PostgreSQL 9.6, but the syntax and data types can probably be modified to be compatible with Hive.
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/8
select q3.contract_id
     , q3.same_tariff_id_since
     , q3.product_category_small_since
from (
    select q2.contract_id
         , q2.latest_record_per_contract
         , to_char(min(q2."date (DD.MM.YYYY)")
                   over (partition by q2.contract_id, q2.contract_tariff_sequence_id), 'DD.MM.YYYY') as same_tariff_id_since
         , to_char(min(case when q2.product_category = 'small' then q2."date (DD.MM.YYYY)" else null end)
                   over (partition by q2.contract_id, q2.contract_product_category_sequence_id), 'DD.MM.YYYY') as product_category_small_since
    from (
        select q1.*
               -- the running count of changes gives each unbroken run its own id
             , sum(case when q1.tariff_id = q1.prior_tariff_id then 0 else 1 end)
               over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_tariff_sequence_id
             , sum(case when q1.product_category = q1.prior_product_category then 0 else 1 end)
               over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_product_category_sequence_id
        from (
            select *
                 , lag(tariff_id) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_tariff_id
                 , lag(product_category) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_product_category
                 , row_number() over (partition by contract_id order by "date (DD.MM.YYYY)" desc) latest_record_per_contract
            from contract_tariffs
        ) q1
    ) q2
) q3
-- filter one level up: a WHERE next to the min() windows would run before them
-- and leave only the latest row in each window
where q3.latest_record_per_contract = 1
If you want to see all the rows and columns so you can examine how this works with the sequence grouping ids etc., you can modify the outer query slightly:
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/10
If this works for you, please mark as correct answer.
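The sequence-grouping idea can also be traced in plain Python, which may help when porting the query to Hive. This is only a sketch under assumptions: `since_dates` and the tuple layout are made up for illustration, and instead of materialising sequence ids it simply remembers when the current run of each column started.

```python
from datetime import date


def since_dates(rows):
    """Map contract_id -> (same_tariff_id_since, product_category_small_since).

    rows: list of (contract_id, tariff_id, product_category, day) tuples.
    """
    state = {}
    for contract_id, tariff_id, category, day in sorted(rows, key=lambda r: (r[0], r[3])):
        prev = state.get(contract_id)
        # A run continues only if the value equals the previous row's value
        # (the role of the lag() comparison); otherwise a new run starts today.
        tariff_since = prev["tariff_since"] if prev and prev["tariff_id"] == tariff_id else day
        category_since = prev["category_since"] if prev and prev["category"] == category else day
        state[contract_id] = {
            "tariff_id": tariff_id,
            "tariff_since": tariff_since,
            "category": category,
            "category_since": category_since,
        }
    return {
        contract_id: (
            s["tariff_since"],
            s["category_since"] if s["category"] == "small" else None,
        )
        for contract_id, s in state.items()
    }


rows = [
    (123456, "ABC", "small", date(2021, 1, 1)),
    (123456, "ABC", "medium", date(2021, 2, 1)),
    (123456, "DEF", "small", date(2021, 3, 1)),
    (123456, "DEF", "small", date(2021, 4, 1)),
    (123456, "ABC", "big", date(2021, 5, 1)),
    (123456, "DEF", "small", date(2021, 6, 1)),
    (123456, "DEF", "medium", date(2021, 6, 2)),
    (123456, "DEF", "medium", date(2021, 7, 1)),
]

print(since_dates(rows))
```

For the sample contract this gives 01.06.2021 for the tariff and no date for the 'small' category, matching the desired results table in the question.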
I have a table with a unique index on Contracts of Customers that live in Houses, I want to know how many days it takes for a House to be inhabited by the next person.
I am already quite far, but unfortunately my dataset has contracts with TYPE = 0, which are automatically generated by the system and which should be ignored,
if I don't ignore these 'empty contracts' with TYPE = 0, then the data says actually ALL houses are inhabited within 1 day.
I currently use the following code:
SELECT
     CONTRACTID
    ,RENTALOBJECTID
    ,TYPE
    ,VALIDFROM
    ,VALIDTO
    ,LEAD(CONTRACTID) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS 'NextContractId'
    ,LEAD(VALIDFROM) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS 'NextValidFrom'
    ,LEAD(VALIDTO) OVER (PARTITION BY RENTALOBJECTID ORDER BY VALIDFROM) AS 'NextValidTo'
FROM PMCCONTRACT
which gives the following result:
CONTRACTID  RENTALOBJECTID  TYPE  VALIDFROM  VALIDTO   NextContractId  NextValidFrom  NextValidTo
HC001       1               0     1/1/2015   1/1/2017  HC002           1/2/2017       8/1/2017
HC002       1               0     1/2/2017   8/1/2017  HC003           8/2/2017       NULL
HC003       1               3     8/2/2017   NULL      NULL            NULL           NULL
However I want the result to look like the following, where it ignores the Contracts where TYPE = 0.
CONTRACTID  RENTALOBJECTID  TYPE  VALIDFROM  VALIDTO   NextContractId  NextValidFrom  NextValidTo
HC001       1               3     1/1/2015   1/1/2017  HC003           8/2/2017       NULL
HC002       1               0     1/2/2017   8/1/2017  NULL            NULL           NULL
HC003       1               3     8/2/2017   NULL      NULL            NULL           NULL
And as you can see the time in days for RENTALOBJECTID = 1 to be inhabited again after CONTRACT = HC001 is more than a month now.
Does anyone know how this works in SQL-server-2012?
Kind regards,
Igor
Your example data is somewhat inconsistent and you have omitted to explain some aspects of the desired results but this should basically do what you need.
The window frame is set with VALIDFROM ordered descending and thus ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING includes all rows with higher VALIDFROM. Only ones with TYPE <> 0 get a not null ConcatResult and the MIN picks out the one with the lowest VALIDFROM in that window frame (i.e. next biggest to current row). The three concatenated column values are then pulled out of this result.
WITH PMC
     AS (SELECT CONTRACTID,
                RENTALOBJECTID,
                TYPE,
                VALIDFROM,
                VALIDTO,
                ConcatResult = MIN(CASE
                                     WHEN TYPE <> 0
                                       THEN FORMAT(VALIDFROM, 'yyyy-MM-dd')
                                            + FORMAT(ISNULL(VALIDTO, '1900-01-01'), 'yyyy-MM-dd')
                                            + CONTRACTID
                                   END)
                                 OVER (
                                   PARTITION BY RENTALOBJECTID
                                   ORDER BY VALIDFROM DESC
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
         FROM   PMCCONTRACT)
SELECT *,
       NextContractId = SUBSTRING(ConcatResult, 21, 10),
       NextValidFrom  = CAST(SUBSTRING(ConcatResult, 1, 10) AS DATE),
       NextValidTo    = CAST(NULLIF(SUBSTRING(ConcatResult, 11, 10), '1900-01-01') AS DATE)
FROM   PMC
ORDER  BY RENTALOBJECTID,
          VALIDFROM
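For anyone who wants to sanity-check the "skip TYPE = 0 when looking ahead" logic without the string packing, here is a rough Python sketch. It is not the T-SQL above: `next_real_contract` and the dict layout are assumptions for illustration, HC001 is given TYPE 3 as in the desired output, and the dates are read as M/D/YYYY.

```python
from datetime import date


def next_real_contract(contracts):
    """Map each contract id to the next contract on the same rental object
    whose type is not 0, or None if there is no such contract."""
    result = {}
    nearest_later = {}  # rental object -> closest later contract with type != 0
    # Walk from the newest contract to the oldest, mirroring
    # ORDER BY VALIDFROM DESC ... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING.
    ordered = sorted(contracts, key=lambda c: (c["object"], c["valid_from"]), reverse=True)
    for contract in ordered:
        result[contract["id"]] = nearest_later.get(contract["object"])
        if contract["type"] != 0:
            nearest_later[contract["object"]] = contract
    return result


contracts = [
    {"id": "HC001", "object": 1, "type": 3,
     "valid_from": date(2015, 1, 1), "valid_to": date(2017, 1, 1)},
    {"id": "HC002", "object": 1, "type": 0,
     "valid_from": date(2017, 1, 2), "valid_to": date(2017, 8, 1)},
    {"id": "HC003", "object": 1, "type": 3,
     "valid_from": date(2017, 8, 2), "valid_to": None},
]

following = next_real_contract(contracts)
for contract in contracts:
    nxt = following[contract["id"]]
    print(contract["id"], "->", nxt["id"] if nxt else None)
```

HC001 now points at HC003, skipping the system-generated HC002. Like the SQL, the sketch also gives HC002 itself a next contract (HC003); if TYPE = 0 rows should show NULL as in the desired output, that would need an extra condition on the current row.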
I wanted to profile my data set to find the data discrepancies.
My sample data set:
id  status    stdate     enddate
1   new       01-JUL-17  31-JUL-17
1   process   01-OCT-17  31-DEC-18
1   new       01-JAN-19  31-JAN-19   --- issue
2   new       01-SEP-14  31-JAN-15
2   process   01-JUN-16  30-NOV-17
2   complete  01-DEC-17  31-DEC-18
....
....
I would like to find out how many of those IDs have a current status that is earlier in the sequence than a previous one. The order of the status sequence should be NEW-PROCESS-COMPLETE. So I want to report all IDs where the most recent status has reverted to an earlier status.
You can use the LAG() function to find the offending rows, as in:
with x (id, status, stdate, enddate,
        prev_id, prev_status, prev_stdate, prev_enddate) as (
  select
    id,
    status,
    stdate,
    enddate,
    lag(id) over(partition by id order by stdate),
    lag(status) over(partition by id order by stdate),
    lag(stdate) over(partition by id order by stdate),
    lag(enddate) over(partition by id order by stdate)
  from my_table
)
select * from x
where status = 'new' and prev_status in ('process', 'complete')
   or status = 'process' and prev_status = 'complete'
Note: I assume you need to compare only between rows of the same ID.
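The same previous-row comparison can be sketched in Python, which makes the ordering rule explicit. The names `STATUS_RANK` and `regressed_ids` are assumptions for illustration; like the query, this flags any backward step within an ID, not only the latest one.

```python
from datetime import date

# Assumed ranking of the NEW-PROCESS-COMPLETE sequence from the question.
STATUS_RANK = {"new": 0, "process": 1, "complete": 2}


def regressed_ids(rows):
    """Return the ids where some status ranks lower than the one before it.

    rows: list of (id, status, stdate) tuples.
    """
    flagged = set()
    previous = {}  # id -> status of the preceding row (what lag(status) provides)
    for row_id, status, _ in sorted(rows, key=lambda r: (r[0], r[2])):
        if row_id in previous and STATUS_RANK[status] < STATUS_RANK[previous[row_id]]:
            flagged.add(row_id)
        previous[row_id] = status
    return sorted(flagged)


rows = [
    (1, "new", date(2017, 7, 1)),
    (1, "process", date(2017, 10, 1)),
    (1, "new", date(2019, 1, 1)),
    (2, "new", date(2014, 9, 1)),
    (2, "process", date(2016, 6, 1)),
    (2, "complete", date(2017, 12, 1)),
]

print(regressed_ids(rows))
```

On the sample data only id 1 is reported, because its last row goes from process back to new.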
I have a table showing the statuses of processes (I am especially looking for canceled processes); the rows are not sorted. I want to select all the processes that were resumed after being canceled. My idea is to attach a specific date to the canceled status and check whether there are still other statuses after the cancellation.
Example:
[id] [moddate] [status]
1 01/01/17 started
1 02/01/17 waiting for signature
1 04/01/17 canceled
1 09/01/17 delivery documents
1 11/01/17 complited <-- I want to select these statuses, (Canceled and then somehow resumed)
I got something like this on start:
SELECT * FROM DATABASE
WHERE APPLICATIONSTATUSSYMBOL LIKE 'CANCELED%'
AND APPLICATIONDATE BETWEEN '17/01/01' AND '17/07/24';
One method for doing this uses window functions:
select d.*
from (select d.*,
             -- date of the cancellation, repeated on every row of the same id
             max(case when status = 'canceled' then applicationdate end) over (partition by id) as canceldate
      from database d
      where applicationdate between date '2017-01-01' and date '2017-07-24'
     ) d
where applicationdate > canceldate;
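As a quick illustration of what the window function is doing, here is a small Python sketch of the same two-step idea: find the cancellation date per id, then keep the rows dated after it. The function name `resumed_after_cancel` is an assumption, and the sample dates are read as DD/MM/YY.

```python
from datetime import date


def resumed_after_cancel(rows):
    """Return the rows that come after the latest 'canceled' row of their id.

    rows: list of (id, moddate, status) tuples, in any order.
    """
    cancel_date = {}  # id -> latest cancellation date (the max(case ...) over (partition by id))
    for row_id, moddate, status in rows:
        if status == "canceled":
            cancel_date[row_id] = max(cancel_date.get(row_id, moddate), moddate)
    return [
        row for row in rows
        if row[0] in cancel_date and row[1] > cancel_date[row[0]]
    ]


rows = [
    (1, date(2017, 1, 1), "started"),
    (1, date(2017, 1, 2), "waiting for signature"),
    (1, date(2017, 1, 4), "canceled"),
    (1, date(2017, 1, 9), "delivery documents"),
    (1, date(2017, 1, 11), "complited"),  # spelling kept from the sample data
]

for row in resumed_after_cancel(rows):
    print(row)
```

Ids that were never canceled drop out entirely, which corresponds to canceldate being NULL in the query, so the comparison is never true.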
I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
Table B should show the number of days for which the balance has been negative and
the date on which it went negative.
So, for 6-Jan-2013, this table should show below data:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, no of days should be the days when the balance has gone negative in recent time and
not from the old entry.
I need to write a SQL query to get the no of days passed and the start date from when the
balance has gone negative.
I tried to formulate a query using Lag analytical function, but I am not succeeding.
How should I check the first instance of negative balance by traversing back using LAG function?
Even the first_value function was given a try but not getting how to partition in it based on negative value.
Any help or direction on this will be really helpful.
Here's a way to achieve this using analytical functions.
INSERT INTO tableb
   WITH tablea_grouped1
        AS (SELECT account_no,
                   balance,
                   as_on_date,
                   SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
                      OVER (PARTITION BY account_no ORDER BY as_on_date)
                      grp
              FROM tablea),
        tablea_grouped2
        AS (SELECT account_no,
                   balance,
                   as_on_date,
                   grp,
                   LAST_VALUE (balance)
                      OVER (PARTITION BY account_no, grp
                            ORDER BY as_on_date
                            ROWS BETWEEN UNBOUNDED PRECEDING
                                 AND UNBOUNDED FOLLOWING)
                      closing_balance
              FROM tablea_grouped1
             WHERE balance < 0
               AND grp != 0 --keep this, if starting negative balance is to be ignored
           )
     SELECT account_no,
            closing_balance,
            MAX (as_on_date),
            MAX (as_on_date) - MIN (as_on_date) + 1,
            MIN (as_on_date)
       FROM tablea_grouped2
   GROUP BY account_no, grp, closing_balance
   ORDER BY account_no, MIN (as_on_date);
First, SUM is used as analytical function to assign group number to consecutive balances less than 0.
The LAST_VALUE function is then used to find the last negative balance in each group.
Finally, the result is aggregated based on each group. MAX(date) gives the last date, MIN(date) gives the starting date, and the difference of the two gives number of days.
Demo at sqlfiddle.
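To see the grouping step in isolation, here is a minimal Python sketch that walks the balances in date order and collects each unbroken run of negative balances. The function name `negative_streaks` is an assumption for illustration. Unlike the query with `grp != 0`, this sketch keeps a streak that starts on the very first row.

```python
from datetime import date


def negative_streaks(rows):
    """Return (account_no, closing_balance, last_date, days, start_date) for
    every unbroken run of negative balances.

    rows: list of (account_no, balance, as_on_date) tuples.
    """
    finished = []
    open_streak = {}  # account -> (start_date, last_date, closing_balance)
    for account, balance, day in sorted(rows, key=lambda r: (r[0], r[2])):
        if balance < 0:
            start = open_streak[account][0] if account in open_streak else day
            open_streak[account] = (start, day, balance)
        elif account in open_streak:
            # A non-negative balance closes the current run (a new grp in the SQL).
            finished.append((account,) + open_streak.pop(account))
    finished.extend((account,) + streak for account, streak in open_streak.items())
    finished.sort(key=lambda s: (s[0], s[1]))
    return [
        (account, closing, last, (last - start).days + 1, start)
        for account, start, last, closing in finished
    ]


rows = [
    (1001, -100, date(2013, 1, 1)),
    (1001, -150, date(2013, 1, 2)),
    (1001, 200, date(2013, 1, 3)),
    (1001, -250, date(2013, 1, 4)),
    (1001, -300, date(2013, 1, 5)),
    (1001, -310, date(2013, 1, 6)),
]

for streak in negative_streaks(rows):
    print(streak)
```

The last streak for account 1001 ends on 6-Jan-2013 with balance -310, 3 days, starting 4-Jan-2013, which is the row the question expects in table B.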
Try this, and use gone_negative to compute the required column value for the insert into the other table:
select temp.account_no,
       temp.balance,
       temp.prev_balance,
       temp.on_date,
       temp.prev_on_date,
       case
         WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
          1
         else
          0
       end as gone_negative
  from (select account_no,
               balance,
               on_date,
               -- order by the date inside each account so "previous" means the prior day
               lag(balance, 1, 0) OVER(partition by account_no ORDER BY on_date) prev_balance,
               lag(on_date, 1) OVER(partition by account_no ORDER BY on_date) prev_on_date
          from tblA
         order by account_no) temp;
Hope this helps pal.
Here's one way to do it.
Select all records from my_table where the balance is positive.
Do a self-join and get all the records whose as_on_date is greater than that of the current row and whose amounts are negative.
Once we have these, we cut off the rows where the date difference between the current and the previous row for as_on_date is > 1. We then filter the results in an outer subquery.
The final select just groups the rows and gets the min and max values for the filtered rows.
Query:
SELECT
    account_no,
    min(case when row_number = 1 then balance end) as balance,
    max(mt2_date) as As_on_date,
    max(mt2_date) - mt1_date as Days_passed,
    min(mt2_date) as Start_date
FROM
(
    SELECT
        *,
        MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
        ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
    FROM
    (
        SELECT
            mt1.account_no,
            mt2.balance,
            mt1.as_on_date as mt1_date,
            mt2.as_on_date as mt2_date,
            case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
        FROM
            my_table mt1
            JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
        WHERE
            MT1.balance > 0
        order by
            mt1.as_on_date,
            mt2.as_on_date ) sub_query
) T
WHERE
    min_break_date is null
    OR mt2_date < min_break_date
GROUP BY
    mt1_date,
    account_no
SQLFIDDLE
I have added a few more rows in the fiddle, just to test it out.