SQL - Find the min(date) since a category has its most recent value

I need some help with this problem.
Assume I have the following table:
contract_id | tariff_id | product_category | date (DD.MM.YYYY) | month (YYYYMM)
123456 | ABC | small | 01.01.2021 | 202101
123456 | ABC | medium | 01.02.2021 | 202102
123456 | DEF | small | 01.03.2021 | 202103
123456 | DEF | small | 01.04.2021 | 202104
123456 | ABC | big | 01.05.2021 | 202105
123456 | DEF | small | 01.06.2021 | 202106
123456 | DEF | medium | 02.06.2021 | 202106
123456 | DEF | medium | 01.07.2021 | 202107
The table is partitioned by month.
This is only part of my table, which contains multiple contract_ids.
For every contract_id, I'm trying to find since when it has had its most recent tariff_id, and since when it has had product_category = 'small' (if its most recent product_category is not 'small', the value should be NULL).
The results will be written into a table that is updated every month.
So for the table above, my latest results should look like this:
contract_id | same_tariff_id_since | product_category_small_since
123456 | 01.06.2021 | NULL
I'm using Hive.
So far, I have only come up with the solution below for same_tariff_id_since.
The problem is that it gives me the absolute min(date) for the tariff_id, not the min(date) since the contract switched to its most recent tariff_id.
I think the code for product_category_small_since will follow mostly the same logic.
My current code is:
SELECT q2.contract_id
, q3.tariff_id
, q2.date
FROM (
SELECT contract_id
, max(date_2) AS date
FROM (
SELECT contract_id
, date
, min(date) OVER (PARTITION BY tariff_id ORDER BY date) AS date_2
FROM given_table
)q1
WHERE date=date_2
GROUP BY contract_id
)q2
JOIN given_table AS q3
ON q2.contract_id=q3.contract_id
AND q2.date=q3.date
Thanks in advance.

One approach for solving this type of query is to group the sequences you want to track. For the tariff_id sequence, you want a new "sequence grouping id" each time the tariff_id changes for a given contract_id. Since product_category can change independently, you need a separate sequence grouping id for that change as well.
Here's code to accomplish the task. It returns only the latest row of each contract, with the specific columns you described in your latest results table. This was done against PostgreSQL 9.6, but the syntax and data types can probably be adapted for Hive, which also supports lag(), row_number() and windowed sum(); to_char() would need to be replaced, e.g. with date_format().
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/8
select q2.contract_id
, to_char(min(q2."date (DD.MM.YYYY)")
over (partition by q2.contract_id, q2.contract_tariff_sequence_id), 'DD.MM.YYYY') as same_tariff_id_since
, to_char(min(case when q2.product_category = 'small' then q2."date (DD.MM.YYYY)" else null end)
over (partition by q2.contract_id, q2.contract_product_category_sequence_id), 'DD.MM.YYYY') as product_category_small_since
from(
select q1.*
, sum(case when q1.tariff_id = q1.prior_tariff_id then 0 else 1 end)
over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_tariff_sequence_id
, sum(case when q1.product_category = q1.prior_product_category then 0 else 1 end)
over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_product_category_sequence_id
from (
select *
, lag(tariff_id) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_tariff_id
, lag(product_category) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_product_category
, row_number() over (partition by contract_id order by "date (DD.MM.YYYY)" desc) latest_record_per_contract
from contract_tariffs
) q1
) q2
where latest_record_per_contract = 1
If you want to see all the rows and columns so you can examine how this works with the sequence grouping ids etc., you can modify the outer query slightly:
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/10
If this works for you, please mark as correct answer.
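To check the sequence-grouping idea on the sample data without a Hive cluster, here is a minimal sketch that runs the same lag() / running sum() / min() logic in SQLite through Python's sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later), renames the date column to the hypothetical dt, and stores dates as ISO 'YYYY-MM-DD' text so they sort correctly.

```python
import sqlite3

# Rows from the question as (contract_id, tariff_id, product_category, dt).
# Assumption: dates are stored as ISO 'YYYY-MM-DD' text so they sort correctly.
ROWS = [
    (123456, "ABC", "small", "2021-01-01"),
    (123456, "ABC", "medium", "2021-02-01"),
    (123456, "DEF", "small", "2021-03-01"),
    (123456, "DEF", "small", "2021-04-01"),
    (123456, "ABC", "big", "2021-05-01"),
    (123456, "DEF", "small", "2021-06-01"),
    (123456, "DEF", "medium", "2021-06-02"),
    (123456, "DEF", "medium", "2021-07-01"),
]

QUERY = """
WITH q1 AS (
    SELECT *,
           LAG(tariff_id) OVER w AS prior_tariff_id,
           LAG(product_category) OVER w AS prior_category,
           ROW_NUMBER() OVER (PARTITION BY contract_id ORDER BY dt DESC) AS rn_latest
    FROM contract_tariffs
    WINDOW w AS (PARTITION BY contract_id ORDER BY dt)
),
q2 AS (
    SELECT *,
           -- running count of changes: the id grows by 1 whenever the value
           -- differs from the previous row (or there is no previous row)
           SUM(CASE WHEN tariff_id = prior_tariff_id THEN 0 ELSE 1 END)
               OVER (PARTITION BY contract_id ORDER BY dt ROWS UNBOUNDED PRECEDING) AS tariff_seq,
           SUM(CASE WHEN product_category = prior_category THEN 0 ELSE 1 END)
               OVER (PARTITION BY contract_id ORDER BY dt ROWS UNBOUNDED PRECEDING) AS category_seq
    FROM q1
),
q3 AS (
    SELECT contract_id, rn_latest,
           -- earliest date within the current run of the same tariff_id
           MIN(dt) OVER (PARTITION BY contract_id, tariff_seq) AS same_tariff_id_since,
           -- NULL unless the current run of product_category is 'small'
           MIN(CASE WHEN product_category = 'small' THEN dt END)
               OVER (PARTITION BY contract_id, category_seq) AS product_category_small_since
    FROM q2
)
SELECT contract_id, same_tariff_id_since, product_category_small_since
FROM q3
WHERE rn_latest = 1
ORDER BY contract_id
"""


def latest_since(rows):
    """Return one (contract_id, same_tariff_id_since, product_category_small_since) row per contract."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE contract_tariffs "
        "(contract_id INTEGER, tariff_id TEXT, product_category TEXT, dt TEXT)"
    )
    con.executemany("INSERT INTO contract_tariffs VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(latest_since(ROWS))  # [(123456, '2021-06-01', None)]
```

The min() windows are computed over all rows before the rn_latest = 1 filter is applied in the outer query, which is why the filter has to live outside the CTE that computes them.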

Related

Query database of events to find only events meeting parameters

I have a dataset [Table_1] that records each event on a new row, so there are multiple entries for each customer_id. The structure is:
customer_id | recorded_at | event_type | value
123-456-789 | 2022-05-28 | status | open
123-456-789 | 2022-06-01 | attribute | order_placed
123-456-789 | 2022-06-02 | attribute | order_fulfilled
123-456-789 | 2022-06-04 | status | closed
123-456-789 | 2022-06-05 | attribute | order_placed
123-456-789 | 2022-06-07 | attribute | order_fulfilled
123-456-789 | 2022-06-10 | status | open
123-456-789 | 2022-06-11 | attribute | order_placed
123-456-789 | 2022-06-12 | attribute | order_fulfilled
123-456-789 | 2022-06-15 | attribute | order_placed
123-456-789 | 2022-06-17 | attribute | order_fulfilled
987-654-321 | 2022-06-12 | status | open
987-654-321 | 2022-06-15 | attribute | order_placed
987-654-321 | 2022-06-17 | attribute | order_fulfilled
987-654-321 | 2022-06-17 | status | closed
What I'm trying to do is write a query that returns the dates of the two attributes, order_placed and order_fulfilled, after the last time the status went open. My approach is to query the dataset three times: first for all customers who went open, then for the dates when the attributes are order_placed and order_fulfilled. However, I'm running into issues returning all instances where the attributes are order_placed and order_fulfilled, not just the most recent one.
With d1 as (Select customer_id,recorded_at as open_time from Table_1 where event_type = 'status' and value = 'open')
Select d1.customer_id,
d1.open_time,
order_placed.order_placed_time,
order_filled.order_fulfilled_time as order_filled_time
from d1
left join (Select customer_id,max(recorded_at) as order_placed_time from Table_1 where event_type = 'attribute' and value = 'order_placed' group by customer_id) order_placed
on d1.customer_id = order_placed.customer_id and order_placed.order_placed_time > d1.open_time
left join (Select customer_id,max(recorded_at) as order_fulfilled_time from Table_1 where event_type = 'attribute' and value = 'order_fulfilled' group by customer_id) order_filled
on d1.customer_id = order_filled.customer_id and order_filled.order_fulfilled_time > d1.open_time
where order_filled.order_fulfilled_time > order_placed.order_placed_time
However, this only returns the last time an order was placed and fulfilled after the status = open, not every instance where that happened. The output I am going for would look like:
customer_id | open_time | order_placed_time | order_filled_time
123-456-789 | 2022-05-28 | 2022-06-01 | 2022-06-02
123-456-789 | 2022-06-10 | 2022-06-11 | 2022-06-12
123-456-789 | 2022-06-10 | 2022-06-15 | 2022-06-17
987-654-321 | 2022-06-12 | 2022-06-15 | 2022-06-17
Consider the query below, which returns the dates of the two attributes, order_placed and order_fulfilled, after the last time the status went open:
WITH orders AS (
SELECT *, SUM(IF(value IN ('open', 'closed'), 1, 0)) OVER w AS order_group
FROM sample
WINDOW w AS (PARTITION BY customer_id ORDER BY recorded_at, event_type)
)
SELECT customer_id, open_time, pre_recorded_at AS order_placed_time, recorded_at AS order_filled_time
FROM (
SELECT *, FIRST_VALUE(IF(value = 'open', recorded_at, NULL)) OVER w AS open_time,
LAG(recorded_at) OVER w AS pre_recorded_at,
FROM orders
WINDOW w AS (PARTITION BY customer_id, order_group ORDER BY recorded_at)
)
WHERE open_time IS NOT NULL AND value = 'order_fulfilled'
;
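Note: Because of the two transactions below in your dataset, which share the same recorded_at, the orders CTE has an unusual event_type column in the ORDER BY clause of its window:
WINDOW w AS (PARTITION BY customer_id ORDER BY recorded_at, event_type)
987-654-321 | 2022-06-17 | attribute | order_fulfilled
987-654-321 | 2022-06-17 | status | closed
Since 'attribute' sorts before 'status', the order_fulfilled row is counted before the closed row. If recorded_at is a more precise timestamp, event_type can be removed from the ORDER BY. I'll leave that to you.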
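The same grouping idea can be tried locally. The sketch below is a SQLite translation of the query above, run through Python's sqlite3 module and assuming SQLite 3.25+ for window functions: BigQuery's IF() becomes a CASE expression, the table name events is hypothetical, and, like the original, it assumes each order_fulfilled row directly follows its order_placed row within an open period.

```python
import sqlite3

# Sample rows from the question: (customer_id, recorded_at, event_type, value)
EVENTS = [
    ("123-456-789", "2022-05-28", "status", "open"),
    ("123-456-789", "2022-06-01", "attribute", "order_placed"),
    ("123-456-789", "2022-06-02", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-04", "status", "closed"),
    ("123-456-789", "2022-06-05", "attribute", "order_placed"),
    ("123-456-789", "2022-06-07", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-10", "status", "open"),
    ("123-456-789", "2022-06-11", "attribute", "order_placed"),
    ("123-456-789", "2022-06-12", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-15", "attribute", "order_placed"),
    ("123-456-789", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-12", "status", "open"),
    ("987-654-321", "2022-06-15", "attribute", "order_placed"),
    ("987-654-321", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-17", "status", "closed"),
]

QUERY = """
WITH orders AS (
    SELECT *,
           -- every open/closed event starts a new group; event_type breaks same-day ties
           SUM(CASE WHEN value IN ('open', 'closed') THEN 1 ELSE 0 END)
               OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS order_group
    FROM events
),
ranked AS (
    SELECT *,
           -- non-NULL only for groups that start with an 'open' event
           FIRST_VALUE(CASE WHEN value = 'open' THEN recorded_at END) OVER w AS open_time,
           -- for an order_fulfilled row, the previous row is its order_placed row
           LAG(recorded_at) OVER w AS pre_recorded_at
    FROM orders
    WINDOW w AS (PARTITION BY customer_id, order_group ORDER BY recorded_at, event_type)
)
SELECT customer_id, open_time,
       pre_recorded_at AS order_placed_time,
       recorded_at AS order_filled_time
FROM ranked
WHERE open_time IS NOT NULL AND value = 'order_fulfilled'
ORDER BY customer_id, recorded_at
"""


def order_pairs(events):
    """Return (customer_id, open_time, order_placed_time, order_filled_time) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE events (customer_id TEXT, recorded_at TEXT, event_type TEXT, value TEXT)"
    )
    con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in order_pairs(EVENTS):
    print(row)
```

Because a closed row starts its own group whose first value is not 'open', the orders placed while the status is closed (2022-06-05 and 2022-06-07) drop out through the open_time IS NOT NULL filter.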
One option to solve this problem is to follow these steps:
keep only the rows found between an "open" and the following "closed", removing the "closed" rows and everything between a "closed" and the next "open"
assign a unique id to each ("order_placed", "order_fulfilled") couple
extract the "open_time", "order_placed_time" and "order_fulfilled_time" values into three separate fields with CASE expressions
aggregate "open_time" and "order_placed/fulfilled_time" separately, since each "open_time" can have multiple order couples.
These four steps are implemented in two CTEs.
The first CTE includes:
the first COUNT, which yields odd values for the open/order_placed/order_fulfilled rows (orders following an "open") and even values for the closed/order_placed/order_fulfilled rows (orders following a "closed"); COUNT counts both the 1 and the 0 returned by the CASE expression, so every "open" and every "closed" increments it
the second COUNT, which yields a different value for each ("order_placed", "order_fulfilled") couple, because it increments on every row except "order_fulfilled"
SELECT *,
COUNT(CASE WHEN value = 'open' THEN 1
WHEN value = 'closed' THEN 0 END) OVER (
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS status_value,
COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END) OVER(
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS order_value
FROM tab
The second CTE includes:
a WHERE clause that filters out all rows found between a "closed" and an "open" value, the "closed" included and the "open" excluded
the first MAX window function, which partitions on the customer and on the first COUNT to extract the "open_time" value
the second MAX window function, which partitions on the customer and on the second COUNT to extract the "order_placed_time" value
the third MAX window function, which partitions on the customer and on the second COUNT to extract the "order_fulfilled_time" value
SELECT customer_id,
MAX(CASE WHEN value = 'open' THEN recorded_at END) OVER(
PARTITION BY customer_id, status_value
) AS open_time,
MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_placed_time,
MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_fulfilled_time
FROM cte
WHERE MOD(status_value, 2) = 1
Note that these MAX aggregations cannot be expressed with a single GROUP BY clause, because the first MAX groups on status_value while the other two group on order_value.
The final query uses the CTEs and adds:
a selection of DISTINCT rows (the window functions repeat the same values on every row of a partition)
a filter that removes rows whose "order_fulfilled_time" is NULL (these correspond to the original "open" rows).
WITH cte AS (
SELECT *,
COUNT(CASE WHEN value = 'open' THEN 1
WHEN value = 'closed' THEN 0 END) OVER (
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS status_value,
COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END) OVER(
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS order_value
FROM tab
), cte2 AS(
SELECT customer_id,
MAX(CASE WHEN value = 'open' THEN recorded_at END) OVER(
PARTITION BY customer_id, status_value
) AS open_time,
MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_placed_time,
MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_fulfilled_time
FROM cte
WHERE MOD(status_value, 2) = 1
)
SELECT DISTINCT *
FROM cte2
WHERE order_fulfilled_time IS NOT NULL
I'd recommend checking the intermediate output of each step to fully understand this solution.
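Following that suggestion, the sketch below runs the two CTEs above in SQLite through Python's sqlite3 (assuming SQLite 3.25+ for window functions) and also exposes the intermediate status_value and order_value columns for one customer. It keeps the table name tab from the answer and uses % instead of MOD(), since mod() is not available in every SQLite build.

```python
import sqlite3

# Sample rows from the question: (customer_id, recorded_at, event_type, value)
EVENTS = [
    ("123-456-789", "2022-05-28", "status", "open"),
    ("123-456-789", "2022-06-01", "attribute", "order_placed"),
    ("123-456-789", "2022-06-02", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-04", "status", "closed"),
    ("123-456-789", "2022-06-05", "attribute", "order_placed"),
    ("123-456-789", "2022-06-07", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-10", "status", "open"),
    ("123-456-789", "2022-06-11", "attribute", "order_placed"),
    ("123-456-789", "2022-06-12", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-15", "attribute", "order_placed"),
    ("123-456-789", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-12", "status", "open"),
    ("987-654-321", "2022-06-15", "attribute", "order_placed"),
    ("987-654-321", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-17", "status", "closed"),
]

CTE = """
WITH cte AS (
    SELECT *,
           -- COUNT ignores only NULLs, so 'open' (1) and 'closed' (0) both count:
           -- odd values follow an 'open', even values follow a 'closed'
           COUNT(CASE WHEN value = 'open' THEN 1 WHEN value = 'closed' THEN 0 END)
               OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS status_value,
           -- increments on every row except 'order_fulfilled', so a fulfilment
           -- shares its id with the row just before it
           COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END)
               OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS order_value
    FROM tab
)
"""

FINAL = CTE + """
, cte2 AS (
    SELECT customer_id,
           MAX(CASE WHEN value = 'open' THEN recorded_at END)
               OVER (PARTITION BY customer_id, status_value) AS open_time,
           MAX(CASE WHEN value = 'order_placed' THEN recorded_at END)
               OVER (PARTITION BY customer_id, order_value) AS order_placed_time,
           MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END)
               OVER (PARTITION BY customer_id, order_value) AS order_fulfilled_time
    FROM cte
    WHERE status_value % 2 = 1  -- keep only rows that follow an 'open'
)
SELECT DISTINCT * FROM cte2
WHERE order_fulfilled_time IS NOT NULL
ORDER BY customer_id, open_time, order_placed_time
"""

# Intermediate step: the two counters for a single customer.
INTERMEDIATE = CTE + """
SELECT recorded_at, value, status_value, order_value
FROM cte
WHERE customer_id = '123-456-789'
ORDER BY recorded_at, event_type
"""


def run(sql):
    """Load the sample rows into an in-memory table named tab and run sql."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (customer_id TEXT, recorded_at TEXT, event_type TEXT, value TEXT)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", EVENTS)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


for row in run(INTERMEDIATE):
    print(row)
for row in run(FINAL):
    print(row)
```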
Consider yet another option
with order_groups as (
select *,
countif(value in ('open', 'closed')) over order_group_sorted as group_num,
countif(value = 'order_placed') over order_group_sorted as subgroup_num,
from your_table
window order_group_sorted as (partition by customer_id order by recorded_at, event_type)
)
select * except(subgroup_num) from (
select customer_id, recorded_at, value, subgroup_num,
max(if(value = 'open', recorded_at, null)) over order_group as open_time
from order_groups
window order_group as (partition by customer_id, group_num)
)
pivot (any_value(recorded_at) for value in ('order_placed', 'order_fulfilled'))
where not open_time || order_placed is null
with data as (
select *, sum(case when value = 'open' then 1 end) over (partition by customer_id order by recorded_at) as grp
from T
)
select customer_id,
min(case when value = 'open' then recorded_at end) as open_time,
...
from data
group by customer_id, grp

Need to pull customers with specific activity this year (Leads), where that activity (Lead) was their last one

I'm using SSMS to create a report showing customer accounts where the Sales Reps didn't follow up on leads we received this year. That would be indicated by accounts where, in the list of activities (actions in the account), 'Lead' is the last one listed (the rep didn't take any action after receiving the lead).
My code is pulling the latest 'Lead' activity for all customers who've had at least one lead this year:
CustomerName | Activity | Date
Bob's Tires | Lead | 2021-01-05
Ned's Nails | Lead | 2021-02-02
Good Eats | Lead | 2021-02-03
I need it to only pull customers where the Lead was the last activity:
CustomerName | Activity | Date
Ned's Nails | Lead | 2021-02-02
Here is my code and example tables. What am I missing? I've tried many things with no luck.
WITH activities AS (
SELECT
a. *
, CASE WHEN a.ContactDate = MAX(CASE WHEN a.Activity LIKE 'Lead%'
THEN a.ContactDate END) OVER (PARTITION BY a.AcctID)
THEN 1 ELSE 0 END AS no_followup
FROM AcctActivities a
WHERE a.ContactDate >= '2021-01-01'
)
SELECT
c.Name,
act.Activity,
act.ContactDate
FROM Customers c
INNER JOIN activities act ON c.AcctID = act.AcctID AND act.no_followup = 1
ORDER BY c.AcctID, act.ContactDate ASC
Table 1: Customers (c)
AcctID | CustomerName
11 | Bob's Tires
12 | Ned's Nails
13 | Good Eats
14 | Embers
Table 2: Activities (a)
ActivityID | AcctID | Activity | Date
1 | 11 | Contact Added | 2021-01-01
2 | 11 | Lead | 2021-01-05
3 | 11 | Phone Call | 2021-01-06
4 | 12 | Lead | 2021-02-02
5 | 13 | Lead | 2021-02-03
6 | 13 | Phone Call | 2021-01-15
7 | 13 | Sales Email | 2021-01-15
8 | 14 | Cold Call | 2021-01-20
Your approach restricts which rows are considered for the MAX in the comparison: the CASE expression inside MAX returns the latest Lead activity, but the latest Lead activity may not be the latest activity. The suggested modification below takes the MAX over all activities instead and moves the Lead check into the outer CASE expression, so a row is flagged only if it is a Lead and its date equals the latest activity date for the account.
Another modification, possibly optional but safe, is adding ORDER BY a.ContactDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING to the OVER clause. You could also use UNBOUNDED PRECEDING instead of CURRENT ROW, but that seems like extra processing: the rows before the current one in ContactDate order cannot have a larger ContactDate, and you are only interested in the maximum. Note that once the OVER clause has an ORDER BY, the default frame covers only the rows from the start of the partition up to the current row; the explicit frame makes the window function look at the current row and all rows after it in the partition.
E.g.:
WITH activities AS
(
SELECT
a. *,
CASE
WHEN a.Activity LIKE 'Lead%' AND
a.ContactDate = (MAX(a.ContactDate) OVER (PARTITION BY a.AcctID ORDER BY a.ContactDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ))
THEN 1
ELSE 0
END AS no_followup
FROM
AcctActivities a
WHERE
a.ContactDate >= '2021-01-01'
)
SELECT
c.Name,
act.Activity,
act.ContactDate
FROM
Customers c
INNER JOIN
activities act ON c.AcctID = act.AcctID
AND act.no_followup = 1
ORDER BY
c.AcctID, act.ContactDate ASC
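The effect of the frame clause can be seen on Bob's Tires (AcctID 11) from the sample data. This sketch uses SQLite through Python's sqlite3 (assuming 3.25+) as a stand-in for SQL Server and compares MAX(ContactDate) under the default frame with the explicit ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING frame; the helper name windowed_max is hypothetical.

```python
import sqlite3

# Activities for account 11 (Bob's Tires) from the question: (AcctID, Activity, ContactDate)
ACTIVITIES = [
    (11, "Contact Added", "2021-01-01"),
    (11, "Lead", "2021-01-05"),
    (11, "Phone Call", "2021-01-06"),
]


def windowed_max(frame_clause):
    """Return (ContactDate, MAX(ContactDate) over the given frame) for each row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE AcctActivities (AcctID INTEGER, Activity TEXT, ContactDate TEXT)")
    con.executemany("INSERT INTO AcctActivities VALUES (?, ?, ?)", ACTIVITIES)
    sql = (
        "SELECT ContactDate, MAX(ContactDate) OVER "
        f"(PARTITION BY AcctID ORDER BY ContactDate {frame_clause}) "
        "FROM AcctActivities ORDER BY ContactDate"
    )
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


# Default frame once ORDER BY is present: start of partition up to the current row.
print(windowed_max(""))
# Explicit frame from the answer: current row to the end of the partition.
print(windowed_max("ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING"))
```

With the default frame, the Lead row on 2021-01-05 sees only itself and earlier rows, so its MAX equals its own date and it would be flagged even though a Phone Call follows on 2021-01-06. With the explicit frame, every row sees 2021-01-06.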
Furthermore, if you are only interested in the customer details and all the resulting activity names would be Lead, you may consider the following approach, which uses aggregation and a HAVING clause to filter your desired results. Because it filters early, this approach returns fewer details in the resulting CTE.
WITH customers_with_last_activity_as_lead as (
SELECT AcctID, MAX(ContactDate) as ContactDate
FROM AcctActivities a
WHERE a.ContactDate >= '2021-01-01'
GROUP BY AcctID
HAVING
MAX(a.ContactDate) = MAX(
CASE
WHEN a.Activity LIKE 'Lead%' THEN a.ContactDate
END
)
)
SELECT
c.Name,
-- 'Lead' as Activity, -- Uncomment this line if it is that you would like to see this constant value in your resulting queries.
act.ContactDate
FROM
Customers c
INNER JOIN
customers_with_last_activity_as_lead act ON c.AcctID = act.AcctID
ORDER BY
c.AcctID, act.ContactDate ASC
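The HAVING filter can likewise be tried on the question's sample data. The following is a sketch in SQLite through Python's sqlite3 rather than SQL Server; it assumes the Customers table has a Name column, as in the question's code (the sample table shows CustomerName).

```python
import sqlite3

CUSTOMERS = [(11, "Bob's Tires"), (12, "Ned's Nails"), (13, "Good Eats"), (14, "Embers")]
ACTIVITIES = [  # (AcctID, Activity, ContactDate)
    (11, "Contact Added", "2021-01-01"),
    (11, "Lead", "2021-01-05"),
    (11, "Phone Call", "2021-01-06"),
    (12, "Lead", "2021-02-02"),
    (13, "Lead", "2021-02-03"),
    (13, "Phone Call", "2021-01-15"),
    (13, "Sales Email", "2021-01-15"),
    (14, "Cold Call", "2021-01-20"),
]

QUERY = """
WITH customers_with_last_activity_as_lead AS (
    SELECT AcctID, MAX(ContactDate) AS ContactDate
    FROM AcctActivities
    WHERE ContactDate >= '2021-01-01'
    GROUP BY AcctID
    -- keep the account only if its latest activity date is also its latest Lead date;
    -- accounts without any Lead compare against NULL and are dropped
    HAVING MAX(ContactDate) = MAX(CASE WHEN Activity LIKE 'Lead%' THEN ContactDate END)
)
SELECT c.Name, act.ContactDate
FROM Customers c
JOIN customers_with_last_activity_as_lead act ON c.AcctID = act.AcctID
ORDER BY c.AcctID
"""


def leads_without_followup():
    """Return (Name, ContactDate) for accounts whose latest activity is a Lead."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Customers (AcctID INTEGER, Name TEXT)")
    con.execute("CREATE TABLE AcctActivities (AcctID INTEGER, Activity TEXT, ContactDate TEXT)")
    con.executemany("INSERT INTO Customers VALUES (?, ?)", CUSTOMERS)
    con.executemany("INSERT INTO AcctActivities VALUES (?, ?, ?)", ACTIVITIES)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(leads_without_followup())
```

With the sample data as posted, Good Eats also qualifies: its Lead on 2021-02-03 is later than its Phone Call and Sales Email on 2021-01-15.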
If the values are not all the literal 'Lead' (for example, other activity names that start with 'Lead'), the following approach may also help retrieve the correct activity name:
WITH customers_with_last_activity_as_lead as (
SELECT
AcctID,
REPLACE(MAX(CONCAT(ContactDate,Activity)),MAX(ContactDate),'') as Activity,
MAX(ContactDate) as ContactDate
FROM AcctActivities a
WHERE a.ContactDate >= '2021-01-01'
GROUP BY AcctID
HAVING
MAX(a.ContactDate) = MAX(
CASE
WHEN a.Activity LIKE 'Lead%' THEN a.ContactDate
END
)
)
SELECT
c.Name,
act.Activity,
act.ContactDate
FROM
Customers c
INNER JOIN
customers_with_last_activity_as_lead act ON c.AcctID = act.AcctID
ORDER BY
c.AcctID, act.ContactDate ASC
Let me know if this works for you.
I agree with Gordon that your question isn't totally clear (do you actually only care about activities this year? What is the intended meaning of no_followup?). Having said that, I think this is what you want:
select c.Name,
lastActivity.Activity,
lastActivity.ContactDate
from Customers c
cross apply (
select top 1 a.Activity, a.ContactDate
from activities a
where a.acctId = c.acctId
-- and a.ContactDate >= '2021-01-01' uncomment this if you only care about activity this year
order by a.ContactDate desc
) lastActivity
where lastActivity.Activity like 'Lead%'

Find the true start end dates for customers that have multiple accounts in SQL Server 2014

I have a checking account table that contains columns Cust_id (customer id), Open_Date (start date), and Closed_Date (end date). There is one row for each account. A customer can open multiple accounts at any given point. I would like to know how long the person has been a customer.
Example 1:
CREATE TABLE [Cust]
(
[Cust_id] [varchar](10) NULL,
[Open_Date] [date] NULL,
[Closed_Date] [date] NULL
)
insert into [Cust] values ('a123', '10/01/2019', '10/15/2019')
insert into [Cust] values ('a123', '10/12/2019', '11/01/2019')
Ideally, I would like to insert this into a table as just one row, saying this person has been a customer from 10/01/2019 to 11/01/2019 (as he opened his second account before he closed the previous one).
Similarly, example 2:
insert into [Cust] values ('b245', '07/01/2019', '09/15/2019')
insert into [Cust] values ('b245', '10/12/2019', '12/01/2019')
I would like to see 2 rows in this case: one showing he was a customer from 07/01 to 09/15, and another from 10/12 to 12/01.
Can you point me to the best way to get this?
I would approach this as a gaps-and-islands problem. You want to group together adjacent rows whose periods overlap.
Here is one way to solve it using lag() and a cumulative sum(). Every time the open date is greater than the closed date of the previous record, a new group starts.
select
cust_id,
min(open_date) open_date,
max(closed_date) closed_date
from (
select
t.*,
sum(case when not open_date <= lag_closed_date then 1 else 0 end)
over(partition by cust_id order by open_date) grp
from (
select
t.*,
lag(closed_date) over (partition by cust_id order by open_date) lag_closed_date
from cust t
) t
) t
group by cust_id, grp
In this db fiddle with your sample data, the query produces:
cust_id | open_date | closed_date
:------ | :--------- | :----------
a123 | 2019-10-01 | 2019-11-01
b245 | 2019-07-01 | 2019-09-15
b245 | 2019-10-12 | 2019-12-01
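Below is a sketch of the same gaps-and-islands idea in SQLite via Python's sqlite3 (assuming 3.25+), with one deliberate change: instead of lag(closed_date), it compares each open_date with the latest closed_date of all earlier accounts, using a running MAX with ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING (a frame SQL Server 2014 also supports). This keeps an account that lies entirely inside an earlier, longer account from splitting the island. Dates are stored as ISO 'YYYY-MM-DD' text.

```python
import sqlite3

# (cust_id, open_date, closed_date) from the question, with ISO dates (assumption).
ACCOUNTS = [
    ("a123", "2019-10-01", "2019-10-15"),
    ("a123", "2019-10-12", "2019-11-01"),
    ("b245", "2019-07-01", "2019-09-15"),
    ("b245", "2019-10-12", "2019-12-01"),
]

QUERY = """
WITH flagged AS (
    SELECT *,
           -- latest closed_date among all EARLIER accounts of the same customer
           MAX(closed_date) OVER (
               PARTITION BY cust_id ORDER BY open_date, closed_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS prev_max_closed
    FROM cust
),
grouped AS (
    SELECT *,
           -- start a new island when this account opens after every earlier one closed
           SUM(CASE WHEN prev_max_closed IS NULL OR open_date > prev_max_closed
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY cust_id ORDER BY open_date, closed_date
                     ROWS UNBOUNDED PRECEDING) AS grp
    FROM flagged
)
SELECT cust_id, MIN(open_date) AS open_date, MAX(closed_date) AS closed_date
FROM grouped
GROUP BY cust_id, grp
ORDER BY cust_id, MIN(open_date)
"""


def islands(rows):
    """Return one (cust_id, open_date, closed_date) row per continuous customer period."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cust (cust_id TEXT, open_date TEXT, closed_date TEXT)")
    con.executemany("INSERT INTO cust VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in islands(ACCOUNTS):
    print(row)
```

For example, a hypothetical customer with accounts from 2020-01-01 to 2020-12-31, 2020-02-01 to 2020-02-28 and 2020-03-01 to 2020-03-31 forms a single island here, whereas a lag()-based comparison would start a new group on 2020-03-01 because it only looks at the 2020-02-28 closing date.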
I would solve this with recursion. While this is certainly very heavy, it should accommodate even the most complex account timings (assuming your data has such). However, if the sample data provided is as complex as you need to solve for, I highly recommend sticking with the solution provided above. It is much more concise and clear.
WITH x (cust_id, open_date, closed_date, lvl, grp) AS (
SELECT cust_id, open_date, closed_date, 1, 1
FROM (
SELECT cust_id
, open_date
, closed_date
, row_number()
OVER (PARTITION BY cust_id ORDER BY closed_date DESC, open_date) AS rn
FROM cust
) AS t
WHERE rn = 1
UNION ALL
SELECT cust_id, open_date, closed_date, lvl, grp
FROM (
SELECT c.cust_id
, c.open_date
, c.closed_date
, x.lvl + 1 AS lvl
, x.grp + CASE WHEN c.closed_date < x.open_date THEN 1 ELSE 0 END AS grp
, row_number() OVER (PARTITION BY c.cust_id ORDER BY c.closed_date DESC) AS rn
FROM cust c
JOIN x
ON x.cust_id = c.cust_id
AND c.open_date < x.open_date
) AS t
WHERE t.rn = 1
)
SELECT cust_id, min(open_date) AS first_open_date, max(closed_date) AS last_closed_date
FROM x
GROUP BY cust_id, grp
ORDER BY cust_id, grp
I would also add the caveat that I don't run on SQL Server, so there could be syntax differences that I didn't account for. Hopefully they are minor, if present.
You can try something like this:
select distinct
cust_id,
(select min(Open_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
) as Open_Date,
(select max(Closed_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
) as Closed_Date
from Cust as a
So, for every row, you select the minimal and maximal dates from all overlapping ranges; DISTINCT then filters out the duplicates.

Looking to use Partitions to Compare Two Different Columns on Previous Rows

I am using SQL Developer and am trying to find a way to create a renewal indicator. I have a table with policy numbers, effective dates, and expiration dates. If a certain policy has a new effective date the same as the expiration date of the previous row, then it should get a "1" because it renewed. If there is no new effective date, the policy did not renew. In my example below policies 12345 and 12389 should get a "1" and policy 12367 should get a "0". How do I do this? Somehow using PARTITION BY, ROWS PRECEDING, etc?
POLICY_NUMBER EFFECTIVE_DATE EXPIRATION_DATE
12345 20140120 20140720
12345 20140720 20150120
12367 20140122 20140722
12389 20140122 20140722
12389 20140722 20150122
Yes, you are right: you need PARTITION BY and ORDER BY. The following query should work for you. The rows are numbered by EFFECTIVE_DATE so that NUM1 follows the policy terms in order, and the table aliases are written without AS, which Oracle does not accept:
SELECT A.POLICY_NUMBER,
CASE WHEN A.EXPIRATION_DATE = B.EFFECTIVE_DATE THEN 1 ELSE 0 END AS ISRENEW
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY POLICY_NUMBER ORDER BY EFFECTIVE_DATE) AS NUM1,
POLICY_NUMBER, EFFECTIVE_DATE, EXPIRATION_DATE
FROM POLICYTABLE
) A
LEFT OUTER JOIN
(
SELECT ROW_NUMBER() OVER(PARTITION BY POLICY_NUMBER ORDER BY EFFECTIVE_DATE) AS NUM1,
POLICY_NUMBER, EFFECTIVE_DATE, EXPIRATION_DATE
FROM POLICYTABLE
) B ON A.NUM1 = B.NUM1 - 1 AND A.POLICY_NUMBER = B.POLICY_NUMBER
WHERE A.NUM1 = 1
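An alternative that avoids the self-join is LEAD(), which Oracle also supports: it reads the next term's EFFECTIVE_DATE within the same policy directly. The sketch below shows the idea in SQLite via Python's sqlite3 (assuming 3.25+), keeping the dates as the YYYYMMDD text from the question; the table name policytable follows the answer, and the helper renewal_flags is hypothetical.

```python
import sqlite3

# (policy_number, effective_date, expiration_date) from the question, dates as YYYYMMDD text.
POLICIES = [
    (12345, "20140120", "20140720"),
    (12345, "20140720", "20150120"),
    (12367, "20140122", "20140722"),
    (12389, "20140122", "20140722"),
    (12389, "20140722", "20150122"),
]

QUERY = """
WITH ordered AS (
    SELECT policy_number, effective_date, expiration_date,
           -- effective date of the next term of the same policy (NULL if none)
           LEAD(effective_date) OVER (PARTITION BY policy_number ORDER BY effective_date) AS next_effective,
           ROW_NUMBER() OVER (PARTITION BY policy_number ORDER BY effective_date) AS term_no
    FROM policytable
)
SELECT policy_number,
       CASE WHEN expiration_date = next_effective THEN 1 ELSE 0 END AS is_renew
FROM ordered
WHERE term_no = 1
ORDER BY policy_number
"""


def renewal_flags(rows):
    """Return (policy_number, is_renew) for the first term of each policy."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE policytable (policy_number INTEGER, effective_date TEXT, expiration_date TEXT)"
    )
    con.executemany("INSERT INTO policytable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(renewal_flags(POLICIES))  # [(12345, 1), (12367, 0), (12389, 1)]
```

Like the self-join, this reports only on each policy's first term (term_no = 1); dropping that filter would flag every term.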

Query for getting previous date in oracle in specific scenario

I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
Table B should show the number of days the balance has been negative and the date on which it went negative.
So, for 6-Jan-2013, this table should show below data:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, the number of days should count from when the balance most recently went negative, not from an older entry.
I need to write a SQL query to get the number of days passed and the start date from when the balance went negative.
I tried to formulate a query using Lag analytical function, but I am not succeeding.
How should I check the first instance of negative balance by traversing back using LAG function?
I also tried the first_value function, but I can't see how to partition it based on negative values.
Any help or direction on this will be really helpful.
Here's a way to achieve this using analytic functions.
INSERT INTO tableb
WITH tablea_grouped1
AS (SELECT account_no,
balance,
as_on_date,
SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
OVER (PARTITION BY account_no ORDER BY as_on_date)
grp
FROM tablea),
tablea_grouped2
AS (SELECT account_no,
balance,
as_on_date,
grp,
LAST_VALUE (
balance)
OVER (
PARTITION BY account_no, grp
ORDER BY as_on_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
closing_balance
FROM tablea_grouped1
WHERE balance < 0
AND grp != 0 --keep this, if starting negative balance is to be ignored
)
SELECT account_no,
closing_balance,
MAX (as_on_date),
MAX (as_on_date) - MIN (as_on_date) + 1,
MIN (as_on_date)
FROM tablea_grouped2
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN (as_on_date);
First, SUM is used as an analytic function to assign a group number to each run of consecutive negative balances: the counter increases at every non-negative balance, so the negative balances that follow it share its group number.
The LAST_VALUE function is then used to find the last negative balance in each group.
Finally, the result is aggregated per group. MAX(as_on_date) gives the last date, MIN(as_on_date) gives the start date, and their difference plus one gives the number of days.
Demo at sqlfiddle.
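To see the grouping on the sample data, here is a sketch of the same SUM / LAST_VALUE / GROUP BY steps in SQLite via Python's sqlite3 (assuming 3.25+). Since SQLite has no DATE type, dates are stored as ISO text and the day count uses julianday() instead of Oracle's date subtraction; the helper negative_runs_report is hypothetical.

```python
import sqlite3

# (account_no, balance, as_on_date) from the question, with ISO dates (assumption).
BALANCES = [
    (1001, -100, "2013-01-01"),
    (1001, -150, "2013-01-02"),
    (1001, 200, "2013-01-03"),
    (1001, -250, "2013-01-04"),
    (1001, -300, "2013-01-05"),
    (1001, -310, "2013-01-06"),
]

QUERY = """
WITH grouped AS (
    SELECT account_no, balance, as_on_date,
           -- each non-negative balance bumps the counter; the negative run after it shares the id
           SUM(CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY account_no ORDER BY as_on_date) AS grp
    FROM tablea
),
negative_runs AS (
    SELECT account_no, grp, as_on_date, balance,
           LAST_VALUE(balance) OVER (
               PARTITION BY account_no, grp ORDER BY as_on_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS closing_balance
    FROM grouped
    WHERE balance < 0 AND grp != 0  -- skip a run that starts before any non-negative balance
)
SELECT account_no,
       closing_balance,
       MAX(as_on_date) AS as_on_date,
       CAST(julianday(MAX(as_on_date)) - julianday(MIN(as_on_date)) + 1 AS INTEGER) AS days_passed,
       MIN(as_on_date) AS start_date
FROM negative_runs
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, start_date
"""


def negative_runs_report(rows):
    """Return (account_no, balance, as_on_date, days_passed, start_date) per negative run."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablea (account_no INTEGER, balance INTEGER, as_on_date TEXT)")
    con.executemany("INSERT INTO tablea VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(negative_runs_report(BALANCES))  # [(1001, -310, '2013-01-06', 3, '2013-01-04')]
```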
Try this, and use gone_negative to compute the required column value when inserting into the other table:
select temp.account_no,
temp.balance,
temp.prev_balance,
temp.as_on_date,
temp.prev_as_on_date,
case
WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
1
else
0
end as gone_negative
from (select account_no,
balance,
as_on_date,
lag(balance, 1, 0) OVER(partition by account_no ORDER BY as_on_date) prev_balance,
lag(as_on_date, 1) OVER(partition by account_no ORDER BY as_on_date) prev_as_on_date
from tblA
order by account_no, as_on_date) temp;
Hope this helps pal.
Here's one way to do it.
Select all records from my_table where the balance is positive.
Do a self-join to get all the records whose as_on_date is later than the current row's and whose balance is lower than the current row's.
Once we have these, we cut off the rows where the difference between the current and the previous as_on_date is greater than 1. We then filter the results in an outer subquery.
The final select groups the filtered rows and gets their min and max values.
Query:
SELECT
account_no,
min(case when row_number = 1 then balance end) as balance,
max(mt2_date) as As_on_date,
max(mt2_date) - mt1_date as Days_passed,
min(mt2_date) as Start_date
FROM
(
SELECT
*,
MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
FROM
(
SELECT
mt1.account_no,
mt2.balance,
mt1.as_on_date as mt1_date,
mt2.as_on_date as mt2_date,
case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
FROM
my_table mt1
JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
WHERE
MT1.balance > 0
order by
mt1.as_on_date,
mt2.as_on_date ) sub_query
) T
WHERE
min_break_date is null
OR mt2_date < min_break_date
GROUP BY
mt1_date,
account_no
SQLFIDDLE
I have added a few more rows in the fiddle, just to test it out.