Identify New Seller (without buying in recent 3 months) - google-bigquery

In BigQuery, I have a table of user transactions with 3 columns: Month, Date, ID.
Here is the example
I want to identify which IDs are new sellers in each month. A new seller is defined as a seller without buying in the most recent 3 months.
I tried numbering the rows with row_number, ordered by date and ID, reasoning that any row whose row_number is not in (2,3,4) is a new seller. However, an ID can skip a month and buy again the following month, and my code doesn't handle that situation.
Could you please help me solve this problem? Thank you very much.

Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
COUNT(1) OVER(
PARTITION BY id
ORDER BY DATE_DIFF(`date`, '2000-01-01', MONTH)
RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING
) = 0 AS new_seller
FROM `project.dataset.table`
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Mar-19' month, DATE '2019-03-01' `date`, 1 id UNION ALL
SELECT 'Mar-19', '2019-03-03', 2 UNION ALL
SELECT 'Mar-19', '2019-03-04', 3 UNION ALL
SELECT 'Apr-19', '2019-04-05', 3 UNION ALL
SELECT 'Apr-19', '2019-04-06', 4 UNION ALL
SELECT 'Apr-19', '2019-04-07', 5 UNION ALL
SELECT 'May-19', '2019-05-03', 3 UNION ALL
SELECT 'May-19', '2019-05-04', 6 UNION ALL
SELECT 'May-19', '2019-05-05', 5 UNION ALL
SELECT 'Jun-19', '2019-06-06', 1 UNION ALL
SELECT 'Jun-19', '2019-06-07', 7 UNION ALL
SELECT 'Jun-19', '2019-06-08', 8 UNION ALL
SELECT 'Jun-19', '2019-06-09', 9 UNION ALL
SELECT 'Jul-19', '2019-07-05', 2 UNION ALL
SELECT 'Jul-19', '2019-07-06', 5 UNION ALL
SELECT 'Jul-19', '2019-07-07', 9
)
SELECT *,
COUNT(1) OVER(
PARTITION BY id
ORDER BY DATE_DIFF(`date`, '2000-01-01', MONTH)
RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING
) = 0 AS new_seller
FROM `project.dataset.table`
-- ORDER BY `date`
with the output below
Row month date id new_seller
1 Mar-19 2019-03-01 1 true
2 Mar-19 2019-03-03 2 true
3 Mar-19 2019-03-04 3 true
4 Apr-19 2019-04-05 3 false
5 Apr-19 2019-04-06 4 true
6 Apr-19 2019-04-07 5 true
7 May-19 2019-05-03 3 false
8 May-19 2019-05-04 6 true
9 May-19 2019-05-05 5 false
10 Jun-19 2019-06-06 1 false
11 Jun-19 2019-06-07 7 true
12 Jun-19 2019-06-08 8 true
13 Jun-19 2019-06-09 9 true
14 Jul-19 2019-07-05 2 false
15 Jul-19 2019-07-06 5 false
16 Jul-19 2019-07-07 9 false
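As a cross-check outside BigQuery, the window frame in the query above can be mirrored in a few lines of Python. This is only a sketch of the same rule, not part of the original answer: the function name `flag_new_sellers` and the `lookback` parameter are made up for illustration, and the default of 4 months is copied from the `RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING` frame.

```python
from datetime import date

def flag_new_sellers(rows, lookback=4):
    """rows: list of (day, seller_id). Returns one bool per row, in input order."""
    def month_index(d):
        # Same role as DATE_DIFF(date, '2000-01-01', MONTH): a running month number.
        return d.year * 12 + d.month

    # Months in which each id has at least one transaction.
    active_months = {}
    for day, seller_id in rows:
        active_months.setdefault(seller_id, set()).add(month_index(day))

    flags = []
    for day, seller_id in rows:
        current = month_index(day)
        # Any activity in months current-lookback .. current-1 means "not new".
        recent = any(current - k in active_months[seller_id]
                     for k in range(1, lookback + 1))
        flags.append(not recent)
    return flags

# Sample data from the question, in the same order as the output table.
sample = [
    (date(2019, 3, 1), 1), (date(2019, 3, 3), 2), (date(2019, 3, 4), 3),
    (date(2019, 4, 5), 3), (date(2019, 4, 6), 4), (date(2019, 4, 7), 5),
    (date(2019, 5, 3), 3), (date(2019, 5, 4), 6), (date(2019, 5, 5), 5),
    (date(2019, 6, 6), 1), (date(2019, 6, 7), 7), (date(2019, 6, 8), 8),
    (date(2019, 6, 9), 9), (date(2019, 7, 5), 2), (date(2019, 7, 6), 5),
    (date(2019, 7, 7), 9),
]

for (day, seller_id), is_new in zip(sample, flag_new_sellers(sample)):
    print(day, seller_id, is_new)
```

Changing `lookback` is the equivalent of changing the lower bound of the frame, which is the knob to adjust if "recent 3 months" should be read more strictly.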

Related

How to find first time a price has changed in SQL

I have a table that contains an item ID, the date and the price. All items show their price for each day, but I want only to select the items that have not had their price change, and to show the days without change.
An example of the table is
id Price Day Month Year
asdf 10 03 11 2022
asdr1 8 03 11 2022
asdf 10 02 11 2022
asdr1 8 02 11 2022
asdf 10 01 11 2022
asdr1 7 01 11 2022
asdf 9 31 10 2022
asdr1 8 31 10 2022
asdf 8 31 10 2022
asdr1 8 31 10 2022
The output I want is:
Date id Last_Price First_Price_Appearance DaysWOchange
2022-11-03 asdf 10 2022-11-01 2
2022-11-03 asdr1 8 2022-11-02 1
The solution needs to run quickly, so what are some efficient ways to solve this, given that the table has millions of rows and some items have not changed their price in years?
The efficiency problem arises because, for each id, I would need to scan the entire table looking for the first row in which the price changed, and repeat this for thousands of items.
I am currently calculating the difference between the current last price and all of the history, but this becomes slow to process and may take several minutes for the full history.
The main concern for this problem is efficiency.
CREATE TABLE #table (id NVARCHAR(5), Price INT, Date DATE);
INSERT INTO #table (id, Price, Date) VALUES
('asdf', 10, '2022-10-20'),
('asdr1', 8, '2022-10-15'),
('asdf', 10, '2022-11-03'),
('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-02'),
('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-01'),
('asdr1', 7, '2022-11-01'),
('asdf', 9, '2022-10-31'),
('asdr1', 8, '2022-10-31'),
('asdf', 8, '2022-10-31'),
('asdr1', 8, '2022-10-31')
Tables of data are useful, but they're even more so if you can put the demo data into an object.
SELECT id, FirstDate, LastChange, DaysSinceChange, Price
FROM (
SELECT id, MIN(Date) OVER (PARTITION BY id ORDER BY Date) AS FirstDate, Date AS LastChange, Price,
CASE WHEN LEAD(Date,1) OVER (PARTITION BY id ORDER BY Date) IS NULL THEN DATEDIFF(DAY,Date,CURRENT_TIMESTAMP)
ELSE DATEDIFF(DAY,LAG(Date) OVER (PARTITION BY id ORDER BY Date),Date)
END AS DaysSinceChange, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
FROM #table
) a
WHERE rn = 1
This is a quick way to get what you want. If you execute the subquery by itself you can see all the history.
id FirstDate LastChange Price DaysSinceChange
-------------------------------------------------------
asdf 2022-10-20 2022-11-03 10 0
asdr1 2022-10-15 2022-11-02 8 1
SELECT id, MIN(Date) OVER (PARTITION BY id ORDER BY Date) AS FirstDate, Date AS LastChange, Price,
CASE WHEN LEAD(Date,1) OVER (PARTITION BY id ORDER BY Date) IS NULL THEN DATEDIFF(DAY,Date,CURRENT_TIMESTAMP)
ELSE DATEDIFF(DAY,LAG(Date) OVER (PARTITION BY id ORDER BY Date),Date)
END AS DaysSinceChange, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
FROM #table
id FirstDate LastChange Price DaysSinceChange rn
------------------------------------------------------
asdf 2022-10-20 2022-11-03 10 0 1
asdf 2022-10-20 2022-11-02 10 1 2
asdf 2022-10-20 2022-11-01 10 1 3
asdf 2022-10-20 2022-10-31 9 11 4
asdf 2022-10-20 2022-10-31 8 0 5
asdf 2022-10-20 2022-10-20 10 NULL 6
asdr1 2022-10-15 2022-11-02 8 1 1
asdr1 2022-10-15 2022-11-02 8 1 2
asdr1 2022-10-15 2022-11-01 7 1 3
asdr1 2022-10-15 2022-10-31 8 16 4
asdr1 2022-10-15 2022-10-31 8 0 5
asdr1 2022-10-15 2022-10-15 8 NULL 6
You can use lag() and a window max() over each id:
select id, date, price
from (select t.*,
max(case when price <> lag_price then date end) over (partition by id) as price_change_date
from (select t.*, lag(price) over (partition by id order by date) as lag_price
from t
) t
) t
where price_change_date is null;
This calculates the most recent date on which the price changed for each id (NULL if it never changed). It then keeps only the rows of ids whose price never changed.
The use of window functions should be highly efficient, taking advantage of indexes on (id, date) and (id, price, date).
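To make the target of these queries concrete, here is a small Python sketch of the "days without change" calculation the question asks for. It is an illustration only: the function name `current_price_runs` is hypothetical, and the sample rows are a simplified version of the question's table with one price per id per day.

```python
from datetime import date
from itertools import groupby

def current_price_runs(rows):
    """rows: iterable of (item_id, price, day).

    Returns {item_id: (last_day, last_price, first_day_of_run, days_without_change)}.
    """
    result = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for item_id, group in groupby(ordered, key=lambda r: r[0]):
        history = list(group)
        _, last_price, last_day = history[-1]
        first_day = last_day
        # Walk backwards from the latest row until the price differs.
        for _, price, day in reversed(history):
            if price != last_price:
                break
            first_day = day
        result[item_id] = (last_day, last_price, first_day,
                           (last_day - first_day).days)
    return result

sample = [
    ("asdf", 9, date(2022, 10, 31)), ("asdr1", 8, date(2022, 10, 31)),
    ("asdf", 10, date(2022, 11, 1)), ("asdr1", 7, date(2022, 11, 1)),
    ("asdf", 10, date(2022, 11, 2)), ("asdr1", 8, date(2022, 11, 2)),
    ("asdf", 10, date(2022, 11, 3)), ("asdr1", 8, date(2022, 11, 3)),
]

for item_id, summary in current_price_runs(sample).items():
    print(item_id, summary)
```

The backward walk stops at the first differing price, so an item that last changed recently is cheap to process even with years of history; that is the property an index on (id, date) would give the SQL versions.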

Count number of events before certain events until another event is encountered in the same group in Big query?

I want to count certain events until a specific event occurs, in SQL. This is very similar to this question:
Count number of events before and after a event “A” till another event “A” is encountered in Big query?
The answer to that question didn't solve my problem, and I am also confused by its use of RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING. Unlike that question, I am not looking for strings; I want to count other events.
My table would be like
User Event Day
1 C 2019-01-10
1 B 2019-01-11
1 D 2019-01-12
1 A 2019-01-13
2 D 2019-01-10
2 B 2019-01-11
2 C 2019-01-12
2 D 2019-01-13
2 A 2019-01-14
2 E 2019-01-15
I would like to count C or D events until event A or B occurs.
I tried
COUNTIF(Event = 'C' OR Event = 'D') OVER(PARTITION BY User ORDER BY Day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS count_events
But this doesn't stop counting at event A or B; it counts all preceding C or D events in the partition.
My result table should look like this: the count stops when event A or B occurs and restarts from zero afterwards.
User Event Day count_events
1 C 2019-01-10 0
1 B 2019-01-11 1
1 D 2019-01-12 0
1 A 2019-01-13 1
2 D 2019-01-10 0
2 B 2019-01-11 1
2 C 2019-01-12 0
2 D 2019-01-13 1
2 A 2019-01-14 2
2 E 2019-01-15 0
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(grp),
COUNTIF(event IN ('C', 'D'))
OVER(PARTITION BY user, grp ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) count_events
FROM (
SELECT *,
COUNTIF(event IN ('A', 'B'))
OVER(PARTITION BY user ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) grp
FROM `project.dataset.table`
)
Applied to the sample data in your question, the result is
Row user event day count_events
1 1 C 2019-01-10 0
2 1 B 2019-01-11 1
3 1 D 2019-01-12 0
4 1 A 2019-01-13 1
5 2 D 2019-01-10 0
6 2 B 2019-01-11 1
7 2 C 2019-01-12 0
8 2 D 2019-01-13 1
9 2 A 2019-01-14 2
10 2 E 2019-01-15 0
You can test and play with the above using the query below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 user, 'C' event, DATE '2019-01-10' day UNION ALL
SELECT 1, 'B', '2019-01-11' UNION ALL
SELECT 1, 'D', '2019-01-12' UNION ALL
SELECT 1, 'A', '2019-01-13' UNION ALL
SELECT 2, 'D', '2019-01-10' UNION ALL
SELECT 2, 'B', '2019-01-11' UNION ALL
SELECT 2, 'C', '2019-01-12' UNION ALL
SELECT 2, 'D', '2019-01-13' UNION ALL
SELECT 2, 'A', '2019-01-14' UNION ALL
SELECT 2, 'E', '2019-01-15'
)
SELECT * EXCEPT(grp),
COUNTIF(event IN ('C', 'D'))
OVER(PARTITION BY user, grp ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) count_events
FROM (
SELECT *,
COUNTIF(event IN ('A', 'B'))
OVER(PARTITION BY user ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) grp
FROM `project.dataset.table`
)
-- ORDER BY user, day
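The `grp` column in the query above is a running count of A/B events seen before each row, which splits each user's history into segments; the outer COUNTIF then restarts inside every segment. A rough Python equivalent, with a hypothetical function name `count_before_stop`, may make that reset behaviour easier to see:

```python
def count_before_stop(events, counted=("C", "D"), stops=("A", "B")):
    """events: list of (user, event, day) with ISO-formatted day strings.

    Returns (user, event, day, count_events) tuples ordered by user, day.
    """
    out = []
    current_user = None
    running = 0
    for user, event, day in sorted(events, key=lambda e: (e[0], e[2])):
        if user != current_user:
            # New user: equivalent to starting a new PARTITION BY user.
            current_user = user
            running = 0
        # Emit the count of C/D rows strictly before this row in the segment.
        out.append((user, event, day, running))
        if event in counted:
            running += 1
        if event in stops:
            # A or B closes the segment, like grp increasing on the next row.
            running = 0
    return out

sample = [
    (1, "C", "2019-01-10"), (1, "B", "2019-01-11"),
    (1, "D", "2019-01-12"), (1, "A", "2019-01-13"),
    (2, "D", "2019-01-10"), (2, "B", "2019-01-11"),
    (2, "C", "2019-01-12"), (2, "D", "2019-01-13"),
    (2, "A", "2019-01-14"), (2, "E", "2019-01-15"),
]

for row in count_before_stop(sample):
    print(*row)
```

The sketch assumes one event per user per day, as in the sample; ties on `day` would need an extra ordering column in both the SQL and the Python version.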
I don't want to count last event C because it didn't occur before event A but after event A
Below is a quick "fix"
#standardSQL
SELECT * EXCEPT(grp),
COUNTIF(event IN ('A', 'B')) OVER(PARTITION BY user, grp) *
COUNTIF(event IN ('C', 'D'))
OVER(PARTITION BY user, grp ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) count_events
FROM (
SELECT *,
COUNTIF(event IN ('A', 'B'))
OVER(PARTITION BY user ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) grp
FROM `project.dataset.table`
)
-- ORDER BY user, day
Applied to the most recent example you used, the result is
Row user event day count_events
1 1 C 2019-01-10 0
2 1 B 2019-01-11 1
3 1 D 2019-01-12 0
4 1 A 2019-01-13 1
5 2 D 2019-01-10 0
6 2 B 2019-01-11 1
7 2 C 2019-01-12 0
8 2 D 2019-01-13 1
9 2 A 2019-01-14 2
10 2 C 2019-01-15 0
11 2 E 2019-01-16 0

Using the earliest date of a partition to determine what other dates belong to that partition

Assume this is my table:
ID DATE
--------------
1 2018-11-12
2 2018-11-13
3 2018-11-14
4 2018-11-15
5 2018-11-16
6 2019-03-05
7 2019-05-07
8 2019-05-08
9 2019-05-08
I need partitions to be determined by the first date in the partition: any date within 2 days of the first date belongs to the same partition.
The table would end up looking like this if each partition was ranked
PARTITION ID DATE
------------------------
1 1 2018-11-12
1 2 2018-11-13
1 3 2018-11-14
2 4 2018-11-15
2 5 2018-11-16
3 6 2019-03-05
4 7 2019-05-07
4 8 2019-05-08
4 9 2019-05-08
I've tried using datediff with lag to compare to the previous date but that would allow a partition to be inappropriately sized based on spacing, for example all of these dates would be included in the same partition:
ID DATE
--------------
1 2018-11-12
2 2018-11-14
3 2018-11-16
4 2018-11-18
5 2018-11-20
6 2018-11-22
Previous flawed attempt:
Mark when a date is more than 2 days past the previous date:
(case when datediff(day, lag(event_time, 1) over (partition by user_id, stage order by event_time), event_time) > 2 then 1 else 0 end)
You need to use a recursive CTE for this, so the operation is expensive.
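with numbered as (
-- add an incrementing column with no gaps
-- (named differently from the base table t so the CTE does not reference itself)
select t.*, row_number() over (order by date) as seqnum
from t
),
cte as (
-- anchor: the earliest row starts the first partition
select id, date, date as mindate, seqnum
from numbered
where seqnum = 1
union all
-- step: keep the current partition start while the row is within 2 days of it
select n.id, n.date,
(case when n.date <= dateadd(day, 2, cte.mindate)
then cte.mindate else n.date
end) as mindate,
n.seqnum
from cte join
numbered n
on n.seqnum = cte.seqnum + 1
)
-- number the partitions by their start date
select cte.*, dense_rank() over (order by mindate) as partition_num
from cte;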
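The reason a recursive CTE is needed is that each row's partition depends on the partition start chosen for the previous row, which a plain lag() cannot express. The same carry-forward logic is a single pass in procedural code. A minimal Python sketch, with a hypothetical function name `assign_partitions`:

```python
from datetime import date, timedelta

def assign_partitions(dates, window_days=2):
    """Return (partition_number, date) pairs for the dates in ascending order."""
    partitions = []
    anchor = None   # first date of the current partition
    number = 0
    for d in sorted(dates):
        # Start a new partition when the date falls outside the anchor's window.
        if anchor is None or d > anchor + timedelta(days=window_days):
            anchor = d
            number += 1
        partitions.append((number, d))
    return partitions

sample = [
    date(2018, 11, 12), date(2018, 11, 13), date(2018, 11, 14),
    date(2018, 11, 15), date(2018, 11, 16), date(2019, 3, 5),
    date(2019, 5, 7), date(2019, 5, 8), date(2019, 5, 8),
]

for number, d in assign_partitions(sample):
    print(number, d)
```

Because the comparison is always against the partition's first date rather than the previous row, evenly spaced dates two days apart no longer chain into one long partition.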

Oracle SQL Developer Subscriber - Creating a Cross Table

I have a table UPCALL_HISTORY that has 3 columns: SUBSCRIBER_ID, START_DATE and END_DATE. Let the number of unique subscribers be N.
I want to create a new table with 3 columns:
SUBSCRIBER_ID: All of the unique subscriber ids repeated 36 times in a row.
MONTHLY_CALENDAR_ID: For each SUBSCRIBER_ID, this column will have dates listed from July 2015 until July 2018 (36 months).
ACTIVE: This column will be used as a flag for each subscriber and whether they have a subscription during that month. This subscription data is in a table called UPCALL_HISTORY.
I am fairly new to SQL and don't have much experience. I am good at Python, but it seems that SQL doesn't work like Python.
Any query ideas that could help me build this table?
Let my UPCALL_HISTORY table be:
+---------------+------------+------------+
| SUBSCRIBER_ID | START_DATE | END_DATE |
+---------------+------------+------------+
| 119 | 01/07/2015 | 01/08/2015 |
| 120 | 01/08/2015 | 01/09/2015 |
| 121 | 01/09/2015 | 01/10/2015 |
+---------------+------------+------------+
I want a table that looks like:
+---------------+------------+--------+
| SUBSCRIBER_ID | MON_CA | ACTIVE |
+---------------+------------+--------+
| 119 | 01/07/2015 | 1 |
| * | 01/08/2015 | 0 |
| * | 01/09/2015 | 0 |
| (36 times) | 01/10/2015 | 0 |
| * | * | 0 |
| 119 | 01/07/2018 | 0 |
+---------------+------------+--------+
that continues for 120 and 121
EDIT: Added Example
Here's how I understood the question.
Sample table and several rows:
SQL> create table upcall_history
2 (subscriber_id number,
3 start_date date,
4 end_date date);
Table created.
SQL> insert into upcall_history
2 select 1, date '2015-12-25', date '2016-01-13' from dual union
3 select 1, date '2017-07-10', date '2017-07-11' from dual union
4 select 2, date '2018-01-01', date '2018-04-24' from dual;
3 rows created.
Create a new table. For distinct SUBSCRIBER_ID's, it creates 36 "monthly" rows, fixed (as you stated).
SQL> create table new_table as
2 select
3 x.subscriber_id,
4 add_months(date '2015-07-01', column_value - 1) monthly_calendar_id,
5 0 active
6 from (select distinct subscriber_id from upcall_history) x,
7 table(cast(multiset(select level from dual
8 connect by level <= 36
9 ) as sys.odcinumberlist));
Table created.
Update ACTIVE column value to "1" for rows whose MONTHLY_CALENDAR_ID is contained in START_DATE and END_DATE of the UPCALL_HISTORY table.
SQL> merge into new_table n
2 using (select subscriber_id, start_date, end_date from upcall_history) x
3 on ( n.subscriber_id = x.subscriber_id
4 and n.monthly_calendar_id between trunc(x.start_date, 'mm')
5 and trunc(x.end_date, 'mm')
6 )
7 when matched then
8 update set n.active = 1;
7 rows merged.
SQL>
Result (only ACTIVE = 1):
SQL> select * from new_table
2 where active = 1
3 order by subscriber_id, monthly_calendar_id;
SUBSCRIBER_ID MONTHLY_CA ACTIVE
------------- ---------- ----------
1 2015-12-01 1
1 2016-01-01 1
1 2017-07-01 1
2 2018-01-01 1
2 2018-02-01 1
2 2018-03-01 1
2 2018-04-01 1
7 rows selected.
SQL>
If you're on 12c you can use an inline view of all the months with cross apply to get the combinations of those with all IDs:
select uh.subscriber_id, m.month,
case when trunc(uh.start_date, 'MM') <= m.month
and (uh.end_date is null or uh.end_date >= add_months(m.month, 1))
then 1 else 0 end as active
from upcall_history uh
cross apply (
select add_months(trunc(sysdate, 'MM'), - level) as month
from dual
connect by level <= 36
) m
order by uh.subscriber_id, m.month;
I've made it a rolling 36-months window up to the current month, but you may actually want fixed dates as you had in the question.
With sample data from a CTE:
with upcall_history (subscriber_id, start_date, end_date) as (
select 1, date '2015-09-04', date '2015-12-15' from dual
union all select 2, date '2017-12-04', date '2018-05-15' from dual
)
that generates 72 rows:
SUBSCRIBER_ID MONTH ACTIVE
------------- ---------- ----------
1 2015-07-01 0
1 2015-08-01 0
1 2015-09-01 1
1 2015-10-01 1
1 2015-11-01 1
1 2015-12-01 0
1 2016-01-01 0
...
2 2017-11-01 0
2 2017-12-01 1
2 2018-01-01 1
2 2018-02-01 1
2 2018-03-01 1
2 2018-04-01 1
2 2018-05-01 0
2 2018-06-01 0
You can use that to create a new table, or populate an existing table; though if you do want a rolling window then a view might be more appropriate.
If you aren't on 12c then cross apply isn't available - you'll get "ORA-00905: missing keyword".
You can get the same result with two CTEs (one to get all the months, the other to get all the IDs) cross-joined, and then outer joined to your actual data:
with m (month) as (
select add_months(trunc(sysdate, 'MM'), - level)
from dual
connect by level <= 36
),
i (subscriber_id) as (
select distinct subscriber_id
from upcall_history
)
select i.subscriber_id, m.month,
case when uh.subscriber_id is null then 0 else 1 end as active
from m
cross join i
left join upcall_history uh
on uh.subscriber_id = i.subscriber_id
and trunc(uh.start_date, 'MM') <= m.month
and (uh.end_date is null or uh.end_date >= add_months(m.month, 1))
order by i.subscriber_id, m.month;
You can do this in 11g using Partitioned Outer Joins, like so:
WITH upcall_history AS (SELECT 119 subscriber_id, to_date('01/07/2015', 'dd/mm/yyyy') start_date, to_date('01/08/2015', 'dd/mm/yyyy') end_date FROM dual UNION ALL
SELECT 120 subscriber_id, to_date('01/08/2015', 'dd/mm/yyyy') start_date, to_date('01/09/2015', 'dd/mm/yyyy') end_date FROM dual UNION ALL
SELECT 121 subscriber_id, to_date('01/09/2015', 'dd/mm/yyyy') start_date, to_date('01/10/2015', 'dd/mm/yyyy') end_date FROM dual),
mnths AS (SELECT add_months(TRUNC(SYSDATE, 'mm'), + 1 - LEVEL) mnth
FROM dual
CONNECT BY LEVEL <= 12 * 3 + 1)
SELECT uh.subscriber_id,
m.mnth,
CASE WHEN mnth BETWEEN start_date AND end_date - 1 THEN 1 ELSE 0 END active
FROM mnths m
LEFT OUTER JOIN upcall_history uh PARTITION BY (uh.subscriber_id) ON (1=1)
ORDER BY uh.subscriber_id,
m.mnth;
SUBSCRIBER_ID MNTH ACTIVE
------------- ----------- ----------
119 01/07/2015 1
119 01/08/2015 0
119 01/09/2015 0
119 01/10/2015 0
<snip>
119 01/06/2018 0
119 01/07/2018 0
--
120 01/07/2015 0
120 01/08/2015 1
120 01/09/2015 0
120 01/10/2015 0
<snip>
120 01/06/2018 0
120 01/07/2018 0
--
121 01/07/2015 0
121 01/08/2015 0
121 01/09/2015 1
121 01/10/2015 0
<snip>
121 01/06/2018 0
121 01/07/2018 0
N.B. I have assumed some things about your start/end dates and what constitutes active; hopefully it should be easy enough for you to tweak the case statement to fit the logic that works best for your situation.
I also believe this can be an example of a CROSS JOIN. All I had to do was create a small table of all of the dates and then CROSS JOIN it with the table of subscribers.
Example: https://www.essentialsql.com/cross-join-introduction/
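Since the question mentions being more comfortable in Python, the cross-join idea can be sketched there as well: build the list of months, pair every subscriber with every month, and flag the pairs that overlap a subscription. This is only an illustration; the names `month_starts` and `activity_grid` are made up, and the "active" rule assumed here is the truncated-month BETWEEN rule from the MERGE answer above.

```python
from datetime import date

def month_starts(first, count):
    """Return `count` consecutive first-of-month dates starting at `first`."""
    months = []
    year, month = first.year, first.month
    for _ in range(count):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def activity_grid(history, first_month, month_count):
    """history: list of (subscriber_id, start_date, end_date).

    Returns (subscriber_id, month, active) for every subscriber/month pair.
    """
    months = month_starts(first_month, month_count)
    ids = sorted({sid for sid, _, _ in history})
    grid = []
    for sid in ids:              # the cross join: every id ...
        for m in months:         # ... with every month
            active = any(
                s == sid
                and date(start.year, start.month, 1) <= m
                and m <= date(end.year, end.month, 1)
                for s, start, end in history
            )
            grid.append((sid, m, 1 if active else 0))
    return grid

history = [
    (1, date(2015, 12, 25), date(2016, 1, 13)),
    (1, date(2017, 7, 10), date(2017, 7, 11)),
    (2, date(2018, 1, 1), date(2018, 4, 24)),
]

grid = activity_grid(history, date(2015, 7, 1), 36)
for sid, m, active in grid:
    if active:
        print(sid, m, active)
```

With the sample rows from the first answer this yields 72 rows in total, 7 of them active, matching the "7 rows merged" result.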

How to find regions where total of their sale exceeded 60%

I have an interest_summary table with two columns:
int_rate number,
total_balance number
example
10.25 50
10.50 100
10.75 240
11.00 20
My query should return two columns, or a string like 10.50 to 10.75, because the balances of those rows added together exceed 60% of the overall total.
Could you suggest the logic in Oracle?
select
min(int_rate),
max(int_rate)
from
(
select
int_rate,
nvl(sum(total_balance) over(
order by total_balance desc
rows between unbounded preceding and 1 preceding
),0) as part_sum
from interest_summary
)
where
part_sum < (select 0.6*sum(total_balance) from interest_summary)
I'm assuming that you're selecting the rows based on the following algorithm:
1. Sort your rows by total_balance (descending).
2. Select the highest total_balance row remaining.
3. If its total_balance added to the running total of the total balance is under 60%, add it to the pool and get the next row (step 2).
4. If not, add the row to the pool and return.
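The selection steps above can be sketched in a few lines of Python to check the expected answer before writing the SQL. This is an illustration of the assumed algorithm only; the function name `rates_covering` and the `threshold` parameter are hypothetical.

```python
def rates_covering(rows, threshold=0.6):
    """rows: list of (int_rate, total_balance).

    Picks rows by descending balance until the running total reaches
    `threshold` of the grand total, then returns (min_rate, max_rate).
    """
    grand_total = sum(balance for _, balance in rows)
    picked = []
    running = 0
    for rate, balance in sorted(rows, key=lambda r: r[1], reverse=True):
        picked.append(rate)          # the row is added to the pool either way
        running += balance
        if running >= threshold * grand_total:
            break                    # first row that reaches the limit ends the loop
    return min(picked), max(picked)

sample = [(10.25, 50), (10.50, 100), (10.75, 240), (11.00, 20)]
print(rates_covering(sample))
```

For the sample data the loop picks the 240 row (about 58.5% of 410) and then the 100 row (about 82.9%), so the returned range is 10.5 to 10.75.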
The sorted running total looks like this (I'll number the rows so that it's easier to understand what happens):
SQL> WITH data AS (
2 SELECT 1 id, 10.25 interest_rate, 50 total_balance FROM DUAL
3 UNION ALL SELECT 2 id, 10.50 interest_rate, 100 total_balance FROM DUAL
4 UNION ALL SELECT 3 id, 10.75 interest_rate, 240 total_balance FROM DUAL
5 UNION ALL SELECT 4 id, 11.00 interest_rate, 20 total_balance FROM DUAL
6 )
7 SELECT id, interest_rate,
8 SUM(total_balance) OVER (ORDER BY total_balance DESC) running_total,
9 SUM(total_balance) OVER (ORDER BY total_balance DESC)
10 /
11 SUM(total_balance) OVER () * 100 pct_running_total
12 FROM data
13 ORDER BY 3;
ID INTEREST_RATE RUNNING_TOTAL PCT_RUNNING_TOTAL
---------- ------------- ------------- -----------------
3 10,75 240 58,5365853658537
2 10,5 340 82,9268292682927
1 10,25 390 95,1219512195122
4 11 410 100
So in this example we must return rows 3 and 2 because row 2 is the first row where its percent running total is above 60%:
SQL> WITH data AS (
2 SELECT 1 id, 10.25 interest_rate, 50 total_balance FROM DUAL
3 UNION ALL SELECT 2 id, 10.50 interest_rate, 100 total_balance FROM DUAL
4 UNION ALL SELECT 3 id, 10.75 interest_rate, 240 total_balance FROM DUAL
5 UNION ALL SELECT 4 id, 11.00 interest_rate, 20 total_balance FROM DUAL
6 )
7 SELECT ID, interest_rate
8 FROM (SELECT ID, interest_rate,
9 SUM(over_limit)
10 OVER(ORDER BY total_balance DESC) over_limit_no
11 FROM (SELECT id,
12 interest_rate,
13 total_balance,
14 CASE
15 WHEN SUM(total_balance)
16 OVER(ORDER BY total_balance DESC)
17 / SUM(total_balance) OVER() * 100 < 60 THEN
18 0
19 ELSE
20 1
21 END over_limit
22 FROM data
23 ORDER BY 3))
24 WHERE over_limit_no <= 1;
ID INTEREST_RATE
---------- -------------
3 10,75
2 10,5