Select max of nested id from amazon redshift - sql

My database is Amazon Redshift.
I have a table that looks like this:
id  nested_id  date          value
1   10         '2021-01-01'  5
1   20         '2021-01-01'  10
1   10         '2021-01-02'  6
1   20         '2021-01-02'  11
1   10         '2021-01-03'  7
1   20         '2021-01-03'  12
2   30         '2021-01-01'  5
2   40         '2021-01-01'  10
2   30         '2021-01-02'  6
2   40         '2021-01-02'  11
2   30         '2021-01-03'  7
2   40         '2021-01-03'  12
So this is basically a table that tracks values by id over time, except that each id can have several nested_ids, and the dates and values are primarily connected to the nested_id.
However, let's say I'm starting with the id field, and for each id I want to return only the rows over time for the nested_id whose maximum value is the greatest.
Right now I'm just grabbing it like this:
select *
from mytable
where id in (1, 2)
except I only want it to return nested_id rows where the maximum value of that nested_id is the greatest.
So here's how I would do this manually.
For id of 1, the maximum value is 12, and the nested_id of that value is 20
For id of 2, the maximum value is 12, and the nested_id of that value is 40
So my return table should be
id  nested_id  date          value
1   20         '2021-01-01'  10
1   20         '2021-01-02'  11
1   20         '2021-01-03'  12
2   40         '2021-01-01'  10
2   40         '2021-01-02'  11
2   40         '2021-01-03'  12
Is there an easy way of performing this query? I'm assuming you have to partition somehow?

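You can solve this with the row_number() window function:
with maxs as (
    -- rn = 1 marks the row holding the highest value for each id
    select id,
           nested_id,
           value,
           row_number() over (partition by id order by value desc) as rn
    from mytable
)
select mt.*
from mytable mt
-- the filter on maxs.rn below discards unmatched rows, so a plain join is enough
join maxs on mt.id = maxs.id and mt.nested_id = maxs.nested_id
where maxs.rn = 1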
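
To make the answer above easy to try out, here is a minimal sketch that loads the sample rows from the question and runs the same row_number() query. It assumes SQLite (through Python's built-in sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for Redshift; the in-memory database and the final `order by` are additions for the demo, not part of the original answer.

```python
import sqlite3

# Assumption: SQLite stands in for Redshift so the query can run locally.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id int, nested_id int, date text, value int)")
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        (1, 10, "2021-01-01", 5), (1, 20, "2021-01-01", 10),
        (1, 10, "2021-01-02", 6), (1, 20, "2021-01-02", 11),
        (1, 10, "2021-01-03", 7), (1, 20, "2021-01-03", 12),
        (2, 30, "2021-01-01", 5), (2, 40, "2021-01-01", 10),
        (2, 30, "2021-01-02", 6), (2, 40, "2021-01-02", 11),
        (2, 30, "2021-01-03", 7), (2, 40, "2021-01-03", 12),
    ],
)

query = """
with maxs as (
    -- rn = 1 is the row with the highest value within each id
    select id, nested_id,
           row_number() over (partition by id order by value desc) as rn
    from mytable
)
select mt.id, mt.nested_id, mt.date, mt.value
from mytable mt
join maxs on mt.id = maxs.id and mt.nested_id = maxs.nested_id
where maxs.rn = 1
order by mt.id, mt.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two nested_ids of the same id tie on the maximum value, row_number() keeps only one of them, and which one is not defined; rank() would keep both.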

Related

To write an Oracle stored procedure to get data between the months

I have a procedure sp_data_between_months (p_from_date DATE, p_to_date DATE), for example with p_from_date = '01-jan-2021' and p_to_date = '31-mar-2021'.
Using PL/SQL, I need to get the latest record for each ID in each month, add these values, and populate the sum against p_to_date for each ID from the table below.
Table Name: ID_Value
ID  Date         value
1   1-jan-2021   10
1   10-jan-2021  20
2   15-jan-2021  15
2   16-jan-2021  20
2   02-feb-2021  10
2   06-feb-2021  15
1   17-feb-2021  10
1   5-mar-2021   15
1   17-mar-2021  10
2   10-mar-2021  10
The expected output is the latest value for each ID in each month, summed over the months in the range and reported against p_to_date.
Output (p_to_date, ID, sum of the latest value in each month):
DATE         ID  VALUE
31-Mar-2021  1   40  //(20+10+10) sum of value of latest record for each month
31-Mar-2021  2   45  //(20+15+10)
Here you are. Read comments within code.
SQL> with
     temp as
       -- analytic function will return 1 for the latest row for that ID in that month
       (select id, datum, value,
               row_number() over (partition by id, trunc(datum, 'mm') order by datum desc) rn
        from id_value
       )
     -- finally, select last day in MAX month and sum all values for RN = 1
     select
       id,
       last_day(max(datum)) datum,
       sum(value)
     from temp
     where rn = 1
     group by id;
        ID DATUM       SUM(VALUE)
---------- ----------- ----------
         1 31-mar-2021         40
         2 31-mar-2021         45
SQL>
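
The query above can be tried outside Oracle with the sketch below. It is only an illustration under stated assumptions: SQLite (via Python's sqlite3, version 3.25 or later) replaces Oracle, dates are stored as ISO text, `strftime('%Y-%m', ...)` plays the role of `trunc(datum, 'mm')`, and the two bind parameters named after the procedure's arguments are an addition, since the answer itself does not filter on the date range.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; dates are kept as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("create table id_value (id int, datum text, value int)")
conn.executemany(
    "insert into id_value values (?, ?, ?)",
    [
        (1, "2021-01-01", 10), (1, "2021-01-10", 20),
        (2, "2021-01-15", 15), (2, "2021-01-16", 20),
        (2, "2021-02-02", 10), (2, "2021-02-06", 15),
        (1, "2021-02-17", 10), (1, "2021-03-05", 15),
        (1, "2021-03-17", 10), (2, "2021-03-10", 10),
    ],
)

query = """
with temp as (
    -- rn = 1 is the latest row for that id within that calendar month
    select id, datum, value,
           row_number() over (
               partition by id, strftime('%Y-%m', datum)
               order by datum desc
           ) as rn
    from id_value
    where datum between :p_from_date and :p_to_date
)
-- report the per-month latest values, summed, against p_to_date
select :p_to_date as datum, id, sum(value) as total
from temp
where rn = 1
group by id
order by id
"""
params = {"p_from_date": "2021-01-01", "p_to_date": "2021-03-31"}
totals = conn.execute(query, params).fetchall()
for row in totals:
    print(row)
```

Reporting the bind parameter instead of `last_day(max(datum))` keeps the output date equal to p_to_date even when an ID has no rows in the final month.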

Estimation of Cumulative value every 3 months in SQL

I have a table like this:
ID Date Prod
1 1/1/2009 5
1 2/1/2009 5
1 3/1/2009 5
1 4/1/2009 5
1 5/1/2009 5
1 6/1/2009 5
1 7/1/2009 5
1 8/1/2009 5
1 9/1/2009 5
And I need to get the following result:
ID Date Prod CumProd
1 2009/03/01 5 15 ---Each 3 months
1 2009/06/01 5 30 ---Each 3 months
1 2009/09/01 5 45 ---Each 3 months
What could be the best approach to take in SQL?
You can try the following, using window functions:
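select * from
(
    select *,
           -- rows of the same quarter are peers, so each gets the total up to the end of its quarter
           sum(prod) over (order by DATEPART(qq, dateval)) as cum_sum,
           -- order descending so that rn = 1 is the last month of each quarter
           row_number() over (partition by DATEPART(qq, dateval) order by dateval desc) as rn
    from t
) A
where rn = 1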
How about just filtering on the month number?
select t.*
from (select id, date, prod,
             sum(prod) over (partition by id order by date) as running_prod
      from t
     ) t
where month(date) in (3, 6, 9, 12);
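
As a quick check of the month-filter idea, the following sketch rebuilds the nine sample rows and keeps only the quarter-end months. It assumes SQLite through Python's sqlite3 (3.25 or later), so `month(date)` is replaced by `strftime('%m', date)` and the dates are stored as ISO text; both are substitutions made for the demo.

```python
import sqlite3

# Assumption: SQLite stands in for the original engine; dates are ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, date text, prod int)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, f"2009-{month:02d}-01", 5) for month in range(1, 10)],
)

query = """
select id, date, prod, running_prod
from (select id, date, prod,
             -- running total per id, in date order
             sum(prod) over (partition by id order by date) as running_prod
      from t) as x
-- keep only the last month of each quarter
where cast(strftime('%m', date) as integer) in (3, 6, 9, 12)
order by id, date
"""
quarter_rows = conn.execute(query).fetchall()
for row in quarter_rows:
    print(row)
```

The running total is computed over all rows before the filter is applied, which is why the filter has to sit in the outer query.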

cumulative using case statement in Oracle's SQL

I have some simple data:
Date       Count by english  Count by chinese
08-Mar-19  12                54
09-Mar-19  15                66
10-Mar-19  45                32
11-Mar-19  21                70
12-Mar-19  57                64
29-Mar-19  43                53
30-Mar-19  67                21
I want to group this data by week, and the sums should be cumulative. The dates start from 8 March, so the weeks should be counted from that date. So the result should be:
                     count by english  count by chinese
08-MAR-19-14-MAR-19  150               286
15-MAR-19-22-MAR-19  150               286 (no data so same as above)
23-MAR-19-30-MAR-19  260               360
I tried using a cumulative sum but was not able to achieve it.
You can generate your week ranges, then use an outer join to see which data fits in each week, and use an analytic sum to get the result you want:
with week_ranges (date_from, date_to) as (
    select min_date + ((level - 1) * 7), min_date + (level * 7)
    from (
        select min(some_date) as min_date,
               ceil((max(some_date) - min(some_date)) / 7) as weeks
        from your_table
    )
    connect by level <= weeks
)
select distinct wr.date_from, wr.date_to - 1 as date_to,
       sum(count_english) over (order by wr.date_from) as count_english,
       sum(count_chinese) over (order by wr.date_from) as count_chinese
from week_ranges wr
left join your_table yt
    on yt.some_date >= wr.date_from
    and yt.some_date < wr.date_to
order by date_from;
which with your sample data gets:
DATE_FROM DATE_TO COUNT_ENGLISH COUNT_CHINESE
---------- ---------- ------------- -------------
2019-03-08 2019-03-14 150 286
2019-03-15 2019-03-21 150 286
2019-03-22 2019-03-28 150 286
2019-03-29 2019-04-04 260 360
Note this is splitting it up into four 7-day weeks, rather than one of 7 days and two of 8 days...
Here's one option. Note that my weeks differ from yours because the ranges in your expected output are inconsistent: the first spans 7 days and the other two span 8. That's also why the final result is different, but the general idea should be OK.
SQL> alter session set nls_date_format = 'dd.mm.yyyy';
Session altered.
SQL> with test (datum, cbe) as
       -- sample data
       (select date '2019-03-08', 12 from dual union all
        select date '2019-03-09', 15 from dual union all
        select date '2019-03-10', 45 from dual union all
        select date '2019-03-11', 21 from dual union all
        select date '2019-03-12', 57 from dual union all
        select date '2019-03-29', 43 from dual union all
        select date '2019-03-30', 67 from dual
       ),
     span as
       -- min and max date value, so that we could create a "calendar"
       (select min(datum) mindat,
               max(datum) maxdat
        from test
       ),
     periods as
       -- "calendar" whose periods are weeks
       (select s.mindat + (level - 1) * 7 datum_from,
               (s.mindat + level * 7) - 1 datum_to
        from span s
        connect by level <= (s.maxdat - s.mindat) / 7 + 1
       )
     -- running sum per weeks
     select distinct
       p.datum_from,
       p.datum_to,
       sum(t.cbe) over (order by p.datum_from) sum_cbe
     from test t full outer join periods p on t.datum between p.datum_from and p.datum_to
     order by p.datum_from;
DATUM_FROM DATUM_TO      SUM_CBE
---------- ---------- ----------
08.03.2019 14.03.2019        150
15.03.2019 21.03.2019        150
22.03.2019 28.03.2019        150
29.03.2019 04.04.2019        260
SQL>
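
Both answers above follow the same pattern: build a calendar of 7-day periods starting at the earliest date, outer-join the data onto it, and take a running sum. The sketch below shows that pattern outside Oracle, assuming SQLite through Python's sqlite3 (3.25 or later). Since SQLite has no `connect by`, a recursive CTE generates the weeks instead; the names `bounds` and `weekly` are made up for this example, while `your_table` and `some_date` are borrowed from the first answer.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table (some_date text, count_english int, count_chinese int)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [
        ("2019-03-08", 12, 54), ("2019-03-09", 15, 66), ("2019-03-10", 45, 32),
        ("2019-03-11", 21, 70), ("2019-03-12", 57, 64), ("2019-03-29", 43, 53),
        ("2019-03-30", 67, 21),
    ],
)

query = """
with recursive bounds as (
    select min(some_date) as min_date, max(some_date) as max_date
    from your_table
),
week_ranges (date_from, date_to) as (
    -- first week starts at the earliest date; date_to is exclusive
    select min_date, date(min_date, '+7 days') from bounds
    union all
    -- keep adding weeks until the latest date is covered
    select date_to, date(date_to, '+7 days')
    from week_ranges, bounds
    where date_to <= max_date
),
weekly as (
    -- per-week totals; weeks without data get 0
    select wr.date_from, wr.date_to,
           coalesce(sum(yt.count_english), 0) as eng,
           coalesce(sum(yt.count_chinese), 0) as chi
    from week_ranges wr
    left join your_table yt
        on yt.some_date >= wr.date_from
        and yt.some_date < wr.date_to
    group by wr.date_from, wr.date_to
)
select date_from,
       date(date_to, '-1 day') as date_to,
       sum(eng) over (order by date_from) as count_english,
       sum(chi) over (order by date_from) as count_chinese
from weekly
order by date_from
"""
weekly_rows = conn.execute(query).fetchall()
for row in weekly_rows:
    print(row)
```

Aggregating per week before applying the window function avoids the `select distinct` used in the answers, and the `date_to <= max_date` stop condition still creates a week for the latest date when the span is an exact multiple of 7 days.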

SELECT query for skipping rows with duplicates but leaving the first and the last occurrences in PostgreSQL

I have a table with items, dates, and prices, and I am trying to write a SELECT query in PostgreSQL that skips rows with duplicate prices so that only the first and last occurrence of each run of consecutive identical prices stays. After a price change, the price can go back to a previous value, and that run should be preserved as well.
id date price item
1 20.10.2018 10 a
2 21.10.2018 10 a
3 22.10.2018 10 a
4 23.10.2018 15 a
5 24.10.2018 15 a
6 25.10.2018 15 a
7 26.10.2018 10 a
8 27.10.2018 10 a
9 28.10.2018 10 a
10 29.10.2018 10 a
11 26.10.2018 3 b
12 27.10.2018 3 b
13 28.10.2018 3 b
14 29.10.2018 3 c
Result:
id date price item
1 20.10.2018 10 a
3 22.10.2018 10 a
4 23.10.2018 15 a
6 25.10.2018 15 a
7 26.10.2018 10 a
10 29.10.2018 10 a
11 26.10.2018 3 b
13 28.10.2018 3 b
14 29.10.2018 3 c
You can use lag() and lead():
select id, date, price, item
from (select t.*,
             lag(price) over (partition by item order by date) as prev_price,
             lead(price) over (partition by item order by date) as next_price
      from t
     ) t
where prev_price is null or prev_price <> price or
      next_price is null or next_price <> price
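
To see the lag()/lead() filter at work on the sample data, here is a small sketch. It assumes SQLite through Python's sqlite3 (3.25 or later) rather than PostgreSQL, and it stores the dates as ISO text so that `order by date` sorts chronologically; the dd.mm.yyyy strings from the question would need a real date column for that.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, date text, price int, item text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, "2018-10-20", 10, "a"), (2, "2018-10-21", 10, "a"),
        (3, "2018-10-22", 10, "a"), (4, "2018-10-23", 15, "a"),
        (5, "2018-10-24", 15, "a"), (6, "2018-10-25", 15, "a"),
        (7, "2018-10-26", 10, "a"), (8, "2018-10-27", 10, "a"),
        (9, "2018-10-28", 10, "a"), (10, "2018-10-29", 10, "a"),
        (11, "2018-10-26", 3, "b"), (12, "2018-10-27", 3, "b"),
        (13, "2018-10-28", 3, "b"), (14, "2018-10-29", 3, "c"),
    ],
)

query = """
select id, date, price, item
from (select t.*,
             lag(price) over (partition by item order by date) as prev_price,
             lead(price) over (partition by item order by date) as next_price
      from t) as x
-- keep a row when it starts a run (no or different previous price)
-- or ends a run (no or different next price)
where prev_price is null or prev_price <> price
   or next_price is null or next_price <> price
order by id
"""
kept = conn.execute(query).fetchall()
kept_ids = [row[0] for row in kept]
print(kept_ids)
```

Rows in the middle of a run have the same price on both sides, so they are the only ones filtered out.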

How to find regions where total of their sale exceeded 60%

I have an interest_summary table with two columns:
int_rate number,
total_balance number
Example:
10.25 50
10.50 100
10.75 240
11.00 20
My query should return two columns, or a string like 10.50 to 10.75, because the balances for those rates together exceed 60% of the overall total.
Could you suggest the logic for this in Oracle?
select
    min(int_rate),
    max(int_rate)
from
(
    select
        int_rate,
        nvl(sum(total_balance) over(
            order by total_balance desc
            rows between unbounded preceding and 1 preceding
        ),0) as part_sum
    from interest_summary
)
where
    part_sum < (select 0.6*sum(total_balance) from interest_summary)
I'm assuming that you're selecting the rows based on the following algorithm:
1. Sort your rows by total_balance (descending).
2. Select the highest total_balance row remaining.
3. If its total_balance added to the running total is under 60% of the overall total, add it to the pool and get the next row (step 2).
4. If not, add the row to the pool and return.
The sorted running total looks like this (I'll number the rows so that it's easier to understand what happens):
SQL> WITH data AS (
        SELECT 1 id, 10.25 interest_rate, 50 total_balance FROM DUAL
        UNION ALL SELECT 2 id, 10.50 interest_rate, 100 total_balance FROM DUAL
        UNION ALL SELECT 3 id, 10.75 interest_rate, 240 total_balance FROM DUAL
        UNION ALL SELECT 4 id, 11.00 interest_rate, 20 total_balance FROM DUAL
     )
     SELECT id, interest_rate,
            SUM(total_balance) OVER (ORDER BY total_balance DESC) running_total,
            SUM(total_balance) OVER (ORDER BY total_balance DESC)
            /
            SUM(total_balance) OVER () * 100 pct_running_total
     FROM data
     ORDER BY 3;
        ID INTEREST_RATE RUNNING_TOTAL PCT_RUNNING_TOTAL
---------- ------------- ------------- -----------------
         3         10,75           240  58,5365853658537
         2          10,5           340  82,9268292682927
         1         10,25           390  95,1219512195122
         4            11           410               100
So in this example we must return rows 3 and 2 because row 2 is the first row where its percent running total is above 60%:
SQL> WITH data AS (
        SELECT 1 id, 10.25 interest_rate, 50 total_balance FROM DUAL
        UNION ALL SELECT 2 id, 10.50 interest_rate, 100 total_balance FROM DUAL
        UNION ALL SELECT 3 id, 10.75 interest_rate, 240 total_balance FROM DUAL
        UNION ALL SELECT 4 id, 11.00 interest_rate, 20 total_balance FROM DUAL
     )
     SELECT ID, interest_rate
     FROM (SELECT ID, interest_rate,
                  SUM(over_limit)
                     OVER(ORDER BY total_balance DESC) over_limit_no
           FROM (SELECT id,
                        interest_rate,
                        total_balance,
                        CASE
                           WHEN SUM(total_balance)
                                   OVER(ORDER BY total_balance DESC)
                                / SUM(total_balance) OVER() * 100 < 60 THEN
                              0
                           ELSE
                              1
                        END over_limit
                 FROM data
                 ORDER BY 3))
     WHERE over_limit_no <= 1;
        ID INTEREST_RATE
---------- -------------
         3         10,75
         2          10,5
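
The selection rule used in both answers (keep a row while the running total of the rows before it is still under 60% of the overall total) can be checked with the sketch below. It assumes SQLite through Python's sqlite3 (3.25 or later) in place of Oracle, so `nvl` becomes `coalesce`; the `id` column follows the numbering introduced in the second answer, and the column aliases `prior_sum` and `grand_total` are names chosen for this example.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle so the query can run locally.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table interest_summary (id int, int_rate real, total_balance int)"
)
conn.executemany(
    "insert into interest_summary values (?, ?, ?)",
    [(1, 10.25, 50), (2, 10.50, 100), (3, 10.75, 240), (4, 11.00, 20)],
)

query = """
select id, int_rate
from (select id, int_rate, total_balance,
             -- total of all larger balances that come before this row
             coalesce(sum(total_balance) over (
                 order by total_balance desc
                 rows between unbounded preceding and 1 preceding
             ), 0) as prior_sum,
             sum(total_balance) over () as grand_total
      from interest_summary) as x
-- keep the row while the 60% threshold had not been reached before it
where prior_sum < 0.6 * grand_total
order by total_balance desc
"""
selected = conn.execute(query).fetchall()
print(selected)
```

Wrapping the outer query in `min(int_rate)` and `max(int_rate)` turns the selected rows into the "10.50 to 10.75" range asked for in the question.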