Percentile for Year-to-Date (successive YtD) - sql

I have the following data:
ID |MPERIOD|FRDATE    |FR
===+=======+==========+==
100|2017M01|01.01.2017|60 \              \              \
101|2017M01|02.01.2017|75  > YtD 2017M01  |              |
103|2017M01|08.01.2017|48 /                > YtD 2017M02  |
104|2017M02|06.02.2017|55                 |                > YtD 2017M03
105|2017M02|15.02.2017|63                /                |
106|2017M03|18.03.2017|41                                 |
107|2017M03|22.03.2017|71                                /
...|.......|..........|..
I need to calculate the 80th percentile for each month, and also the YtD value for each month (over all rows from the start of the year up to and including that month).
I use the following SQL query:
SELECT DISTINCT mperiod,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY mperiod),2) "80%_FR",
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY SUBSTR(mperiod,1,4)),2) "80%_FR_YtD"
FROM mytable
ORDER BY 1
If I run this query on the last day of a month, when I do not yet have data for the following month, it calculates the YtD value correctly. For example, if I have data for the first six months and none for the seventh, the year partition OVER (PARTITION BY SUBSTR(mperiod,1,4)) gives the correct YtD value for the sixth month. But if I have data after that month, it is also included in the partition, so the result is no longer the value up to that moment.
How can I calculate YtD retroactively, for previous months? For example, the YtD value for the third month should be calculated over the first three months of the year only, not over all months in the year.

Since you can't use a windowing clause or add in additional order by columns in PERCENTILE_CONT (boo!), here's one way of achieving your aims. N.B. it's not pretty, and I'm sure it won't be terrifically performant, but it should work at least!
WITH mytable AS (SELECT 100 ID, '2017M01' mperiod, to_date('01/01/2017', 'dd/mm/yyyy') frdate, 60 fr FROM dual UNION ALL
SELECT 101 ID, '2017M01' mperiod, to_date('02/01/2017', 'dd/mm/yyyy') frdate, 75 fr FROM dual UNION ALL
SELECT 103 ID, '2017M01' mperiod, to_date('08/01/2017', 'dd/mm/yyyy') frdate, 48 fr FROM dual UNION ALL
SELECT 104 ID, '2017M02' mperiod, to_date('06/02/2017', 'dd/mm/yyyy') frdate, 55 fr FROM dual UNION ALL
SELECT 105 ID, '2017M02' mperiod, to_date('15/02/2017', 'dd/mm/yyyy') frdate, 63 fr FROM dual UNION ALL
SELECT 106 ID, '2017M03' mperiod, to_date('18/03/2017', 'dd/mm/yyyy') frdate, 41 fr FROM dual UNION ALL
SELECT 107 ID, '2017M03' mperiod, to_date('22/03/2017', 'dd/mm/yyyy') frdate, 71 fr FROM dual UNION ALL
SELECT 108 ID, '2016M12' mperiod, to_date('22/12/2016', 'dd/mm/yyyy') frdate, 42 fr FROM dual UNION ALL
SELECT 109 ID, '2016M11' mperiod, to_date('22/11/2016', 'dd/mm/yyyy') frdate, 32 fr FROM dual),
unpckd AS (SELECT mt.ID,
mt.mperiod,
mt.frdate,
mt.fr,
CASE WHEN substr(mt.mperiod, -2) <= d.id THEN SUBSTR(mt.mperiod, 1, 5) || to_char(d.id, 'fm09')
END new_mperiod,
d.id dummy_id
FROM mytable mt
INNER JOIN (SELECT LEVEL ID
FROM dual
CONNECT BY LEVEL <= 12) d ON substr(mt.mperiod, -2) <= d.id),
res AS (SELECT mperiod,
new_mperiod,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY CASE WHEN mperiod = new_mperiod THEN mperiod END),2) fr_80,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY new_mperiod),2) fr_80_ytd
FROM unpckd)
SELECT DISTINCT new_mperiod mperiod,
fr_80 "80%_FR",
fr_80_ytd "80%_FR_YtD"
FROM res
WHERE new_mperiod = mperiod
ORDER BY 1;
MPERIOD 80%_FR 80%_FR_YtD
-------- ---------- ----------
2016M11 32 32
2016M12 42 40
2017M01 69 69
2017M02 61.4 65.4
2017M03 65 69.4
This works by doing a partial cross join between the numbers 1 to 12 (12 months in the year) and the last two digits of the mperiod. Once we have that, we now know the overall ytd period that the rows belong to (ie. number 1 will match to the 2017M01, 2 will match to 2017M01 and 2017M02, etc), so you can now produce a label for this calculated value (which I've called new_mperiod) and use that to partition against.
It's obviously going to be inefficient (since the partial cross join will generate more rows than necessary for a year that doesn't have data for all its months, and those rows get filtered out later), but I can't think of a better way of doing it.
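If you want to sanity-check the numbers outside the database, here is a rough Python sketch of the same calculation. It is not part of the query above. The helper names (percentile_cont, ytd_report) are made up for illustration, and the interpolation follows the linear rule documented for PERCENTILE_CONT, re-implemented by hand.

```python
import math
from collections import defaultdict

def percentile_cont(values, p):
    """Linear-interpolation percentile (hand-written stand-in for PERCENTILE_CONT)."""
    ordered = sorted(values)
    rn = 1 + p * (len(ordered) - 1)          # 1-based fractional row number
    lo, hi = math.floor(rn), math.ceil(rn)
    if lo == hi:
        return float(ordered[lo - 1])
    return ordered[lo - 1] * (hi - rn) + ordered[hi - 1] * (rn - lo)

# (mperiod, fr) pairs taken from the sample data in the answer
rows = [("2017M01", 60), ("2017M01", 75), ("2017M01", 48),
        ("2017M02", 55), ("2017M02", 63),
        ("2017M03", 41), ("2017M03", 71),
        ("2016M12", 42), ("2016M11", 32)]

by_period = defaultdict(list)
for mperiod, fr in rows:
    by_period[mperiod].append(fr)

def ytd_report(by_period, p=0.8):
    report = {}
    for mperiod in sorted(by_period):
        year = mperiod[:4]
        # YtD = every row of the same year whose period is <= the current one
        # (string comparison works because the month part is zero-padded)
        ytd = [fr for other, frs in by_period.items()
               if other[:4] == year and other <= mperiod
               for fr in frs]
        report[mperiod] = (round(percentile_cont(by_period[mperiod], p), 2),
                           round(percentile_cont(ytd, p), 2))
    return report

report = ytd_report(by_period)
for mperiod, (month_p80, ytd_p80) in report.items():
    print(mperiod, month_p80, ytd_p80)
# 2016M11 32.0 32.0
# 2016M12 42.0 40.0
# 2017M01 69.0 69.0
# 2017M02 61.4 65.4
# 2017M03 65.0 69.4
```

The values agree with the query output shown above, which suggests the new_mperiod partition really does cover "start of year up to this month".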

Related

Oracle SQL recursive adding values

I have the following data in the table
Period Total_amount R_total
01/01/20 2 2
01/02/20 5 null
01/03/20 3 null
01/04/20 8 null
01/05/20 31 null
Based on the above data I would like to have the following situation.
Period Total_amount R_total
01/01/20 2 2
01/02/20 5 3
01/03/20 3 0
01/04/20 8 8
01/05/20 31 23
Additional data
01/06/20 21 0 (previously it would be -2)
01/07/20 25 25
01/08/20 29 4
Pattern to the additional data is:
if total_amount < previous(r_total) then 0
Based on the filled data, we can spot the pattern is:
R_total = total_amount - previous(R_total)
Could you please help me out with this issue?
As Gordon Linoff suspected, it is possible to solve this problem with analytic functions. The benefit is that the query will likely be much faster. The price to pay for that benefit is that you need to do a bit of math beforehand (before ever thinking about "programming" and "computers").
A bit of elementary arithmetic shows that R_TOTAL is an alternating sum of TOTAL_AMOUNT. This can be arranged easily by using ROW_NUMBER() (to get the signs) and then an analytic SUM(), as shown below.
Table setup:
create table sample_data (period, total_amount) as
select to_date('01/01/20', 'mm/dd/rr'), 2 from dual union all
select to_date('01/02/20', 'mm/dd/rr'), 5 from dual union all
select to_date('01/03/20', 'mm/dd/rr'), 3 from dual union all
select to_date('01/04/20', 'mm/dd/rr'), 8 from dual union all
select to_date('01/05/20', 'mm/dd/rr'), 31 from dual
;
Query and result:
with
prep (period, total_amount, sgn) as (
select period, total_amount,
case mod(row_number() over (order by period), 2) when 0 then 1 else -1 end
from sample_data
)
select period, total_amount,
sgn * sum(sgn * total_amount) over (order by period) as r_total
from prep
;
PERIOD TOTAL_AMOUNT R_TOTAL
-------- ------------ ----------
01/01/20 2 2
01/02/20 5 3
01/03/20 3 0
01/04/20 8 8
01/05/20 31 23
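A quick way to convince yourself of the alternating-sum claim is to compare it with the plain recurrence in a few lines of Python. This is only a sketch with made-up helper names. It also includes the "additional data" rows from the question, where the rule "if total_amount < previous(r_total) then 0" kicks in. As far as I can tell, the alternating sum does not reproduce that clamp, so treat the analytic query as covering only the unclamped pattern.

```python
def recurrence(amounts, clamp=False):
    """R[0] = amounts[0]; R[i] = amounts[i] - R[i-1], optionally floored at 0."""
    result = []
    for i, amount in enumerate(amounts):
        value = amount if i == 0 else amount - result[-1]
        result.append(max(value, 0) if clamp else value)
    return result

def alternating_sum(amounts):
    """Same signs as the query: -1 on odd row numbers, +1 on even ones."""
    result, running = [], 0
    for row_number, amount in enumerate(amounts, start=1):
        sgn = 1 if row_number % 2 == 0 else -1
        running += sgn * amount              # analytic SUM(sgn * total_amount)
        result.append(sgn * running)         # multiply by sgn again to fix the sign
    return result

sample = [2, 5, 3, 8, 31]                    # rows from the question
extended = sample + [21, 25, 29]             # plus the "additional data"

print(alternating_sum(sample))               # [2, 3, 0, 8, 23]
print(recurrence(sample))                    # [2, 3, 0, 8, 23]
print(recurrence(extended, clamp=True))      # [2, 3, 0, 8, 23, 0, 25, 4]
print(alternating_sum(extended))             # [2, 3, 0, 8, 23, -2, 27, 2]
```

On the first five rows both approaches agree. Once a value would go negative (21 - 23 = -2), only a row-by-row calculation with the clamp gives the 0, 25, 4 the question asks for.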
This may be possible with window functions, but the simplest method is probably a recursive CTE:
with t as (
select t.*, row_number() over (order by period) as seqnum
from yourtable t
),
cte(period, total_amount, r_amount, seqnum) as (
select period, total_amount, r_total, seqnum
from t
where seqnum = 1
union all
select t.period, t.total_amount, t.total_amount - cte.r_amount, t.seqnum
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select *
from cte;
This question explicitly talks about "recursively" adding values. If you want to solve this using another mechanism, you might explain the logic in detail and ask if there is a non-recursive CTE solution.

Oracle Running Subtraction

I have the data below. I want to subtract the first row's QTY from Total Qty (80), and then subtract each following row's QTY from the previous row's QTY1.
QTY QTY1 DATE TOTAL QTY
2 78 01-JAN-20 80
1 77 15-JAN-20
46 31 22-JAN-20
16 15 27-JAN-20
Is there a way to do this? Any help is greatly appreciated. Thanks
select
t.*
,first_value(TOTAL_QTY)over(order by DT) - sum(QTY)over(order by DT) as QTY1
from t;
Full example with your sample data:
with T(QTY, DT, TOTAL_QTY) as (
select 2 , to_date('01-JAN-20','dd-mon-yy'),80 from dual union all
select 1 , to_date('15-JAN-20','dd-mon-yy'),null from dual union all
select 46, to_date('22-JAN-20','dd-mon-yy'),null from dual union all
select 16, to_date('27-JAN-20','dd-mon-yy'),null from dual
)
select
t.*
,first_value(TOTAL_QTY)over(order by DT) - sum(QTY)over(order by DT) as QTY1
from t;
Result:
QTY DT TOTAL_QTY QTY1
2 2020-01-01 80 78
1 2020-01-15 77
46 2020-01-22 31
16 2020-01-27 15
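The arithmetic behind first_value(...) - sum(...) is just "starting total minus running total of QTY". A minimal Python sketch of that idea (not Oracle code, and it assumes the rows are already ordered by date):

```python
from itertools import accumulate

total_qty = 80              # TOTAL_QTY from the first row
qty = [2, 1, 46, 16]        # QTY per row, ordered by date

# accumulate() yields the running sum 2, 3, 49, 65
qty1 = [total_qty - used for used in accumulate(qty)]
print(qty1)                 # [78, 77, 31, 15]
```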
SQL tables represent unordered sets. Your question seems to rely on the ordering of the rows. Let me assume you have a column that represents the ordering.
Use a cumulative sum:
select t.*,
sum(total_qty) over () - sum(qty) over (order by <ordering col>) as qty1
from t;
Here is a db<>fiddle.
Something like this (the CTE is just your data). If you add any more stuff later in the total_qty column, it would also get added to the total_qty calculation, as would be typical for additions to, and subtractions from, inventory.
with d as
(select 2 qty, 78 qty1 , to_date('01-JAN-20','dd-mon-rr') datecol, 80 total_qty from dual union all
select 1, 77, to_date('15-JAN-20','dd-mon-rr'),null from dual union all
select 46 , 31, to_date('22-JAN-20','dd-mon-rr'),null from dual union all
select 16 , 15 , to_date('27-JAN-20','dd-mon-rr'),null from dual
)
select sum(total_qty) over (order by datecol) - sum(qty) over (order by datecol)
from d
You can do:
select
qty,
-- DATE is a reserved word in Oracle, so the column name has to be quoted
first_value(total_qty) over(order by "DATE")
- sum(qty) over(order by "DATE") as qty1,
"DATE", total_qty
from t
order by "DATE"

How to show total profit for each month, show null when there is no record in that month in oracle

I am producing a report that shows the total profit for each month in 2018, and shows NIL for months in which no profit was earned.
Profit earned = 0.1 * Total_payment.
Profit is earned when the service is done. The column Total_payment comes from the table BOOKING, so I have to join BOOKING and SERVICE to get the total profit for each month. Booking_num is the join key between BOOKING and SERVICE, and Actual_end is the end date of the service.
The problem is that no profit was earned in Jan, Feb and Aug.
Is there any way to show NIL in the profit column for these three months?
SELECT EXTRACT(MONTH FROM Actual_end) AS MONTH,SUM(Total_payment *0.1) AS PROFIT
FROM SERVICE,BOOKING
WHERE SERVICE.Booking_num = BOOKING.Booking_num
AND EXTRACT(YEAR FROM Actual_end) = 2018
GROUP BY EXTRACT(MONTH FROM Actual_end);
This is the query that shows the profit for 9 months, without Jan, Feb and Aug:
MONTH PROFIT
3 88.4
4 146.1
5 112.6
6 108.3
7 102.6
9 130.3
10 72.6
12 124.9
I expect the output to be
MONTH PROFIT
1 NIL
2 NIL
3 88.4
4 146.1
5 112.6
6 108.3
7 102.6
8 NIL
9 130.3
10 72.6
11 124.9
12 25.2
How do I modify it? I have also tried:
WITH CALENDAR AS(
SELECT TO_CHAR(add_months(date '2018-01-01',ROWNUM -1),'MM') AS MONTH
FROM DUAL
CONNECT BY LEVEL <=12)
SELECT CALENDER.MONTH, NVL(SUM(Total_payment*0.1),null) AS PROFIT
FROM BOOKING,SERVICE,CALENDER
WHERE BOOKING.Booking_num = SERVICE.Booking_num
AND CALENDER.MONTH = EXTRACT(MONTH FROM Actual_end(+))
AND EXTRACT(MONTH FROM Actual_end) = 2018
GROUP BY CALENDER.MONTH
THE OUTPUT:
NO ROWS SELECTED
You need an outer join (left or right). By the way, get rid of the old-fashioned comma-separated joins among tables and use explicit JOIN syntax instead.
Add RIGHT JOIN (SELECT LEVEL AS MNT FROM DUAL CONNECT BY LEVEL <= 12) MNT to your query if the goal is to return every month of one particular year:
SELECT MNT AS MONTH,NVL(TO_CHAR(SUM(Total_payment *0.1)),'NIL') AS PROFIT
FROM SERVICE S
JOIN BOOKING B
ON S.Booking_num = B.Booking_num
RIGHT JOIN (SELECT LEVEL AS MNT
FROM DUAL
CONNECT BY LEVEL <= 12 ) MNT
ON MNT.MNT = EXTRACT(MONTH FROM Actual_end)
AND EXTRACT(YEAR FROM Actual_end)=2018
GROUP BY MNT
ORDER BY MONTH;
In this case you need a list of all months:
with months as (
select 1 as month from dual union all
select 2 as month from dual union all
select 3 as month from dual union all
select 4 as month from dual union all
select 5 as month from dual union all
select 6 as month from dual union all
select 7 as month from dual union all
select 8 as month from dual union all
select 9 as month from dual union all
select 10 as month from dual union all
select 11 as month from dual union all
select 12 as month from dual
)
select m.month, sum(s.total_payment * 0.1) as profit
from months m left join
booking b
on extract(month from b.actual_end) = m.month and
b.actual_end >= date '2018-01-01' and
b.actual_end < date '2019-01-01' left join
service s
on s.booking_num = b.booking_num
group by m.month;
Note:
This is guessing that actual_end is in booking and total_payment is in service. The query would be slightly different if this guess is not correct.
Never use commas in the FROM clause.
This query should use LEFT JOIN. The first table has all the rows that you want.
Filters on subsequent tables go in the on clause, not the where clause.
Note the use of date constants. Such comparisons usually make it easier for the engine to optimize the query (typically by using indexes).
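To see why the month list has to be the "driving" side of the join, here is a small Python sketch of the same logic. The rows are hypothetical (invented amounts chosen to reproduce two of the profits from the question), and profit_report is an illustrative name, not anything from Oracle.

```python
from datetime import date

# Hypothetical joined BOOKING/SERVICE rows: (actual_end, total_payment)
services = [(date(2018, 3, 14), 500.0), (date(2018, 3, 20), 384.0),
            (date(2018, 9, 2), 1303.0), (date(2017, 8, 9), 990.0)]

def profit_report(services, year):
    totals = {}
    for actual_end, total_payment in services:
        # The year filter is part of the match condition (the ON clause);
        # it must not remove months from the calendar side.
        if actual_end.year == year:
            month = actual_end.month
            totals[month] = totals.get(month, 0.0) + 0.1 * total_payment
    # Start from all 12 months, then look up the profit (the outer join).
    return [(m, round(totals[m], 1) if m in totals else "NIL")
            for m in range(1, 13)]

report = profit_report(services, 2018)
for month, profit in report:
    print(month, profit)
```

Every month from 1 to 12 appears exactly once. Months 3 and 9 show 88.4 and 130.3, and all the others, including month 8 whose only row is from 2017, show NIL.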

take sum of last 7 days from the observed date in BigQuery

I have a table on which I want to compute the sum of revenue on last 7 days from the observed day. Here is my table -
with temp as
(
select DATE('2019-06-29') as transaction_date, "x"as id, 0 as revenue
union all
select DATE('2019-06-30') as transaction_date, "x"as id, 80 as revenue
union all
select DATE('2019-07-04') as transaction_date, "x"as id, 64 as revenue
union all
select DATE('2019-07-06') as transaction_date, "x"as id, 64 as revenue
union all
select DATE('2019-07-11') as transaction_date, "x"as id, 75 as revenue
union all
select DATE('2019-07-12') as transaction_date, "x"as id, 0 as revenue
)
select * from temp
I want to take a sum of last 7 days for each transaction_date. For instance for the last record which has transaction_date = 2019-07-12, I would like to add another column which adds up revenue for last 7 days from 2019-07-12 (which is until 2019-07-05), hence the value of new rollup_revenue column would be 0 + 75 + 64 = 139. Likewise, I need to compute the rollup for all the dates for every ID.
Note - the ID may or may not appear daily.
I have tried self join but I am unable to figure it out.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.temp` AS (
SELECT DATE '2019-06-29' AS transaction_date, 'x' AS id, 0 AS revenue UNION ALL
SELECT '2019-06-30', 'x', 80 UNION ALL
SELECT '2019-07-04', 'x', 64 UNION ALL
SELECT '2019-07-06', 'x', 64 UNION ALL
SELECT '2019-07-11', 'x', 75 UNION ALL
SELECT '2019-07-12', 'x', 0
)
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
-- ORDER BY transaction_date
with result
Row transaction_date id revenue rollup_revenue
1 2019-06-29 x 0 0
2 2019-06-30 x 80 80
3 2019-07-04 x 64 144
4 2019-07-06 x 64 208
5 2019-07-11 x 75 139
6 2019-07-12 x 0 139
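If it helps to see what RANGE BETWEEN 6 PRECEDING AND CURRENT ROW over UNIX_DATE does, here is a plain Python sketch of the same window (the function name rollup is mine): for each row, sum the revenue of rows with the same id whose date falls within the current day and the 6 days before it.

```python
from datetime import date, timedelta

# Sample rows from the question: (transaction_date, id, revenue)
rows = [(date(2019, 6, 29), "x", 0), (date(2019, 6, 30), "x", 80),
        (date(2019, 7, 4), "x", 64), (date(2019, 7, 6), "x", 64),
        (date(2019, 7, 11), "x", 75), (date(2019, 7, 12), "x", 0)]

def rollup(rows, days=7):
    window = timedelta(days=days - 1)        # 6 PRECEDING plus the current day
    out = []
    for day, id_, revenue in rows:
        total = sum(r for d, i, r in rows
                    if i == id_ and day - window <= d <= day)
        out.append((day, id_, revenue, total))
    return out

result = rollup(rows)
print([total for _, _, _, total in result])  # [0, 80, 144, 208, 139, 139]
```

The window is defined on date values, not on row counts, so gaps between transaction dates are handled without generating the missing days.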
One option uses a correlated subquery to find the rolling sum:
SELECT
transaction_date,
id,
revenue,
(SELECT SUM(t2.revenue) FROM temp t2 WHERE t2.id = t1.id AND t2.transaction_date
BETWEEN DATE_SUB(t1.transaction_date, INTERVAL 7 DAY) AND
t1.transaction_date) AS rev_7_days
FROM temp t1
ORDER BY
transaction_date;

Oracle: Need to calculate rolling average for past 3 months where we have more than one submission per month

I've seen many examples of rolling averages in Oracle, but none do quite what I desire.
This is my raw data
DATE SCORE AREA
----------------------------
01-JUL-14 60 A
01-AUG-14 45 A
01-SEP-14 45 A
02-SEP-14 50 A
01-OCT-14 30 A
02-OCT-14 45 A
03-OCT-14 50 A
01-JUL-14 60 B
01-AUG-14 45 B
01-SEP-14 45 B
02-SEP-14 50 B
01-OCT-14 30 B
02-OCT-14 45 B
03-OCT-14 50 B
This is the desired result for my rolling average
MMYY AVG AREA
-------------------------
JUL-14 60 A
AUG-14 52.5 A
SEP-14 50 A
OCT-14 44 A
JUL-14 60 B
AUG-14 52.5 B
SEP-14 50 B
OCT-14 44 B
The way I need it to work is that for each MMYY, I need to look back 3 months and AVG the scores per area. So for example,
for Area A in OCT, in the last 3 months from Oct there were 6 studies: (45+45+50+30+45+50)/6 = 265/6 ≈ 44.2
Normally I would write the query like so
SELECT
AREA,
TO_CHAR(T.DT,'MMYY') MMYY,
ROUND(AVG(SCORE)
OVER (PARTITION BY AREA ORDER BY TO_CHAR(T.DT,'MMYY') ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),1)
AS AVG
FROM T
This will look over the last 3 entries, not the last 3 months.
One way to do this is to mix aggregation functions with analytic functions. The key idea for average is to avoid using avg() and instead do a sum() divided by a count(*).
SELECT AREA, TO_CHAR(T.DT, 'MMYY') AS MMYY,
SUM(SCORE) / COUNT(*) as AvgScore,
SUM(SUM(SCORE)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) / SUM(COUNT(*)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM t
GROUP BY AREA, TO_CHAR(T.DT, 'MMYY') ;
Note the order by clause. If your data spans years, then using the MMYY format poses problems. It is better to use a format such as YYYY-MM for months, because the alphabetical ordering is the same as the natural ordering.
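Here is the sum-of-sums over sum-of-counts idea as a small Python sketch, using the Area A rows from the question. rolling_average is a made-up helper. Like the ROWS window in the query, it assumes every month in the range has at least one row.

```python
from collections import defaultdict

# (year, month, score) for area A, taken from the question
scores = [(2014, 7, 60), (2014, 8, 45), (2014, 9, 45), (2014, 9, 50),
          (2014, 10, 30), (2014, 10, 45), (2014, 10, 50)]

def rolling_average(scores, months=3):
    sums, counts = defaultdict(int), defaultdict(int)
    for year, month, score in scores:        # GROUP BY month
        sums[(year, month)] += score
        counts[(year, month)] += 1
    periods = sorted(sums)                   # (year, month) sorts chronologically
    result = {}
    for pos, period in enumerate(periods):
        # ROWS BETWEEN 2 PRECEDING AND CURRENT ROW over the monthly rows
        window = periods[max(0, pos - months + 1): pos + 1]
        total = sum(sums[p] for p in window)
        n = sum(counts[p] for p in window)
        result[period] = round(total / n, 1)
    return result

averages = rolling_average(scores)
print(averages)
# {(2014, 7): 60.0, (2014, 8): 52.5, (2014, 9): 50.0, (2014, 10): 44.2}
```

Dividing the windowed sum by the windowed count weights every score equally. Averaging the three monthly averages would instead give each month equal weight regardless of how many submissions it had.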
You can also specify ranges, not only rows.
SELECT
AREA,
TO_CHAR(T.DT,'MMYY') MMYY,
ROUND(AVG(SCORE)
OVER (PARTITION BY AREA
ORDER BY DT RANGE BETWEEN INTERVAL '3' MONTH PRECEDING AND CURRENT ROW))
AS AVG
FROM T
Since CURRENT ROW is the default, just ORDER BY DT RANGE INTERVAL '3' MONTH PRECEDING should work as well. You may have to do some fine-tuning; I did not test the behaviour around months of different lengths (28/29/30/31 days).
Check the Oracle Windowing Clause for further details.
WITH DATA AS(
SELECT to_date('01-JUL-14','DD-MON-RR') dt, 60 score, 'A' area FROM dual UNION ALL
SELECT to_date('01-AUG-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
SELECT to_date('01-SEP-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
SELECT to_date('02-SEP-14','DD-MON-RR') dt, 50 score, 'A' area FROM dual UNION ALL
SELECT to_date('01-OCT-14','DD-MON-RR') dt, 30 score, 'A' area FROM dual UNION ALL
SELECT to_date('02-OCT-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
SELECT to_date('03-OCT-14','DD-MON-RR') dt, 50 score, 'A' area FROM dual UNION ALL
SELECT to_date('01-JUL-14','DD-MON-RR') dt, 60 score, 'B' area FROM dual UNION ALL
SELECT to_date('01-AUG-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
SELECT to_date('01-SEP-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
SELECT to_date('02-SEP-14','DD-MON-RR') dt, 50 score, 'B' area FROM dual UNION ALL
SELECT to_date('01-OCT-14','DD-MON-RR') dt, 30 score, 'B' area FROM dual UNION ALL
SELECT to_date('02-OCT-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
SELECT to_date('03-OCT-14','DD-MON-RR') dt, 50 score, 'B' area FROM dual)
SELECT TO_CHAR(T.DT, 'MON-RR') AS MMYY,
round(
SUM(SUM(SCORE)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)/
SUM(COUNT(*)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),1)
AS avg_score,
AREA
FROM data t
GROUP BY AREA, TO_CHAR(T.DT, 'MON-RR');
MMYY AVG_SCORE A
------ ---------- -
JUL-14 60 A
AUG-14 52.5 A
SEP-14 50 A
OCT-14 44.2 A
JUL-14 60 B
AUG-14 52.5 B
SEP-14 50 B
OCT-14 44.2 B
8 rows selected.
Next time, I would expect you to provide the CREATE and INSERT statements so that we don't have to spend time preparing a test case.
And, why YY format? Haven't you seen the Y2K bug? Please use YYYY format.