Adding zero-value records in a query using cumulative analytical functions


Input and code:
with data as (
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual
)
select year,
month,
r_group,
sum(sales) sales,
sum(opening) opening,
sum(closing) closing
from (
select t.*,
(sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) -sales ) as opening,
sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) as closing
from data t
)
group by year, month, r_group
order by year, month
Output:
year | month | r_group | sales | opening | closing |
2007 | 04 | fruit | 104 | 0 | 104 |
2008 | 05 | fruit | 10 | 5 | 15 |
2008 | 07 | vegetable | 20 | 0 | 20 |
I want the output to be like the following:
year | month | r_group | sales | opening | closing |
2007 | 04 | fruit | 104 | 0 | 104 |
2008 | 05 | fruit | 10 | 104 | 114 |
2008 | 07 | vegetable | 20 | 0 | 20 |
I can achieve the desired output only by adding a zero-valued record in the data for month=05 and for name = 'Z' like this:
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'Z' name, 'fruit' r_group, '2008' year, '05' month, 0 sales from dual union all
select 5 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual )
However, I want to know if I can do this as part of the select query without having to edit the data itself.
EDIT
The inner select statement will populate a database table with the detailed version: year, month, name, r_group, opening, closing. In other words, the result of this query is loaded into the table first, and the aggregation done by the outer query happens afterwards:
select t.*,
(sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) -sales ) as opening,
sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) as closing
from data t
Then I'll aggregate that table with a third-party analytical tool, grouping on r_group only and leaving out name. But the year, month, name, r_group detail must exist in the background.
EDIT 2
In other words, I'm trying to dynamically add missing data. For instance, if name = 'Z' exists in 2007,04 but does NOT exist in 2008,05, then the cumulative function fails once it gets to 2008, because there is no name = 'Z' row in 2008 to start from.

Instead of CURRENT ROW you can use the PRECEDING keyword to sum up to the previous row:
with data as (
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual )
select t.*,
coalesce(sum(sales) over (partition by r_group order by year, month rows between unbounded preceding and 1 preceding),0) opening,
sum(sales) over (partition by r_group order by year, month rows between unbounded preceding and current row) closing
from (
select year, month, r_group, sum(sales) sales
from data
group by year, month, r_group
) t
order by 3,1,2;
year month r_group sales opening closing
---------------------------------------------------
2007 04 fruit 104 0 104
2008 05 fruit 10 104 114
2008 07 vegetable 20 0 20

Group by R_GROUP, YEAR and MONTH first then use the analytical query:
SELECT t.*,
SUM( sales ) OVER ( PARTITION BY r_group ORDER BY year, month ) - sales
AS opening,
SUM( sales ) OVER ( PARTITION BY r_group ORDER BY year, month ) AS closing
FROM (
SELECT r_group,
year,
month,
SUM( sales ) AS sales
FROM data
GROUP BY r_group, year, month
) t
ORDER BY year, month
Update:
This will also include the name in the output:
SELECT t.*,
SUM( sales ) OVER ( PARTITION BY r_group, dt ) AS r_group_month_sales,
COALESCE(
SUM( sales ) OVER (
PARTITION BY r_group
ORDER BY dt
RANGE BETWEEN UNBOUNDED PRECEDING AND INTERVAL '1' MONTH PRECEDING
),
0
) AS opening,
SUM( sales ) OVER (
PARTITION BY r_group
ORDER BY dt
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS closing
FROM (
SELECT d.*,
TO_DATE( year || month, 'YYYYMM' ) AS dt
FROM data d
) t
ORDER BY dt
Output:
ID NAME R_GROUP YEAR MONTH SALES DT R_GROUP_MONTH_SALES OPENING CLOSING
-- ---- --------- ---- ----- ----- ---------- ------------------- ------- -------
1 A fruit 2007 04 5 2007-04-01 104 0 104
2 Z fruit 2007 04 99 2007-04-01 104 0 104
3 A fruit 2008 05 10 2008-05-01 10 104 114
4 B vegetable 2008 07 20 2008-07-01 20 0 20
You can then do whatever processing you want on the output of this query.
Maybe something like this:
SELECT year,
month,
r_group,
MAX( r_group_month_sales ) AS sales,
MAX( opening ) AS opening,
MAX( closing ) AS closing,
YOUR_THIRD_PARTY_AGGREGATION_FUNCTION( column_names ) AS other
FROM (
-- insert the query above
)
GROUP BY year, month, r_group
ORDER BY year, month

Related

Percentile for Year-to-Day (successive YtD)

I have the following data:
ID |MPERIOD|FRDATE |FR
===+=======+==========+==
100|2017M01|01.01.2017|60 \ \ \
101|2017M01|02.01.2017|75 > YtD 2017M01 | |
103|2017M01|08.01.2017|48 / > Ytd 2017M02 |
104|2017M02|06.02.2017|55 | > YtD 2017M03
105|2017M02|15.02.2017|63 / |
106|2017M03|18.03.2017|41 |
107|2017M03|22.03.2017|71 /
...|.......|..........|..
I need to calculate 80% percentile for each month and for YtD in (up to) that month (from start of year up to current calculation moment).
I use the following SQL query:
SELECT DISTINCT mperiod,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY mperiod),2) "80%_FR",
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY SUBSTR(mperiod,1,4)),2) "80%_FR_YtD"
FROM mytable
ORDER BY 1
If I run this query on the last day of a month, when I do not yet have data for the following month, it correctly calculates the YtD value. For example, if I have data for the first six months and none for the seventh, then for the sixth month the year partition OVER (PARTITION BY SUBSTR(mperiod,1,4)) gives the correct YtD value. But if I have data after that month, it is included in the PARTITION BY, so the calculation does not stop at that month.
How can I calculate YtD retroactively, for previous months? For example, the YtD calculation for the third month should include only the first three months of the year, not all months in the year.

Since you can't use a windowing clause or add in additional order by columns in PERCENTILE_CONT (boo!), here's one way of achieving your aims. N.B. it's not pretty, and I'm sure it won't be terrifically performant, but it should work at least!
WITH mytable AS (SELECT 100 ID, '2017M01' mperiod, to_date('01/01/2017', 'dd/mm/yyyy') frdate, 60 fr FROM dual UNION ALL
SELECT 101 ID, '2017M01' mperiod, to_date('02/01/2017', 'dd/mm/yyyy') frdate, 75 fr FROM dual UNION ALL
SELECT 103 ID, '2017M01' mperiod, to_date('08/01/2017', 'dd/mm/yyyy') frdate, 48 fr FROM dual UNION ALL
SELECT 104 ID, '2017M02' mperiod, to_date('06/02/2017', 'dd/mm/yyyy') frdate, 55 fr FROM dual UNION ALL
SELECT 105 ID, '2017M02' mperiod, to_date('15/02/2017', 'dd/mm/yyyy') frdate, 63 fr FROM dual UNION ALL
SELECT 106 ID, '2017M03' mperiod, to_date('18/03/2017', 'dd/mm/yyyy') frdate, 41 fr FROM dual UNION ALL
SELECT 107 ID, '2017M03' mperiod, to_date('22/03/2017', 'dd/mm/yyyy') frdate, 71 fr FROM dual UNION ALL
SELECT 108 ID, '2016M12' mperiod, to_date('22/12/2016', 'dd/mm/yyyy') frdate, 42 fr FROM dual UNION ALL
SELECT 109 ID, '2016M11' mperiod, to_date('22/11/2016', 'dd/mm/yyyy') frdate, 32 fr FROM dual),
unpckd AS (SELECT mt.ID,
mt.mperiod,
mt.frdate,
mt.fr,
CASE WHEN substr(mt.mperiod, -2) <= d.id THEN SUBSTR(mt.mperiod, 1, 5) || to_char(d.id, 'fm09')
END new_mperiod,
d.id dummy_id
FROM mytable mt
INNER JOIN (SELECT LEVEL ID
FROM dual
CONNECT BY LEVEL <= 12) d ON substr(mt.mperiod, -2) <= d.id),
res AS (SELECT mperiod,
new_mperiod,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY CASE WHEN mperiod = new_mperiod THEN mperiod END),2) fr_80,
ROUND(PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY fr) OVER (PARTITION BY new_mperiod),2) fr_80_ytd
FROM unpckd)
SELECT DISTINCT new_mperiod mperiod,
fr_80 "80%_FR",
fr_80_ytd "80%_FR_YtD"
FROM res
WHERE new_mperiod = mperiod
ORDER BY 1;
MPERIOD 80%_FR 80%_FR_YtD
-------- ---------- ----------
2016M11 32 32
2016M12 42 40
2017M01 69 69
2017M02 61.4 65.4
2017M03 65 69.4
This works by doing a partial cross join between the numbers 1 to 12 (12 months in the year) and the last two digits of the mperiod. Once we have that, we now know the overall ytd period that the rows belong to (ie. number 1 will match to the 2017M01, 2 will match to 2017M01 and 2017M02, etc), so you can now produce a label for this calculated value (which I've called new_mperiod) and use that to partition against.
It's obviously going to be inefficient (since the partial cross join will generate more rows than necessary for a year that doesn't have data for all its months, and those rows get filtered out later), but I can't think of a better way of doing it.
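As a quick sanity check of the YtD figures above, the 2017M02 value can be recomputed directly with the aggregate form of PERCENTILE_CONT over just the January and February rows from the question. This is only a sketch for verifying one output row, not a replacement for the query above:

```sql
-- The five FR values with mperiod 2017M01 or 2017M02 from the sample data
select percentile_cont(0.8) within group (order by fr) as fr_80_ytd
from (
  select 60 fr from dual union all
  select 75 from dual union all
  select 48 from dual union all
  select 55 from dual union all
  select 63 from dual
);
-- Sorted values: 48, 55, 60, 63, 75. Row position = 1 + 0.8 * (5 - 1) = 4.2,
-- so the result interpolates between the 4th and 5th values:
-- 63 + 0.2 * (75 - 63) = 65.4
```

That matches the 65.4 shown for 2017M02 in the 80%_FR_YtD column.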

Dynamically adding zero-valued records for subsequent APs for analytical function to work

with data as (
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual
)
select t.*,
(sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) -sales ) as opening,
sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) as closing
from data t
order by year , month
Output will be:
year | month | name | r_group | sales | opening | closing |
2007 | 04 | 'A' | fruit | 5 | 0 | 5 |
2007 | 04 | 'Z' | fruit | 99 | 0 | 99 |
2008 | 05 | 'A' | fruit | 10 | 5 | 15 |
2008 | 07 | 'B' | vegetable | 20 | 0 | 20 |
If I aggregate now on top of this select statement using this:
select year, month, r_group, sum(sales) sales, sum(opening) opening, sum(closing) closing from (
select t.*,
(sum(sales) over........
)
group by year, month, r_group
order by year, month
I get the following result:
year | month | r_group | sales | opening | closing |
2007 | 04 | fruit | 104 | 0 | 104 |
2008 | 05 | fruit | 10 | 5 | 15 |
2008 | 07 | vegetable | 20 | 0 | 20 |
which is wrong. Notice that the value of name='Z' has not been taken into account at all in 2008. Since the cumulative function works backwards it didn't have a name='Z' record in 2008 to go backwards with. If I put a zero-value record in 2008, for name = 'Z' then it will work. I want to avoid adding dummy zero-valued records and have this done dynamically in the query. If I add the zero-valued record in the data like this:
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'Z' name, 'fruit' r_group, '2008' year, '05' month, 0 sales from dual union all
select 5 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual )
then the first query will output:
year | month | name | r_group | sales | opening | closing |
2007 | 04 | 'A' | fruit | 5 | 0 | 5 |
2007 | 04 | 'Z' | fruit | 99 | 0 | 99 |
2008 | 05 | 'A' | fruit | 10 | 5 | 15 |
2008 | 05 | 'Z' | fruit | 0 | 99 | 99 |
2008 | 07 | 'B' | vegetable | 20 | 0 | 20 |
and if I aggregate again using the second outer select I will get:
year | month | r_group | sales | opening | closing |
2007 | 04 | fruit | 104 | 0 | 104 |
2008 | 05 | fruit | 10 | 104 | 114 |
2008 | 07 | vegetable | 20 | 0 | 20 |
which is correct. However, as I mentioned, I do not want to add zero-valued records. There is discussion on just this topic here: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8912311513313 but I haven't been able to make this work.

A fairly simplistic approach (and similar to what that AskTom link shows) is to extract all the year/month pairs, and all the name/r_group pairs, and then cross-join those:
with data as (
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual
)
select a.year, a.month, b.name, b.r_group, nvl(d.sales, 0) as sales
from (select distinct year, month from data) a
cross join (select distinct name, r_group from data) b
left join data d on d.year = a.year and d.month = a.month and d.name = b.name and d.r_group = b.r_group
order by year, month, name, r_group;
YEAR MO N R_GROUP SALES
---- -- - --------- ----------
2007 04 A fruit 5
2007 04 B vegetable 0
2007 04 Z fruit 99
2008 05 A fruit 10
2008 05 B vegetable 0
2008 05 Z fruit 0
2008 07 A fruit 0
2008 07 B vegetable 20
2008 07 Z fruit 0
But that produces more rows than you wanted with your first level of aggregation:
YEAR MO N R_GROUP SALES OPENING CLOSING
---- -- - --------- ---------- ---------- ----------
2007 04 A fruit 5 0 5
2007 04 B vegetable 0 0 0
2007 04 Z fruit 99 0 99
2008 05 A fruit 10 5 15
2008 05 B vegetable 0 0 0
2008 05 Z fruit 0 99 99
2008 07 A fruit 0 15 15
2008 07 B vegetable 20 0 20
2008 07 Z fruit 0 99 99
and when aggregated with your second level (from the other query) would produce extra rows for, say, 2007/04/vegetable:
YEAR MO R_GROUP SALES OPENING CLOSING
---- -- --------- ---------- ---------- ----------
2007 04 fruit 104 0 104
2007 04 vegetable 0 0 0
2008 05 fruit 10 104 114
2008 05 vegetable 0 0 0
2008 07 fruit 0 114 114
2008 07 vegetable 20 0 20
You could partially filter those out before aggregating, because all the intermediate columns would be zero:
with data as (
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual
)
select year,
month,
r_group,
sum(sales) sales,
sum(opening) opening,
sum(closing) closing
from (
select t.*,
(sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) -sales ) as opening,
sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) as closing
from (
select a.year, a.month, b.name, b.r_group, nvl(d.sales, 0) as sales
from (select distinct year, month from data) a
cross join (select distinct name, r_group from data) b
left join data d
on d.year = a.year and d.month = a.month and d.name = b.name and d.r_group = b.r_group
) t
)
where sales != 0 or opening != 0 or closing != 0
group by year, month, r_group
order by year, month;
to get:
YEAR MO R_GROUP SALES OPENING CLOSING
---- -- --------- ---------- ---------- ----------
2007 04 fruit 104 0 104
2008 05 fruit 10 104 114
2008 07 fruit 0 114 114
2008 07 vegetable 20 0 20
You could further filter that result to remove rows where the aggregated sales value is still zero, though if you're doing that the filter before aggregation isn't needed any more; but it's still a bit messy. And it isn't clear if your outermost aggregation can be modified to do that.
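The post-aggregation filter mentioned above could be sketched with a HAVING clause on the aggregated sales, assuming you are free to change the outermost query. Note the caveat: a month whose real sales happen to net to zero for an r_group would be dropped as well.

```sql
with data as (
  select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
  select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
  select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
  select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual
)
select year, month, r_group,
       sum(sales) sales,
       sum(opening) opening,
       sum(closing) closing
from (
  select t.*,
         sum(sales) over (partition by name, r_group
                          order by year, month
                          rows between unbounded preceding and current row) - sales as opening,
         sum(sales) over (partition by name, r_group
                          order by year, month
                          rows between unbounded preceding and current row) as closing
  from (
    -- densify: every name/r_group pair for every year/month pair
    select a.year, a.month, b.name, b.r_group, nvl(d.sales, 0) as sales
    from (select distinct year, month from data) a
    cross join (select distinct name, r_group from data) b
    left join data d
      on d.year = a.year and d.month = a.month and d.name = b.name and d.r_group = b.r_group
  ) t
)
group by year, month, r_group
-- drop groups that consist only of generated zero rows
having sum(sales) != 0
order by year, month;
```

With the sample data this leaves the three rows from the original desired output (2007/04 fruit, 2008/05 fruit, 2008/07 vegetable), since the 2008/07 fruit row has aggregated sales of zero.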

This can be done using a partitioned outer join - but first you have to find the distinct name/r_group combinations and then partition outer join accordingly:
with data as (select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual),
data2 as (select distinct name, r_group
from data),
res as (select d.year,
d.month,
d2.r_group,
d.id,
d2.name,
nvl(d.sales, 0) sales,
sum(nvl(d.sales, 0)) over (partition by d2.name, d2.r_group
order by d.year, d.month
rows between unbounded preceding and current row) - nvl(d.sales,0) as opening,
sum(nvl(d.sales, 0)) over (partition by d2.name, d2.r_group
order by d.year, d.month
rows between unbounded preceding and current row) as closing
from data2 d2
left outer join data d partition by (d.year, d.month) on (d.name = d2.name and d.r_group = d2.r_group))
select year,
month,
r_group,
sum(sales) sales,
sum(opening) opening,
sum(closing) closing
from res
where sales != 0
or opening != 0
or closing != 0
group by year,
month,
r_group
order by year,
month;
YEAR MONTH R_GROUP SALES OPENING CLOSING
---- ----- --------- ---------- ---------- ----------
2007 04 fruit 104 0 104
2008 05 fruit 10 104 114
2008 07 fruit 0 114 114
2008 07 vegetable 20 0 20
This is very similar to Alex's answer, but the use of the partition outer join negates the need to find the distinct year/month pairs, as that is taken care of in the join clause.
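To see what the partitioned outer join contributes on its own, here is a stripped-down sketch of just the densification step, without the analytic functions. It reuses the data and data2 subqueries from the query above:

```sql
with data as (
  select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
  select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
  select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
  select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual
),
data2 as (
  select distinct name, r_group from data
)
select d.year, d.month, d2.name, d2.r_group, nvl(d.sales, 0) as sales
from data2 d2
-- data is split into one partition per (year, month); every data2 row is
-- outer-joined to each partition, so missing name/r_group rows appear with NULL sales
left outer join data d partition by (d.year, d.month)
  on (d.name = d2.name and d.r_group = d2.r_group)
order by d.year, d.month, d2.name;
```

With three year/month partitions and three name/r_group pairs, this should return the same nine rows as the cross-join version in the other answer.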

Add missing data from previous month or year cumulatively

Say I have the following data:
select 1 id, 'A' name, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'A' name, '2007' year, '05' month, 2 sales from dual union all
select 3 id, 'B' name, '2008' year, '12' month, 3 sales from dual union all
select 4 id, 'B' name, '2009' year, '12' month, 56 sales from dual union all
select 5 id, 'C' name, '2009' year, '08' month, 89 sales from dual union all
select 13 id,'B' name, '2016' year, '01' month, 10 sales from dual union all
select 14 id,'A' name, '2016' year, '02' month, 8 sales from dual union all
select 15 id,'D' name, '2016' year, '03' month, 12 sales from dual union all
select 16 id,'E' name, '2016' year, '04' month, 34 sales from dual
I want to cumulatively add up all the sales across all years and their respective periods (months). The output should look like the following:
name year month sale opening bal closing bal
A 2007 04 5 0 5
A 2007 05 2 5 7
B 2008 12 3 12 15
A 2008 04 0 5 5 -- to be generated
A 2008 05 0 7 7 -- to be generated
B 2009 12 56 15 71
C 2009 08 89 71 160
A 2009 04 0 5 5 -- to be generated
A 2009 05 0 7 7 -- to be generated
B 2016 01 10 278 288
B 2016 12 0 71 71 -- to be generated
A 2016 02 8 288 296
A 2016 04 0 5 5 -- to be generated
A 2016 05 0 7 7 -- to be generated
D 2016 03 12 296 308
E 2016 04 34 308 342
C 2016 08 0 160 160 -- to be generated
The opening balance is the closing balance of the previous month, and if it goes into the next year then the opening balance for the next year is the closing balance of the previous year. It should work like this for subsequent years too. I've got this part working. However, I don't know how to handle keys that exist in, say, 2008 but are missing in 2009. For instance, the keys A,2008,04 and A,2008,05 do not exist in 2009, and the code should add them in 2009 as shown above. The same applies to other years and months.
I'm working on Oracle 12c.
Thanks in advance.

A variation on @boneist's approach, starting with your sample data in a CTE:
with t as (
select 1 id, 'A' name, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'A' name, '2007' year, '05' month, 2 sales from dual union all
select 3 id, 'B' name, '2008' year, '12' month, 3 sales from dual union all
select 4 id, 'B' name, '2009' year, '12' month, 56 sales from dual union all
select 5 id, 'C' name, '2009' year, '08' month, 89 sales from dual union all
select 13 id,'B' name, '2016' year, '01' month, 10 sales from dual union all
select 14 id,'A' name, '2016' year, '02' month, 8 sales from dual union all
select 15 id,'D' name, '2016' year, '03' month, 12 sales from dual union all
select 16 id,'E' name, '2016' year, '04' month, 34 sales from dual
),
y (year, rnk) as (
select year, dense_rank() over (order by year)
from (select distinct year from t)
),
r (name, year, month, sales, rnk) as (
select t.name, t.year, t.month, t.sales, y.rnk
from t
join y on y.year = t.year
union all
select r.name, y.year, r.month, 0, y.rnk
from y
join r on r.rnk = y.rnk - 1
where not exists (
select 1 from t where t.year = y.year and t.month = r.month and t.name = r.name
)
)
select name, year, month, sales,
nvl(sum(sales) over (partition by name order by year, month
rows between unbounded preceding and 1 preceding), 0) as opening_bal,
nvl(sum(sales) over (partition by name order by year, month
rows between unbounded preceding and current row), 0) as closing_bal
from r
order by year, month, name;
Which gets the same result too, though it also doesn't match the expected results in the question:
NAME YEAR MONTH SALES OPENING_BAL CLOSING_BAL
---- ---- ----- ---------- ----------- -----------
A 2007 04 5 0 5
A 2007 05 2 5 7
A 2008 04 0 7 7
A 2008 05 0 7 7
B 2008 12 3 0 3
A 2009 04 0 7 7
A 2009 05 0 7 7
C 2009 08 89 0 89
B 2009 12 56 3 59
B 2016 01 10 59 69
A 2016 02 8 7 15
D 2016 03 12 0 12
A 2016 04 0 15 15
E 2016 04 34 0 34
A 2016 05 0 15 15
C 2016 08 0 89 89
B 2016 12 0 69 69
The y CTE (feel free to use more meaningful names!) generates all the distinct years from your original data, and also adds a ranking, so 2007 is 1, 2008 is 2, 2009 is 3, and 2016 is 4.
The r recursive CTE combines your actual data with dummy rows with zero sales, based on the name/month data from previous years.
From what that recursive CTE produces you can do your analytic cumulative sum to add the opening/closing balances. This is using windowing clauses to decide which sales values to include - essentially the opening and closing balances are the sum of all values up to this point, but opening doesn't include the current row.
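The difference between the two windowing clauses can be seen in isolation on three rows. This is a minimal sketch with made-up ordering column n, using the three sales values for name A (5, 2, 8) from the sample data:

```sql
with s as (
  select 1 n, 5 sales from dual union all
  select 2 n, 2 sales from dual union all
  select 3 n, 8 sales from dual
)
select n, sales,
       -- everything before the current row; NULL for the first row, hence NVL
       nvl(sum(sales) over (order by n
                            rows between unbounded preceding and 1 preceding), 0) as opening_bal,
       -- everything up to and including the current row
       sum(sales) over (order by n
                        rows between unbounded preceding and current row) as closing_bal
from s
order by n;
-- n=1: opening 0, closing 5
-- n=2: opening 5, closing 7
-- n=3: opening 7, closing 15
```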

This is the closest I can get to your result, although I realise it's not an exact match. For example, your opening balances don't look correct (where did the opening balance of 12 come from for the output row for id = 3?). Anyway, hopefully the following will enable you to amend as appropriate:
with sample_data as (select 1 id, 'A' name, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'A' name, '2007' year, '05' month, 2 sales from dual union all
select 3 id, 'B' name, '2008' year, '12' month, 3 sales from dual union all
select 4 id, 'B' name, '2009' year, '12' month, 56 sales from dual union all
select 5 id, 'C' name, '2009' year, '08' month, 89 sales from dual union all
select 13 id, 'B' name, '2016' year, '01' month, 10 sales from dual union all
select 14 id, 'A' name, '2016' year, '02' month, 8 sales from dual union all
select 15 id, 'D' name, '2016' year, '03' month, 12 sales from dual union all
select 16 id, 'E' name, '2016' year, '04' month, 34 sales from dual),
dts as (select distinct year
from sample_data),
res as (select sd.name,
dts.year,
sd.month,
nvl(sd.sales, 0) sales,
min(sd.year) over (partition by sd.name, sd.month) min_year_per_name_month,
sum(nvl(sd.sales, 0)) over (partition by name order by to_date(dts.year||'-'||sd.month, 'yyyy-mm')) - nvl(sd.sales, 0) as opening,
sum(nvl(sd.sales, 0)) over (partition by name order by to_date(dts.year||'-'||sd.month, 'yyyy-mm')) as closing
from dts
left outer join sample_data sd partition by (sd.name, sd.month) on (sd.year = dts.year))
select name,
year,
month,
sales,
opening,
closing
from res
where (opening != 0 or closing != 0)
and year >= min_year_per_name_month
order by to_date(year||'-'||month, 'yyyy-mm'),
name;
NAME YEAR MONTH SALES OPENING CLOSING
---- ---- ----- ---------- ---------- ----------
A 2007 04 5 0 5
A 2007 05 2 5 7
A 2008 04 0 7 7
A 2008 05 0 7 7
B 2008 12 3 0 3
A 2009 04 0 7 7
A 2009 05 0 7 7
C 2009 08 89 0 89
B 2009 12 56 3 59
B 2016 01 10 59 69
A 2016 02 8 7 15
D 2016 03 12 0 12
A 2016 04 0 15 15
E 2016 04 34 0 34
A 2016 05 0 15 15
C 2016 08 0 89 89
B 2016 12 0 69 69
I've used Partition Outer Join to link any month and name combination in the table (in my query, the sample_data subquery - you wouldn't need that subquery, you'd just use your table instead!) to any year in the same table, and then working out the opening / closing balances. I then discard any rows that have an opening and closing balance of 0.

SQL SUMs in where clause with conditionals

I want to get a "totals" report for business XYZ. They want the season, term, distinct count of employees, and total employees' dropped hours, counting an employee's dropped hours only when they are not cancelled out by adds equal to the drops.
I'm trying to do something like this:
select year,
season,
(select count(distinct empID)
from tableA
where a.season = season
and a.year = year) "Employees",
(select sum(hours)
from(
select distinct year,season,empID,hours
from tableA
where code like 'Drop%'
)
where a.season = season
and a.year = year) "Dropped"
from tableA a
-- need help below
where (select sum(hours)
from(
select distinct year,season,empID,hours
from tableA
where code like 'Drop%'
)
where a.season = season
and a.year = year
and a.emplID = emplID)
!=
(select sum(hours)
from(
select distinct year,season,empID,hours
from tableA
where code like 'Add%'
)
where a.season = season
and a.year = year
and a.emplID = emplID)
group by year,season
It appears I am not writing my where clause correctly. I don't believe I am joining the emplID to each emplID correctly to exclude those whose "drops" <> "adds".
EDIT:
sample data:
year,season,EmplID,hours,code
2015,FALL,001,10,Drop
2015,FALL,001,10,Add
2015,FALL,002,5,Drop
2015,FALL,003,10,Drop
The total hours should be 15. EmplID 001 should be removed from the total because he has drops that are exactly equal to adds.

I managed to work it out with a bit of analytics .. ;)
with tableA as (
select 2015 year, 1 season, 1234 empID, 2 hours , 'Add' code from dual union all
select 2015 year, 1 season, 1234 empID, 3 hours , 'Add' code from dual union all
select 2015 year, 1 season, 1234 empID, 4 hours , 'Add' code from dual union all
select 2015 year, 1 season, 1234 empID, 2 hours , 'Drop' code from dual union all
select 2015 year, 1 season, 2345 empID, 5 hours , 'Add' code from dual union all
select 2015 year, 1 season, 2345 empID, 3.5 hours, 'Add' code from dual union all
select 2015 year, 2 season, 1234 empID, 7 hours , 'Add' code from dual union all
select 2015 year, 2 season, 1234 empID, 5 hours , 'Add' code from dual union all
select 2015 year, 2 season, 2345 empID, 5 hours , 'Add' code from dual union all
select 2015 year, 2 season, 7890 empID, 3 hours , 'Add' code from dual union all
select 2014 year, 1 season, 1234 empID, 1 hours , 'Add' code from dual union all
select 2014 year, 1 season, 1234 empID, 2 hours , 'Add' code from dual union all
select 2014 year, 1 season, 1234 empID, 4 hours , 'Add' code from dual
),
w_group as (
select year, season, empID, hours, code,
lead(hours) over (partition by year, season, empID, hours
order by case when code like 'Drop%' then 'DROP'
when code like 'Add%' then 'ADD'
else NULL end ) new_hours
from tableA
)
select year, season, count(distinct empID),
sum(hours-nvl(new_hours,0)) total_hours
from w_group
where code like 'Add%'
group by year, season
/
YEAR SEASON COUNT(DISTINCTEMPID) TOTAL_HOURS
---------- ---------- -------------------- -----------
2015 1 2 15.5
2014 1 1 7
2015 2 3 20
(the first part "with tableA" is just faking some data, since you didn't provide any) :)
[edit]
corrected based on your data, and your explanation - in short, you're counting the DROPs, (minus the ADDs), I was doing the reverse
[edit2] replaced the query below with a minor tweak based on comment/feedback: don't count an empID if their DROP-ADD zero out
with tableA as (
select 2015 year, 'FALL' season, '001' empID, 10 hours, 'Drop' code from dual union all
select 2015 year, 'FALL' season, '001' empID, 10 hours, 'Add' code from dual union all
select 2015 year, 'FALL' season, '002' empID, 5 hours, 'Drop' code from dual union all
select 2015 year, 'FALL' season, '003' empID, 10 hours, 'Drop' code from dual
),
w_group as (
select year, season, empID, hours, code,
lag(hours) over (partition by year, season, empID, hours
order by case when code like 'Drop%' then 'DROP'
when code like 'Add%' then 'ADD'
else NULL end ) new_hours
from tableA
)
select year, season, count(distinct empID),
sum(hours-nvl(new_hours,0)) total_hours
from w_group
where code like 'Drop%'
and hours - nvl(new_hours,0) > 0
group by year, season
/
YEAR SEAS COUNT(DISTINCTEMPID) TOTAL_HOURS
---------- ---- -------------------- -----------
2015 FALL 2 15
[/edit]

I think you can do what you want with just conditional aggregation. Something like this:
select year, season, count(distinct empID) as Employees,
sum(case when code like 'Drop%' then hours end) as Dropped
from tableA
group by year, season;
It is hard to tell exactly what you want, because you do not have sample data and desired results (or better yet, a SQL Fiddle). You might also want a having clause:
having (sum(case when code like 'Drop%' then hours end) <>
sum(case when code like 'Add%' then hours end)
)

Are you wanting the result of something like this?
SELECT
year
,season
,COUNT(DISTINCT empID) AS Employees
,SUM(CASE WHEN code LIKE 'Drop%' THEN hours ELSE 0 END) AS Dropped
FROM
TableA
GROUP BY
year
,season
HAVING
(
SUM(CASE WHEN code LIKE 'Drop%' THEN hours ELSE 0 END)
- SUM(CASE WHEN code LIKE 'Add%' THEN hours ELSE 0 END)
) <> 0
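If the comparison has to be made per employee, as the sample data in the question suggests, one possible sketch is to aggregate in two steps: first per employee, then per year and season. The names per_emp, dropped and added are made up for illustration, and the filter assumes that only employees with at least one drop should be counted:

```sql
with tableA as (
  select 2015 year, 'FALL' season, '001' empID, 10 hours, 'Drop' code from dual union all
  select 2015, 'FALL', '001', 10, 'Add' from dual union all
  select 2015, 'FALL', '002', 5, 'Drop' from dual union all
  select 2015, 'FALL', '003', 10, 'Drop' from dual
),
per_emp as (
  -- one row per employee and season with total drops and total adds
  select year, season, empID,
         sum(case when code like 'Drop%' then hours else 0 end) as dropped,
         sum(case when code like 'Add%' then hours else 0 end) as added
  from tableA
  group by year, season, empID
)
select year, season,
       count(distinct empID) as employees,
       sum(dropped) as dropped
from per_emp
where dropped > 0          -- assumption: ignore employees without any drops
  and dropped <> added     -- skip employees whose drops are exactly offset by adds
group by year, season;
```

With the sample data from the question, employee 001 is filtered out (10 dropped, 10 added), leaving 2 employees and 15 dropped hours for 2015 FALL.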

Moving average of 2 columns

Hello, I have a problem. I know how to calculate a moving average over the last 3 months using Oracle analytic functions, but my situation is a little different:
Month-----ProductType-----Sales----------Average(HAVE TO FIND THIS)
1---------A---------------10
1---------B---------------12
1---------C---------------17
2---------A---------------21
3---------C---------------2
3---------B---------------21
4---------B---------------23
5
6
7
8
9
So we have sales for each month and each product type. I need to calculate the moving average over the last 3 months for each particular product.
Example:
For month 4 and product B it would be (21+0+12)/3
Any ideas ?

Another option is to use the windowing clause of analytic functions
with my_data as (
select 1 as month, 'A' as product, 10 as sales from dual union all
select 1 as month, 'B' as product, 12 as sales from dual union all
select 1 as month, 'C' as product, 17 as sales from dual union all
select 2 as month, 'A' as product, 21 as sales from dual union all
select 3 as month, 'C' as product, 2 as sales from dual union all
select 3 as month, 'B' as product, 21 as sales from dual union all
select 4 as month, 'B' as product, 23 as sales from dual
)
select
month,
product,
sales,
nvl(sum(sales)
over (partition by product order by month
range between 3 preceding and 1 preceding),0)/3 as average_sales
from my_data
order by month, product

-- Note: LAG steps back by rows within the partition, not by calendar months,
-- and returns NULL when no such row exists
SELECT month,
productType,
sales,
(lag(sales, 3) over (partition by productType order by month) +
lag(sales, 2) over (partition by productType order by month) +
lag(sales, 1) over (partition by productType order by month))/3 moving_avg
FROM your_table_name
