sql group by monthly sum and sum year by month


RDBMS - Latest Oracle
I'm out of my element here. I need to organize account transaction information by account and by month, and also use another column to show the summed transactions for the year to date. Here is a depiction of what I'm trying to get:
ACCT_ID | ACCT_MM | FISCAL_YYYY | FISCAL_MM_AMT | YTD_AMT
------------------------------------------------------------
1 | 11 | 2018 | 25 | 100
1 | 12 | 2018 | 50 | 150
1 | 01 | 2019 | 20 | 20
I know I can get FISCAL_MM_AMT with a GROUP BY on ACCT_MM, FISCAL_YYYY.
This is all I have figured out so far:
SELECT ACCT_ID,ACCT_MM,FISCAL_YYYY,SUM(NVL(ACCT_TRNSCTN_AMT,0))
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID,ACCT_MM,FISCAL_YYYY
What has me baffled is how to combine this with the additional column YTD_AMT, which adds up all the totals for that year up to the current month. SQL noob ftw.

Try the analytic function SUM as follows:
SELECT T.*,
SUM(FISCAL_MM_AMT)
OVER (PARTITION BY ACCT_ID, FISCAL_YYYY
ORDER BY ACCT_MM) AS YTD_AMT
FROM
(SELECT ACCT_ID,ACCT_MM,FISCAL_YYYY,SUM(NVL(ACCT_TRNSCTN_AMT,0)) AS FISCAL_MM_AMT
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID,ACCT_MM,FISCAL_YYYY);
Cheers!!
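To check the subquery-plus-window approach above without an Oracle instance, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. Assumptions: the transaction rows are hypothetical (chosen so that months 11, 12 and 01 reproduce the depiction in the question), SQLite has no NVL so COALESCE stands in for it, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical transactions; only the table and column names come from the question.
ROWS = [
    (1, 10, 2018, 40), (1, 10, 2018, 35),   # earlier months collapsed into month 10
    (1, 11, 2018, 25),
    (1, 12, 2018, 30), (1, 12, 2018, 20),
    (1, 1, 2019, 20), (1, 1, 2019, None),   # NULL amount, treated as 0
]

# Inner query: one row per account/month; outer window: running total per account and year.
QUERY = """
SELECT t.*,
       SUM(FISCAL_MM_AMT) OVER (PARTITION BY ACCT_ID, FISCAL_YYYY
                                ORDER BY ACCT_MM) AS YTD_AMT
FROM (SELECT ACCT_ID, ACCT_MM, FISCAL_YYYY,
             SUM(COALESCE(ACCT_TRNSCTN_AMT, 0)) AS FISCAL_MM_AMT
      FROM TBL_ACCT_DETAIL
      GROUP BY ACCT_ID, ACCT_MM, FISCAL_YYYY) t
ORDER BY ACCT_ID, FISCAL_YYYY, ACCT_MM
"""

def ytd_report(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TBL_ACCT_DETAIL "
                "(ACCT_ID INTEGER, ACCT_MM INTEGER, FISCAL_YYYY INTEGER, ACCT_TRNSCTN_AMT NUMERIC)")
    con.executemany("INSERT INTO TBL_ACCT_DETAIL VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in ytd_report(ROWS):
    print(row)
```

The PARTITION BY restarts the running total for each account and fiscal year, which is why January 2019 starts again from its own amount.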

You can use the cumulative sum window function:
SELECT ACCT_ID, ACCT_MM, FISCAL_YYYY,
COALESCE(SUM(ACCT_TRNSCTN_AMT), 0) AS FISCAL_MM_AMT,
SUM(SUM(ACCT_TRNSCTN_AMT)) OVER (PARTITION BY ACCT_ID, FISCAL_YYYY ORDER BY ACCT_MM) AS YTD_AMT
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID, ACCT_MM, FISCAL_YYYY;

Do you want one additional column carrying the sum for the whole year, plus a year-to-date column?
OLAP (window) functions are a prerequisite, but every respectable RDBMS should offer those by now.
Then I think you should go as follows (I have added what I presume the input looks like):
WITH
---- this is just the input so I have example data
input( acct_id,acct_mm,fiscal_yyyy,fiscal_mm_amt) AS (
SELECT 1, 1, 2018, 5 FROM dual
UNION ALL SELECT 1, 3, 2018, 5 FROM dual
UNION ALL SELECT 1, 4, 2018, 5 FROM dual
UNION ALL SELECT 1, 5, 2018, 10 FROM dual
UNION ALL SELECT 1, 6, 2018, 10 FROM dual
UNION ALL SELECT 1, 7, 2018, 10 FROM dual
UNION ALL SELECT 1, 8, 2018, 10 FROM dual
UNION ALL SELECT 1, 9, 2018, 10 FROM dual
UNION ALL SELECT 1, 10, 2018, 10 FROM dual
UNION ALL SELECT 1, 11, 2018, 25 FROM dual
UNION ALL SELECT 1, 12, 2018, 50 FROM dual
UNION ALL SELECT 1, 01, 2019, 20 FROM dual
)
---- end of input -----
SELECT
input.*
, SUM(fiscal_mm_amt) OVER(
PARTITION BY acct_id,fiscal_yyyy
) AS fiscal_yy_amt
, SUM(fiscal_mm_amt) OVER(
PARTITION BY acct_id,fiscal_yyyy
ORDER BY acct_mm
) AS ytd_amt
FROM input;
-- out acct_id | acct_mm | fiscal_yyyy | fiscal_mm_amt | fiscal_yy_amt | ytd_amt
-- out ---------+---------+-------------+---------------+---------------+---------
-- out 1 | 1 | 2018 | 5 | 150 | 5
-- out 1 | 3 | 2018 | 5 | 150 | 10
-- out 1 | 4 | 2018 | 5 | 150 | 15
-- out 1 | 5 | 2018 | 10 | 150 | 25
-- out 1 | 6 | 2018 | 10 | 150 | 35
-- out 1 | 7 | 2018 | 10 | 150 | 45
-- out 1 | 8 | 2018 | 10 | 150 | 55
-- out 1 | 9 | 2018 | 10 | 150 | 65
-- out 1 | 10 | 2018 | 10 | 150 | 75
-- out 1 | 11 | 2018 | 25 | 150 | 100
-- out 1 | 12 | 2018 | 50 | 150 | 150
-- out 1 | 1 | 2019 | 20 | 20 | 20
-- out (12 rows)
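The two window clauses above differ only in ORDER BY: without it the frame is the whole partition (the full-year total); with it the frame runs from the start of the partition to the current row (year to date). A minimal sketch that checks this in SQLite (3.25 or later) via Python, using the monthly amounts from the input CTE above; the table name monthly_amounts is hypothetical.

```python
import sqlite3

# Monthly amounts copied from the input CTE above.
MONTHLY = [
    (1, 1, 2018, 5), (1, 3, 2018, 5), (1, 4, 2018, 5),
    (1, 5, 2018, 10), (1, 6, 2018, 10), (1, 7, 2018, 10),
    (1, 8, 2018, 10), (1, 9, 2018, 10), (1, 10, 2018, 10),
    (1, 11, 2018, 25), (1, 12, 2018, 50), (1, 1, 2019, 20),
]

QUERY = """
SELECT m.*,
       -- no ORDER BY: the frame is the whole partition, i.e. the full-year total
       SUM(fiscal_mm_amt) OVER (PARTITION BY acct_id, fiscal_yyyy) AS fiscal_yy_amt,
       -- with ORDER BY: running total up to and including the current month
       SUM(fiscal_mm_amt) OVER (PARTITION BY acct_id, fiscal_yyyy
                                ORDER BY acct_mm) AS ytd_amt
FROM monthly_amounts m
ORDER BY acct_id, fiscal_yyyy, acct_mm
"""

def year_and_ytd(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE monthly_amounts "
                "(acct_id INTEGER, acct_mm INTEGER, fiscal_yyyy INTEGER, fiscal_mm_amt INTEGER)")
    con.executemany("INSERT INTO monthly_amounts VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in year_and_ytd(MONTHLY):
    print(row)
```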

Related

SQL Query for a Compare Report from single table

SQL newb here; I'm having a bit of trouble understanding this problem. How can I write a single SELECT statement where columns have their own WHERE clauses, do a calculation, and group the results?
I can write the query to sum totals and average checks, grouping by revenue center and fiscal year, but I can't quite grasp how to do a side-by-side comparison in a single query.
SALES DATA
| RevenueCenter | FiscalYear | TotalSales | NumChecks |
|---------------|------------|------------|-----------|
| market | 2019 | 2000.00 | 10 |
| restaurant | 2019 | 5000.00 | 25 |
| restaurant | 2020 | 4000.00 | 20 |
| market | 2020 | 3000.00 | 10 |
COMPARE REPORT
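| RevenueCenter | TotalSales2020 | TotalSales2019 | %Change | AvgCheck2020 | AvgCheck2019 | %Change |
|---------------|----------------|----------------|---------|--------------|--------------|---------|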
| market | 3000.00 | 2000.00 | +50% | 300.00 | 200.00 | +50% |
| restaurant | 4000.00 | 5000.00 | -20% | 200.00 | 200.00 | 0% |

Would this help? No big deal, just a self-join with some arithmetic.
with sales (revenuecenter, fiscalyear, totalsales, numchecks) as
-- sample data
(select 'market' , 2019, 2000, 10 from dual union all
select 'market' , 2020, 3000, 10 from dual union all
select 'restaurant', 2019, 5000, 25 from dual union all
select 'restaurant', 2020, 4000, 20 from dual
)
-- query you need
select a.revenuecenter,
b.totalsales totalsales2020,
a.totalsales totalsales2019,
--
(b.totalsales/a.totalsales) * 100 - 100 "%change totalsal",
--
b.totalsales / b.numchecks avgcheck2020,
a.totalsales / a.numchecks avgcheck2019,
--
(b.totalsales / b.numchecks) /
(a.totalsales / a.numchecks) * 100 - 100 "%change avgcheck"
from sales a join sales b on a.revenuecenter = b.revenuecenter
and a.fiscalyear < b.fiscalyear;
REVENUECEN TOTALSALES2020 TOTALSALES2019 %change totalsal AVGCHECK2020 AVGCHECK2019 %change avgcheck
---------- -------------- -------------- ---------------- ------------ ------------ ----------------
market 3000 2000 50 300 200 50
restaurant 4000 5000 -20 200 200 0
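Here is a minimal sketch of the same self-join in SQLite via Python, using the sample data from the question. One portability point it illustrates: SQLite truncates integer-by-integer division (3000/2000 gives 1), unlike Oracle's NUMBER arithmetic, so this sketch stores TotalSales as REAL and multiplies by 100.0 before dividing. The column aliases pct_change_sales and pct_change_avgcheck are illustrative names, not from the answer.

```python
import sqlite3

# Sample data from the question; REAL sales so divisions stay fractional in SQLite.
SALES = [
    ("market", 2019, 2000.0, 10), ("restaurant", 2019, 5000.0, 25),
    ("restaurant", 2020, 4000.0, 20), ("market", 2020, 3000.0, 10),
]

# Self-join: a = earlier year, b = later year of the same revenue center.
QUERY = """
SELECT a.revenuecenter,
       b.totalsales AS totalsales2020,
       a.totalsales AS totalsales2019,
       100.0 * b.totalsales / a.totalsales - 100 AS pct_change_sales,
       b.totalsales / b.numchecks AS avgcheck2020,
       a.totalsales / a.numchecks AS avgcheck2019,
       100.0 * (b.totalsales / b.numchecks)
             / (a.totalsales / a.numchecks) - 100 AS pct_change_avgcheck
FROM sales a
JOIN sales b
  ON a.revenuecenter = b.revenuecenter
 AND a.fiscalyear < b.fiscalyear
ORDER BY a.revenuecenter
"""

def compare_report(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales "
                "(revenuecenter TEXT, fiscalyear INTEGER, totalsales REAL, numchecks INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in compare_report(SALES):
    print(row)
```

With only two years per revenue center, `a.fiscalyear < b.fiscalyear` yields exactly one pair per center; with more years it would produce one row per earlier/later pair.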

SQL: Filling in Missing Records with Conditional

I need to count the number of products that existed in inventory by date. In the database however, a product is only recorded when it was viewed by a consumer.
For example, consider this basic table structure:
date | productId | views
July 1 | A | 8
July 2 | A | 6
July 2 | B | 4
July 3 | A | 2
July 4 | A | 8
July 4 | B | 6
July 4 | C | 4
July 5 | C | 2
July 10 | A | 17
Using the following query, I attempt to determine the number of products in inventory on a given date.
select date, count(distinct productId) as Inventory, sum(views) as views
from (
select date, productId, count(*) as views
from SomeTable
group by date, productID
order by date asc, productID asc
)
group by date
This is the output
date | Inventory | views
July 1 | 1 | 8
July 2 | 2 | 10
July 3 | 1 | 2
July 4 | 3 | 18
July 5 | 1 | 2
July 10 | 1 | 17
My output does not accurately reflect how many products were in inventory, because of missing rows.
The correct understanding of inventory is as follows:
- Product A was present in inventory from July 1 - July 10.
- Product B was present in inventory from July 2 - July 4.
- Product C was in inventory from July 4 - July 5.
The correct SQL output should be:
date | Inventory | views
July 1 | 1 | 8
July 2 | 2 | 10
July 3 | 2 | 2
July 4 | 3 | 18
July 5 | 2 | 2
July 6 | 1 | 0
July 7 | 1 | 0
July 8 | 1 | 0
July 9 | 1 | 0
July 10 | 1 | 17
To be clear, I am comfortable defining "in inventory" as the period between a product's first and last view.
Here is the faulty process I followed:
First, I created a table that was the Cartesian product of every productId and every date:
with Dates as (
select date
from SomeTable
group by date
),
Products as (
select productId
from SomeTable
group by productId
)
select Dates.date, Products.productId
from Dates cross join Products
Then I attempted a right outer join to reduce this to just the missing records:
with Records as (
select date, productId, count(*) as views
from SomeTable
group by date, productId
),
Cartesian as (
{See query above}
)
Select Cartesian.date, Cartesian.productId, 0 as views #for upcoming union
from Cartesian right outer join Records
on Cartesian.date = Records.date
where Records.productId is null
Then, with the missing rows in hand, I union them back onto Records.
In doing so, I create a new problem: extra rows.
date | productId | views
July 1 | A | 8
July 1 | B | 0
July 1 | C | 0
July 2 | A | 6
July 2 | B | 4
July 2 | C | 0
July 3 | A | 2
July 3 | B | 0
July 3 | C | 0
July 4 | A | 8
July 4 | B | 6
July 4 | C | 4
July 5 | A | 2
July 5 | B | 0
July 5 | C | 0
July 6 | A | 0
July 6 | B | 0
July 6 | C | 0
July 7 | A | 0
July 7 | B | 0
July 7 | C | 0
July 8 | A | 0
July 8 | B | 0
July 8 | C | 0
July 9 | A | 0
July 9 | B | 0
July 9 | C | 0
July 10 | A | 17
July 10 | B | 0
July 10 | C | 0
And when I run my simple query
select date, count(distinct productId) as Inventory, sum(views) as views
on that table, I get the wrong output again:
date | Inventory | views
July 1 | 3 | 8
July 2 | 3 | 10
July 3 | 3 | 2
July 4 | 3 | 18
July 5 | 3 | 2
July 6 | 3 | 0
July 7 | 3 | 0
July 8 | 3 | 0
July 9 | 3 | 0
July 10 | 3 | 17
My next thought would be to iterate through each productId, determine its first and last date, then union that with the Cartesian table on the condition that Cartesian.date falls between the first and last date for each specific product.
There's got to be an easier way to do this. Thanks.

Below is for BigQuery Standard SQL:
#standardSQL
WITH dates AS (
SELECT day FROM (
SELECT MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) day
), ranges AS (
SELECT productId, MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table` t
GROUP BY productId
)
SELECT day, COUNT(DISTINCT productId) Inventory, SUM(IFNULL(views, 0)) views
FROM dates d, ranges r
LEFT JOIN `project.dataset.table` USING(day, productId)
WHERE day BETWEEN min_day AND max_day
GROUP BY day
If you apply it to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-07-01' day, 'A' productId, 8 views UNION ALL
SELECT '2019-07-02', 'A', 6 UNION ALL
SELECT '2019-07-02', 'B', 4 UNION ALL
SELECT '2019-07-03', 'A', 2 UNION ALL
SELECT '2019-07-04', 'A', 8 UNION ALL
SELECT '2019-07-04', 'B', 6 UNION ALL
SELECT '2019-07-04', 'C', 4 UNION ALL
SELECT '2019-07-05', 'C', 2 UNION ALL
SELECT '2019-07-10', 'A', 17
), dates AS (
SELECT day FROM (
SELECT MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) day
), ranges AS (
SELECT productId, MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table` t
GROUP BY productId
)
SELECT day, COUNT(DISTINCT productId) Inventory, SUM(IFNULL(views, 0)) views
FROM dates d, ranges r
LEFT JOIN `project.dataset.table` USING(day, productId)
WHERE day BETWEEN min_day AND max_day
GROUP BY day
-- ORDER BY day
the result is:
Row day Inventory views
1 2019-07-01 1 8
2 2019-07-02 2 10
3 2019-07-03 2 2
4 2019-07-04 3 18
5 2019-07-05 2 2
6 2019-07-06 1 0
7 2019-07-07 1 0
8 2019-07-08 1 0
9 2019-07-09 1 0
10 2019-07-10 1 17
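The same idea (generate every calendar day, keep a product on the days between its first and last view, then left-join the actual views) can be tried outside BigQuery. SQLite has no GENERATE_DATE_ARRAY, so this minimal sketch swaps in a recursive CTE to build the calendar; the table name views_log is hypothetical, and ISO date strings are assumed so that text comparison orders dates correctly.

```python
import sqlite3

# Sample data from the question, with ISO dates (year assumed to be 2019, as in the answer).
ROWS = [
    ("2019-07-01", "A", 8), ("2019-07-02", "A", 6), ("2019-07-02", "B", 4),
    ("2019-07-03", "A", 2), ("2019-07-04", "A", 8), ("2019-07-04", "B", 6),
    ("2019-07-04", "C", 4), ("2019-07-05", "C", 2), ("2019-07-10", "A", 17),
]

QUERY = """
WITH RECURSIVE dates(day) AS (          -- every day from first to last view
    SELECT MIN(day) FROM views_log
    UNION ALL
    SELECT date(day, '+1 day') FROM dates
    WHERE day < (SELECT MAX(day) FROM views_log)
),
ranges AS (                             -- each product's inventory window
    SELECT productId, MIN(day) AS min_day, MAX(day) AS max_day
    FROM views_log
    GROUP BY productId
)
SELECT d.day,
       COUNT(DISTINCT r.productId) AS inventory,
       SUM(COALESCE(v.views, 0)) AS views
FROM dates d
JOIN ranges r ON d.day BETWEEN r.min_day AND r.max_day
LEFT JOIN views_log v ON v.day = d.day AND v.productId = r.productId
GROUP BY d.day
ORDER BY d.day
"""

def inventory_by_day(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE views_log (day TEXT, productId TEXT, views INTEGER)")
    con.executemany("INSERT INTO views_log VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in inventory_by_day(ROWS):
    print(row)
```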

Rolling Average SQL

Hi, I have a dataset with Year, Month and Output variables, with values as follows:
Year | Month | Output
2015 | 1 | 12
2015 | 2 | 24
2015 | 3 | 2
2015 | 4 | 3
2015 | 5 | 7
2015 | 6 | 3
2015 | 7 | 7
2015 | 8 | 6
2015 | 9 | 7
2015 | 10 | 8
2015 | 11 | 3
2015 | 12 | 6
2016 | 1 | 3
2016 | 2 | 6
2016 | 3 | 8
2016 | 4 | 9
2016 | 5 | 4
... and so on
I want to add a new column, Rolling_Average, to the dataset:
Rolling_Average = (sum of Output over the previous 12 months) / (Output of this month)
For example:
Rolling_Average (for 2015-7) = (output (2015-01) + output (2015-02) + output (2015-03) + output (2015-04) + output (2015-05) + output (2015-06)) / output (2015-07)
I tried a couple of queries I found online, but they didn't work for me. Can someone please help me?
Output Required is as follows:
Year | Month | Output | Rolling Average
2015 | 1 | 12 | 12
2015 | 2 | 24 | 0.5
2015 | 3 | 2 | 18
2015 | 4 | 3 | 38/3
2015 | 5 | 7 | 45/7
2015 | 6 | 3 | 48/3
2015 | 7 | 7 | 55/7
2015 | 8 | 6 | 61/6
2015 | 9 | 7 | 68/7
2015 | 10 | 8 | 74/8
2015 | 11 | 3 | 77/3
2015 | 12 | 6 | 83/6
2016 | 1 | 3 | 86/3
2016 | 2 | 6 | 92/6
2016 | 3 | 8 | 100/8
2016 | 4 | 9 | 109/9
2016 | 5 | 4 | 113/4
The query I tried is:
SELECT DISTINCT
//CALCULATIONS
Year,
Month,
Output,
(sum(CAST(Output) AS DOUBLE)))
over(order by year,month rows between 12 preceding and 1 preceding )
as Rolling_Average
from my_table
group by Year,Month
order by Year,Month
It gives me this error:
Syntax error: OVER keyword must follow a function call
I have also tried other things.
Can someone please help me with an easy way? I am using SQL Plx, which is similar to SQL.
Thank you!

You might have misplaced some parentheses:
(sum( CAST(Output) AS DOUBLE ))) over (order by year, month rows between 12 preceding and 1 preceding ) as Rolling_Average
Versus:
SUM( CAST(Output AS DOUBLE) ) OVER (order by year, month rows between 12 preceding and 1 preceding) as Rolling_Average
You can also ROUND that result.
And those records already seem to be unique by Year and Month, so there's no real need to group on those:
SELECT
t.Year, t.Month, t.Output,
ROUND(SUM(CAST(t.Output AS INT)) OVER (ORDER BY t.Year, t.Month ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING)*1.0 / CAST(t.Output AS INT), 1) as Rolling_Average
FROM my_table t
ORDER BY t.Year, t.Month;
And if window functions aren't supported, this should work:
SELECT
t1.Year, t1.Month, t1.Output,
ROUND(SUM(CAST(t2.Output AS INT))*1.0 / CAST(t1.Output AS INT), 1) as Rolling_Average
FROM my_table t1
LEFT JOIN my_table t2 ON ((t2.Year = t1.Year AND t2.Month < t1.Month) OR
(t2.Year = t1.Year - 1 AND t2.Month >= t1.Month))
GROUP BY t1.Year, t1.Month, t1.Output
ORDER BY t1.Year, t1.Month;
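As a cross-check, this minimal sketch runs both the window version and the self-join fallback in SQLite (3.25 or later) through Python on the question's sample data and compares them. It leaves out ROUND so the raw ratios are visible; the table name my_table comes from the question. Note the one assumption where the two can differ: `ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING` counts rows, so it matches the calendar-based join only when no months are missing.

```python
import sqlite3

# Output values from the question, 2015-01 through 2016-05.
OUTPUTS = [12, 24, 2, 3, 7, 3, 7, 6, 7, 8, 3, 6, 3, 6, 8, 9, 4]
ROWS = [(2015 + i // 12, i % 12 + 1, out) for i, out in enumerate(OUTPUTS)]

# Window version: sum of the 12 preceding rows divided by the current Output.
WINDOW_SQL = """
SELECT t.Year, t.Month, t.Output,
       SUM(CAST(t.Output AS INT)) OVER (ORDER BY t.Year, t.Month
           ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING) * 1.0
       / CAST(t.Output AS INT) AS Rolling_Average
FROM my_table t
ORDER BY t.Year, t.Month
"""

# Self-join fallback: the same 12 calendar months expressed as a join condition.
SELF_JOIN_SQL = """
SELECT t1.Year, t1.Month, t1.Output,
       SUM(CAST(t2.Output AS INT)) * 1.0 / CAST(t1.Output AS INT) AS Rolling_Average
FROM my_table t1
LEFT JOIN my_table t2
  ON (t2.Year = t1.Year AND t2.Month < t1.Month)
  OR (t2.Year = t1.Year - 1 AND t2.Month >= t1.Month)
GROUP BY t1.Year, t1.Month, t1.Output
ORDER BY t1.Year, t1.Month
"""

def rolling_averages(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (Year INTEGER, Month INTEGER, Output INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    window = con.execute(WINDOW_SQL).fetchall()
    self_join = con.execute(SELF_JOIN_SQL).fetchall()
    con.close()
    return window, self_join

window, self_join = rolling_averages(ROWS)
for row in window:
    print(row)
```

The first month has no preceding rows, so both versions return NULL (None in Python) for it rather than a number.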

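Try this (if you use SQL Server). Note that it returns the running total of Output for the current year up to and including the current month; it does not divide by the current month's Output:
Select *
from tableName T
outer apply (
select sum(T_in.output) Rolling_Average
from tableName T_in
where T_in.year = T.year and T_in.Month <= T.Month
)x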

HIVE: Finding running totals

I have a table called Program, which has the following columns:
ProgDate(Date)
Episode(String)
Impression_id(int)
ProgName(String)
I want to find the total impressions for each date and episode; the query below does that and works fine:
Select progdate, episode, count(distinct impression_id) Impression from Program where progname='BBC' group by progdate, episode order by progdate, episode;
Result:
ProgDate Episode Impression
20160919 1 5
20160920 1 15
20160921 1 10
20160922 1 5
20160923 2 25
20160924 2 10
20160925 2 25
But I also want to find the cumulative total for each episode. I tried searching for how to compute a running total, but the results add up all the previous totals. I want a running total for each episode, like below:
Date Episode Impression CumulativeImpressionsPerChannel
20160919 1 5 5
20160920 1 15 20
20160921 1 10 30
20160922 1 5 35
20160923 2 25 25
20160924 2 10 35
20160925 2 25 60

Recent versions of Hive HQL support windowed analytic functions (ref 1) (ref 2), including SUM() OVER().
Assuming you have such a version, I have mimicked the syntax using PostgreSQL at SQL Fiddle:
CREATE TABLE d
(ProgDate int, Episode int, Impression int)
;
INSERT INTO d
(ProgDate, Episode, Impression)
VALUES
(20160919, 1, 5),
(20160920, 1, 15),
(20160921, 1, 10),
(20160922, 1, 5),
(20160923, 2, 25),
(20160924, 2, 10),
(20160925, 2, 25)
;
Query 1:
select
ProgDate, Episode, Impression
, sum(Impression) over(partition by Episode order by ProgDate) CumImpsPerChannel
, sum(Impression) over(order by ProgDate) CumOverall
from (
Select progdate, episode, count(distinct impression_id) Impression
from Program
where progname='BBC'
group by progdate, episode order by progdate, episode
) d
Results:
| progdate | episode | impression | cumimpsperchannel |
|----------|---------|------------|-------------------|
| 20160919 | 1 | 5 | 5 |
| 20160920 | 1 | 15 | 20 |
| 20160921 | 1 | 10 | 30 |
| 20160922 | 1 | 5 | 35 |
| 20160923 | 2 | 25 | 25 |
| 20160924 | 2 | 10 | 35 |
| 20160925 | 2 | 25 | 60 |
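The key difference between a per-episode running total and the "adds up all the previous totals" result the asker got is PARTITION BY. This minimal sketch runs both window clauses in SQLite (3.25 or later) via Python on the daily totals from table d above; it is a stand-in for Hive, not Hive itself.

```python
import sqlite3

# Daily totals from table d above.
DAILY = [
    (20160919, 1, 5), (20160920, 1, 15), (20160921, 1, 10), (20160922, 1, 5),
    (20160923, 2, 25), (20160924, 2, 10), (20160925, 2, 25),
]

QUERY = """
SELECT ProgDate, Episode, Impression,
       -- restarts at each new Episode
       SUM(Impression) OVER (PARTITION BY Episode ORDER BY ProgDate) AS CumImpsPerChannel,
       -- one running total across all episodes
       SUM(Impression) OVER (ORDER BY ProgDate) AS CumOverall
FROM d
ORDER BY ProgDate
"""

def running_totals(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE d (ProgDate INTEGER, Episode INTEGER, Impression INTEGER)")
    con.executemany("INSERT INTO d VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in running_totals(DAILY):
    print(row)
```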

How to add rows with 0 data to a query result for months that don't exist?

I have a table with the columns "Month" and "Year", plus other data.
Every row in the table has a different combination of "Month" and "Year".
But for some Month and Year combinations, no row exists.
I want to create an SQL query (... where year in (2010, 2011, 2012) ...) whose result contains every month of the selected years; if a month doesn't exist, it should be added to the result with 0 in the other data columns.
Example:
Input: Table
data / month / year
+-----+---+------+
| 3.0 | 1 | 2011 |
| 4.3 | 3 | 2011 |
| 5.7 | 4 | 2011 |
| 2.2 | 5 | 2011 |
| 5.4 | 7 | 2011 |
+-----+---+------+
Output: SELECT ... WHERE year IN (2011)
+-----+----+------+
| 3.0 | 1 | 2011 |
| 0 | 2 | 2011 |
| 4.3 | 3 | 2011 |
| 5.7 | 4 | 2011 |
| 2.2 | 5 | 2011 |
| 0 | 6 | 2011 |
| 5.4 | 7 | 2011 |
| 0 | 8 | 2011 |
| 0 | 9 | 2011 |
| 0 | 10 | 2011 |
| 0 | 11 | 2011 |
| 0 | 12 | 2011 |
+-----+----+------+

Try a partitioned outer join:
SELECT
NVL(T.DATA, 0) DATA,
F.MONTH,
T.YEAR
FROM <your_table> T
PARTITION BY(T.YEAR)
RIGHT JOIN (SELECT LEVEL MONTH FROM DUAL CONNECT BY LEVEL <= 12) F ON T.MONTH = F.MONTH
Add your WHERE clause at the end or create a view with that definition and query against it.

select d.datecol,
nvl(s.val, 0) as val,
to_char(d.datecol, 'MM') month,
to_char(d.datecol, 'yyyy') year
from(
select add_months(date '2011-01-01', level - 1) as datecol
from dual connect by level <= 12
) d
left join(
select sum(val) as val, month, year
from your_table
group by month, year
) s
on (to_char(d.datecol, 'MM') = s.month and to_char(d.datecol, 'yyyy') = s.year)

select nvl(t.data, 0) as data, x.month, nvl(t.year, <your_year>) as year
from <your_table> t,
(select rownum as month from dual connect by level < 13) x
where t.year(+) = <your_year>
and t.month(+) = x.month
order by x.month
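All three answers share one idea: generate the full set of months, then outer-join the real rows onto it so missing months come back as 0. Since SQLite has neither CONNECT BY nor partitioned outer joins, this minimal sketch swaps in a recursive CTE for the month list and handles a single year, like the last answer; the table name my_table is hypothetical, and the data is the example from the question.

```python
import sqlite3

# Example input from the question: (data, month, year).
ROWS = [(3.0, 1, 2011), (4.3, 3, 2011), (5.7, 4, 2011), (2.2, 5, 2011), (5.4, 7, 2011)]

QUERY = """
WITH RECURSIVE months(month) AS (       -- months 1..12
    SELECT 1
    UNION ALL
    SELECT month + 1 FROM months WHERE month < 12
)
SELECT COALESCE(t.data, 0) AS data, m.month, :year AS year
FROM months m
-- the year filter sits in the ON clause so unmatched months survive the outer join
LEFT JOIN my_table t ON t.month = m.month AND t.year = :year
ORDER BY m.month
"""

def fill_year(rows, year):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (data REAL, month INTEGER, year INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, {"year": year}).fetchall()
    con.close()
    return result

for row in fill_year(ROWS, 2011):
    print(row)
```

Putting `t.year = :year` in the WHERE clause instead would discard the NULL-extended rows and bring the gaps back, which is the same reason the Oracle version marks both predicates with (+).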
