How can I do a rolling 12-month sum when some year-month values are missing? - SQL

I am calculating the rolling sum as follows:
select
    city,
    month_year,
    person,
    sum(total) over (partition by person, city
                     order by month_year
                     rows between 11 preceding and current row) as rolling_one_year
from
    (select
         city,
         month_year,
         person,
         sum(amount_dollar) as total
     from db1 d
     group by 1, 2, 3) t;
However, not every person has a row for every month_year. The rolling 12-month sum works as expected IF the month values are consecutive. But what if a month is missing for a person, e.g. 202208? According to the logic above it would then sum 202201 through 202301, which, as we know, is 13 months.
How can I adapt my code above so that the range of months included always stays within one year?

A possible solution is to LEFT JOIN your data to the calendar table.
Here is a guide on how to create the calendar table if you don't have one.
Create a date table in hive
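A rough sketch of that idea, assuming a helper table calendar_months(month_year) with one row per month of the reporting period (the helper table, its column, and the aliases are placeholders, not from the question): densify the data so every person/city combination has a row for every month, and then the original ROWS frame counts real calendar months.
-- Sketch only: calendar_months is an assumed one-row-per-month helper table.
select
    c.month_year,
    p.person,
    p.city,
    sum(coalesce(t.total, 0)) over (
        partition by p.person, p.city
        order by c.month_year
        rows between 11 preceding and current row
    ) as rolling_one_year
from calendar_months c
cross join (select distinct person, city from db1) p   -- one row per person/city for every month
left join (
    select city, month_year, person, sum(amount_dollar) as total
    from db1
    group by 1, 2, 3
) t
    on  t.month_year = c.month_year
    and t.person     = p.person
    and t.city       = p.city;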

You should use a logical window frame (RANGE) instead of ROWS. Consider the query below.
WITH monthly_total AS (
SELECT '201911' year_month, 4 total UNION ALL
SELECT '201912' year_month, 10 total UNION ALL
SELECT '202201' year_month, 1 total UNION ALL
SELECT '202202' year_month, 3 total UNION ALL
SELECT '202203' year_month, 9 total UNION ALL
SELECT '202204' year_month, 4 total UNION ALL
SELECT '202205' year_month, 2 total UNION ALL
SELECT '202206' year_month, 8 total UNION ALL
SELECT '202207' year_month, 6 total UNION ALL
SELECT '202209' year_month, 3 total UNION ALL
SELECT '202210' year_month, 10 total UNION ALL
SELECT '202211' year_month, 1 total UNION ALL
SELECT '202212' year_month, 3 total UNION ALL
SELECT '202301' year_month, 50 total
)
SELECT *, SUM(total) OVER w AS rolling_12m_sum
FROM monthly_total
WINDOW w AS (
ORDER BY CAST(SUBSTR(year_month, 1, 4) AS INTEGER) * 12 + CAST(SUBSTR(year_month, 5, 2) AS INTEGER)
RANGE BETWEEN 11 PRECEDING AND CURRENT ROW
) ORDER BY year_month;
I've ignored PARTITION BY person, city for simplicity.
The article below may be helpful in case you're not familiar with RANGE:
https://learnsql.com/blog/difference-between-rows-range-window-functions/
Query results

Related

What can I use in place of UNION in the query I wrote below, or how can I optimize this query without UNION and UNION ALL?

I am counting birthdays, sales, and orders across all 12 months from the customers table in SQL Server, like this (birth_date, sale_date, and order_date are columns of the Customers table):
select 1 as ranking,'Birthdays' as Type,[MONTH],TOTAL
from ( select DATENAME(month, birth_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, birth_date)
)x
union
select 2 as ranking,'sales' as Type,[MONTH],TOTAL
from ( select DATENAME(month, sale_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, sale_date)
)x
union
select 3 as ranking,'Orders' as Type,[MONTH],TOTAL
from ( select DATENAME(month, order_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, order_date)
)x
And the output is like this (just dummy data):
ranking  Type       MONTH     TOTAL
1        Birthdays  January   12
1        Birthdays  April     6
1        Birthdays  May       10
2        Sales      February  8
2        Sales      April     14
2        Sales      May       10
3        Orders     June      4
3        Orders     July      3
3        Orders     October   6
3        Orders     December  17
I want to find the counts for all three types without using UNION or UNION ALL, meaning I want this data from a single query statement (or a more optimized version of the query above).
Another approach is to create a CTE with all available ranking values and use CROSS APPLY against it, as shown below.
WITH ranks(ranking) AS (
SELECT * FROM (VALUES (1), (2), (3)) v(r)
)
SELECT
r.ranking,
CASE WHEN r.ranking = 1 THEN 'Birthdays'
WHEN r.ranking = 2 THEN 'Sales'
WHEN r.ranking = 3 THEN 'Orders'
END AS Type,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END) AS MONTH,
COUNT(*) AS TOTAL
FROM customers c
CROSS APPLY ranks r
GROUP BY r.ranking,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END)
ORDER BY r.ranking, MONTH
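A related option, shown here only as a sketch, is to unpivot the three date columns with CROSS APPLY (VALUES ...); this also avoids UNION and keeps a single scan of customers (column names are taken from the question):
SELECT v.ranking,
       v.Type,
       DATENAME(month, v.d) AS [MONTH],
       COUNT(*) AS TOTAL
FROM customers c
CROSS APPLY (VALUES (1, 'Birthdays', c.birth_date),
                    (2, 'Sales',     c.sale_date),
                    (3, 'Orders',    c.order_date)) v(ranking, Type, d)
GROUP BY v.ranking, v.Type, DATENAME(month, v.d)
ORDER BY v.ranking, [MONTH]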

How to show total profit for each month, and show null when there is no record in that month, in Oracle

I am producing a report to show the total profit for each month in 2018, and show NIL for the months in which no profit was earned.
The profit earned = 0.1 * Total_payment.
Profit is earned when the service is done. The column "Total_payment" comes from the table BOOKING; I have to join BOOKING and SERVICE in order to get the total profit of each month. Booking_num is the key for joining BOOKING and SERVICE, and Actual_end is the end date of the service.
Now the problem is that no profit was earned in Jan, Feb and Aug.
Is there any way to show NIL in the profit column for these three months?
SELECT EXTRACT(MONTH FROM Actual_end) AS MONTH,SUM(Total_payment *0.1) AS PROFIT
FROM SERVICE,BOOKING
WHERE SERVICE.Booking_num = BOOKING.Booking_num
AND EXTRACT(YEAR FROM Actual_end) = 2018
GROUP BY EXTRACT(MONTH FROM Actual_end);
This query shows profit only for the months that have records, without Jan, Feb and Aug:
MONTH PROFIT
3 88.4
4 146.1
5 112.6
6 108.3
7 102.6
9 130.3
10 72.6
12 124.9
I expect the output to be
MONTH PROFIT
1 NIL
2 NIL
3 88.4
4 146.1
5 112.6
6 108.3
7 102.6
8 NIL
9 130.3
10 72.6
11 124.9
12 25.2
How do I modify it? I have also tried:
WITH CALENDAR AS(
SELECT TO_CHAR(add_months(date '2018-01-01',ROWNUM -1),'MM') AS MONTH
FROM DUAL
CONNECT BY LEVEL <=12)
SELECT CALENDER.MONTH, NVL(SUM(Total_payment*0.1),null) AS PROFIT
FROM BOOKING,SERVICE,CALENDER
WHERE BOOKING.Booking_num = SERVICE.Booking_num
AND CALENDER.MONTH = EXTRACT(MONTH FROM Actual_end(+))
AND EXTRACT(MONTH FROM Actual_end) = 2018
GROUP BY CALENDER.MONTH
THE OUTPUT:
NO ROWS SELECTED
You need an outer join (LEFT or RIGHT). By the way, get rid of the old-fashioned comma-separated joins between tables and use explicit JOIN syntax instead.
Add RIGHT JOIN (SELECT LEVEL AS MNT FROM DUAL CONNECT BY LEVEL <= 12) MNT to your query if returning one specific year with all twelve months is what matters:
SELECT MNT AS MONTH,NVL(TO_CHAR(SUM(Total_payment *0.1)),'NIL') AS PROFIT
FROM SERVICE S
JOIN BOOKING B
ON S.Booking_num = B.Booking_num
RIGHT JOIN (SELECT LEVEL AS MNT
FROM DUAL
CONNECT BY LEVEL <= 12 ) MNT
ON MNT.MNT = EXTRACT(MONTH FROM Actual_end)
AND EXTRACT(YEAR FROM Actual_end)=2018
GROUP BY MNT
ORDER BY MONTH;
Demo
In this case you need a list of all months:
with months as (
select 1 as month from dual union all
select 2 as month from dual union all
select 3 as month from dual union all
select 4 as month from dual union all
select 5 as month from dual union all
select 6 as month from dual union all
select 7 as month from dual union all
select 8 as month from dual union all
select 9 as month from dual union all
select 10 as month from dual union all
select 11 as month from dual union all
select 12 as month from dual
)
select m.month, sum(s.total_payment * 0.1) as profit
from months m left join
booking b
on extract(month from b.actual_end) = m.month and
b.actual_end >= date '2018-01-01' and
b.actual_end < date '2019-01-01' left join
service s
on s.booking_num = b.booking_num
group by m.month;
Note:
This is guessing that actual_end is in booking and total_payment is in service. The query would be slightly different if this guess is not correct.
Never use commas in the FROM clause.
This query should use LEFT JOIN. The first table has all the rows that you want.
Filters on subsequent tables go in the on clause, not the where clause.
Note the use of date constants. Such comparisons usually make it easier for the engine to optimize the query (typically by using indexes).
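If the report needs the literal text NIL rather than NULL (as in the expected output above), the profit has to be returned as a string. Below is a sketch combining that with the months list, using NVL(TO_CHAR(...)) as in the earlier answer and assuming the same booking/service columns:
with months as (
      select level as month from dual connect by level <= 12
     )
select m.month,
       nvl(to_char(sum(s.total_payment * 0.1)), 'NIL') as profit
from months m left join
     booking b
     on extract(month from b.actual_end) = m.month and
        b.actual_end >= date '2018-01-01' and
        b.actual_end < date '2019-01-01' left join
     service s
     on s.booking_num = b.booking_num
group by m.month
order by m.month;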

Take sum of last 7 days from the observed date in BigQuery

I have a table on which I want to compute the sum of revenue over the last 7 days from the observed day. Here is my table:
with temp as
(
select DATE('2019-06-29') as transaction_date, "x"as id, 0 as revenue
union all
select DATE('2019-06-30') as transaction_date, "x"as id, 80 as revenue
union all
select DATE('2019-07-04') as transaction_date, "x"as id, 64 as revenue
union all
select DATE('2019-07-06') as transaction_date, "x"as id, 64 as revenue
union all
select DATE('2019-07-11') as transaction_date, "x"as id, 75 as revenue
union all
select DATE('2019-07-12') as transaction_date, "x"as id, 0 as revenue
)
select * from temp
I want to take a sum of the last 7 days for each transaction_date. For instance, for the last record, which has transaction_date = 2019-07-12, I would like to add another column that adds up the revenue for the last 7 days from 2019-07-12 (going back towards 2019-07-05), so the value of the new rollup_revenue column would be 0 + 75 + 64 = 139. Likewise, I need to compute the rollup for all the dates for every ID.
Note - the ID may or may not appear daily.
I have tried a self join but I am unable to figure it out.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.temp` AS (
SELECT DATE '2019-06-29' AS transaction_date, 'x' AS id, 0 AS revenue UNION ALL
SELECT '2019-06-30', 'x', 80 UNION ALL
SELECT '2019-07-04', 'x', 64 UNION ALL
SELECT '2019-07-06', 'x', 64 UNION ALL
SELECT '2019-07-11', 'x', 75 UNION ALL
SELECT '2019-07-12', 'x', 0
)
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
-- ORDER BY transaction_date
with result
Row transaction_date id revenue rollup_revenue
1 2019-06-29 x 0 0
2 2019-06-30 x 80 80
3 2019-07-04 x 64 144
4 2019-07-06 x 64 208
5 2019-07-11 x 75 139
6 2019-07-12 x 0 139
One option uses a correlated subquery to find the rolling sum:
SELECT
  transaction_date,
  revenue,
  (SELECT SUM(t2.revenue)
   FROM temp t2
   WHERE t2.id = t1.id          -- keep the rollup per ID, as the question asks
     AND t2.transaction_date BETWEEN DATE_SUB(t1.transaction_date, INTERVAL 6 DAY)
                                 AND t1.transaction_date) AS rev_7_days
FROM temp t1
ORDER BY
  transaction_date;

SQL: How to create a weekly user count summary by month

I'm trying to create a week-over-week active user count summary report/table aggregated by month. I have one table for June 2017 and one table for May 2017 which I need to join together to build the report. The date timestamp is created_utc, a UNIX timestamp, which I can transform into a human-readable format and from there extract the week-of-year value, 1 through 52. The questions I have are:
1. How to number the weeks just by values of 1 through 4: week 1 for June, week 1 for May, week 2 for June, week 2 for May, and so on.
2. How to join the tables based on those week 1 through 4 values.
3. How to pivot the table and add a WOW change variable.
I'd like the final table to look like this:
| Week   | June_count | May_count | WOW_Change |
|:-------|:----------:|:---------:|:----------:|
| Week_1 | 5 | 8 | 0.6 |
| Week_2 | 2 | 1 | -0.5 |
| Week_3 | 10 | 5 | -0.5 |
| Week_4 | 30 | 6 | 1 |
Below is some sample data as well as the code I've started.
CREATE TABLE June
(created_utc int, userid varchar(6))
;
INSERT INTO June
(created_utc, userid)
VALUES
(1496354167, '6eq4xf'),
(1496362973, '6eqzz3'),
(1496431934, '6ewlm8'),
(1496870877, '6fwied'),
(1496778080, '6fo79k'),
(1496933893, '6g1gcg'),
(1497154559, '6gjkid'),
(1497618561, '6hmeud'),
(1497377349, '6h1osm'),
(1497221017, '6god73'),
(1497731470, '6hvmic'),
(1497273130, '6gs4ay'),
(1498080798, '6ioz8q'),
(1497769316, '6hyer4'),
(1497415729, '6h5cgu'),
(1497978764, '6iffwq')
;
CREATE TABLE May
(created_utc int, userid varchar(6))
;
INSERT INTO May
(created_utc, userid)
VALUES
(1493729491, '68sx7k'),
(1493646801, '68m2s2'),
(1493747285, '68uohf'),
(1493664087, '68ntss'),
(1493690759, '68qe5k'),
(1493829196, '691fy9'),
(1493646344, '68m1dv'),
(1494166859, '69rhkl'),
(1493883023, '6963qb'),
(1494362328, '6a83wv'),
(1494525998, '6alv6c'),
(1493945230, '69bkhb'),
(1494050355, '69jqtz'),
(1494418011, '6accd0'),
(1494425781, '6ad0xm'),
(1494024697, '69hx2z'),
(1494586576, '6aql9y')
;
#standardSQL
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM June
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM May
Below is for BigQuery Standard SQL
#standardSQL
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.June` AS (
SELECT 1496354167 created_utc, '6eq4xf' userid UNION ALL
SELECT 1496362973, '6eqzz3' UNION ALL
SELECT 1496431934, '6ewlm8' UNION ALL
SELECT 1496870877, '6fwied' UNION ALL
SELECT 1496778080, '6fo79k' UNION ALL
SELECT 1496933893, '6g1gcg' UNION ALL
SELECT 1497154559, '6gjkid' UNION ALL
SELECT 1497618561, '6hmeud' UNION ALL
SELECT 1497377349, '6h1osm' UNION ALL
SELECT 1497221017, '6god73' UNION ALL
SELECT 1497731470, '6hvmic' UNION ALL
SELECT 1497273130, '6gs4ay' UNION ALL
SELECT 1498080798, '6ioz8q' UNION ALL
SELECT 1497769316, '6hyer4' UNION ALL
SELECT 1497415729, '6h5cgu' UNION ALL
SELECT 1497978764, '6iffwq'
), `project.dataset.May` AS (
SELECT 1493729491 created_utc, '68sx7k' userid UNION ALL
SELECT 1493646801, '68m2s2' UNION ALL
SELECT 1493747285, '68uohf' UNION ALL
SELECT 1493664087, '68ntss' UNION ALL
SELECT 1493690759, '68qe5k' UNION ALL
SELECT 1493829196, '691fy9' UNION ALL
SELECT 1493646344, '68m1dv' UNION ALL
SELECT 1494166859, '69rhkl' UNION ALL
SELECT 1493883023, '6963qb' UNION ALL
SELECT 1494362328, '6a83wv' UNION ALL
SELECT 1494525998, '6alv6c' UNION ALL
SELECT 1493945230, '69bkhb' UNION ALL
SELECT 1494050355, '69jqtz' UNION ALL
SELECT 1494418011, '6accd0' UNION ALL
SELECT 1494425781, '6ad0xm' UNION ALL
SELECT 1494024697, '69hx2z' UNION ALL
SELECT 1494586576, '6aql9y'
)
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
-- ORDER BY week
with this result (since the sample data covers only the first two weeks, the result also shows only two weeks, which should not be an issue when you apply it to real data):
Row Week June_count May_count WOW_Change
1 Week_1 5 12 1.4
2 Week_2 6 5 -0.17
Use arithmetic on the day of the month to get the week:
SELECT j.week_number, j.user_count as june_user_count,
       m.user_count as may_user_count
FROM (SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) as week_number,
             COUNT(distinct userid) as user_count
      FROM June
      GROUP BY week_number
     ) j JOIN
     (SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) as week_number,
             COUNT(distinct userid) as user_count
      FROM May
      GROUP BY week_number
     ) m
     ON m.week_number = j.week_number;
Note that splitting data into different tables just based on the date is a bad idea. The data should all go into one table, perhaps partitioned if data volume is an issue.
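As a sketch of that suggestion (the combined table is simulated here with UNION ALL over the existing per-month tables; the all_users name is made up), the weekly counts can then come from a single source with the month derived from the timestamp:
#standardSQL
WITH all_users AS (
  SELECT created_utc, userid FROM June
  UNION ALL
  SELECT created_utc, userid FROM May
)
SELECT
  FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_SECONDS(created_utc))) AS month,
  DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 AS week_of_month,
  COUNT(DISTINCT userid) AS user_count
FROM all_users
GROUP BY month, week_of_month
ORDER BY month, week_of_month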

Moving average of 2 columns

Hello, I have a problem. I know how to calculate a moving average over the last 3 months using Oracle analytic functions... but my situation is a little different.
Month-----ProductType-----Sales----------Average(HAVE TO FIND THIS)
1---------A---------------10
1---------B---------------12
1---------C---------------17
2---------A---------------21
3---------C---------------2
3---------B---------------21
4---------B---------------23
5
6
7
8
9
So we have sales for each month and each product type... I need to calculate the moving average of the last 3 months for a particular product.
Example:
For month 4 and product B it would be (21 + 0 + 12) / 3.
Any ideas?
Another option is to use the windowing clause of analytic functions
with my_data as (
select 1 as month, 'A' as product, 10 as sales from dual union all
select 1 as month, 'B' as product, 12 as sales from dual union all
select 1 as month, 'C' as product, 17 as sales from dual union all
select 2 as month, 'A' as product, 21 as sales from dual union all
select 3 as month, 'C' as product, 2 as sales from dual union all
select 3 as month, 'B' as product, 21 as sales from dual union all
select 4 as month, 'B' as product, 23 as sales from dual
)
select
month,
product,
sales,
nvl(sum(sales)
over (partition by product order by month
range between 3 preceding and 1 preceding),0)/3 as average_sales
from my_data
order by month, product
-- Note: LAG offsets by rows, so this variant assumes each product has a row
-- for every consecutive month; NVL treats the leading NULLs as 0.
SELECT month,
       productType,
       sales,
       (NVL(LAG(sales, 3) OVER (PARTITION BY productType ORDER BY month), 0) +
        NVL(LAG(sales, 2) OVER (PARTITION BY productType ORDER BY month), 0) +
        NVL(LAG(sales, 1) OVER (PARTITION BY productType ORDER BY month), 0)) / 3 AS moving_avg
FROM your_table_name