Querying average and rolling 12-month average - SQL

I want to find the average per month, and the rolling average over the last 12 months, of the number of changes per customer.
SELECT
    crq_requested_by_company AS [Customer],
    COUNT(crq_number) AS [Number of Changes]
FROM
    change_information ci1
GROUP BY
    crq_requested_by_company
At the moment I am just counting the total, and my results look like this:
crq_requested_by_company   count
A                             4
B                             2
C                          2269
D                          7696
E                           110
F                            91
G                            33
The date column I will be using is called 'start_date'.
I assume GETDATE() will be needed to work out the rolling average for the last 12 months.
Additional info after comments:
Using the code:
;WITH CTE AS
(
    SELECT
        crq_requested_by_company AS Customer,
        COUNT(crq_number) AS Nuc,
        DATEADD(month, DATEDIFF(month, 0, crq_start_date), 0) AS m
    FROM
        change_information ci1
    WHERE
        crq_start_date >= DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 12, 0)
    GROUP BY
        crq_requested_by_company,
        DATEDIFF(month, 0, crq_start_date)
)
SELECT
    Customer,
    AVG(Nuc) OVER (PARTITION BY Customer ORDER BY m) AS running_avg,
    m AS start_month,
    AVG(Nuc) OVER (PARTITION BY Customer) AS simply_average
FROM
    CTE
ORDER BY Customer, start_month
This gives the results:
Customer  running_avg  start_month       simply_average
A         8            01/01/2016 00:00  13
A         10           01/02/2016 00:00  13
A         10           01/03/2016 00:00  13
A         11           01/04/2016 00:00  13
A         14           01/05/2016 00:00  13
A         13           01/06/2016 00:00  13
B         1            01/01/2016 00:00  1
C         3            01/01/2016 00:00  2
C         3            01/02/2016 00:00  2
C         2            01/03/2016 00:00  2
C         2            01/04/2016 00:00  2
C         2            01/05/2016 00:00  2
C         2            01/06/2016 00:00  2
It needs to look like the following: the average of the running averages above, over the 6 months shown (I currently only have 6 months of data; eventually it will be 12).
Customer  avg_of_running_avg
A         11
B         1
C         2

Try this; it should work on SQL Server 2012+ using a running average:
;WITH CTE AS
(
    SELECT
        crq_requested_by_company AS Customer,
        COUNT(crq_number) AS Nuc,
        DATEADD(month, DATEDIFF(month, 0, start_date), 0) AS m
    FROM
        change_information ci1
    WHERE
        start_date >= DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 12, 0)
    GROUP BY
        crq_requested_by_company,
        DATEDIFF(month, 0, start_date)
)
SELECT
    Customer,
    AVG(Nuc) OVER (PARTITION BY Customer ORDER BY m) AS running_avg,
    m AS start_month,
    AVG(Nuc) OVER (PARTITION BY Customer) AS simply_average
FROM
    CTE
ORDER BY Customer, start_month
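To get from there to the single avg_of_running_avg per customer shown in the question's expected output, one possible sketch (untested; it reuses the CTE above and simply averages the running averages per customer in an outer step) is:

;WITH CTE AS
(
    SELECT
        crq_requested_by_company AS Customer,
        COUNT(crq_number) AS Nuc,
        DATEADD(month, DATEDIFF(month, 0, start_date), 0) AS m
    FROM
        change_information ci1
    WHERE
        start_date >= DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 12, 0)
    GROUP BY
        crq_requested_by_company,
        DATEDIFF(month, 0, start_date)
),
running AS
(
    -- same running average as above, kept per month
    SELECT
        Customer,
        AVG(Nuc) OVER (PARTITION BY Customer ORDER BY m) AS running_avg
    FROM CTE
)
SELECT
    Customer,
    AVG(running_avg) AS avg_of_running_avg  -- e.g. (8+10+10+11+14+13)/6 = 11 for customer A
FROM running
GROUP BY Customer
ORDER BY Customer;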

Related

Calculate sales metrics (like past 6 months, past 3 months, sale one year ago etc.) on transaction data in BigQuery

I have to create a view in BigQuery with some details of product sales. The measurements to be included in the view are explained below; they have to be calculated for each product for every day that product is sold. A product is identified by a unique combination of 5-6 attributes (in our demo, the code1 and code2 columns). The date column represents the transaction dates.
sales_today -> the sum of sales for each product (combination of code1 and code2) per day.
TotSales_previous_3_months -> the sum of sales for each product in the previous 3 months (not including any sales from the current month). For example, if we are calculating TotSales_previous_3_months for a product sale on 5th March 2022, we have to sum up the sales of that product from 1st December 2021 to 28th February 2022.
TotSales_previous_6_months -> the sum of sales for each product in the previous 6 months (not including any sales from the current month). Follow the same logic as for TotSales_previous_3_months.
sale_one_month_ago -> the sum of sales of the product on this day exactly one month ago. For example, if we are calculating sale_one_month_ago for a product sale on 5th March 2022, it would be the sum of sales of that product on 5th February 2022.
sale_one_year_ago -> the sum of sales of the product on this day exactly one year ago. For example, if we are calculating sale_one_year_ago for a product sale on 5th March 2022, it would be the sum of sales of that product on 5th March 2021.
Unique_count_flag -> flag = 1 if the number of sales of the product on a day = 1. If the number of sales of the product is more than 1 on a day, flag = 0.
I have created this table (test_sales) with some demo data for understanding.
code1  code2  date        gen          sales
1      A      2021-02-04  jerez        7
1      A      2021-02-04  abc          5
1      A      2022-02-04  wres         10
1      A      2022-03-04  tomz         10
1      A      2022-03-05  everyz       10
1      A      2022-05-01  ben10        30
1      A      2022-06-01  xyx          10
1      A      2022-06-01  xya          5
2      A      2022-05-10  iqoom        20
3      C      2022-01-10  imola        60
3      C      2022-04-01  nurburgring  50
3      C      2022-06-01  jerez        30
The result set after the calculations should look like this:
code1  code2  date        gen          sales  sales_today  TotSales_previous_3_months  TotSales_previous_6_months  sale_one_month_ago  sale_one_year_ago  Unique_count_flag
1      A      2021-02-04  jerez        7      12           0                           0                           0                   0                  0
1      A      2021-02-04  abc          5      12           0                           0                           0                   0                  0
1      A      2022-02-04  wres         10     10           0                           0                           0                   12                 1
1      A      2022-03-04  tomz         10     10           10                          10                          10                  0                  1
1      A      2022-03-05  everyz       10     10           10                          10                          0                   0                  1
1      A      2022-05-01  ben10        30     30           30                          30                          0                   0                  1
1      A      2022-06-01  xyx          10     15           50                          60                          30                  0                  0
1      A      2022-06-01  xya          5      15           50                          60                          30                  0                  0
2      A      2022-05-10  iqoom        20     20           0                           0                           0                   0                  1
3      C      2022-01-10  imola        60     60           0                           0                           0                   0                  1
3      C      2022-04-01  nurburgring  50     50           60                          60                          0                   0                  1
3      C      2022-06-01  jerez        30     30           50                          110                         0                   0                  1
I was able to create the code below to achieve the result, but the problem is that while it works fine for small datasets, here I am dealing with around 60 GB of data (~50 columns and ~80 million rows). When I adapt the code below to the original sales data (which is itself a join of a few tables), it simply runs too long. Is there an alternative or more efficient way to achieve the results?
with temp as (
  select
    code1, code2, date, gen, sales,
    count(*) over (partition by code1, code2, date) as cnt,
    sum(sales) over (partition by code1, code2, date) as sales_today,
    array_agg(struct(sales as sales, date as date))
      over (partition by code1, code2 order by date) as past_records
  from `test_sales`
)
select * except (past_records, cnt),
  (select ifnull(sum(x.sales), 0)
   from unnest(temp.past_records) as x
   where x.date between date_trunc(temp.date, month) - interval 3 month
                    and date_trunc(temp.date, month) - interval 1 day
  ) as TotSales_previous_3_months,
  (select ifnull(sum(x.sales), 0)
   from unnest(temp.past_records) as x
   where x.date between date_trunc(temp.date, month) - interval 6 month
                    and date_trunc(temp.date, month) - interval 1 day
  ) as TotSales_previous_6_months,
  (select ifnull(sum(x.sales), 0)
   from unnest(temp.past_records) as x
   where x.date = temp.date - interval 1 month
  ) as sale_one_month_ago,
  (select ifnull(sum(x.sales), 0)
   from unnest(temp.past_records) as x
   where x.date = temp.date - interval 1 year
  ) as sale_one_year_ago,
  if(cnt = 1, 1, 0) as Unique_count_flag
from temp
Modified code, inspired by Mikhail's approach:
select *,
  -- extract(year from date) * 12 + extract(month from date) as months,
  -- unix_date(date) as days,
  sum(sales) over (product_date) as sales_today,
  sum(sales) over (product range between 3 preceding and 1 preceding) as TotSales_previous_3_months,
  sum(sales) over (product range between 6 preceding and 1 preceding) as TotSales_previous_6_months,
  case
    when extract(day from date) = 31 and extract(month from date) in (3, 12, 10, 7, 5)
      then sum(sales) over (product_by_unix_date range between 31 preceding and 31 preceding)
    when extract(day from date) = 30 and extract(month from date) = 3
      then sum(sales) over (product_by_unix_date range between 30 preceding and 30 preceding)
    when extract(day from date) = 29 and extract(month from date) = 3
      then sum(sales) over (product_by_unix_date range between 29 preceding and 29 preceding)
    else sum(sales) over (product_day range between 1 preceding and 1 preceding)
  end as sale_one_month_ago,
  case
    when extract(day from date) = 29 and extract(month from date) = 2
      then sum(sales) over (product_by_unix_date range between 366 preceding and 366 preceding)
    else sum(sales) over (product_day range between 12 preceding and 12 preceding)
  end as sale_one_year_ago
from `river-blade-343102.test.test_sales`
window
  product as (partition by code1, code2 order by extract(year from date) * 12 + extract(month from date)),
  product_date as (partition by code1, code2, date),
  product_day as (partition by code1, code2, extract(day from date) order by extract(year from date) * 12 + extract(month from date)),
  product_by_unix_date as (partition by code1, code2 order by unix_date(date))
Consider the version of your query below. It is still not perfect, but at least it is easier to handle, read, and maintain:
select *,
  sum(sales) over (product_date) as sales_today,
  sum(sales) over (product range between 3 preceding and 1 preceding) as TotSales_previous_3_months,
  sum(sales) over (product range between 6 preceding and 1 preceding) as TotSales_previous_6_months,
  sum(sales) over (product_day range between 1 preceding and 1 preceding) as sale_one_month_ago,
  sum(sales) over (product_day range between 12 preceding and 12 preceding) as sale_one_year_ago
from test_sales
window
  product as (partition by code1, code2 order by extract(year from date) * 12 + extract(month from date)),
  product_date as (partition by code1, code2, date),
  product_day as (partition by code1, code2, extract(day from date) order by extract(year from date) * 12 + extract(month from date))
If applied to the sample data in your question, the output matches the expected result shown above.
Is there an alternative or efficient way to achieve the results?
The above is definitely an alternative way, with its own pros and cons. Whether it is more efficient: I do think so, but I am not 100% sure, to be honest. It depends on your data, so you need to test it against your data and see.
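As a small sanity check of the month-index trick (a self-contained sketch; the inline rows mirror product 3/C from the sample data): the ORDER BY key maps each date to a month serial number, so RANGE BETWEEN 3 PRECEDING AND 1 PRECEDING spans exactly the three previous calendar months and excludes the current one.

with sample as (
  select date '2022-01-10' as date, 60 as sales union all
  select date '2022-04-01', 50 union all
  select date '2022-06-01', 30
)
select
  date,
  sales,
  sum(sales) over (
    order by extract(year from date) * 12 + extract(month from date)
    range between 3 preceding and 1 preceding
  ) as TotSales_previous_3_months
from sample
-- expected: NULL for 2022-01-10 (no earlier months),
--           60 for 2022-04-01 (Jan..Mar), 50 for 2022-06-01 (Mar..May)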

How to write SQL to generate cumulative monthly sales per customer in Postgres

Given that I have a table called orders
orders
  id
  customer_id
  created_at
How do I write a query to return the monthly cumulative order counts for each customer? I want to include the missing months in the series, for Jan 2018 to May 2018.
data
id customer_id created_at
1 200 01/20/2018
2 300 01/21/2018
3 200 01/22/2018
4 200 03/20/2018
5 300 03/20/2018
6 200 04/20/2018
7 200 04/20/2018
expected result
customer_id month count
200 01/01/2018 2
200 02/01/2018 2
200 03/01/2018 3
200 04/01/2018 5
200 05/01/2018 5
300 01/01/2018 1
300 02/01/2018 1
300 03/01/2018 2
300 04/01/2018 2
300 05/01/2018 2
I have a query that calculates the net cumulative count per month. I didn't have much success converting the query to produce per-customer cumulative counts.
WITH monthly_orders AS (
    SELECT date_trunc('month', orders.created_at) AS mon,
           COUNT(orders.id) AS mon_count
    FROM orders
    GROUP BY 1
)
SELECT TO_CHAR(mon, 'YYYY-MM') AS mon_text,
       COALESCE(SUM(c.mon_count) OVER (ORDER BY c.mon), 0) AS running_count
FROM generate_series('2018-01-01'::date, '2018-06-01'::date, interval '1 month') mon
LEFT JOIN monthly_orders c USING (mon)
ORDER BY mon_text;
If I understand correctly, you can just do:
select o.customer_id, date_trunc('month', o.created_at) as mon,
       count(*) as mon_count,
       sum(count(*)) over (partition by o.customer_id
                           order by date_trunc('month', o.created_at)
                          ) as running_count
from orders o
group by o.customer_id, mon
order by o.customer_id, mon;
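Note that this emits no rows for months in which a customer had no orders (Feb and May in the expected result). One way to fill those gaps, sketched here against the question's schema (untested), is to cross join the customers with a generated month series and run the window sum over the joined result:

with months as (
    select generate_series('2018-01-01'::date, '2018-05-01'::date,
                           interval '1 month') as mon
),
monthly as (
    select customer_id, date_trunc('month', created_at) as mon, count(*) as cnt
    from orders
    group by 1, 2
)
select c.customer_id,
       m.mon::date as month,
       -- coalesce turns empty months into 0 so the running sum carries forward
       sum(coalesce(mo.cnt, 0)) over (partition by c.customer_id
                                      order by m.mon) as count
from (select distinct customer_id from orders) c
cross join months m
left join monthly mo on mo.customer_id = c.customer_id and mo.mon = m.mon
order by c.customer_id, m.mon;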

How to do a x-days grouped sum in redshift?

I have the following table, which shows how many items from different units entered the inventory on different dates.
ID Date Unit Quantity
---------------------------------
1 2017-08-01 A_red 05
2 2017-08-13 A_red 10
3 2017-09-20 A_red 20
4 2017-09-22 A_red 40
5 2017-10-05 A_red 40
6 2017-10-25 A_red 30
7 2017-10-24 A_blue 60
The problem is: entries within a time interval of 30 days of the same unit should be grouped.
So I want the following result:
ID Date Unit Quantity fst_entry30 Quantity30
-----------------------------------------------------
1 2017-08-01 A_red 05 T 15
2 2017-08-13 A_red 10 F 15
3 2017-09-20 A_red 20 T 100
4 2017-09-22 A_red 40 F 100
5 2017-10-05 A_red 40 F 100
6 2017-10-25 A_red 30 T 30
7 2017-10-24 A_blue 60 T 60
where fst_entry30 is a flag that indicates whether the entry was the first for this unit in the last 30 days. Note that a different unit (A_blue instead of A_red) is not grouped with the others.
And Quantity30 is the grouped sum of quantity.
For example, between 20 September and 5 October there are fewer than 30 days, so those entries were grouped.
Bear in mind that Redshift does not allow recursive common table expressions.
I already tried self-joins, but that turned out to be cumbersome.
You would just use lag() to define the groups:
select t.*,
       (case when date < lag(date) over (partition by unit order by date) + interval '30 day'
             then 0 else 1
        end) as grp_start
from t;
Then you can do a cumulative sum to assign a number to each group, and finally add up the quantities using a window function:
select t.*, sum(quantity) over (partition by unit, grp) as quantity30
from (select t.*,
             sum(grp_start) over (partition by unit order by date) as grp
      from (select t.*,
                   (case when date < lag(date) over (partition by unit order by date) + interval '30 day'
                         then 0 else 1
                    end) as grp_start
            from t
           ) t
     ) t
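To line this up with the exact columns requested (fst_entry30 as T/F plus Quantity30), a possible sketch, assuming the same table t and the gap-from-previous-row group definition used above (note this differs slightly from a strict 30-days-from-first-entry rule, e.g. for the 2017-10-25 row):

select id, date, unit, quantity,
       -- grp_start = 1 marks the first row of each 30-day group
       case when grp_start = 1 then 'T' else 'F' end as fst_entry30,
       sum(quantity) over (partition by unit, grp) as quantity30
from (select t.*,
             sum(grp_start) over (partition by unit order by date
                                  rows unbounded preceding) as grp
      from (select t.*,
                   case when date < lag(date) over (partition by unit order by date)
                                    + interval '30 day'
                        then 0 else 1
                   end as grp_start
            from t
           ) t
     ) t
order by unit, date;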

SQL count number of users every 7 days

I am new to SQL and I need to find the count of users every 7 days. I have a table with users for every single day starting from April 2015 up until now:
...
2015-05-16 00:00
2015-05-16 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-18 00:00
2015-05-18 00:00
...
and I need to count the number of users every 7 days (weekly), so that I have weekly data.
SELECT COUNT(user_id), Activity_Date FROM TABLE_NAME
I need output like this:
TotalUsers  week1  week2  week3  ... and so on
82          80     14     16
I am using DbVisualizer to query an Oracle database.
You should try the following:
Select
    sum(Week1) + sum(Week2) + sum(Week3) + sum(Week4) + sum(Week5) as Total,
    sum(Week1) as Week1,
    sum(Week2) as Week2,
    sum(Week3) as Week3,
    sum(Week4) as Week4,
    sum(Week5) as Week5
From (
    select
        case when week = 1 then 1 else 0 end as Week1,
        case when week = 2 then 1 else 0 end as Week2,
        case when week = 3 then 1 else 0 end as Week3,
        case when week = 4 then 1 else 0 end as Week4,
        case when week = 5 then 1 else 0 end as Week5
    from (
        Select
            CEILING(datepart(dd, visitdate) / 7 + 1) as week,
            user_id
        from visitor
    ) T
) D
You would need to add month and year to the result as well.
SELECT COUNT(user_id) FROM TABLE_NAME WHERE Activity_Date > SYSDATE - 7;
That would get the number of users for the last 7 days.
This is my test table:
user_id act_date
1 01/04/2015
2 01/04/2015
3 04/04/2015
4 05/04/2015
..
This is my query:
select week_offset, count(*) nb
from (select trunc((act_date - to_date('01042015', 'DDMMYYYY')) / 7) as week_offset
      from test_date)
group by week_offset
order by 1
and this is the output:
week_offset nb
0 6
1 3
4 5
5 7
6 3
7 1
18 1
Week offset is the number of the week from 01/04/2015, and we can show the first day of the week.
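If you then want the single-row, column-per-week layout from the question, one possible sketch built on the same week_offset calculation (untested; the week1..week4 labels and the 01/04/2015 anchor are illustrative):

select
    count(*) as TotalUsers,
    -- conditional aggregation pivots each week offset into its own column
    sum(case when week_offset = 0 then 1 else 0 end) as week1,
    sum(case when week_offset = 1 then 1 else 0 end) as week2,
    sum(case when week_offset = 2 then 1 else 0 end) as week3,
    sum(case when week_offset = 3 then 1 else 0 end) as week4
from (select trunc((act_date - to_date('01042015', 'DDMMYYYY')) / 7) as week_offset
      from test_date);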
How do you define your weeks? Here's an approach for SQL Server that starts each seven-day block relative to the start of April. The expressions will vary according to your specific needs:
select
    dateadd(dd,
            datediff(dd, cast('20150401' as date), Activity_Date) / 7 * 7,
            cast('20150401' as date)
    ) as WeekStart,
    count(*)
from T
group by datediff(dd, cast('20150401' as date), Activity_Date) / 7
Oracle:
select
    trunc(Activity_date, 'DAY') as WeekStart,
    count(*)
from T
group by trunc(Activity_date, 'DAY') /* D and DAY are the same thing */
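One caveat on the Oracle version (general TRUNC behavior, not specific to this answer): 'DAY' truncation is NLS-sensitive, so the first day of the week depends on NLS_TERRITORY. For example:

-- Illustrative only: the week start returned by 'DAY' follows NLS_TERRITORY
select trunc(date '2015-05-17', 'DAY') as week_start from dual;
-- 2015-05-17 (a Sunday) under US settings, where weeks start on Sunday;
-- 2015-05-11 (the preceding Monday) under many European territory settings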

Grouping the results of a query with CTEs

I have a CTE-based query into which I pass about 2600 4-tuple latitude/longitude values, ID-tagged and held in a second table called coordinates. These top-left and bottom-right latitude/longitude values are passed into the CTE in order to display the number of requests made (hourly) within those coordinates between two given timestamps.
However, I would like to get the total requests per day within the given timestamps. That is, I want the total count of user requests on every specified day, e.g. when the user opts to see every Wednesday, or Wednesday AND Thursday, etc., between 11:55 and 22:04, between January 1 and 16, 2012, for every latitude/longitude 4-tuple I pass. The output would basically look like:
coordinates_id | stamp       | zcount
1                Jan 4 2012    200    (total requests on Wednesday Jan 4 between 11:55 and 22:04)
1                Jan 11 2012   121    (total requests on Wednesday Jan 11 between 11:55 and 22:04)
2                Jan 4 2012    255    (total requests on Wednesday Jan 4 between 11:55 and 22:04)
2                Jan 11 2012   211    (total requests on Wednesday Jan 11 between 11:55 and 22:04)
...
How would I do that? My query is as below:
WITH v AS (
    SELECT '2012-01-1 11:55:11'::timestamp AS _from  -- provide times once
         , '2012-01-16 22:02:21'::timestamp AS _to
)
, q AS (
    SELECT c.coordinates_id
         , date_trunc('hour', t.calltime) AS stamp
         , count(*) AS zcount
    FROM v
    JOIN mytable t ON t.calltime BETWEEN v._from AND v._to
                  AND (t.calltime::time >= v._from::time AND
                       t.calltime::time <= v._to::time)
                  AND (extract(DOW from t.calltime) = 3)
    JOIN coordinates c ON (t.lat, t.lon)
             BETWEEN (c.bottomrightlat, c.topleftlon)
                 AND (c.topleftlat, c.bottomrightlon)
    GROUP BY c.coordinates_id, date_trunc('hour', t.calltime)
)
, cal AS (
    SELECT generate_series('2011-2-2 00:00:00'::timestamp
                         , '2012-4-1 05:00:00'::timestamp
                         , '1 hour'::interval) AS stamp
    FROM v
)
SELECT q.coordinates_id, cal.stamp, COALESCE(q.zcount, 0) AS zcount
FROM v, cal
LEFT JOIN q USING (stamp)
WHERE (extract(hour from cal.stamp) >= extract(hour from v._from) AND
       extract(hour from cal.stamp) <= extract(hour from v._to))
  AND (extract(DOW from cal.stamp) = 3)
  AND cal.stamp >= v._from AND cal.stamp <= v._to
GROUP BY q.coordinates_id, cal.stamp, q.zcount
ORDER BY q.coordinates_id ASC, stamp ASC;
And the sample result it yields looks like this:
coordinates_id | stamp               | zcount
1                2012-01-04 16:00:00   1
1                2012-01-04 19:00:00   1
1                2012-01-11 14:00:00   1
1                2012-01-11 17:00:00   1
1                2012-01-11 19:00:00   1
2                2012-01-04 16:00:00   1
So, as I mentioned above, I would like to see this as:
coordinates_id | stamp      | zcount
1                2012-01-04   2
1                2012-01-11   3
2                2012-01-04   1
Change your final SELECT to:
SELECT q.coordinates_id, cal.stamp::date, sum(q.zcount) AS zcount
FROM v, cal
LEFT JOIN q USING (stamp)
WHERE extract(hour from cal.stamp) BETWEEN extract(hour from v._from)
                                       AND extract(hour from v._to)
  AND extract(DOW from cal.stamp) = 3
  AND cal.stamp >= v._from
  AND cal.stamp <= v._to
GROUP BY 1, 2
ORDER BY 1, 2;
The crucial part is to cast cal.stamp to date: cal.stamp::date.
That, and sum(q.zcount).
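To see why, here is a minimal, self-contained illustration (inline values standing in for the real tables): the cast collapses hourly stamps onto one calendar day, and the SUM then adds up the hourly counts.

-- two hourly rows on the same day collapse into one daily row
select stamp::date AS stamp, sum(zcount) AS zcount
from (values ('2012-01-04 16:00'::timestamp, 1),
             ('2012-01-04 19:00'::timestamp, 1)) AS q(stamp, zcount)
group by 1;
-- returns: 2012-01-04 | 2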