Monthly total count of overall users in SQL

I need the monthly new-user count and the total count of users up to the end of that month in Oracle SQL. I can calculate the monthly new users, but I'm struggling to count the running total for each month.
SELECT COUNT(*) AS REGISTERED_USERS, EXTRACT(MONTH FROM ADD_DATE) AS MN, EXTRACT(YEAR FROM ADD_DATE) AS YR
FROM USERS
WHERE EXTRACT(YEAR FROM ADD_DATE) = 2020
GROUP BY EXTRACT(MONTH FROM ADD_DATE), EXTRACT(YEAR FROM ADD_DATE)
ORDER BY EXTRACT(MONTH FROM ADD_DATE);

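You want a running total, which you get with the analytic SUM(...) OVER (ORDER BY ...) applied to the aggregate COUNT(*). Filter on the year in the outer query, not in the inner one; otherwise users registered before 2020 would be left out of the total.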
SELECT yr, mn, new_users, total_users
FROM
(
SELECT
EXTRACT(YEAR FROM add_date) AS yr,
EXTRACT(MONTH FROM add_date) AS mn,
COUNT(*) AS new_users,
SUM(COUNT(*)) OVER (ORDER BY EXTRACT(YEAR FROM add_date), EXTRACT(MONTH FROM add_date)) AS total_users
FROM users
GROUP BY EXTRACT(YEAR FROM add_date), EXTRACT(MONTH FROM add_date)
)
WHERE yr = 2020
ORDER BY yr, mn;
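A minimal sketch to check the idea, using Python's built-in sqlite3 module instead of Oracle (an assumption made only to get something runnable). SQLite has no EXTRACT, so strftime() stands in for it, and the sample rows are made up. It shows why the year filter belongs in the outer query: the 2019 registrations still feed the 2020 totals.

```python
import sqlite3

# Hypothetical sample data: two users added in late 2019, three in 2020.
rows = [("2019-11-03",), ("2019-12-20",),
        ("2020-01-15",), ("2020-01-28",), ("2020-03-02",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (add_date TEXT)")
conn.executemany("INSERT INTO users (add_date) VALUES (?)", rows)

# strftime() stands in for Oracle's EXTRACT(YEAR/MONTH FROM ...).
query = """
SELECT yr, mn, new_users, total_users
FROM (
    SELECT CAST(strftime('%Y', add_date) AS INTEGER) AS yr,
           CAST(strftime('%m', add_date) AS INTEGER) AS mn,
           COUNT(*) AS new_users,
           -- running total: analytic SUM over the per-month COUNT(*)
           SUM(COUNT(*)) OVER (ORDER BY strftime('%Y', add_date),
                                        strftime('%m', add_date)) AS total_users
    FROM users
    GROUP BY strftime('%Y', add_date), strftime('%m', add_date)
)
-- filter after the running total, so earlier years are still counted
WHERE yr = 2020
ORDER BY yr, mn
"""
result = conn.execute(query).fetchall()
print(result)
```

If you filtered on the year inside the inner query instead, the totals would drop to 2 and 3, because the 2019 users would be removed before the running sum is computed.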

You can nest the aggregate COUNT function inside the analytic SUM function and run the analytic function over add_date truncated to the month, as follows:
SELECT EXTRACT(YEAR FROM DT) yr, EXTRACT(MONTH FROM DT) mn, new_users, total_users
FROM
(
SELECT
TRUNC(ADD_DATE, 'MON') AS DT,
COUNT(*) AS new_users,
SUM(COUNT(*)) OVER (ORDER BY TRUNC(ADD_DATE, 'MON')) AS total_users
FROM users
GROUP BY TRUNC(ADD_DATE, 'MON')
)
WHERE EXTRACT(YEAR FROM DT) = 2020
ORDER BY YR, MN;

Related

PostgreSQL: Simplifying a SQL query into a shorter query

I have a table called 'daily_prices' where I have 'sale_date', 'last_sale_price', 'symbol' as columns.
I need to calculate how many times 'last_sale_price' has gone up compared to the previous day's 'last_sale_price', for each of the last 10 weeks.
Currently my query looks like this for 2 weeks:
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk1' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-18' AS DATE) AND sale_date >= CAST('2020-09-14' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
UNION
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk2' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-11' AS DATE) AND sale_date >= CAST('2020-09-07' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
I'm using UNION to combine the weekly data, but as the number of weeks increases, the query is going to get huge.
Is there a simpler way to write this query?
Any help is much appreciated. Thanks in advance.
You can extract the week from sale_date, then apply GROUP BY in the outer query:
select EXTRACT(isoyear FROM sale_date) AS year, EXTRACT('week' FROM sale_date) AS week, count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp"
from (
select
sale_date,
last_sale_price,
LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where symbol='AAPL'
) dp -- PostgreSQL before version 16 requires an alias for a subquery in FROM
where last_sale_price > prev_price
-- isoyear (not year) pairs correctly with the ISO week number around New Year
group by EXTRACT(isoyear FROM sale_date), EXTRACT('week' FROM sale_date)
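To keep only weekdays, you can add this filter (in PostgreSQL, dow runs from 0 for Sunday to 6 for Saturday, so 1 to 5 are Monday to Friday):
EXTRACT(dow FROM sale_date) in (1,2,3,4,5)
PS: EXTRACT(week ...) returns the ISO 8601 week number, and ISO weeks always start on Monday. Keep that in mind if your week starts on Sunday, as it does in some countries.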
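Here is a small runnable sketch of this approach, assuming Python's sqlite3 module and made-up prices. SQLite has no EXTRACT, so strftime('%W') stands in for EXTRACT(week ...); note that %W counts weeks from the first Monday of the year and is not the ISO week number PostgreSQL returns, although the grouping logic is the same.

```python
import sqlite3

# Hypothetical AAPL closing prices for two Monday-to-Friday weeks.
prices = [
    ("2020-09-07", 100.0), ("2020-09-08", 102.0), ("2020-09-09", 101.0),
    ("2020-09-10", 104.0), ("2020-09-11", 104.0),
    ("2020-09-14", 105.0), ("2020-09-15", 103.0), ("2020-09-16", 106.0),
    ("2020-09-17", 107.0), ("2020-09-18", 107.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_prices (sale_date TEXT, last_sale_price REAL, symbol TEXT)")
conn.executemany("INSERT INTO daily_prices VALUES (?, ?, 'AAPL')", prices)

query = """
SELECT strftime('%Y', sale_date) AS yr,
       strftime('%W', sale_date) AS wk,   -- Monday-based week number, not ISO
       COUNT(*) AS times_up,
       SUM(last_sale_price - prev_price) AS dollars_up
FROM (
    -- previous day's price across the whole series (not reset per week)
    SELECT sale_date, last_sale_price,
           LAG(last_sale_price, 1) OVER (ORDER BY sale_date) AS prev_price
    FROM daily_prices
    WHERE symbol = 'AAPL'
) AS dp
WHERE last_sale_price > prev_price   -- keep only the up days
GROUP BY strftime('%Y', sale_date), strftime('%W', sale_date)
ORDER BY yr, wk
"""
weekly = conn.execute(query).fetchall()
print(weekly)
```

Because LAG runs over the whole series here, Monday 14 September is compared with Friday 11 September and counts as an up day in the second week.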
You can filter on the last 8 weeks in the where clause, then group by week and do conditional aggregation:
select extract(isoyear from sale_date) yyyy, extract(week from sale_date) ww,
sum(last_sale_price - lag_last_sale_price) filter(where last_sale_price > lag_last_sale_price) sum_dollars_up,
count(*) filter(where last_sale_price > lag_last_sale_price) cnt_dollars_up
from (
select dp.*,
lag(last_sale_price) over(partition by extract(isoyear from sale_date), extract(week from sale_date) order by sale_date) lag_last_sale_price
from daily_prices dp
where symbol = 'AAPL'
and sale_date >= date_trunc('week', current_date) - '8 week'::interval
) dp
group by 1, 2
Notes:
I am assuming that you don't want to compare the first price of a week to the last price of the previous week; if you do, just remove the partition by clause from the over() clause of lag()
this dynamically computes the date as of 8 (entire) weeks ago
if there is no price increase during a whole week, the query still gives you a row for that week, with 0 as cnt_dollars_up and NULL as sum_dollars_up (wrap the sum in coalesce(..., 0) if you want 0 there too)
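A sketch of the conditional-aggregation variant under the same kind of assumptions: Python's sqlite3, hypothetical prices, and strftime('%W') in place of EXTRACT(week ...). It uses SUM(CASE ...) and COUNT(CASE ...), which behave like the FILTER clauses above, and the second week deliberately has no increases so you can see the row it produces.

```python
import sqlite3

# Hypothetical prices: week 36 has one increase, week 37 only falls.
prices = [
    ("2020-09-07", 100.0), ("2020-09-08", 102.0), ("2020-09-09", 101.0),
    ("2020-09-14", 99.0), ("2020-09-15", 98.0), ("2020-09-16", 97.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_prices (sale_date TEXT, last_sale_price REAL, symbol TEXT)")
conn.executemany("INSERT INTO daily_prices VALUES (?, ?, 'AAPL')", prices)

query = """
SELECT yr, wk,
       -- CASE inside the aggregates plays the role of FILTER (WHERE ...)
       SUM(CASE WHEN last_sale_price > lag_price
                THEN last_sale_price - lag_price END) AS sum_dollars_up,
       COUNT(CASE WHEN last_sale_price > lag_price THEN 1 END) AS cnt_dollars_up
FROM (
    SELECT strftime('%Y', sale_date) AS yr,
           strftime('%W', sale_date) AS wk,
           last_sale_price,
           -- PARTITION BY restarts the comparison at each week boundary
           LAG(last_sale_price) OVER (
               PARTITION BY strftime('%Y', sale_date), strftime('%W', sale_date)
               ORDER BY sale_date) AS lag_price
    FROM daily_prices
    WHERE symbol = 'AAPL'
) AS dp
GROUP BY yr, wk
ORDER BY yr, wk
"""
weekly = conn.execute(query).fetchall()
print(weekly)
```

Week 37 still appears, with a count of 0 and a NULL sum; wrapping the SUM in COALESCE(..., 0) turns that NULL into 0.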

Correlated subquery in having clause

I am working with the default Oracle SCOTT database plus an additional table, PROJECT, which has 5 columns: projno, projname, budget, start_date, end_date.
I want to select the month with the highest number of projects in a specific year.
The instructions for my exercise say that it must be done with a correlated subquery.
I know how to do it with an uncorrelated subquery:
SELECT EXTRACT(month FROM end_date) as "Month", COUNT(*) as "No of projects"
FROM proj
WHERE EXTRACT(year FROM end_date) = 2016
GROUP BY EXTRACT(month FROM end_date)
HAVING COUNT(*) = (SELECT MAX(COUNT(*))
FROM proj
WHERE EXTRACT(year FROM end_date) = 2016
GROUP BY EXTRACT(month FROM end_date))
Here is my attempt with a correlated subquery; it doesn't work:
SELECT (EXTRACT(month FROM end_date)), COUNT(*) as "liczba"
FROM proj p
WHERE EXTRACT(year FROM end_date) = 2016
GROUP BY EXTRACT(month FROM end_date)
HAVING COUNT(*) = MAX (SELECT(COUNT(EXTRACT(month FROM proj.end_date)))
FROM proj
WHERE EXTRACT(month FROM proj.end_date) = EXTRACT(month FROM p.end_date)
AND EXTRACT(year FROM proj.end_date) = 2016)

Choosing Specific Year when using EXTRACT

I'm trying to pull a few SUMs, but I'm getting stuck on how to narrow it down to a specific year.
I have the following code...
SELECT SITE_ID,
Extract(YEAR FROM DATE_ORDERED) YEAR,
Extract(MONTH FROM DATE_ORDERED) MONTH,
SUM(TOTAL_PRICE),
SUM(TOTAL_PRICE),
SUM(TOTAL_SAVINGS)
FROM DB.ACTUAL_SAVINGS_MVIEW
WHERE SITE_ID = 561
GROUP BY SITE_ID,
Extract(YEAR FROM DATE_ORDERED),
Extract(MONTH FROM DATE_ORDERED)
ORDER BY YEAR DESC,
MONTH DESC
This returns all available years, when I'm only looking for 2016.
Any and all help would be greatly appreciated!
How about adding it to your where clause:
SELECT SITE_ID,
Extract(YEAR FROM DATE_ORDERED) YEAR,
Extract(MONTH FROM DATE_ORDERED) MONTH,
SUM(TOTAL_PRICE),
SUM(TOTAL_PRICE),
SUM(TOTAL_SAVINGS)
FROM DB.ACTUAL_SAVINGS_MVIEW
WHERE SITE_ID = 561
AND Extract(YEAR FROM DATE_ORDERED) = 2016
GROUP BY SITE_ID,
Extract(YEAR FROM DATE_ORDERED),
Extract(MONTH FROM DATE_ORDERED)
ORDER BY YEAR DESC,
MONTH DESC
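The EXTRACT filter works, but applying a function to DATE_ORDERED usually keeps the optimizer from using a plain index on that column. A half-open range, which in Oracle would be `DATE_ORDERED >= DATE '2016-01-01' AND DATE_ORDERED < DATE '2017-01-01'`, selects the same rows and can typically use such an index. This sketch (Python's sqlite3, a hypothetical table name and data, ISO date strings instead of Oracle DATE values) checks that both filters return the same monthly sums.

```python
import sqlite3

# Hypothetical rows around the 2016 boundaries (dates as ISO strings).
orders = [
    (561, "2015-12-31", 10.0, 1.0),
    (561, "2016-01-01", 20.0, 2.0),
    (561, "2016-06-15", 30.0, 3.0),
    (561, "2016-12-31", 40.0, 4.0),
    (561, "2017-01-01", 50.0, 5.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE actual_savings (
    site_id INTEGER, date_ordered TEXT, total_price REAL, total_savings REAL)""")
conn.executemany("INSERT INTO actual_savings VALUES (?, ?, ?, ?)", orders)

def monthly_sums(year_filter):
    # Same aggregation as the answer; only the (fixed, trusted) year predicate varies.
    sql = f"""
    SELECT strftime('%Y', date_ordered) AS yr,
           strftime('%m', date_ordered) AS mn,
           SUM(total_price), SUM(total_savings)
    FROM actual_savings
    WHERE site_id = 561 AND {year_filter}
    GROUP BY site_id, strftime('%Y', date_ordered), strftime('%m', date_ordered)
    ORDER BY yr DESC, mn DESC
    """
    return conn.execute(sql).fetchall()

# Function applied to the column, like EXTRACT(YEAR FROM DATE_ORDERED) = 2016.
by_extract = monthly_sums("strftime('%Y', date_ordered) = '2016'")
# Half-open range on the bare column: start inclusive, next year exclusive.
by_range = monthly_sums("date_ordered >= '2016-01-01' AND date_ordered < '2017-01-01'")
print(by_range)
```

The exclusive upper bound matters if DATE_ORDERED carries a time of day: `< '2017-01-01'` still includes late on 31 December, which `<= '2016-12-31'` would miss.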

Append select with missing month and year values

I have this SELECT:
SELECT month, year, ROUND(AVG(q_overall) OVER (rows BETWEEN 10000 preceding and current row),2) as avg
FROM (
SELECT EXTRACT(Month FROM date) as month, EXTRACT(Year FROM date) as year, ROUND(AVG(q_overall),1) as q_overall
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month
ORDER BY year, month) a
output:
month year avg
-----------------
12 2012 5
1 2013 4.5
2 2013 4.1
4 2013 4.8
5 2013 4.7
And I have to add the missing months to this result (in this example, the 3rd month of 2013). The avg must be the same as in the previous row, which means I need to add this row:
3 2013 4.1
Can I do this with SELF JOINS and generate_series, or with some UNION select?
You can simplify your select. It doesn't need a subquery:
SELECT EXTRACT(Month FROM date) as month,
EXTRACT(Year FROM date) as year,
ROUND(AVG(q_overall), 1) as q_overall,
ROUND(AVG(AVG(q_overall)) OVER (rows BETWEEN 10000 preceding and current row), 2)
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month;
The window function needs an ORDER BY; otherwise the order of the rows in the frame is undefined. I assume you really intend:
SELECT EXTRACT(Month FROM date) as month,
EXTRACT(Year FROM date) as year,
ROUND(AVG(q_overall), 1) as q_overall,
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM date), EXTRACT(Month FROM date)), 2)
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month;
Then, to fill in the missing months, you can use generate_series():
SELECT EXTRACT(Month FROM ym.date) as month,
EXTRACT(Year FROM ym.date) as year,
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM ym.date), EXTRACT(Month FROM ym.date)), 2)
FROM (SELECT generate_series(date_trunc('month', min(date)),
date_trunc('month', max(date)),
interval '1 month') as date
FROM fb_parsed
) ym LEFT JOIN
fb_parsed p
ON EXTRACT(year FROM ym.date) = EXTRACT(year FROM p.date) AND
EXTRACT(month FROM ym.date) = EXTRACT(month FROM p.date) AND
p.business_id = 1
GROUP BY year, month;
I think this will do what you want.
Final query:
SELECT EXTRACT(Month FROM ym.date) as month,
EXTRACT(Year FROM ym.date) as year,
ROUND(AVG(AVG(q_overall)) OVER (ORDER BY EXTRACT(Year FROM ym.date), EXTRACT(Month FROM ym.date)), 2)
FROM
(SELECT generate_series(date_trunc('month', min(date)),
date_trunc('month', max(date)),
interval '1 month') as date
FROM fb_parsed WHERE business_id = 1 AND site = 'facebook')
ym LEFT JOIN
fb_parsed p
ON EXTRACT(year FROM ym.date) = EXTRACT(year FROM p.date) AND
EXTRACT(month FROM ym.date) = EXTRACT(month FROM p.date) AND
p.business_id = 1 AND site = 'facebook'
GROUP BY year, month;
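To check that the LEFT JOIN against a month series gives the missing month the previous cumulative average, here is a sketch in Python's sqlite3 with hypothetical data. Python's bundled SQLite generally doesn't provide generate_series() or date_trunc(), so a recursive CTE and date(..., 'start of month') stand in for them; AVG used as a window function skips the NULL produced for the empty month.

```python
import sqlite3

# Hypothetical ratings for business 1; March 2013 has no rows at all.
ratings = [
    (1, "2012-12-05", 5.0),
    (1, "2013-01-10", 4.0),
    (1, "2013-02-10", 3.0),
    (1, "2013-04-10", 6.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fb_parsed (business_id INTEGER, date TEXT, q_overall REAL)")
conn.executemany("INSERT INTO fb_parsed VALUES (?, ?, ?)", ratings)

query = """
WITH RECURSIVE bounds(first_m, last_m) AS (
    -- stands in for date_trunc('month', min(date)) / date_trunc('month', max(date))
    SELECT date(MIN(date), 'start of month'), date(MAX(date), 'start of month')
    FROM fb_parsed WHERE business_id = 1
),
months(m) AS (
    -- one row per month, like generate_series(..., interval '1 month')
    SELECT first_m FROM bounds
    UNION ALL
    SELECT date(m, '+1 month') FROM months, bounds WHERE m < last_m
)
SELECT CAST(strftime('%m', months.m) AS INTEGER) AS month,
       CAST(strftime('%Y', months.m) AS INTEGER) AS year,
       -- the empty month's AVG is NULL, and the window AVG skips NULLs
       ROUND(AVG(AVG(p.q_overall)) OVER (ORDER BY months.m), 2) AS cum_avg
FROM months
LEFT JOIN fb_parsed AS p
       ON date(p.date, 'start of month') = months.m
      AND p.business_id = 1
GROUP BY months.m
ORDER BY months.m
"""
result = conn.execute(query).fetchall()
print(result)
```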
Can I do this with SELF JOINS and generate_series?
Yep, you're close, but your current query calculates a cumulative average. The tricky part is filling the gaps with the previous value (if PostgreSQL supported the IGNORE NULLS option of LAST_VALUE, this would be easier):
SELECT month,
year,
MAX(q_overall) -- assign the value to all rows within the same group
OVER (PARTITION BY grp)
FROM
(
SELECT EXTRACT(Month FROM all_months.date) AS month,
EXTRACT(Year FROM all_months.date) AS year,
p.q_overall,
-- assign a new group number whenever there's a value in q_overall
SUM(CASE WHEN p.q_overall IS NULL THEN 0 ELSE 1 END)
OVER (ORDER BY all_months.date
ROWS UNBOUNDED PRECEDING) AS grp
FROM
( -- create all months with min and max date
SELECT generate_series(date_trunc('month', min(date)),
date_trunc('month', max(date)),
interval '1 month') as date
FROM fb_parsed
) AS all_months
LEFT JOIN
( -- do the average per month calculation
SELECT EXTRACT(Month FROM date) as month,
EXTRACT(Year FROM date) as year,
ROUND(AVG(q_overall),1) as q_overall
FROM fb_parsed
WHERE business_id = 1
GROUP BY year, month
) AS p
ON p.year = EXTRACT(year FROM all_months.date)
AND p.month = EXTRACT(month FROM all_months.date)
) AS dt
ORDER BY year, month;
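The gap-filling trick in this answer (a running count of non-NULL values used as a group number, then MAX over each group) can be tried in isolation. This sketch assumes Python's sqlite3 and a hypothetical table `monthly` that already holds one row per month, with NULL for the missing March 2013 as the LEFT JOIN would produce; the values are taken from the question's output.

```python
import sqlite3

# Hypothetical month rows as the LEFT JOIN would return them: March 2013 is NULL.
rows = [("2012-12", 5.0), ("2013-01", 4.5), ("2013-02", 4.1),
        ("2013-03", None), ("2013-04", 4.8), ("2013-05", 4.7)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (ym TEXT, q_overall REAL)")
conn.executemany("INSERT INTO monthly VALUES (?, ?)", rows)

query = """
SELECT ym,
       -- each group holds one non-NULL value plus the NULL months after it
       MAX(q_overall) OVER (PARTITION BY grp) AS filled
FROM (
    SELECT ym, q_overall,
           -- running count of non-NULL values: unchanged on NULL rows
           SUM(CASE WHEN q_overall IS NULL THEN 0 ELSE 1 END)
               OVER (ORDER BY ym ROWS UNBOUNDED PRECEDING) AS grp
    FROM monthly
) AS dt
ORDER BY ym
"""
filled = conn.execute(query).fetchall()
print(filled)
```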
Edit:
Oops, this was overly complicated: the question asks for a cumulative average, and the NULLs of the missing months don't change a cumulative average, so there's no need to fill the gaps.

SQL Reactivation Revenue

I'm looking for a query that will sum reactivation revenue from a given date onward. Currently I have the following query:
SELECT advertisable, EXTRACT(YEAR from day), EXTRACT(MONTH from day), ROUND(SUM(cost)/1e6) FROM adcube dac
WHERE advertisable_eid IN
(SELECT advertisable FROM adcube dac
GROUP BY advertisable HAVING SUM(cost)/1e6 > 100)
GROUP BY advertisable, EXTRACT(YEAR from day), EXTRACT(MONTH from day)
ORDER BY advertisable, EXTRACT(YEAR from day), EXTRACT(MONTH from day)
From this I export to Excel and check for accounts that have stopped spending for 4 months and then reactivated. I then track the new revenue from the reactivation month onward.
Is it possible to get an SQL query to do this without need of Excel?
Thanks
Assuming the four months are actually present in the data (as rows with a value of 0), you can do this using window functions. You can find N consecutive rows with the same value by taking the difference between two row_number() values (a "gaps and islands" approach). Here is the idea:
with t as (
SELECT advertisable, EXTRACT(YEAR from day) as yy, EXTRACT(MONTH from day) as mon,
ROUND(SUM(cost)/1e6) as val
FROM adcube dac
WHERE advertisable_eid IN (SELECT advertisable
FROM adcube dac
GROUP BY advertisable
HAVING SUM(cost)/1e6 > 100
)
GROUP BY advertisable, EXTRACT(YEAR from day), EXTRACT(MONTH from day)
)
select advertisable, min(yy * 100 + mon) as yyyymm
from (select t.*,
(row_number() over (partition by advertisable order by yy, mon) -
row_number() over (partition by advertisable, val order by yy, mon)
) as grp
from t
) x
group by advertisable, grp, val
having count(*) >= 4 and val = 0;
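A runnable sketch of the row_number() difference, assuming Python's sqlite3 and a hypothetical table `t` that already holds the monthly spend, with 0 for months without spend (the same assumption the answer makes). Advertiser `a` has four zero months in a row and `b` does not, so only `a` should come back, together with the first month of its zero run.

```python
import sqlite3

# Hypothetical monthly spend per advertisable; 0 means no spend that month.
spend = [
    ("a", 2019, 1, 5), ("a", 2019, 2, 0), ("a", 2019, 3, 0),
    ("a", 2019, 4, 0), ("a", 2019, 5, 0), ("a", 2019, 6, 7),
    ("b", 2019, 1, 3), ("b", 2019, 2, 0), ("b", 2019, 3, 0),
    ("b", 2019, 4, 2), ("b", 2019, 5, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (advertisable TEXT, yy INTEGER, mon INTEGER, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", spend)

query = """
SELECT advertisable, MIN(yy * 100 + mon) AS yyyymm
FROM (
    SELECT t.*,
           -- constant within a run of equal val values for one advertisable
           ROW_NUMBER() OVER (PARTITION BY advertisable ORDER BY yy, mon)
         - ROW_NUMBER() OVER (PARTITION BY advertisable, val ORDER BY yy, mon) AS grp
    FROM t
) AS x
GROUP BY advertisable, grp, val
HAVING COUNT(*) >= 4 AND val = 0
ORDER BY advertisable
"""
runs = conn.execute(query).fetchall()
print(runs)
```

Within a run of equal values both row numbers advance together, so their difference stays constant and identifies the run. Here yyyymm is the first month of the zero-spend run; the reactivation month is the first later month with spend (June 2019 for `a`).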