Modify SQL to include cumulative sum of all enrollments

I can get the number of members cancelled per quarter with the query below:
SELECT DATEPART(YEAR, Canceldate) [Year],
DATEPART(QUARTER, Canceldate) [Quarter], COUNT(1) [id Count]
FROM Subscription
where DATEPART(YEAR, Canceldate) > 2016
GROUP BY DATEPART(YEAR, CancelDate),DATEPART(QUARTER, Canceldate)
ORDER BY 1,2
The output is
Year Quarter Count
2017 1 2406866
2017 2 1161904
2017 3 3432214
2017 4 10905218
2018 1 1416848
2018 2 258146
2018 3 2996401
2018 4 639415
2019 1 3425557
Suppose we started out with 100 members and 1 member enrolled every quarter. How do I get the cumulative number of members enrolled during these periods? For example, I need this output:
Year Quarter Count Enrolled
2017 1 2406866 100
2017 2 1161904 101
2017 3 3432214 102
2017 4 10905218 103
2018 1 1416848 104
2018 2 258146 105
2018 3 2996401 106
2018 4 639415 107
2019 1 3425557 108
The following SQL can be used to calculate enrollments for every quarter:
SELECT DATEPART(YEAR, EnrollmentDt) [Year],
DATEPART(QUARTER, EnrollmentDt) [Quarter], COUNT(1) [id Count]
FROM Subscription
where DATEPART(YEAR, EnrollmentDt) > 2016
GROUP BY DATEPART(YEAR, EnrollmentDt),DATEPART(QUARTER, EnrollmentDt)
ORDER BY 1,2

You would use window functions:
SELECT DATEPART(YEAR, Canceldate) as [Year],
DATEPART(QUARTER, Canceldate) as [Quarter], COUNT(1) as [id Count],
99 + ROW_NUMBER() OVER (ORDER BY MIN(CancelDate)) as Enrolled
FROM Subscription
WHERE DATEPART(YEAR, Canceldate) > 2016
GROUP BY DATEPART(YEAR, CancelDate),DATEPART(QUARTER, Canceldate)
ORDER BY 1,2
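
The query above reproduces the example by hard-coding one enrollment per quarter through ROW_NUMBER(). If the per-quarter enrollment counts vary, the Enrolled column becomes a starting base plus a running sum of those counts; in SQL Server 2012 or later a windowed SUM(...) OVER (ORDER BY ...) over the grouped counts would be the usual route. Here is a minimal Python sketch of that arithmetic only. The enrollment counts and the base of 99 are assumptions chosen to match the example, not data from the Subscription table.

```python
from itertools import accumulate

# Hypothetical per-quarter enrollment counts, keyed by (year, quarter).
# One enrollment per quarter mirrors the example in the question.
enrollments = {
    (2017, 1): 1, (2017, 2): 1, (2017, 3): 1, (2017, 4): 1,
    (2018, 1): 1, (2018, 2): 1, (2018, 3): 1, (2018, 4): 1,
    (2019, 1): 1,
}

BASE = 99  # assumed membership before the first quarter, as in 99 + ROW_NUMBER()


def cumulative_enrolled(counts, base):
    """Return {(year, quarter): base + running sum of counts up to that quarter}."""
    periods = sorted(counts)  # tuples sort by year, then quarter
    running = accumulate(counts[p] for p in periods)
    return {p: base + total for p, total in zip(periods, running)}


enrolled = cumulative_enrolled(enrollments, BASE)
```

With real data, the dictionary would be filled from the enrollment query in the question instead of the constant 1s.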

Related

Summing sales dollars for most recent month and 2nd most recent month

For each of the 12 months, I'm looking to create a field that sums the sales dollars at the account level for the most recent month and the 2nd most recent month based on the current date.
For example, given that today's date is 10/6/22, 'MostRecentNovember' would sum up sales from November 2021. '2ndMostRecentNovember' would sum up sales from November 2020. Once the current date moves into November 2022, this query would adjust to pull MostRecentNovember sales from 2022 and 2ndMostRecentNovember sales from 2021.
Conversely, given that today's date is 10/6/22 'MostRecentJune' would sum up sales from June 2022 and '2ndMostRecentJune' would sum up sales from June 2021.
Below is my attempt at this code. I think it gets partially there, but I'm not sure it's exactly what I want:
SELECT NovemberMostRecent_Value =
sum(case when datepart(year,tran_date) = datepart(year, getdate())
AND DATEPART(month, tran_date) = 11 then value else 0 end),
NovemberSecondMostRecent_Value =
sum(case when datepart(year,tran_date) = datepart(year, getdate())-1
AND DATEPART(month, tran_date) = 11 then value else 0 end)
Here's a snippet of the source data table
account_no tran_date value
123        11/22/21  500
123        11/1/21   500
123        11/20/20  1500
123        6/3/22    5000
123        6/4/21    2000
456        11/3/20   525
456        11/4/21   125
Per Request in Comments. A table of desired Results
account_no NovemberMostRecent November2ndMostRecent JuneMostRecent June2ndMostRecent
123        1000               1500                  5000           2000
456        125                525                   0              0

Why don't you just sum up the sales then group by month and year for the last two years? Wouldn't that solve the problem?
Or you can show a table that depicts what you are trying to achieve.

This should work fine.
Note: I assume account_no is the same for all rows. If the accounts differ, you will need to add account_no as a condition in each subquery.
WITH CTE AS
(SELECT (SELECT SUM(value) FROM tablename WHERE datepart(year, tran_date) = YEAR(getdate()) AND datepart(month, tran_date) = 11)
AS first_value,
(SELECT SUM(value) FROM tablename WHERE datepart(year, tran_date) = YEAR(getdate())-1 AND datepart(month, tran_date) = 11)
AS second_value,
(SELECT SUM(value) FROM tablename WHERE datepart(year, tran_date) = YEAR(getdate())-2 AND datepart(month, tran_date) = 11)
AS third_value)
SELECT IIF (first_value>0, first_value, second_value) AS NovemberMostRecent_Value,
IIF (first_value>0, second_value, third_value) AS NovemberSecondMostRecent_Value FROM CTE;
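
The query above falls back to the previous year only when the current year has no sales for that month, and it does not split by account. An alternative reading of the question picks the year from the calendar: a month that has already started this year uses this year, otherwise last year. Below is a minimal Python sketch of that rule, grouped by account, using the sample rows from the question. The function names are hypothetical, and the "month <= current month" rule is an assumption inferred from the November and June examples.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (account_no, tran_date, value).
rows = [
    (123, date(2021, 11, 22), 500),
    (123, date(2021, 11, 1), 500),
    (123, date(2020, 11, 20), 1500),
    (123, date(2022, 6, 3), 5000),
    (123, date(2021, 6, 4), 2000),
    (456, date(2020, 11, 3), 525),
    (456, date(2021, 11, 4), 125),
]


def most_recent_year(month, today):
    """Year of the latest occurrence of `month` that has started by `today`."""
    return today.year if month <= today.month else today.year - 1


def monthly_sums(rows, month, today):
    """Return {account_no: (most_recent_sum, second_most_recent_sum)} for `month`."""
    first = most_recent_year(month, today)
    sums = defaultdict(lambda: [0, 0])
    for account, tran_date, value in rows:
        if tran_date.month != month:
            continue
        if tran_date.year == first:
            sums[account][0] += value
        elif tran_date.year == first - 1:
            sums[account][1] += value
    return {account: tuple(pair) for account, pair in sums.items()}


today = date(2022, 10, 6)  # the "current date" used in the question
november = monthly_sums(rows, 11, today)
june = monthly_sums(rows, 6, today)
```

Accounts with no rows for a month are simply absent from the result, so a caller would treat a missing key as 0, which matches the zeros for account 456 in the desired-results table. In T-SQL the same year choice could be expressed with a CASE on MONTH(GETDATE()) inside the conditional SUM, grouped by account_no.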

check whether values lies between or not in SQL

These are my 2 tables
table 2:
date_val yr_num yr_wk_num day_wk_num yr_wk_nm day       mo_num
20200808 2020   32        6          202032   Saturday  08
20200809 2020   32        7          202032   Sunday    08
20200810 2020   33        1          202033   Monday    08
20200811 2020   33        2          202033   Tuesday   08
20200812 2020   33        3          202033   Wednesday 08
table1:
sku_id dateval  sales
ab124  20210603 10
ab124  20210502 20
ab123  20210606 30
I need to check whether sales are within + or - 30% of the 2-month average sales.
with CTE
as
(
select * from table1 where dateval >= dateadd(mm, -2, dateval)
)
select dateval, sum(sales) as [Total Sales], avg(sales) as [Average Sales] from CTE group by dateval order by 1
I also tried the following:
with CTE
as
(
select * from table1 t1 left join table2 t2 on t1.dateval = t2.date_val where t2.date_val >= dateadd(mm, -2, t1.dateval)
)
select dateval,sum(sales) as [Total Sales], avg(sales) as [Average Sales] from CTE group by dateval order by 1
Here I am filtering within table1, but I need to use table 2 to filter for the past two months and get the average sales.
Next, I need to take +30% and -30% of that average and check whether my sales fall within that range (30% above or below the average): '1' if yes, '0' if not.
For example:
Historic 2-month average sales: 100.
(+30% of average sales is 130)
(-30% of average sales is 70)
If sales is 120, I need to check whether 120 lies between 70 and 130.
Please suggest how to achieve this.
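
The band check described above can be sketched in Python as follows. This is only an illustration of the arithmetic under stated assumptions: the history window is the two calendar months before each row's date, it is computed per sku_id, the current row is excluded from its own average, and rows with no history get no flag. The helper names are hypothetical.

```python
import calendar
from datetime import date, datetime

# Sample rows from table1 in the question: (sku_id, dateval as yyyymmdd, sales).
rows = [
    ("ab124", "20210603", 10),
    ("ab124", "20210502", 20),
    ("ab123", "20210606", 30),
]


def months_back(d, n):
    """Return the date n months before d, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) - n
    year, month = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month + 1)[1])
    return date(year, month + 1, day)


def within_band(sales, average, tolerance=0.30):
    """Return 1 if sales lies within +/- tolerance of average, else 0."""
    low, high = average * (1 - tolerance), average * (1 + tolerance)
    return 1 if low <= sales <= high else 0


def flag_rows(rows, tolerance=0.30):
    """Flag each row against the same SKU's average sales over the prior two months."""
    parsed = [(sku, datetime.strptime(d, "%Y%m%d").date(), s) for sku, d, s in rows]
    result = []
    for sku, day, sales in parsed:
        start = months_back(day, 2)
        history = [s for k, d, s in parsed if k == sku and start <= d < day]
        if not history:
            result.append((sku, day, sales, None))  # no history: flag undefined
            continue
        average = sum(history) / len(history)
        result.append((sku, day, sales, within_band(sales, average, tolerance)))
    return result


flags = flag_rows(rows)
```

For the example in the question, within_band(120, 100) falls inside the 70 to 130 range and returns 1. In SQL the same test would typically be a CASE expression comparing sales against 0.7 and 1.3 times the averaged value.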

SQL - use only clients that are present in all months

I have a dataset with different clients and their sales counts. Over time, some clients get added to and deleted from the data. When I look at the sales counts, how do I make sure that I am only using the clients that were in the data set the whole time? I.e., if a client doesn't have a record for 2018-03, then I don't want that client to be part of the query at all. If a client does not have a record in 2020-03, then I also do not want this client to be part of the query.
For example, the following query:
select DATE_PART (y, sold_date)as year, DATE_PART (mm, sold_date) as month, count(distinct(client))
from sales_data
where sold_date > '2018-01-01'
group by year, month
order by year,month
Yields
year month count
2018 1 78
2018 2 83
2018 3 80
2018 4 83
2018 5 84
2018 6 81
2018 7 83
2018 8 90
2018 9 89
2018 10 95
2018 11 94
2018 12 97
2019 1 102
2019 2 103
2019 3 102
2019 4 105
2019 5 103
2019 6 104
2019 7 104
2019 8 106
2019 9 106
2019 10 108
2019 11 109
2019 12 104
2020 1 104
2020 2 102
2020 3 103
2020 4 98
2020 5 97
2020 6 79
So I want to use only the clients that are present in all months. There should be no more than 78 of them, because there cannot be more clients than in the minimal month (2018-1).
FYI, I am using Amazon Redshift here but I am OK with a query that's rdbms agnostic or works for SQL-Server/Oracle/MySQL/PostgreSQL, I am just interested in a pattern on how to solve this issue effectively.

If I'm understanding what you want correctly, and if this is just a one-off query, you could use a correlated subquery in the where clause:
SELECT
DATE_PART(y, s.sold_date) AS year,
DATE_PART(mm, s.sold_date) AS month,
COUNT(DISTINCT s.client)
FROM
sales_data AS s
WHERE
EXISTS (
SELECT sd.client FROM sales_data AS sd WHERE DATE_PART(y,
sd.sold_date) = 2018 AND DATE_PART(mm, sd.sold_date) = 1 AND
sd.client = s.client
) AND
s.sold_date > '2018-01-01'
GROUP BY
year,
month
ORDER BY
DATE_PART(y, s.sold_date),
DATE_PART(mm, s.sold_date)

Presence in all months can be checked with a 2-step aggregation:
(1) group sales data by customer ID, keeping only customers that have all months
(2) group sales data joined to (1) by year, month
Like this (the = 12 can be a dynamic expression, depending on the amount of history you have):
with
stable_customers as (
select customer_id
from sales_data
group by 1
having count(distinct date_trunc('month', sold_date)) = 12
)
select
DATE_PART (y, sold_date) as year
,DATE_PART (mm, sold_date) as month
,count(1)
from sales_data
join stable_customers
using (customer_id)
where sold_date > '2018-01-01'
group by year, month
order by year, month
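
To make the two steps concrete, here is a small Python sketch of the same idea on made-up records. The data and function names are hypothetical. Instead of a fixed month count, it compares each client's set of months with the set of all months in the data, which is one way to make the "= 12" dynamic.

```python
from collections import defaultdict

# Hypothetical (client, year, month) sales records; client "c" skips 2018-02.
sales = [
    ("a", 2018, 1), ("a", 2018, 2), ("a", 2018, 3),
    ("b", 2018, 1), ("b", 2018, 2), ("b", 2018, 3),
    ("c", 2018, 1), ("c", 2018, 3),
]


def stable_clients(records):
    """Step 1: clients whose set of active months equals every month in the data."""
    all_months = {(y, m) for _, y, m in records}
    months_by_client = defaultdict(set)
    for client, y, m in records:
        months_by_client[client].add((y, m))
    return {c for c, months in months_by_client.items() if months == all_months}


def monthly_counts(records):
    """Step 2: distinct stable clients per (year, month)."""
    keep = stable_clients(records)
    seen = defaultdict(set)
    for client, y, m in records:
        if client in keep:
            seen[(y, m)].add(client)
    return {period: len(clients) for period, clients in sorted(seen.items())}


counts = monthly_counts(sales)
```

Because only clients present in every month survive step 1, every month in the result shows the same count.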

Use window functions. Unfortunately, SQL Server does not support count(distinct) as a window function. Fortunately, there is a simple work-around using dense_rank():
select year, month, count(distinct client)
from (select sd.*, year, month,
(dense_rank() over (order by year, month) +
dense_rank() over (order by year desc, month desc)
) as num_months,
(dense_rank() over (partition by client order by year, month) +
dense_rank() over (partition by client order by year desc, month desc)
) as num_months_client
from sales_data sd cross apply
(values (year(sold_date), month(sold_date))) v(year, month)
where sd.sold_date > '2018-01-01'
) sd
where num_months_client = num_months
group by year, month
order by year, month;
Note: This looks at all months that are in the data. If all clients are missing 2019-03, then that month is not considered at all.
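
The dense_rank() work-around relies on a small identity: for any row, its ascending dense rank plus its descending dense rank equals the number of distinct values plus one. The query compares two such sums, so the extra one cancels out. Here is a minimal Python sketch that checks the identity on made-up (year, month) pairs; the dense_rank helper is a hypothetical stand-in for the SQL function.

```python
def dense_rank(values, reverse=False):
    """Map each distinct value to its 1-based dense rank in sorted order."""
    ordered = sorted(set(values), reverse=reverse)
    return {v: i for i, v in enumerate(ordered, start=1)}


# Hypothetical (year, month) pairs, with repeats as there would be one row per sale.
months = [(2018, 1), (2018, 1), (2018, 2), (2018, 3), (2018, 3), (2018, 3)]

asc = dense_rank(months)
desc = dense_rank(months, reverse=True)

# For every row the two ranks add up to (distinct count + 1).
sums = {asc[m] + desc[m] for m in months}
distinct_count = len(set(months))
```

Every row produces the same sum, which is why the query can compare the per-client sum with the overall sum to find clients present in all months.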

Grouping data on SQL Server

I have this table in SQL Server:
Year Month Quantity
----------------------------
2015 January 10
2015 February 20
2015 March 30
2014 November 40
2014 August 50
How can I add two more columns: one that numbers each distinct year as a group, and one that numbers the months within each year sequentially, as in this example?
Year Month Quantity Group Subgroup
------------------------------------------------
2015 January 10 1 1
2015 February 20 1 2
2015 March 30 1 3
2014 November 40 2 1
2014 August 50 2 2

You can use DENSE_RANK to calculate the groups for you:
SELECT t1.*, DENSE_RANK() OVER (ORDER BY Year DESC) AS [Group],
DENSE_RANK() OVER (PARTITION BY Year ORDER BY DATEPART(month, Month + ' 01 2010')) AS [SubGroup]
FROM t1
ORDER BY 4, 5
See this fiddle.
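
As a rough illustration of what the two DENSE_RANK() expressions compute, here is a Python sketch over the rows from the question. The add_groups function and the MONTHS list are hypothetical helpers, not part of the answer's SQL.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Rows from the question: (year, month name, quantity).
rows = [
    (2015, "January", 10),
    (2015, "February", 20),
    (2015, "March", 30),
    (2014, "November", 40),
    (2014, "August", 50),
]


def add_groups(rows):
    """Append (group, subgroup) to each row using dense ranks."""
    # Dense rank of each year, newest first (DENSE_RANK() OVER (ORDER BY Year DESC)).
    years = sorted({year for year, _, _ in rows}, reverse=True)
    group_of = {year: rank for rank, year in enumerate(years, start=1)}

    result = []
    for year, month, quantity in rows:
        # Dense rank of the month among the distinct months of the same year.
        months_in_year = sorted({MONTHS.index(m) for y, m, _ in rows if y == year})
        subgroup = months_in_year.index(MONTHS.index(month)) + 1
        result.append((year, month, quantity, group_of[year], subgroup))
    # Mirror ORDER BY 4, 5: group first, then subgroup.
    return sorted(result, key=lambda row: (row[3], row[4]))


grouped = add_groups(rows)
```

Ordering by calendar month puts August before November within 2014, whereas the sample output in the question follows the table's row order, so the subgroup numbers for 2014 differ from the question's example.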

To associate group and subgroup with a number you can do this:
WITH RankedTable AS (
SELECT year, month, quantity,
ROW_NUMBER() OVER (partition by year order by Month) AS rn
FROM yourtable)
SELECT year, month, quantity,
SUM (CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY YEAR) as year_group,
rn AS subgroup
FROM RankedTable
Here the ROW_NUMBER() OVER clause calculates the rank of each month within its year, and SUM() ... OVER calculates a running sum over the rows with rank 1, which yields the group number for each year.

sql server calculate cumulative number per month for different year

I have a table with "date" column. Each row represents a survey.
date
11/19/2013 5:51:41 PM
11/22/2013 1:30:38 PM
11/23/2013 3:09:17 PM
12/2/2013 5:24:17 PM
12/25/2013 11:42:56 AM
1/6/2014 2:24:49 PM
I want to count the number of surveys per month cumulatively. As you can see from the table above, there are 3 surveys for Nov 2013, 2 surveys for Dec 2013, and 1 survey for Jan 2014. The cumulative number of surveys per month would be:
month | year | number_of_survey
11 | 2013 | 3
12 | 2013 | 5
1 | 2014 | 6
I have this query, which shows the correct number of surveys for 2013, but the number of surveys for 2014 is not cumulative.
with SurveyPerMonth as -- no of Survey per month
(
select datepart(month, s.date) as month,
datepart(year, s.date) as year,
count(*) as no_of_surveys
from myTable s
group by datepart(year, s.date), datepart(month, s.date)
)
select p1.month, p1.year, sum(p2.no_of_surveys) as surveys -- cumulatively
from SurveyPerMonth p1
inner join SurveyPerMonth p2 on p1.month >= p2.month and p1.year>=p2.year -- the problem probably comes from this line of code
group by p1.month, p1.year
order by p1.year, p1.month;
This query returns:
month | year | surveys
11 | 2013 | 3
12 | 2013 | 5
1 | 2014 | 1 // 2014 is not cumulative
How can I calculate cumulative number of surveys per month for 2014 as well?

Something like this ?
SELECT date = create_date INTO #myTable FROM master.sys.objects
;WITH perMonth ( [year], [month], [no_of_surveys])
AS (SELECT DatePart(year, s.date) ,
DatePart(month, s.date),
COUNT(*)
FROM #myTable s
GROUP BY datepart(year, s.date),
datepart(month, s.date))
SELECT [year],
[month],
[no_of_surveys] = ( SELECT SUM([no_of_surveys])
FROM perMonth agg
WHERE (agg.[year] < pm.[year])
OR (agg.[year] = pm.[year] AND agg.[month] <= pm.[month]))
FROM perMonth pm
ORDER BY [year], [month]
Edit: seems I missed the ball with < and >, fixed it and added small example
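
The reason the original join fails is that it compares month and year independently: January 2014 has month 1, which is never >= 11 or 12, so the 2013 rows are dropped. Comparing (year, month) as an ordered pair, as the WHERE clause above does, avoids that. A minimal Python sketch of the same ordering follows; the December rows are taken as 2013 to match the counts described in the question, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import accumulate

# Survey timestamps in the question's m/d/yyyy format.
raw = [
    "11/19/2013 5:51:41 PM",
    "11/22/2013 1:30:38 PM",
    "11/23/2013 3:09:17 PM",
    "12/2/2013 5:24:17 PM",
    "12/25/2013 11:42:56 AM",
    "1/6/2014 2:24:49 PM",
]


def year_month(stamp):
    """Extract (year, month) from an 'm/d/yyyy h:mm:ss AM' string."""
    month, _day, year = stamp.split()[0].split("/")
    return int(year), int(month)


def cumulative_per_month(stamps):
    """Return [((year, month), running_total)] in chronological order."""
    per_month = Counter(year_month(s) for s in stamps)
    periods = sorted(per_month)  # tuples sort by year first, then month
    totals = accumulate(per_month[p] for p in periods)
    return list(zip(periods, totals))


cumulative = cumulative_per_month(raw)
```

Sorting the (year, month) tuples is what places January 2014 after December 2013, so the running total carries across the year boundary.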

This should work. I have added a new column, 'monthyear':
with surveypermonth as -- no of survey per month
(
select datepart(month, s.date) as month,
datepart(year, s.date) as year,
datepart(year, s.date) *100 + datepart(month, s.date) as monthyear,
count(*) as no_of_surveys
from test s
group by datepart(year, s.date), datepart(month, s.date),datepart(year, s.date)*100 + datepart(month, s.date)
)
select a.month,substring(cast(monthyear as varchar(6)),1,4) as year,surveys from
(
select p1.month, p1.monthyear as monthyear, sum(p2.no_of_surveys) as surveys
from surveypermonth p1
inner join surveypermonth p2 on p1.monthyear>=p2.monthyear
group by p1.month, p1.monthyear
--order by p1.monthyear, p1.month
)a
