Get consecutive months and days difference from date range? - sql

So let's say I have a table like this:
subscriber_id | package_id | package_start_date | package_end_date | package_price_per_day
1081 | 231 | 2014-01-13 | 2014-12-31 | $3
1084 | 231 | 2014-03-21 | 2014-06-05 | $3
1086 | 235 | 2014-06-21 | 2014-09-09 | $4
Now I want the top 3 packages by total revenue for each month of 2014.
Note: for package 231, for example, revenue should be calculated as 18 days of Jan * $3 + 28 days of Feb * $3 + ... and so on.
The second row is calculated the same way as the first (9 days of March * $3 + 30 days of April * $3 + ...).
The result should be grouped by month and show each package's rank by total revenue.
Sample result:
Month | Package_id | Revenue | Rank
Jan | 231 | 69499 | 1
Jan | 235 | 34345 | 2
Jan | 238 | 23455 | 3
Feb | 231 | 89274 | 1
I wrote a query to filter the dates so that I get the subscribers active during 2014 (the data originally contained other years as well). It produces the first table in the question, but I am not sure how to break the ranges into months and days afterwards.
select subscriber_id, package_id, package_start_date, package_end_date, price_per_day
from (
select subscriber_id, package_id
-- clip ranges that start before 2014 to 1 Jan 2014
, case when year(package_start_date) < 2014 then '2014-01-01' else package_start_date end as package_start_date
-- clip ranges that end after 2014 to 31 Dec 2014
, case when year(package_end_date) > 2014 then '2014-12-31' else package_end_date end as package_end_date
, price_per_day
from subscription
) a
where year(package_start_date) = 2014 and year(package_end_date) = 2014
Please don't focus on the syntax; I am just trying to understand the logical approach in SQL.

Suppose you have a calendar table called d, with one row per date in a column also called d.
It is then straightforward to write:
SELECT *
FROM t
INNER JOIN d on d.d >= t.package_start_date AND d.d < t.package_end_date
This assumes you count a start date of Jan 1 and an end date of Jan 2 as one day. If you count that as two days, use <= instead.
The join multiplies each package row into one row per day, so a start date of Jan 1 and an end date of Jan 11 means that row repeats 10 times. The d.d date is different on every repeated row, so you can extract the month from d.d and group on it to get totals for each month per package.
Suppose you've wrapped the query above in a CTE called x; the grouping then looks like:
SELECT DATEPART(month, x.d) AS month_no, -- the d.d date from the join
package_id,
SUM(package_price_per_day) AS revenue -- one joined row per active day
FROM x
GROUP BY DATEPART(month, x.d), package_id
Because each row from t is repeated once per day when joined to d, you can group and aggregate to get back to a single value per month per package. If packages can run for longer than a year, also group on DATEPART(year, ...) so that months from different years are not mixed together, e.g. a package running from Jan 2020 to Feb 2021 covers two Januaries and two Februaries.
Then all you need to do is add the ranking of the revenue, which looks like it would go in at the first step with something like
RANK() OVER (PARTITION BY package_id ORDER BY DATEDIFF(DAY, package_start_date, package_end_date) * package_price_per_day DESC)
If I understand correctly, you rank packages on total revenue over the entire period rather than per month. Also look up the difference between RANK and DENSE_RANK, as you may want the latter.
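The expand-then-group idea can be checked outside the database. Below is a minimal Python sketch, assuming the end date is exclusive (the < join above) and using the three sample rows from the question. The names monthly_revenue and top_n_per_month are illustrative, not part of the original query. Unlike the RANK expression above, it ranks within each month, which is what the sample result in the question shows.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (subscriber_id, package_id, start, end, price_per_day)
ROWS = [
    (1081, 231, date(2014, 1, 13), date(2014, 12, 31), 3),
    (1084, 231, date(2014, 3, 21), date(2014, 6, 5), 3),
    (1086, 235, date(2014, 6, 21), date(2014, 9, 9), 4),
]

def monthly_revenue(rows):
    """Expand each row to one entry per active day, then sum by (year, month, package)."""
    totals = defaultdict(int)
    for _subscriber, package_id, start, end, price in rows:
        day = start
        while day < end:  # end date exclusive, mirroring d.d < package_end_date
            totals[(day.year, day.month, package_id)] += price
            day += timedelta(days=1)
    return totals

def top_n_per_month(totals, n=3):
    """Rank packages within each month by revenue, highest first (ties share a rank)."""
    by_month = defaultdict(list)
    for (year, month, package_id), revenue in totals.items():
        by_month[(year, month)].append((revenue, package_id))
    ranked = {}
    for key, items in by_month.items():
        items.sort(reverse=True)
        kept = []
        for revenue, package_id in items:
            # RANK semantics: 1 + number of packages with strictly higher revenue
            rank = 1 + sum(1 for other, _ in items if other > revenue)
            if rank <= n:
                kept.append((package_id, revenue, rank))
        ranked[key] = kept
    return ranked

totals = monthly_revenue(ROWS)
ranking = top_n_per_month(totals)
```

Here ranking[(2014, 6)] holds the June 2014 packages in rank order. Looping day by day is only sensible for small data; in the database, the calendar-table join does the same expansion set-wise.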

Related

How to spread annual amount and then add by month in SQL

Currently I'm working with a table that looks like this:
Month | Transaction | amount
2021-07-01| Annual Membership Fee| 45
2021-08-01| Annual Membership Fee| 145
2021-09-01| Annual Membership Fee| 2940
2021-10-01| Annual Membership Fee| 1545
The amount in that table is the total monthly amount (e.g. I have 100 customers who each paid $15 for the annual membership, so my total monthly amount would be $1500).
However, what I would like to do (and I have no clue how) is divide the amount by 12 and spread it into the future, so that I get the revenue attributable to each month. As an example, for 2021-09-01 I would get the following:
$2490/12 = $207.5 (dollars per month for the next 12 months)
in 2021-09-01 I would only get $207.5 for that specific month.
On 2021-10-01 I would get $1545/12 = $128.75 plus $207.5 from the previous month (total = $336.25 for 2021-10-01)
And the same operation would repeat onwards. The last period that I would collect my $207.5 from 2021-09-01 would be in 2022-08-01.
I was wondering if someone could give me an idea of how to perform this in a SQL query/CTE?
Assuming all the months you care about exist in your table, I would suggest something like:
SELECT
month,
(SELECT SUM(m2.amount/12) FROM mytable m2 WHERE m2.month BETWEEN ADD_MONTHS(m1.month, -11) AND m1.month) as monthlyamount
FROM mytable m1
GROUP BY month
ORDER BY month
For each month that exists in the table, this sums 1/12th of the current amount plus the previous 11 months (using the add_months function). I think that's what you want.
A few notes/thoughts:
I'm assuming (based on the column name) that all the dates in the month column fall on the 1st, so we don't need to worry about matching days or about the GROUP BY returning multiple rows for the same month.
You might want to round the SUMs I did, since in some cases dividing by 12 might give you more digits after the decimal than you want for money (although, in that case, you might also have to consider remainders).
If you really only have one transaction per month (like in your example), you don't need to do the group by.
If the months you care about don't exist in your table, this won't work, but you could do the same thing by generating a table of months. E.g. if you have an amount on 2020-01-01 but nothing on 2020-02-01, this won't return a row for 2020-02-01.
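To sanity-check the trailing-window logic, here is a small Python sketch of the same calculation, using the amounts from the table in the question. The helper names month_index and recognised_revenue are made up for illustration. It accepts any month, including ones with no row in the table, which sidesteps the last caveat above.

```python
from datetime import date

# Monthly totals from the question (month start -> amount collected that month)
AMOUNTS = {
    date(2021, 7, 1): 45,
    date(2021, 8, 1): 145,
    date(2021, 9, 1): 2940,
    date(2021, 10, 1): 1545,
}

def month_index(d):
    """Map a date to a running month number so month arithmetic is plain subtraction."""
    return d.year * 12 + (d.month - 1)

def recognised_revenue(amounts, month):
    """Sum 1/12 of every amount collected in `month` or in the 11 months before it."""
    target = month_index(month)
    return sum(
        amount / 12
        for collected, amount in amounts.items()
        if 0 <= target - month_index(collected) <= 11
    )

july_2021 = recognised_revenue(AMOUNTS, date(2021, 7, 1))   # only July's 45 / 12
july_2022 = recognised_revenue(AMOUNTS, date(2022, 7, 1))   # July 2021 has dropped out
```

The 0..11 month offset plays the role of BETWEEN ADD_MONTHS(m1.month, -11) AND m1.month in the query.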
CTE = set up dataset
CTE_2 = pro-rate dataset
FINAL SQL = select future_cal_month,sum(pro_rated_amount) from cte_2 group by 1
with cte as (
select '2021-07-01' cal_month,'Annual Membership Fee' transaction ,45 amount
union all select '2021-08-01' cal_month,'Annual Membership Fee' transaction ,145 amount
union all select '2021-09-01' cal_month,'Annual Membership Fee' transaction ,2940 amount
union all select '2021-10-01' cal_month,'Annual Membership Fee' transaction ,1545 amount)
, cte_2 as (
select
dateadd('month', row_number() over (partition by cal_month order by 1) - 1, cal_month::date) future_cal_month -- offsets 0..11, so the first share lands in the payment month
,amount/12 pro_rated_amount
from
cte
,table(generator(rowcount => 12)) v)
select
future_cal_month
, sum(pro_rated_amount)
from
cte_2
group by
future_cal_month

Calculate average and standard deviation for pre defined number of values substituting missing rows with zeros

I have a simple table that contains a record of products and their total sales per day over a year (just 3 columns - Product, Date, Sales). So, for example, if product A is sold every single day, it'll have 365 records. Similarly, if product B is sold for only 50 days, the table will have just 50 rows for that product - one for each day of sale.
I need to calculate the daily average sales and standard deviation for the entire year, which means that, for product B, I need to have additional 365-50=315 entries with zero sales to be able to calculate the daily average and standard deviation for the year correctly.
Is there a way to do this efficiently and dynamically in SQL?
Thanks
We can generate 366 rows and join the sales data to it:
WITH rg(rn) AS (
SELECT 1 AS rn
UNION ALL
SELECT a.rn + 1 AS rn
FROM rg a
WHERE a.rn < 366 -- stop once rn reaches 366
)
SELECT
*
FROM
rg
LEFT JOIN (
SELECT YEAR(saledate) as yr, DATEPART(dayofyear, saledate) as doy, count(*) as numsales
FROM sales
GROUP BY YEAR(saledate), DATEPART(dayofyear, saledate)
) s ON rg.rn = s.doy
OPTION (MAXRECURSION 370);
You can replace the nulls (days with no sales data) with zeros, e.g. AVG(COALESCE(numsales, 0)). You'll probably also need a WHERE clause to eliminate the 366th day in non-leap years (for example, take the year modulo 4 and only keep 366 rows if the result is 0).
If you're only doing a single year, you can use a WHERE clause in the sales subquery to return only the relevant records. The most efficient form is a range such as WHERE saledate >= DATEFROMPARTS(YEAR(GETDATE()), 1, 1) AND saledate < DATEFROMPARTS(YEAR(GETDATE()) + 1, 1, 1), rather than calling a function on every sale date to extract the year and compare it to a constant. You can also drop YEAR(saledate) from the SELECT and GROUP BY if there is only a single year.
If you're doing multiple years, you could make rg generate more rows or (perhaps simpler) cross join it to a list of years, so you get 366 rows multiplied by e.g. VALUES (2015),(2016),(2017),(2018),(2019),(2020), and make the year from the sales subquery part of the join condition too.
Find the first and last day of the year, then use DATEDIFF() to get the number of days in that year.
After that, don't use AVG on sales; use SUM(Sales) / days_in_year instead.
select *,
days_in_year = datediff(day, first_of_year, last_of_year) + 1
from (values (2019), (2020)) v(year)
cross apply
(
select first_of_year = dateadd(year, year - 1900, 0),
last_of_year = dateadd(year, year - 1900 + 1, -1)
) d
There's a different way to look at it - don't try to add additional empty rows, just divide by the number of days in a year. While the number of days a year isn't constant (a leap year will have 366 days), it can be calculated easily since the first day of the year is always January 1st and the last is always December 31st:
SELECT YEAR(date),
product,
SUM(sales) / DATEPART(dy, DATEFROMPARTS(YEAR(date), 12, 31))
FROM sales_table
GROUP BY YEAR(date), product
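The divide-by-days idea extends to the standard deviation as well, since missing days add zero to both the sum and the sum of squares. Below is a rough Python sketch of that arithmetic; yearly_mean_and_std is a hypothetical helper, and it computes the population standard deviation (the STDEVP flavour in SQL Server; STDEV is the sample version, which would need the variance scaled by n / (n - 1)).

```python
import calendar
import math
import statistics

def yearly_mean_and_std(daily_sales, year):
    """Mean and population standard deviation over every day of `year`,
    treating days with no recorded sale as zero."""
    n = 366 if calendar.isleap(year) else 365
    total = sum(daily_sales)
    total_sq = sum(x * x for x in daily_sales)
    mean = total / n
    # Missing days contribute 0 to both sums, so only the divisor changes.
    variance = total_sq / n - mean * mean
    return mean, math.sqrt(max(variance, 0.0))

# Product B from the question: sold on 50 days only (10 units each day, as an example)
sales_b = [10] * 50
mean_b, std_b = yearly_mean_and_std(sales_b, 2019)

# Cross-check against explicitly padding the missing days with zeros
padded = sales_b + [0] * (365 - len(sales_b))
```

The result matches statistics.mean(padded) and statistics.pstdev(padded), so no extra rows have to be generated to get either figure.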

How do I identify events that occurred in consecutive years?

I am trying to figure out if an event occurred in the three consecutive previous years by month. For example:
Item Type Month Year
Hat S May 2015
Shirt P June 2015
Hat S June 2015
Hat S May 2016
Shirt P May 2016
Hat S May 2017
I am interested in seeing what item was purchased/sold for three consecutive years in the same month. Hat was sold in May in 2015, 2016, and 2017; therefore, I would like to identify that. Shirt was purchased in June 2015 and May 2016. Since this is different months in consecutive years, it does not qualify.
Essentially, I want it to be able to look back 3 years and identify those purchases/sales that reoccurred in the same month each year, preferably with an indicator variable.
I tried the following code:
select distinct a.*
from dataset as a inner join dataset as b
on a.type = b.type
and a.month = b.month
and a.item = b.item
and a.year = b.year-1
and a.year = b.year-2;
I want to get:
Item Type Month Year
Hat S May 2015
Hat S May 2016
Hat S May 2017
I guess I should add that my data is longer than 2015-2017. It spans 10 years, but I want to see if there are any 3 consecutive years (or more) within that 10 year span.
There are many ways to do this. One way in SQL, given that rows can be grouped by Item and Month, is to restrict Year to the three years 2015 through 2017. To qualify as three consecutive years, the count of distinct Year values within the group must be 3. This criterion also handles data with repetition, such as a group containing 3 S-type Hats and 3 P-type Hats.
select item, type, month, year
from have
where year between 2015 and 2017
group by item, month
having count(distinct year) = 3
order by item, type, month, year
For the more general problem of identifying runs within a group, the SAS DATA step is well suited and powerful. The serial DOW loop technique first loops over a range of rows based on some condition while computing a group metric, in this case the length of the run of consecutive years. A second loop then iterates over the same rows and uses that metric.
Consider this example in which the rungroup is computed based on year adjacency of item/month. Once the rungroups are established, the double DOW technique is applied.
data have;
do comboid = 1 to 1000;
itemid = ceil(10 * ranuni(123));
typeid = ceil(2* ranuni(123));
month = ceil(12 * ranuni(123));
year = 2009 + floor (10 * ranuni(123));
output;
end;
run;
proc sort data=have;
by itemid month year;
run;
data have_rungrouped;
set have;
by itemid month year;
rungroup + (first.month or not first.month and year - lag(year) > 1);
run;
data want;
do index = 1 by 1 until (last.rungroup);
set have_rungrouped;
by rungroup;
* distinct number of years in rungroup;
years_runlength = sum (years_runlength, first.rungroup or year ne lag(year));
end;
do index = 1 to index;
set have_rungrouped;
if years_runlength >= 3 then output;
end;
run;
Here is an example that would check if any item happened in consecutive years and list all from original table that qualify for at least two consecutive years:
DECLARE @table TABLE
(
Item NVARCHAR(MAX),
Type CHAR,
Month NVARCHAR(MAX),
Year INT
)
INSERT INTO @table VALUES
('Hat','S','May','2015'),
('Shirt','P','June','2015'),
('Hat','S','June','2015'),
('Hat','S','May','2016'),
('Shirt','P','May','2016'),
('Hat','S','May','2017')
SELECT * FROM @table
WHERE CONCAT(Item,Month) IN
(
SELECT CONCAT(group1.Item, group1.Month) FROM
(
SELECT Item,Year,Month FROM @table
GROUP BY Year, Item, Month
) group1
FULL OUTER JOIN
(
SELECT Item,Year,Month FROM @table
GROUP BY Year, Item, Month
) group2
ON group1.Year = group2.Year + 1 AND group1.Item = group2.Item AND group1.Month = group2.Month
WHERE group1.Item IS NOT NULL AND group2.Item IS NOT NULL
)
ORDER BY Item,Month,Year
As you can see I found all items that matched year + 1 in the same month.
OUTPUT:
Hat S May 2015
Hat S May 2016
Hat S May 2017
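For the more general case the asker mentions (any run of 3 or more consecutive years inside a 10-year span), the run-detection logic can be sketched in a few lines of Python. This is only an illustration of the idea, not a translation of either answer; rows_in_runs is a made-up name, and rows are grouped by item and month as in the answers above.

```python
from collections import defaultdict

ROWS = [
    ("Hat", "S", "May", 2015),
    ("Shirt", "P", "June", 2015),
    ("Hat", "S", "June", 2015),
    ("Hat", "S", "May", 2016),
    ("Shirt", "P", "May", 2016),
    ("Hat", "S", "May", 2017),
]

def rows_in_runs(rows, min_years=3):
    """Return rows whose (item, month) occurs in at least `min_years` consecutive years."""
    years_by_key = defaultdict(set)
    for item, _type, month, year in rows:
        years_by_key[(item, month)].add(year)

    qualifying = set()  # (item, month, year) triples that sit inside a long-enough run
    for (item, month), years in years_by_key.items():
        run = []
        for year in sorted(years):
            if run and year != run[-1] + 1:
                run = []  # gap found: start a new run
            run.append(year)
            if len(run) >= min_years:
                qualifying.update((item, month, y) for y in run)
    return [r for r in rows if (r[0], r[2], r[3]) in qualifying]

result = rows_in_runs(ROWS)
```

With the sample data, result contains only the three Hat/May rows. Passing min_years=2 gives the "at least two consecutive years" behaviour of the T-SQL example.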

SQL Server / SSRS: Calculating monthly average based on grouping and historical values

I need to calculate an average based on historical data for a graph in SSRS:
Current Month
Previous Month
2 Months ago
6 Months ago
This query returns the average for each month:
SELECT
avg_val1, month, year
FROM
(SELECT
(sum_val1 / count) as avg_val1, month, year
FROM
(SELECT
SUM(val1) AS sum_val1, SUM(count) AS count, month, year
FROM
(SELECT
COUNT(val1) AS count, SUM(val1) AS val1,
MONTH([SnapshotDate]) AS month,
YEAR([SnapshotDate]) AS year
FROM
[DC].[dbo].[KPI_Values]
WHERE
[SnapshotKey] = 'Some text here'
AND No = '001'
AND Channel = '999'
GROUP BY
[SnapshotDate]) AS sub3
GROUP BY
month, year, count) AS sub2
GROUP BY sum_val1, count, month, year) AS sub1
ORDER BY
year, month ASC
When I add the following WHERE clause I get the average for March (2 months ago):
WHERE month = MONTH(GETDATE())-2
AND year = YEAR(GETDATE())
Now the problem is when I want to retrieve data from 6 months ago; MONTH(GETDATE()) - 6 will output -1 instead of 12. I also have an issue with the fact that the year changes to 2016 and I am a bit unsure of how to implement the logic in my query.
I think I might be going about this wrong... Any suggestions?
Subtract the months from the date using the DATEADD function before you do your comparison. Ex:
WHERE SnapshotDate BETWEEN DATEADD(month, -6, GETDATE()) AND GETDATE()
MONTH(GETDATE()) returns an int, so subtracting from it can give 0 or a negative value. You would need a scalar user-defined function to handle this, adding 12 (and moving back one year) when the result is <= 0.
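The wrap-around arithmetic such a function would need is short. Here is a sketch in Python, with months_ago as a hypothetical helper name; in T-SQL the same result comes from taking MONTH() and YEAR() of DATEADD(month, -6, GETDATE()) instead of subtracting from the month number.

```python
def months_ago(year, month, n):
    """Return the (year, month) that lies n months before the given one."""
    index = year * 12 + (month - 1) - n  # running month count, zero-based
    return index // 12, index % 12 + 1

six_back = months_ago(2017, 5, 6)   # May 2017 minus six months
two_back = months_ago(2017, 5, 2)   # May 2017 minus two months
```

Converting to a single running month count first means the year rolls over automatically, with no special case for zero or negative months.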

How to find the average of last 52 weeks sales at each time by SQL

I have a CSV file with four columns: date, wholesaler, product, and sales.
I want to find the average of the last 52 weeks of sales for each product and wholesaler combination at each date. That is, what is the average of the previous sales of product A at wholesaler B at time C over the preceding 52 weeks?
For instance, suppose the sales of product 'A' at wholesaler 'B' in Jan, Apr, May, and Aug are 100, 200, 300, and 400 respectively, and that we have no record before Jan. Then the average of previous sales of product 'A' at wholesaler 'B' is 100/1 at Apr, (200+100)/2 at May, and (300+200+100)/3 at Aug.
The following table shows my data:
date wholesaler product sales
12/31/2012 53929 UPE54 4
12/31/2012 13131 UPE55 1
2/23/2013 13131 UPE55 1156
4/24/2013 13131 UPE55 1
12/1/2013 83389 UPE54 9
12/17/2013 83389 UPE54 1
12/18/2013 52237 UPE54 9
12/19/2013 53929 UME24 1
12/31/2013 82204 UPE55 9
12/31/2013 11209 UME24 4
12/31/2013 52237 UPE54 1
Right now I am using Python code that only works properly for small datasets. Since my dataset has more than 25 million rows, I am looking for a better way to find the solution. Thanks a million for your help!
I think this is what you are looking for.
WITH cte_prep
AS (
SELECT
YEAR(date) * 100 + DATEPART(WEEK, [DATE]) AS week
, date
, RANK() OVER ( PARTITION BY product, wholesaler ORDER BY YEAR(date) * 100 + DATEPART(WEEK, [DATE]) ) AS product_wholesaler_week_rank
, [wholesaler]
, [product]
, [sales]
FROM
[meta].[dbo].[sales]
)
SELECT
CW.wholesaler
, CW.product
, CW.week
, CW.product_wholesaler_week_rank
, CW.sales
, AVG(BW.sales) AS avg_sales
FROM
cte_prep AS CW
INNER JOIN cte_prep BW
ON BW.product = CW.product AND
BW.wholesaler = CW.wholesaler AND
CW.product_wholesaler_week_rank >= BW.product_wholesaler_week_rank
AND BW.product_wholesaler_week_rank >= CW.product_wholesaler_week_rank - 52
GROUP BY
CW.wholesaler
, CW.product
, CW.week
, CW.sales
, CW.product_wholesaler_week_rank
ORDER BY
CW.wholesaler
, CW.product
, CW.week desc
The results look like this
select year(date), sum(sales) / count(sales)
from table
group by year(date)
What you're asking for is slightly more involved than the answer I gave. My answer works if you only want to group by year-long periods running from Jan 1 to Dec 31. It may be that you want year-long periods that run from, say, July 1 to June 30 instead.
The way to do this is to look for ways to group by date ranges. Here are a handful of links you may find helpful.
https://dba.stackexchange.com/questions/59356/grouping-by-date-range-in-a-column
SQL Group by Date Range
In SQL, how can you "group by" in ranges?
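For the rolling case described in the question (the average of earlier sales within the preceding 52 weeks, per wholesaler and product), a single pass over date-sorted rows is enough. The Python sketch below is one possible approach, not the method of any answer above; trailing_averages is an illustrative name, the window is taken as 364 days, and an earlier row on the same date for the same pair counts as "previous".

```python
from collections import defaultdict, deque
from datetime import date, timedelta

def trailing_averages(rows, window=timedelta(weeks=52)):
    """For each (date, wholesaler, product, sales) row, average the earlier sales
    of the same wholesaler/product pair that fall within `window` before it.
    The average is None when there is no earlier sale in the window."""
    recent = defaultdict(deque)   # pair -> (date, sales) entries still inside the window
    running = defaultdict(int)    # pair -> sum of the sales held in that deque
    out = []
    for day, wholesaler, product, sales in sorted(rows):
        key = (wholesaler, product)
        q = recent[key]
        while q and q[0][0] < day - window:  # drop sales older than the window
            running[key] -= q.popleft()[1]
        out.append((day, wholesaler, product, running[key] / len(q) if q else None))
        q.append((day, sales))
        running[key] += sales
    return out

# A few rows from the question's sample data
ROWS = [
    (date(2012, 12, 31), 13131, "UPE55", 1),
    (date(2013, 2, 23), 13131, "UPE55", 1156),
    (date(2013, 4, 24), 13131, "UPE55", 1),
    (date(2013, 12, 1), 83389, "UPE54", 9),
    (date(2013, 12, 17), 83389, "UPE54", 1),
]
averages = trailing_averages(ROWS)
```

Each row is added and removed from its deque once, so the cost after sorting is linear in the number of rows. Whether that is fast enough for 25 million rows in Python is untested here.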