Usage compared to last year - sql

I have a table that contains two columns - date an amount (usage). I want to compare my usage with last year's
Data
Date Amount
01-01-2015 23
02-01-2015 24
03-01-2015 19
04-01-2015 11
05-01-2015 5
06-01-2015 23
07-01-2015 21
08-01-2015 6
09-01-2015 26
10-01-2015 9
[...]
01-01-2016 30
02-01-2016 19
03-01-2016 5
04-01-2016 8
05-01-2016 15
06-01-2016 27
07-01-2016 19
08-01-2016 21
09-01-2016 24
10-01-2016 27
[until today's date]
The sql statement should return results up to today's date and the diff should be a running total - the amount on a day in 2016 minus the amount on the same day in 2015, plus the previous difference. Assuming today is Jan 10, the result would be:
Day Diff
01-01-2016 7 (30-23)
02-01-2016 2 (19-24+7)
03-01-2016 -12 (5-19+2)
04-01-2016 -15 (8-11-12)
05-01-2016 -5 (and so on...)
06-01-2016 -1
07-01-2016 -3
08-01-2016 12
09-01-2016 10
10-01-2016 28
I can't figure out to do this when the data is all in one table...
Thanks

Try this
SELECT t1.date Day,SUM(t2.Diff) FROM
amounts t1
INNER JOIN
(SELECT amounts.date Day, amounts.amount-prevamounts.amount Diff
FROM amounts inner join amounts prevamounts
ON amounts.date=date(prevamounts.date,'+1 years')) t2
ON t2.Day<=t1.date
group by t1.date;
demo here :
http://sqlfiddle.com/#!7/9bd4c/41

Related

count number of records by month over the last five years where record date > select month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month end date. The below SQL code is text of the query to count valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE (dbo_Insp_Type.CERT_EXP_DTE)>=#2/1/2017#);
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query) are there other methods I can use that call for less manual input?
From this sample:
Id
CERT_EXP_DTE
1
2022-01-15
2
2022-01-23
3
2022-02-01
4
2022-02-03
5
2022-05-01
6
2022-06-06
7
2022-06-07
8
2022-07-21
9
2022-02-20
10
2021-11-05
11
2021-12-01
12
2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth
AllInspectors
ValidInspectors
2021-11
1
1
2021-12
2
1
2022-01
2
2
2022-02
3
2
2022-05
1
0
2022-06
2
2
2022-07
1
1
ID
Cert_Iss_Dte
Cert_Exp_Dte
1
1/15/2020
1/15/2022
2
1/23/2020
1/23/2022
3
2/1/2020
2/1/2022
4
2/3/2020
2/3/2022
5
5/1/2020
5/1/2022
6
6/6/2020
6/6/2022
7
6/7/2020
6/7/2022
8
7/21/2020
7/21/2022
9
2/20/2020
2/20/2022
10
11/5/2021
11/5/2023
11
12/1/2021
12/1/2023
12
12/24/2021
12/24/2023
A UNION query could calculate a record for each of 50 months but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count() referencing a textbox on form for start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null) AS Dt2,
...
FROM dbo_Insp_Type
Using the above data, following is output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in criteria and it did not make a difference for this sample data.
Dt1
Dt2
10
8
Or a report with 60 textboxes and each calls a DCount() expression with criteria same as used in query.
Or a VBA procedure that writes data to a 'temp' table.

Calculate running sum of previous 3 months from monthly aggregated data

I have a dataset that I have aggregated at monthly level. The next part needs me to take, for every block of 3 months, the sum of the data at monthly level.
So essentially my input data (after aggregated to monthly level) looks like:
month
year
status
count_id
08
2021
stat_1
1
09
2021
stat_1
3
10
2021
stat_1
5
11
2021
stat_1
10
12
2021
stat_1
10
01
2022
stat_1
5
02
2022
stat_1
20
and then my output data to look like:
month
year
status
count_id
3m_sum
08
2021
stat_1
1
1
09
2021
stat_1
3
4
10
2021
stat_1
5
8
11
2021
stat_1
10
18
12
2021
stat_1
10
25
01
2022
stat_1
5
25
02
2022
stat_1
20
35
i.e 3m_sum for Feb = Feb + Jan + Dec. I tried to do this using a self join and wrote a query along the lines of
WITH CTE AS(
SELECT date_part('month',date_col) as month
,date_part('year',date_col) as year
,status
,count(distinct id) as count_id
FROM (date_col, status, transaction_id) as a
)
SELECT a.month, a.year, a.status, sum(b.count_id) as 3m_sum
from cte as a
left join cte as b on a.status = b.status
and b.month >= a.month - 2 and b.month <= a.month
group by 1,2,3
This query NEARLY works. Where it falls apart is in Jan and Feb. My data is from August 2021 to Apr 2022. The means, the value for Jan should be Nov + Dec + Jan. Similarly for Feb it should be Dec + Jan + Feb.
As I am doing a join on the MONTH, all the months of Aug - Nov are treated as being values > month of jan/feb and so the query isn't doing the correct sum.
How can I adjust this bit to give the correct sum?
I did think of using a LAG function, but (even though I'm 99% sure a month won't ever be missed), I can't guarantee we will never have a month with 0 values, and therefore my LAG function will be summing the wrong rows.
I also tried doing the same join, but at individual date level (and not aggregating in my nested query) but this gave vastly different numbers, as I want the sum of the aggregation and I think the sum from the individual row was duplicated a lot of stuff I do a COUNT DISTINCT on to remove.
You can use a SUM with a window frame of 2 PRECEDING. To ensure you don't miss rows, use a calendar table and left-join all the results to it.
SELECT *,
SUM(a.count_id) OVER (ORDER BY c.year, c.month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM Calendar c
LEFT JOIN a ON a.year = c.year AND a.month = c.month
WHERE c.year >= 2021 AND c.year <= 2022;
db<>fiddle
You could also use LAG but you would need it twice.
It should be #Charlieface's answer - only that I get one different result than you put in your expected result table:
WITH
-- your input - and I avoid keywords like "MONTH" or "YEAR"
-- and also identifiers starting with digits are forbidden -
indata(mm,yy,status,count_id,sum_3m) AS (
SELECT 08,2021,'stat_1',1,1
UNION ALL SELECT 09,2021,'stat_1',3,4
UNION ALL SELECT 10,2021,'stat_1',5,8
UNION ALL SELECT 11,2021,'stat_1',10,18
UNION ALL SELECT 12,2021,'stat_1',10,25
UNION ALL SELECT 01,2022,'stat_1',5,25
UNION ALL SELECT 02,2022,'stat_1',20,35
)
SELECT
*
, SUM(count_id) OVER(
ORDER BY yy,mm
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sum_3m_calc
FROM indata;
-- out mm | yy | status | count_id | sum_3m | sum_3m_calc
-- out ----+------+--------+----------+--------+-------------
-- out 8 | 2021 | stat_1 | 1 | 1 | 1
-- out 9 | 2021 | stat_1 | 3 | 4 | 4
-- out 10 | 2021 | stat_1 | 5 | 8 | 9
-- out 11 | 2021 | stat_1 | 10 | 18 | 18
-- out 12 | 2021 | stat_1 | 10 | 25 | 25
-- out 1 | 2022 | stat_1 | 5 | 25 | 25
-- out 2 | 2022 | stat_1 | 20 | 35 | 35

Calculate Churn by aggregating by date range in SQL

I am trying to calculate the churn rate from a data that has customer_id, group, date. The aggregation is going to be by id, group and date. The churn formula is (customers in previous cohort - customers in last cohort)/customers in previous cohort
customers in previous cohort refers to cohorts in before 28 days
customers in last cohort refers to cohorts in last 28 days
I am not sure how to aggregate them by date range to calculate the churn.
Here is sample data that I copied from SQL Group by Date Range:
Date Group Customer_id
2014-03-01 A 1
2014-04-02 A 2
2014-04-03 A 3
2014-05-04 A 3
2014-05-05 A 6
2015-08-06 A 1
2015-08-07 A 2
2014-08-29 XXXX 2
2014-08-09 XXXX 3
2014-08-10 BB 4
2014-08-11 CCC 3
2015-08-12 CCC 2
2015-03-13 CCC 3
2014-04-14 CCC 5
2014-04-19 CCC 4
2014-08-16 CCC 5
2014-08-17 CCC 3
2014-08-18 XXXX 2
2015-01-10 XXXX 3
2015-01-20 XXXX 4
2014-08-21 XXXX 5
2014-08-22 XXXX 2
2014-01-23 XXXX 3
2014-08-24 XXXX 2
2014-02-25 XXXX 3
2014-08-26 XXXX 2
2014-06-27 XXXX 4
2014-08-28 XXXX 1
2014-08-29 XXXX 1
2015-08-30 XXXX 2
2015-09-31 XXXX 3
The goal is to calculate the churn rate every 28 days in between 2014 and 2015 by the formula given above. So, it is going to be aggregating the data by rolling it by 28 days and calculating the churn by the formula.
Here is what I tried to aggregate the data by date range:
SELECT COUNT(distinct customer_id) AS count_ids, Group,
DATE_SUB(CAST(Date AS DATE), INTERVAL 56 DAY) AS Date_min,
DATE_SUB(CURRENT_DATE, INTERVAL 28 DAY) AS Date_max
FROM churn_agg
GROUP BY count_ids, Group, Date_min, Date_max
Hope someone will help me with aggregation and churn calculation. I want to simply deduct the aggregated count_ids to deduct it from the next aggregated count_ids which is after 28 days. So this is going to be successive deduction of the same column value (count_ids). I am not sure if I have to use rolling window or simple aggregation to find the churn.
As corrected by #jarlh, it's not 2015-09-31 but 2015-09-30
You can use this to create 28 days calendar:
create table daysby28 (i int, _Date date);
insert into daysby28 (i, _Date)
SELECT i, cast('01-01-2014'as date) + i*INTERVAL '28 day'
from generate_series(0,50) i
order by 1;
After you use #jarlh churn_agg table creation he sent with the fiddle, with this query, you get what you want:
with cte as
(
select count(Customer) as TotalCustomer, Cohort, CohortDateStart From
(
select distinct a.Customer_id as Customer, b.i as Cohort, b._Date as CohortDateStart
from churn_agg a left join daysby28 b on a._Date >= b._Date and a._Date < b._Date + INTERVAL '28 day'
) a
group by Cohort, CohortDateStart
)
select a.CohortDateStart,
1.0*(b.TotalCustomer - a.TotalCustomer)/(1.0*b.TotalCustomer) as Churn from cte a
left join cte b on a.cohort > b.cohort
and not exists(select 1 from cte c where c.cohort > b.cohort and c.cohort < a.cohort)
order by 1
The fiddle of all together is here

SQL: Sum Only Certain Rows Depending on Start and End Date

I have two tables: The 1st table contains a unique identifier (UI). Each unique identifier has a column containing a start date (yyyy-mm-dd), and a column containing an end date (yyyy-mm-dd). The 2nd table contains the temperature for each day, with separate columns for the month, day, year and temperature. I would like to join those tables and get the compiled temperature for each unique identifier; however I would the compiled temperature to only include the days from the second table that fall between start and end dates from the 1st table.
For example, if one record has a start_date of 12/10/15 and an end date of 12/31/15, I would like to have a column containing compiled temperatures for the 10th-31s. If the next record has a start date 12/3/15-12/17/15, I'd like the column next to it to show the compiled temperature for the 3rd-17th. I'll include the query I have so far, but it is not too helpful because I have not really gotten very far:
; with Temps as (
select MONTH, DAY, YEAR, Temp
from Temperatures
where MONTH = 12
and YEAR = 2016
)
Select UI, start_date, end_date, location, SUM(temp)
from Table1 t1
Inner join Temps
on temps.month = month(t1.start_date)
I appreciate any help you might be able to give. Let me know I need to elaborate on anything.
Table 1
UI Start_Date End_Date
2080 12/5/2015 12/31/2015
1266 12/1/2015 12/31/2015
1787 12/17/2015 12/28/2015
1621 12/3/2015 12/20/2015
1974 12/10/2015 12/12/2015
1731 12/25/2015 12/31/2015
Table 2
Month Day Year Temp
12 1 2016 34
12 2 2016 32
12 3 2016 35
12 4 2016 37
12 5 2016 32
12 6 2016 30
12 7 2016 31
12 8 2016 36
12 9 2016 48
12 10 2016 42
12 11 2016 33
12 12 2016 41
12 13 2016 31
12 14 2016 29
12 15 2016 46
12 16 2016 48
12 17 2016 38
12 18 2016 29
12 19 2016 45
12 20 2016 37
12 21 2016 48
12 22 2016 46
12 23 2016 44
12 24 2016 45
12 25 2016 35
12 26 2016 44
12 27 2016 29
12 28 2016 38
12 29 2016 29
12 30 2016 35
12 31 2016 40
Table 3 (Expected Result)
UI Start_Date End_Date Compiled Temp
2080 12/5/2015 12/31/2015 1101
1266 12/1/2015 12/31/2015 1167
1787 12/17/2015 12/28/2015 478
1621 12/3/2015 12/20/2015 668
1974 12/10/2015 12/12/2015 126
1731 12/25/2015 12/31/2015 250
You could do something like this:
; WITH temps AS (
SELECT CONVERT(DATE, CONVERT(CHAR(4), [YEAR]) + '-' + CONVERT(CHAR(2), [MONTH]) + '-' + CONVERT(VARCHAR(2), [DAY])) [TDate], [Temp]
FROM Temperatures
WHERE [MONTH] = 12
AND [YEAR] = 2015
)
SELECT [UI], [start_date], [end_date]
, (SELECT SUM([temp])
FROM temps
WHERE [TDate] BETWEEN T1.[start_date] AND T1.[end_date]) [Compiled Temp]
FROM Table1 T1
No need for a join.
You can do a simple join of the two tables as well. You don't need to use a CTE.
--TEST DATA
if object_id('Table1','U') is not null
drop table Table1
create table Table1 (UI int, Start_Date date, End_Date date)
insert Table1
values
(2080,'12/05/2015','12/31/2015'),
(1266,'12/01/2015','12/31/2015'),
(1787,'12/17/2015','12/28/2015'),
(1621,'12/03/2015','12/20/2015'),
(1974,'12/10/2015','12/12/2015'),
(1731,'12/25/2015','12/31/2015')
if object_id('Table2','U') is not null
drop table Table2
create table Table2 (Month int, Day int, Year int, Temp int)
insert Table2
values
(12,1, 2015,34),
(12,2, 2015,32),
(12,3, 2015,35),
(12,4, 2015,37),
(12,5, 2015,32),
(12,6, 2015,30),
(12,7, 2015,31),
(12,8, 2015,36),
(12,9, 2015,48),
(12,10,2015,42),
(12,11,2015,33),
(12,12,2015,41),
(12,13,2015,31),
(12,14,2015,29),
(12,15,2015,46),
(12,16,2015,48),
(12,17,2015,38),
(12,18,2015,29),
(12,19,2015,45),
(12,20,2015,37),
(12,21,2015,48),
(12,22,2015,46),
(12,23,2015,44),
(12,24,2015,45),
(12,25,2015,35),
(12,26,2015,44)
--AGGREGATE TEMPS
select t1.Start_Date, t1.End_Date, avg(t2.temp) AvgTemp, sum(t2.temp) CompiledTemps
from table1 t1
join table2 t2 ON t2.Year between datepart(year, t1.Start_Date) and datepart(year, t1.End_Date)
and t2.Month between datepart(month,t1.Start_Date) and datepart(month,t1.End_Date)
and t2.Day between datepart(day, t1.Start_Date) and datepart(day, t1.End_Date)
group by t1.Start_Date, t1.End_Date

Using Sum() with multiple where clauses

I'm pretty new to this, so forgive if this has been posted (I had no idea what to even search on).
I have 2 tables, Accounts and Usage
AccountID AccountStartDate AccountEndDate
-------------------------------------------
1 12/1/2012 12/1/2013
2 1/1/2013 1/1/2014
UsageId AccountID EstimatedUsage StartDate EndDate
------------------------------------------------------
1 1 10 1/1 1/31
2 1 11 2/1 2/29
3 1 23 3/1 3/31
4 1 23 4/1 4/30
5 1 15 5/1 5/31
6 1 20 6/1 6/30
7 1 15 7/1 7/31
8 1 12 8/1 8/31
9 1 14 9/1 9/30
10 1 21 10/1 10/31
11 1 27 11/1 11/30
12 1 34 12/1 12/31
13 2 13 1/1 1/31
14 2 13 2/1 2/29
15 2 28 3/1 3/31
16 2 29 4/1 4/30
17 2 31 5/1 5/31
18 2 26 6/1 6/30
19 2 43 7/1 7/31
20 2 32 8/1 8/31
21 2 18 9/1 9/30
22 2 20 10/1 10/31
23 2 47 11/1 11/30
24 2 33 12/1 12/31
I'd like to write one query that gives me estimated usage for each month (starting now until the last month that we serve an account) for all accounts being served during that month.
The results would be as follows:
Month-Year Total Est Usage
------------------------------
Oct-12 0 (none being served)
Nov-12 0 (none being served)
Dec-12 34 (only accountid 1 being served)
Jan-13 23 (accountid 1 & 2 being served)
Feb-13 24 (accountid 1 & 2 being served)
Mar-13 51 (accountid 1 & 2 being served)
...
Dec-13 33 (only accountid 2 being served)
Jan-14 0 (none being served)
Feb-14 0 (none being served)
I'm assuming I need to sum and then do a Group By...but not really sure logically how I'd lay this out.
Revised Answer:
I've created a Months table with columns MonthID, Month with values like (201212, 12), (201301, 1), ...
I've also reorganised the usage table to have a month column rather than the start date and end date, as it makes the idea clearer.
See http://sqlfiddle.com/#!3/f57d84/6 for details
The query is now:
Select
m.MonthID,
Sum(u.EstimatedUsage) TotalEstimatedUsage
From
Accounts a
Inner Join
Usage u
On a.AccountID = u.AccountID
Inner Join
Months m
On m.MonthID Between
Year(a.AccountStartDate) * 100 + Month(a.AccountStartDate) And
Year(a.AccountEndDate) * 100 + Month(a.AccountEndDate) And
m.Month = u.Month
Group By
m.MonthID
Order By
1
Previous answer, for reference which assumed usages ranges were full dates rather than just months.
Select
Year(u.StartDate),
Month(u.StartDate),
Sum(Case When a.AccountStartDate <= u.StartDate And a.AccountEndDate >= u.EndDate Then u.EstimatedUsage Else 0 End) TotalEstimatedUsage
From
Accounts a
Inner Join
Usage u
On a.AccountID = u.AccountID
Group By
Year(u.StartDate),
Month(u.StartDate)
Order By
1, 2