Need to calculate the average of cost based on a per fiscal week basis - sql

I hope someone can help with this issue. I am trying to work out a weekly average from the following example data:
Practice ID   Cost    FiscalWeek
1             10.00   1
1             33.00   2
1             55.00   3
1             18.00   4
1             36.00   5
1             24.00   6
13            56.00   1
13            10.00   2
13            24.00   3
13            30.00   4
13            20.00   5
13            18.00   6
What I want is to group by Practice ID and work out a running average for each practice (there are over 500 of these, not just those above) for each week. So, for example, Week 1 will have no average, Week 2 will be the average of Weeks 1 and 2, Week 3 will be the average of Weeks 1, 2 and 3, and so on. I then need to show this by Practice ID for each Fiscal Week.
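For example, using the sample data for Practice 1: Week 2 = (10.00 + 33.00) / 2 = 21.50, Week 3 = (10.00 + 33.00 + 55.00) / 3 = 32.67, Week 4 = (10.00 + 33.00 + 55.00 + 18.00) / 4 = 29.00, and so on.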
At the moment I have some code that is not pretty, and there has to be an easier way. The code is:
I pass all the data into a table variable, then in a CTE I use CASE statements to split out each individual week, like:
CASE WHEN fiscalweek = 1 THEN cost ELSE 0 END AS [1],
CASE WHEN fiscalweek = 2 THEN cost ELSE 0 END AS [2],
CASE WHEN fiscalweek = 3 THEN cost ELSE 0 END AS [3]
This brings back the week 1 cost and so on, each into its own column (1, 2, 3, etc.). I then use a second CTE to sum the columns for each week; for example, to work out week 6 I would use this code:
sum([1]) as 'Average Wk 1',
sum([1]+[2])/2 as 'Average Wk 2',
sum([1]+[2]+[3])/3 as 'Average Wk 3',
sum([1]+[2]+[3]+[4])/4 as 'Average Wk 4',
sum([1]+[2]+[3]+[4]+[5])/5 as 'Average Wk 5',
sum([1]+[2]+[3]+[4]+[5]+[6])/6 as 'Average Wk 6'
I've thought about various ways of working out this average accurately in T-SQL so I can then drop it into SSRS eventually. I've considered a WHILE loop and a cursor, but I'm failing to see an easy way of doing this.

You are looking for the cumulative average of the averages. In databases that support window/analytic functions, you can do:
select fiscalweek, avg(cost) as avgcost,
avg(avg(cost)) over (order by fiscalweek) as cumavg
from practices p
group by fiscalweek
order by 1;
If you don't have window functions, then you need to use some form of correlated subquery or join:
select p1.fiscalweek, avg(p2.avgcost)
from (select fiscalweek, avg(cost) as avgcost
      from practices p
      group by fiscalweek
     ) p1 join
     (select fiscalweek, avg(cost) as avgcost
      from practices p
      group by fiscalweek
     ) p2
     on p2.fiscalweek <= p1.fiscalweek
group by p1.fiscalweek
order by 1;
I do want to caution you that you are calculating the "average of averages". This is different from the cumulative average, which could be calculated as:
select fiscalweek,
(sum(sum(cost)) over (order by fiscalweek) /
sum(count(*)) over (order by fiscalweek)
) avgcost
from practices p
group by fiscalweek
order by 1;
One treats every week as one data point in the final average (which is what you seem to want); the other (the latter query) weights each week by the number of data points during the week. These can produce very different results when weeks have different numbers of points.
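Since the question asks for this per practice, here is a sketch of the same window-function approach partitioned by practice (assuming the practice column is named practiceid; the cumulative window frame requires SQL Server 2012 or later):
select practiceid, fiscalweek, avg(cost) as avgcost,
       avg(avg(cost)) over (partition by practiceid order by fiscalweek) as cumavg
from practices p
group by practiceid, fiscalweek
order by practiceid, fiscalweek;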

I don't know if I fully understand the question, but try executing this; it should help you:
create table #practice(PID int, cost decimal(10,2), Fweek int)
insert into #practice values (1,10,1)
insert into #practice values (1,33,2)
insert into #practice values (1,55,3)
insert into #practice values (1,18,4)
insert into #practice values (1,36,5)
insert into #practice values (1,24,6)
insert into #practice values (13,56,1)
insert into #practice values (13,10,2)
insert into #practice values (13,24,3)
insert into #practice values (13,30,4)
insert into #practice values (13,20,5)
insert into #practice values (13,18,6)
select * from #practice
select pid,Cost,
(select AVG(cost) from #practice p2 where p2.Fweek <= p1.Fweek and p1.pid = p2.pid) WeeklyAVG,
Fweek,AVG(COST) over (Partition by PID) as PIDAVG
from #practice p1;

I think this would work:
SELECT t1.pid,
t1.fiscalweek,
(
SELECT SUM(t.cost)/COUNT(t.cost)
FROM tablename AS t
WHERE t.pid = t1.pid
AND t.fiscalweek <= t1.fiscalweek
) AS average
FROM tablename AS t1
GROUP BY t1.pid, t1.fiscalweek
EDIT
To take into account fiscal weeks without an entry, you can simply exchange
SELECT SUM(t.cost)/COUNT(t.cost)
for
SELECT SUM(t.cost)/t1.fiscalweek
to calculate from week 1 or
SELECT SUM(t.cost)/(t1.fiscalweek - MIN(t.fiscalweek) + 1)
to calculate from the first week of this practice.
If all practice averages should start the same week (and not necessarily week no 1) then you'd have to find the minimum of all week numbers.
Also, this won't work if you're calculating across multiple years, but I assume that is not the case.
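Put together, the variant that counts from the practice's first recorded week would look like this (same table and column names as assumed above):
SELECT t1.pid,
       t1.fiscalweek,
       (
           SELECT SUM(t.cost) / (t1.fiscalweek - MIN(t.fiscalweek) + 1)
           FROM tablename AS t
           WHERE t.pid = t1.pid
             AND t.fiscalweek <= t1.fiscalweek
       ) AS average
FROM tablename AS t1
GROUP BY t1.pid, t1.fiscalweek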

SQL Divide previous row balance by current row balance and insert that value into current rows column "Growth"

I have a table like this.
Year   ProcessDate   Month   Balance      RowNum   Calculation
2022   20220430      4       22855547     1
2022   20220330      3       22644455     2
2022   20220230      2       22588666     3
2022   20220130      1       33545444     4
2022   20221230      12      22466666     5
I need to take the previous row of each column and divide that amount by the current row.
Ex: Row 1 Calculation should = Row 2 Balance / Row 1 Balance (22644455 / 22855547 ≈ 0.99)
Row 2 calculation should = Row 3 Balance / Row 2 Balance etc....
Table is just a Temporary table I created titled #MonthlyLoanBalance2.
Now I just need to take it a step further.
Let me know what and how you would go about doing this.
Thank you in advance!
Insert into #MonthlytLoanBalance2 (
Year
,ProcessDate
,Month
,Balance
,RowNum
)
select
--CloseYearMonth,
left(ProcessDate,4) as 'Year',
ProcessDate,
--x.LOANTypeKey,
SUBSTRING(CAST(x.ProcessDate as varchar(38)),5,2) as 'Month',
sum(x.currentBalance) as Balance
,ROW_NUMBER()over (order by ProcessDate desc) as RowNum
from
(
select
distinct LoanServiceKey,
LoanTypeKey,
AccountNumber,
CurrentBalance,
OpenDateKey,
CloseDateKey,
ProcessDate
from
cu.LAFactLoanSnapShot
where LoanStatus = 'Open'
and LoanTypeKey = 0
and ProcessDate in (select DateKey from dimDate
where IsLastDayOfMonth = 'Y'
and DateKey > convert(varchar, getdate()-4000, 112)
)
) x
group by ProcessDate
order by ProcessDate desc;
I am assuming your data is already prepared as shown in the table. You can try the LEAD() function to resolve your issue. Note that the FORMAT() function is used only to keep two decimal places of precision.
SELECT *,
FORMAT((ISNULL(LEAD(Balance,1) OVER (ORDER BY RowNum), 1)/Balance),'N2') Calculation
FROM #MonthlytLoanBalance2
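If you want that ratio written back into the Calculation column rather than just returned by the SELECT, one possible sketch is an update through a CTE (this assumes Balance is a numeric type and that the Calculation column already exists in #MonthlytLoanBalance2):
;WITH cte AS (
    SELECT Balance,
           Calculation,
           LEAD(Balance, 1) OVER (ORDER BY RowNum) AS NextBalance  -- balance from the following row (the prior month)
    FROM #MonthlytLoanBalance2
)
UPDATE cte
SET Calculation = CAST(NextBalance AS DECIMAL(18, 6)) / NULLIF(Balance, 0);  -- NULLIF guards against divide-by-zero
Updating through the CTE is allowed here because the statement only touches one base table.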

Splitting up group by with relevant aggregates beyond the basic ones?

I'm not sure if this has been asked before because I'm having trouble even asking it myself. I think the best way to explain my dilemma is to use an example.
Say I've rated my happiness on a scale of 1-10 every day for 10 years and I have the results in a big table where I have a single date correspond to a single integer value of my happiness rating. I say, though, that I only care about my happiness over 60 day periods on average (this may seem weird but this is a simplified example). So I wrap up this information to a table where I now have a start date field, an end date field, and an average rating field where the start days are every day from the first day to the last over all 10 years, but the end dates are exactly 60 days later. To be clear, these 60 day periods are overlapping (one would share 59 days with the next one, 58 with the next, and so on).
Next I pick a threshold rating, say 5, where I want to categorize everything below it into a "bad" category and everything above into a "good" category. I could easily add another field and use a case structure to give every 60-day range a "good" or "bad" flag.
Then to sum it up, I want to display the total periods of "good" and "bad" from maximum beginning to maximum end date. This is where I'm stuck. I could group by the good/bad category and then just take min(start date) and max(end date), but then if, say, the ranges go from good to bad to good then to bad again, output would show overlapping ranges of good and bad. In the aforementioned situation, I would want to show four different ranges.
I realize this may be clearer to me than it would be to someone else, so if you need clarification just ask.
Thank you
---EDIT---
Here's an example of what the before would look like:
StartDate| EndDate| MoodRating
------------+------------+------------
1/1/1991 |3/1/1991 | 7
1/2/1991 |3/2/1991 | 7
1/3/1991 |3/3/1991 | 4
1/4/1991 |3/4/1991 | 4
1/5/1991 |3/5/1991 | 7
1/6/1991 |3/6/1991 | 7
1/7/1991 |3/7/1991 | 4
1/8/1991 |3/8/1991 | 4
1/9/1991 |3/9/1991 | 4
And the after:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/2/1991 |good
1/3/1991|3/4/1991 |bad
1/5/1991|3/6/1991 |good
1/7/1991|3/9/1991 |bad
Currently my query with the group by rating would show:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/6/1991 |good
1/3/1991|3/9/1991 |bad
This is something along the lines of
select min(StartDate), max(EndDate), Good_Bad
from sourcetable
group by Good_Bad
While Jason A Long's answer may be correct - I can't read it or figure it out, so I figured I would post my own answer. Assuming that this isn't a process that you're going to be constantly running, the CURSOR's performance hit shouldn't matter. But (at least to me) this solution is very readable and can be easily modified.
In a nutshell - we insert the first record from your source table into our results table. Next, we grab the next record and see if the mood score is the same as the previous record. If it is, we simply update the previous record's end date with the current record's end date (extending the range). If not, we insert a new record. Rinse, repeat. Simple.
Here is your setup and some sample data:
DECLARE @MoodRanges TABLE (StartDate DATE, EndDate DATE, MoodRating INT)
INSERT INTO @MoodRanges
VALUES
('1/1/1991','3/1/1991', 7),
('1/2/1991','3/2/1991', 7),
('1/3/1991','3/3/1991', 4),
('1/4/1991','3/4/1991', 4),
('1/5/1991','3/5/1991', 7),
('1/6/1991','3/6/1991', 7),
('1/7/1991','3/7/1991', 4),
('1/8/1991','3/8/1991', 4),
('1/9/1991','3/9/1991', 4)
Next, we can create a table to store our results, as well as some variable placeholders for our cursor:
DECLARE @MoodResults TABLE (ID INT IDENTITY(1, 1), StartDate DATE, EndDate DATE, MoodScore VARCHAR(50))
DECLARE @CurrentStartDate DATE, @CurrentEndDate DATE, @CurrentMoodScore INT,
        @PreviousStartDate DATE, @PreviousEndDate DATE, @PreviousMoodScore INT
Now we put all of the sample data into our CURSOR:
DECLARE MoodCursor CURSOR FOR
SELECT StartDate, EndDate, MoodRating
FROM @MoodRanges

OPEN MoodCursor

FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @PreviousStartDate IS NOT NULL
    BEGIN
        IF (@PreviousMoodScore >= 5 AND @CurrentMoodScore >= 5)
            OR (@PreviousMoodScore < 5 AND @CurrentMoodScore < 5)
        BEGIN
            UPDATE @MoodResults
            SET EndDate = @CurrentEndDate
            WHERE ID = (SELECT MAX(ID) FROM @MoodResults)
        END
        ELSE
        BEGIN
            INSERT INTO @MoodResults
            VALUES (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
        END
    END
    ELSE
    BEGIN
        INSERT INTO @MoodResults
        VALUES (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
    END

    SET @PreviousStartDate = @CurrentStartDate
    SET @PreviousEndDate = @CurrentEndDate
    SET @PreviousMoodScore = @CurrentMoodScore

    FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
END

CLOSE MoodCursor
DEALLOCATE MoodCursor
And here are the results:
SELECT * FROM @MoodResults
ID StartDate EndDate MoodScore
----------- ---------- ---------- --------------------------------------------------
1 1991-01-01 1991-03-02 GOOD
2 1991-01-03 1991-03-04 BAD
3 1991-01-05 1991-03-06 GOOD
4 1991-01-07 1991-03-09 BAD
Is this what you're looking for?
IF OBJECT_ID('tempdb..#MyDailyMood', 'U') IS NOT NULL
DROP TABLE #MyDailyMood;
CREATE TABLE #MyDailyMood (
TheDate DATE NOT NULL,
MoodLevel INT NOT NULL
);
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Calendar (dt) AS (
SELECT TOP (DATEDIFF(dd, '2007-01-01', '2017-01-01'))
DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '2007-01-01')
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT #MyDailyMood (TheDate, MoodLevel)
SELECT
c.dt,
ABS(CHECKSUM(NEWID()) % 10) + 1
FROM
cte_Calendar c;
--==========================================================
WITH
cte_AddRN AS (
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS (
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
)
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup;
Post OP update solution...
WITH
cte_AddRN AS ( -- Add a row number to each row that resets to 1 every 60 rows.
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS ( -- Use DENSE_RANK to create groups based on the RN added above.
-- How it works: RN set the row number 1 - 60 then repeats itself
-- but we don't want every 60th row grouped together. We want blocks of 60 consecutive rows grouped together.
-- DENSE_RANK accomplishes this by ranking within all the "1's", "2's"... and so on.
-- verify with the following query... SELECT * FROM cte_AssignGroups ag ORDER BY ag.TheDate
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
),
cte_AggRange AS ( -- This is just a straightforward aggregation/rollup. It produces results similar to the sample data you posted in your edit.
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END,
ag.DateGroup
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup
),
cte_CompactGroup AS ( -- This time we're using dense rank to group all of the consecutive "Good" and "Bad" values so that they can be further aggregated below.
SELECT
ar.BegOfRange, ar.EndOfRange, ar.AverageMoodLevel, ar.GorB, ar.DateGroup,
DenseGroup = ar.DateGroup - DENSE_RANK() OVER (PARTITION BY ar.GorB ORDER BY ar.BegOfRange)
FROM
cte_AggRange ar
)
-- The final aggregation step...
SELECT
BegOfRange = MIN(cg.BegOfRange),
EndOfRange = MAX(cg.EndOfRange),
cg.GorB
FROM
cte_CompactGroup cg
GROUP BY
cg.DenseGroup,
cg.GorB
ORDER BY
BegOfRange;

SQL First In First Out Loyalty Point

Fellow developers and analysts, I have some experience in SQL and have referred to similar posts; however, this is slightly more niche. Thank you in advance for helping.
I have the below dataset (edited; apologies):
Setup
CREATE TABLE CustomerPoints
(
CustomerID INT,
[Date] Date,
Points INT
)
INSERT INTO CustomerPoints
VALUES
(1, '20150101', 500),
(1, '20150201', -400),
(1, '20151101', 300),
(1, '20151201', -400)
and need to turn it into the desired result (edited; the figures in the previous table were incorrect).
Any positive amount of points is points earned, whereas negative amounts are points redeemed. Because of FIFO (first in, first out), of the second batch of points spent (-400), 100 were taken from the points earned on 20150101 and 300 from those earned on 20151101.
The goal is to calculate, for each customer, the number of points spent within x and y months of earning. Again, thank you for your help.
I have already answered a similar question here and here
You need to explode the points earned and redeemed into single units and then couple them, so each point earned is matched with a redeemed point.
For each of these matching rows calculate the months elapsed from the earning to the redeeming and then aggregate it all.
FN_NUMBERS(n) is a tally-table function; see the other answers I have linked above.
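The tally function itself isn't shown in this answer; as a placeholder, a minimal sketch of one possible implementation (the linked answers may define it differently) could be:
CREATE FUNCTION FN_NUMBERS (@MAX INT)
RETURNS TABLE
AS RETURN
    -- returns the integers 1..@MAX as a column named N
    SELECT TOP (@MAX) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM sys.all_objects a CROSS JOIN sys.all_objects b;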
;with
p as (select * from CustomerPoints),
e as (select * from p where points>0),
r as (select * from p where points<0),
ex as (
select *, ROW_NUMBER() over (partition by CustomerID order by [date] ) rn
from e
join FN_NUMBERS(1000) on N<= e.points
),
rx as (
select *, ROW_NUMBER() over (partition by CustomerID order by [date] ) rn
from r
join FN_NUMBERS(1000) on N<= -r.points
),
j as (
select ex.CustomerID, DATEDIFF(month,ex.date, rx.date) mm
from ex
join rx on ex.CustomerID = rx.CustomerID and ex.rn = rx.rn and rx.date>ex.date
)
-- use this select to see points redeemed in current and past semester
select * from j
  join (select 0 s union all select 1 s) p on j.mm >= (p.s*6)+(p.s) and j.mm < p.s*6+6
  pivot (count(mm) for s in ([0],[1])) pv
order by 1, 2
-- use this select to see points redeemed with months detail
--select * from j pivot (count(mm) for mm in ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) p order by 1
-- use this select to see points redeemed in rows per month
--select CustomerID, mm, COUNT(mm) PointsRedeemed from j group by CustomerID, mm order by 1
output of default query, 0 is 0-6 months, 1 is 7-12 (age of redemption in months)
CustomerID 0 1
1 700 100
output of 2nd query, 0..12 is the age of redemption in months
CustomerID 0 1 2 3 4 5 6 7 8 9 10 11 12
1 0 700 0 0 0 0 0 0 0 0 0 100 0
output of the 3rd query; mm is the age of redemption in months
CustomerID mm PointsRedeemed
1 1 700
1 11 100
bye

How to subtract totals of 2 different groupings of the same column per month

I am writing a query in which I am supposed to subtract discounts from total revenue per month. The problem is that there are multiple codes which represent either revenue or discounts.
To illustrate exactly what I mean, please see the following example.
Month Code Amount
May 4001 $50.05
May 4002 $49.95
May 6005 $15.00
May 6006 $5.00
March 4003 $65.00
Codes for revenue are 4001, 4002 and 4003. Discounts are 6005 and 6006.
In the end I should be seeing:
Month TotalRevenue TotalDiscount Total
May $100.00 $20.00 $80.00
March $65.00 $0.00 $65.00
I have tried CASE, but it tells me I can only use one argument. I have also tried creating a subquery in the SELECT statement, but I can't seem to use two SUM statements (one in the main query and one in the subquery). I guess it would work if I could use a JOIN, but there is nothing to join with.
Well, the first issue is to construct your query so that the Discount Codes are distinguished from the Revenue Codes. If you have table(s) containing the Codes (either one combined Codes table with an indicator to distinguish the two types, or separate tables), this table (or tables) should be used.
Since it's hard to tell the full set of Codes from your question, let's just pretend that Codes 6000-6999 are Discounts and all others are Revenue. Then a query that produces your desired results could look like this:
select Month,
Revenue = sum(
case when Code between 6000 and 6999 then
0
else
Amount
end
),
Discounts = sum(
case when Code between 6000 and 6999 then
Amount
else
0
end
),
Total = sum(
case when Code between 6000 and 6999 then
-1 * Amount
else
Amount
end
)
from MyTable
group by Month
Depending on what your actual criteria are for distinguishing the two types of codes, you just need to change the CASE expressions to match and it should work.
This should get you pretty close to the desired result:
Setup (it would be really appreciated if provided in the question :) )
-- drop table Code
create table Code
(
Code INT,
IsRevenue BIT
)
insert into Code VALUES (4001, 1), (4002, 1), (4003, 1), (6005, 0), (6006, 0)
GO
create table MonthData
(
TheMonth VARCHAR(16),
Code INT,
Amount NUMERIC(18, 2)
)
GO
insert into MonthData values ('May', 4001, 50.05), ('May', 4002, 49.95),
('May', 6005, 15.00), ('May', 6006, 5.00), ('March', '4003', 65.00)
GO
select * from MonthData
GO
The query:
SELECT TheMonth,
SUM((CASE WHEN C.IsRevenue = 1 THEN MD.Amount ELSE -MD.Amount END)) AS Total
FROM MonthData MD
JOIN Code C ON C.Code = MD.Code
GROUP BY TheMonth
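If you also need the separate TotalRevenue and TotalDiscount columns shown in the desired output, the same join can drive three conditional sums; a sketch built on the setup above:
SELECT MD.TheMonth,
       SUM(CASE WHEN C.IsRevenue = 1 THEN MD.Amount ELSE 0 END) AS TotalRevenue,
       SUM(CASE WHEN C.IsRevenue = 0 THEN MD.Amount ELSE 0 END) AS TotalDiscount,
       SUM(CASE WHEN C.IsRevenue = 1 THEN MD.Amount ELSE -MD.Amount END) AS Total
FROM MonthData MD
JOIN Code C ON C.Code = MD.Code
GROUP BY MD.TheMonth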

SQL query to identify seasonal sales items

I need a SQL query that will identify seasonal sales items.
My table has the following structure -
ProdId WeekEnd Sales
234 23/04/09 543.23
234 30/04/09 12.43
432 23/04/09 0.00
etc
I need a SQL query that will return all ProdIds that have 26 consecutive weeks of 0 sales. I am running SQL Server 2005. Many thanks!
Update: A colleague has suggested a solution using rank() - I'm looking at it now...
Here's my version:
DECLARE #NumWeeks int
SET #NumWeeks = 26
SELECT s1.ProdID, s1.WeekEnd, COUNT(*) AS ZeroCount
FROM Sales s1
INNER JOIN Sales s2
    ON s2.ProdID = s1.ProdID
    AND s2.WeekEnd >= s1.WeekEnd
    AND s2.WeekEnd <= DATEADD(WEEK, @NumWeeks + 1, s1.WeekEnd)
    AND s2.Sales = 0   -- count only the zero-sales weeks in the window
WHERE s1.Sales > 0
GROUP BY s1.ProdID, s1.WeekEnd
HAVING COUNT(*) >= @NumWeeks
Now, this is making a critical assumption, namely that there are no duplicate entries (only 1 per product per week) and that new data is actually entered every week. With these assumptions taken into account, if we look at the 27 weeks after a non-zero sales week and find that there were 26 total weeks with zero sales, then we can deduce logically that they had to be 26 consecutive weeks.
Note that this will ignore products that had zero sales from the start; there has to be a non-zero week to anchor it. If you want to include products that had no sales since the beginning, then add the following line after `WHERE s1.Sales > 0`:
OR s1.WeekEnd = (SELECT MIN(WeekEnd) FROM Sales WHERE ProdID = s1.ProdID)
This will slow the query down a lot but guarantees that the first week of "recorded" sales will always be taken into account.
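Put together, a sketch of the full query with that extra predicate included (same assumptions as above: one row per product per week, and the zero-sales filter on s2 made explicit):
DECLARE @NumWeeks int
SET @NumWeeks = 26

SELECT s1.ProdID, s1.WeekEnd, COUNT(*) AS ZeroCount
FROM Sales s1
INNER JOIN Sales s2
    ON s2.ProdID = s1.ProdID
    AND s2.WeekEnd >= s1.WeekEnd
    AND s2.WeekEnd <= DATEADD(WEEK, @NumWeeks + 1, s1.WeekEnd)
    AND s2.Sales = 0
WHERE s1.Sales > 0
   OR s1.WeekEnd = (SELECT MIN(WeekEnd) FROM Sales WHERE ProdID = s1.ProdID)
GROUP BY s1.ProdID, s1.WeekEnd
HAVING COUNT(*) >= @NumWeeks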
SELECT DISTINCT
s1.ProdId
FROM (
SELECT
ProdId,
ROW_NUMBER() OVER (PARTITION BY ProdId ORDER BY WeekEnd) AS rownum,
WeekEnd
FROM Sales
WHERE Sales <> 0
) s1
INNER JOIN (
SELECT
ProdId,
ROW_NUMBER() OVER (PARTITION BY ProdId ORDER BY WeekEnd) AS rownum,
WeekEnd
FROM Sales
WHERE Sales <> 0
) s2
ON s1.ProdId = s2.ProdId
AND s1.rownum + 1 = s2.rownum
AND DATEADD(WEEK, 26, s1.WeekEnd) <= s2.WeekEnd;  -- at least 26 weeks between consecutive non-zero sales weeks