12 month moving average by person, date - sql

I have a table [production] that contains the following structure:
rep (char(10))
,cyc_date (datetime) ---- already standardized to mm/01/yyyy
,amt (decimal)
I have data for each rep from 1/1/2011 to 8/1/2013. What I want to be able to do is create a 12 month moving average beginning 1/1/2012 for each rep, as follows:
rep cyc_dt 12moAvg
-------------------------
A 1/1/2012 10000.01
A 2/1/2012 13510.05
. ........ ........
A 8/1/2013 22101.32
B 1/1/2012 98328.22
B ........ ........
where each row represents the 12 month moving average for said rep at stated time. I found some examples that were vaguely close and I tried them to no avail. It seems the addition of a group by rep component is the major departure from other examples.
This is about as far as I got:
SELECT
rep,
cyc_date,
(
SELECT Avg([amt])
FROM production Q
WHERE Q.[cyc_date] BETWEEN DateAdd("yyyy",-1,[cyc_date]+1) AND [cyc_date]
) AS 12moavg
FROM production
That query seems to pull an overall average or sum, since there is no grouping in the correlated subquery. When I try to group by, I get an error that it can only return at most one row.

I think it may work with 2 adjustments to the correlated subquery.
Subtract 11 months in the DateAdd() expression.
Include another WHERE condition to limit the average to the same rep as the current row of the parent (containing) query.
SELECT
p.rep,
p.cyc_date,
(
SELECT Avg(Q.amt)
FROM production AS Q
WHERE
Q.rep = p.rep
AND
Q.cyc_date BETWEEN DateAdd("m", -11, p.cyc_date)
AND p.cyc_date
) AS [12moavg]
FROM production AS p;
Correlated subqueries can be slow. Make sure to index rep and cyc_date to limit the pain with this one.

Related

SQL - How can I sum up a column after the results have been grouped and filtered in the having clause?

Here is my current query: The objective is to find accounts that have received at least $500 in deposits within 30 days of their first deposit. Some accounts have been closed and re-opened, hence the first line of the 'WHERE' clause.
select Deposits.accountNumber,
min(Deposits.transDate) as "first deposit",
Deposits.transDate,
CAST(DATEADD(d,30,min(Deposits.transDate)) as date) as "30 days",
sum(Deposits.amount) as "sum",
Deposits.amount,
Members.accountOpenDate
from Deposits
inner join Members on Deposits.accountNumber = members.accountNumber
where Deposits.transDate >= members.accountOpenDate
and Deposits.accountNumber = 123456
group by Deposits.accountNumber
having Deposits.transDate between min(Deposits.transDate) and DATEADD('d',30,min(Deposits.transDate))
and sum(Deposits.amount) >= 500
The problem I am running into, is that the last line of the HAVING statement:
and sum(Deposits.amount) >= 500
is including all of the transactions for the account, as if there was no 'HAVING' clause. It is factoring in transactions that are excluded from the first line of the 'HAVING':
having Deposits.transDate between min(Deposits.transDate) and DATEADD('d',30,min(Deposits.transDate))
Here is what my data looks like (without grouping by account number):
accountNumber amount sum
123456 $100 $6,500
123456 $50 $6,500
123456 $50 $6,500
And here is what I am trying to get to:
accountNumber amount sum
123456 $100 $200
123456 $50 $200
123456 $50 $200
Thanks in advance. My DBMS is Intersystems-Cache. A link to their reference can be found Here.
You can try something like that:
select filtered.accountNumber,
min(filtered.transDate) as "first deposit",
filtered.transDate,
CAST(DATEADD(d,30,min(filtered.transDate)) as date) as "30 days",
sum(filtered.amount) as "sum",
filtered.amount,
filtered.accountOpenDate
from
(
select * from Deposits
inner join Members on Deposits.accountNumber = members.accountNumber
where Deposits.transDate >= members.accountOpenDate
and Deposits.accountNumber = 123456
having Deposits.transDate between min(Deposits.transDate) and DATEADD('d',30,min(Deposits.transDate))
) as filtered
group by filtered.accountNumber
having sum(filtered.amount) >= 500
With a query like that one you are first filtering your data applying the transDate condition then you can operate the filter on the sum of the amount
We need clarification:
1. Are the 3 transactions you show all within the 30 day window? If yes, then the total is less than $500. So, this account should be skipped.
2. Since $6500 is the total of all trans greater than the open date, why even calculate it? You only care about the 30 day window.
Besides that, I think the disconnect is the date calculation in the HAVING clause. You use MIN in the SELECT, but use a totally different aggregate date calculation in the HAVING. I think you should take the calculation out of the HAVING and make it part of the WHERE.
Of course, once you do that, you'll have to take the MIN out of the SELECT.

How to count the number of active days in a dataset with SQL Server 2008

SQL Server 2008, rendered in html via aspx webpage.
What I want to achieve, is to get an average per day figure that makes allowance for missing days. To do this I need to count the number of active days in a table.
Example:
Date | Amount
---------------------
2014-08-16 | 234.56
2014-08-16 | 258.30
2014-08-18 | 25.84
2014-08-19 | 259.21
The sum of the lot (777.961) divided by the number of active days (3) would = 259.30
So it needs to go "count number of different dates in the returned range"
Is there a tidy way to do this?
If you just want that one row of output then this should work:
select sum(amount) / count(distinct date) as your_average
from your_table
Fiddle:
http://sqlfiddle.com/#!2/7ffd1/1/0
I don't know this will be help to you, how about using Group By, Avg, count function.
SELECT Date, AVG(Amount) AS 'AmountAverage', COUNT(*) AS 'NumberOfActiveDays'
FROM YourTable WITH(NOLOCK)
GROUP BY Date
About AVG function, see here: Link

oracle sql: efficient way to calculate business days in a month

I have a pretty huge table with columns dates, account, amount, etc. eg.
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount of each account each month. Since there may or may not be record for any account on a single day, and I have a seperate table of holidays from 2011~2014, I am summing up the amount of each account within a month and dividing it by the number of business days of that month. Notice that there is very likely to be record(s) on weekends/holidays, so I need to exclude them from calculation. Also, I want to have a record for each of the date available in the original table. eg.
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
<!-- language: lang-sql -->
select
date,
account,
sum(amount/days_mon) over (partition by last_day(date))
from(
select
date,
-- there are more calculation to get the account numbers,
-- so this subquery is necessary
account,
amount,
-- this is a list of month-end dates that the number of
-- business days in that month is 19. similar below.
case when last_day(date) in ('','',...,'') then 19
when last_day(date) in ('','',...,'') then 20
when last_day(date) in ('','',...,'') then 21
when last_day(date) in ('','',...,'') then 22
when last_day(date) in ('','',...,'') then 23
end as days_mon
from mytable tb
inner join lookup_businessday_list busi
on tb.date = busi.date)
So how can I perform the above purpose efficiently? Thank you!
This approach uses sub-query factoring - what other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another. Find out more.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that Day Number varies depending according to the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7 but SQL Fiddle is American so there it is 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
You have 40 month of data, this data should be very stable.
I will assume that you have a cold body (big and stable easily definable range of data) and hot tail (small and active part).
Next, I would like to define a minimal period. It is a data range that is a smallest interval interesting for Business.
It might be year, month, day, hour, etc. Do you expect to get questions like "what was averege for that account between 1900 and 12am yesterday?".
I will assume that the answer is DAY.
Then,
I will calculate sum(amount) and count() for every account for every DAY of cold body.
I will not create a dummy records, if particular account had no activity on some day.
and I will save day, account, total amount, count in a TABLE.
if there are modifications later to the cold body, you delete and reload affected day from that table.
For hot tail there might be multiple strategies:
Do the same as above (same process, clear to support)
always calculate on a fly
use materialized view as an averege between 1 and 2.
Cold body table totalc could also be implemented as materialized view, but if data never change - no need to rebuild it.
With this you go from (number of account) x (number of transactions per day) x (number of days) to (number of account)x(number of active days) number of records.
That should speed up all following calculations.

SQL working days query

I am trying to write a query to calculate the number of working days between 2 dates. I first tried this in VBA, which worked, but this is not very efficient.
I have 2 queries, the first works out the date difference between a valueDate and cutOffDate; the second counts the number of dates from a table of holidays/weekends that fall between the valueDate and cutOffDay.
The trouble I'm having is how to combine these 2 parts to give the item age (number of working days between the dates).
My query examples are:
SELECT allOS.SortCode, allOS.NPA, allOS.valueDate, allOS.cutOffDate,
DateDiff("d",[allOS.valueDate],[allOS.cutOffDate]) AS Age
FROM allOS;
and
SELECT Count(Holidays.Holiday) AS NonWorkingDays
FROM Holidays
HAVING (([Holiday]>[#01/01/2013#] And [Holiday]<[#11/06/2013#]));
I need to subtract the result of the second query from the Age of the first query.
Sample input and output data
allOS:
sortCode|npa|valueDate|cutOffDate
111111|99999999|01-11-2013|15-11-2013
222222|77777777|04-11-2013|15-11-2013
333333|88888888|05-11-2013|15-11-2013
444444|66666666|06-11-2013|15-11-2013
555555|44444444|07-11-2013|15-11-2013
666666|33333333|12-11-2013|15-11-2013
777777|55555555|13-11-2013|15-11-2013
888888|11111111|14-11-2013|15-11-2013
999999|22222222|15-11-2013|15-11-2013
Holidays:
holiday|reason
02-11-2013|Saturday
03-11-2013|Sunday
08-11-2013|Long Weekend
09-11-2013|Saturday
10-11-2013|Sunday
11-11-2013|Long Weekend
16-11-2013|Saturday
17-11-2013|Sunday`
Result:
sortCode|npa|valueDate|cutOffDate|Age
111111|99999999|01-11-2013|15-11-2013|8
222222|77777777|04-11-2013|15-11-2013|7
333333|88888888|05-11-2013|15-11-2013|6
444444|66666666|06-11-2013|15-11-2013|5
555555|44444444|07-11-2013|15-11-2013|4
666666|33333333|12-11-2013|15-11-2013|3
777777|55555555|13-11-2013|15-11-2013|2
888888|11111111|14-11-2013|15-11-2013|1
999999|22222222|15-11-2013|15-11-2013|0
The results for age is the difference between the valueDate and cutOffDate less any of the days from the holiday table.
You can use a correlated subquery to calculate the number of non-work days included in each valueDate and cutOffDate date range.
Here is a preliminary query I tested with your sample data, and I included the first and last rows output from that query.
SELECT
a.sortCode,
a.npa,
a.valueDate,
a.cutOffDate,
DateDiff('d', a.valueDate, a.cutOffDate) AS raw_days,
(
SELECT Count(*)
FROM Holidays
WHERE holiday BETWEEN a.valueDate AND a.cutOffDate
) AS NonWorkDays
FROM allOS AS a;
sortCode npa valueDate cutOffDate raw_days NonWorkDays
-------- -------- ---------- ---------- -------- -----------
111111 99999999 11/1/2013 11/15/2013 14 6
999999 22222222 11/15/2013 11/15/2013 0 0
Notice the last row. The raw_days value is zero because both valueDate and cutOffDate are the same. If you want that to be one day, add one to the value returned by the DateDiff expression.
After you adjust that preliminary query as needed, you can use it as the data source for another query where you can calculate Age as raw_days - NonWorkDays. But I'll leave that final piece for you in case I've botched the preliminary query.
If subqueries are unfamiliar to you, I recommend two of Allen Browne's pages for useful background information: Subquery basics and Surviving Subqueries.
Also note that correlated subqueries demand extra work from the db engine. That SELECT Count(*) subquery must be run separately for each row of the table. You should have Holidays.holiday indexed to ease the db engine's burden.
It is easy, just read about with clause.
With it, you can run the first query and the second query then take the result and process it in the third query inside with clause
http://www.oracle-base.com/articles/misc/with-clause.php

Oracle SQL Sub-Query how to use result from query A to restrict query B

Goal:
Combine two queries I currently run.
Have the WEEK from query 1 be a filtering criteria for query 2.
Query 1:
----------------------------------------------------
-- ************************************************
-- Accounts Recieveable (WEEKLY) snapshot
-- ************************************************
----------------------------------------------------
SELECT
TRUNC(TX.ORIG_POST_DATE,'WW') AS WEEK,
SUM(TX.AMOUNT) AS OUTSTANDING
FROM
TX
WHERE
--Transaction types
(TX.DETAIL_TYPE = "Charges" OR
TX.DETAIL_TYPE = "Payments" OR
TX.DETAIL_TYPE = "Adjustments")
GROUP BY
TRUNC(tx.ORIG_POST_DATE,'WW')
ORDER BY
TRUNC(tx.ORIG_POST_DATE,'WW')
Output Query 1:
WEEK OUTSTANDING
1/1/2012 18203.95
1/8/2012 17605
1/15/2012 19402.33
1/22/2012 18693.45
1/29/2012 19100
Query 2:
----------------------------------------------------
-- ************************************************
-- Weekly Charge AVG over previous 13 weeks based on WEEK above
-- ************************************************
----------------------------------------------------
SELECT
sum(tx.AMOUNT)/91
FROM
TX
WHERE
--Post date
TX.ORIG_POST_DATE <= WEEK AND
TX.ORIG_POST_DATE >= WEEK-91 AND
--Charges
(TX.DETAIL_TYPE = "Charge")
Output Query 2:
thirteen_Week_Avg
1890.15626
Desired Output
WEEK OUTSTANDING Thirteen_Week_Avg
1/1/2012 18203.95 1890.15626
1/8/2012 17605 1900.15626
1/15/2012 19402.33 1888.65132
1/22/2012 18693.45 1905.654
1/29/2012 19100 1900.564
Note the Thirteen_Week_Avg is 13 weeks prior to the "WEEK" Field. So it changes each week as the window of the average moves forward.
Also what tutorials do you guys know of that I could read to better understand the solution this type of question?
Try using an analytic function such as:
select WEEK, sum(OUTSTANDING) as OUTSTANDING, THIRTEEN_WEEK_AVG
from (select trunc(TX.ORIG_POST_DATE, 'WW') as WEEK
,AMOUNT as OUTSTANDING
,avg(
TX.AMOUNT)
over (order by trunc(TX.ORIG_POST_DATE, 'WW')
range numtodsinterval(7 * 13, 'day') preceding)
as THIRTEEN_WEEK_AVG
from TX
where (TX.DETAIL_TYPE = 'Charges'
or TX.DETAIL_TYPE = 'Payments'
or TX.DETAIL_TYPE = 'Adjustments'))
group by WEEK, THIRTEEN_WEEK_AVG
order by WEEK
An introduction to analytic functions can be found here. And how NUMTODSINTERVAL works is here.
My first thought is this is best handled by a stored procedure that sets two cursors, one for each of the queries and each cursor takes in a week parameter. You could call the first cursor that outputs the week and outstanding and have this loop through however many times and move back 1 week each time through. then pass that week to the thirteen week avg cursor and let it output the avg amount.
If you just want it on the screen you can use dbms_output.put_line. If you want to write it to a file such as a csv then you need to set a filehandler and all the associated plumbing to create/open/write/save a file.
O'reilly has a pretty good pl/sql book that explains procs and cursors well.