how to calculate a rolling average based on a column in spotfire - properties

I have a data set with a Document Property that selects "items"; each "item" has a particular "usage days" value. I want to calculate a "Moving Average" for one or more selected items. The data for the moving average lives in a column named "usage days".
How do I calculate this, taking into account a selected date of my choice and a rolling-average window (number of days) of my choice?
Do you have any ideas about how I can perform the calculation, e.g. in a calculated column or a text field?
Car / Trip / Start Date / End Date / Days on trip
1 / AB123 / 2 / 6/07/2013
1 / AB234 / 29/07/2013 / 6/09/2013 / 42
1 / AB345 / 6/09/2013 / 28/09/2013 / 22
1 / AB456 / 29/09/2013 / 21/10/2013 / 23
2 / AB567 / 26/10/2013 / 12/11/2013 / 22
2 / AB678 / 12/11/2013 / 8/12/2013 / 26
[The rows above show an example of the problem (sorry, I couldn't paste an image because I'm new). I want to calculate the % usage of one or more cars for a selected range of time, e.g. select the date range July to August, then ((# of days on trip for cars 1 and 2) / (# of days in that period)) / 2 * 100]

As phiver said, it is still difficult to see what you expect as a result... but I think I have something that might work. First, I slightly altered the dataset you provided, like so:
car trip startDate endDate daysOnTrip
1 AB123 7/6/2013 7/29/2013 23
1 AB234 7/29/2013 9/6/2013 42
1 AB345 9/6/2013 9/28/2013 22
1 AB456 9/29/2013 10/21/2013 23
2 AB567 10/26/2013 11/12/2013 22
2 AB678 11/12/2013 12/8/2013 26
I then added 2 document properties, "DateRangeFirst" and "DateRangeLast", to allow the user to select beginning and ending dates. Next I made input box property controls for each of these document properties in a text area so the user can alter the date range. I then added a data table visualization with a "Limit data using expression:" of "[startDate] >= Date(${DateRangeFirst}) and [endDate]<= Date(${DateRangeLast})" so we could see the trips selected. Finally, to get the average you appear to be looking for, I added a bar chart set to % of total (daysOnTrip) / car with the same data limiting expression as above. The below screenshot should have everything you need to reproduce my results. I hope this gives you what you need.
NOTE: With this method if you select a date in the middle of a trip, an entire row and all of the days on that trip will be ignored.
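To check the numbers outside Spotfire, the limiting expression above can be sketched in Python. This is only an illustration under assumptions: the names `TRIPS`, `days_by_car` and `percent_of_total` are made up here, and the rows are the altered dataset from this answer. A trip is kept only when it lies entirely inside the selected range, which is the behaviour the NOTE describes.

```python
from datetime import date

# (car, trip, startDate, endDate, daysOnTrip) rows from the altered dataset
TRIPS = [
    (1, "AB123", date(2013, 7, 6), date(2013, 7, 29), 23),
    (1, "AB234", date(2013, 7, 29), date(2013, 9, 6), 42),
    (1, "AB345", date(2013, 9, 6), date(2013, 9, 28), 22),
    (1, "AB456", date(2013, 9, 29), date(2013, 10, 21), 23),
    (2, "AB567", date(2013, 10, 26), date(2013, 11, 12), 22),
    (2, "AB678", date(2013, 11, 12), date(2013, 12, 8), 26),
]


def days_by_car(trips, range_first, range_last):
    """Sum daysOnTrip per car for trips fully inside [range_first, range_last]."""
    totals = {}
    for car, _trip, start, end, days in trips:
        # Same test as the Spotfire limiting expression
        if start >= range_first and end <= range_last:
            totals[car] = totals.get(car, 0) + days
    return totals


def percent_of_total(totals):
    """Share of the selected days that belongs to each car, in percent."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {car: 100.0 * days / grand_total for car, days in totals.items()}
```

With the range 7/1/2013 to 9/30/2013, trip AB456 is dropped entirely because it ends after the range, even though it starts inside it.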

Related

updating the next several row values based on the value of a row in another column

I'm trying to figure out how to add the values of one column (the amount column) to the next few rows based on the condition of another column (the days column). If the condition of the days column is greater than 1, for each day greater than 1 I add the amount column to that many following rows. So if days is three, I add the amount to the next two rows (the first day is just the current row). I actually think this is easier if I make a copy of the amount column, so I made a copy called backlog.
So let's say I have an amount column that represents the amount of support tickets that need to be resolved each day. Each amount has a number of days it takes for the amount to be resolved. I need the total amount to be a sum of the value today and the sum of the outstanding tickets. So if I have an amount of 1 for 2 days, I have 1 ticket amount today and I add that same 1 tomorrow to the ticket amount of tomorrow. If this doesn't make sense, the below examples will. I have a solution as well, but my main issue is doing this efficiently.
Here is a sample dataframe to use:
import random
import numpy as np
import pandas as pd

# 10 zero-amount days plus 15 days with 1-3 tickets, in random order
amount = list(np.zeros(10)) + [random.randint(1,3) for val in range(15)]
random.shuffle(amount)
ex = pd.DataFrame({
    'Amount': amount
})
ex.loc[ex['Amount']>0, 'Days'] = [random.randint(0,4) for val in range(15)]
ex.loc[ex['Amount']==0, 'Days'] = 0
ex['Days'] = ex['Days'].astype(int)
ex['Backlog'] = ex['Amount']
ex.head(10)
Input Dataframe:
Amount Days Backlog
2 0 2
1 3 1
2 2 2
3 0 3
Desired Output Dataframe:
Amount Days Backlog
2 0 2
1 3 1
2 2 3
3 0 6
In the last two values of the backlog column, I have a value of 3 (2 from the current day amount plus 1 from the prior day amount) and a value of 6 (3 for the current day + 2 from the previous day amount + 1 from two days ago).
I have made code for this below, which I think achieves the outcome:
for i in range(0, len(ex['Amount'])):
    Days = ex['Days'].iloc[i]
    if Days >= 2:
        for j in range(1, Days):
            if (i+j) >= len(ex['Amount']):
                break
            # positional write on the frame itself, avoiding chained assignment
            ex.iloc[i+j, ex.columns.get_loc('Backlog')] += ex['Amount'].iloc[i]
The problem is that I'm already using two for loops to slice the data frame for two features first, so when this code is used as a function for a very large data frame it runs far too slowly, and my main goal has been to implement a faster way to do this. Is there a more efficient pandas method to achieve the same outcome? Possibly without having to use slow iteration or a nested for loop? I'm at a loss.
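One common way to drop the inner loop is a difference array: mark where each carried-over amount starts and stops, then take a running sum once. The sketch below is plain Python so it stands alone; `backlog_from_amounts` is a hypothetical name, and it assumes the rule stated in the question (a row with Days = d adds its Amount to the next d - 1 rows, cut off at the end of the data).

```python
from itertools import accumulate


def backlog_from_amounts(amounts, days):
    """Return Amount plus everything carried over from earlier rows."""
    n = len(amounts)
    diff = [0] * (n + 1)  # one spare slot so the "stop" marker never overflows
    for i, (amount, d) in enumerate(zip(amounts, days)):
        if d >= 2 and i + 1 < n:
            diff[i + 1] += amount          # carry-over starts on the next row
            diff[min(i + d, n)] -= amount  # and stops after d - 1 rows
    carried = accumulate(diff[:n])         # running sum = carry-over per row
    return [a + c for a, c in zip(amounts, carried)]
```

On the question's frame this could be applied as `ex['Backlog'] = backlog_from_amounts(ex['Amount'].tolist(), ex['Days'].tolist())`. The marking loop is a single pass over the rows; with NumPy the same idea can be written with `np.add.at` for the markers and `np.cumsum` for the running sum.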

Calculating Weekly Returns from Daily Time Series of Prices

I want to calculate weekly returns of a mutual fund from a time series of daily prices. My data looks like this:
A B C D E
DATE WEEK W.DAY MF.PRICE WEEKLY RETURN
02/01/12 1 1 2,7587
03/01/12 1 2 2,7667
04/01/12 1 3 2,7892
05/01/12 1 4 2,7666
06/01/12 1 5 2,7391 -0,007
09/01/12 2 1 2,7288
10/01/12 2 2 2,6707
11/01/12 2 3 2,7044
12/01/12 2 4 2,7183
13/01/12 2 5 2,7619 0,012
16/01/12 3 1 2,7470
17/01/12 3 2 2,7878
18/01/12 3 3 2,8156
19/01/12 3 4 2,8310
20/01/12 3 5 2,8760 0,047
The date is (dd/mm/yy) format and "," is decimal separator. This would be done by using this formula: (Price for last weekday - Price for first weekday)/(Price for first weekday). For example the return for the first week is (2,7391 - 2,7587)/2,7587 = -0,007 and for the second is (2,7619 - 2,7288)/2,7288 = 0,012.
The problem is that the list goes on for a year, and some weeks have less than five working days due to holidays or other reasons. So I can't simply copy and paste the formula above. I added the extra two columns for week number and week day using WEEKNUM and WEEKDAY functions, thought it might help. I want to automate this with a formula or using VBA and hoping to get a table like this:
WEEK RETURN
1 -0,007
2 0,012
3 0,047
.
.
.
As I said some weeks have less than five weekdays, some start with weekday 2 or end with weekday 3 etc. due to holidays or other reasons. So I'm thinking of a way to tell excel to "find the prices that correspond to the max and min weekday of each week and apply the formula (Price for last weekday - Price for first weekday)/(Price for first weekday)".
Sorry for the long post, I tried to be as clear as possible. I would appreciate any help! (I have 5 separate worksheets for consecutive years, each with daily prices of 20 mutual funds)
To do it in one formula:
=(INDEX(D:D,AGGREGATE(15,6,ROW($D$2:$D$16)/(($C$2:$C$16=AGGREGATE(14,6,$C$2:$C$16/($B$2:$B$16=G2),1))*($B$2:$B$16=G2)),1))-INDEX(D:D,MATCH(G2,B:B,0)))/INDEX(D:D,MATCH(G2,B:B,0))
You may need to change all the , to ; per your local settings.
I would solve it using some lookup formulas to get the values for each week and then do a simple calculation for each week.
Resulting table:
H I J K L M
first last first val last val return
1 02.01.2012 06.01.2012 2,7587 2,7391 -0,007
2 09.01.2012 13.01.2012 2,7288 2,7619 0,012
3 16.01.2012 20.01.2012 2,747 2,876 0,047
Formula in column I:
=MINIFS($A:$A;$B:$B;$H2)
Formula in column J:
=MAXIFS($A:$A;$B:$B;$H2)
Formula in column K:
=VLOOKUP($I2;$A:$D;4;FALSE)
Formula in column L:
=VLOOKUP($J2;$A:$D;4;FALSE)
Formula in column M:
=(L2-K2)/K2
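The same lookup logic (first and last trading day per week, then (last - first) / first) can be cross-checked outside Excel. The following is a minimal Python sketch, not part of either answer; `weekly_returns` and the tuple layout are assumptions made for illustration, and the prices are the ones from the question with "." as the decimal separator.

```python
def weekly_returns(rows):
    """rows: (week, weekday, price) tuples in any order.

    Returns {week: (last_price - first_price) / first_price}, using the
    lowest and highest weekday present, so short weeks work too.
    """
    by_week = {}
    for week, weekday, price in rows:
        by_week.setdefault(week, []).append((weekday, price))
    returns = {}
    for week, days in by_week.items():
        first_price = min(days, key=lambda d: d[0])[1]
        last_price = max(days, key=lambda d: d[0])[1]
        returns[week] = (last_price - first_price) / first_price
    return returns


PRICES = [
    (1, 1, 2.7587), (1, 2, 2.7667), (1, 3, 2.7892), (1, 4, 2.7666), (1, 5, 2.7391),
    (2, 1, 2.7288), (2, 2, 2.6707), (2, 3, 2.7044), (2, 4, 2.7183), (2, 5, 2.7619),
    (3, 1, 2.7470), (3, 2, 2.7878), (3, 3, 2.8156), (3, 4, 2.8310), (3, 5, 2.8760),
]
```

Rounded to three decimals, `weekly_returns(PRICES)` gives -0.007, 0.012 and 0.047 for weeks 1 to 3, matching the table in the question.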

How to find multiple subsets of numbers that are approximately equal to a given value?

I am using VBA that gets data from an Excel 2013 spreadsheet. I have a couple years experience in computer science from a while back using VBA and java, but I'm by no means an expert.
I have a column of numbers ranging from 20 to 60 total. Each of those numbers represents 'minutes' and can range from 3 to 500 (normally 60 to 300). Each number has an assigner called a 'load number' (such as N03, N22 and etc.) and a date/time. All of these values are attributed to a 'load' that needs to be picked. 'Pickers' are the ones that have the loads or minutes assigned to them. They can only pick so many minutes per given day which ranges from 400-600 (8 hour shift = 400 minutes).
What I need to do is assign sets of loads that are equal to an approximate amount of total minutes (set number w/ threshold) to two groups of pickers (The groups are AM and PM, each have 3-5 pickers). Once one load is assigned to a picker, it can't be assigned to another UNLESS the loads for a given day have too many minutes and all the pickers can't be assigned an approximate amount of minutes.
Example: Out of 8 pickers, 6 can be assigned loads totaling between 380-420 minutes, but 2 can't be assigned between 380-420 because of the remaining loads.
In the case of the given example, for the remaining 2 pickers, a total of 760 - 840 minutes can be assigned to BOTH of them.
Loads also need to be assigned based on their date/time. If pickers are picking loads due on the same day, the earliest loads need to be assigned to the AM group of pickers and, accordingly, the latest to the PM group of pickers. If all loads to be assigned are for the next day, they can be assigned to anyone as long as the earliest loads are prioritized.
Example: AM shift starts at 5AM w/ 5 pickers. There are three loads that are 200 minutes (4 hours, actual) due at 9AM on the same day
The three loads should be assigned to three different pickers, so the loads can be done on time. They would be marked as the #1 load, so each picker knows to do it first
Example: Another load is due at 9AM on the same day. It is 400 minutes though.
2 pickers can be assigned to this load as their #1 pick and 200 minutes would be assigned to both of them.
Once the loads are assigned to the pickers, the results will be displayed in a separate spreadsheet with each row having: AM/PM, Picker's name, Load number #'s 1-10 w/ load number and minutes to pick and the total minutes.
Example: PICKER | AM | Toby | 029-N10 (268), 030-N05 (93), 030-N04 (111) | 472 TOTAL
Any help / pointers on this problem would be appreciated. I've looked at similar questions posted on here and abroad, but couldn't find any that would give me enough to go by to start working on a solution. It's not too bad assigning loads manually, but it gets complex once there are over 30 loads and 4,000 minutes total, especially when most of them are larger. It would just be much easier having a program assign everything and save 1-2 hours in the process every day.
Edit:
The data, in Excel, is structured into 8 columns and up to 50 rows. Each row represents a 'load' and has only 3 useful cells. I got all the information into three arrays, which can be used to display the info for any load by using the same element (1-50) for each array.
Dim LoadNumbers(1 To 50) As String
Dim LoadTimes(1 To 50) As Double
Dim LoadMinutes(1 To 50) As Double
Dim C As Integer
C = 1
' "<=" so that row 50 is read as well
Do While C <= 50
    LoadNumbers(C) = Cells(C, 2)
    LoadTimes(C) = Cells(C, 5) * 24   ' Excel time fraction -> hours
    LoadMinutes(C) = Cells(C, 7)
    C = C + 1
Loop
For example:
LoadNumbers(5) & " # " & LoadTimes(5) & " Hours PST # " & LoadMinutes(5) & " Minutes"
Will return:
039-N06  # 9.5 Hours PST # 67.4 Minutes (9.5 hours = 9:30AM)
The LoadTimes and LoadMinutes arrays are the ones I need to assign loads. I will have another two cells where users will input the desired minutes (M) to be assigned and the threshold (T). I then need the VBA script to assign (M-T to M+T) minutes to each picker.
Here's what the values in LoadMinutes look like:
141.8
96
73.7
32.2
67.4
106.1
21.3
14.2
141.6
49.5
68.6
200.6
72
174.9
223.1
161.8
76.6
235.5
76.2
134.9
236.7
166.3
170.7
134.6
63.9
352.9
136.2
146.3
243.2
There's 29 loads # 3,818 minutes total
Let's say the minutes need to be between 430 and 470. Out of those 29 loads, I need to assign sets of different numbers adding up to 430 to 470 based on their time. The times in LoadTimes range from 7 to 20 (7AM to 8PM).
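As a starting point, the assignment described above can be approached with a simple greedy heuristic: take the loads in due-time order and give each one to the picker with the fewest minutes so far, as long as that picker stays at or below M + T. The Python sketch below illustrates only that idea (the logic carries over to VBA arrays); `assign_loads` and its tuple layout are hypothetical. It does not split a load between pickers, does not guarantee that every picker reaches M - T, and is not an optimal packing.

```python
def assign_loads(loads, picker_names, target, threshold):
    """Greedy sketch. loads: (load_number, due_hour, minutes) tuples.

    Returns (picks, totals, leftover): loads per picker in pick order,
    minutes per picker, and loads that fit nobody within target + threshold.
    """
    cap = target + threshold
    totals = {name: 0.0 for name in picker_names}
    picks = {name: [] for name in picker_names}
    leftover = []
    # Earliest due time first, so early loads become each picker's #1 pick
    for load in sorted(loads, key=lambda item: item[1]):
        minutes = load[2]
        fitting = [n for n in picker_names if totals[n] + minutes <= cap]
        if not fitting:
            leftover.append(load)  # candidate for splitting or manual handling
            continue
        chosen = min(fitting, key=lambda n: totals[n])  # least-loaded picker
        picks[chosen].append(load)
        totals[chosen] += minutes
    return picks, totals, leftover
```

Loads that end up in `leftover` correspond to the case in the question where the remaining minutes have to be shared between several pickers.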

Table Total Column based on cell values - SQL Report Builder 3.0

I have a table built off a dataset containing timesheet data with possible multiple entries per day (day_date) for a given person. The table is grouped on day_date. The field for hours is effort_hr (see dataset and report layout below).
The table generates a single row with one column for each day (as expected).
For each day I want only one value (total hours for person) so the expression is Sum(Fields!effort_hr.Value) This is properly adding up all the hours for each day.
Now I add a total column at the end of the row to see ALL the hours for the whole timesheet. The expression in the total column cell is Sum(Fields!effort_hr.Value) which is exactly the same as the daily ones. Again, this is adding up all hours for the timesheet.
So this is working great.
I now need a new row that only shows a max of 8 hours per day. So if the person works less, it shows less, but if the person works more, show a max of 8.
In this case, the daily column expression is:
IIF(Sum(Fields!effort_hr.Value)>8.0,8.0,Sum(Fields!effort_hr.Value))
And again, it displays perfectly for each day.
The total for this row is where I run into trouble. I have tried so many ways, but I cannot get the total for the columns in this row. The report keeps showing an #Error in the cell. The report saves fine and there is no error in the expr.
The problem seems to come from the fact that there are 2 values for a given day. So in other words, for 5 days, the person has 6 entries. When I try it for a person with only 5 entries, no problem.
I have tried:
Sum(IIF(Sum(Fields!effort_hr.Value)>8.0,8.0,Sum(Fields!effort_hr.Value)))
RunningValue(IIF(Sum(Fields!effort_hr.Value)>8.0,8.0,Sum(Fields!effort_hr.Value)),Sum,Nothing)
I either get an #Error, or I get the wrong total. Is there any way to just get a total for the cell values in the table? The daily numbers are correct, just give me the total at the end (like Excel).
I could do this in the SQL, but that would mess up other parts of this report.
DataSet:
res_name | day_date | effort_hr
J. Doe | Apr 6, 2015 | 2
J. Doe | Apr 6, 2015 | 9
J. Doe | Apr 7, 2015 | 8
J. Doe | Apr 8, 2015 | 7
J. Doe | Apr 9, 2015 | 10
J. Doe | Apr 10, 2015 | 9
REPORT TABLE Layout:
| Apr 6 | Apr 7 | Apr 8 | Apr 9 | Apr 10 | Totals
Total | 11 | 8 | 7 | 10 | 9 | 45
Reg | 8 | 8 | 7 | 8 | 8 | 39
OT | 3 | 0 | 0 | 2 | 1 | 6
Problem:
Row 1 Column Total works great and gives 45 hours ;
Row 2 Column Total either gives #Error, 41, or some other wrong number - just need it to total the actual values of each cell in the row ;
same problem for Row 3 total
Thanks in advance for your time!
Posting another answer as the previous one has become so long.
I referred to this MSDN link, and used the selected answer. Apparently we need to use custom code to achieve this (if you are not willing to change your dataset and have the calculated values in there).
Right click on report --> report properties --> Go to tab 'Code' --> Paste this
Dim public nettotal as Double
Public Function Getvalue (ByVal subtotal AS Double) AS Double
    nettotal = nettotal + subtotal
    return subtotal
End Function
Public Function Totalvalue() AS Double
    return nettotal
End Function
In the row group expression of second row put
= code.Getvalue(IIF(Sum(Fields!Efforts.Value)>8.0,8.0,Sum(Fields!Efforts.Value)))
In the Total cell expression (for second row) put
=code.Totalvalue()
Save and run, you should see following result.
I used your input data and tried to create the report in given format. I used following function for Row 2 Total
=Sum(IIF(Fields!Efforts.Value>8.0,8.0,Fields!Efforts.Value),"DataSet1",Recursive)
This shows sum as 39 for second row. You can try and let me know if it works for you. If it doesn't I will list the exact steps how I created Matrix and groups.
Note: Don't forget to put your dataset name in the second argument of function Sum. And Recursive, as clear by name, applies Sum recursively for the group.
Update: I followed following steps.
1. Add a Matrix on the report.
2. Under Column group section on Matrix, Select any column name from the dataset. (Otherwise it won't show any columns in the next step)
3. Right click Column --> Add Group --> (Under column group) Add Parent Group. Select Day as Group By --> OK. It will create a new row. Put expression Sum(Efforts) in first row. And your expression =IIF(Sum(Fields!Efforts.Value)>8.0,8.0,Sum(Fields!Efforts.Value)) in the second row.
4. Right click on the column group section in the group pane --> Select Add Total --> After. It will add new column at the end of Matrix. Put expression Sum(Efforts) in first row and expression =Sum(IIF(Fields!Efforts.Value>8.0,8.0,Fields!Efforts.Value),"DataSet1",Recursive) in the second row.
Save and run you should see following in the report.
Remember to change the names of columns and dataset as per your code.
This is an idea on how to do such grouping, obviously you'd need to do changes for the headers and the 3rd row etc.
HTH.
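To see which totals the report should show, the arithmetic can be checked outside Report Builder. The Python sketch below uses the dataset from the question; the function names are made up for illustration. It also shows that the order of operations matters: capping each day's sum gives 39, while capping each individual entry gives 41, one of the wrong totals mentioned in the question.

```python
from collections import defaultdict

# (day_date, effort_hr) entries for J. Doe from the question's dataset
ENTRIES = [
    ("Apr 6", 2), ("Apr 6", 9), ("Apr 7", 8),
    ("Apr 8", 7), ("Apr 9", 10), ("Apr 10", 9),
]


def capped_totals(entries, cap=8):
    """Return (total, regular, overtime) with the cap applied per day."""
    per_day = defaultdict(int)
    for day, hours in entries:
        per_day[day] += hours                 # daily Sum(effort_hr)
    total = sum(per_day.values())
    regular = sum(min(hours, cap) for hours in per_day.values())
    return total, regular, total - regular


def entry_level_capped_total(entries, cap=8):
    """Cap each entry before summing (not the per-day cap the report needs)."""
    return sum(min(hours, cap) for _day, hours in entries)
```

`capped_totals(ENTRIES)` returns (45, 39, 6), the Totals column of the layout in the question.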

Ignore non-empty values when aggregating

I have a cube I've built with three separate measures: "TY Sales", "LY Sales", and "% Change". What I'm trying to do is have special behavior for the aggregate rows: basically, not including any "LY Sales" values when summing the total if "TY Sales" is 0. Currently my cube works like below:
LYSales TYSales %Change
Year 1 450 300 -33%
Week 1 100 125 +25%
Week 2 150 175 +14%
Week 3 200 0 +0%
The aggregate row "Year 1" in this example is summing all values for each sales measure. What I want it to do instead is only include values in LYSales if TYSales also has a non-zero value. So my ideal state would be below:
LYSales TYSales %Change
Year 1 250 300 +20%
Week 1 100 125 +25%
Week 2 150 175 +14%
Week 3 200 0 +0%
I'm new to SSAS, so any guidance is appreciated. Thanks
An easy and reliable way to achieve that would be to change the source column of LYSales to be zero if TYSales is zero. This would be done in the fact table on which the measure is based. You could implement that
either in the ETL process, changing the LYSales column values to be zero when TYSales is zero,
or in a view based on the fact table that is then used in the Data Source View instead of the original table,
or as a Named Calculation on the fact table in the Data Source View.
In the latter two cases, the calculation formula would be SQL like this:
case when TYSales <> 0 then LYSales else 0 end
Then switch the measure definition to use that column.
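The effect of that column on the aggregate row can be checked with the weekly figures from the question. The Python sketch below only mirrors the arithmetic of the CASE expression; it is not SSAS code, and `adjusted_year_row` is a hypothetical name.

```python
def adjusted_year_row(weeks):
    """weeks: (ly_sales, ty_sales) pairs.

    Mirrors `case when TYSales <> 0 then LYSales else 0 end` per row,
    then sums and derives the % change for the aggregate row.
    """
    ly_total = sum(ly if ty != 0 else 0 for ly, ty in weeks)
    ty_total = sum(ty for _ly, ty in weeks)
    # No % change is defined when there are no comparable LY sales
    change = (ty_total - ly_total) / ly_total if ly_total else None
    return ly_total, ty_total, change


WEEKS = [(100, 125), (150, 175), (200, 0)]
```

`adjusted_year_row(WEEKS)` yields LYSales 250, TYSales 300 and a change of 0.2 (+20%), which is the "Year 1" row of the ideal state. Note that the Week 3 row would then also show 0 for LYSales, since the zeroing happens at the fact level.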