In an MS Access 2007 app, which manages contracts and changes for large construction projects, I need to create a Bell Curve representing a Contract Value over a time period.
For example, a $500m contract runs for, say, 40 months, and I need a Bell Curve that distributes the Contract Value over these 40 months. The idea is to present a starting point for cashflow projections during the life of the contract.
Using VBA, I had thought to create the 'monthly' values and store them in a temp table, for later use in a report chart. However, I'm stuck trying to work out an algorithm.
Any suggestions on how I might tackle this would be most appreciated.
You will need the =NORMSDIST() function borrowed from Excel as follows:
' Requires a reference to the Microsoft Excel Object Library
' (VBA editor: Tools > References) so that Excel.WorksheetFunction resolves.
Public Function Normsdist(X As Double) As Double
    ' Standard-normal cumulative distribution at X, borrowed from Excel
    Normsdist = Excel.WorksheetFunction.Normsdist(X)
End Function
It requires some knowledge of statistics to use this function to distribute a cash flow over x periods, assuming a standard-normal distribution. I created an Excel sheet to demonstrate how this function is used and posted it here:
Normal Distribution of a cash flow sample .XLSX
If for some reason you hate the idea of using an Excel function, you can pull any statistics text or search for the formula that generates a series of normal values. In your case you want to distribute the cash flow over three standard deviations in each tail, so that's a total of six (6) standard deviations. To divide 40 months over 6 standard deviations, it's 6/40 = 0.15 standard deviations per data point (month). Use a For/Next/Step or similar loop to generate the values into a temporary table, as you suggested, and graph it with a column chart (as seen in the above Excel example). It will take just a little VBA coding to make the number of months and the total contract value user-supplied.
A standard-normal distribution has a mean of 0 and standard deviation of 1. If you want a flatter bell curve, you can use the NormDist function instead where you can specify the mean and st. dev.
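For what it's worth, here is a rough sketch of that distribution step. It is written in Python purely to show the arithmetic (the Access version would be a VBA For/Next loop writing each month's value to the temp table), and the function and variable names are placeholders of mine, not anything from the question. It slices the +/- 3 standard deviation range into one step per month, takes the difference in the cumulative normal (the same quantity NORMSDIST returns) across each slice, and rescales so the monthly amounts sum back to the contract value:

import math

def normal_cdf(z):
    # Standard-normal cumulative distribution; the same quantity NORMSDIST returns.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bell_curve_spread(contract_value, months):
    step = 6.0 / months                        # e.g. 40 months -> 0.15 sd per month
    cuts = [-3.0 + step * m for m in range(months + 1)]
    weights = [normal_cdf(cuts[m + 1]) - normal_cdf(cuts[m]) for m in range(months)]
    total = sum(weights)                       # about 0.9973, the mass inside +/- 3 sd
    # Rescale so the monthly amounts add back up to the full contract value.
    return [contract_value * w / total for w in weights]

monthly = bell_curve_spread(500_000_000, 40)   # $500m over 40 months
print(round(sum(monthly)))                     # 500000000
print(round(monthly[0]), round(monthly[19]))   # tiny tail month vs. a near-peak month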
Related
Hello everyone,
I am a new user of pyomo and have a problem at the moment.
I am trying to develop a multi-time-period mixed-integer optimization problem in Python with pyomo. I have 4 technologies for which I want to optimize the capacity over 12 periods (1-12). If technology 1 gets chosen in a period, technology 2 is not chosen in that period. The same goes for technologies 3 and 4. Each of these technologies has its own price per period. I set up a list of the variables for each technology in each period (x11-x124), a list for the binary variable of each technology in each period, and a list for the price of each technology in each period. However, I am unable to write a working objective function for all these variables.
I would appreciate any help!
Below is an image of the code I have tried. I get the error: list indices must be integers or slices, not str.
I have also tried first transforming the lists into numpy arrays, but I then get an error because I cannot use numpy arrays in a pyomo optimization.
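For reference, here is a minimal sketch, written from the description above rather than from the code in the image, of how the same model could be expressed in pyomo with indexed Sets, Params and Vars instead of flat Python lists; all the names (price, cap, select, the BIG_M bound) are placeholders of mine. Indexing the components this way lets the objective be written as a single sum and sidesteps the list-indexing error:

import pyomo.environ as pyo

model = pyo.ConcreteModel()
model.T = pyo.RangeSet(1, 12)                 # periods 1-12
model.J = pyo.Set(initialize=[1, 2, 3, 4])    # technologies

# hypothetical price data: one price per (technology, period)
price_data = {(j, t): 10.0 + j + 0.1 * t for j in [1, 2, 3, 4] for t in range(1, 13)}
model.price = pyo.Param(model.J, model.T, initialize=price_data)

model.cap = pyo.Var(model.J, model.T, within=pyo.NonNegativeReals)   # capacity to build
model.select = pyo.Var(model.J, model.T, within=pyo.Binary)          # chosen in that period?

# objective: total cost over all technologies and periods
model.cost = pyo.Objective(
    expr=sum(model.price[j, t] * model.cap[j, t] for j in model.J for t in model.T),
    sense=pyo.minimize,
)

# mutual exclusion per period: technology 1 vs 2, technology 3 vs 4
model.excl_12 = pyo.Constraint(model.T, rule=lambda m, t: m.select[1, t] + m.select[2, t] <= 1)
model.excl_34 = pyo.Constraint(model.T, rule=lambda m, t: m.select[3, t] + m.select[4, t] <= 1)

# capacity can only be non-zero if the technology is selected (hypothetical upper bound)
BIG_M = 100.0
model.link = pyo.Constraint(model.J, model.T,
                            rule=lambda m, j, t: m.cap[j, t] <= BIG_M * m.select[j, t])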
I am trying to figure out how percentiles are calculated by SQL's percentile_cont function and by SPSS in FREQUENCIES. I want to compare them and understand why they get different results.
I have tried looking this up myself, but finding a source for this information is difficult. If you have an explanation for why they differ, can you please share where I can read about that myself?
The percentile formula used in FREQUENCIES in SPSS Statistics is a weighted average method aimed at p(N+1), where p is the percentile expressed as a proportion (0-1 range) and N is the number of cases or records. Ignoring complications associated with weighted data, particularly non-integer weights, you order the data values in ascending order and if p(N+1) is an integer, you take the value of the p(N+1)th ordered case. If p(N+1) is between the integers associated with the ordinal positions of two numbers, you linearly interpolate between the values according to the fractional value of p(N+1).
This general formula is a commonly-used one, denoted method 4 in SAS and method 6 in a well-known November 1996 article in The American Statistician by Hyndman & Fan (Vol. 50, No. 4, pp. 361-365) that is the basis for the nine definitions used in the quantile function in R. There is one special point about the method in FREQUENCIES, which is that while other implementations of this method will set any percentile where p(N+1)>N to the value of the Nth case, in SPSS Statistics the value is given as missing instead.
The method used in SQL's percentile_cont appears to be method 7 in the list of nine from Hyndman & Fan, which aims at 1+(N-1)p. The EXAMINE procedure in SPSS Statistics offers the method used in FREQUENCIES (as the HAVERAGE method) and four additional methods. None of these match the method in SQL's percentile_cont.
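To make the difference concrete, here is a small sketch in plain Python with made-up data, showing the two targets described above: the FREQUENCIES-style weighted average aimed at p(N+1) (method 6 in Hyndman & Fan) and the percentile_cont-style target of 1+(N-1)p (method 7). For simplicity the sketch clamps to the first and last cases at the extremes rather than reproducing the FREQUENCIES behaviour of returning missing when p(N+1) > N:

def value_at_position(sorted_vals, position):
    # Linearly interpolate the value at a 1-based, possibly fractional position.
    n = len(sorted_vals)
    if position <= 1:
        return sorted_vals[0]
    if position >= n:
        return sorted_vals[-1]
    lower = int(position)                  # 1-based index of the case just below
    frac = position - lower
    return sorted_vals[lower - 1] + frac * (sorted_vals[lower] - sorted_vals[lower - 1])

def pct_frequencies(vals, p):              # method 6: aims at p(N+1)
    x = sorted(vals)
    return value_at_position(x, p * (len(x) + 1))

def pct_percentile_cont(vals, p):          # method 7: aims at 1 + (N-1)p
    x = sorted(vals)
    return value_at_position(x, 1 + (len(x) - 1) * p)

data = [15, 20, 35, 40, 50]
print(pct_frequencies(data, 0.25))         # 17.5  (target position 1.5)
print(pct_percentile_cont(data, 0.25))     # 20.0  (target position 2.0)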
Formulas for the statistics in FREQUENCIES, EXAMINE, and other procedures in SPSS Statistics are available in the IBM SPSS Statistics Algorithms manual, a pdf of which is freely downloadable.
I have a "Appeared date" column A and next to it i have a ">180" date column B. There is also "CONCAT" column C and a "ATTR" column D.
What i want to do is find out the latest date 180 or more from past, and write it in ">180" column, for each date in "Appeared Date" column, where the Concat column values are same.
The Date in >180 column should be more than 180 days from "Appeared date" column in the past, but should also be an earliest date found only from the "Appeared date" column.
Based on this i would like to check if a particular product had "ATTR" = 'NEW' >180 earlier also i.e. was it launched 180 days or more ago and appearing again recently?
Is there an Excel formula which can pick the nearest dates (>180) from the Appeared Date column and show them in the ">180" column?
Will it involve a mix of SMALL(), FREQUENCY(), MATCH(), INDEX() etc.?
Or is a VBA procedure required?
To do this efficiently with formulas, you can use something called Range Slicing to reduce the size of the arrays to be processed, by efficiently truncating them so that they contain just the subset of those 3,000 to 50,000 rows that could possibly hold the correct answer, and THEN doing the actual equality check. (As opposed to your MAX/array approach, which does computationally expensive array operations on all the rows, even though most of the rows have no relationship with the current row that you seek an answer for.)
Here's my approach. First, here's my table layout:
...and here are my formulas:
180: =[#Appeared]-180
Start: =MATCH([#CONCAT],[CONCAT],0)
End: =MATCH([#CONCAT],[CONCAT],1)
LastRow: =MATCH(1,--(OFFSET([Appeared],[#Start],,[#End]-[#Start])>[#180]),0)+[#Start]-1
LastItem: =INDEX([Appeared],[#LastRow])
LastDate > 180: =IF([#Appeared]-[#LastItem]>180,[#LastItem],"")
Days: =IFERROR([#Appeared]-[#[LastDate > 180]],"")
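If it helps to see the slicing idea outside Excel, here is a rough sketch in Python with hypothetical column names and data. Because the rows are sorted by CONCAT (which the approximate MATCH in the End formula relies on), two lookups bound the block of matching rows, and only that small slice is scanned for the date test:

from bisect import bisect_left, bisect_right
from datetime import date, timedelta

def last_date_over_180(rows, i):
    # rows: (concat, appeared) tuples sorted by concat.
    # Returns the latest 'appeared' date in row i's CONCAT block that is at
    # least 180 days before row i's own date, or None if there isn't one.
    keys = [concat for concat, _ in rows]
    concat, appeared = rows[i]
    start = bisect_left(keys, concat)      # like the Start formula: MATCH(...,0)
    end = bisect_right(keys, concat)       # like the End formula: MATCH(...,1)
    cutoff = appeared - timedelta(days=180)
    candidates = [d for _, d in rows[start:end] if d <= cutoff]
    return max(candidates) if candidates else None

rows = [("A", date(2020, 1, 5)), ("A", date(2020, 8, 1)), ("B", date(2020, 3, 1))]
print(last_date_over_180(rows, 1))         # 2020-01-05 (more than 180 days earlier)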
Even with this small data set, my approach is around twice as fast as your MAX approach. And as the size of the data grows, your approach is going to get exponentially slower, as more and more processing power is wasted on crunching rows that can't possibly contain the answer. Whereas mine will get slower in a linear fashion. We're probably talking a difference of minutes, or perhaps even an hour or so at the extremes.
Note that while you could do my approach with a single mega-formula, you would be wise not to: it won't be anywhere near as efficient. Splitting your mega-formulas into separate cells is a good idea in any case because it may help speed up calculation due to something called multithreading. Here's what Diego Oppenheimer, a former program manager for Microsoft Excel, had to say on the subject back in 2005:
Multithreading enables Excel to spot formulas that can be calculated concurrently, and then run those formulas on multiple processors simultaneously. The net effect is that a given spreadsheet finishes calculating in less time, improving Excel’s overall calculation performance. Excel can take advantage of as many processors (or cores, which to Excel appear as processors) as there are on a machine—when Excel loads a workbook, it asks the operating system how many processors are available, and it creates a thread for each processor. In general, the more processors, the better the performance improvement.
Diego went on to outline how spreadsheet design has a direct impact on any performance increase:
A spreadsheet that has a lot of completely independent calculations should see enormous benefit. People who care about performance can tweak their spreadsheets to take advantage of this capability.
The bottom line: Splitting formulas into separate cells increases the chances of calculating formulas in parallel, as further outlined by Excel MVP and calculation expert Charles Williams at the following links:
Decision Models: Excel Calculation Process
Excel 2010 Performance: Performance and Limit Improvements
I think I found the answer. Earlier I was using the MIN function, though incorrectly, as the dates in the array formula (when you select it and hit the F9 key) were coming in descending order. So I finally used the MAX function to find the earliest date which was more than 180 days in the past.
=IF(MAX(IF(--(A2-$A$2:$A$33>=180)*(--(C2=$C$2:$C$33))*(--($D$2:$D$33="NEW")),$A$2:$A$33))=0,"",MAX(IF(--(A2-$A$2:$A$33>=180)*(--(C2=$C$2:$C$33))*(--($D$2:$D$33="NEW")),$A$2:$A$33)))
Check the revised Sample.xlsx, which is self-explanatory. I have added the Attr='NEW' criterion in the formula for the final workaround, to find whether there were any new items that appeared 180 days ago or earlier.
An ADO query alternative may still be required, though, to process large amounts of data.
I am currently working to integrate a third party mapping tool into my current system.
The problem is that the tool, as it replaces an existing system, needs certain tweaks, as well as a summarized version of the data to make SSRS reporting much faster.
Right now, all I would like to do from a dataset perspective is return something similar to Sum(Numerator1) & First(Operator1) & Sum(Numerator2) & First(Operator2) & Sum(Numerator3) & First(Operator3) -- and so on if another numerator is needed.
The problem I have is my calculation can in theory be anything, so even storing it like this will be a huge pain.
So I'm passing balances into each one of those fields, with Numerators being numbers and Operators being (+, -, *, /). The reason I see this as my only option is that I need the Numerators to be able to fluctuate between groups, so whether I'm grouping 5 rows, 10 rows, or a full total together, I am still doing the same calculation; only my balances are changing.
The problem is how to make SSRS evaluate whatever I have to pass in here, and whether it is possible to do this as a string.
Division is the kicker here and the main reason I have to do this in the report, as I might have data for 20 units. I need to provide the initial calculation for each unit as well as the calculation with each of the balances summed across all 20 units, to figure, say, a percent of sales or something.
If I do this in the report, I would have to have a total for each unit and then an overall total. I don't want to do this because the report will have an untold number of additional subtotals, and trying to bring the final balance back into the query just will not work.
I appreciate any help or ideas anyone has for this.
Thank you,
Striker~
You can't evaluate a string as an expression in SSRS.
If you have the time and the know-how, then you could write a function in VB.net that parses the expression and returns the result.
You would then call that function from your report like so:
=Code.ParseString(Fields!MyStringExpression.Value)
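For illustration only, here is a rough sketch of the kind of parsing such a function would do. It is written in Python just to keep it short; the actual custom code in an SSRS report would have to be VB.NET, and ParseString above is only a hypothetical name. It tokenises a string of the form number-operator-number-... and applies the operators strictly left to right (no operator precedence), which matches the Sum(Numerator1) & First(Operator1) & ... pattern described in the question:

import re

def parse_and_evaluate(expr):
    # Split the string into numbers and the four operators, then fold left to right.
    tokens = re.findall(r"\d+(?:\.\d+)?|[+\-*/]", expr)
    if not tokens:
        raise ValueError("empty expression")
    result = float(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        value = float(num)
        if op == "+":
            result += value
        elif op == "-":
            result -= value
        elif op == "*":
            result *= value
        elif op == "/":
            result /= value                # the caller should guard against divide-by-zero
    return result

print(parse_and_evaluate("1200+300/3"))    # (1200 + 300) / 3 = 500.0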
Without telling us why your calculation could be anything, we can't provide much more information!
OK, I'm just curious what the formula would be for calculating expected income over the next X weeks/months/etc., if the only data I have in my MySQL DB is all past transactions (dates of transactions, amounts, etc.).
I am thinking of taking some averages and whatnot, but I can't think of a specific formula (there must be something along those lines) to take, say, the average rise of income over time (weekly/monthly) and then apply it to a selected future period and display it weekly/monthly/etc.
Any suggestions?
Use AVG() on the income in the past and divide it into proper weekly/monthly amounts if necessary.
See http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_avg for more info on AVG()
Linear regression + simple integration is probably sufficient for your needs. I leave sorting out the exact implementation for your DB up to you, but follow that link to the "Estimation Methods" section, and probably use Ordinary Least Squares.
Alternatively, you can always slurp your data into something like R where the details are already implemented.
EDIT:
For more detail: you're trying to model INCOME = BASE + SCALING*T, where we are assuming that a linear model is "good" (it's probably not great, but it's probably good enough on a short time scale). For two-variable linear regression, you're pretty much just taking averages; follow that link to "Fitting the Regression Line" and you'll see which things you need to average (y = INCOME and x = T). There are some tricks you can play to simplify the calculation for the computer if you can enforce some other conditions (e.g., equally spaced time periods + no missing data), but you'll need to do a bit more of the math yourself first if you want that (and you'll be less flexible in the face of changing DB assumptions).
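To make that concrete, here is a minimal sketch in plain Python with made-up numbers of the two-variable ordinary least squares fit described above: estimate BASE and SCALING from past period totals, then extend the fitted line over the future periods you want to project:

def fit_and_project(periods, incomes, future_periods):
    # Ordinary least squares for INCOME = BASE + SCALING * T.
    n = len(periods)
    mean_t = sum(periods) / n
    mean_y = sum(incomes) / n
    scaling = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(periods, incomes))
        / sum((t - mean_t) ** 2 for t in periods)
    )
    base = mean_y - scaling * mean_t
    return [(t, base + scaling * t) for t in future_periods]

# e.g. monthly income totals for months 1-6, projected out to months 7-9
months = [1, 2, 3, 4, 5, 6]
income = [1000.0, 1100.0, 1180.0, 1290.0, 1400.0, 1485.0]
for month, projected in fit_and_project(months, income, [7, 8, 9]):
    print(month, round(projected, 2))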