Clever way to check if a value meets a threshold in VBA

Disclaimer: the numbers below are randomly generated.
What I'm trying to do, purely in VBA, is look at the ratio [column B]/[column A] and check whether the ratio in row 10 (=1,241/468) is below the minimum or above the maximum of the ratios in rows 1 through 9, considering only the rows that have a 1 in column C.
That is, compare Cell(B10)/Cell(A10) to Cell(B2)/Cell(A2), Cell(B3)/Cell(A3), etc. (only comparing against rows with a 1 in column C).
The workbook I'm working with has a lot more data and columns, and I'm not allowed to edit the cells directly, so defining a new column is out of the question. Is there a way to do this in VBA that essentially returns a Boolean depending on whether the ratio in the last row violates the threshold defined above?
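To pin down the requirement, here is a minimal Python sketch of the same check outside Excel (Python rather than VBA, purely to illustrate the logic). It assumes the three columns are passed in as plain lists, that the last element is the row under test (row 10 above) and that every earlier row is a comparison row; the function name `ratio_violates_threshold` and the sample data are hypothetical.

```python
def ratio_violates_threshold(col_a, col_b, col_c):
    """Return True if the last row's B/A ratio lies outside the [min, max]
    range of the B/A ratios of the earlier rows that have a 1 in column C."""
    *history, (last_a, last_b, _) = zip(col_a, col_b, col_c)
    # Ratios of the comparison rows, keeping only rows flagged with 1 in C
    ratios = [b / a for a, b, c in history if c == 1]
    if not ratios:
        raise ValueError("no comparison rows have a 1 in column C")
    last_ratio = last_b / last_a
    return last_ratio < min(ratios) or last_ratio > max(ratios)


# Hypothetical data: row 1 (ratio 10) is excluded because its C value is 0
print(ratio_violates_threshold([1, 1, 1], [10, 2, 3], [0, 1, 1]))  # True
```

Without the column C filter, the excluded ratio of 10 would widen the range and the last row (ratio 3) would not be flagged.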

You can get the minimum and maximum ratios (with criteria) easily with the SMALL and LARGE sub-functions of the AGGREGATE¹ function.
        
The formulas in D13:E13 are:
=AGGREGATE(15, 6, ((B1:B9)/(A1:A9))/C1:C9, 1)
=AGGREGATE(14, 6, ((B1:B9)/(A1:A9))/C1:C9, 1)
The 6 is the AGGREGATE option for ignoring error values. Dividing the ratio by the value in column C produces #DIV/0! errors for every row we do not want considered, so those rows are ignored. If the values in C were more diverse, we could divide by (C1:C9=1) to produce the same result.
Since we are using the SMALL and LARGE sub-functions, we can easily retrieve the second, third, etc. smallest or largest ratio by increasing the k parameter (the trailing 1).
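To see why dividing by column C works, here is a rough Python emulation of the SMALL (15) and LARGE (14) sub-functions with option 6. An Excel error value is modelled as None, and the helper names `excel_divide` and `aggregate_k` (like the sample columns) are illustrative only, not part of any Excel or VBA API.

```python
def excel_divide(numerator, denominator):
    """Mimic Excel division: an error operand or a zero divisor yields an
    error value, modelled here as None."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def aggregate_k(function_num, values, k):
    """Rough emulation of AGGREGATE(function_num, 6, array, k) for
    15 (SMALL) and 14 (LARGE); option 6 skips error values."""
    if function_num not in (14, 15):
        raise ValueError("only 14 (LARGE) and 15 (SMALL) are emulated")
    valid = sorted(v for v in values if v is not None)  # drop error values
    if not 1 <= k <= len(valid):
        return None  # Excel would return #NUM! here
    return valid[k - 1] if function_num == 15 else valid[-k]


# Hypothetical columns A, B and C; the middle row has C = 0
col_a, col_b, col_c = [4, 2, 5], [8, 9, 15], [1, 0, 1]
# ((B/A)/C): rows with C = 0 become #DIV/0! (None) and are skipped
masked = [excel_divide(excel_divide(b, a), c) for a, b, c in zip(col_a, col_b, col_c)]
print(aggregate_k(15, masked, 1), aggregate_k(14, masked, 1))  # 2.0 3.0
# Dividing by (C=1) instead: True acts as 1 and False as 0, as in Excel
masked_eq = [excel_divide(excel_divide(b, a), c == 1) for a, b, c in zip(col_a, col_b, col_c)]
```

The unfiltered maximum here would be 4.5 (row 2), which is exactly the value the division by C removes.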
I've modified some of the values in your sample slightly to demonstrate that the min and max with criteria are being picked up correctly.
These can be adapted to VBA with the WorksheetFunction object or the Application.Evaluate method.
¹The AGGREGATE function was introduced with Excel 2010. It is not available in earlier versions.

Related

How to identify records which have clusters or lumps in data?

I have a Tableau table as follows:
This data can be visualized as follows:
I'd like to flag cases that have lumps/clusters. This would flag items B, C and D because there are spikes only in certain weeks of the 13 weeks. Items A and E would not be flagged as they mostly have a 'flat' profile.
How can I create such a flag in Tableau or SQL to isolate this kind of a case?
What I have tried so far:
I've tried a logic where for each item I calculate the MAX and MEDIAN. Items that need to be flagged will have a larger (MAX - MEDIAN) value than items that have a fairly 'flat' profile.
Please let me know if there's a better way to create this flag.
Thanks!
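The MAX - MEDIAN idea from the question can be prototyped outside Tableau before building it as calculated fields. A minimal Python sketch; the function names, the threshold and the weekly data are hypothetical.

```python
from statistics import median


def max_minus_median(weekly_values):
    """Gap between the peak week and the typical (median) week."""
    return max(weekly_values) - median(weekly_values)


def flag_lumpy_items(items, threshold):
    """Map each item to True if its MAX - MEDIAN gap exceeds threshold."""
    return {item: max_minus_median(values) > threshold for item, values in items.items()}


# Hypothetical data: a flat profile versus one with a single spike
weekly = {"A": [5, 5, 6, 5, 5], "B": [1, 1, 12, 1, 1]}
print(flag_lumpy_items(weekly, threshold=4))  # {'A': False, 'B': True}
```

Because the gap is measured in the data's own units, one fixed threshold only suits items that are on a similar scale.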
Agree with the other commenters that this question could be answered in many different ways and you might need a PhD in Stats to come up with an ideal answer. However, given your basic requirements this might be the easiest/simplest solution you can implement.
Here is what I did to get here:
Create a parameter to define your "spike". If it is always going to be a fixed number, you can hardcode it in your formulas. I called mine "Min Spike Value".
Create a formula for the median value in each bucket: {fixed [Buckets]: MEDIAN([Values])} (A, B, ... E = "Buckets"). This gives you one value per letter/bucket that you can compare against.
Create a formula to calculate the difference between each value and the median: abs(sum([Values])-sum([Median Values])). We use the absolute value here because a spike can be either negative or positive (again, if you want to define it that way). I called this "Spike to Current Value abs difference".
Create a calculated field that evaluates to a Boolean to check whether the current value is above the threshold for a spike: [Spike to Current Value abs difference] > min([Min Spike Value])
Set up your viz to use this Boolean to highlight the spikes. The beauty of the parameter is that you can change the value for what counts as a spike and the highlighting updates accordingly. Above, the value was 4, but if you change it to 8:
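The same steps can be sketched in Python to check the logic before building the calculated fields. The median per bucket plays the role of {fixed [Buckets]: MEDIAN([Values])} and `min_spike_value` stands in for the "Min Spike Value" parameter; the data and the function name `spike_flags` are hypothetical. Unlike the Tableau version, this works on raw values rather than SUM() aggregates, which assumes one value per bucket and week.

```python
from statistics import median


def spike_flags(buckets, min_spike_value):
    """For each bucket, flag every value whose absolute difference from the
    bucket's median is greater than min_spike_value."""
    flags = {}
    for bucket, values in buckets.items():
        bucket_median = median(values)  # like {fixed [Buckets]: MEDIAN([Values])}
        flags[bucket] = [abs(v - bucket_median) > min_spike_value for v in values]
    return flags


# Hypothetical weekly values for two buckets
weekly = {"A": [5, 6, 5, 5], "B": [1, 1, 9, 1]}
print(spike_flags(weekly, 4))  # {'A': [False, False, False, False], 'B': [False, False, True, False]}
print(spike_flags(weekly, 8))  # raising the threshold to 8 clears B's flag as well
```

An item could then be flagged if any of its weeks is flagged, e.g. `any(spike_flags(weekly, 4)["B"])`.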

How might one most efficiently calculate contingent values?

Suppose that I have 10 values n_1, n_2, ..., n_10 and that, given any one of these values, the other 9 can be calculated. Let f_i(n_j) be the function that calculates the value n_i from the value n_j (where i != j). These functions are relatively simple (i.e. they contain no more than a few exponential functions or powers).
In terms of the functions used, what would be the most efficient way of creating a program to calculate the other 9 values in n_1, ..., n_10 given the 1 that is initially known?
Would the best option be to minimize the number of functions used (and thus minimize the number of lines of code), or to create a function defining every single mapping?
For example, would it be most efficient to use only the 18 functions
f_1(n_2), f_1(n_3), ..., f_1(n_10) [1]
f_2(n_1), f_3(n_1), ..., f_10(n_1) [2]
And then, for whatever input the user provides, calculate n_1 with the relevant function in line [1], from which every other value of interest can be calculated with the functions in line [2]?
Or would it be better to define all 90 mappings, so that only a single function (rather than 2 functions) must be called to calculate each of the 9 other values?
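Option [1] amounts to a hub-and-spoke design: every value is converted to n_1 and back out again, so N values need 2(N - 1) functions (18 for N = 10) instead of N(N - 1) (90). A minimal Python sketch, assuming two hypothetical spokes (n_2 = 2*n_1 and n_3 = exp(n_1)) in place of the real f_i; the names `FROM_HUB`, `TO_HUB` and `solve_all` are illustrative.

```python
import math

# Hypothetical spokes; the real f_i(n_1) and f_1(n_j) come from your problem.
FROM_HUB = {  # f_i(n_1): compute n_i from the hub value n_1
    2: lambda n1: 2 * n1,
    3: lambda n1: math.exp(n1),
}
TO_HUB = {  # f_1(n_j): compute the hub value n_1 from n_j
    2: lambda n2: n2 / 2,
    3: lambda n3: math.log(n3),
}


def solve_all(known_index, known_value):
    """Given any one value, return all of them by going through n_1."""
    n1 = known_value if known_index == 1 else TO_HUB[known_index](known_value)
    values = {1: n1}
    for i, from_hub in FROM_HUB.items():
        # Keep the user's own input exactly rather than its round trip
        values[i] = known_value if i == known_index else from_hub(n1)
    return values


print(solve_all(2, 4.0))
```

The cost is two function calls per value (plus a little floating-point round-off) instead of one; with functions this simple, that is unlikely to matter compared with maintaining 90 separate mappings.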
Edit: The specific result that I am trying to achieve is as follows...
I am currently using VBA, with a user form of the following format:
The conversion frequency is a required field (so let's just say, for example, that it is always equal to 2 and forget about it). I want to use Change events so that whenever the user changes any of the 6 fields below the conversion frequency field, the other 5 fields are auto-filled with the correct values. However, since the user need only update any one of the six fields, with the other 5 being calculated from it, we will require 6^2 - 6 = 30 different functions to do these calculations. We will thus end up with a lot of repetitive code.
My question regards the best practices to follow when working with a form where one of many inputs may be provided, and all other fields must be updated as a result of the input provided and its value.
Or, equivalently, is there a way to update all fields when the value of one field changes? Can this be done without the number of lines of code required increasing exponentially as the number of fields increases?
I think you are grossly overthinking this. Think of it in terms of the formulas you need, which I think number 6: 6 functions that take 5 inputs each:
calculateEIR(NominalInterestRate, ForceOfInterest, DiscountFactor, EffectiveDiscountRate, NominalDiscountRate)
calculateNIR(EffectiveInterestRate, ForceOfInterest, DiscountFactor, EffectiveDiscountRate, NominalDiscountRate)
' and so on...
The event handlers and the code that calculates the values are separate things. Your Change event handlers simply need to call the correct methods: 6 event handlers, each calling 5 of the 6 calculation functions, so 12 procedures in total if you want to keep count. It's a lot of copypasta. For example:
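' Assumes a form-level flag declared at the top of the UserForm module:
' Private isUpdating As Boolean
Private Sub textEffectiveInterestRate_Change()
    If isUpdating Then Exit Sub ' ignore Change events fired by the assignments below
    isUpdating = True
    Me.textNominalInterestRate.Value = calculateNIR(Me.textEffectiveInterestRate.Value, Me.textForceOfInterest.Value, ...)
    Me.textForceOfInterest.Value = calculateForceOfInterest(Me.textEffectiveInterestRate.Value, Me.textNominalInterestRate.Value, ...)
    ' ...and every other calculate function aside from calculateEIR()
    isUpdating = False
End Sub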
I am unsure about the specifics of how you are changing all the values based on a change in the others (since I don't know the formulas), but in general, you should not in any way need 30 functions...
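A different technique from the answer's, offered as a hedged alternative: route every field through the effective interest rate i, as in option [1] of the question. Each field then needs one function to i and one from i, so the code grows linearly with the number of fields. The sketch below is Python, not VBA; it uses the standard compound-interest identities v = 1/(1+i), delta = ln(1+i), d = i/(1+i), (1 + i^(m)/m)^m = 1 + i and (1 - d^(m)/m)^(-m) = 1 + i, reuses the field names from the answer, and the function names are hypothetical.

```python
import math


def to_effective_rate(field, value, m):
    """Convert any one field to the effective interest rate i."""
    if field == "EffectiveInterestRate":
        return value
    if field == "NominalInterestRate":    # i^(m)
        return (1 + value / m) ** m - 1
    if field == "ForceOfInterest":        # delta
        return math.exp(value) - 1
    if field == "DiscountFactor":         # v
        return 1 / value - 1
    if field == "EffectiveDiscountRate":  # d
        return value / (1 - value)
    if field == "NominalDiscountRate":    # d^(m)
        return (1 - value / m) ** (-m) - 1
    raise ValueError(f"unknown field: {field}")


def all_fields_from_effective_rate(i, m):
    """Compute all six fields from the effective interest rate i."""
    return {
        "EffectiveInterestRate": i,
        "NominalInterestRate": m * ((1 + i) ** (1 / m) - 1),
        "ForceOfInterest": math.log(1 + i),
        "DiscountFactor": 1 / (1 + i),
        "EffectiveDiscountRate": i / (1 + i),
        "NominalDiscountRate": m * (1 - (1 + i) ** (-1 / m)),
    }


def update_all(changed_field, value, m=2):
    """One shared routine: any single field in, all six fields out."""
    return all_fields_from_effective_rate(to_effective_rate(changed_field, value, m), m)


print(update_all("NominalInterestRate", 0.10))
```

Each VBA Change handler could then pass its own field name and value to one shared routine like `update_all` and write the six results back to the text boxes, still guarding against the Change events that those writes trigger.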

Excel VBA using SUMPRODUCT and COUNTIFS - issue of speed

I have an issue of speed. (Apologies for the long post…). I am using Excel 2013 and 2016 for Windows.
I have a workbook that performs 10,000+ calculations on a 200,000-cell table (1000 rows x 200 columns).
Each calculation returns an integer (e.g. count of filtered rows) or more usually a percentage (e.g. sum of value of filtered rows divided by sum of value of rows). The structure of the calculation is variations of the SUMPRODUCT(COUNTIFS()) idea, along the lines of:
=IF($B6=0,
0,
SUMPRODUCT(COUNTIFS(
Data[CompanyName],
CompanyName,
Data[CurrentYear],
TeamYear,
INDIRECT(VLOOKUP(TeamYear&"R2",RealProgress,2,FALSE)),
"<>"&"",
Data[High Stage],
NonDom[NonDom]
))
/$B6
)
Explanation of the above:
The pair Data[CompanyName] and CompanyName are the column in the table and the condition value for the first filter.
The pair Data[CurrentYear] and TeamYear work the same way and constitute the second filter.
The third pair looks up an intermediary table that returns the name of a column; the condition ("<>"&"") means 'not blank', i.e. it returns all rows that have a value in this column.
The fourth pair is similar to the third, but matches Data[High Stage] against the set of values in NonDom[NonDom]; SUMPRODUCT then adds up the COUNTIFS result for each of those values.
Finally, COUNTIFS joins the four filters together with AND logic.
It is important to note that across all the calculations the same principle is applied of using SUMPRODUCT(COUNTIFS()) – however there are many variations on this theme.
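To make the semantics concrete, here is a rough Python model of one such formula over a list of row dictionaries. It is a sketch under assumptions: `progress_column` stands for whatever column name INDIRECT(VLOOKUP(...)) resolves to, empty cells are assumed to arrive as None, NonDom[NonDom] is assumed to hold no duplicates (SUMPRODUCT would count a row once per matching duplicate), and unlike COUNTIFS the string comparisons here are case-sensitive. The function name and data are hypothetical.

```python
def countifs_ratio(rows, company, team_year, progress_column, non_dom_stages, b6):
    """Rough model of IF($B6=0, 0, SUMPRODUCT(COUNTIFS(...))/$B6): count the rows
    that pass all four filters (AND), then divide by b6."""
    if b6 == 0:
        return 0
    matches = sum(
        1
        for r in rows
        if r["CompanyName"] == company              # Data[CompanyName] = CompanyName
        and r["CurrentYear"] == team_year           # Data[CurrentYear] = TeamYear
        and r[progress_column] is not None          # "<>"&"": cell not empty
        and r["High Stage"] in non_dom_stages       # any value in NonDom[NonDom]
    )
    return matches / b6


rows = [  # hypothetical table rows
    {"CompanyName": "Acme", "CurrentYear": 2, "R2": "x", "High Stage": "S1"},
    {"CompanyName": "Acme", "CurrentYear": 2, "R2": None, "High Stage": "S1"},
    {"CompanyName": "Acme", "CurrentYear": 2, "R2": "y", "High Stage": "S3"},
    {"CompanyName": "Other", "CurrentYear": 2, "R2": "z", "High Stage": "S1"},
]
print(countifs_ratio(rows, "Acme", 2, "R2", {"S1", "S2"}, 4))  # 0.25
```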
At present, using Calculate on a selected range of sheets (rather than the slower option of calculating the whole workbook) takes around 30-40 seconds. Not bad, and tolerable, as calculations aren't performed all the time.
Unfortunately, the model is to be extended and could now approach 20,000 rows rather than 1,000. Calculation performance is directly linked to the number of rows or cells, so I expect performance to plummet!
The obvious solution [1] is to use arrays, ideally passing an array, held in memory, to the formula in the cell and then processing it along with the filters and their conditions (the lookup filters being arrays too).
The alternative solution [2] is to write a UDF using arrays, but reading around the internet the opinion is that UDFs are much slower than native Excel functions.
Three questions:
Is solution [1] possible, and the best way of doing this, and if so how would I construct it?
If solution [1] is not possible or not the best way, does anyone have any thoughts on how much quicker solution [2] might be compared with my current solution?
Are there other better solutions out there? I know about Power BI Desktop, PowerPivot and PowerQuery – however this is a commercial application for use by non-Excel users and needs to be presented in the current Excel ‘grid’ form of rows and columns.
Thanks so much for reading!
Addendum: I'm going to try running an array calculation for each sheet on the Worksheet.Activate event and see if there's some time savings.
Reading the data into an array is normally a good idea if you want to increase speed. It can be done like this:
Dim myTable As ListObject
Dim myArray As Variant
' Reference the table on the active sheet
Set myTable = ActiveSheet.ListObjects("Table1")
' Copy the table body into a 2-D Variant array (rows x columns, 1-based)
myArray = myTable.DataBodyRange.Value
(Source)
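Loading the table into an array only pays off if each count stops rescanning every row. One way to exploit the in-memory array, not taken from the answer above, is to count every key combination in a single pass and then look each result up; in VBA a Scripting.Dictionary could play the role of the Counter. A minimal Python sketch, with hypothetical column names and data (empty cells assumed to arrive as None):

```python
from collections import Counter


def group_counts(rows, key_columns, non_empty_column):
    """Single pass over the in-memory table: count rows whose non_empty_column
    is filled, for every combination of key_columns at once."""
    counts = Counter()
    for r in rows:
        if r[non_empty_column] is not None:  # empty cells assumed to arrive as None
            counts[tuple(r[c] for c in key_columns)] += 1
    return counts


rows = [  # hypothetical rows read from the table
    {"CompanyName": "Acme", "CurrentYear": 2, "R2": "x"},
    {"CompanyName": "Acme", "CurrentYear": 2, "R2": None},
    {"CompanyName": "Acme", "CurrentYear": 3, "R2": "y"},
    {"CompanyName": "Acme", "CurrentYear": 2, "R2": "z"},
]
counts = group_counts(rows, ["CompanyName", "CurrentYear"], "R2")
print(counts[("Acme", 2)], counts[("Acme", 3)], counts[("Other", 2)])  # 2 1 0
```

Building the counts is one pass over the rows; each of the 10,000+ results then becomes a dictionary lookup rather than another scan of the table.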

MAX function on 2 runtime-determined ranges that may contain #N/A

I need some thoughts on how to improve my concept before I begin, to prevent this from becoming a mile-long, write-only formula...
What I'm trying to do, graphically, is this:
I have two rows that each have 4 mandatory cells (straight line) and 4 optional cells (dotted line) that I need to run a MAX function on. ANY number of the X's may contain #N/A (#NV in German Excel; for diagram purposes, these happen deliberately).
First, I need to determine the actual ranges. This is currently done with INDIRECT(..). Depending on the current quarter it selects a range of 5 to 8 cells in the rows.
INDIRECT("Q5:" & CHAR(CODE("T") + VarQuarter) & 5)
After that, MAX is applied to each range, and then again to the two MAX results. If one of these results is an error (because of an #N/A), it needs to be omitted; otherwise both results are used. Should both results be errors, I'm fine with the resulting error, as that will be caught later.
My only idea for this would be endlessly long chains of IFERROR and redundant MAX statements...
Any ideas for improvement for any of these 2 steps? I was specifically told to perform this on the worksheet and not in code, for easier maintainability by others, so this will have to make do unless it is absolutely impossible.
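The INDIRECT address in the question can be checked outside Excel. This Python sketch (the function names are hypothetical) mirrors "Q5:" & CHAR(CODE("T") + VarQuarter) & 5 and shows that the range is VarQuarter + 4 cells wide, from Q5:U5 in quarter 1 to Q5:X5 in quarter 4. The CHAR trick only works while the last column is a single letter.

```python
def quarter_range_address(row, var_quarter):
    """Mirror "Q<row>:" & CHAR(CODE("T") + VarQuarter) & <row>."""
    last_column = chr(ord("T") + var_quarter)  # CHAR(CODE("T") + VarQuarter)
    return f"Q{row}:{last_column}{row}"


def quarter_range_width(var_quarter):
    """Number of cells from column Q to the computed last column."""
    return ord("T") + var_quarter - ord("Q") + 1


for quarter in range(1, 5):
    print(quarter, quarter_range_address(5, quarter), quarter_range_width(quarter))
```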
Assuming the first row starts in Q5 and the second row starts in Q6, try
=MAX(IFERROR(MAX(OFFSET(Q5,0,0,1,varQuarter+4)),0),IFERROR(MAX(OFFSET(Q6,0,0,1,varQuarter+4)),0))
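The formula above differs slightly from the rule in the question: IFERROR(..., 0) turns an erroring row into 0, so two erroring rows give 0 instead of an error, and when one row errors while the other's maximum is negative, the result is 0 rather than that maximum. This Python sketch models #N/A as a sentinel object to compare the two behaviours; all names and data are hypothetical.

```python
NA = object()  # sentinel standing in for Excel's #N/A error value


def range_max(values):
    """Like MAX over a range: a single #N/A anywhere makes the result #N/A."""
    return NA if any(v is NA for v in values) else max(values)


def max_as_specified(row1, row2):
    """Question's rule: skip a row whose MAX is an error; error if both are."""
    results = [m for m in (range_max(row1), range_max(row2)) if m is not NA]
    return max(results) if results else NA


def max_iferror_zero(row1, row2):
    """Answer's formula: MAX(IFERROR(MAX(row1), 0), IFERROR(MAX(row2), 0))."""
    return max(0 if m is NA else m for m in (range_max(row1), range_max(row2)))


# Hypothetical rows: the first contains #N/A, the second only negatives
print(max_as_specified([1, NA], [-2, -4]), max_iferror_zero([1, NA], [-2, -4]))  # -2 0
```

If all values are known to be non-negative, the two behaviours agree except for the both-rows-error case.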

Converting an Excel FORECAST formula for use with times

When using FORECAST, you input a number and it returns a value based on the known x data and known y data.
However, if you put in a time, this does not work.
I need two things.
First of all, I need the VBA equivalent of FORECAST. I suspect this is Application.Forecast.
Then, how to use the time as a value so that the forecast works as it should.
The formula is as follows:
=FORECAST(15:00:00,A10:A33,B10:B33)
Currently this formula returns an error.
Any ideas to get this to work for time values?
I see two potential problem areas. The first is the time: 15:00:00 is not valid as a bare value inside a formula, so use the TIME function to get a precise time value. Second, in D9:D12, the values are left-aligned. Typically, this means they are text, not true numbers. If you absolutely require the m suffix, use a custom number format of General\m so that they retain their numeric status while displaying an m as a suffix. If you type the m in, they become text-that-looks-like-numbers and are useless for any maths. Also note that FORECAST takes its arguments in the order FORECAST(x, known_y's, known_x's), so the formula below lists the two ranges in the opposite order to the question's formula.
=FORECAST(TIME(15, 0, 0), B10:B33, A10:A33)
That returns 3.401666667, which is either 09:38 AM or 3.4 m (it's been a while since I played with the FORECAST function).
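A Python sketch of what the formula computes may help: TIME(15, 0, 0) is simply the day fraction 0.625, and FORECAST is an ordinary least-squares line evaluated at x. The helper names and sample readings are hypothetical; in VBA, WorksheetFunction.Forecast with TimeSerial(15, 0, 0) as x should behave the same way, though that is an expectation rather than something tested here.

```python
def excel_time(hours, minutes, seconds):
    """TIME(h, m, s): Excel stores a time of day as a fraction of a day."""
    return (hours * 3600 + minutes * 60 + seconds) / 86400


def forecast(x, known_ys, known_xs):
    """Least-squares prediction with FORECAST's argument order (x, known_y's, known_x's)."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_xs, known_ys))
    return mean_y + (sxy / sxx) * (x - mean_x)


def day_fraction_to_hhmm(serial):
    """Clock time of a date/time serial, rounded to the nearest minute."""
    minutes = round((serial % 1) * 24 * 60)
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Hypothetical readings at 09:00 and 12:00, extrapolated to 15:00
times = [excel_time(9, 0, 0), excel_time(12, 0, 0)]
readings = [10.0, 20.0]
print(excel_time(15, 0, 0))                             # 0.625
print(forecast(excel_time(15, 0, 0), readings, times))  # 30.0
print(day_fraction_to_hhmm(3.401666667))                # 09:38
```

The last line shows where the "09:38 AM" reading comes from: the fractional part 0.401666667 of a day is about 9 hours 38 minutes.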