Creating what-ifs in VBA

Data tables in Excel are really nice for seeing how numbers would evolve if one of your inputs changes. However:
1. they're (very) slow
2. the variable you edit is not dynamic; i.e. changing the row or column input cell to =INDIRECT(<dynamic address>) works but doesn't update if the dynamic address changes
3. they force you to use an annoying structure
4. the variable needs to be in the same sheet as the table
Because I want to do a lot of them, points 1 and 2 make it very impractical to work with data tables. Is there any way to make a function that does this in VBA? I tried a UDF that changes the variable's value (e.g. =whatif(<hypothetical output>, <variable cell>, <variable hypothetical value>)), but that's not allowed in VBA (see How to prevent VBA function from re-executing inside the code).

I am quite sure you can achieve your goals using subs rather than UDFs. The concept is to, basically, treat the worksheet like one huge function:
the sub sets up a group of input cells
the sub allows the worksheet to calculate
the sub monitors the output cell(s)
If the goal is some kind of non-linear analysis (like sensitivity analysis), the sub compares successive runs. If the goal is some kind of optimization, the sub drives the inputs to achieve a result. If the goal is some kind of statistical analysis or modelling, the sub can drive a Monte Carlo simulation.
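The three steps above (set inputs, let the worksheet calculate, read the output) can be sketched as a simple loop. This is only a minimal illustration under stated assumptions: the sheet names "Model" and "Scenarios" and every cell address are hypothetical and need to be adapted to the actual workbook.

```vba
Option Explicit

' Sketch of a one-variable what-if run without a data table.
' ASSUMPTIONS (all hypothetical): the input cell is Model!B1, the output cell
' is Model!B10, and the hypothetical input values are listed in Scenarios!A2:A11.
' Results are written next to each scenario value, in column B.
Public Sub RunWhatIf()
    Dim inputCell As Range
    Dim outputCell As Range
    Dim scenarios As Range
    Dim originalFormula As Variant
    Dim prevCalc As XlCalculation
    Dim i As Long

    Set inputCell = Worksheets("Model").Range("B1")
    Set outputCell = Worksheets("Model").Range("B10")
    Set scenarios = Worksheets("Scenarios").Range("A2:A11")

    ' Remember the current state so it can be restored afterwards.
    originalFormula = inputCell.Formula
    prevCalc = Application.Calculation
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    On Error GoTo CleanUp
    For i = 1 To scenarios.Rows.Count
        inputCell.Value = scenarios.Cells(i, 1).Value    ' 1. set the input
        Application.Calculate                            ' 2. let the workbook calculate
        scenarios.Cells(i, 2).Value = outputCell.Value   ' 3. record the output
    Next i

CleanUp:
    ' Runs after normal completion and after an error alike.
    inputCell.Formula = originalFormula
    Application.Calculation = prevCalc
    Application.ScreenUpdating = True
End Sub
```

Because the sub addresses cells directly, the input cell does not have to be on the same sheet as the results, and the layout of the scenario list is up to you.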

Related

Do graphs slow down my calculation speed in a large excel file?

I am working with a huge Excel file that is updated with a set of macros. In the Excel file there are also a large number of graphs to ensure easy output checks.
However, when I re-calculate the workbook it is extremely slow.
My question is: Do these graphs contribute to slowing down the calculation of the model? If so, is there a quick VBA way to only update graphs at the end of the overall calculation?
Without seeing your workbook this is hard to answer.
Most likely, it is not the charts (is that what you call "graphs"?) that are slowing down the recalc, but inefficient formulas.
Check the chart data sources. If they point to worksheet cells, then all is good. If they point to named ranges / named formulas, then check what these formulas are.
Recalculation is affected by
volatile formulas like Today(), Now(), Indirect(), Offset() and a few others
inefficient formulas that needlessly repeat calculations that have already been performed, typically done in running totals
An example of this would be
=Sum(A$1:A2) copied down.
In each row, the calculation starts in row 1 and goes down to the current row. This is a waste of effort.
A much more efficient formula is in column C, where just the value from the row above is added to the value of the current row.
=SUM(C1,A2)
These details can make a heck of a difference.
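As a rough sketch, the more efficient running total described above could be filled in by a macro like the following. It assumes the values sit in column A starting at A1 of the active sheet and that column C is free; the sub name is hypothetical.

```vba
Option Explicit

' Sketch: fill column C with the incremental running total =SUM(C1,A2)
' instead of the ever-growing =SUM(A$1:A2).
' ASSUMPTIONS: data is in column A from row 1 down on the active sheet,
' and column C can be overwritten.
Public Sub FillRunningTotal()
    Dim ws As Worksheet
    Dim lastRow As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 2 Then Exit Sub

    ' The first row simply takes the first value.
    ws.Range("C1").Formula = "=A1"
    ' Each later row adds the current value to the total from the row above.
    ' Relative references adjust per row when assigned to a multi-cell range.
    ws.Range("C2:C" & lastRow).Formula = "=SUM(C1,A2)"
End Sub
```

Each formula then touches two cells rather than the whole range above it, so the work grows linearly with the number of rows instead of quadratically.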
For more information you may want to refer to Charles Williams' site http://www.decisionmodels.com/calcsecrets.htm and the pages linked from there.
It's a complex subject and can probably not be addressed in a simple answer to a seemingly simple question.

Formatting Data in excel sheet with blue prism

I'm trying to run a duplicate check in which varying data is pulled from a website and compared to a master list, the master list being stored in Excel. The information from the website is read from a table that has line breaks. These breaks are carried over to the data collection the values are initially stored in. Some of the data from the website is eventually written to the master list in Excel. So when I read the master list back into Blue Prism to run a duplicate check, the rows that have line breaks are written into a collection as multiple rows (e.g. I should have only 7 rows in my collection but am getting 42). Since the rows are not EXACTLY the same between the 2 collections, the automation does not recognize the duplicates when it runs.
The easiest way to solve this would be if I could make the collection rows have no line breaks as soon as the data is read. I've attempted to use the calculation stage to do so with no luck. I'm not sure if it is actually possible to do this, but would appreciate any direction.
Record an Excel macro to do the data sorting/cleaning in Excel (possibly Text To Columns, etc.) and then include the running of the macro as part of your Blue Prism process by using an action stage and the MS Excel VBO - Run Macro. Get the process to create an Excel instance (and create a handle data item from that stage), then use Open Workbook (for whatever workbook you store your macro in) and then use the MS Excel VBO - Run Macro (use the same handle created earlier and type in the name of the macro).
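A cleaning macro of this kind might look roughly like the sketch below. The macro name RemoveLineBreaks is hypothetical, and it assumes the master list is on the active sheet of the workbook that Blue Prism opens.

```vba
Option Explicit

' Sketch: replace carriage returns / line feeds inside text cells with spaces
' so that each worksheet row is read back as a single collection row.
' ASSUMPTION: the master list is on the active sheet.
Public Sub RemoveLineBreaks()
    Dim ws As Worksheet
    Dim cell As Range
    Dim cleaned As String

    Set ws = ActiveSheet

    For Each cell In ws.UsedRange.Cells
        ' Only touch text constants; leave formulas, numbers and errors alone.
        If Not cell.HasFormula Then
            If VarType(cell.Value) = vbString Then
                cleaned = Replace(cell.Value, vbCrLf, " ")
                cleaned = Replace(cleaned, vbCr, " ")
                cleaned = Replace(cleaned, vbLf, " ")
                cell.Value = Trim$(cleaned)
            End If
        End If
    Next cell
End Sub
```

One caveat: writing a cleaned string back can let Excel reinterpret text that looks like a number or date, so it is worth trying this on a copy of the master list first.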
It sounds like what is happening is that the MS Excel VBO is grabbing the data from the Excel Worksheet wholesale.
This is to say that it's accessing your Worksheet table, copying the cell values BUT not the cell formatting data, and then dumping the values into a BP collection.
Since it did not bring along any of the original cell formatting data to reference when it went to populate the collection, it's just breaking up the values based on carriage returns/line breaks. Thus, your collection is organized based on that, and not on the original Worksheet cell.
So, with that said, on to a solution!
Solution 1
Brute force the organization of the incoming Excel cell data to the collection by looping over the Excel Worksheet cell-by-cell.
Run a loop, and in that loop have BP go into the Excel Worksheet and grab the first populated cell it comes across. Run a formatting/cleanup Calculation stage over the data. Dump the cell value into a single collection field.
Repeat.
This is...inelegant, expensive at best, and not at all recommended for any medium to large dataset. But it's definitely the best way to do string manipulation and value comparisons before the data hits your collection. Since it sounds like you're using a Master template, you also know what the expected format of your data should be.
This method will enable you to implement Trim(), Concat(), or Split() in a Calculation stage to better organize your incoming data before you dump it into a collection.
This is also basically what I think you're already trying to do, but cell-by-cell instead of Worksheet row-by-row or table-by-table.
Solution 2
Clean up the table data you grab from the website before you dump it into the Excel Worksheet.
This is basically Solution 1, but in reverse. Simply format/clean up your data before it hits your Excel Worksheet.
I'm not sure this is any better than Solution 1, but, you know, it's something...
Solution 3
Format the cell data IN the MS Excel Worksheet itself.
Basically rearrange the cells and cell data in the Excel Worksheet into a more predictable format by using the Split, Trim, Merge, or other actions included in the MS Excel VBO. You can also do this using the Data - OLEDB utility object, but that requires some pretty solid understanding of SQL syntax.
This would look like this using the MS Excel VBO:
Grab the Excel Worksheet data wholesale and dump into a collection
Count the rows/fields of the collection
Is that number consistent with the desired/expected format of your data?
If not, have the bot go back into the Excel Worksheet and reformat the cells by removing any carriage returns/line breaks/whatever else
Repeat.
However, I'm always reluctant to reformat any original source, as it's then hard to figure out what went wrong and where it went wrong when you've changed the original structure of your data. So it's best to always make a copy of the Worksheet before you make any manipulation.
Unfortunately I don't have access to my BP environment at the moment or I'd provide you with the exact object actions you'd need to do any of this, my bad. Once I do I'll update this answer.

Excel Named Range Formula - Not Automatic Updating

I have a table in Excel that has column heading names (e.g. data_type1, data_type2, etc.). The data in this table changes based on parameters entered on another sheet, and they are pulled to charts which update dynamically.
As a convenience to a user who might be using this sheet I have added a 'user specified function' (non-VBA) which also plots to one of the charts. By user specified function I mean I have three cells with dropdown lists. Two correspond to the table headings and one has a short list of operations that can be applied between the two selected data types (e.g. a user might select 'datatype1', '+', 'datatype2' which would produce a sum of the two in the final column of my table).
The user specified function is achieved by defining a named 'range/function' to match the drop downs with their respective column headers and then calls evaluate. See below:
=EVALUATE("="&ADDRESS(ROW('Raw Data'!XFD5),MATCH(user_in1,'Raw Data'!$A$4:$AF$4,0)) & user_operation & ADDRESS(ROW('Raw Data'!XFD5),MATCH(user_in2,'Raw Data'!$A$4:$AF$4,0)))
I name this 'user1_result' and then enter =user1_result in the final column of my table. This approach is nice because it calculates much faster than doing the same thing through building a UDF in VBA and then applying that UDF to every cell in a fairly long column.
Now here is my hangup, this works fine initially, but if the user makes a parameter change that affects one or both of the selected datatypes, the user specified column does not recalculate on-the-fly with the updated data. If the user re-toggles any of the dropdowns the data does recalculate. I am speculating this is from one of two things:
1) Excel does not recognize that a precedent of 'user1_result' has changed, and so for efficiency's sake doesn't bother to recompute the column;
2) The 'Evaluate' function used in the named definition of 'user1_result' is not checked for updating, because it's not a normal function (doesn't show up through intellisense if you try to just add that to a cell).
So I am looking for some either confirmation or refutation of these speculations. In the case of confirmation I am hoping to get some advice on how to force the user specified column to update if its precedents change.
One solution is to have VBA do this checking for me and force the computation, but I would like to leave that as a last resort. So, non-VBA solutions preferred.
For posterity I'll answer the above question based on Mat'sMug's feedback:
Regarding the cause of the problem:
The reason the user specified column does not update is because the 'Evaluate' portion of the 'user1_result' named formula is intended to be used at the application level and not as a worksheet function. Because of this, Excel doesn't bother checking to see if its precedents change and ignores it for recalculation.
The problem's solution:
It was suggested to use VBA to watch for worksheet_change events, however, my problem requires that I do NOT use VBA. So, an alternative workaround that forces Excel to check precedents and recalculate the user specified function uses two steps. This functions as a pseudo worksheet_change stand-in.
First, I use a helper cell that performs a countif with an arbitrary counting condition. I don't need it to change, I just need it to share precedents with the inputs of 'user1_result'. So I have it count the number of cells in the first row of data that are larger than some constant:
=COUNTIF(A5:AK5,">100000")
The result of this computation doesn't matter, but in my case my data have small values and so this returns 0 always.
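Second, I wrap the computation in the user specified column (the last column in my data table) in a condition that references the helper cell ($AO$1 in the formula below). The condition is always true, but it makes the column depend on the helper cell: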
=IF($AO$1=$AO$1,user1_result)
Now, anytime my data table updates, the final column using the named function recomputes. Simple, and if using macros is not viable (for example due to a client/user's security concerns), this can sort of substitute for a worksheet change event.
I hope somebody out there gets use from this!

Automatically Sort Excel Columns When Data Changes - But With No Assumptions About My Data

Here is my situation:
I have a spreadsheet in Excel 2010 setup with various columns.
All of the columns are set to be filterable/sortable.
There are filters set individually on some columns.
There is a sort set via Data > Sort.
The filters on the columns and the sorting are not fixed and will change.
The data can be changed in any column
How can I make the spreadsheet automatically apply the filters and sorting whenever data is changed?
Any solution I can find online requires you to manually modify the VBA code to make it work with your specific data. You need to change things like: the range your data is in, the columns that need to be filtered by, the sorting that needs to be applied, etc, etc.
Some examples that I have found that don't live up to my expectations are here, here, and here
Is there a script that I could drop into any Excel spreadsheet that meets my requirements?
I can't check it right now, but adding a new criterion to the existing filter and then removing it should re-filter the range.
Maybe the same would work for sorting.
It could be triggered by worksheet_change() event.
This works when your filter is applied to ListObject:
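Private Sub Worksheet_Change(ByVal Target As Range)
    ' Reapply the table's current filter whenever a cell inside a table changes.
    If Target.ListObject Is Nothing Then Exit Sub
    If Target.ListObject.AutoFilter Is Nothing Then Exit Sub
    Target.ListObject.AutoFilter.ApplyFilter
End Sub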
and does the thing with sorting as well.
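If the sort does not refresh on its own in your Excel version, an alternative version of the handler can reapply it explicitly. This is a hedged sketch, not something tested against the asker's workbook: it assumes the data lives in a table (ListObject) and relies on whatever filter and sort settings the table currently holds, so no ranges or columns are hard-coded. It would replace the handler above rather than sit alongside it.

```vba
Option Explicit

' Sketch: reapply the current filter AND the current sort of the table
' that contains the changed cell. Goes in the worksheet's code module.
' ASSUMPTION: the data is an Excel table (ListObject).
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim tbl As ListObject

    Set tbl = Target.ListObject
    If tbl Is Nothing Then Exit Sub          ' change was outside any table

    On Error GoTo CleanUp
    Application.EnableEvents = False         ' avoid re-entering this handler

    ' Reapply whatever filter criteria are currently set.
    If Not tbl.AutoFilter Is Nothing Then tbl.AutoFilter.ApplyFilter

    ' Reapply whatever sort fields are currently set, if any.
    If tbl.Sort.SortFields.Count > 0 Then tbl.Sort.Apply

CleanUp:
    Application.EnableEvents = True
End Sub
```

Since the handler only reads the table's existing settings, it can be pasted into any sheet module without editing ranges or column names.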

Programmatically control/intercept a Data Table refresh

Background
I have an extremely large data table that takes up to 12 hours to run for around 1 million input scenarios on a high-end 64-bit machine. The scenarios are based on a number of discrete Excel models that are then fed into a financial model for detailed calculations.
To improve the process, I am looking to test and compare the speeds of:
The current manual process
Using VBA to refresh the Data Table (with Calculation, ScreenUpdating etc off)
Running a VBS to refresh the Data Table in an invisible Excel instance
So, I am looking for the best approach to programmatically manage a Data Table
Update: using code in (2) and (3) did not provide a benefit on testing a simple example with a workbook with a single large data table
Rather surprisingly there seems to be very little - possibly no - direct support in VBA for Data Tables
My current knowledge and literature search
QueryTable BeforeRefresh and AfterRefresh Events can be added with this class module code. Intellisense doesn't provide this as an option for Data Tables
Individual PivotTables and QueryTables can be accessed like so: ActiveWorkbook.Sheets(1).QueryTables(1). Not so Data Tables
Eliminating all other Data Tables and then running a RefreshAll was suggested in this MrExcel thread as a workaround.
The workaround is certainly do-able as I only have a single Data Table, but I'd prefer a direct approach if one exists.
Yes, I'm sticking to Excel :)
Please do not suggest other tools for this approach, both the input models and the overarching model that uses the data table are
part of a well established ongoing process that will stay Excel based,
have been professionally audited,
have been streamlined and optimised by some experienced Excel designers
I was simply curious if there was a way to tweak the process by refreshing a specific data table with code, which my initial test results above have concluded no to.
So, you are looking for the best approach to programmatically manage a Data Table.
Well, Excel 2013 does record a macro for me when I manually create a data table, it goes
Selection.Table ColumnInput:=Range("G4")
The signature is
Range.Table(RowInput as Range, ColumnInput as Range) as Boolean
which is documented in Range.Table Method. The Range.Table() function seems to always return true.
This is the only way to create data tables using VBA. But that's all there is to data tables anyway.
AFAIK there is no class or object for data tables, so there is no dt.refresh() or similar method. And there is no collection of data tables you could query. You have to refresh the sheet or recreate the table with Range.Table().
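Recreating the table and then forcing a calculation might be sketched as follows. All addresses are hypothetical: the table is assumed to occupy J4:K24 (input values in J5:J24, the result formula in K4) with G4 as the column input cell, as in the recorded macro above.

```vba
Option Explicit

' Sketch: rebuild a one-variable (column input) data table and recalculate.
' ASSUMPTIONS (hypothetical layout): table range J4:K24 on the active sheet,
' scenario values in J5:J24, result formula in K4, column input cell G4.
Public Sub RebuildAndCalcDataTable()
    Dim ws As Worksheet
    Dim tableRange As Range
    Dim inputCell As Range

    Set ws = ActiveSheet
    Set tableRange = ws.Range("J4:K24")
    Set inputCell = ws.Range("G4")

    ' Clear the old results block as a whole first; Excel refuses to
    ' change only part of an existing data table.
    tableRange.Offset(1, 1).Resize(tableRange.Rows.Count - 1, _
                                   tableRange.Columns.Count - 1).ClearContents

    ' Recreate the data table (same call the macro recorder produces).
    tableRange.Table ColumnInput:=inputCell

    ' Force a recalculation, which is needed if calculation is not automatic.
    Application.Calculate
End Sub
```

With calculation set to xlCalculationSemiautomatic (automatic except for data tables), the data table is only recomputed when a calculation is requested explicitly, which gives some control over when the expensive refresh happens.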
There is a DataTable Interface, but it is related to charts and has nothing to do with Range.Table().
As you mention, you should turn off the usual suspects, i.e.
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
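A sketch of how these settings could be wrapped around the long-running step and restored afterwards, even if an error occurs. The sub name is hypothetical, and the Application.Calculate call is only a placeholder for the actual work.

```vba
Option Explicit

' Sketch: switch off the "usual suspects", do the heavy work, then restore
' whatever the user had before.
Public Sub RunWithFastSettings()
    Dim prevCalc As XlCalculation
    Dim prevScreen As Boolean
    Dim prevStatus As Boolean
    Dim prevEvents As Boolean

    ' Remember the current application state.
    prevCalc = Application.Calculation
    prevScreen = Application.ScreenUpdating
    prevStatus = Application.DisplayStatusBar
    prevEvents = Application.EnableEvents

    On Error GoTo Restore
    Application.ScreenUpdating = False
    Application.DisplayStatusBar = False
    Application.Calculation = xlCalculationManual
    Application.EnableEvents = False

    ' Placeholder for the long-running step, e.g. refreshing the data table.
    Application.Calculate

Restore:
    ' Runs after normal completion and after an error alike.
    Application.Calculation = prevCalc
    Application.ScreenUpdating = prevScreen
    Application.DisplayStatusBar = prevStatus
    Application.EnableEvents = prevEvents
End Sub
```

Restoring the saved values rather than hard-coding True and xlCalculationAutomatic avoids silently changing the user's own settings.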
Try to have as few formulas in your workbook as possible. Remove all formulas not related to the cells you base the data table on. Remove any intermediate results. Ideally, have one cell with one, possibly big, formula.
Example: G4 is your ColumnInput, and it contains =2*G3, with G3 containing =G1+G2; then it is better to put =2*(G1+G2) into G4.
You may have 6 cores in your high-end machine. Divide your scenarios into 6 chunks and have 6 Excel instances calculate them in parallel.