Background
I have an extremely large data table that takes up to 12 hours to run for around 1 million input scenarios on a high-end 64-bit machine. The scenarios are based on a number of discrete Excel models that are then fed into a financial model for detailed calculations.
To improve the process, I am looking to test and compare the speeds of:
The current manual process
Using VBA to refresh the Data Table (with Calculation, ScreenUpdating etc off)
Running a VBS to refresh the Data Table in an invisible Excel instance
So, I am looking for the best approach to programmatically manage a Data Table
Update: using code for approaches (2) and (3) did not provide a benefit when tested on a simple example, a workbook containing a single large data table.
Rather surprisingly, there seems to be very little - possibly no - direct support in VBA for Data Tables.
My current knowledge and literature search
QueryTable BeforeRefresh and AfterRefresh events can be added with class module code along the lines of the sketch below. IntelliSense doesn't offer these events for Data Tables.
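For reference, the usual class module pattern looks roughly like this minimal sketch (the class and variable names are my own placeholders):

' Class module, e.g. named clsQtEvents (placeholder name)
Private WithEvents qt As Excel.QueryTable

Public Sub Hook(ByVal target As Excel.QueryTable)
    Set qt = target               ' attach the events to one QueryTable
End Sub

Private Sub qt_BeforeRefresh(Cancel As Boolean)
    Debug.Print "Refresh starting: " & qt.Name
End Sub

Private Sub qt_AfterRefresh(ByVal Success As Boolean)
    Debug.Print "Refresh finished, success = " & Success
End Sub

An instance of the class then has to be kept alive (e.g. in a module-level variable) and hooked up to the QueryTable you want to watch.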
Individual PivotTables and QueryTables can be accessed like so: ActiveWorkbook.Sheets(1).QueryTables(1). Not so for Data Tables.
Eliminating all other Data Tables and then running a RefreshAll was suggested in this MrExcel thread as a workaround.
The workaround is certainly do-able as I only have a single Data Table, but I'd prefer a direct approach if one exists.
Yes, I'm sticking to Excel :)
Please do not suggest other tools for this; both the input models and the overarching model that uses the data table:
are part of a well-established ongoing process that will stay Excel based,
have been professionally audited,
have been streamlined and optimised by some experienced Excel designers.
I was simply curious whether there was a way to tweak the process by refreshing a specific data table with code; my initial test results above suggest there is not.
So, you are looking for the best approach to programmatically manage a Data Table.
Well, Excel 2013 does record a macro for me when I manually create a data table; it goes
Selection.Table ColumnInput:=Range("G4")
The signature is
Range.Table(RowInput As Range, ColumnInput As Range) As Boolean
which is documented in Range.Table Method. The Range.Table() function seems to always return true.
This is the only way to create data tables using VBA. But that's all there is to data tables anyway.
AFAIK there is no class or object for data tables, so there is no dt.refresh() or similar method. And there is no collection of data tables you could query. You have to refresh the sheet or recreate the table with Range.Table().
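So if you want to force just that one table to recalculate from code, about the only direct option is to rebuild it with Range.Table(). A minimal sketch, where the sheet name and ranges are assumptions for illustration only (inputs in B2:B100, the formula being evaluated in C1, and G4 as the column input cell, as in the recorded macro above):

Sub RebuildDataTable()
    Dim ws As Worksheet
    Set ws = Worksheets("Scenarios")                        ' hypothetical sheet holding the data table
    ws.Range("C2:C100").ClearContents                       ' clear the old {=TABLE(...)} results (the whole block, so Excel allows it)
    ws.Range("B1:C100").Table ColumnInput:=ws.Range("G4")   ' recreate the data table, which forces it to recalculate
End Sub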
There is a DataTable Interface, but it is related to charts and has nothing to do with Range.Table().
As you mention, you should turn off the usual suspects, i.e.
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
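Remember to restore these settings when the run finishes; a minimal wrapper sketch (the sheet name "Scenarios" is a placeholder for wherever your data table lives):

Sub RecalcWithSettingsOff()
    Dim prevCalc As XlCalculation
    prevCalc = Application.Calculation
    Application.ScreenUpdating = False
    Application.DisplayStatusBar = False
    Application.EnableEvents = False
    Application.Calculation = xlCalculationManual
    Worksheets("Scenarios").Calculate        ' recalculate the sheet holding the data table
    Application.Calculation = prevCalc
    Application.EnableEvents = True
    Application.DisplayStatusBar = True
    Application.ScreenUpdating = True
End Sub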
Try to have as few formulas in your workbook as possible. Remove all formulas not related to the cells you base the data table on. Remove any intermediate results. Ideally, have one cell with one, possibly big, formula.
Example: if G4 is your ColumnInput and it contains =2*G3, with G3 containing =G1+G2,
then it is better to put =2*(G1+G2) directly into G4.
You may have 6 cores in your high-end machine. Divide your scenarios into 6 chunks and
have 6 Excel instances calculate them in parallel.
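How you launch those instances is up to you; one rough sketch is to shell out to separate Excel processes, each opening a pre-prepared chunk workbook that kicks off its own calculation (for example from Workbook_Open). The EXE path and the chunk workbook names below are assumptions to adapt to your setup:

Sub LaunchChunksInParallel()
    Dim i As Long
    For i = 1 To 6
        ' /x asks Excel to start a new, separate process for each chunk workbook.
        Shell """C:\Program Files\Microsoft Office\Office15\EXCEL.EXE"" /x " & _
              """C:\Models\Chunk" & i & ".xlsm""", vbMinimizedNoFocus
    Next i
End Sub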
Related
I am working with a huge Excel file that is updated with a set of macros. The file also contains a large number of graphs to ensure easy output checks.
However, when I re-calculate the workbook it is extremely slow.
My question is: Do these graphs contribute to slowing down the calculation of the model? If so, is there a quick VBA way to only update graphs at the end of the overall calculation?
Without seeing your workbook this is hard to answer.
Most likely, it is not the charts (is that what you call "graphs"?) that are slowing down the recalc, but inefficient formulas.
Check the chart data sources. If they point to worksheet cells, then all is good. If they point to named ranges / named formulas, then check what these formulas are.
Recalculation is affected by
volatile formulas like Today(), Now(), Indirect(), Offset() and a few others
inefficient formulas that needlessly repeat calculations that have already been performed, typically seen with running totals
An example of this would be
=SUM(A$1:A2) copied down.
In each row, the calculation starts in row 1 and goes down to the current row. This is a waste of effort.
A much more efficient formula is in column C, where just the value from the row above is added to the value of the current row.
=SUM(C1,A2)
These details can make a heck of a difference.
For more information you may want to refer to Charles Williams' site http://www.decisionmodels.com/calcsecrets.htm and the pages linked from there.
It's a complex subject and can probably not be addressed in a simple answer to a seemingly simple question.
I'm trying to run a duplicate check in which varying data is pulled from a website and compared to a master list, the master list being stored in Excel. The information from the website is read from a table which has line breaks. These breaks are carried over to the data collection they are initially stored in. Some of the data from the website is eventually written to the master list in Excel. So when I read the master list back into Blue Prism to run a duplicate check, the rows that have line breaks are written into a collection as multiple rows (e.g. I should have only 7 rows in my collection but am getting 42). Since the rows are not EXACTLY the same between the 2 collections, the automation does not recognize the duplicates when it runs.
The easiest way to solve this would be if I could make the collection rows have no line breaks as soon as the data is read. I've attempted to use the calculation stage to do so with no luck. I'm not sure if it is actually possible to do this, but would appreciate any direction.
Record an Excel macro to do the data sorting/cleaning in Excel (possibly Text To Columns, etc.) and then include the running of that macro as part of your Blue Prism process by using an action stage and the MS Excel VBO - Run Macro action; a sketch of such a cleanup macro is below. Get the process to create an Excel instance (and create a handle data item from that stage), then use Open Workbook (on whatever workbook you store your macro in) and then use MS Excel VBO - Run Macro (use the same handle created earlier and type in the name of the macro).
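A minimal sketch of what such a cleanup macro could look like, assuming the pasted data sits on a sheet named "Master" (the sheet name is a placeholder, and the macro simply flattens embedded line breaks):

Sub CleanLineBreaks()
    Dim c As Range
    For Each c In Worksheets("Master").UsedRange     ' "Master" is a placeholder sheet name
        If VarType(c.Value) = vbString Then
            ' Replace embedded carriage returns / line feeds with a space, then
            ' collapse repeated spaces, so each record stays on one row.
            c.Value = Application.WorksheetFunction.Trim( _
                Replace(Replace(c.Value, vbCr, " "), vbLf, " "))
        End If
    Next c
End Sub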
It sounds like what is happening is that the MS Excel VBO is grabbing the data from the Excel Worksheet wholesale.
This is to say that it's accessing your Worksheet table, copying the cell values BUT not the cell formatting data, and then dumping the values into a BP collection.
Since it did not bring along any of the original cell formatting data to reference when it went to populate the collection, it's just breaking up the values based on carriage-return/line breaks. Thus, your collection is organized based on those, and not on the original Worksheet cells.
So, with that said, on to a solution!
Solution 1
Brute force the organization of the incoming Excel cell data to the collection by looping over the Excel Worksheet cell-by-cell.
Run a loop, and in that loop have BP go into the Excel Worksheet and grab the first populated cell it comes across. Run a formatting/cleanup Calculation stage over the data. Dump the cell value into a single collection field.
Repeat.
This is... inelegant, expensive at best, and not at all recommended for any medium to large dataset. But it's definitely the best way to do string manipulation and value comparisons before the data hits your collection. Since it sounds like you're using a Master template, you also know what the expected format of your data should be.
This method will enable you to implement Trim(), Concat(), or Split() in a Calculation stage to better organize your incoming data before you dump it into a collection.
This is also basically what I think you're already trying to do, but cell-by-cell instead of Worksheet row-by-row or table-by-table.
Solution 2
Clean up the table data you grab from the website before you dump it into the Excel Worksheet.
This is basically Solution 1, but in reverse. Simply format/clean up your data before it hits your Excel Worksheet.
I'm not sure this is any better than Solution 1, but, you know, it's something...
Solution 3
Format the cell data IN the MS Excel Worksheet itself.
Basically rearrange the cells and cell data in the Excel Worksheet into a more predictable format by using the Split, Trim, Merge, or other actions included in the MS Excel VBO. You can also do this using the Data - OLEDB utility object, but that requires some pretty solid understanding of SQL syntax.
This would look like this using the MS Excel VBO:
Grab the Excel Worksheet data wholesale and dump into a collection
Count the rows/fields of the collection
Is that number consistent with the desired/expected format of your data?
If not, have the bot go back into the Excel Worksheet and reformat the cells by removing any carriage returns/line breaks/whatever else
Repeat.
However, I'm always reluctant to reformat any original source, as it's then hard to figure out what went wrong and where when you've changed the original structure of your data. So it's best to always make a copy of the Worksheet before you do any manipulation.
Unfortunately I don't have access to my BP environment at the moment or I'd provide you with the exact object actions you'd need to do any of this, my bad. Once I do, I'll update this answer.
I will briefly explain what I have and need here, and later if I can, I will edit this post and add a reproducible example.
My project:
Query data from Oracle databases into one worksheet in Excel, then use a LOOKUP procedure to copy data into an editable table in a second worksheet. The second worksheet needs to be in a table format for filtering, and have a drop down option to filter the data by date ranges. The data needs to be refreshed 1-2 times a week only by 1-2 approved staff members.
Concerns:
Per a suggestion, I installed Power Query for Excel 2010, which required dependencies before it could work. The convenience factor is great, and it means SQL queries can be edited without messing around in VBA code. However, the dependency setup (an Oracle client for the data connections) limits casually deploying this as a solution.
The data connections, the queries, and the data lookup could all be done in VBA with assigned macros.
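For the all-VBA route, the refresh side can be as small as this sketch (the connection name "OracleData" and the sheet name "Editable" are placeholders for whatever the workbook actually uses):

Sub RefreshOracleData()
    ThisWorkbook.Connections("OracleData").Refresh   ' re-run the Oracle query behind the worksheet
    Worksheets("Editable").Calculate                 ' recalc the LOOKUP sheet so it picks up the new rows
End Sub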
Questions:
Should I use Power Query to query the data and then a VBA for the second sheet LOOKUP and date range filtering -- or should this all be written in VBA Excel Macros?
Which is more future-proof? Are there any advantages to using Power Query that would make this task more edit-friendly for non-coders?
Thanks!
This can probably be solved with Power Query only, without VBA. I wouldn't recommend storing queries in an Excel table; it is best to move them to the server. A view or a function would be suitable. Querying the database and editing this view/function will then work only for approved users.
This is more secure and requires only one Excel workbook. In Power Query, you can refer to the old copy of the table at the moment you refresh it, so you can keep the entered data and still get the new rows.
Your project seems to me like an ad-hoc solution.
Data tables in Excel are really nice to see how numbers would evolve if one of your input changes. However:
they're (very) slow
the variable you edit is not dynamic; i.e. changing the row or column input cell to =INDIRECT(<dynamic address>) works but doesn't update if the dynamic address changes.
they force you to use an annoying structure
the variable needs to be in the same sheet as the table
Because I want to do a lot of them, points 1 and 2 make it very impractical to work with data tables. Is there any way to make a function that does this in VBA? I tried a UDF that changes the variable's value (e.g. =whatif(<hypothetical output>, <variable cell>, <variable hypothetical value>)), but that's not allowed in VBA (see How to prevent VBA function from re-executing inside the code).
I am quite sure you can achieve your goals using subs rather than UDFs. The concept is to, basically, treat the worksheet like one huge function:
the sub sets up a group of input cells
the sub allows the worksheet to calculate
the sub monitors the output cell(s)
If the goal is some kind of non-linear analysis (like sensitivity analysis), the sub compares successive runs. If the goal is some kind of optimization, the sub drives the inputs to achieve a result. If the goal is some kind of statistical analysis or modelling, the sub can drive a Monte Carlo simulation.
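As a minimal sketch of that idea, a sub sweeps the input cell, lets Excel recalculate, and records the output. The sheet names and cell addresses here are pure placeholders:

Sub WhatIfSweep()
    ' "Model" holds the calculation, B1 is the variable cell, B10 the output cell,
    ' and "Results" receives one row per hypothetical input value -- all placeholders.
    Dim ws As Worksheet, out As Worksheet
    Dim i As Long, testValue As Double
    Set ws = Worksheets("Model")
    Set out = Worksheets("Results")
    Application.ScreenUpdating = False
    For i = 1 To 100
        testValue = i * 0.01                            ' hypothetical input values to test
        ws.Range("B1").Value = testValue                ' set the input cell
        ws.Calculate                                    ' let the worksheet act as the "function"
        out.Cells(i, 1).Value = testValue
        out.Cells(i, 2).Value = ws.Range("B10").Value   ' capture the output
    Next i
    Application.ScreenUpdating = True
End Sub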
Hey. I urgently need some help building up a complete table using VBA.
I have got data from two separate sources and I need to put them into the same table based on one common category, "Car". I've uploaded an example here: http://www.speedyshare.com/files/23058842/abc.xls
Quick and dirty, to do this one time, you can use the VLOOKUP worksheet function. Enter =VLOOKUP(B2,$F$2:$I$6,2,FALSE) into D2 and =VLOOKUP(B2,$F$2:$I$6,3,FALSE) into E2 in your example, then copy/paste the two cells down to see what I mean.
Writing the equivalent VBA will take a few more minutes. If the solution is reusable let me know and I'll spend the requisite few minutes, but be advised the VBA will vary slightly depending on the real sources of each of the tables of data (in their own worksheets? in their own workbooks?)
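In the meantime, a rough sketch of what that VBA might look like, assuming the same single-sheet layout as the formulas above (keys in column B starting at row 2, lookup table in F2:I6); adjust the ranges once the real sources are known:

Sub FillFromLookup()
    Dim ws As Worksheet, r As Long, matchRow As Variant
    Set ws = ActiveSheet
    ' Walk down the keys in column B and pull the matching values from the F:I table.
    For r = 2 To ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
        matchRow = Application.Match(ws.Cells(r, "B").Value, ws.Range("F2:F6"), 0)
        If Not IsError(matchRow) Then
            ws.Cells(r, "D").Value = ws.Range("G2:G6").Cells(matchRow, 1).Value   ' 2nd column of the table
            ws.Cells(r, "E").Value = ws.Range("H2:H6").Cells(matchRow, 1).Value   ' 3rd column of the table
        End If
    Next r
End Sub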