VBA - what is faster? Case statements or If statements?

I am writing some code for a very large spreadsheet, where certain things need to go in specific places, and I am wondering what sort of statement would be faster and what the pros and cons of each are.
Here is my dilemma -
There is a massive table, which could range from column A to column CD, or some other column, making it disgusting to look at in one go.
I am trying to create a "lastrow" variable (Integer) to describe the bottom row of data.
At the moment I have a very inefficient loop with an If check, reading up from the bottom of each column - something like .Cells(.Rows.Count, col).End(xlUp).Row -
and it currently re-writes the lastrow variable if it finds a larger number. I just want a quick and easy way to find the bottom-most row of data.
Is looping the way forward with this?
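For illustration, a minimal sketch of the looping approach just described (the active sheet and the A:CD column range are assumptions; Long is used instead of Integer so row numbers past 32,767 don't overflow):

Sub FindLastRow()
    Dim ws As Worksheet
    Dim lastRow As Long      ' Long: an Integer overflows past row 32,767
    Dim colRow As Long
    Dim col As Long

    Set ws = ActiveSheet     ' assumption: the table is on the active sheet
    lastRow = 0

    ' Check the bottom of each column from A (1) to CD (82)
    For col = 1 To 82
        colRow = ws.Cells(ws.Rows.Count, col).End(xlUp).Row
        If colRow > lastRow Then lastRow = colRow
    Next col

    MsgBox "Last used row: " & lastRow
End Sub

A common loop-free alternative is ws.Cells.Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row, which scans the whole sheet in one call (it errors on an empty sheet, so guard it).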
Kind Regards,
lewisthegruffalo

Related

Excel VBA using SUMPRODUCT and COUNTIFS - issue of speed

I have an issue of speed. (Apologies for the long post…). I am using Excel 2013 and 2016 for Windows.
I have a workbook that performs 10,000+ calculations on a 200,000 cell table (1000 rows x 200 columns).
Each calculation returns an integer (e.g. count of filtered rows) or more usually a percentage (e.g. sum of value of filtered rows divided by sum of value of rows). The structure of the calculation is variations of the SUMPRODUCT(COUNTIFS()) idea, along the lines of:
=IF($B6=0,
    0,
    SUMPRODUCT(COUNTIFS(
        Data[CompanyName], CompanyName,
        Data[CurrentYear], TeamYear,
        INDIRECT(VLOOKUP(TeamYear&"R2",RealProgress,2,FALSE)), "<>"&"",
        Data[High Stage], NonDom[NonDom]
    ))
    /$B6
)
Explaining the above:
The pair Data[CompanyName] and CompanyName is the column in the table and the condition value for the first filter.
The pair Data[CurrentYear] and TeamYear are the same as above and constitute the second filter.
The third pair looks up an intermediary table and returns the name of a column; the condition ("<>"&"") is 'not blank', i.e. it returns all rows that have a value in this column.
Finally, the fourth pair is similar to the third above but returns a set of values that matches the set of values in the NonDom[NonDom] column.
Lastly, the four filters are joined together with AND logic.
It is important to note that across all the calculations the same principle is applied of using SUMPRODUCT(COUNTIFS()) – however there are many variations on this theme.
At present, using Calculate on a select range of sheets (rather than the slower calculating the whole workbook), yields a speed of calculation of around 30-40 seconds. Not bad, and tolerable as calculations aren’t performed all the time.
Unfortunately, the model is to be extended and could now approach 20,000 rows rather than 1,000. Calculation performance is directly linked to the number of rows or cells, so I expect performance to plummet!
The obvious solution [1] is to use arrays, ideally passing an array, held in memory, to the formula in the cell and then processing it along with the filters and their conditions (the lookup filters being arrays too).
The alternative solution [2] is to write a UDF using arrays, but reading around the internet the opinion is that UDFs are much slower than native Excel functions.
Three questions:
Is solution [1] possible and the best way of doing this? If so, how would I construct it?
If solution [1] is not possible or not the best way, does anyone have any thoughts on how much quicker solution [2] might be compared with my current solution?
Are there other better solutions out there? I know about Power BI Desktop, PowerPivot and PowerQuery – however this is a commercial application for use by non-Excel users and needs to be presented in the current Excel ‘grid’ form of rows and columns.
Thanks so much for reading!
Addendum: I'm going to try running an array calculation for each sheet on the Worksheet.Activate event and see if there's some time savings.
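For reference, the addendum's idea needs nothing more than the sheet's own event handler; a minimal sketch (placed in the code module of each relevant sheet):

Private Sub Worksheet_Activate()
    Me.Calculate   ' recalculate only this sheet when the user switches to it
End Sub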
Reading data into arrays is normally a good idea if you are looking to increase speed. It is done like this:
Dim myTable As ListObject
Dim myArray As Variant
'Point the table variable at the ListObject
Set myTable = ActiveSheet.ListObjects("Table1")
'Read the table body into a 2-D Variant array (1-based)
myArray = myTable.DataBodyRange
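From there, a COUNTIFS-style test runs in memory. A rough sketch of the idea, with made-up column positions and condition values:

Dim r As Long
Dim hits As Long

' myArray(row, column) is 1-based after the assignment above
For r = 1 To UBound(myArray, 1)
    ' illustrative conditions: column 1 = company, column 2 = year
    If myArray(r, 1) = "Acme Ltd" And myArray(r, 2) = 2016 Then
        hits = hits + 1
    End If
Next r
Debug.Print hits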

VBA: Efficient Vlookup from another Workbook

I need to do a VLOOKUP from another workbook on about 400,000 cells with VBA. These cells are all in one column, and the results shall be written into one column. I already know how the VLOOKUP works, but my runtime is much too high using AutoFill. Do you have a suggestion for how I can improve it?
Don't use VLookup; use Index/Match: http://www.randomwok.com/excel/how-to-use-index-match/
If you are able to adjust what the data looks like a slight amount, you may be interested in using a binary search. It's been a while since I last used one (writing code for a group exercise check-in program), but https://www.khanacademy.org/computing/computer-science/algorithms/binary-search/a/implementing-binary-search-of-an-array was helpful in setting up the idea behind it.
If you are able to sort the data in an order, say by last name (I'm not sure what data you are working with), then add an order of numbers to use for the binary search.
Edit:
The reasoning for a binary search is the computational time it takes. The number of iterations it needs is log2(400,000) versus 400,000: instead of up to 400,000 iterations, it would take at most 19 with a binary search. As you can see, the more data you have, the bigger the advantage of the binary search.
This would only be beneficial if you are able to manipulate the data in a way that allows you to use a binary search.
So, if you can give us a bit more background on what data you are using and any restrictions you have with that data we would be able to give more constructive feedback.
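To make the idea concrete, here is a minimal sketch of a binary search over a sorted two-column array (column 1 = key, column 2 = value to return); the two-column layout and the ascending sort are assumptions:

' Returns the column-2 value of the row where column 1 = key,
' or Empty if the key is not found.
' arr must be a 2-D array sorted ascending on column 1.
Function BinaryLookup(arr As Variant, key As Variant) As Variant
    Dim lo As Long, hi As Long, midRow As Long
    lo = LBound(arr, 1)
    hi = UBound(arr, 1)
    Do While lo <= hi
        midRow = (lo + hi) \ 2
        If arr(midRow, 1) = key Then
            BinaryLookup = arr(midRow, 2)
            Exit Function
        ElseIf arr(midRow, 1) < key Then
            lo = midRow + 1        ' key is in the upper half
        Else
            hi = midRow - 1        ' key is in the lower half
        End If
    Loop
End Function

Load the lookup range into an array once (arr = someRange.Value) and call this in a loop over the 400,000 cells; that avoids both AutoFill and repeated sheet reads.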

What do I need to do in order to structure these results?

I'm trying to learn VBA, but it's very different from the type of programming I'm used to. For that reason I would appreciate your guidance.
I want to structure the results I get from simulations. There are screenshots below illustrating what I'm trying to describe with words here:
What I want to do is:
Copy all the results from one sheet to a new sheet (to keep the original data).
Delete certain columns, for instance B & D:E
Move (or copy, doesn't matter) rows 30:38 up beside rows 11:19, with one empty column in between. The result will be as shown in the last figure below. (The number of rows in each block varies, and there are 4 blocks.)
I don't know if these are the recommended procedures, but I know I can:
Delete columns this way:
Sub sbDeleteAColumnMulti()
    Columns("D").Delete
End Sub
Copy/paste a range like this:
Sub copyRangeOver()
    Dim i As Integer
    i = 6
    Dim copyRange As Range
    ' Source: row i+1, columns A:CA, on Sheet2
    Set copyRange = ThisWorkbook.Worksheets("Sheet2").Range("A" & i + 1 & ":CA" & i + 1)
    Dim countD As Integer
    countD = 10
    ' Destination: row countD, column B of the active sheet
    copyRange.Copy Destination:=Cells(countD, 2)
End Sub
A few things that are complicating stuff for me: the headers (except the first one), Bus ( A ) -LL Fault, are shifted one column to the right (they are now above Ik'', not Bus Name).
I don't know in advance how many rows are in each "block", thus I need to check this (but I know there are only 4 "blocks"). All "blocks" are the same size, so I can just check the number of rows between two Bus Names.
Now, I don't want someone to write the code for me! What I hope someone will help me with is to suggest a procedure I can follow for this to work. I know that many roads lead to Rome, and I see that this question might come off as either a "Primarily opinion-based question" or "Too broad". However, I think it's a legitimate question that belongs here. I'm not trying to start a debate over what the "best" way of doing this is; I just want to find a way that works, as I currently don't know where to start. I'm not afraid of "the complicated way", if it's more robust and cleaner.
What I don't know is what kind of Modules, Class Modules (if any) etc. I need. Do I need Collections? Should I create Public/Private Subs? What would the purpose of each of those be in this case?
What I start with: (Edit: none of the cells are merged, it's just a bunch of whitespace)
What I want:
Update:
Here's the first chunk of code I get when recording a macro (note that my workbook has more columns and rows than in the example I gave):
Range("D:I,K:M,O:P").Select
Range("O1").Activate
Selection.Delete Shift:=xlToLeft
ActiveWindow.SmallScroll Down:=39
Range("C52:E78").Select
Selection.Copy
ActiveWindow.SmallScroll Down:=-42
Range("G13").Select
ActiveSheet.Paste
ActiveWindow.SmallScroll Down:=84
Range("C91:E117").Select
To me, this looks like a piece of crap. Of course, it's possible that I should have created the macro differently, but if I did it the right way, I don't think it's much to work with. I guess I can delete all the SmallScroll-lines, but still...
Also, I don't see how I can adapt this, so that it will work if I have a different number of rows in each block.
To get this, you're going to want to start with using the Macro Recorder from Excel.
If you are doing the exact same formatting options for the exact same data output each time, this is by far your best bet. The recorder will copy whatever you do for formatting and write the code you need. It may not be the best code, but it is the best option for what you are describing.
If (when?) you need to start adding logic other than the same formatting, you will then have functional code which will make your life easier.
But isn't the macro recorder going to generate bad code and/or it's better to just code from scratch?
I'm fairly experienced at this point and often use the macro recorder because, while it does put a lot of code there which isn't strictly speaking necessary, it gets you a ton of the more obscure stuff (how do I format the cell border to be this way?, etc.). Of course it's better not to rely only on the recorder, but for your example it's an even better fit: you get all the formatting recorded and can then modify the logic, without having to waste time figuring out the syntax for formatting, deleting columns, etc.
Very few languages offer the ability to basically say, "I want to do what I am doing now programmatically - how can I start?" the way VBA does. You can bypass a lot of annoying syntax issues when learning (especially if you've previously done any sort of coding) and focus right on the logic you want to add. It works out pretty well, honestly.
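As a concrete example of adding that logic, the variable-height blocks could be handled along these lines. This is only a sketch: the "Bus*" header text, the three-column block width, and the one-column gap are all assumptions to adapt:

Sub RearrangeBlocks()
    Dim ws As Worksheet
    Dim headerRows As Collection
    Dim r As Long, i As Long
    Dim blockHeight As Long

    Set ws = ActiveSheet
    Set headerRows = New Collection

    ' 1. Record the row of each block header (assumed to start with "Bus")
    For r = 1 To ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
        If ws.Cells(r, "A").Value Like "Bus*" Then headerRows.Add r
    Next r

    ' 2. All blocks are the same size, so height = gap between headers
    blockHeight = headerRows(2) - headerRows(1)

    ' 3. Copy blocks 2..4 next to block 1, one empty column apart
    For i = 2 To headerRows.Count
        ws.Cells(headerRows(i), "A").Resize(blockHeight, 3).Copy _
            Destination:=ws.Cells(headerRows(1), "A").Offset(0, (i - 1) * 4)
    Next i
End Sub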

VBA: Performance of multidimensional List, Array, Collection or Dictionary

I'm currently writing code to combine two worksheets containing different versions of data.
Hereby I first want to sort both via a Key Column, combine 'em and subsequently mark changes between the versions in the output worksheet.
As the data already amounts to several tens of thousands of lines and might some day exceed the rows-per-worksheet limit of Excel, I want these calculations to run outside of a worksheet. It should also perform better.
Currently I'm thinking of a Quicksort of the first and second data sets and then comparing them per key/line, using the result of the comparison to format the cells in the output worksheet accordingly.
Question
I'd just love to know, whether I should use:
List OR Array OR Collection OR Dictionary
OF Lists OR Arrays OR Collections OR Dictionaries
I have as of now been unable to determine the differences in codability and performance between these 16 possibilities. Currently I'm implementing an Array OF Arrays approach, constantly wondering whether this makes sense at all.
Thanks in advance, appreciate your input and wisdom!
Some time ago, I had the same problem with a client's macro. In addition to the really big number of rows (over 50,000 and growing), it had the problem of becoming tremendously slow from a certain row number (around 5,000) when a "standard approach" was taken, that is, when the inputs for the calculations on each row were read from the same worksheet (a couple of rows above); this process of reading and writing was what made the process slower and slower (apparently, Excel starts from row 1, and the lower the row, the longer it takes to reach it).
I improved this situation with two different changes: firstly, setting a maximum number of rows per worksheet; once it was reached, a new worksheet was created and the reading/writing continued there (from the first rows). The other change was moving the reading from Excel to temporary .txt files, with only the writing going to Excel (all the lines were read right at the start to populate the files). These two modifications improved the speed a lot (from half an hour to a couple of minutes).
Regarding your question, I wouldn't rely too much on arrays with a macro (although I am not sure how much information each of these 10,000 lines contains); but I guess that this is a personal decision. I don't like Collections much because they are less efficient than arrays, and the same goes for Dictionaries.
I hope that this "short" comment will be of some help.
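For what it's worth, a Dictionary keyed on the key column is often the simplest in-memory structure for exactly this compare-two-versions task. A rough sketch, with the sheet names and column positions as placeholder assumptions:

Sub CompareVersions()
    Dim oldData As Variant, newData As Variant
    Dim oldRows As Object          ' Scripting.Dictionary, late-bound
    Dim r As Long

    oldData = Worksheets("Old").UsedRange.Value
    newData = Worksheets("New").UsedRange.Value
    Set oldRows = CreateObject("Scripting.Dictionary")

    ' Index the old rows by the key in column 1 (row 1 = headers)
    For r = 2 To UBound(oldData, 1)
        oldRows(oldData(r, 1)) = r
    Next r

    ' Compare column 2 of each new row against the old version
    For r = 2 To UBound(newData, 1)
        If oldRows.Exists(newData(r, 1)) Then
            If newData(r, 2) <> oldData(oldRows(newData(r, 1)), 2) Then
                Worksheets("New").Cells(r, 2).Interior.Color = vbYellow
            End If
        End If
    Next r
End Sub

Reading both ranges into Variant arrays up front keeps the comparison itself off the sheet; only the formatting of changed cells touches Excel.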

Is there a way for VBA UDF to "know" what other functions will be run?

Assume I have a UDF that will be used in a worksheet 100,000+ times. Is there a way, within the function, for it to know how many more times it is going to be called in the batch? Basically what I want to do is have every function create a to-do list of work to do. I want to do something like:
IF remaining functions to be executed after this one = 0 then ...
Is there a way to do this?
Background:
I want to make a UDF that will perform SQL queries with the user just giving parameters (date, hour, node, type). This is pretty easy to make if you're willing to actually execute the SQL query every time the function is run. I know it's easy because I did this, and it was ridiculously slow. My new idea is to have the function first see if the data it is looking for exists in a global cache variable, and if it doesn't, to add it to a global "job-list" variable.
What I want is for the last function called to then go through the job list, perform the fewest number of SQL queries possible, and fill the global cache variable. Once the cache variable is full, it would do a table refresh so that all the other functions get called again; on the subsequent call they'll find the data they need in the cache.
Firstly:
VBA UDF performance is extremely sensitive to the way the UDF is coded:
see my series of posts about writing efficient VBA UDFs:
http://fastexcel.wordpress.com/2011/06/13/writing-efficient-vba-udfs-part-3-avoiding-the-vbe-refresh-bug/
http://fastexcel.wordpress.com/2011/05/25/writing-efficient-vba-udfs-part-1/
You should also consider using an Array UDF to return multiple results:
http://fastexcel.wordpress.com/2011/06/20/writing-efiicient-vba-udfs-part5-udf-array-formulas-go-faster/
Secondly:
The 12th post in this series outlines using the AfterCalculate event and a cache
http://fastexcel.wordpress.com/2012/12/05/writing-efficient-udfs-part-12-getting-used-range-fast-using-application-events-and-a-cache/
Basically the approach you would need is for the UDF to check the cache, and if the data is not current or available, add a request to the queue. Then use the after-calculation event to process the queue and, if necessary, trigger another recalc.
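In outline, that pattern might look like the following. This is a heavily simplified sketch: the cache and queue are module-level Dictionaries, RunSqlFor is a hypothetical batched SQL helper, and the wiring of Application.AfterCalculate to ProcessQueue (a WithEvents class) is omitted:

' Standard module
Private cache As Object      ' key -> query result
Private jobs As Object       ' key -> True (pending requests)

Public Function CachedQuery(dateVal As Date, node As String) As Variant
    Dim key As String
    key = dateVal & "|" & node

    If cache Is Nothing Then Set cache = CreateObject("Scripting.Dictionary")
    If jobs Is Nothing Then Set jobs = CreateObject("Scripting.Dictionary")

    If cache.Exists(key) Then
        CachedQuery = cache(key)        ' hit: serve from the cache
    Else
        jobs(key) = True                ' miss: queue the request
        CachedQuery = CVErr(xlErrNA)    ' placeholder until the batch runs
    End If
End Function

' Call this from an Application.AfterCalculate handler
Public Sub ProcessQueue()
    Dim key As Variant
    If jobs Is Nothing Then Exit Sub
    If jobs.Count = 0 Then Exit Sub
    For Each key In jobs.Keys
        cache(key) = RunSqlFor(key)     ' hypothetical: one batched lookup
    Next key
    jobs.RemoveAll
    Application.CalculateFull           ' recalc so the UDFs now hit the cache
End Sub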
Performing 100,000 SQL queries from an Excel spreadsheet seems like a poor design. Creating a caching mechanism on top of them seems to compound the problem, making it more complicated than it probably needs to be. There are some circumstances where this might be appropriate, but I would consider other design approaches instead.
The most obvious is to take the data from the Excel spreadsheet and load it into a table in the database. Then use the database to do the processing on all the rows at once. The final step is to read the result back into Excel.
I find that the best way to get large numbers of rows from Excel into a database is to save the Excel file as csv and bulk insert them.
This approach may not work for your problem. In general, though, set-based approaches running in the database are going to perform much better.
As for the caching mechanism, if you have to go down that route, I can imagine a function that has the following pseudo-code:
Check if input values are in cache.
If so, read values from cache.
Else do complex processing.
Load values in cache.
This logic could go in the function. As @Bulat suggests, though, it is probably better to add an additional caching layer around the function.