Naming array dimensions in Excel VBA

I'm working with three-dimensional arrays, and it would be neat if I could name the array dimensions. The question marks in the example below give me the idea that this is possible.
Is it, and if so, how does it work? I can't seem to find it anywhere.

The three question marks show you that this array has three dimensions. If there were only one question mark, it would mean that the variable was declared as one-dimensional. This is built into VBA and can't be changed, as far as I know.
I think there's real value in making your code more readable and self-documenting. If I had a three-dimensional array, I would probably create some custom class modules to model the objects I was using.
If your first dimension is a SchoolID, your second dimension is a ClassID, and your third dimension is a StudentID, then code using custom class modules like this
Debug.Print Schools(10).Classes(3).Students(7).Name
is more readable than
Debug.Print arrLeeftijdenG5(10,3,7)
I don't know what you're storing, so it's just an example. Consider using custom class modules to model the real-world objects your code is manipulating. There's a bit more setup involved, but it pays dividends down the road.
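To make the class-module idea concrete, here is a minimal sketch in Python, used only as a runnable stand-in. In VBA each class below would be its own class module, and the keyed collections would typically be Collection or Scripting.Dictionary objects. The names School, SchoolClass, Student and build_schools, and the sample rows, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

# Hypothetical object model for the Schools/Classes/Students example.
# In VBA each class would be a separate class module.

@dataclass
class Student:
    student_id: int
    name: str

@dataclass
class SchoolClass:  # named SchoolClass to avoid confusion with the `class` keyword
    class_id: int
    students: Dict[int, Student] = field(default_factory=dict)

@dataclass
class School:
    school_id: int
    classes: Dict[int, SchoolClass] = field(default_factory=dict)

def build_schools(rows: Iterable[Tuple[int, int, int, str]]) -> Dict[int, School]:
    """Build the hierarchy from flat (school_id, class_id, student_id, name) rows."""
    schools: Dict[int, School] = {}
    for school_id, class_id, student_id, name in rows:
        school = schools.setdefault(school_id, School(school_id))
        school_class = school.classes.setdefault(class_id, SchoolClass(class_id))
        school_class.students[student_id] = Student(student_id, name)
    return schools

# Hypothetical sample data
schools = build_schools([(10, 3, 7, "Anna"), (10, 3, 8, "Bram")])
# Reads like Schools(10).Classes(3).Students(7).Name instead of arr(10, 3, 7)
print(schools[10].classes[3].students[7].name)  # Anna
```

Each index now carries its meaning in the property name (classes, students), which is exactly what a bare three-dimensional array cannot express.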

Related

Creating an Excel User Defined Function to match synonyms from a list

At my job, we manage several rental properties. Each of these properties may go by different names; for example, a property may be called Amber Gateway, Platinum Gateway, The Gateway, etc. We have maybe 500-600 Excel workbooks floating around with different types of information in them, and I might be asked to pull information from any of them.
The lack of a consistent naming methodology prevents me from using a standard Index/Match function to look up data. I'm not sure if this is the best solution, but this has been my stab at solving the problem.
I've created a worksheet that has a list of all the property names in Column A. Any associated names are listed to the right on the same row in Column B, Column C, and so on. Just for simplicity, say there are only 5 properties and all my data is in A1:E5. Then say the property I'm interested in is in F1 and the property list I want to "match" it up against is in G1:G5. So my data would look something like this:
River | Stream | Creek | Brook | Rivulet
Apple | Fruit
Rock | Boulder | Stone | Slab
Candy | Dessert | Sweets
Forest | Trees
Given the word 'boulder' (in F1) and the following list (in G1:G5):
Candy
Fruit
Creek
Slab
Forest
My goal is to return the list position of the synonym 'Slab': in this case, 4.
I think I can use the array formula below in place of MATCH to accomplish this:
{=SUMPRODUCT(--(INDEX(A1:E5,SUMPRODUCT(--(A1:E5=F1)*ROW(A1:E5)),)=G1:G5)*IF(G1:G5<>"",MATCH(G1:G5,G1:G5,0)))}
This formula is a bit unwieldy, and I was hoping to turn it into a UDF to make it easier to work with. I'm unfamiliar with VBA, though, and after a bit of searching I realized that VBA logic works quite differently from Excel formula logic. Specifically, I don't think I can use = to turn my lookup grid into TRUE/FALSE values in VBA the way I do inside the SUMPRODUCT functions. Do I have to learn VBA to implement this as a UDF, or is there another solution? In practice, my lookup grid (A1:E5) will be in an external workbook.
If my attempt is completely off the mark, I'm open to other solutions. I know MATCH supports wildcards, but that wouldn't help with dramatically different names, so I was hoping for something more comprehensive.
This is my first time asking a question here, so please let me know if this belongs in a different area or if there's any matter of etiquette I'm overlooking.
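A minimal Python sketch of what the formula computes, to show that it boils down to two ordinary loops rather than array tricks. The function name synonym_match is hypothetical, and the grid is the sample data above. In a VBA UDF, the same two loops could run over a range's .Value array, with StrComp(a, b, vbTextCompare) = 0 standing in for Excel's case-insensitive =.

```python
from typing import List, Optional, Sequence

def synonym_match(target: str, synonym_grid: Sequence[Sequence[str]],
                  lookup_list: Sequence[str]) -> Optional[int]:
    """Return the 1-based position in lookup_list of the first entry that shares
    a synonym row with target, or None if there is no match.

    Comparisons are case-insensitive, like Excel's = operator.
    """
    key = target.casefold()
    # Step 1: find the synonym row containing the target (the inner SUMPRODUCT/INDEX).
    row = next((r for r in synonym_grid
                if key in {cell.casefold() for cell in r if cell}), None)
    if row is None:
        return None
    names = {cell.casefold() for cell in row if cell}
    # Step 2: first list entry that is one of those synonyms (the outer SUMPRODUCT/MATCH).
    for position, item in enumerate(lookup_list, start=1):
        if item and item.casefold() in names:
            return position
    return None

grid: List[List[str]] = [
    ["River", "Stream", "Creek", "Brook", "Rivulet"],
    ["Apple", "Fruit"],
    ["Rock", "Boulder", "Stone", "Slab"],
    ["Candy", "Dessert", "Sweets"],
    ["Forest", "Trees"],
]
print(synonym_match("boulder", grid, ["Candy", "Fruit", "Creek", "Slab", "Forest"]))  # 4
```

One behavioral difference: if several list entries match the same synonym row, the SUMPRODUCT formula adds their positions together, whereas this sketch returns the first one.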

How to write clear and maintainable code when dealing with tables?

In my projects I often take advantage of tables and their underlying ListObjects and ListColumns. I like them because they're easier to reference and update than bare Range objects. Yet I still haven't found a sane, maintainable way to handle multiple ListObjects that contain many ListColumns and are referenced from every worksheet in a project.
Let's say I have a worksheet (with its (Name) property set to "WorksheetA") that contains a table called TableA with several columns (Column1, Column2, ..., Column10).
Now I want to reference one of the columns from the code of another worksheet. I could do it as follows:
WorksheetA.ListObjects("TableA").ListColumns("Column7")
Using string literals directly like this is bad practice, though, as they're difficult to maintain and prone to errors.
So what now?
I could create a dedicated module to store my strings as constants, for example a module called "Constants":
Public Const TABLE_A As String = "TableA"
Public Const COLUMN7 As String = "Column7"
Then my reference could be converted to:
WorksheetA.ListObjects(Constants.TABLE_A).ListColumns(Constants.COLUMN7)
However, this solution has some disadvantages:
The Constants module would grow ridiculously fast with each table and column added.
The reference itself grows longer and becomes less readable.
All constants related to tables from across all workbooks are thrown into one giant pit.
I could store constants inside WorksheetA, and make them available through Public Functions like:
Private Const TABLE_A As String = "TableA"
Private Const COLUMN7 As String = "Column7"
Public Function GetTableAName() As String
GetTableAName = TABLE_A
End Function
Public Function GetTableA() As ListObject
Set GetTableA = WorksheetA.ListObjects(TABLE_A)
End Function
Public Function GetTableAColumn7() As ListColumn
Set GetTableAColumn7 = GetTableA().ListColumns(COLUMN7)
End Function
This solution actually solves all three problems mentioned above, yet it's still a bit "dirty" and time-consuming, because adding a new table means writing a function for each of its columns.
Do you have a better idea of how to deal with this problem?
EDIT1 (for clarity): Let's assume that users must not change any names (neither table names nor column names). If a user does, it's their fault.
EDIT2 (for clarity): I've used Column7 as column name only as an example. Let's assume that columns have more meaningful names.

Here's my two cents. I'm not a formally trained programmer, but I do get paid to do it, so I guess that makes me a professional.
My first line of defense is to create a class that models the table. I fill the class from the table, and no other code even knows where the data lives. At initialization, I run code like this:
clsEmployees.FillFromListObject wshEmployees.ListObjects(1)
Then in the class, the code looks like
vaData = lo.DataBodyRange.Value
...
clsEmployee.EeName = vaData(i,1)
clsEmployee.Ssn = vaData(i,2)
etc
Only one ListObject per worksheet. That's my rule and I never break it. Anyone with access to the worksheet could rearrange the columns and break my code. If I want to use Excel as a database, and sometimes I do, then that is the risk I take. If it's so critical that I can't take that risk, then I store my data in SQL Server, SQLite, or JET.
Instead of putting the range in an array, I could refer to the ListColumns by name. That way, if someone rearranged the columns, my code would still work. But that introduces the risk that they could rename the columns, so I'm just trading one risk for another. It would make the code more readable, so it may be the trade you want to make. I like the speed of filling from an array, so that's the trade I make.
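The trade-off above can be sketched in Python as a stand-in for the VBA class. Employee, the header names EeName and Ssn, and the sample row are hypothetical. Filling by position mirrors reading vaData(i, 1); filling by header mirrors looking columns up by ListColumn name.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Employee:
    ee_name: str
    ssn: str

def fill_by_position(rows: Sequence[Sequence[str]]) -> List[Employee]:
    """Like reading vaData(i, 1) and vaData(i, 2): fast, but tied to column order."""
    return [Employee(ee_name=row[0], ssn=row[1]) for row in rows]

def fill_by_header(header: Sequence[str], rows: Sequence[Sequence[str]]) -> List[Employee]:
    """Locate each column by its header: survives reordering, but not renaming."""
    name_col = header.index("EeName")  # raises ValueError if the column was renamed
    ssn_col = header.index("Ssn")
    return [Employee(ee_name=row[name_col], ssn=row[ssn_col]) for row in rows]

# Someone swapped the two columns on the worksheet (hypothetical sample data):
header = ["Ssn", "EeName"]
rows = [["000-00-0001", "Pat"]]
print(fill_by_position(rows)[0].ee_name)        # 000-00-0001 (wrong field)
print(fill_by_header(header, rows)[0].ee_name)  # Pat
```

If a header is renamed instead, header.index raises ValueError, which is the other side of the same trade.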
If my project is sufficiently small or is supposed to work directly with ListObjects, then I follow the same rules as I do for any other strings:
I use each string literal in code exactly once.
If I use it more than once, I make a procedure-level constant.
If I use it in more than one procedure, I try to pass it as an argument.
If I can't pass it as an argument, I make a module-level constant.
If the two procedures are in different modules, I first ask myself why two procedures that use the same constant are in different modules. Shouldn't related procedures be in the same module?
If the two procedures really belong in different modules, then I try to pass it as an argument.
If none of that works, then it truly is a global constant, and I set it up in my MGlobals module.
If MGlobals takes up more than about half a screen, I'm doing something wrong and I need to step back and think through what I'm trying to accomplish. And then I make a custom class.

Design pattern to encapsulate generation of columns of tabular data?

Consider the following situation:
A CSV file gets generated from some data with lines like this:
011111;1;1000221;014501;100;343;0;0;0,085;8,5;0;0;0,075;7,5;0;0;0;0
There are a lot more fields, and new fields are added every once in a while.
The code that generates each line is a single 240-line function.
Now I want to refactor this code so that each column gets its own object, with that column's logic encapsulated in it. That would make adding new columns easier and the code more readable.
But what pattern to use here? Composite or Decorator?
Decorator because there's a basic line already and it could get "decorated" with extra columns.
And composite because every line is "composed" of all columns.
What would be a better choice?

If your columns have complex evaluation logic and you want to adhere to SOLID principles (especially the Open/Closed Principle), you could create a base class or interface for your columns with an "EvaluateValue" method. Then you can add new columns by deriving new classes, without changing the existing code. Only the initializer that instantiates all the columns has to be extended by one line (appending the new column), but this is far less error-prone than adding something to a 240-line function. You could also use an IoC/DI container to create all the column instances and avoid even that step.
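A minimal Python sketch of that structure, assuming each column reads its value from a dict-like record. Column, CustomerIdColumn, QuantityColumn, RateColumn and CsvLineBuilder are hypothetical names, and the formatting rules are only loosely modeled on the sample line (zero-padded ID, decimal comma). The builder just holds a list of column objects and joins their values.

```python
from abc import ABC, abstractmethod
from typing import Iterable

class Column(ABC):
    """One CSV column; each subclass encapsulates the logic for its own field."""

    @abstractmethod
    def evaluate_value(self, record: dict) -> str: ...

class CustomerIdColumn(Column):
    def evaluate_value(self, record: dict) -> str:
        return f"{record['customer_id']:06d}"  # zero-padded to six digits

class QuantityColumn(Column):
    def evaluate_value(self, record: dict) -> str:
        return str(record["quantity"])

class RateColumn(Column):
    def evaluate_value(self, record: dict) -> str:
        # Three decimals with a decimal comma, as in the sample line (0,085)
        return f"{record['rate']:.3f}".replace(".", ",")

class CsvLineBuilder:
    def __init__(self, columns: Iterable[Column], separator: str = ";") -> None:
        self.columns = list(columns)
        self.separator = separator

    def build(self, record: dict) -> str:
        return self.separator.join(c.evaluate_value(record) for c in self.columns)

# The only place that changes when a column is added: append another instance here.
builder = CsvLineBuilder([CustomerIdColumn(), QuantityColumn(), RateColumn()])
print(builder.build({"customer_id": 11111, "quantity": 100, "rate": 0.085}))  # 011111;100;0,085
```

Adding a column means writing one new subclass and appending one instance to the list; the existing column classes stay untouched.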

Domain objects presentation properties

Let's say in my domain I have a Money(amount, Currency(name)) value object (for example: new Money(1000, new Currency('USD'))).
However, in my presentation layer (and really only there), I don't want to use the currency name USD, but the symbol ($) instead.
I don't want to overload my value object with presentation properties (besides the symbol, there can also be things like symbol placement).
How do you guys handle this kind of mapping? Should I create some kind of CurrencyPropertyInMemoryRepository and fetch all the info from there? What are my options?

I understand your concern that you want to separate this presentation aspect from your domain data, and if you want to go that way, I think using a repository to map the currency name to its symbol might be a good solution (the symbol lookup could then be done in, for example, a ValueConverter that transforms your model data before it is presented in your UI).
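A sketch of the repository-plus-converter approach in Python. CurrencyPresentationRepository, format_money and the symbol table are hypothetical; format_money plays the role the answer gives to a ValueConverter, so the domain objects carry no presentation data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Currency:
    name: str

@dataclass(frozen=True)
class Money:
    amount: int
    currency: Currency

class CurrencyPresentationRepository:
    """Hypothetical in-memory store of presentation-only currency data."""

    def __init__(self) -> None:
        self._symbols = {"USD": "$", "CAD": "CA$"}  # sample entries only

    def symbol_for(self, currency: Currency) -> Optional[str]:
        return self._symbols.get(currency.name)

def format_money(money: Money, repo: CurrencyPresentationRepository) -> str:
    """Plays the ValueConverter role: called only from the presentation layer."""
    symbol = repo.symbol_for(money.currency)
    if symbol is None:
        return f"{money.amount} {money.currency.name}"  # no symbol known: show the name
    return f"{symbol}{money.amount}"

print(format_money(Money(1000, Currency("USD")), CurrencyPresentationRepository()))  # $1000
```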
But I personally would not have an issue with storing this additional symbol information in the currency value object as well, for two reasons:
The currency symbol is closely tied to the currency itself, so whenever the currency name changes, the symbol might also change. It therefore makes sense to store both pieces of information in the same place, or at least close to each other. With an additional repository, your information is spread over at least two places.
If both pieces of information are in your value object, you can also put additional behavior there (e.g. not every currency has a symbol, so you need some logic to decide what to print instead).
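For comparison, a sketch of this alternative in Python: the symbol lives on the Currency value object, and Money carries the fallback logic for currencies without one. The names are hypothetical, and the "symbol before the amount, name after it" rule is only an assumed example of placement.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Currency:
    name: str
    symbol: Optional[str] = None  # not every currency has a symbol

@dataclass(frozen=True)
class Money:
    amount: int
    currency: Currency

    def __str__(self) -> str:
        # Assumed placement rule: symbol before the amount, name after it.
        if self.currency.symbol is not None:
            return f"{self.currency.symbol}{self.amount}"
        return f"{self.amount} {self.currency.name}"

print(Money(1000, Currency("USD", "$")))  # $1000
print(Money(1000, Currency("XYZ")))       # 1000 XYZ
```

The decision of what to print when a symbol is missing sits next to the data it depends on, which is the second reason given above.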

Long variable names

Let's say I have a variable that contains the number of search engine names in a file. What would you name it?
number_of_search_engine_names
search_engine_name_count
num_search_engines
engines
engine_names
other name?
The first name describes precisely what the variable contains, but isn't it too long? Any advice for choosing variable names, especially how to shorten a name that is too long, or what kind of abbreviations to use?

How about numEngineNames?
Choosing variable names is more art than science. You want something that doesn't take an epoch to type, but long enough to be expressive. It's a subjective balance.
Ask yourself, if someone were looking at the variable name for the first time, is it reasonably likely that person will understand its purpose?

A name is too long when there exists a shorter name that equally conveys the purpose of the variable.
I think engineCount would be fine here. The number of engine names is presumably equal to the number of engines.
See JaredPar's post.

It depends on the scope of the variable. A local variable in a short function is usually not worth a 'perfect name'; just call it engine_count or something like that. Usually the meaning will be easy to spot; if not, a comment might be better than a two-line variable name.
Variables of wider scope, such as member variables or global variables (if those are really necessary!), deserve IMHO a name that is almost self-documenting. Of course, looking up the original declaration is not difficult, and most IDEs do it automatically, but the identifier of the variable should not be meaningless (e.g. number or count).
Of course, all this depends a lot on your personal coding style and the conventions at your work place.

It depends on the context. If it's a local variable, e.g.
int num = text.scan(SEARCH_ENGINE_NAME).size();
then the more explicit the right-hand side of the expression, the shorter the name I'd pick. The rationale is that we're in a limited scope of maybe 4-5 lines and can therefore assume that the reader will be able to connect the short name with the right-hand-side expression. If, however, it's a field of a class, I'd rather be as verbose as possible.
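A small Python illustration of that rationale. The pattern assigned to SEARCH_ENGINE_NAME, count_engine_names and SearchLog are hypothetical; the point is only the contrast between a short local name and a descriptive field name.

```python
import re

# Hypothetical pattern standing in for SEARCH_ENGINE_NAME in the example above.
SEARCH_ENGINE_NAME = r"\b(?:Google|Bing|DuckDuckGo)\b"

def count_engine_names(text: str) -> int:
    # Tiny local scope: the right-hand side already explains what `num` holds.
    num = len(re.findall(SEARCH_ENGINE_NAME, text))
    return num

class SearchLog:
    def __init__(self, text: str) -> None:
        # A field outlives any single expression, so it gets the descriptive name.
        self.search_engine_name_count = count_engine_names(text)

print(SearchLog("Google, then Bing, then Google again").search_engine_name_count)  # 3
```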

See similar question
The primary technical imperative is to reduce complexity. Variables should be named to reduce complexity. Sometimes this results in shorter names, sometimes longer names. It usually corresponds to how difficult it is for a maintainer to understand the complexity of the code.
On one end of the spectrum, you have for-loop iterators and indexes. These can have names like i or j, because they are just that common and simple. Giving them longer names would only cause more confusion.
If a variable is used frequently but represents something more complex, then you have to give it a clear name so that the maintainer doesn't have to relearn what it means every time they encounter it.
On the other end of the spectrum are variables that are used very rarely. You still want to reduce confusion here, but giving it a short name is less important, because the penalty for relearning the purpose of the variable is not paid very often.

When thinking about your code, try to look at it from the perspective of someone else. This will help not only with picking names, but with keeping your code readable as a whole.
Really long variable names will muddle your code's readability, so you want to avoid those. But at the other end of the spectrum, you want to avoid ultra-short names or acronyms like "n" or "ne." Short, cryptic names like these will make someone trying to read your code tear their hair out. One- or two-letter names are usually reserved for small tasks, such as a counter incremented in a for loop.
So what you're left with is a balance between these two extremes. "Num" is a commonly used abbreviation, and any semi-experienced programmer will know what you mean immediately. So something like "numEngines" or "numEngineNames" would work well. In addition to this, you can also put a comment in your code next to the variable the very first time it's used. This will let the reader know exactly what you're doing and helps to avoid any possible confusion.

I'd name it "search_engine_count", because it holds a count of search engines.

Use Esc+_+Esc to write:
this_is_a_long_variable = 42
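Esc+_+Esc and _ are not the same character in Mathematica. Plain _ is pattern syntax (Blank), so it can't appear in a symbol name, whereas Esc+_+Esc inserts \[LetterSpace], a letter-like character that can. That's why you are allowed to use the former but not the latter.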

If it is a local variable in a function, I would probably call it n, or perhaps ne. Most functions only contain two or three variables, so a long name is unnecessary.
