When are VBA Variables Instantiated - vba

I'm hesitant to ask, but there's no documentation that I can find for VBA.
Relevant (but I don't think a dupe):
C++ When are global variables created?
In Java, should variables be declared at the top of a function, or as they're needed?
C++ Declare variables at top of function or in separate scopes?
and the most likely relevant When are a module's variables in VB.NET instantiated?
I also took a look at C# on programmers.SE.
I think I'm using the word "Instantiate" right, but please correct me if I'm wrong. Instantiating is when a variable is created and allocated the resources it requires? So in VBA I see two ways of doing this.
Everything at the top!
Public Sub ToTheTop()
    Dim var1 As Long
    Dim var2 As Long
    Dim var3 As Long
    var1 = 10
    var2 = 20
    var3 = var1 + var2
    Debug.Print var3
End Sub
Or close to use
Public Sub HoldMeCloser()
    Dim var1 As Long
    var1 = 10
    Dim var2 As Long
    var2 = 20
    Dim var3 As Long
    var3 = var1 + var2
    Debug.Print var3
End Sub
I like to put them closer to use so that it's easier to remember what they are, whereas others might want to get them all out of the way. That's personal preference.
But, I think I remember reading somewhere that the VBE goes through a sub/function and instantiates all the variables before going on to anything else. This would indicate that there's no right way to do this in VBA because the variable scopes in time don't change. Not the scope as in Private vs Public.
Whereas in other languages it seems that scope can change based on placement, and therefore there is a best practice.
I've been searching for this documentation for a while now, but whatever words I'm using aren't pointing me in the right direction, or the documentation doesn't exist.

According to the reference documentation,
When a procedure begins running, all variables are initialized. A numeric variable is initialized to zero, a variable-length string is initialized to a zero-length string (""), and a fixed-length string is filled with the character represented by the ASCII character code 0, or Chr(0). Variant variables are initialized to Empty. Each element of a user-defined type variable is initialized as if it were a separate variable.
When you declare an object variable, space is reserved in memory, but its value is set to Nothing until you assign an object reference to it using the Set statement.
The implication is that regardless of where the variable declaration is stated, the space/memory for it is allocated when the procedure is entered.
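The quoted defaults can be checked with a minimal sketch, assuming a standard module; the procedure name ShowDefaults and the variable names are made up for illustration and do not come from the documentation. Every variable already holds its default on the first executable line, although nothing was ever assigned to it.

```vb
Option Explicit

' Hypothetical demo procedure (names are illustrative only).
Public Sub ShowDefaults()
    Dim total As Long
    Dim caption As String
    Dim fixedText As String * 3
    Dim anything As Variant
    Dim sheetRef As Object

    ' Each comparison below evaluates to True.
    Debug.Print total = 0                            ' numeric starts at zero
    Debug.Print Len(caption) = 0                     ' variable-length string starts as ""
    Debug.Print fixedText = String$(3, vbNullChar)   ' fixed-length string is filled with Chr(0)
    Debug.Print IsEmpty(anything)                    ' Variant starts as Empty
    Debug.Print sheetRef Is Nothing                  ' object variable starts as Nothing
End Sub
```

Running ShowDefaults from the Immediate window should print True five times.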

Variables, constants, and objects are instantiated as follows:
at module level they are instantiated when the application starts, whether they are declared public, private or static
at procedure level (sub/function) they are instantiated when the procedure is executed.
You have to understand that, although it does have a "compiler", VBA is NOT a true compiled language. The compiler is a syntax checker that looks for errors in your code so that you do not run into them at runtime. In MS Access the compiler produces something called p-code, which is a combination of compiled and interpreted code.
As a rule of thumb:
always use option explicit statement (configure your compiler for this)
always declare your variables at one place, on top of your module or sub/function, and avoid doing it in the middle of your code, for the sake of clarity only. This doesn't affect the performance in any way.
avoid using variant data type
Worth a read doc:
Understanding the Lifetime of Variables (official MSDN), Visual/Access Basic Is Both a Compiler and an Interpreter (official MS) and Declaring variables. You might also find this answer I recently gave about the VBA garbage collector interesting.

Related

What are the risks of declaring a variable in the middle of the code?

In almost all the VBA code I see, every variable is declared right after the Sub/Function name line.
I have declared variables in the middle of some of my own code (not inside a loop) and saw no problems.
I usually avoid doing that because most VBA example code declares them right after the first line. I just want to know what the risks are from an expert/experienced VB programmer's point of view.
There are no risks of declaring it in the middle.
The effect of declaring a variable in the middle is that it can only be used after that point and not before (which is scope).
The lifetime of the variable is different: the variable is created (allocated and initialized to its respective flavour of zero) when you enter the procedure, but you may not actually use it until you reach its scope (the point in the procedure where it's declared).
Declaring inside or outside a loop does not make a difference in VB6/A as they do not have block scope, unlike VB.NET.
So there is no performance difference between the two approaches (because all variables are created when you enter the procedure), but there is a difference in usage (you may not use a created variable before its declaration line). If you think that distinction is helpful in making sure you are not using a variable wrongly, declare your variables only where needed. Otherwise you are free to pick any of the two approaches, just apply it consistently (it's probably not a good idea to declare most of the variables in the beginning and then some in the middle).
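The point about loops can be illustrated with a small sketch; the function name CountLoopPasses is hypothetical. Because VB6/VBA has no block scope, a Dim placed inside the loop body is not re-executed on each pass, so the variable keeps its value between iterations and is still visible after the loop.

```vb
Option Explicit

' Hypothetical helper: returns the value of counter after three passes.
Public Function CountLoopPasses() As Long
    Dim i As Long
    For i = 1 To 3
        Dim counter As Long     ' created once at procedure entry, not reset per pass
        counter = counter + 1
    Next i
    CountLoopPasses = counter   ' counter is still in scope after the loop
End Function
```

Calling CountLoopPasses returns 3, not 1, which is the kind of subtle surprise a reader coming from a block-scoped language might not expect.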
Declare your variables when you actually need them. When you have all declarations lumped at the top of the procedure, refactoring becomes much harder. And when you want to double-check a declaration as you read your code (or, perhaps, someone else's), searching for it at the top may again be quite inconvenient, unless your procedure is short.
I would try to declare variables in a location that conveys useful information to the next programmer, over and above being functionally correct. This normally means: follow the scoping rules of the language.
By declaring all variables at the top you are making them available (in scope) for the entire procedure. That increases the work for a reader in the future, trying to understand how they will be used. Better to have them as local as possible.
I would not declare them in a loop, since that has no significance in VB6/VBA, but someone else might find it confusing or misleading, or in the worst case it may cause subtle bugs.
Of course remember that this is not the only coding practice that we should be mindful of - if the procedure is so long that the location of the variable declarations is a big problem, that's a really good sign that the procedure should be broken up into smaller discrete logical blocks. The variable declarations would just be a symptom, not the main cause.
IMO there were many bad programming practices back in the 90s and earlier when VBA/VB6 were invented, but the industry has significantly learned & improved since then. So code from that era (or inspired by it) is often not a good example.
Declaring your variables up front, at the top of your sub/function makes it easy for others (and perhaps for you if you come by the code after, say a month) to read and understand what your code needs to calculate, and what placeholders/variables are required for the code to function.
You can of course declare variables anywhere (as long as you remember not to use a variable unless you have actually declared it first). That can work, and it has no effect whatsoever on the performance of your code (unless your logic includes an early Exit Sub or Exit Function. In this case, there will be a difference in performance depending on if your code does actually allocate memory for the variables or not).
It just isn't good practice to declare some variables at the top then do some work, then declare another set of variables mid-code. There are exceptions of course. When the variable you declared mid-code is for a temporary use, or something like that.
Sub CalculateAge()
    Dim BirthYear As Integer
    Dim CurrentYear As Integer
    'Code to fetch current year
    'Code to get BirthYear from user/or document
    'Code to report result
End Sub
Compare that with the following:
Sub CalculateAge2()
    Dim BirthYear As Integer
    'Code to ask the user or fetch the birth year from the document
    Dim CurrentYear As Integer
    'Code to populate currentYear
    'Code to do the calculation and report result
End Sub
In the first example, there is a clear separation from variables and logic. In the second, everything is mixed.
The first example is a lot easier to read and understand, especially if you use a good naming convention.
If you look at how classes are written or defined, you will see properties usually are first declared, then methods/logic below. This is the common practice used to write code.
PS: In other languages, you can declare and assign variables in the same line. In C# or VB.NET you could say something like:
int Age = CurrentYear - BirthYear; //C#
Dim Age As Integer = CurrentYear - BirthYear 'VB.Net
This is great if you use a lot of temporary variables, that you don't intend to declare ahead of time or maybe it would be more clear if declared mid-logic. But that's not possible in VBA. You need a separate line to declare a variable, and another to assign a value. You end up with a lot of Dim ___ As ___ statements. You might as well move the declaration part somewhere else to reduce distraction while reading the logic. Again, this works best if you use a good and consistent naming convention. If not, you end up in a worse situation like:
Dim w As Integer
Dim a As Integer
a = 42 'we don't know what this variable is for
'but we know its type from the previous line
Some_Lines_Of_code_And_Logic
' more code
' more code
w = 2 'we don't know what (w) is for, and we have to
'look up its declaration to get a hint
'which might be tedious

Variable declaration placement guidelines in VBScript

Is there any rule for placement of variable declaration in VBScript, like if it should always be declared in the beginning? Or can I declare the variable while using it? Which one is more efficient?
Let's try some simple code, with Option Explicit included so that the VBScript parser requires every variable used in the code to be declared:
Option Explicit
WScript.Echo TypeName( data )
WScript.Echo TypeName( MY_DATA )
Dim data : data = 10
Const MY_DATA = 10
WScript.Echo TypeName( data )
WScript.Echo TypeName( MY_DATA )
When executed it will output
Empty
Integer
Integer
Integer
That is
The first access to data does not generate any error. Variable declaration (the Dim statement) is hoisted. If the variable is declared inside the same (or outer) scope where it will be used then there will not be any problem.
But the first output is Empty. Only the declaration is hoisted, not the value assignment, which is not executed until the line containing it is reached.
That does not apply to constant declaration. Its value is replaced in code where it is used but the real declaration is delayed until the const line is reached (read here).
As long as the variables/constants can be reached (they are declared in the same or outer scope) it is irrelevant (to the VBScript parser/engine) where you place the declaration.
But, of course, you or others will have to maintain the code. Being able to put the variables anywhere doesn't mean you should do something like the previous code (please, don't). It is a lot easier to read/maintain the code if variable declaration is done before initialization/usage. The exact way of doing it just depends on coding style.

Why should I use the DIM statement in VBA or Excel?

So there is a question on what DIM is, but I can't find why I want to use it.
As far as I can tell, I see no difference between these three sets of code:
'Example 1
myVal = 2
'Example 2
DIM myVal as Integer
myVal = 2
'Example 3
DIM myVal = 2
If I omit DIM the code still runs, and after 2 or 3 nested loops I see no difference in the output when they are omitted. Having come from Python, I like to keep my code clean*.
So why should I need to declare variables with DIM? Apart from stylistic concerns, is there a technical reason to use DIM?
* also I'm lazy and out of the habit of declaring variables.
Any variable used without declaration is of type Variant. While variants can be useful in some circumstances, they should be avoided when not required, because they:
Are slower
Use more memory
Are more error prone, either through misspelling or through assigning a value of the wrong data type
Using Dim makes the intentions of your code explicit and prevents common mistakes, such as a typo silently creating a new variable. If you put Option Explicit at the top of your module (which I thoroughly recommend), declaring every variable becomes mandatory.
Here's an example of failing to use Dim causing a (potentially bad) problem:
myVar = 100
' later on...
myVal = 10 'accidentally declare new variable instead of assign to myVar
Debug.Print myVar 'prints 100 when you were expecting 10
Whereas this code will save you from that mistake:
Option Explicit
Sub Example()
    Dim myVar As Integer
    myVar = 100
    ' later on...
    myVal = 10 ' compile error: Option Explicit means you *must* declare myVal first
End Sub
More about Dim and Option Explicit here: http://msdn.microsoft.com/en-us/library/y9341s4f.aspx
Moderators, I'm making an effort, assuming you'll treat me with due respect in the future.
All local variables (and most parameters to functions) are stored on the stack, as in most languages. When a sub exits, the stack is returned to how it was before the sub executed, so all of that memory is freed. Strings and objects are stored elsewhere, in an object manager or string manager, and the stack only contains a pointer, but VB looks after freeing them. Setting a VB string (a BSTR) to zero length frees all but two bytes. Global variables are never released this way, which is why we try to avoid them.
In scripting-type programs, typeless programming has many advantages. Programs are short and use few variables, so memory and speed don't matter: it will be fast enough. As programs get more complex, it does matter. VB was designed for typeless programming as well as typed programming. For most Excel macros, typeless programming is fine and is more readable. VBScript only supports typeless programming (and you can paste it into VBA/VB6).

Difference between Long and Object data type in VBA

In VBA, the Long and Object data type are both 4-bytes, which is the size of a memory address. Does this mean that, technically, the Object data type doesn't do anything that a Long couldn't do? If yes, then is it safe to say that the Object data type exists simply to make it easier for the programmer to distinguish between the purpose of the variable?
This question came up as I was considering Win32 API function declarations. They are often times declared as Long, and, unless I am mistaken, their return value is simply a memory address. Seems like defining these functions as Object would have been more appropriate, then.
Am I totally off? Thanks in advance.
Based on VBA/MSDN help:
Long (long integer) variables are stored as signed 32-bit (4-byte)
numbers ranging in value from -2,147,483,648 to 2,147,483,647.
and the other definition:
Object variables are stored as 32-bit (4-byte) addresses that refer to
objects. Using the Set statement, a variable declared as an Object can
have any object reference assigned to it.
From a practical point of view they are different and are used in different situations. The essential distinction: a Long holds a number, while an Object holds a reference to an object.
Look at the following VBA code (for Excel), where I added comments showing what is allowed and what is not:
Sub test_variables()
    Dim A As Object
    Dim B As Long
    'both below are not allowed, throwing exceptions
    'A = 1000
    'Set B = ActiveSheet
    'both are appropriate
    Set A = ActiveSheet
    B = 1000
End Sub
Finally, in terms of the API, it's better to stay with the original declarations and not alter them, to avoid any risk of unexpected behaviour from the API functions.

Why do we declare variables at the start of a module?

VBA is my first language, and I learned it entirely from other people's examples, so I never questioned the standard practice in VBA of grouping all variable declarations at the start of the module, routine or function they are scoped to, as in this example.
Sub Traditional()
    Dim bVariable As Boolean
    Dim OtherVariable
    ' Some code using OtherVariable goes here
    '
    ' Now we use bVariable
    bVariable = True
    Do While bVariable
        bVariable = SomeFunction()
    Loop
End Sub
Now I'm learning that standard practice in other languages is to declare variables as close to where they are used as possible, like this:
Sub Defensive()
    Dim OtherVariable As String
    ' Some code using OtherVariable goes here
    '
    ' Now we use bVariable
    Dim bVariable As Boolean
    bVariable = True
    Do While bVariable
        bVariable = SomeFunction()
    Loop
End Sub
This seems completely sensible to me as a defensive programming practice - in that it limits both span and live time (as explained in Code Complete), so I'm wondering if there is any reason for not doing the same in VBA? Possible reasons I can think of are memory, running time (e.g. repeatedly declaring inside a loop), tradition - arguably a good reason as there must be hundreds of thousands of VBA programmers who expect to see all used variables at the start of the routine. Are there any I've missed that might explain the benefit of this practice or at least where it came from?
I declare all my variables at the top. I think declaring them closer to first use masks (at least) two other problems that should be fixed first.
Procedures too long: If your procedure is longer than fits on a screen, perhaps it's doing too much and should be broken into smaller chunks. You'll also find that unit tests are way easier to write when your procedures are small and only do one thing.
Too many variables: If you have a bunch of related variables, consider using a custom class module or user-defined type. It will make the code more readable and easier to maintain.
If your procedures are short and you're using classes and UDTs, the benefits of declaring the variables at the point of use are lessened or eliminated.
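As a rough sketch of the "too many variables" suggestion, related values can be grouped into a user-defined type so that a procedure needs a single declaration. The type PersonYears and the functions AgeInYears and DemoAge are hypothetical names chosen for illustration, and the code assumes it lives in a standard module.

```vb
Option Explicit

' Hypothetical user-defined type grouping related values.
Private Type PersonYears
    BirthYear As Integer
    CurrentYear As Integer
End Type

' Hypothetical helper; user-defined types must be passed ByRef.
Private Function AgeInYears(ByRef person As PersonYears) As Integer
    AgeInYears = person.CurrentYear - person.BirthYear
End Function

Public Function DemoAge() As Integer
    Dim person As PersonYears    ' one declaration instead of several
    person.BirthYear = 1990
    person.CurrentYear = 2020
    DemoAge = AgeInYears(person)
End Function
```

With the sample values above, DemoAge returns 30. Whether the single Dim sits at the top or next to its first use matters much less once the procedure is this short.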
I think the two ways are just different coding styles in VBA.
In the old C standard (C89), all declarations had to appear at the top of a block. I think many people simply adopted this habit and brought it into other programming languages such as VBA.
Declaring variables at the top is clear for a short list of variable names. It becomes unreadable for a long list of variable names.
Declaring a variable close to where it is used was introduced later. I think this practice has a clear advantage over "declare at the top" for languages that have an optimizer or more scope levels than VBA (for example, where you can declare a variable that is visible only inside a FOR loop). The optimizer can then change the scope for you. (In VBA terms, it may change a GLOBAL variable into a PROCEDURE variable.)
For VBA, I have no preference.