In trying to solve this question, I wrote the following in an attempt to implement the Box-Muller transform to generate random normal variables in pure VBA:
Function RandNorm(Optional mean As Double = 0, Optional sd As Double = 1) As Double
Dim s As Double
s = Sqr(-2 * Log(Rnd())) * Cos(6.283185307 * Rnd()) '6.28 etc. is 2*pi
RandNorm = mean + sd * s
End Function
The following somewhat weak test always works, returning a number close to 0:
Sub test1()
Randomize
Dim s As Double
Dim i As Long
For i = 1 To 17000000
s = s + RandNorm()
Next i
Debug.Print s / 17000000
End Sub
On the other hand, the following test never works (because it tries to take the log of 0, which is undefined):
Sub test2()
Randomize
Dim s As Double
Dim i As Long
Debug.Print Rnd() 'just to clock it
For i = 1 To 17000000
s = s + RandNorm()
Next i
Debug.Print s / 17000000
End Sub
The problem is that rnd() returns 0 on average once out of every 2^24 (a bit less than 17,000,000) calls. It is of course easy enough to tweak the definition of RandNorm to avoid the zero (see the linked-to question), but I am still puzzled by the above code. It would make perfect sense to me if each test failed half the time (when the zero is fed into Log()) and worked half the time (when the zero is fed into Cos()). It seems that Randomize avoids at least half of the possible seeds.
Why does Randomize behave this way? Is there a way to seed the random number generator so that all possible states of the random number generator can occur?
On Edit
If I define the following sub:
Sub ReRandomize()
Dim r As Double
Randomize
If Rnd() > 0.5 Then r = Rnd()
End Sub
And modify test1 and test2 above to use ReRandomize instead of Randomize, both of the test subs will fail 50% of the time, so that might answer the part of the question about if there is "a way to seed the random number generator so that all possible states of the random number generator can occur?" It is still mysterious as to why Randomize behaves the way that it does. This is the second time that an Excel VBA question made me realize that Randomize is a weird sub. None of this matters very much for typical use of rnd(), but it does underscore that it is a somewhat low quality random number generator which shouldn't be used for serious statistical work.
I simply modified the Rnd calc to not include 0 or 1. You have to remember that the Rnd Function can produce a number (of type double) in the range of 0 or 1. Therefore, it's chances of having a duplicate number are pretty low.
dbl1stRnd = Rnd()
dblRnd = (0.9999 - 0.0001) * dbl1stRnd + 0.0001
s = Sqr(-2 * Log(dblRnd)) * Cos(6.283185307 * dblRnd) '6.28 etc. is 2*pi
Some example outputs of the regular Rnd() function with Randomize:
3.633606E-02
0.2324036
0.3460443
0.5870923
5.553758E-02
0.2629338
0.2400494
0.1982901
0.5923058
0.7915452
0.4874671
0.2062811
0.5676001
0.1178594
1.932621E-03
0.4326598
0.8291379
I hope this explains some and is what you are looking for.
Related
I have trouble comparing 2 double in Excel VBA
suppose that I have the following code
Dim a as double
Dim b as double
a = 0.15
b = 0.01
After a few manipulations on b, b is now equal to 0.6
however the imprecision related to the double data type gives me headache because
if a = b then
//this will never trigger
end if
Do you know how I can remove the trailing imprecision on the double type?
You can't compare floating point values for equality. See this article on "Comparing floating point numbers" for a discussion of how to handle the intrinsic error.
It isn't as simple as comparing to a constant error margin unless you know for sure what the absolute range of the floats is beforehand.
if you are going to do this....
Dim a as double
Dim b as double
a = 0.15
b = 0.01
you need to add the round function in your IF statement like this...
If Round(a,2) = Round(b,2) Then
//code inside block will now trigger.
End If
See also here for additional Microsoft reference.
It is never wise to compare doubles on equality.
Some decimal values map to several floating point representations. So one 0.6 is not always equal to the other 0.6.
If we subtract one from the other, we probably get something like 0.00000000051.
We can now define equality as having a difference smaller that a certain error margin.
Here is a simple function I wrote:
Function dblCheckTheSame(number1 As Double, number2 As Double, Optional Digits As Integer = 12) As Boolean
If (number1 - number2) ^ 2 < (10 ^ -Digits) ^ 2 Then
dblCheckTheSame = True
Else
dblCheckTheSame = False
End If
End Function
Call it with:
MsgBox dblCheckTheSame(1.2345, 1.23456789)
MsgBox dblCheckTheSame(1.2345, 1.23456789, 4)
MsgBox dblCheckTheSame(1.2345678900001, 1.2345678900002)
MsgBox dblCheckTheSame(1.2345678900001, 1.2345678900002, 14)
As has been pointed out, many decimal numbers cannot be represented precisely as traditional floating-point types. Depending on the nature of your problem space, you may be better off using the Decimal VBA type which can represent decimal numbers (base 10) with perfect precision up to a certain decimal point. This is often done for representing money for example where 2-digit decimal precision is often desired.
Dim a as Decimal
Dim b as Decimal
a = 0.15
b = 0.01
Late answer but I'm surprised a solution hasn't been posted that addresses the concerns outlined in the article linked in the (currently) accepted answer, namely that:
Rounding checks equality with absolute tolerance (e.g. 0.0001 units if rounded to 4d.p.) which is rubbish when comparing different values on multiple orders of magnitude (so not just comparing to 0)
Relative tolerance that scales with one of the numbers being compared meanwhile is not mentioned in the current answers, but performs well on non-zero comparisons (however will be bad at comparing to zero as the scaling blows up around then).
To solve this, I've taken inspiration from Python: PEP 485 -- A Function for testing approximate equality to implement the following (in a standard module):
Code
'#NoIndent: Don't want to lose our description annotations
'#Folder("Tests.Utils")
Option Explicit
Option Private Module
'Based on Python's math.isclose https://github.com/python/cpython/blob/17f94e28882e1e2b331ace93f42e8615383dee59/Modules/mathmodule.c#L2962-L3003
'math.isclose -> boolean
' a: double
' b: double
' relTol: double = 1e-09
' maximum difference for being considered "close", relative to the
' magnitude of the input values
' absTol: double = 0.0
' maximum difference for being considered "close", regardless of the
' magnitude of the input values
'Determine whether two floating point numbers are close in value.
'Return True if a is close in value to b, and False otherwise.
'For the values to be considered close, the difference between them
'must be smaller than at least one of the tolerances.
'-inf, inf and NaN behave similarly to the IEEE 754 Standard. That
'is, NaN is not close to anything, even itself. inf and -inf are
'only close to themselves.
'#Description("Determine whether two floating point numbers are close in value, accounting for special values in IEEE 754")
Public Function IsClose(ByVal a As Double, ByVal b As Double, _
Optional ByVal relTol As Double = 0.000000001, _
Optional ByVal absTol As Double = 0 _
) As Boolean
If relTol < 0# Or absTol < 0# Then
Err.Raise 5, Description:="tolerances must be non-negative"
ElseIf a = b Then
'Short circuit exact equality -- needed to catch two infinities of
' the same sign. And perhaps speeds things up a bit sometimes.
IsClose = True
ElseIf IsInfinity(a) Or IsInfinity(b) Then
'This catches the case of two infinities of opposite sign, or
' one infinity and one finite number. Two infinities of opposite
' sign would otherwise have an infinite relative tolerance.
'Two infinities of the same sign are caught by the equality check
' above.
IsClose = False
Else
'Now do the regular computation on finite arguments. Here an
' infinite tolerance will always result in the function returning True,
' since an infinite difference will be <= to the infinite tolerance.
'This is to supress overflow errors as we deal with infinity.
'NaN has already been filtered out in the equality checks earlier.
On Error Resume Next
Dim diff As Double: diff = Abs(b - a)
If diff <= absTol Then
IsClose = True
ElseIf diff <= CDbl(Abs(relTol * b)) Then
IsClose = True
ElseIf diff <= CDbl(Abs(relTol * a)) Then
IsClose = True
End If
On Error GoTo 0
End If
End Function
'#Description "Checks if Number is IEEE754 +/- inf, won't raise an error"
Public IsInfinity(ByVal Number As Double) As Boolean
On Error Resume Next 'in case of NaN
IsInfinity = Abs(Number) = PosInf
On Error GoTo 0
End Function
'#Description "IEEE754 -inf"
Public Property Get NegInf() As Double
On Error Resume Next
NegInf = -1 / 0
On Error GoTo 0
End Property
'#Description "IEEE754 +inf"
Public Property Get PosInf() As Double
On Error Resume Next
PosInf = 1 / 0
On Error GoTo 0
End Property
'#Description "IEEE754 signaling NaN (sNaN)"
Public Property Get NaN() As Double
On Error Resume Next
NaN = 0 / 0
On Error GoTo 0
End Property
'#Description "IEEE754 quiet NaN (qNaN)"
Public Property Get QNaN() As Double
QNaN = -NaN
End Property
Updated to incorporate great feedback from Cristian Buse
Examples
The IsClose function can be used to check for absolute difference:
assert(IsClose(0, 0.0001233, absTol:= 0.001)) 'same to 3 d.p.?
... or relative difference:
assert(IsClose(1234.5, 1234.6, relTol:= 0.0001)) '0.01% relative difference?
... but generally you specify both and if either tolerance is met then the numbers are considered close. It has special handling of +-infinity which are only close to themselves, and NaN which is close to nothing (see the PEP for full justification, or my Code Review post where I'd love feedback on this code :)
The Currency data type may be a good alternative. It handles relatively large numbers with fixed four digit precision.
Work-a-round??
Not sure if this will answer all scenarios, but I ran into a problem comparing rounded double values in VBA. When I compared to numbers that appeared to be identical after rounding, VBA would trigger false in an if-then compare statement.
My fix was to run two conversions, first double to string, then string to double, and then do the compare.
Simulated Example
I did not record the exact numbers that caused the error mentioned in this post, and the amounts in my example do not trigger the problem currently and are intended to represent the type of issue.
Sub Test_Rounded_Numbers()
Dim Num1 As Double
Dim Num2 As Double
Let Num1 = 123.123456789
Let Num2 = 123.123467891
Let Num1 = Round(Num1, 4) '123.1235
Let Num2 = Round(Num2, 4) '123.1235
If Num1 = Num2 Then
MsgBox "Correct Match, " & Num1 & " does equal " & Num2
Else
MsgBox "Inccorrect Match, " & Num1 & " does not equal " & Num2
End If
'Here it would say that "Inccorrect Match, 123.1235 does not equal 123.1235."
End Sub
Sub Fixed_Double_Value_Type_Compare_Issue()
Dim Num1 As Double
Dim Num2 As Double
Let Num1 = 123.123456789
Let Num2 = 123.123467891
Let Num1 = Round(Num1, 4) '123.1235
Let Num2 = Round(Num2, 4) '123.1235
'Add CDbl(CStr(Double_Value))
'By doing this step the numbers
'would trigger if they matched
'100% of the time
If CDbl(CStr(Num1)) = CDbl(CStr(Num2)) Then
MsgBox "Correct Match"
Else
MsgBox "Inccorrect Match"
End If
'Now it says Here it would say that "Correct Match, 123.1235 does equal 123.1235."
End Sub
Depending on your situation and your data, and if you're happy with the level of precision shown by default, you can try comparing the string conversions of the numbers as a very simple coding solution:
if cstr(a) = cstr(b)
This will include as much precision as would be displayed by default, which is generally sufficient to consider the numbers equal.
This would be inefficient for very large data sets, but for me was useful when reconciling imported data which was identical but was not matching after storing the data in VBA Arrays.
Try to use Single values if possible.
Conversion to Double values generates random errors.
Public Sub Test()
Dim D01 As Double
Dim D02 As Double
Dim S01 As Single
Dim S02 As Single
S01 = 45.678 / 12
S02 = 45.678
D01 = S01
D02 = S02
Debug.Print S01 * 12
Debug.Print S02
Debug.Print D01 * 12
Debug.Print D02
End Sub
45,678
45,678
45,67799949646
45,6780014038086
In response to this question I ran the following VBA experiment:
Sub Test()
Dim i As Long, A As Variant
Dim count1 As Long, count2 As Long
ReDim A(1 To 10000)
For i = 1 To 10000
Randomize
A(i) = IIf(Rnd() < 0.5, 0, 1)
Next i
'count how often A(i) = A(i+1)
For i = 1 To 9999
If A(i) = A(i + 1) Then count1 = count1 + 1
Next i
For i = 1 To 10000
A(i) = IIf(Rnd() < 0.5, 0, 1)
Next i
'count how often A(i) = A(i+1)
For i = 1 To 9999
If A(i) = A(i + 1) Then count2 = count2 + 1
Next i
Debug.Print "First Loop: " & count1
Debug.Print "Second Loop: " & count2 & vbCrLf
End Sub
When I saw output like this:
First Loop: 5550
Second Loop: 4976
I was pretty sure that I knew what was happening: VBA was converting the system clock into something of lower resolution (perhaps microsecond) which as a consequence would lead to Randomize sometimes producing identical seeds in two or more passes through the loop. In my original answer I even confidently asserted this. But then I ran the code some more and noticed that the output was sometimes like this:
First Loop: 4449
Second Loop: 5042
The overseeding is still causing a noticeable autocorrelation -- but in the opposite (and unexpected) direction. Successive passes through the loop with the same seed should produce identical outputs, hence we should see successive values agreeing more often than chance would predict, not disagreeing more often than chance would predict.
Curious now, I modified the code to:
Sub Test2()
Dim i As Long, A As Variant
Dim count1 As Long, count2 As Long
ReDim A(1 To 10000)
For i = 1 To 10000
Randomize
A(i) = Rnd()
Next i
'count how often A(i) = A(i+1)
For i = 1 To 9999
If A(i) = A(i + 1) Then count1 = count1 + 1
Next i
For i = 1 To 10000
A(i) = Rnd()
Next i
'count how often A(i) = A(i+1)
For i = 1 To 9999
If A(i) = A(i + 1) Then count2 = count2 + 1
Next i
Debug.Print "First Loop: " & count1
Debug.Print "Second Loop: " & count2 & vbCrLf
End Sub
Which always gives the following output:
First Loop: 0
Second Loop: 0
It seems that it isn't the case that successive calls to Randomize sometimes returns the same seed (at least not often enough to make a difference).
But if that isn't the source of the autocorrelation -- what is? And -- why does it sometimes manifest itself as a negative rather than a positive autocorrelation?
Partial answer only, fell free to edit and complete.
Well, there is clearly a correlation when you overuse the Randomize function.
I tried the following code, with a conditional formatting (black fill for values >0.5), and there is clearly patterns appearing (try to comment the Randomize to see a more "random" pattern. (best seen with 20 pt columns and 10% zoom)
Function Rndmap()
Dim i As Long, j As Long
Dim bmp(1 To 512, 1 To 512) As Long
For i = 1 To 512
For j = 1 To 512
' Rnd -1 ' uncomment this line to get a big white and black lines pattern.
Randomize 'comment this line to have a random pattern
bmp(i, j) = IIf(Rnd() < 0.5, 0, 1)
Next j
Next i
Range(Cells(1, 1), Cells(512, 512)) = bmp
End Function
So while the MSDN states that "Using Randomize with the same value for number does not repeat the previous sequence.", implying that if the Timer returns twice the same value, the Rnd should keep on the same random sequence without reseting, there is still some behind the scene link..
Some screenshots:
Rnd() only:
Using Randomize:
Using Rnd -1 and Randomize:
The Randomize method initialises the Rnd function with the current system time as it's seed, you can also specify a number with Randomize to be used as the seed.
I decided to test how long a sequence continues before repeating itself:
Sub randomRepeatTest()
For i = 1 To 100000
Randomize
randomThread = randomThread & Int(9 * Rnd + 1)
If i Mod 2 = 0 Then
If Left(randomThread, i / 2) = Right(randomThread, i / 2) Then
Debug.Print i / 2
Exit Sub
End If
End If
Next i
End Sub
This sub generates a random sequence of the digits 0 - 9, and as the sequence becomes an even length it is tested to see if the first half of the sequence matches the second half, and if so it outputs the length the sequence reached before repeating. After running it a number of times, and discounting where a digit is repeated twice at the beginning, the result comes out at 256 (nice).
Providing any value to Randomize will still return a result of 256.
We're randomizing Rnd every loop, so what's going on here?
Well as I said at the beginning, if no value is given to Randomize, it will use the system time as the seed. The resolution of this time is something I can't seem to find sourced, however I believe it to be low.
I have tested using the value of timer which returns the time of day in seconds to 2 decimal places (e.g. 60287.81). I have also tried GetTickCount which returns the system active time (starts counting at boot) in milliseconds. Both of these still result in the 256 sequence limit.
So, why when we're randomizing every loop does the sequence repeat? Well the reality is, the code is executed within a millisecond. Essentially, we're providing the same number to randomize every loop, and so we're not actually shuffling the seed.
So, is Rnd more random without Randomize?
I ran the above sub again without Randomize; nothing returned. I upped the loop count to 2,000,000; still nothing.
I've managed to source the algorithm used by the workbook Rand formula, which I believe is the same as Rnd with no initialised seed:
C IX, IY, IZ SHOULD BE SET TO INTEGER VALUES BETWEEN 1 AND 30000 BEFORE FIRST ENTRY
IX = MOD(171 * IX, 30269)
IY = MOD(172 * IY, 30307)
IZ = MOD(170 * IZ, 30323)
RANDOM = AMOD(FLOAT(IX) / 30269.0 + FLOAT(IY) / 30307.0 + FLOAT(IZ) / 30323.0, 1.0)
It is an iterative function which uses the result of the previous call to generate a new number. Referenced as the Wichman-Hill procedure, it guarantees that more than 10^13 numbers will be generated before the sequence repeats itself.
The problem with Rnd
For the algorithm to work, it first needs to be initialised with values for IX, IY & IZ. The problem we have here is that we can't initialise the algorithm with random variables, as it is this algorithm we need in order to get random values, so the only option is to provide some static values to get it going.
I have tested this and it seems to be the case. Opening a fresh instance of Excel, ? Rnd() returns 0.70554. Doing the same again returns the exact same number.
So the problem we have is Rnd without using Randomize gives us a much longer sequence of random numbers, however that sequence will start at the same place each time we open Excel. Where functions are dependant on random generation, such as password generation, this doesn't suffice as we will get the same repeated results each time we open Excel.
The solution
Here's a function I have come up with and it seems to work well:
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal Milliseconds As LongPtr)
Public Declare Function GetTickCount Lib "kernel32" () As Long
Public randomCount As Long
Function getRandom()
If randomCount Mod 255 = 0 Then
Sleep 1
End If
Randomize GetTickCount
getRandom = Rnd()
randomCount = randomCount + 1
End Function
It makes use of the GetTickCount function as the Randomize seed. Each call adds 1 to a randomCount variable, and after every 255 runs the macro is forced to sleep for 1 millisecond (although this actually works out at around 15 on my system) so that the seed of GetTickCount will be changed, and so a new sequence of numbers will be returned by Rnd
This of course will return the same sequence if by chance it is used at the same system time, however for most cases it will be a sufficient method for generating more random numbers. If not, it would need some fancy work using something like the Random.Org API.
I'll try and keep this as basic and to the point as possible.
Basically, I have weights/probabilities associated with a certain range of numbers. For example :
0: 10%
1: 50%
2: 15%
3: 25%
This then translates into cumulative probabilities :
0: 10%
1: 60%
2: 75%
3: 100%
Using a uniform RNG in VBA, the program generates numbers between 0 and 1, or whatever inferior limit it is set to. Using the same values as the previous example, but only generating numbers greater than 60% (and <= 100%), this results in numbers between 0.6 - 1.0.
This is where I'm stuck. I need to convert these random numbers very efficiently into their "corresponding values".
All of it is stored in VBA variables and needless to say, I don't want to have to write a Select Case for every situation since they're actually 120 different variables and weights.
As of right now, this is what I have to generate those numbers:
RandomNumber = LowerLimit + Rnd() * (1 - LowerLimit)
Thanks is advance for all your help! If I missed a post that was discussing this particular issue please feel free to refer me to it but I really didn't find anything relating to corresponding random numbers.
Place the following function into a public module. You would call it like so mynumber = WeightedRnd(Array(0, 1, 2, 3), Array(0.1, 0.5, 0.15, 0.25)).
Public Function WeightedRnd(values As Variant, weights As Variant) As Double
'First, calculate the cumulative weights
Dim cumulativeWeight As Double
For i = 0 To UBound(weights)
weights(i) = weights(i) + cumulativeWeight
cumulativeWeight = weights(i)
Next
'Next, generate our random number
Dim randomNumber As Double
randomNumber = Rnd()
'Finally, figure out which "bucket" it falls into
For i = 0 To UBound(weights)
If randomNumber <= weights(i) Then
WeightedRnd = values(i)
Exit Function
End If
Next
End Function
I have a Monte Carlo simulation in VBA. The client wants (dont question why) to fix random number sequence, i.e. every time you run the model, sequence shall stay the same. I managed to fix random seed as described here. BUT it is not the same on different PCs. Any idea why and how can I also fix it on different machines?
You can use the rnd function with a negative argument to achieve a repeating list of random numbers.
Here is a link to the documentation:
http://office.microsoft.com/en-us/access-help/rnd-function-HA001228901.aspx
Note To repeat sequences of random numbers, call Rnd with a negative argument immediately before using Randomize with a numeric argument. Using Randomize with the same value for number does not repeat the previous sequence.
Sub TestRandomNumberSequence()
rnd (-10)
For i = 1 To 5
Randomize 10
MsgBox BetweenRange(1, 20, rnd)
Next i
'always returns the following sequence
'5
'18
'19
'6
'17
End Sub
Function BetweenRange(min As Integer, max As Integer, ByVal rnd As Double) As Integer
BetweenRange = Int((max - min + 1) * rnd + min)
End Function
As per your request, please checkout the following link:
Wabash College Download
Is there an easy way in either language to generate a large set of random data quickly so far all the functions I've tried haven't worked too well when I need to generate a group of say 500,000 characters :( Any ideas?
Use UUIDGen.
Don't. GUIDs aren't really random. You can actually generate large amounts of data very fast using the System.Random class in VB.NET. 500,000 characters/bytes are no problem:
Dim buffer As Byte() = Nothing
Array.Resize(buffer, 500000)
Call New Random().NextBytes(buffer)
My.Computer.FileSystem.WriteAllBytes("filename", buffer, False)
This code takes considerably less than one second.
In VB6 the code would go something like this
Public Function FillRandomCol() as Collection
Dim C As Collection
Dim I As Long
Set C = New Collection
Randomize Timer
For I = 1 To 500000
C.Add RandomChar
Next I
Set FillRandomCol = C
End Sub
Public Function Random(ByVal Number As Integer) As Integer
Random = CLng(Rnd * 1000000) Mod Number + 1
End Function
Public Function RandomChar() As String
Const AlphaNum = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
RandomChar = Mid$(AlphaNum, Random(36), 1)
End Function
Takes 1/2 second on a 2 Core Intel 2.40 GHz computer.
Use UUIDGen. At least the chunks will be bigger.