VBA - Run WorksheetFunction on [Range derived] Variant array? - vba

I have a need to run successive passes of built in excel functions on a single matrix of input.
The problem is, the input [range] is what I assume, a pointer constant.
So sure, I can do a WorkSheetFunction calculations on the [range] input and place the output into a variant.
But, I do have a need to run more passes on the variant data. I have a more advanced calculation that is going to run 4 transforms on data that use standard excel functions like average, and median.
Here's my code
Public Function RankECDF(ByRef r_values As Range, Optional ByVal zeroFlag As Boolean = 0) As Variant()
Dim i As Integer, j As Integer, N As Integer, M As Integer
Dim total As Integer
Dim y() As Variant
N = r_values.Rows.Count
M = r_values.Columns.Count
y = r_values.Value 'copy values from sheet into an array
Dim V() As Variant
Dim AltV As Variant
Dim OutV As Variant
Dim OutAltV As Variant
'quite possible to makes the Variant larger to hold the "other arrays"
ReDim V(1 To N, 1 To M)
ReDim AltV(1 To N, 1 To M)
ReDim OutV(1 To N, 1 To M)
ReDim OutAltV(1 To N, 1 To M)
'first pass just checks for zero's. Could speed this process up by implementing the zeroFlag check to skip the double loop
total = WorksheetFunction.Sum(r_values)
For R = 1 To N
For C = 1 To M
If y(R, C) = "" Then
V(R, C) = ""
AltV(R, C) = 0
Else
'would error if cell was ""
'V(R, C) = WorksheetFunction.Average(WorksheetFunction.Rank(y(R, C), r_values, 1), WorksheetFunction.CountIf(r_values, "<=" & y(R, C))) / WorksheetFunction.Count(r_values)
V(R, C) = y(R, C)
AltV(R, C) = y(R, C)
End If
Next C
Next R
'second loop does rankecdf conversions
For RA = 1 To N
For CA = 1 To M
'OutV(RA, CA) = 1
'OutV(RA, CA) = WorksheetFunction.Rank(V(RA, CA), V, 1)
'OutAltV(RA, CA) = 2
'OutAltV(RA, CA) = WorksheetFunction.Average(WorksheetFunction.Rank(y(RA, CA), r_values, 1), WorksheetFunction.CountIf(r_values, "<=" & y(RA, CA))) / WorksheetFunction.Count(r_values)
Next CA
Next RA
If (zeroFlag) Then
RankECDF = AltV
'RankECDF = OutAltV(1 to N, 1 to M)
Else
RankECDF = V
'RankECDF = OutV(N, M)
End If
End Function
The problem can be identified right around here:
OutV(RA, CA) = WorksheetFunction.Rank(V(RA, CA), V, 1)

WorksheetFunction.Rank(y(R, C), r_values, 1)
You cannot put an Array on arg1. Just do:
i = y(R, C)
Then:
WorksheetFunction.Rank(i, r_values, 1)
It worked fine for me

Updated from comments as I see the answer I initially posited misread the problem:
As a general rule, arrays and performing calculations purely in memory are faster than you might think. For one example I used to use the Application.Match function to find the index position of a value in an array, rather than simple brute force iteration. Turns out that iteration was a far faster (up to 10x faster!!!) method. Check out Tim's answer to my question about Matching values in a string array.
I suspect it is the same with rank/sorting. Worksheet functions are expensive. For/Next is not, relatively speaking.
As for the specific needs to rank from an array, there are examples of custom functions which rank and sort arrays, collections, dictionaries, etc. I ultimately end up using a bunch of Chip Pearson's Array helper functions, he has a number of them; which do really cool sh!t like reversing an array, sorting array, determining whether an array is allocated (I use this one a lot) or empty, or all numeric, etc. There are about 30 of them.
here is the code to sort an array.
Note: I did not post his code because there is a lot of it. WHile it appears daunting, because it is a lot of code to re-invent the wheel, it does work and saves a lot of trouble and is very useful. I don't even use these in Excel, since I do most of my dev in PowerPoint now -- I think all of these modules ported over with zero or almost zero debugging on my end. They're really buttoned up quite nicely.
Getting the rank
Once the array is "sorted" then determining the rank of any value within it is trivial and only requires some tweaking since you may want to handle ties appropriately. One common way of dealing with ties is to "skip" the next value, so if there is a two-way tie for 2nd place, the rank would go {1, 2, 2, 4, 5, 6, etc.}
Function GetRank(arr As Variant, val As Variant)
'Assumes arr is already sorted ascending and is a one-dimensional array
Dim rank As Long, i As Long
Dim dictRank As Object
Set dictRank = CreateObject("Scripting.Dictionary")
rank = 0
For i = LBound(arr) To UBound(arr)
rank = rank + 1
If dictRank.Exists(arr(i)) Then
'Do nothing, handles ties
Else
'store the Key as your value, and the Value as the rank position:
dictRank(arr(i)) = rank
End If
If arr(i) = val Then Exit For
Next
GetRank = rank
End Function

Related

Fuzzy string matching optimization (not checking certain words) - Excel VBA function

I have a function in Excel that calculates the Levenshtein Distance between two strings (the number of insertions, deletions, and/or substitutions needed to transform one string into another). I am using this as part of a project I'm working on that involves "fuzzy string matching."
Below you will see the code for the LevenshteinDistance function and a valuePhrase function. The latter exists for the purposes of executing the function in my spreadsheet. I have taken this from what I read in this thread.
'Calculate the Levenshtein Distance between two strings (the number of insertions,
'deletions, and substitutions needed to transform the first string into the second)`
Public Function LevenshteinDistance(ByRef S1 As String, ByVal S2 As String) As Long
Dim L1 As Long, L2 As Long, D() As Long 'Length of input strings and distance matrix
Dim i As Long, j As Long, cost As Long 'loop counters and cost of
'substitution for current letter
Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and
Substitution
L1 = Len(S1): L2 = Len(S2)
ReDim D(0 To L1, 0 To L2)
For i = 0 To L1: D(i, 0) = i: Next i
For j = 0 To L2: D(0, j) = j: Next j
For j = 1 To L2
For i = 1 To L1
cost = Abs(StrComp(Mid$(S1, i, 1), Mid$(S2, j, 1), vbTextCompare))
cI = D(i - 1, j) + 1
cD = D(i, j - 1) + 1
cS = D(i - 1, j - 1) + cost
If cI <= cD Then 'Insertion or Substitution
If cI <= cS Then D(i, j) = cI Else D(i, j) = cS
Else 'Deletion or Substitution
If cD <= cS Then D(i, j) = cD Else D(i, j) = cS
End If
Next i
Next j
LevenshteinDistance = D(L1, L2)
End Function
Public Function valuePhrase#(ByRef S1$, ByRef S2$)
valuePhrase = LevenshteinDistance(S1, S2)
End Function
I am executing this valuePhrase function in a table in one of my sheets where the column and row headers are names of insurance companies. Ideally, the smallest number in any given row (the shortest Levenshtein distance) should correspond to a column header with the name of the insurance company in the table that most closely matches the name of that insurance company in the row header.
My problem is that I am trying to calculate this in a case where the strings in question are names of insurance companies. With that in mind, the code above strictly calculates the Levenshtein distance and is not tailored specifically to this case. To illustrate, a simple example of why this can be an issue is because the Levenshtein distance between two insurance company names can be quite small if they both share the words "insurance" and "company" (which, as you might expect, is common), even if the insurance companies have totally different names with respect to their unique words. So, I may want the function to ignore those words when comparing two strings.
I am new to VBA. Is there a way I can implement this fix in the code? As a secondary question, are there other unique issues that could arise from comparing the names of insurance companies? Thank you for the help!
Your whole question can be replaced by "How do I use the replace function in VBA?". In general, the algorithm in the question looked interesting, thus I have done this for you. Simply add anything in the Array() of the function, it will work (Just write in lower case the values in the array):
Public Function removeSpecificWords(s As String) As String
Dim arr As Variant
Dim cnt As Long
arr = Array("insurance", "company", "firma", "firm", "holding")
removeSpecificWords = s
For cnt = LBound(arr) To UBound(arr)
removeSpecificWords = Replace(LCase(removeSpecificWords), LCase(arr(cnt)), vbNullString)
Next cnt
End Function
Public Sub TestMe()
Debug.Print removeSpecificWords("InsHolding")
Debug.Print removeSpecificWords("InsuranceInsHoldingStar")
End Sub
In your case:
S1 = removeSpecificWords(S1)
S2 = removeSpecificWords(S2)
valuePhrase = LevenshteinDistance(S1, S2)
When I had a similar issue in trying to remove duplicate addresses, I approached the problem the other way and used the Longest Common Substring.
Function DetermineLCS(source As String, target As String) As Double
Dim results() As Long
Dim sourceLen As Long
Dim targetLen As Long
Dim counter1 As Long
Dim counter2 As Long
sourceLen = Len(source)
targetLen = Len(target)
ReDim results(0 To sourceLen, 0 To targetLen)
For counter1 = 1 To sourceLen
For counter2 = 1 To targetLen
If Mid$(source, counter1, 1) = Mid$(target, counter2, 1) Then
results(counter1, counter2) = results(counter1 - 1, counter2 - 1) + 1
Else
results(counter1, counter2) = WorksheetFunction.Max(results(counter1, _
counter2 - 1), results(counter1 - 1, counter2))
End If
Next counter2
Next counter1
'return the percentage of the LCS to the length of the source string
DetermineLCS = results(sourceLen, targetLen) / sourceLen
End Function
For addresses, I've found that about an 80% match gets me close to a hundred percent matches. with insurance agency names (and I used to work in the industry, so I know the problem you face), I might suggest a 90% target or even a mix of the Levenshtein Distance and LCS, minimizing the former while maximizing the latter.

VBA: adding random numbers to a grid that arent already in the grid

Sub FWP()
Dim i As Integer
Dim j As Integer
Dim n As Integer
n = Range("A1").Value
For i = 1 To n
For j = 1 To n
If Cells(i + 1, j) = 0 Then
Cells(i + 1, j).Value = Int(((n ^ 2) - 1 + 1) * Rnd + 1)
ElseIf Cells(i + 1, j) <> 0 Then
Cells(i + 1, j).Value = Cells(i + 1, j).Value
End If
Next j
Next i
I am trying to do a part of a homework question that asks to fill in missing spaces in a magic square in VBA. It is set up as a (n x n) matrix with n^2 numbers in; the spaces I need to fill are represented by zeros in the matrix. So far I have some code that goes through checking each individual cell value, and will leave the values alone if not 0, and if the value is 0, it replaces them with a random number between 1 and n^2. The issue is that obviously I'm getting some duplicate values, which isn't allowed, there must be only 1 of each number.
How do I code it so that there will be no duplicate numbers appearing in the grid?
I am attempting to put in a check function to see if they are already in the grid but am not sure how to do it
Thanks
There are a lot of approaches you can take, but #CMArg is right in saying that an array or dictionary is a good way of ensuring that you don't have duplicates.
What you want to avoid is a scenario where each cell takes progressively longer to populate. It isn't a problem for a very small square (e.g. 10x10), but very large squares can get ugly. (If your range is 1-100, and all numbers except 31 are already in the table, it's going to take a long time--100 guesses on average, right?--to pull the one unused number. If the range is 1-40000 (200x200), it will take 40000 guesses to fill the last cell.)
So instead of keeping a list of numbers that have already been used, think about how you can effectively go through and "cross-off" the already used numbers, so that each new cell takes exactly 1 "guess" to populate.
Here's one way you might implement it:
Class: SingleRandoms
Option Explicit
Private mUnusedValues As Scripting.Dictionary
Private mUsedValues As Scripting.Dictionary
Private Sub Class_Initialize()
Set mUnusedValues = New Scripting.Dictionary
Set mUsedValues = New Scripting.Dictionary
End Sub
Public Sub GenerateRange(minimumNumber As Long, maximumNumber As Long)
Dim i As Long
With mUnusedValues
.RemoveAll
For i = minimumNumber To maximumNumber
.Add i, i
Next
End With
End Sub
Public Function GetRandom() As Long
Dim i As Long, keyID As Long
Randomize timer
With mUnusedValues
i = .Count
keyID = Int(Rnd * i)
GetRandom = .Keys(keyID)
.Remove GetRandom
End With
mUsedValues.Add GetRandom, GetRandom
End Function
Public Property Get AvailableValues() As Scripting.Dictionary
Set AvailableValues = mUnusedValues
End Property
Public Property Get UsedValues() As Scripting.Dictionary
Set UsedValues = mUsedValues
End Property
Example of the class in action:
Public Sub getRandoms()
Dim r As SingleRandoms
Set r = New SingleRandoms
With r
.GenerateRange 1, 100
Do Until .AvailableValues.Count = 0
Debug.Print .GetRandom()
Loop
End With
End Sub
Using a collection would actually be more memory efficient and faster than using a dictionary, but the dictionary makes it easier to validate that it's doing what it's supposed to do (since you can use .Exists, etc.).
Nobody is going to do your homework for you. You would only be cheating yourself. Shame on them if they do.
I'm not sure how picky your teacher is, but there are many ways to solve this.
You can put the values of the matrix into an array.
Check if a zero value element exists, if not, break.
Then obtain your potential random number for insertion.
Iterate through the array with a for loop checking each element for this value. If it is not present, replace the zero element.

Lee-Ready tick test using VBA

I am trying to build Lee-Ready tick test for estimating trade direction from tick data using Excel. I have a dataset containing the trade prices in descending order, and I am trying to build a VBA code that is able to loop over all the 4m+ cells in as efficient manner as possible.
The rule for estimating trade direciton goes as follows:
If Pt>Pt-1, then d=1
If Pt<Pt-1, then d=-1
If Pt=Pt-1, then d is the last value taken by d.
So to give a concrete example, I would like to transform this:
P1;P2;P3;P4
1.01;2.02;3.03;4.04
1.00;2.03;3.03;4.02
1.01;2.02;3.01;4.04
1.00;2.03;3.00;4.04
into this
d1;d2;d3;d4
1;-1;1;1
-1;1;1;-1
1;-1;1;0
0;0;0;0
Fairly straightforward nested loops suffice:
Function LeeReady(Prices As Variant) As Variant
'Given a range or 1-based, 2-dimensional variant array
'Returns an array of same size
'consisiting of an array of the same size
'of trade directions computed according to
'Lee-Ready rule
Dim i As Long, j As Long
Dim m As Long, n As Long
Dim priceData As Variant, directions As Variant
Dim current As Variant, previous As Variant
If TypeName(Prices) = "Range" Then
priceData = Prices.Value
Else
priceData = Prices
End If
m = UBound(priceData, 1)
n = UBound(priceData, 2)
ReDim directions(1 To m, 1 To n) As Long 'implicitly fills in bottom row with 0s
For i = m - 1 To 1 Step -1
For j = 1 To n
current = priceData(i, j)
previous = priceData(i + 1, j)
If current > previous Then
directions(i, j) = 1
ElseIf current < previous And previous > 0 Then
directions(i, j) = -1
Else
directions(i, j) = directions(i + 1, j)
End If
Next j
Next i
LeeReady = directions
End Function
This can be called from a sub or used directly on the worksheet:
Here I just highlighted a block of cells of the correct size to hold the output and then used the formula =LeeReady(A2:D5) (pressing Ctrl+Shift+Enter to accept it as an array formula).
On Edit: I modified the code slightly (by adding the clause And previous > 0 to the If statement in the main loop) so that it can now handle ranges in which come of the columns have more rows than other columns. The code assumes that price data is always > 0 and fills in the return array with 0s as place holders in the columns that end earlier than other columns:

Custom sort routine for unique string A being place after another string B, C, D, etc if string A is found within them

Situation
I have a UDF that works with a range that it is passed that is of variable height and 2 columns wide. The first row will contain text in column 1 and an empty column2. The remainder of column 1 will contain unsorted text with an associated value in the same row in column 2. I need to sort the data such that if some text in column 1 also appears in some other text in column.
Problem
My VBA skills are all self taught and mimimal at best. I remember a few decades ago in university we did bubble sorts and played with pointers, but I no longer remember how we achieved any of that. I do well reading code but creating is another story.
Objective
I need to generate a sort procedure that will produce unique text towards the bottom of the list. I'll try wording this another way. If text in column1 can be found within other text in column, that the original text need to be placed below the other text it can be found in along with its associated data in column 2. The text is case sensitive. Its not an ascending or descending sort.
I am not sure if its a restriction of the UDF or not, but the list does not need to be written back to excel, it just needs to be available for use in my UDF.
What I have
Public Function myFunk(rng As Range) As Variant
Dim x As Integer
Dim Datarange As Variant
Dim Equation As String
Dim VariablesLength As Integer
Dim Variable As String
Datarange = rng.Value
'insert something around here to get the list "rng or Datarange" sorted
'maybe up or down a line of code depending on how its being done.
Equation = Datarange(1, 1)
For x = 2 To UBound(Datarange, 1)
VariablesLength = Len(Datarange(x, 1)) - 1
Variable = Left$(Datarange(x, 1), VariablesLength)
Equation = Replace$(Equation, Variable, Datarange(x, 2))
Next x
myFunk = rng.Worksheet.Evaluate(Equation)
End Function
Example Data
Any help with this would be much appreciated. In that last example I should point out that the "=" is not part of the sort. I have a routine that strips that off the end of the string.
So in order to achieve what I was looking for I added a SWAP procedure and changed my code to look like this.
Public Function MyFunk(rng As Range) As Variant
Dim x As Integer
Dim y As Integer
Dim z As Integer
Dim datarange As Variant
Dim Equation As String
Dim VariablesLength As Integer
Dim Variable As String
'convert the selected range into an array
datarange = rng.Value
'verify selected range is of right shape/size
If UBound(datarange, 1) < 3 Or UBound(datarange, 2) <> 2 Then
MyFunk = CVErr(xlErrNA)
Exit Function
End If
'strip the equal sign off the end if its there
For x = 2 To UBound(datarange, 1)
If Right$(datarange(x, 1), 1) = "=" Then
datarange(x, 1) = Left$(datarange(x, 1), Len(datarange(x, 1)) - 1)
End If
Next x
'sort the array so that a variable does not get substituted into another variable
'do a top down swap and repeat? Could have sorted by length apparently.
For x = 2 To UBound(datarange, 1) - 1
For y = x + 1 To UBound(datarange, 1)
If InStr(1, datarange(y, 1), datarange(x, 1)) <> 0 Then
For z = LBound(datarange, 2) To UBound(datarange, 2)
Call swap(datarange(y, z), datarange(x, z))
Next z
y = UBound(datarange, 1)
x = x - 1
End If
Next y
Next x
'Set the Equation
Equation = datarange(1, 1)
'Replace the variables in the equation with values
For x = 2 To UBound(datarange, 1)
Equation = Replace$(Equation, datarange(x, 1), datarange(x, 2))
Next x
'rest of function here
End Function
Public Sub swap(A As Variant, B As Variant)
Dim Temp As Variant
Temp = A
A = B
B = Temp
End Sub
I sorted by checking to see if text would substitute into other text in the list. Byron Wall made a good point that I could have sorted based on text length. Since I had completed this before I saw the suggestion it did not get implemented though I think it may have been a simpler approach.

LinEst function

I'm trying to teach myself some basic VBA in Excel 2010 and I've come across a problem I can't google myself out of. The objective is to create a button which when pressed, automatically does linest for me and writes the values into an array. So far, this is my code.
Private Sub CommandButton1_Click()
Dim linest As Variant
Dim linestArray(4,1) as Variant
Dim i As Integer, j as Integer
linest = Application.LinEst(Range("U49:U51"), Range("T49:T51"), True, True)
For i = 0 To 4
linestArray(i,0) = accessing values of linest variable fyrst column
Cells(68 + i, 21) = linestArray(i,0)
Next
For j = 0 To 4
linestArray(j,1) = accessing values of linest variable second column
Cells(68 + j, 22) = linestArray(j,0)
Next
End Sub
How do I access the values of variable linest so I can store them to an array and print them? Thank you.
EDIT: I figured it out. Variable linest is already an array! I feel pretty dumb. Sorry, this can be ignored.
New code:
Dim linestArray As Variant
linestArray = Application.LinEst(Range("U49:U51"), Range("T49:T51"), True, True)
For i = 0 To 4
For j = 0 To 1
Cells(68 + i, 21 + j) = linestArray(i + 1, j + 1)
Next
Next
The output of any such formula is a Variant array. So you've got that part right.
For a general approach to these Application. (use WorksheetFunction. instead, it's much faster) type functions is...
Type the function in Excel (as an array formula, Ctrl-Shift-Return, if need be)
The output is an N x M matrix of some sort (N =1 , M =1) for most cases
When you do Var = Application.Linest(xyx), the answer gets stored in Var
Get the size of the Var using Ubound(Var, 1), Ubound(Var, 2) to get number of rows and columns (note that these are base 0 type arrays)
Usually, the size will be one x one. In that case, your variable is stored in Var(0,0) i.e. a zero base multidimensional variant array, the top left element.
Hope this helps.