VBA excel finding the statistical mode of a collection - vba

So I'm trying to analysis some data in Excel and having some trouble finding the most frequent numbers. I have an unknown number of locations which can have an unknown number of donations. For example
Brantford $50.00
Brantford $25.00
Brantford $50.00
Windsor $200.00
Quebec $25.00
Quebec $100.00
Quebec $50.00
Quebec $50.00
Quebec $25.00
Quebec $50.00
Quebec $50.00
Quebec $25.00
Quebec $100.00
Quebec $40.00
Windsor $140.00
Windsor $20.00
Windsor $20.00
So I need to use VBA to find for each location the count, sum, mean and mode (has to be done through VBA, can't just write instructions on how to do this using advanced filters/pivot tables :().
So right now using VBA I have a dictionary object that stores the location name as a key and each donation in a collection. Using the count of the collection I have the count, can easily loop through the collection for the sum, using those I have the mean; but, I am not sure the most efficient way to get the mode.
I know I can find it if my data was in an array using Application.mode, but that doesn't appear to work for collections :(. Converting a collection to an array though to find the mode though really doesn't strike me as the most efficient solution. Only other option i can find of though is to sort the collections then loop through them to find the mode.
So wondering if anyone knows of a good way to find the statistical mode of a collection?
Dim locdata As Object
Set locdata = CreateObject("scripting.dictionary")
For counter = 2 To max
mykey = Cells(counter, loccol).value
If Not (locdata.exists(mykey)) Then
locdata.Add (mykey), New Collection
End If
locdata(mykey).Add (Cells(counter, donamountcol).value)
Next counter
For Each k In locdata.keys
locname = k
Cells(counter, 1) = k
Cells(counter, 2) = locdata(k).Count
donationtotal = 0
For Each donvalue In locdata(k)
donationtotal = donationtotal + donvalue
Next donvalue
Cells(counter, 3) = donationtotal
Cells(counter, 4) = donationtotal / CDbl(locdata(k).Count)
'Cells(counter, 5) = Application.mode(locdata(k)) doesn't work :(
counter = counter + 1
Next k
edit: Ideally the output should be (using Quebec as an example)
Quebec: Count: 10 Sum: 515 Average: 51.5 Mode: 50

Here is how you can have values in a range into aarray dynamically. And I would use CountIF in the VBA to find the most frequent objects by their names.. Since you don't know the location names or the donations. Then array is the way to go.
Dim ar as Variant
Dim endRow as Long
'get last row in the range
endRow = Sheets(1).Cells(Sheets(1).Rows.Count, "A").End(xlUp).Row
'ar = WorksheetFunction.Transpose(Shets(1).Range("A1:A12")
'using endrow
ar = WorksheetFunction.Transpose(Shets(1).Range("A1").resize(endRow).value)
UPDATE: The subroutine below uses one iteration (for loop) to find the Mode..
Sub FrequencyByLocDonations()
Dim ar As Variant, dc As Object
Dim rngInput As Range, mxRng As Range
Dim endRow As Long, i As Integer
Dim counts As Double, maxLoc As Double
Dim maxLocation As String
Set dc = CreateObject("Scripting.Dictionary")
'-- When you know the range
' ar = WorksheetFunction.Transpose(Shets(1).Range("A1:A12").Value
'get last row in the range when you don't know but the starting cell
endRow = Sheets(3).Cells(Sheets(3).Rows.Count, "C").End(xlUp).Row
Set rngInput = Sheets(3).Range("C2").Resize(endRow - 1, 1)
'--you may also use that set rngInput as well
' WorksheetFunction.Transpose(rngInput).Value
'-- using endrow-1 to not to take an extra blank row at the end
ar = WorksheetFunction.Transpose(Sheets(3).Range("C2").Resize(endRow - 1, 2).Value)
For i = LBound(ar, 2) To UBound(ar, 2)
If Not (dc.exists(ar(1, i))) Then
counts = Application.WorksheetFunction.CountIf(rngInput, ar(1, i))
If counts >= maxLoc Then
maxLocation = ar(1, i)
maxLoc = counts
End If
dc.Add ar(1, i), counts
End If
Next i
'-- output to the Sheet
Sheets(3).Range("C2").Offset(0, 2).Resize(UBound(dc.keys) + 1, 1) = _
Application.Transpose(dc.keys)
Sheets(3).Range("C2").Offset(0, 3).Resize(UBound(dc.items) + 1, 1) = _
Application.Transpose(dc.items)
Sheets(3).Range("C2").Offset(0, 4) = "Most Frequent Location :" _
& maxLocation & "; " & maxLoc
Set dc = Nothing
End Sub
output:

I have run into a similar situation in the past. It seemed to me there was a very powerful VBA function missing from excel - equivalent to the "where" statement in MySQL.
So I wrote a very simple one myself... This lacks a lot of functionality, but it will allow you to do what you are asking for, while minimizing the amount of code you write.
Fundamental concept: you can return an array from a function call, and Excel built-in functions can operate on such an array as they would on a function. Thus, if you have a function that returns "all the numbers of which I want the mode", then =MODE(myfunction()) will give you the answer you want.
I chose to call my function subset(criteria, range1, range2).
In its most simple form it returns the elements in range2 that correspond to elements in range1 that meet criteria.
This is NOT extensively tested, but I hope you get the idea.
By the way, you can enter this as an array formula (shift-ctrl-enter) in multiple cells; in that case you get the first returned element in the first cell, etc. Sometimes that's a useful trick when you have a function that needs to return more than one value (e.g. a range) - but for this case you only need the result to feed to another function.
Option Explicit
' Function subset(criteria, range1, range2)
' Return an array with the elements in range2 that correspond to
' elements in range1 that match "criteria"
' where "criteria" can be a string, or a value with a < = > sign in front of it
' example: =subset("bravo", A1:A10, B1:B10)
' returns all cells from B that corresponds to cells in A with "bravo"
' =subset("<10", A1:A10, B1:B10) returns all cells in B corresponding to
' cells in A with a value < 10
' This is analogous to the "where" function in SQL, but much more primitive
Function subset(criteria As String, range1 As Range, range2 As Range)
Dim c
Dim result
Dim ii, jj As Integer
On Error GoTo etrap
If range1.Cells.Count <> range2.Cells.Count Then Exit Function
ReDim result(1 To range1.Cells.Count)
ii = 1
jj = 1
For Each c In range1.Cells
If compare(c.Value, criteria) = 0 Then
result(ii) = range2.Cells(jj).Value
ii = ii + 1
End If
jj = jj + 1
Next c
If ii > 1 Then
ReDim Preserve result(1 To ii - 1)
subset = result
Else
subset = Nothing
End If
Exit Function
etrap:
MsgBox "Error " & Err.Description
End Function
Private Function compare(a, b)
' type of a decides what kind of comparison we do
If TypeName(a) <> TypeName("hello") Then
' use numerical comparison
compare = Not (Evaluate(a & b))
Else
' use string comparison
compare = StrComp(a, b, vbTextCompare)
End If
End Function

I actually just decided to make a dictionary of dictionaries. So I have the locations and each location than has a dictionary of the count of each donation amount. Was easy enough to compare counts that way to find the mode.

Related

Excel VBA - Expand range from using just one column to using multiple

I have a working piece of code that looks at a columns value, copies those values, and strips off the 'speed' component string of that value - Turning '200 Mbps' into just '200', etc.
They updated the source data on me and the values are now in three columns - AC, AD, AE instead of just AC now. So values can exist in either column and any row, can be Gbps and Mbps, etc.
At the end of the day, I need the total of the three columns and X number of rows. I have sample data below.
How can I (or can I even) modify this existing code to account for the extra two columns. I am wondering if the Dictionary approach is correct at this point. It was originally added at someone else's suggestion.
Dim Cla As Range
With CreateObject("scripting.dictionary")
For Each Cla In wbFrom.Sheets("Sheet0").Range("AC9", Range("AC" & Rows.Count).End(xlUp))
Cla.Value = Replace(Cla.Value, " Mbps", "")
Cla.Value = Replace(Cla.Value, " Gbps", "")
If Not .exists(Cla.Value) Then
.Add Cla.Value, Nothing
End If
Next Cla
wbTo.Sheets("Sheet1").Range("D13").Resize(.Count).Value = Application.Transpose(.keys)
End With
I don't really understand the If and With loops and how they combine with keys and Transpose like this. ((Thanks TinMan for the info))
I've tried moving this out, but having this outside the loop breaks the code. Is there something in this section that I need to update?
If Not .exists(Cla.Value) Then
.Add Cla.Value, Nothing
End If
Next Cla
Some sample data looks like this: Notice each element is on its own row.
AC AD AE
300
123
72
200
101
The 300 gets paste where it belongs but nothing else adds up or get grabbed I think. Also, when the data looks like THIS, it pastes two values instead of just one:
Notice the 300 and 123 are now on the same line, 300 gets paste into the destination cell and 123 gets paste into two cells below that.
AC AD AE
300 123
72
200
101
Example 1
Use Range.Resize() to extend the number of columns targeted.
For Each Cla In wbFrom.Sheets("Sheet0").Range("AC9", wbFrom.Sheets("Sheet0").Range("AC" & Rows.Count).End(xlUp)).Resize(, 3)
Next
Example 2
Set a helper variable to the Range and use Range.Resize() to extend the number of columns targeted.
Dim Target As Range
With wbFrom.Sheets("Sheet0")
Set Target = .Sheets("Sheet0").Range("AC9", .Range("AC" & Rows.Count).End(xlUp)).Resize(, 3)
Debug.Print Target.Address
End With
Addendum
This will strip away the " Mbps" and " Gbps" as well as insert the sum the a;; the numbers into Range("D13").
With wbFrom.Sheets("Sheet0")
Set Target = .Range("AC9", .Range("AC" & .Rows.Count).End(xlUp)).Resize(, 3)
For Each Cla In Target
Cla.Value = Replace(Cla.Value, " Mbps", "")
Cla.Value = Replace(Cla.Value, " Gbps", "")
Next Cla
.Range("D13").Value = WorksheetFunction.Sum(Target)
End With
I'm not sure that you are describing the problem correctly. Clearly, using the dictionary, get you a list of unique distinct values. But you stated that:
At the end of the day, I need the total of the three columns and X
number of rows.
The existing code leads me to believe that you are going to be doing a count of different speeds...how many 200 Mbps, how many 72 Mpbs, etc.
Either that, or the previous code didn't work as intended.
Assuming that you described the problem correctly and all you want is the total bandwidth then this should do the trick...
Dim LastRow As Long, Value As Long, Sum As Long, Count As Long
' Get the last row, looking at all 3 columns "AC:AE"
LastRow = FindLastRow(wbFrom.Sheets("Sheet0").Range("AC:AE"))
' Iterate through all 3 columns
For Each Cla In wbFrom.Sheets("Sheet0").Range("AC9:AE" & LastRow)
' Use Val() to get just the numeric value and the inline IIF() statment to automatically adjust the speed
Value = Val(Cla.Value) * IIf(InStr(Cla.Value, "Gbps") > 0, 1000, 1)
' Check if there is a Value, if so, Add it to the Sum (and increment the count)
If Value > 0 Then
Sum = Sum + Value
Count = Count + 1
End If
Next Cla
' Write the Sum to the other Workbook (not sure if you need the Count)
wbTo.Sheets("Sheet1").Range("D13") = Sum
And a function to find the last cell in a range (even if the list is filtered)
Public Function FindLastRow(r As Range) As Long
' Works on Filtered Lists/Hidden rows
Const NotFoundResult As Long = 1 ' If all cells are empty (no value, no formula), this value is returned
FindLastRow = r.Worksheet.Evaluate("IFERROR(LARGE(ROW('" & r.Worksheet.Name & "'!" & r.Address & ")*--(NOT(ISBLANK('" & r.Worksheet.Name & "'!" & r.Address & "))),1)," & NotFoundResult & ")")
End Function
From the comments on your question I gather that you want the sum of the original inputs that are contained in columns AC, AD and AE. Such sum you want to store it at cell d13. Since I have limited input, this is the least ugly code I can provide:
nRow1 = ActiveSheet.Cells(ActiveSheet.Rows.Count, "AC").End(xlUp).Row
nRow2 = ActiveSheet.Cells(ActiveSheet.Rows.Count, "AD").End(xlUp).Row
nRow3 = ActiveSheet.Cells(ActiveSheet.Rows.Count, "AE").End(xlUp).Row
nRow = Application.Max(nRow1, nRow2, nRow3)
input_range = Range("AC9:AE" & nRow)
acum = 0
For Each cell In input_range
If Not IsEmpty(cell) Then
temp = Split(cell, " ")
acum = acum + CInt(temp(0))
End If
Next
Range("D13").Value = acum

Pass user input from excel cells to an Array

I am very new to VBA, so I apologize if this is a very simple question. I am trying to pass user input data into an array. Actually, 4 different arrays. All 4 arrays can have up to 3 elements, but could only need one at any given time. They are then sorted a specific way via For Loops and then will output the sendkeys function to the active window (which will not be excel when it is running). I have the for loops figured out and it is sorting the way i need it to. I just need to be able to get the user input into those arrays and then output them to a phantom keyboard (i.e. sendkeys). I appreciate any help or advice!
FYI, I have declared the arrays as strings and the variables as long... the message boxes are there to just test the sort, they are not very important
For i = 0 To UBound(SheetPosition)
If j = UBound(Position) Then
j = 0
End If
For j = 0 To UBound(Position)
If k = UBound(Direction) Then
k = 0
End If
For k = 0 To UBound(Direction)
If l = UBound(Temper) Then
l = 0
End If
For l = 0 To UBound(Temper)
MsgBox(i)
MsgBox(SheetPosition(i))
MsgBox(j)
MsgBox(Position(j))
MsgBox(k)
MsgBox(Direction(k))
MsgBox(l)
MsgBox(Temper(l))
Next
Next
Next
Next
you could use Application.InputBox() method in two ways:
Dim myArray As Variant
myArray = Application.InputBox("List the values in the following format: " & vbCrLf & "{val1, val2, val3, ...}", Type:=64) '<--| this returns an array of 'Variant's
myArray = Split(Application.InputBox("List the values in the following format: " & vbCrLf & "val1, val2, val3, ...", Type:=2), ",") '<--| this returns an array of 'String's
Yes, you could get the input from the user using Input boxes:
myValue = InputBox("Give me some input")
Or forms, which is the preferred method. Unfortunately, forms take some time to develop and are best deployed through Excel add-ins, which also require time to learn how to setup.
Here is a good tutorial on using the SendKeys method:
http://www.contextures.com/excelvbasendkeys.html
The usual way of getting data from cells into an array would be:
Dim SheetPosition As Variant
SheetPosition = Range("A1:A3").Value
or perhaps
Dim SheetPosition As Variant
SheetPosition = Range("A1:A" & Cells(Rows.Count, "A").End(xlUp).Row).Value
A few things to note:
The array needs to be dimensioned as a Variant.
The dimension of the array will be rows x columns, so in the first example above SheetPosition will be dimensioned 1 To 3, 1 To 1, and in the second example it might be dimensioned 1 To 5721, 1 To 1 (if the last non-empty cell in column A was A5721)
If you need to find the dimensions of a multi-dimensioned array, you should use UBound(SheetPosition, 1) to find the upper bound of the first dimension and UBound(SheetPosition, 2) to find the upper bound of the second dimension.
Even if you include Option Base 0 at the start of your code module, the arrays will still be dimensioned with a lower bound of 1.
If you want a single dimensioned array and your user input is in a column, you can use Application.Transpose to achieve this:
Dim SheetPosition As Variant
SheetPosition = Application.Transpose(Range("A1:A3").Value)
In this case SheetPosition will be dimensioned 1 To 3.
If you want a single dimensioned array and your user input is in a row, you can still use Application.Transpose to achieve this, but you have to use it twice:
Dim SheetPosition As Variant
SheetPosition = Application.Transpose(Application.Transpose(Range("A1:C1").Value))
FWIW - Your If statements in the code in the question are not achieving anything - each of the variables that are being set to 0 are going to be set to 0 by the following For statements anyway. So your existing code could be:
For i = LBound(SheetPosition) To UBound(SheetPosition)
For j = LBound(Position) To UBound(Position)
For k = LBound(Direction) To UBound(Direction)
For l = LBound(Temper) To UBound(Temper)
MsgBox i
MsgBox SheetPosition(i)
MsgBox j
MsgBox Position(j)
MsgBox k
MsgBox Direction(k)
MsgBox l
MsgBox Temper(l)
Next
Next
Next
Next

Lee-Ready tick test using VBA

I am trying to build Lee-Ready tick test for estimating trade direction from tick data using Excel. I have a dataset containing the trade prices in descending order, and I am trying to build a VBA code that is able to loop over all the 4m+ cells in as efficient manner as possible.
The rule for estimating trade direciton goes as follows:
If Pt>Pt-1, then d=1
If Pt<Pt-1, then d=-1
If Pt=Pt-1, then d is the last value taken by d.
So to give a concrete example, I would like to transform this:
P1;P2;P3;P4
1.01;2.02;3.03;4.04
1.00;2.03;3.03;4.02
1.01;2.02;3.01;4.04
1.00;2.03;3.00;4.04
into this
d1;d2;d3;d4
1;-1;1;1
-1;1;1;-1
1;-1;1;0
0;0;0;0
Fairly straightforward nested loops suffice:
Function LeeReady(Prices As Variant) As Variant
'Given a range or 1-based, 2-dimensional variant array
'Returns an array of same size
'consisiting of an array of the same size
'of trade directions computed according to
'Lee-Ready rule
Dim i As Long, j As Long
Dim m As Long, n As Long
Dim priceData As Variant, directions As Variant
Dim current As Variant, previous As Variant
If TypeName(Prices) = "Range" Then
priceData = Prices.Value
Else
priceData = Prices
End If
m = UBound(priceData, 1)
n = UBound(priceData, 2)
ReDim directions(1 To m, 1 To n) As Long 'implicitly fills in bottom row with 0s
For i = m - 1 To 1 Step -1
For j = 1 To n
current = priceData(i, j)
previous = priceData(i + 1, j)
If current > previous Then
directions(i, j) = 1
ElseIf current < previous And previous > 0 Then
directions(i, j) = -1
Else
directions(i, j) = directions(i + 1, j)
End If
Next j
Next i
LeeReady = directions
End Function
This can be called from a sub or used directly on the worksheet:
Here I just highlighted a block of cells of the correct size to hold the output and then used the formula =LeeReady(A2:D5) (pressing Ctrl+Shift+Enter to accept it as an array formula).
On Edit: I modified the code slightly (by adding the clause And previous > 0 to the If statement in the main loop) so that it can now handle ranges in which come of the columns have more rows than other columns. The code assumes that price data is always > 0 and fills in the return array with 0s as place holders in the columns that end earlier than other columns:

VBA Filter Function for dynamic array doesn't seem to be filtering on occasion

I am writing a subroutine in VBA to cycle through all the listed job numbers in a multi-tab time sheet and create a list of all job numbers that have been used (so it takes the original list (with possibly multiple job number occurrences) and creates a list with only one occurrence of each job number. The job numbers on each sheet are found in range("A8:A30"). The code below seems to work for the first several job names on the sample that I'm testing, but then seems to stop filtering. A8:A21 of the first sheet is:
14GCI393
14GCI393
13GCI373
13GCI373
13GCI388
13GCI367:2
14GCI408
14GCI408
13GCI373
13GCI388
14GCI415
14GCI415
00GCI000
And the code is:
Sub listusedjobs()
Dim usedjobs() As String
Dim nextjob As String
Dim i, m, n, lastsheetindexnumber As Integer
Application.ScreenUpdating = False
lastsheetindexnumber = ThisWorkbook.Sheets.Count
m = 0
ReDim usedjobs(m)
usedjobs(m) = "initialize"
For i = 1 To lastsheetindexnumber
Sheets(i).Activate
For n = 8 To 30
nextjob = Range("A" & n).Value
If Not IsInArray(nextjob, usedjobs) Then 'determine if nextjob is already in usedjobs()
ReDim usedjobs(m)
usedjobs(m) = nextjob 'Add each unique job to array "usedjobs"
Sheets(lastsheetindexnumber).Cells(m + 40, 1).Value = nextjob 'Print job name that was just added
m = m + 1
End If
Next n
Next i
Application.ScreenUpdating = True
End Sub
Function IsInArray(stringToBeFound As String, arr As Variant) As Boolean
IsInArray = (UBound(Filter(arr, stringToBeFound, , vbTextCompare)) > -1)
End Function
Any help figuring out what is going wrong will be much appreciated! The current output I get for this code is below and contains multiple doubles.
14GCI393
13GCI373
13GCI388
13GCI367:2
14GCI408
13GCI373
13GCI388
14GCI415
00GCI000
I think that your problem may be not using ReDim Preserve inside your If Not

Excel VBA - Perform Operations on visible cells only

I have a database that has in excess on 200,000 rows. When I was writing a VBA script I had a database of about 20,000 rows in mind so I didn't care whether the database was filtered or not because the VBA script ran quickly. So given the realization that the database is huge and testing the VBA script I was surprised to notice how slowly it ran. So without further to say this is how my code looks like :
Set wsDB = ThisWorkbook.Sheets("DB")
Dim nameIndex As Long: nameIndex = Application.Match(name, wsDB.Rows(1), 0)
Dim formula As String
formula = "=IFERROR(AVERAGEIFS(" + GRA(nameIndex) + "," + GRA(dateIndex) + ",R2C," + GRA(cellNameIndex) + ",RC1" + "),"""")"
where GRA is a function that returns the address of the range of a column.
Private Function GRA(ByRef rngIndex As Long)
GRA = "DB!" + CStr(Range(Cells(2, rngIndex), Cells(rowNos, rngIndex)).Address(1, 1, xlR1C1, 0, 0))
End Function
So given that I now filter the table beforehand how can I adjust my code so that it ignores all the hidden rows and takes into account only what is visible. Of course I am aware that a simple dirty solution would be to simply copy the filter database and paste it in a new sheet but that will affect the performance which is what I'm trying to improve.
You can use the following function to return a range of only visible cells.
Function VisibleCells(Rng As Range) As Variant
Dim R As Range
Dim Arr() As Integer
Dim RNdx As Long
Dim CNdx As Long
If Rng.Areas.Count > 1 Then
VisibleCells = CVErr(xlErrRef)
Exit Function
End If
ReDim Arr(1 To Rng.Rows.Count, 1 To Rng.Columns.Count)
For RNdx = 1 To Rng.Rows.Count
For CNdx = 1 To Rng.Columns.Count
Set R = Rng(RNdx, CNdx)
If (R.EntireRow.Hidden = True) Or _
(R.EntireColumn.Hidden = True) Then
Arr(RNdx, CNdx) = 0
Else
Arr(RNdx, CNdx) = 1
End If
Next CNdx
Next RNdx
VisibleCells = Arr
End Function
The above code came from http://www.cpearson.com/excel/VisibleCells.aspx.
Normally I would only post code that I write however this does exactly what I was thinking.