Excel VBA Determining Mass of Truck Shipments - vba

I have a system where I have a list of data from a truck scale reading the weight of a truck on a scale. This data ranges from -30,000lbs or so due to the scale being tared but truckless, to 40,000lbs with a full and tared truck on it. My task is to determine the total weight that has left our facility via truck. The problem is some days only a few trucks leave our facility and others a dozen leave, all with slightly different weights.
The graph of these weights looks like a saw tooth pattern. It is a largely negative value (due to tare), quickly reaches approximately zero as a truck pulls onto the scale, and slowly builds to a final weight. After the final weight is reached the weight quickly goes back to the largely negative value as the truck pulls away.
My idea on how to approach this is look for where the data is less than zero and return the max weight of the sensor between zeros. If the max weight is above some noise filter value (say, 5000lbs) then add the max weight to some counter. In theory, not bad, in practice, a bit out of my league.
Here's my code so far, as I know I need to show my effort so far. I recommend ignoring it as it's mostly just a failed start after a few days of restarting work.
Public Function TruckLoad(rngData As Range)
Dim intCount As Integer
intCount = 0
For Each cell In rngData
intCount = intCount + 1
Next cell
Dim n As Integer
n = 1
Dim x As Integer
x = 1
Dim arr() As Double
For i = 1 To intCount
If rngData(i, 1) < 0 Then
arr(n) = x
n = n + 1
x = x + 1
Else
x = x + 1
End If
Next
TruckLoad = arr(1)
End Function
If anyone could give me advice on how to proceed it would be extremely valuable. I'm not a computer programmer outside of the very basics.
Edit: Sorry, I should have said this initially. I can't post the entirety of the raw sample data but I can post a photo of a graph. There is a degree to which I can't post publicly (not that you can do anything particularly nefarious with the data, it's a corporate rule).
www.imgur.com/a/LGQY9

My understanding of the data is in line with Robin's comment. There are a couple of ways to solve this problem. I've written a function loops through data range looking for the 'next zero' in the data set, and calculates the max value between the current row and the row that the 'next zero' is in. If the max value is above the value of your noise filter, the value will be added to the running total.
Option Explicit
Private Const NOISE_FILTER As Double = 5000
Public Function TruckLoad(rngData As Range) As Double
Dim r As Integer
Dim runningTruckLoad As Double
Dim maxLoadReading As Double
Dim nextZeroRow As Integer
For r = 1 To rngData.Rows.Count
nextZeroRow = FindNextZeroRow(r, rngData)
maxLoadReading = Application.WorksheetFunction.Max(Range(rngData.Cells(r, 1), rngData.Cells(nextZeroRow, 1)))
If maxLoadReading > NOISE_FILTER Then
runningTruckLoad = runningTruckLoad + maxLoadReading
End If
r = nextZeroRow 'skip the loop counter ahead to our new 0 row
Next r
TruckLoad = runningTruckLoad
End Function
Private Function FindNextZeroRow(startRow As Integer, searchRange As Range) As Integer
Dim nextZeroRow As Range
Set nextZeroRow = searchRange.Find(0, searchRange.Rows(startRow))
If nextZeroRow.Row < startRow Then 'we've hit the end of the data range
FindNextZeroRow = startRow
ElseIf nextZeroRow.Value <> 0 Then 'we've found a data point with a zero in it, not interested in this row
FindNextZeroRow = FindNextZeroRow(nextZeroRow.Row, searchRange)
Else
FindNextZeroRow = nextZeroRow.Row 'we've found our next zero data point
End If
End Function

Related

Excel vba: program performance is slow, out of memory error when dealing with large dataset in collection

So I have a really huge excel spreadsheet with about 100 thousands rows. Each row represents a client. Each column stores that client's name, debt, probability of default and so on. There are about 12 columns.
In my code, I loop through the rows and store the data for each client as a class object, then add that object to a collection:
Function getClients(dataWorkbook As Workbook)
Dim resultColl As Collection
Set resultColl = New Collection
...
With dataWorkbook.Worksheets(globals("DATA_SHEET"))
For i = firstRow To lastRow
Set clientCopy = New Client
clientCopy.setClientName = .Cells(i, column_names).value
clientCopy.setContractNumber = .Cells(i, column_contract_numbers).value
...
resultColl.Add clientCopy
Next
End With
Set getClients = resultColl
End Function
Then for each client in the collection I calculate the random numbers and store them in a collection for that client (the collection of size N, depending the number a user wants, usually 1000 - 5000).
Then I go through those random numbers of each client and store an outcome in an outcomes collection, depending on the random number's value (if the number is greater than the client's number P, then I store 0, else 1).
Then I calculate a financial result for each client, meaning, I check for each outcome in the outcomes collection and store appropriate values in losses and profits collections, depending if the outcome is 1 or 0.
These methods inside the Client class:
Public Sub generateRandoms()
Set randomNumbers = New Collection
Dim i As Long
For i = 1 To simulationCount
randomNumbers.Add Rnd()
'Debug.Print "random: " & randomNumbers(i)
Next
End Sub
Public Sub calculateOutcomes()
Set outcomes = New Collection
Dim i As Long
If totalPd <> -1 Then
For i = 1 To simulationCount
If randomNumbers(i) < totalPd Then
outcomes.Add 1
Else
outcomes.Add 0
End If
Next
Else
For i = 1 To simulationCount
outcomes.Add Null
Next
End If
End Sub
Public Sub calculateFinancialResult()
Set losses = New Collection
Set profits = New Collection
Dim i As Long
For i = 1 To outcomes.Count
If outcomes(i) = 1 Then
losses.Add totalLoss
profits.Add 0
ElseIf outcomes(i) = 0 Then
losses.Add 0
profits.Add totalProfit
Else
losses.Add Null
profits.Add Null
End If
Next
End Sub
The code in the main module goes like this:
Dim clientsColl As Collection
Set clientsColl = getClients(dataWorkbook)
Dim clientCopy As Client
For Each clientCopy In clientsColl
clientCopy.setSimulationCount = globals("SIMULATION_COUNT")
clientCopy.generateRandoms 'up to this line it's fast
'clientCopy.calculateOutcomes 'this line is slow, so i have to shut excel down
'clientCopy.calculateFinancialResult
clientCopy.clearSums
Next
MsgBox ("Done for")
While the data collection from the data workbook is rather slow, it takes a reasonable enough amount of time. At the same time, the calculations itself are extraordinary slow. I tried to figure out which function or sub takes the most time. So far it looks like calculateOutcomes is pretty slow, while generateRandoms is quite fast and takes less than a minute.
I don't understand why is that so - both of them loop through each client and store data in a collection N times. At the same time the first is fast, while the second is slow. If I don't terminate the excel eventually I encounter an Out of memory error. Why?
Is there anything I can do about this code's performance? Any suggestions are appreciated. What are the general practices in situations like this?

On making MATCH function like FIND function

I'm trying to make MATCH function work like FIND function. First of all, I generate the dummy data to be use for testing. Here is the routine I use:
Sub Data_Generator()
Randomize
Dim Data(1 To 100000, 1 To 1)
Dim p As Single
For i = 1 To 100000
p = Rnd()
If p < 0.4 Then
Data(i, 1) = "A"
ElseIf p >= 0.4 And p <= 0.7 Then
Data(i, 1) = "B"
Else
Data(i, 1) = "C"
End If
Next i
Range("A1:A100000") = Data
End Sub
Now, I create a sub-routine to find the string A in the range Data. There are two methods I use here that employ MATCH function. The first method is to reset the range of lookup array like the following code:
Sub Find_Match_1()
T0 = Timer
Dim i As Long, j As Long, k As Long, Data As Range
Dim Output(1 To 100000, 1 To 1)
On Error GoTo Finish
Do
Set Data = Range(Cells(j + 1, 1), "A100000") 'Reset the range of lookup array
i = WorksheetFunction.Match("A", Data, 0)
j = j + i
Output(j, 1) = j 'Label the position of A
k = k + 1 'Counting the number of [A] found
Loop
Finish:
Range("B1:B100000") = Output
InputBox "The number of [A] found are " & k & " in", "Process is complete", Timer - T0
End Sub
And for the second method, I assign the cell of range where A is located by value vbNullString instead of resetting Range("A1:A100000"). The idea is to delete the string A after being found and to expect MATCH function to find the next string A in the Range("A1:A100000"). Here is the code to implement the second method:
Sub Find_Match_2()
T0 = Timer
Dim n As Long, i As Long, j As Long
Dim Data_Store()
Dim Output(1 To 100000, 1 To 1)
Data_Store = Range("A1:A100000")
On Error GoTo Finish
Do
j = WorksheetFunction.Match("A", Range("A1:A100000"), 0)
Output(j, 1) = j
Cells(j, 1) = vbNullString
n = n + 1
Loop
Finish:
Range("A1:A100000") = Data_Store
Range("B1:B100000") = Output
InputBox "The number of [A] found are " & n & " in", "Process is complete", Timer - T0
End Sub
The goal is to determine which method is better at employing MATCH function in its performance. It turns out the first method only completes less than 0.4 seconds meanwhile the second method completes about a minute on my PC. So my questions are:
Why does the second method take time too long to complete?
How does one improve the performance of the second method?
Can MATCH function be used in an array?
I agree that this is more of a Code Review question, but I chose to look into it for my own curiosity, so I'll share what I found.
I think you're hitting a very classic case of N vs N^2 computational complexity. Look at your two methods, which seem remarkably similar, and consider what they're actually doing, keeping in mind that the MATCH function is probably just a linear search when you use Match_type=0 (because your data is unsorted, whereas other match types could do a binary search on your sorted data).
Method 1:
Start at A1
Continue down the range until an "A" is found
Restart at the cell below the MATCH
Method 2:
Start at A1
Continue down the range until an "A" is found
Clear the "A"
Restart at A1
It should be instantly apparent that while one method is continually shrinking the range it searches, the other is always starting at the first cell and searching the whole range. This will account for some of the speedup, and already boosts Method 1 to a nice lead, but it's not even nearly the full story.
The real key lies in the amount of work Match has to do for each situation. Because its range constantly shrinks and moves its start further down the list, whichever cell Method 1's Match starts from, it only has to search a small number of cells before it hits an A and resumes the outer loop. Meanwhile, Method 2 is continually destroying A's, making them less and less dense and forcing itself to search more and more of the range before getting any hits. By the end, Method 2 is looping through almost 100,000 empty cells/B's/C's before finding its next A.
So on average, the Match for Method 1 is only looking through a couple of cells each time, while the Match for Method 2 is taking longer and longer as time goes on, until the end when it is forced to loop through the entire range. On top of that, Method 2 is doing a bunch of writes to cell values, which is slower than you might think when you have to do it tens of thousands of times.
In all honesty, your best bet would be to just loop through the cells yourself once, looking for A's and handling them as you go. MATCH brings no advantage to the table, and Method 1 is basically just a more complicated version of the loop I described.
I'd write this something like:
Sub Find_Match_3()
T0 = Timer
Dim k As Long, r As Range
Dim Output(1 To 100000, 1 To 1)
For Each r In Range("A1:A100000").Cells
If r.Value = "A" Then
Output(r.Row, 1) = r.Row 'Label the position of A
k = k + 1 'Counting the number of [A] found
End If
Next
Range("B1:B100000") = Output
InputBox "The number of [A] found are " & k & " in", "Process is complete", Timer - T0
End Sub
Which is about 30% faster on my machine.

VBA: adding random numbers to a grid that arent already in the grid

Sub FWP()
Dim i As Integer
Dim j As Integer
Dim n As Integer
n = Range("A1").Value
For i = 1 To n
For j = 1 To n
If Cells(i + 1, j) = 0 Then
Cells(i + 1, j).Value = Int(((n ^ 2) - 1 + 1) * Rnd + 1)
ElseIf Cells(i + 1, j) <> 0 Then
Cells(i + 1, j).Value = Cells(i + 1, j).Value
End If
Next j
Next i
I am trying to do a part of a homework question that asks to fill in missing spaces in a magic square in VBA. It is set up as a (n x n) matrix with n^2 numbers in; the spaces I need to fill are represented by zeros in the matrix. So far I have some code that goes through checking each individual cell value, and will leave the values alone if not 0, and if the value is 0, it replaces them with a random number between 1 and n^2. The issue is that obviously I'm getting some duplicate values, which isn't allowed, there must be only 1 of each number.
How do I code it so that there will be no duplicate numbers appearing in the grid?
I am attempting to put in a check function to see if they are already in the grid but am not sure how to do it
Thanks
There are a lot of approaches you can take, but #CMArg is right in saying that an array or dictionary is a good way of ensuring that you don't have duplicates.
What you want to avoid is a scenario where each cell takes progressively longer to populate. It isn't a problem for a very small square (e.g. 10x10), but very large squares can get ugly. (If your range is 1-100, and all numbers except 31 are already in the table, it's going to take a long time--100 guesses on average, right?--to pull the one unused number. If the range is 1-40000 (200x200), it will take 40000 guesses to fill the last cell.)
So instead of keeping a list of numbers that have already been used, think about how you can effectively go through and "cross-off" the already used numbers, so that each new cell takes exactly 1 "guess" to populate.
Here's one way you might implement it:
Class: SingleRandoms
Option Explicit
Private mUnusedValues As Scripting.Dictionary
Private mUsedValues As Scripting.Dictionary
Private Sub Class_Initialize()
Set mUnusedValues = New Scripting.Dictionary
Set mUsedValues = New Scripting.Dictionary
End Sub
Public Sub GenerateRange(minimumNumber As Long, maximumNumber As Long)
Dim i As Long
With mUnusedValues
.RemoveAll
For i = minimumNumber To maximumNumber
.Add i, i
Next
End With
End Sub
Public Function GetRandom() As Long
Dim i As Long, keyID As Long
Randomize timer
With mUnusedValues
i = .Count
keyID = Int(Rnd * i)
GetRandom = .Keys(keyID)
.Remove GetRandom
End With
mUsedValues.Add GetRandom, GetRandom
End Function
Public Property Get AvailableValues() As Scripting.Dictionary
Set AvailableValues = mUnusedValues
End Property
Public Property Get UsedValues() As Scripting.Dictionary
Set UsedValues = mUsedValues
End Property
Example of the class in action:
Public Sub getRandoms()
Dim r As SingleRandoms
Set r = New SingleRandoms
With r
.GenerateRange 1, 100
Do Until .AvailableValues.Count = 0
Debug.Print .GetRandom()
Loop
End With
End Sub
Using a collection would actually be more memory efficient and faster than using a dictionary, but the dictionary makes it easier to validate that it's doing what it's supposed to do (since you can use .Exists, etc.).
Nobody is going to do your homework for you. You would only be cheating yourself. Shame on them if they do.
I'm not sure how picky your teacher is, but there are many ways to solve this.
You can put the values of the matrix into an array.
Check if a zero value element exists, if not, break.
Then obtain your potential random number for insertion.
Iterate through the array with a for loop checking each element for this value. If it is not present, replace the zero element.

Lee-Ready tick test using VBA

I am trying to build Lee-Ready tick test for estimating trade direction from tick data using Excel. I have a dataset containing the trade prices in descending order, and I am trying to build a VBA code that is able to loop over all the 4m+ cells in as efficient manner as possible.
The rule for estimating trade direciton goes as follows:
If Pt>Pt-1, then d=1
If Pt<Pt-1, then d=-1
If Pt=Pt-1, then d is the last value taken by d.
So to give a concrete example, I would like to transform this:
P1;P2;P3;P4
1.01;2.02;3.03;4.04
1.00;2.03;3.03;4.02
1.01;2.02;3.01;4.04
1.00;2.03;3.00;4.04
into this
d1;d2;d3;d4
1;-1;1;1
-1;1;1;-1
1;-1;1;0
0;0;0;0
Fairly straightforward nested loops suffice:
Function LeeReady(Prices As Variant) As Variant
'Given a range or 1-based, 2-dimensional variant array
'Returns an array of same size
'consisiting of an array of the same size
'of trade directions computed according to
'Lee-Ready rule
Dim i As Long, j As Long
Dim m As Long, n As Long
Dim priceData As Variant, directions As Variant
Dim current As Variant, previous As Variant
If TypeName(Prices) = "Range" Then
priceData = Prices.Value
Else
priceData = Prices
End If
m = UBound(priceData, 1)
n = UBound(priceData, 2)
ReDim directions(1 To m, 1 To n) As Long 'implicitly fills in bottom row with 0s
For i = m - 1 To 1 Step -1
For j = 1 To n
current = priceData(i, j)
previous = priceData(i + 1, j)
If current > previous Then
directions(i, j) = 1
ElseIf current < previous And previous > 0 Then
directions(i, j) = -1
Else
directions(i, j) = directions(i + 1, j)
End If
Next j
Next i
LeeReady = directions
End Function
This can be called from a sub or used directly on the worksheet:
Here I just highlighted a block of cells of the correct size to hold the output and then used the formula =LeeReady(A2:D5) (pressing Ctrl+Shift+Enter to accept it as an array formula).
On Edit: I modified the code slightly (by adding the clause And previous > 0 to the If statement in the main loop) so that it can now handle ranges in which come of the columns have more rows than other columns. The code assumes that price data is always > 0 and fills in the return array with 0s as place holders in the columns that end earlier than other columns:

Random numbers in array without any duplicates

I'm trying to randomize an array from numbers 0 to 51 using loops but I just cannot seem to pull it off. My idea was that
Generate a Random Number
Check if this random number has been used by storing the previous in an array
If this random number has been used, generate new random number until it is not a duplicate
If it's not a duplicate, store it
My attempt:
Dim list(51) As Integer
Dim templist(51) As Integer
For i As Integer = 0 To 51 Step 1
list(i) = i
Next i
Do While counter <= 51
p = rand.Next(0, 52)
templist(counter) = p
For n As Integer = 0 To 51 Step 1
p = rand.Next(0, 52)
If templist(n) = p Then
Do While templist(n) = p
p = rand.Next(0, 52)
Loop
templist(n) = p
Else
templist(n) = p
End If
Next
counter += 1
Loop
For n As Integer = 0 To 51 Step 1
ListBox1.Items.Add(templist(n))
Next
It will be a lot easier if you just have a list of all of the possible numbers (0 to 51 in your case), then remove the number from the list so it can't be picked again. Try something like this:
Dim allNumbers As New List (Of Integer)
Dim randomNumbers As New List (Of Integer)
Dim rand as New Random
' Fill the list of all numbers
For i As Integer = 0 To 51 Step 1
allNumbers.Add(i)
Next i
' Grab a random entry from the list of all numbers
For i As Integer = 0 To 51 Step 1
Dim selectedIndex as Integer = rand.Next(0, (allNumbers.Count - 1) )
Dim selectedNumber as Integer = allNumbers(selectedIndex)
randomNumbers.Add(selectedNumber)
allNumbers.Remove(selectedNumber)
' Might as well just add the number to ListBox1 here, too
ListBox1.Items.Add(selectedNumber)
Next i
If your goal is to get the numbers into ListBox1, then you don't even need the "randomNumbers" list.
EDIT:
If you must have an array, try something like this:
Function RandomArray(min As Integer, max As Integer) As Integer()
If min >= max Then
Throw New Exception("Min. must be less than Max.)")
End If
Dim count As Integer = (max - min)
Dim randomNumbers(count) As Integer
Dim rand As New Random()
' Since an array of integers sets every number to zero, and zero is possibly within our min/max range (0-51 here),
' we have to initialize every number in the array to something that is outside our min/max range.
If min <= 0 AndAlso max >= 0 Then
For i As Integer = 0 To count
randomNumbers(i) = (min - 1) ' Could also be max + 1
Next i
End If
Dim counter As Integer = 0
' Loop until the array has count # of elements (so counter will be equal to count + 1, since it is incremented AFTER we place a number in the array)
Do Until counter = count + 1
Dim someNumber As Integer = rand.Next(min, max + 1)
' Only add the number if it is not already in the array
If Not randomNumbers.Contains(someNumber) Then
randomNumbers(counter) = someNumber
counter += 1
End If
Loop
Return randomNumbers
End Function
This is good enough for your assignment, but the computer scientist in my hates this algorithm.
Here's why this algorithm is much less desirable. If zero is in your range of numbers, you will have to loop through the array at least 2N times (so 104+ times if you are going from 0 to 51). This is a best case scenario; the time complexity of this algorithm actually gets worse as the range of numbers scales higher. If you try running it from 0 to 100,000 for example, it will fill the first few thousand numbers very quickly, but as it goes on, it will take longer and longer to find a number that isn't already in the list. By the time you get to the last few numbers, you could potentially have randomly generated a few trillion different numbers before you find those last few numbers. If you assume an average complexity of 100000! (100,000 factorial), then the loop is going to execute almost ten to the half-a-millionth power times.
An array is more difficult to "shuffle" because it is a fixed size, so you can't really add and remove items like you can with a list or collection. What you CAN do, though, is fill the array with your numbers in order, then go through a random number of iterations where you randomly swap the positions of two numbers.
Do While counter <= 51
p = rand.Next(0, 52)
While Array.IndexOf(list, p) = -1
p = rand.Next(0, 52)
End While
counter += 1
Loop
Haven't written VB in about 5 years, but try this out:
Function GetRandomUniqueNumbersList(ByVal fromNumber As Integer, ByVal toNumber As Integer) As List(Of Integer)
If (toNumber <= fromNumber) Then
Throw New ArgumentException("toNumber must be greater than fromNumber", toNumber)
End If
Dim random As New Random
Dim randomNumbers As New HashSet(Of Integer)()
Do
randomNumbers.Add(random.Next(fromNumber, toNumber))
Loop While (randomNumbers.Count < toNumber - fromNumber)
Return randomNumbers.ToList()
End Function
Ok, that was painful. Please someone correct it if I made any mistakes. Should be very quick because it's using a HashSet.
First response to forum on stackoverflow - be gentle.
I was looking for a way to do this but couldn't find a suitable example online.
I've had a go myself and eventually got this to work:
Sub addUnique(ByRef tempList, ByVal n, ByRef s)
Dim rand = CInt(Rnd() * 15) + 1
For j = 0 To n
If tempList(j) = rand Then
s = True
End If
Next
If s = False Then
tempList(n) = rand
Else
s = False
addUnique(tempList, n, s)
End If
End Sub
Then call the sub using:
Dim values(15) As Byte
Dim valueSeen As Boolean = False
For i = 0 To 15
addUnique(values, i, valueSeen)
Next
This will randomly add the numbers 1 to 16 into an array. Each time a value is added, the previous values in the array are checked and if any of them are the same as the randomly generated value, s is set to true. If a value is not found (s=false), then the randomly generated value is added. The sub is recursively called again if s is still true at the end of the 'For' loop. Probably need 'Randomize()' in there somewhere.
Apologies if layout is a bit wobbly.