I am using an Excel macro to validate data fields in a table. Some of the fields may only contain one of the values listed in a dictionary.
When I run the validation macro against 700,000 records, it appears to get stuck and takes an extremely long time to complete. Can anyone help me improve the performance of this code?
The following is sample code I am using for one of the fields; it checks the content of the cells in a column against a list defined in the dictionary. It never completes over 700,000 rows, whereas it takes around 30 seconds for 50,000 rows.
Sub Validate_Action_Type()
'Save the current state, then speed up
Dim ScreenUpdateState As Boolean: ScreenUpdateState = Application.ScreenUpdating
Dim statusBarState As Boolean: statusBarState = Application.DisplayStatusBar
Dim calcState As XlCalculation: calcState = Application.Calculation
Dim eventsState As Boolean: eventsState = Application.EnableEvents
Dim displayPageBreaksState As Boolean: displayPageBreaksState = ActiveSheet.DisplayPageBreaks
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
ActiveSheet.DisplayPageBreaks = False
'Speed Up end
'Define the variables
Dim DicActionType As New Scripting.Dictionary
Dim d2 As Range
Dim CountActionTypeErrors As Long
Dim StartTime As Double
Dim SecondsElapsed As Double
'Start the timer, used to calculate elapsed time
StartTime = Timer
'Create a dictionary of allowed action type values
DicActionType.Add "Insert", 1
DicActionType.Add "Update", 2
DicActionType.Add "Delete", 3
'Check the Action Type column using the dictionary created
For Each d2 In Range(Range("C2"), Range("C2").End(xlDown))
If Not DicActionType.Exists(d2.Text) Then
d2.Interior.ColorIndex = 3
CountActionTypeErrors = CountActionTypeErrors + 1
Else
d2.Interior.ColorIndex = xlNone
End If
Next
'Calculate elapsed time
SecondsElapsed = Round(Timer - StartTime, 2)
'Pop-up the outcome message
MsgBox "Time taken in Seconds = " & SecondsElapsed _
& vbCrLf _
& "Total Errors = " & CountActionTypeErrors _
, , "Check Cells Highlighted RED"
'Restore state: undo the speed up settings
Application.ScreenUpdating = ScreenUpdateState
Application.DisplayStatusBar = statusBarState
Application.Calculation = calcState
Application.EnableEvents = eventsState
ActiveSheet.DisplayPageBreaks = displayPageBreaksState 'note this is a sheet-level setting
End Sub
Referencing individual cells is always very slow. Since you want to colour your cells, you still need a cell reference, but only for the colouring, not for the checking. For the checking you can use a much faster array.
In the following code I used an array to check the actions. On my machine it is about 5 times faster, with ~17% of my sample cells containing errors.
Sub Validate_Action_Type()
'Save the current state, then speed up
Dim ScreenUpdateState As Boolean: ScreenUpdateState = Application.ScreenUpdating
Dim statusBarState As Boolean: statusBarState = Application.DisplayStatusBar
Dim calcState As XlCalculation: calcState = Application.Calculation
Dim eventsState As Boolean: eventsState = Application.EnableEvents
Dim displayPageBreaksState As Boolean: displayPageBreaksState = ActiveSheet.DisplayPageBreaks
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
ActiveSheet.DisplayPageBreaks = False
'Speed Up end
'Define the variables
Dim DicActionType As New Scripting.Dictionary
Dim CountActionTypeErrors As Long
Dim StartTime As Double
Dim SecondsElapsed As Double
Dim ActionArr As Variant, I As Long 'Long, not Integer: 700,000 rows would overflow an Integer
ActionArr = Range(Range("C2"), Range("C2").End(xlDown)).Value
'Start the timer, used to calculate elapsed time
StartTime = Timer
'Create a dictionary of allowed action type values
DicActionType.Add "Insert", 1
DicActionType.Add "Update", 2
DicActionType.Add "Delete", 3
'Check the Action Type column using the dictionary created
Columns("C").Interior.ColorIndex = xlNone
For I = 1 To UBound(ActionArr)
If Not DicActionType.Exists(ActionArr(I, 1)) Then
Cells(I + 1, 3).Interior.ColorIndex = 3
CountActionTypeErrors = CountActionTypeErrors + 1
End If
Next I
'Calculate elapsed time
SecondsElapsed = Round(Timer - StartTime, 2)
'Pop-up the outcome message
MsgBox "Time taken in Seconds = " & SecondsElapsed _
& vbCrLf _
& "Total Errors = " & CountActionTypeErrors _
, , "Check Cells Highlighted RED"
'Restore state: undo the speed up settings
Application.ScreenUpdating = ScreenUpdateState
Application.DisplayStatusBar = statusBarState
Application.Calculation = calcState
Application.EnableEvents = eventsState
ActiveSheet.DisplayPageBreaks = displayPageBreaksState 'note this is a sheet-level setting
End Sub
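A further refinement, if the colouring itself ever becomes the bottleneck (for example with a high error rate), is to batch the single-cell Interior writes as well. The sketch below is only one way to do that, assuming the same layout as above (data in column C starting at C2): it reads the column into an array, collects the failing cells into one range with Union and colours them in a single operation. Note that Union itself slows down once there are many thousands of separate areas, so this only pays off when errors are relatively rare.
Sub Validate_Action_Type_Batched()
Dim DicActionType As New Scripting.Dictionary
Dim Data As Variant
Dim BadCells As Range
Dim LastRow As Long, I As Long
Dim CountActionTypeErrors As Long
DicActionType.Add "Insert", 1
DicActionType.Add "Update", 2
DicActionType.Add "Delete", 3
With ActiveSheet
LastRow = .Cells(.Rows.Count, "C").End(xlUp).Row
If LastRow < 3 Then Exit Sub 'sketch assumes at least two data rows so .Value returns a 2-D array
'One read and one format reset for the whole column
Data = .Range(.Cells(2, "C"), .Cells(LastRow, "C")).Value
.Range(.Cells(2, "C"), .Cells(LastRow, "C")).Interior.ColorIndex = xlNone
For I = 1 To UBound(Data, 1)
If Not DicActionType.Exists(CStr(Data(I, 1))) Then
CountActionTypeErrors = CountActionTypeErrors + 1
'Collect the bad cells instead of colouring them one by one
If BadCells Is Nothing Then
Set BadCells = .Cells(I + 1, "C")
Else
Set BadCells = Union(BadCells, .Cells(I + 1, "C"))
End If
End If
Next I
End With
'One write colours every bad cell at once
If Not BadCells Is Nothing Then BadCells.Interior.ColorIndex = 3
MsgBox "Total Errors = " & CountActionTypeErrors, , "Check Cells Highlighted RED"
End Sub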
As promised, I have investigated where the time goes on my laptop. Without knowing which statements are the slowest, you can easily spend your time optimising sections of your code that have little effect on the duration. There is a lot of bad advice around, based on people's assumptions about what is slow. Application.ScreenUpdating = False will save a very significant amount of time; other changes are often of little significance. There also appears to be significant overhead in calling WorksheetFunction from VBA, because I have yet to find one that is faster than plain VBA. Unless someone says “I have run timings”, don’t believe them.
I will explain my implementation of your code and then tell you what I discovered. You will need to concatenate the blocks of code below if you wish to perform similar tests on your computer.
Option Explicit
Const ColCrnt As Long = 3
Const NumNames As Long = 30
Const RowDataFirst As Long = 2
Const RowMax As Long = 700000
Const ErrorMax As Long = 70000
I have used constants to specify values I might wish to vary between test runs.
I never changed ColCrnt from 3 (= C) or RowDataFirst from 2 since I do not believe their values are relevant.
I tried much lower values for RowMax and ErrorMax at first but most of my timings were with the values shown. My tests were with 10% errors. I am sure your data is much better so my timings for worksheet updates should be much worse than yours.
I have called the values you place in a dictionary “names” or “valid names”, so NumNames is the constant I have changed the most.
Sub CtrlCheckAll()
Call CtrlCheck1
Call CtrlCheck1
Call CtrlCheck1
End Sub
I created three variations of your code. This routine allowed me to call all three in one go. It became clear that variations 2 and 3 were not significantly faster or slower than variation 1. In the end, I just used this routine to call variation 1 three times.
When you look at the timings, you will see how much variation there is from run to run. Most of this variation is probably background processes (task manager, virus checkers and the like). However, Excel also has background tasks (such as garbage collection). I prefer much longer test runs because they give more stable timings. Apart from increasing the number of rows, I am not sure how to slow your routine down; I have settled for running it several times and averaging their separate durations.
Sub CtrlCheck1()
Dim CountActionTypeErrors As Long
Dim d2 As Variant
Dim ExistsCrnt As Boolean
'Dim InxVn As Long
Dim ValidNames As New Dictionary
Dim TimeCheckStart As Single
Dim TimeExistsTotal As Single
Dim TimeStart As Single
Dim TimeWshtTotal As Single
Dim Wsht As Worksheet
Set Wsht = Worksheets("Data")
TimeStart = Timer
Call GenDic(ValidNames, NumNames)
Debug.Print "Create time " & Format(Timer - TimeStart, "00.00000")
'TimeStart = Timer
'For InxVn = 0 To ValidNames.Count - 1
' Debug.Print ValidNames.Keys(InxVn)
'Next
'Debug.Print "Access time " & Format(Timer - TimeStart, "00.00000")
TimeStart = Timer
Call GenWsht(Wsht:=Wsht, RowDataFirst:=RowDataFirst, ColCrnt:=ColCrnt, _
RowMax:=RowMax + RowDataFirst - 1, _
ErrorMax:=ErrorMax, Dic:=ValidNames)
Debug.Print "Build worksheet time " & Format(Timer - TimeStart, "00.000")
Application.ScreenUpdating = False
TimeExistsTotal = 0!
TimeWshtTotal = 0!
TimeCheckStart = Timer
You will recognise some of the variables while others have been introduced by me.
Your code accesses the active worksheet. This relies on the user having the correct worksheet active when the macro is started. You would not do this if you had ever had to fix the mess caused by a user running a macro against the wrong worksheet and not having a backup of the undamaged worksheet. Being explicit about the worksheet to be accessed makes your code clearer, reduces the opportunities for disaster and has no noticeable time penalty.
I have not given you the code for GenDic since you have your own real data. However, I will add the code if you want it. Notice that I have placed Timer around this call; I wanted to know if creating a dictionary was a slow process. I discovered the duration was less than Timer can accurately record: normally the duration was given as zero thousandths of a second, although occasionally it was four thousandths of a second.
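For anyone wanting to reproduce the tests without writing their own generator, a minimal GenDic could look something like this (it simply fills the dictionary with dummy names; substitute your real values):
Sub GenDic(ByRef Dic As Dictionary, ByVal NumNames As Long)
'Fill the dictionary with NumNames dummy "valid names"
Dim InxName As Long
Dic.RemoveAll
For InxName = 1 To NumNames
Dic.Add "Name" & Format(InxName, "0000"), InxName
Next
End Sub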
I have commented out the code to list the keys in the dictionary because it was creating too many lines. Again, my motive was to see if accessing the keys was slow, but again the duration was less than Timer can accurately record.
I have not given you the code for GenWsht since you have your own real data. Again, I will add the code if you want it.
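Again for reproduction purposes, a minimal GenWsht along the following lines would do: it fills the test column with randomly chosen valid names and then overwrites roughly ErrorMax randomly chosen rows with a value that is not in the dictionary. This is a stand-in generator only, not a model of your real data.
Sub GenWsht(ByRef Wsht As Worksheet, ByVal RowDataFirst As Long, ByVal ColCrnt As Long, _
ByVal RowMax As Long, ByVal ErrorMax As Long, ByRef Dic As Dictionary)
'RowMax is the last data row; Dic supplies the valid names
Dim NameList As Variant
Dim CellValues() As Variant
Dim RowCrnt As Long
NameList = Dic.Keys
ReDim CellValues(1 To RowMax - RowDataFirst + 1, 1 To 1)
Randomize
For RowCrnt = 1 To RowMax - RowDataFirst + 1
CellValues(RowCrnt, 1) = NameList(Int(Rnd * Dic.Count))
Next
'Overwrite roughly ErrorMax rows with an invalid value (duplicates are possible)
For RowCrnt = 1 To ErrorMax
CellValues(Int(Rnd * (RowMax - RowDataFirst + 1)) + 1, 1) = "#Invalid#"
Next
With Wsht.Cells(RowDataFirst, ColCrnt).Resize(RowMax - RowDataFirst + 1, 1)
.Value = CellValues
.Interior.ColorIndex = xlNone
End With
End Sub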
The last three statements of the CtrlCheck1 listing above are the important ones. TimeCheckStart is used to calculate the total duration of the main block of code. TimeExistsTotal and TimeWshtTotal are used to accumulate the durations of the two statements I thought might be the most expensive in terms of time. I will explain them later.
With Wsht
For Each d2 In Range(.Cells(RowDataFirst, ColCrnt), _
.Cells(RowDataFirst, ColCrnt).End(xlDown))
TimeStart = Timer
ExistsCrnt = ValidNames.Exists(d2.Text)
TimeExistsTotal = TimeExistsTotal + Timer - TimeStart
If Not ExistsCrnt Then
TimeStart = Timer
d2.Interior.ColorIndex = 3
TimeWshtTotal = TimeWshtTotal + Timer - TimeStart
CountActionTypeErrors = CountActionTypeErrors + 1
End If
Next
End With
This is a slightly modified version of the critical part of your code. As explained earlier, I access cells in a named worksheet; I do not believe this could have a noticeable effect on the duration.
I have split the test for a name existing out of the If statement so I can place Timer statements around the test. This probably has a minor effect on the duration, but you cannot add Timer statements without having some effect. Note that what these Timer statements are doing is accumulating the total duration of these tests. I have also placed Timer statements around d2.Interior.ColorIndex = 3.
Debug.Print "##### Check 1 #####"
Debug.Print " Number rows " & RowMax
Debug.Print " Number errors " & ErrorMax
Debug.Print " Valid names " & ValidNames.Count
Debug.Print " Total check time " & Format(Timer - TimeCheckStart, "00.000")
Debug.Print "Total exists time " & Format(TimeExistsTotal, "00.000")
Debug.Print " Total wsht time " & Format(TimeWshtTotal, "00.000")
End Sub
This is the final block of variation 1 of my code.
Two of my timings were:
                   Run 1    Run 2
Total check time   12.766   9.820
Total exists time  10.031   7.852
Total wsht time     2.152   1.543
The first issue to notice is the difference between the two sets of figures, with the first run taking around 30% more time than the second. This is typical of the variation you will get in the duration of short runs.
The second issue is that the total duration of the two timed statements is around 0.6 seconds less than the total duration. This is to be expected since all the other statements take some time; there is no expensive statement yet to find. Since your durations are so much longer than mine, you need to review your timings and perhaps time other statements if your unexplained duration is excessive.
The last issue is that the total duration of the worksheet update is so much less than that of the existence check. Since 10% of my data is faulty, which I assume far exceeds your error rate, your worksheet update time should be much, much less than mine. Optimising the worksheet update would have minimal effect on the total duration.
With variation 2, the inner code is:
RowLast = .Cells(Rows.Count, ColCrnt).End(xlUp).Row
For RowCrnt = RowDataFirst To RowLast
TimeStart = Timer
ExistsCrnt = ValidNames.Exists(.Cells(RowCrnt, ColCrnt).Text)
TimeExistsTotal = TimeExistsTotal + Timer - TimeStart
If Not ExistsCrnt Then
TimeStart = Timer
.Cells(RowCrnt, ColCrnt).Interior.ColorIndex = 3
TimeWshtTotal = TimeWshtTotal + Timer - TimeStart
CountActionTypeErrors = CountActionTypeErrors + 1
End If
Next
Here I have used a For Loop instead of a For Each Loop. It is not clear from my timings which is faster. I will not bother to test further. I would use whichever I found more convenient even if I knew one was fractionally faster than the other.
For variation 3, I introduced a With statement for the cell:
With Wsht
RowLast = .Cells(Rows.Count, ColCrnt).End(xlUp).Row
For RowCrnt = RowDataFirst To RowLast
With .Cells(RowCrnt, ColCrnt)
TimeStart = Timer
ExistsCrnt = ValidNames.Exists(.Text)
TimeExistsTotal = TimeExistsTotal + Timer - TimeStart
If Not ExistsCrnt Then
TimeStart = Timer
.Interior.ColorIndex = 3
TimeWshtTotal = TimeWshtTotal + Timer - TimeStart
CountActionTypeErrors = CountActionTypeErrors + 1
End If
End With
Next
End With
This does look to be faster than variation 2, as expected, and perhaps a little faster than variation 1 but the evidence for these conclusions is poor.
I ran the timings with several different dictionary sizes, averaging the durations of the individual runs for each size.
The big reveal from these figures is that the size of the dictionary has no effect on the run time.
My investigation has revealed nothing that would help you. I have explained how I created my timings and how I interpreted those timings so you can test what is different about your system.
I have a program that creates 100,000 objects of class Client, puts them into an array and then goes through that array 100 times, each time assigning each Client a different random number through the Rnd() function:
Main sub:
Sub start()
Dim i As Long
Dim j As Long
Dim clientsColl() As Client
ReDim clientsColl(1 To 100000) As Client
For j = 1 To 100000
Set clientsColl(j) = New Client
clientsColl(j).setClientName = "Client_" & j
Application.StatusBar = "Getting client " & j
DoEvents
Next
Dim tempCount As Long
Dim clientCopy As Variant
For i = 1 To 100
tempCount = 0
For Each clientCopy In clientsColl
tempCount = tempCount + 1
clientCopy.generateRandom
'Application.StatusBar = "Calculating " & i & ": " & tempCount & "/" & 100000 '(1)
'DoEvents
Next
Application.StatusBar = "Calculating " & i
DoEvents
Next
MsgBox ("done")
End Sub
Client class:
Option Explicit
Dim clientName As String
Dim randomNumber As Double
Public Sub generateRandom()
randomNumber = Rnd()
End Sub
Public Property Get getClientName()
getClientName = clientName
End Property
Public Property Let setClientName(value As String)
clientName = value
End Property
The problem is that the execution time depends on whether or not line (1) is commented out. If it is executed, the status bar gets refreshed, but execution is very slow. If it is not executed, the program finishes really fast.
Why does this happen?
VBA is fast enough as long as you stay within it. Whenever you call out to Excel, things may get much slower, because Excel performs thousands of operations every time it gets control. You may consider turning off a few more Excel services, as I do in my applications:
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.EnableEvents = False
ActiveSheet.DisplayPageBreaks = False
... and as far as I know DoEvents is the best way to make Excel update the status bar when you turn off automatic updates.
Another timesaving workaround can be to display only every 100th or 1000th message from within the inner loop.
When using a progress bar or the status bar, you need to use it wisely.
Basically, the progress info only needs to be refreshed every 0.1 seconds or so.
Knowing your maximum number of loop iterations, and the time they take, you might want to update the info only every, let's say, 100 iterations of the loop (in your case).
This is done like this: If j Mod 100 = 0 Then Application.StatusBar = "...": DoEvents
Usually I even go further and call DoEvents less often than I update the progress bar (with a second If j Mod check).
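A minimal sketch of that idea applied to the inner loop from the question (same variable names as the original):
For i = 1 To 100
tempCount = 0
For Each clientCopy In clientsColl
tempCount = tempCount + 1
clientCopy.generateRandom
'Touch Excel only once every 1000 objects instead of on every pass
If tempCount Mod 1000 = 0 Then
Application.StatusBar = "Calculating " & i & ": " & tempCount & "/" & 100000
DoEvents
End If
Next
Next
Application.StatusBar = False 'hand the status bar back to Excel when done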
I have created a macro in Excel VBA to perform iterative operations. The intention of this code is to paste the values from one range of cells, which contain formulas, into another range.
In this code, I have specified the location of the ranges (both for the ones being copied and for the ones being pasted).
During the modification of the Excel Model, I change the number of rows or columns (by adding or deleting), thus changing the actual location of the range of cells. As an example, what used to be Range("N786:BT786") can become Range("N650:BT650").
I would certainly appreciate it if you could help me modify my code so that the ranges are not fixed to a static location, but instead update automatically as the model itself is updated.
Thank you in advance.
Below you may find my code.
Option Explicit
Sub calculations_update()
Application.DisplayStatusBar = False
Application.ScreenUpdating = False
Dim StartTime As Double
Dim SecondsElapsed As Double
Dim i As Long
StartTime = Timer
Do While i < 95
i = i + 1
ThisWorkbook.Worksheets("Electrity").Activate
Range("N786:BT786").Value = Range("N787: BT787").Value
Range("R826:BT826").Value = Range("R827: BT827").Value
ThisWorkbook.Worksheets("Efficiency").Activate
Range("H814").Value = Range("H815").Value
Range("H826").Value = Range("H827").Value
Range("H846").Value = Range("H847").Value
Loop
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "The update was successful in " &SecondsElapsed& " seconds", bInformation
End Sub
In order to avoid using static cell addresses within your code, it is usually better to assign names to the relevant cells (using Formulas / Name Manager within Excel) and then use those names in your code.
Assuming you assigned names (with "Workbook" scope) to your ranges as follows:
Range Name
Electricity!$N$786:$BT$786 DstRng1
Electricity!$N$787:$BT$787 SrcRng1
Electricity!$R$826:$BT$826 DstRng2
Electricity!$R$827:$BT$827 SrcRng2
Efficiency!$H$814 DstRng3
Efficiency!$H$815 SrcRng3
Efficiency!$H$826 DstRng4
Efficiency!$H$827 SrcRng4
Efficiency!$H$846 DstRng5
Efficiency!$H$847 SrcRng5
(obviously, you could use more meaningful names if you like)
the following code could be used:
Option Explicit
Sub calculations_update()
Application.DisplayStatusBar = False
Application.ScreenUpdating = False
Dim StartTime As Double
Dim SecondsElapsed As Double
Dim i As Long
StartTime = Timer
Do While i < 95
i = i + 1
Range("DstRng1").Value = Range("SrcRng1").Value
Range("DstRng2").Value = Range("SrcRng2").Value
Range("DstRng3").Value = Range("SrcRng3").Value
Range("DstRng4").Value = Range("SrcRng4").Value
Range("DstRng5").Value = Range("SrcRng5").Value
Loop
SecondsElapsed = Round(Timer - StartTime, 2)
MsgBox "The update was successful in " & SecondsElapsed & " seconds", vbInformation
Application.DisplayStatusBar = True
Application.ScreenUpdating = True
End Sub
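If you would rather not create the names by hand in the Name Manager, a one-off setup routine can define them from VBA as well. A sketch, using the same names and the addresses assumed in the table above (adjust the sheet names and references to your model):
Sub DefineRangeNames()
'Run once; afterwards the names follow the cells when rows or columns are inserted or deleted
With ThisWorkbook
.Names.Add Name:="DstRng1", RefersTo:="=Electricity!$N$786:$BT$786"
.Names.Add Name:="SrcRng1", RefersTo:="=Electricity!$N$787:$BT$787"
.Names.Add Name:="DstRng2", RefersTo:="=Electricity!$R$826:$BT$826"
.Names.Add Name:="SrcRng2", RefersTo:="=Electricity!$R$827:$BT$827"
.Names.Add Name:="DstRng3", RefersTo:="=Efficiency!$H$814"
.Names.Add Name:="SrcRng3", RefersTo:="=Efficiency!$H$815"
'DstRng4/SrcRng4 and DstRng5/SrcRng5 follow the same pattern
End With
End Sub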
Context
I have a big macro where I have declared lots of global constants in a dedicated module (i.e., a module which contains only Public Const declarations).
About 100 of those global constants are used to assign column names to each column of my main Data worksheet:
Public Const columnName = "A"
Public Const columnCity = "B"
Public Const columnPhone = "C"
...
Public Const columnColor = "CX"
This lets me refer to the columns (of my Data worksheet, from the 10 other modules) using .Range(columnColor & l) instead of .Range("CX" & l) (where l is obviously the row number). This is much easier to code (I don't need to search for the right column) and to update if I decide to insert a column before "F" (I only have to update my constants module and not the 10 other code modules).
However, it looks like using .Range(columnCity & l) is notably slower than using .Range("A" & l). (SEE EDIT BELOW)
The most processor-intensive tasks are done using big 2D arrays. But I'm probably still referencing those global column constants 100,000 times in some subs, since I'm not only checking/updating values/formulas (which I could do on a 2D array) but also dealing with cells' .Interior.Color, .Comment.Text, ...
Question
How bad an idea is it to use such global constants (Public Const columnName...) to reference columns?
Is there some standard way of doing this?
Edit
As pointed out by Tim, I think I indeed spent time changing every .Cells(l, 1) to .Range(columnName & l) when I refactored my code to use the column constants. That means that:
My problem probably comes from using .Range vs .Cells rather than from the global variables.
I should probably refactor back to .Cells(l, colIndexName).
There is no "standard" way to do this, so you should run some performance tests and figure out the most efficient method if performance is an issue for you.
For example, using a numeric constant and Cells() seems to be about twice as fast as using Range():
Option Explicit
Public Const columnName As String = "A"
Public Const colIndexName As Long = 1
Sub Tester()
Dim l As Long, v, t
t = Timer
With Sheet1
For l = 1 To 300000#
v = .Range(columnName & 1).Value
Next l
End With
Debug.Print Timer - t '>> approx. 1.3 sec
t = Timer
With Sheet1
For l = 1 To 300000#
v = .Cells(1, colIndexName).Value
Next l
End With
Debug.Print Timer - t '>> approx. 0.6 sec
End Sub
However, it's likely only twice as fast if that's all you're doing - as soon as you add in other tasks that difference may wash out.
I need an Answer to provide formatting and results, though this is really more of a comment.
I have found no significant difference between .Range(columnCity & l) and .Range("A" & l). Can you provide more insight on how you came to this conclusion?
Here is my code for speed comparison:
Public Const p_sCol As String = "A"
Sub tgr()
Dim ws As Worksheet
Dim i As Long, l As Long
Dim sTemp As String
Dim dTimer As Double
Dim aResults(1 To 1, 1 To 2) As Double
Set ws = ActiveWorkbook.ActiveSheet
l = 1
dTimer = Timer
For i = 1 To 100000
sTemp = vbNullString
sTemp = ws.Range(p_sCol & l).Value
Next i
aResults(1, 1) = Timer - dTimer
dTimer = Timer
For i = 1 To 100000
sTemp = vbNullString
sTemp = ws.Range("A" & l).Value
Next i
aResults(1, 2) = Timer - dTimer
ws.Range("C1:D1").Value = aResults
End Sub
I ran the test 10 times, and the average result for the public variable concatenation over 100,000 iterations was 0.4375 seconds while the average result for hard coding the column letter was 0.429688 seconds, which is a difference of 0.007813 seconds. Sure, the hard coded method was slightly faster, but not noticeably and certainly not significantly.
Imgur Album with screens of worksheets: http://imgur.com/a/6rFWF
Long story short, I am writing an Excel VBA utility that will assign two types of security shifts (called coverages and weekend duties) to security staff members. Basically, I have a worksheet with all of the staff members and their various availability information in it (the top image in the imgur album) and a worksheet with all of the coverage dates in it (the bottom image in the imgur album). Note that I don't have an image of the weekend duty dates as it looks similar to the coverage dates (but with the Friday and Saturday shifts).
The utility basically assigns a random staff member to each date, checking to make sure it doesn't violate any of their availability requirements. Unfortunately, I realize that I am creating a large chance for an infinite loop to occur. In my own testing, there has only been 1 attempt out of around 15-16 that did not enter an infinite loop near the end. So I'm looking for your help to account for this so the utility doesn't eat itself.
Here is the "pseudo-code" for the procedure in question.
'Loop for Column A in the Coverage Slips sheet (image 2 in imgur album)
Do Until (CoverageRowNumber = LastCoverageSlipRow + 1)
Get a Random Staff Member by RNG
If staff member still needs more shifts (see Requirements columns) Then
If staff member does not have an "X" under the day of the week Then
If staff member does not have a matching date conflict Then
Assign the coverage
Increase CoverageRowNumber
End If
End If
End If
Loop
'Loop for Column B in the coverage slips sheet (image 2 in imgur album)
Do Until...
Same as the loop above
Loop
Edit: Disregard that I have the dates in two columns for now. I'll be fixing that once I solve the problem of this post...it's an easy fix and will cut the code almost in half.
The problem is that as the utility gets near the end of the list of dates, it often runs into the scenario where the only staff members left cannot sit that specific shift (whether because of day of the week or specific date). In the event that it runs into this scenario, I can see a couple of acceptable options (though I don't know how I'd go about programming them):
Undo all of the work that the utility did and start over until it can get lucky and find a solution that works. This would save me some time doing manual placements for the last few shifts but might take a very long time. Additionally, I'd have to store all of the original values and then paste them back into the spreadsheet anytime it starts over.
Simply stop assigning shifts and just exit the procedure. I will be able to manually place the last few shifts by moving a few people around. It sure is a lot less work than manually assigning 200 shifts by hand like I've been doing for the past few years.
Do you guys have any thoughts that could be of help here? I'm not even sure how I could have the procedure check to see if there are any available options or not, but either way there's got to be a way to detect (and deter) this infinite loop before it crashes the program.
Sorry for the novel, and thanks in advance for any help!
Edit: In an effort to provide a little more clarity, I figured I'd copy and paste the actual code below:
'------------------------------------------------------------'
'Create ws variables for each worksheet
Dim wsConflicts As Worksheet
Dim wsCoverageSlips As Worksheet
Dim wsWDSlips As Worksheet
Dim wsCoverageOutput As Worksheet
Dim wsWDOutput As Worksheet
'------------------------------------------------------------'
Public Function SetSheets()
'Assign the worksheets to the ws variables
Set wsConflicts = Worksheets("Conflicts")
Set wsCoverageSlips = Worksheets("Coverage Slips")
Set wsWDSlips = Worksheets("WD Slips")
Set wsCoverageOutput = Worksheets("Coverage Output")
Set wsWDOutput = Worksheets("WD Output")
'Display a message (debugging)
'MsgBox "The sheets have been assigned successfully"
End Function
'------------------------------------------------------------'
Public Function ColumnLetter(ColumnNumber As Integer) As String
Dim n As Long
Dim c As Byte
Dim s As String
n = ColumnNumber
Do
c = ((n - 1) Mod 26)
s = Chr(c + 65) & s
n = (n - c) \ 26
Loop While n > 0
ColumnLetter = s
End Function
'------------------------------------------------------------'
Sub AssignCoverages()
'Fill the ws variables
Call SetSheets
'Set the first and last row numbers
Dim FirstStaffMemberRow As Integer
FirstStaffMemberRow = 3
Dim LastStaffMemberRow As Integer
LastStaffMemberRow = wsConflicts.UsedRange.Rows.Count
'Count the number of required coverages and weekend duties
Dim RequiredCoverages As Integer
Dim RequiredWDs As Integer
For i = FirstStaffMemberRow To LastStaffMemberRow
RequiredCoverages = RequiredCoverages + wsConflicts.Range("B" & i).Value
RequiredWDs = RequiredWDs + wsConflicts.Range("C" & i).Value
Next i
'Display a message (debugging)
MsgBox "You currently have " & RequiredCoverages & " required coverages and " & RequiredWDs & " required weekend duties."
'Count the number of coverage slips and weekend duty slips
Dim FirstCoverageSlipRow As Integer
FirstCoverageSlipRow = 1
Dim LastCoverageSlipRow As Integer
LastCoverageSlipRow = wsCoverageSlips.UsedRange.Rows.Count
Dim NumCoverageSlips As Integer
NumCoverageSlips = (LastCoverageSlipRow - FirstCoverageSlipRow + 1)
Dim FirstWDSlipRow As Integer
FirstWDSlipRow = 1
Dim LastWDSlipRow As Integer
LastWDSlipRow = wsWDSlips.UsedRange.Rows.Count
Dim NumWDSlips As Integer
NumWDSlips = (LastWDSlipRow - FirstWDSlipRow + 1)
'Check to make sure there are enough required shifts for slips
If RequiredCoverages <> NumCoverageSlips Then
MsgBox "The number of shifts you require (Columns B & C on Conflicts sheet) does not match the number of slips you've entered. You have " & RequiredCoverages & " required coverages and " & NumCoverageSlips & " coverage slips. You have " & RequiredWDs & " required weekend duties and " & NumWDSlips & " weekend duty slips. Please correct this error and retry."
Exit Sub
Else
'Debugging
'MsgBox "The number of shifts you require (Columns B & C on Conflicts sheet) matches the number of slips you've entered. You have " & RequiredCoverages & " required coverages and " & NumCoverageSlips & " coverage slips. You have " & RequiredWDs & " required weekend duties and " & NumWDSlips & " weekend duty slips."
End If
'Massive loop to assign coverages to random staff members
Dim NumRemainingCoverages As Integer
NumRemainingCoverages = NumCoverageSlips
Dim SlipRowNumber As Integer
SlipRowNumber = FirstCoverageSlipRow
'Loop for Column A
Do Until (SlipRowNumber = LastCoverageSlipRow + 1)
'Get a random staff member row
StaffMemberRow = GetRandomStaffMemberRow(FirstStaffMemberRow, LastStaffMemberRow)
'Check to make sure the staff member has remaining required coverages
If wsConflicts.Range("B" & StaffMemberRow).Value > 0 Then
'Check to make sure the staff member can sit the day of the week
Dim CurrentDate As Date
CurrentDate = wsCoverageSlips.Range("A" & SlipRowNumber).Value
Dim CurrentDay As Integer
CurrentDay = Weekday(CurrentDate)
Dim CurrentDayColumn As String
If CurrentDay = 1 Then CurrentDayColumn = "D"
If CurrentDay = 2 Then CurrentDayColumn = "E"
If CurrentDay = 3 Then CurrentDayColumn = "F"
If CurrentDay = 4 Then CurrentDayColumn = "G"
If CurrentDay = 5 Then CurrentDayColumn = "H"
If CurrentDay = 6 Then CurrentDayColumn = "I"
If CurrentDay = 7 Then CurrentDayColumn = "J"
If wsConflicts.Range(CurrentDayColumn & StaffMemberRow).Value = "" Then
'Check to make sure the staff member does not have a date conflict
Dim ColumnNumber As Integer
Dim ColumnLetterText As String
Dim CoverageDateConflicts As Integer
CoverageDateConflicts = 0
For ColumnNumber = 11 To 20
ColumnLetterText = ColumnLetter(ColumnNumber)
Dim CoverageSlipDate As Date
If IsDate(wsConflicts.Range(ColumnLetterText & StaffMemberRow).Value) = True Then
CoverageSlipDate = wsConflicts.Range(ColumnLetterText & StaffMemberRow).Value
Else
CoverageSlipDate = DateValue("01/01/1900")
End If
If CurrentDate = CoverageSlipDate Then
CoverageDateConflicts = CoverageDateConflicts + 1
End If
Next ColumnNumber
If CoverageDateConflicts = 0 Then
'Assign the coverage
Dim BlankCoverageOutputRow As Integer
BlankCoverageOutputRow = wsCoverageOutput.UsedRange.Rows.Count + 1
wsCoverageOutput.Range("A" & BlankCoverageOutputRow).Value = wsConflicts.Range("A" & StaffMemberRow).Value
wsCoverageOutput.Range("B" & BlankCoverageOutputRow).Value = CurrentDate
'Reduce the staff member's required coverages by 1
Dim CurrentRequirements As Integer
CurrentRequirements = wsConflicts.Range("B" & StaffMemberRow).Value
wsConflicts.Range("B" & StaffMemberRow).Value = CurrentRequirements - 1
'Reduce the number of remaining coverages by 1
NumRemainingCoverages = NumRemainingCoverages - 1
'Increase the slip row number by 1
SlipRowNumber = SlipRowNumber + 1
'Message box for debugging
'MsgBox "Coverage Date (" & CurrentDate & ") assigned to " & wsConflicts.Range("A" & StaffMemberRow).Value & "."
End If 'End date check
End If 'End day check
End If 'End requirements check
Loop 'End loop for column A
End Sub
'------------------------------------------------------------'
Public Function GetRandomStaffMemberRow(FirstStaffMemberRow As Integer, LastStaffMemberRow As Integer)
'Pick a random number between the first staff member row and the last
Call Randomize
GetRandomStaffMemberRow = Int((LastStaffMemberRow - FirstStaffMemberRow + 1) * Rnd + FirstStaffMemberRow)
End Function
The question is too open for a detailed answer, so I will try with some guidelines. I hope it helps.
I would use a class Solution with the following members:
Solution.ReadInputFromSheet() reads the table from the sheet into the class members
Solution.GenerateRandom() creates a new random solution. Try to find a balance between smart (add some logic to avoid totally random solutions) and speed (don't get stuck, exit after trying 10 or 50 random numbers that don't work), but speed is more important
Solution.Quality() As Double calculates the quality of the solution. For example a solution that is not valid returns 0, if Joe has 10 consecutive shifts returns 20, if the shifts are better distributed returns 100.
Solution.WriteOnSheet() write the data from the class members into the sheet.
Solution.Clone() As Solution creates a new Solution instance with the same data
Make a loop that creates a solution and checks whether its quality is better than the best solution found so far: if it is better, keep it; otherwise calculate another one.
Dim BestS As Solution, S As Solution, I As Long
Set BestS = New Solution
BestS.ReadInputFromSheet
BestS.GenerateRandom()
Set S = New Solution
S.ReadInputFromSheet
For I = 1 To 10000
S.GenerateRandom()
If S.Quality() > BestS.Quality() Then Set BestS = S.Clone()
Next I
BestS.WriteOnSheet
Instead of 10000 you can use Timer to run it for a finite number of seconds, or make a button to interrupt it when you come back from lunch break.
A faster solution generator function is better than risking getting stuck on one difficult (or impossible) solution.
For a smarter solution generator function I need more details on the rules.
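For reference, a bare-bones skeleton of such a class (stubs only, to show the shape; the member names are the ones listed above, and the internal fields are only illustrative) could look like this:
'Class module: Solution
Option Explicit
Private mAssignments() As Long 'one staff-member row per coverage slip
Public Sub ReadInputFromSheet()
'Read the Conflicts and Coverage Slips sheets into private members
End Sub
Public Sub GenerateRandom()
'Build a random assignment, giving up on a slip after a fixed number of failed tries
End Sub
Public Function Quality() As Double
'Return 0 for an invalid solution, larger values for better ones
Quality = 0
End Function
Public Sub WriteOnSheet()
'Write the assignments back to the output worksheets
End Sub
Public Function Clone() As Solution
Dim Copy As New Solution
'Copy the private members into Copy here
Set Clone = Copy
End Function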
So I went ahead and developed my own solution to this problem; it's not perfect and it's probably not the best way to handle the scenario, but it works, and it solved my problem in a matter of minutes instead of hours spent learning other methods.
Basically, I created two new "counter" variables. The first is FailedAttempts. Every time the procedure tries a random staff member but runs into a conflict, it increments FailedAttempts by 1. Every time the random staff member is a successful match (no conflicts), it resets FailedAttempts to 0. If at any time FailedAttempts = 100, it immediately exits the loop and starts over. In other words, if it tries 100 random staff members in a row without finding a match, I assume it's not going to find a match and just cut my losses.
The second variable, Assignments, is incremented by 1 every time that the procedure makes a successful assignment. When this number equals the number of shifts that the procedure is supposed to assign, it immediately exits the loop.
To do this, I had to use a couple of forbidden 'GoTo' commands (I wasn't sure how else to exit the loop; you can exit a For loop with Exit For, but I believe this is invalid for Do While loops). I ended up only needing two GoTos, one for exiting the loop and one to go back to the beginning of the procedure. I also made sure that the cells in the worksheet that change during the procedure are reset to their original state before it retries the assignment procedure.
I'll save everyone the trouble of reading through the extended version of the code, but in 'pseudo-code' form it looks like this:
Retry: 'Label for GoTo command
Do Until (CoverageRowNumber = LastCoverageSlipRow + 1)
Get a Random Staff Member by RNG
If staff member still needs more shifts (see Requirements columns) Then
If staff member does not have an "X" under the day of the week Then
If staff member does not have a matching date conflict Then
'Assign the coverage
'Increase CoverageRowNumber
Assignments = Assignments + 1
FailedAttempts = 0
Else
FailedAttempts = FailedAttempts + 1
End If
Else
FailedAttempts = FailedAttempts + 1
End If
Else
FailedAttempts = FailedAttempts + 1
End If
If FailedAttempts > 100 Then
GoTo ExitLoop
End If
Loop
ExitLoop: 'Label for GoTo command
If Assignments <> NumCoverageSlips Then
GoTo Retry
End If
'Do rest of procedure
Again, there may be (and certainly is) a more elegant and "correct" way of accomplishing the task at hand. This method worked for me in the given environment. Thanks to those who provided solutions; even though I ended up going a different direction, they provided great food for thought and helped me learn a bunch of new methods (especially the class idea from @stenci).
Thanks all.
I have large tables in RTF format, from 20 to 150 MB in size. I first tried exporting RTF -> HTML -> import into Excel; it took about 35 minutes for a 60 MB file. Next, I tried copying the table directly from Word into Excel. It always fails midway (everything gets pasted, but the data is not in the right cells).
I tried a few more ways (importing all cells into memory before transferring to Excel, other permutations, and methods detailed in this and other sources) before settling on the .ConvertToText method.
This method is relatively faster, taking about 25 minutes for the same 60 MB file (this is with Word hidden and with repagination, events, display updating and table autofit turned off).
Considering that these files can be loaded entirely into RAM in less than 10 seconds, I wonder why it takes 25 minutes to read the data out of a 60 MB file. I understand that Word's table engine is slow, but reading a table cell by cell is atrociously slow: the first few cells are super fast, while the last cells are much slower - I'm sure reading it manually would be faster. It defeats the whole purpose of automation. However, I do not have a choice.
The code is:
Dim oWord As Word.Application
Dim RTF As Word.Document
Set oWord = CreateObject("Word.Application")
Set RTF = oWord.Documents.Open(filename:=Fname, ConfirmConversions:=False, ReadOnly:=False) ', ReadOnly:=True)
Application.StatusBar = vbNullString
Application.EnableEvents = False
Application.ScreenUpdating = False
Application.DisplayAlerts = False
With oWord
.Options.Pagination = False
.Options.AllowReadingMode = False
.Options.SaveInterval = 0
.Options.CheckGrammarAsYouType = False
.Options.CheckGrammarWithSpelling = False
End With
Application.AutoRecover.Enabled = False 'Excel-side autorecover
Dim AAF As Table
For Each AAF In RTF.Tables
AAF.AllowAutoFit = False
Next
oWord.Visible = False
Dim rng As Word.Range
Dim sData As String
Dim aData1() As String
Dim aData2() As String
Dim aDataAll() As String
Dim nrRecs As Long
Dim nrFields As Long
Dim lRecs As Long
Dim lFields As Long
Dim CTbl As Table 'Data Table
Dim oCell As Cell
Set CTbl = RTF.Tables(1) 'assumed here: the data is in the first table of the document
'I'm not displaying the code which replaces all ^p with a special character to maintain the table structure - it is straightforward, and does the job
Set rng = CTbl.ConvertToText(Separator:="$", NestedTables:=False)
sData = rng.Text 'This contains the entire table, delimited by vbCr and $...
Application.StatusBar = "Closing open files..."
RTF.Close (wdDoNotSaveChanges) 'All data has been extracted, hence quit word
oWord.Quit
Set oWord = Nothing
sData = Mid(sData, 1, Len(sData) - 1)
aData1() = Split(sData, vbCr)
nrRecs = UBound(aData1())
If Dbg Then MsgBox "The table contained " & nrRecs + 1 & " rows"
For lRecs = LBound(aData1()) To nrRecs 'Cycle through all rows
aData2() = Split(aData1(lRecs), "$") 'Split rows into arrays
Debug.Print aData1(lRecs)
nrFields = UBound(aData2()) 'Find out the number of columns
If lRecs = LBound(aData1()) Then 'If this is the first row/cycle,
ReDim Preserve aDataAll(nrRecs, 9) 'nrFields) 'Resize the array - currently I'm using a fixed size for the column since the first row of my table contains merged rows
End If
For lFields = LBound(aData2()) To nrFields 'Cycle through all columns
aDataAll(lRecs, lFields) = aData2(lFields) 'Collate the data in a single array
'If MsgBox(aDataAll(lRecs, lFields), vbYesNo, "Continue?") = vbNo Then Exit For
Next
Next 'All of this was slapped together from MS code samples and stackoverflow examples
Any suggestions to improve performance?
The conversion will go a lot faster if you first split the table (I assume there's one very large table) into smaller tables and then convert each table to text.
I tried this on a table with 10000 rows and 10 columns. The time to convert to text went from ~280 seconds to ~70 seconds (i.e. 4X faster).
I ran the code below directly from the document with the 10000 row table (as opposed to running from Excel) for simplicity.
Split then convert:
Sub SplitThenConvert()
Dim t As Table
Set t = ActiveDocument.Tables.Item(1)
Dim rowCount As Integer
Dim index As Integer
Dim numSteps As Integer
Dim splitRow As Integer
Dim increment As Integer
Dim start_time, end_time
start_time = Now()
Application.ScreenUpdating = False
rowCount = t.Rows.Count
numSteps = 10
increment = rowCount / numSteps
splitRow = rowCount - increment
For index = 1 To numSteps
Debug.Print "Split #" + CStr(index)
ActiveDocument.Tables(1).Rows(splitRow).Select
Selection.SplitTable
splitRow = splitRow - increment
If splitRow < increment Then
Exit For
End If
Next index
index = ActiveDocument.Tables.Count
While index > 0
Debug.Print "Convert #" + CStr(index)
ActiveDocument.Tables(index).ConvertToText ","
index = index - 1
Wend
end_time = Now()
Application.ScreenUpdating = True
MsgBox (DateDiff("s", start_time, end_time))
End Sub
Convert entire table without splitting:
Sub ConvertAll()
Dim start_time, end_time
Application.ScreenUpdating = False
start_time = Now()
ActiveDocument.Tables(1).ConvertToText ","
end_time = Now()
Application.ScreenUpdating = True
MsgBox (DateDiff("s", start_time, end_time))
End Sub
I do agree with @KazJaw: reading/writing from/to MS Office programs (including .rtf, because it is treated as Word) is very computationally expensive, so it is better to rely on other means as much as possible (just converting the .rtf file reading into a simple .txt file reading would improve the speed a lot). I recently answered a post along these lines.
The other proposal I have is to reduce the number of "live Office variables" as much as possible. Instead of creating the RTF and the oDoc variables at the same time, it is better to do it one after the other (the same goes for Excel). Copying/pasting in real time between two different instances (for example, two different Word documents) should only be done under exceptional circumstances, because it is too computationally expensive.
Thus, use the connection to Office programs for what it is intended for, that is, top-level access to a file that stores information in a fairly complex way: populate values, change formatting, perform complex actions (e.g., searching through the whole document); but try to reduce the iterative behaviour (e.g., copying from one cell and pasting into another over and over) as much as possible. See it this way: copying/pasting in a .txt file involves just inspecting the input value and the target location and performing the action; doing it in Word involves the same as in the .txt file plus accounting for the huge number of variables analysed for each record (formatting, references to other elements, special actions, etc.).
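As an illustration of the .txt idea (the paths and names below are only placeholders): Word is used once to convert the table and save the document as plain text, and the file is then parsed from Excel without any further interop.
Sub ImportViaTextFile()
Dim oWord As Word.Application
Dim oDoc As Word.Document
Dim sLine As String
Dim aFields() As String
Dim lFile As Long
Dim lRow As Long
'One short interop session: convert the table and save as plain text
Set oWord = CreateObject("Word.Application")
Set oDoc = oWord.Documents.Open(FileName:="C:\Data\BigTable.rtf", ConfirmConversions:=False, ReadOnly:=False)
oDoc.Tables(1).ConvertToText Separator:="$", NestedTables:=False
oDoc.SaveAs2 FileName:="C:\Data\BigTable.txt", FileFormat:=wdFormatText
oDoc.Close wdDoNotSaveChanges
oWord.Quit
Set oWord = Nothing
'Fast, interop-free parse of the text file
lFile = FreeFile
Open "C:\Data\BigTable.txt" For Input As #lFile
Do While Not EOF(lFile)
Line Input #lFile, sLine
aFields = Split(sLine, "$")
lRow = lRow + 1
'Collect aFields into an output array here, then write it to the worksheet in one go
Loop
Close #lFile
End Sub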