Optimize vba code when it comes to processing a high number of rows - vba

Greeting all! I wrote a code that allows me to compare two EXCEL worksheets for same values; here it is:
Sub compare()
Dim i As Integer
Dim j As Integer
Dim oldVal1 As Variant
Dim oldVal2 As Variant
Dim newVal1 As Variant
Dim newVal2 As Variant
Dim count As Integer
Const equal = "equal"
Dim WKB As Workbook
Dim OldWS As Worksheet
Dim NewWS As Worksheet
Dim DiffWS As Worksheet
Const OldWSName = "Sheet1"
Const NewWSName = "Sheet2"
Const DiffWSName = "Sheet3"
Set WKB = ActiveWorkbook
Set OldWS = WKB.Worksheets(OldWSName)
Set NewWS = WKB.Worksheets(NewWSName)
Set DiffWS = WKB.Worksheets(DiffWSName)
Dim OldRow As Long
Dim NewRow As Long
Call OptimizeCode_Begin
oldRow = OldWS.Cells(Rows.Count, 1).End(xlUp).Row
newRow = NewWS.Cells(Rows.Count, 1).End(xlUp).Row
count = 1
For i = 2 To oldRow
oldVal1 = OldWS.Cells(i, 1).Value
oldVal2 = OldWS.Cells(i, 4).Value
For j = 2 To newRow
newVal1 = NewWS.Cells(j, 1).Value
newVal2 = NewWS.Cells(j, 4).Value
If (oldVal1 = newVal1) And (oldVal2 = newVal2) Then
count = count + 1
DiffWS.Cells(count, 1).Value = equal
DiffWS.Cells(count, 2).Value = oldVal1
DiffWS.Cells(count, 3).Value = oldVal2
End If
Next j
Next i
Call OptimizeCode_End
DiffWS.Activate
'Reset variables
Set WKB = Nothing
Set OldWS = Nothing
Set NewWS = Nothing
Set DiffWS = Nothing
Application.ScreenUpdating = True
MsgBox ("Your data has been compared!")
End Sub
This code is preceded by the variables definitions, long list that I chose not to paste in here. But basically, oldVal1 is the first value compared from OldWS worksheet and oldVal2 the second one from the same worksheet. Those values are being compared with newVal1 and newVal2 from NewWS worksheet (second worksheet). Same values are copied over to DiffWS (third worksheet) with an additional column on the left for the status equal, hence DiffWS.Cells(count + 1, 2).Value = oldVal1.
I've added the following functions to optimize the code and make it run fast when it comes to comparing 2 worksheets of at east 100000 rows:
Sub OptimizeCode_Begin()
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
EventState = Application.EnableEvents
Application.EnableEvents = False
PageBreakState = ActiveSheet.DisplayPageBreaks
ActiveSheet.DisplayPageBreaks = False
End Sub
AND
Sub OptimizeCode_End()
ActiveSheet.DisplayPageBreaks = PageBreakState
Application.EnableEvents = EventState
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
The execution is faster on a low number of rows I must admit, but it just doesn't work when the worksheets grow bigger. My EXCEL crashes when I run my code.
Any other optimization tips I should know of? because I am at loss as to how to do this. My code would be useless if no solution is available, and I'd better know it now and think about a way other than EXCEL to compare my data.
Thanks in advance for shedding light on this.

Your code sample doesn't indicate how you're opening each of the workbooks. In my experience, almost any time Excel crashes, it's because of a memory issue, and often that's because instances of Excel are being opened in the background, and then not .Close'd properly, or perhaps the excel objects not being Set to Nothing (perhaps from repeated execution of the code due to errors).
If this is the case, then Ctrl+Alt+Del → Task Manager → Processes will show multiple instances of Excel, and the easiest fix is to reboot, and then, of course, fix the handling of the Excel object in your code.
If the goal is to compare two worksheets, then perhaps a better question is why you are attempting to re-create functionality that already exists existing solutions available, most likely even built-in to your copy of Office.
Depending on your version, you may already have a utility installed.
For example, if you're running Office Pro Plus 2013, you can use Microsoft Spreadsheet Compare to run a report on the differences.
More information:
Basic tasks in Spreadsheet Compare
I'm running Excel 2016 from an Office 365 Subscription. I've never had a need to compare spreadsheets, but out of curiosity, I just:
Hit the Windows Key
Start typing: spreadsheet compare
Sit back and let the professionally built analysis/merge tool do it's job.
If all else fails, there are a number of other (3rd-Party) free and paid utilities available as well (such as xlCompare).

Related

how to copy data based on a specific cell value from several workbooks into a master workbook

I've have a problem where I need to copy data from 2 workbooks into a master one based on a specific set of values (several names) in a column 3. I'm new to VBA, and probably I can't precisely ask the question to find an answer, apologies for this. Would you please help me, i need to pull rows of data from each workbook only if a column 3 contains a name I'm looking for. I have the below code to pull the data from every workbook in a specific folder, however it grabs absolutely everything.
Sub copyDataFromManyFiles()
With Application
.DisplayAlerts = False
.EnableEvents = False
.ScreenUpdating = False
End With
Dim FolderPath As String, FilePath As String, FileName As String
FolderPath = "C:\Users\Jasiek\Desktop\Yuuuge MacroTest\"
FilePath = FolderPath & "*ennik*.xl*"
FileName = Dir(FilePath)
Dim lastrow As Long, lastcolumn As Long
Do While FileName <> ""
Workbooks.Open (FolderPath & FileName)
lastrow = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row
lastcolumn = ActiveSheet.Cells(1, Columns.Count).End(xlToLeft).Column
Range(Cells(2, 1), Cells(lastrow, lastcolumn)).Copy
Application.DisplayAlerts = False
ActiveWorkbook.Close
erow = Sheet1.Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
ActiveSheet.Paste Destination:=Worksheets(1).Range(Cells(erow, 1), Cells(erow, 30))
FileName = Dir
Loop
With Application
.DisplayAlerts = True
.EnableEvents = True
.ScreenUpdating = True
End With
'Call removeDuplicates
End Sub
How should I modify the code to filter the data before rows are copied? I do care about performance as there are 100k+ records. I would really appreciate help. Thanks.
Presuming that you're dealing with a fixed set of names (i.e.: 2-3) the easiest way to proceed would probably to use the instr() method to determine whether the cell value has a name that you're looking for.
If you're dealing with a larger list of names you could also just iterate through a collection (or array) of names for each row check. The instr() method below returns the integer position of the searched string from the string you're examining. If it finds something it will return some value greater than 0 for that position (i.e.: position 1 being the smallest value where it finds a match). The vbTextCompare option just ensures that it is case insensitive as it does the comparison (i.e.: upper or lower case mismatch).
I like to refer to worksheets and workbooks explicitly with my calls to ensure things are clean and specific, but you may be able to use ActiveWorkbook or ActiveWorksheet if you have a strong preference for it.
In terms of efficiency if you store each cell's value in a temporary string variable and just run the comparison against the tempStr string you will save retrieval time compared to referring to the cell value each time (which takes a much longer time to complete execution for x comparison checks for each row for x names).
Option Compare Text
Option Explicit
Public Sub start()
Dim foundBool As Boolean
Dim oxl As Excel.Application
Dim wb1 As Excel.Workbook
Dim tempName1 As String
Dim tempName2 As String
Dim tempStr1 As String
Dim tempStr2 As String
Dim tempInt1 As Integer
Dim tempInt2 As Integer
Set oxl = New Excel.Application
Set wb1 = oxl.Workbooks().Open("[path here]\Book1.xlsm")
wb1.Activate
'ActiveWorkbook.Worksheets ("") can also possibly be used
'if needed in lieu of a oxl and wb1 declaration, or perhaps
'even ActiveSheet.
'cells().value provides the apparent value of the cell taking formatting into account. Applies to things like dates
'or currencies where the underlying value might be different than what's shown.
tempStr1 = CStr(wb1.Worksheets("Sheet1").Cells(1, 1).Value)
tempName1 = "John"
'cells().value2 provides the underlying actual value of the cell (i.e.: without any formatting or interperetation).
'if you deal a lot with dates/currencies and don't want excel formatting to get between you and the source
'data use this one. Used here only to show the two different options
tempStr2 = CStr(wb1.Worksheets("Sheet1").Cells(2, 1).Value2) '<---next row's value
tempName2 = "Peter"
tempInt1 = InStr(1, tempStr1, tempName1, vbTextCompare)
tempInt2 = InStr(1, tempStr1, tempName2, vbTextCompare)
If tempInt1 > 0 Or _
tempInt2 > 0 Then
foundBool = True
End If
wb1.Close
oxl.Quit
Set wb1 = Nothing
Set oxl = Nothing
End Sub

Runtime error 13 in a for i loop, which used to work

Background:
I want to hide columns in a sheet based on whether there is an x in row 7. The x is not typed in but filled in via a formula.
I used the following code in another worksheet, were it works. The only thing I changed is the name of the sub, the worksheet and the row (7 instead of 5).
However whenever I try to manually run this sub from the vba editor as a test, it produces a runtime error 13 (mismatched type).
Sub hidCol2()
Dim i As Long
Application.ScreenUpdating = False
Set ws = ThisWorkbook.Worksheets("Zeitplan")
ws.Cells.EntireColumn.Hidden = False
For i = Cells(7, Columns.Count).End(xlToLeft).Column To 1 Step -1
If Cells(7, i) = "x" Then Cells(7, i).EntireColumn.Hidden = True
Next i
Application.ScreenUpdating = True
End Sub
My Question:
Why does the above code produce a runtime error 13, what do I need to correct?
Here it is :
Note it works without Dim ws but I think it's a good practice to dimension the variables before use.
If anyone can let me know why Dim ws here wasn't necessary that would clear some doubts in my head.
Sub hidCol2()
Dim i As Long
Dim ws As Worksheet 'As Suggested by #eirikdaude but I don't know why it worked without it as well (Tested on a workbook with a single worksheet)
Application.ScreenUpdating = False
Set ws = ThisWorkbook.Worksheets("Zeitplan")
ws.Activate
ws.Cells.EntireColumn.Hidden = False
For i = ws.Cells(7, Columns.Count).End(xlToLeft).Column To 1 Step -1
If Trim(ws.Cells(7, i).Text) = "x" Then ws.Cells(7, i).EntireColumn.Hidden = True
Next i
Application.ScreenUpdating = True
End Sub

slow cell formatting using vba?

Disclaimer: I am relatively new to vba and macros.
I have written a macro to update value and formatting in some individual cells after reading and parsing a json through http and the process is very slow, so I broke down the code into different portions to see where the bottleneck might be. Turns out the cell updating is the problem, I have the following code:
Sub test()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
Application.EnableCancelKey = False
t = Timer
With Range("A1")
.Font.Italic = True
.Interior.ColorIndex = 37
.Value = 3412
End With
Debug.Print Timer - t
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
Application.EnableCancelKey = True
End Sub
the debug print is about 0.3 to 0.5 sec... I have afterwards further wrapped the timer around each of the italic, colorIndex, and value lines and they all turns out take about 0.015 sec each... I have tried searching online how to make the code more efficient, hence the screenupdating toggles as well as no selection, but 0.5 sec still seem a bit slow in updating a cell to me.
please note that I am not whining, I just want to know if I am doing the right thing here. Is there a more efficient way to implement the formatting and value changes that I posted here, or is it just a fact that excel takes this amount of time to update the cell? I am just very curious because the json reading and parsing that I also implemented are significantly faster than this.
Also I have tested this script on at least 3 computers and they all take around the same time so I don't think it's an individual computer problem. And I used excel 2007 and 2010 to test.
I assume you are wanting to format more than a single cell? If so, it will be faster to create a range reference to all the cells requiring the same format (it need not be contiguous), then apply the required format to that range object in one step
Following example demo's creating a range reference, and applying format in one go
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Sub Demo()
Dim t As Long
Dim n As Long, i As Long
Dim m As Long
Dim ws As Worksheet
Dim cl As Range
Dim rSearch As Range
Dim rResult As Range
Set ws = ActiveSheet ' or another sheet...
Set rSearch = ws.Range("A1:A1000")
' note, this is an inefficient loop, can be made much faster
' details will depend on the use case
For Each cl In rSearch
' determine if cell is to be formatted
If cl.Row Mod 2 = 0 Then
' add cl to Result range
If rResult Is Nothing Then
Set rResult = cl
Else
Set rResult = Application.Union(rResult, cl)
End If
End If
Next
Debug.Print "Result Range includes ", rResult.Cells.Count, "Cells"
t = GetTickCount
' Apply format
With rResult
.Font.Italic = True
.Interior.ColorIndex = 37
.Value = 3412
End With
Debug.Print (GetTickCount - t) / 1000, "seconds"
End Sub

Pulling duplicate rows/criteria not working?

I'm a VBA newbie, pretty much learning by modifying existing macros and trial and error, so I apologize if what I am asking seems bush-league.
I combined/modified some macros to pull rows from an external workbook onto my workbook based on criteria. The issue is that I am randomly being given duplicate rows, and I cannot figure out as to why. Can anyone see what the issue is? The second issue I am having is that in addition to the column AF criteria, I would like to filter so that the dates in column E fall between the dates specified in two cells on a different sheet.
I have been trying to use the line:
If DateValue.Sheets("Control")("B1") < ("E1:E" & B) < DateValue.Sheets("Control")("C1") Then
but I am either placing it incorrectly or just completely off on the coding... Can someone help me out with that?
Sub FetchComplaints()
Dim path As String
Dim FileName As String
Dim LastCell As Range
Dim Wkb As Workbook
Dim WS As Worksheet
Dim ThisWB As String
Sheets("Sheet1").Range("A2:S1000").Clear
Sheets("ComplaintsFetched").Activate
Range("A1:AP5000").Clear
ThisWB = ThisWorkbook.Name
Application.EnableEvents = False
Application.ScreenUpdating = False
path = GetFileName
Set Wkb = Workbooks.Open(FileName:=path)
For Each WS In Wkb.Worksheets
WS.Select
B = Application.CountA(Range("A:A"))
If B = 0 Then
Else
For Each cell In Range("AF1:A" & B)
If cell.Value = True Or cell.Value = "Written" Then
Anum = Application.CountA(Workbooks(ThisWB).Sheets("ComplaintsFetched").Range("A:A")) + 1
cell.EntireRow.Copy Workbooks(ThisWB).Sheets("ComplaintsFetched").Range("A" & Anum)
End If
Next cell
End If
Next WS
Wkb.Close False
Application.EnableEvents = True
Application.ScreenUpdating = True
Set Wkb = Nothing
Set LastCell = Nothing
Set LastCell = Nothing
GetFileName is a function which retrieves the path.
Thanks for any help, and sorry if I am missing something very simple.
Try to use this syntax:
If Sheets("Control").Range("B1").Value < Sheets("Control").Range("C" & B).Value ...
Multiple conditions should be combined using And.

Efficiently assign cell properties from an Excel Range to an array in VBA / VB.NET

In VBA / VB.NET you can assign Excel range values to an array for faster access / manipulation. Is there a way to efficiently assign other cell properties (e.g., top, left, width, height) to an array? I.e., I'd like to do something like:
Dim cellTops As Variant : cellTops = Application.ActiveSheet.UsedRange.Top
The code is part of a routine to programmatically check whether an image overlaps cells that are used in a workbook. My current method of iterating over the cells in the UsedRange is slow since it requires repeatedly polling for the top / left / width / height of the cells.
Update: I'm going to go ahead an accept Doug's answer as it does indeed work faster than naive iteration. In the end, I found that a non-naive iteration works faster for my purposes of detecting controls that overlap content-filled cells. The steps are basically:
(1) Find the interesting set of rows in the used range by looking at the tops and heights of the first cell in each row (my understanding is that all the cells in the row must have the same top and height, but not left and width)
(2) Iterate over the cells in the interesting rows and perform overlap detection using only the left and right positions of the cells.
The code for finding the interesting set of rows looks something like:
Dim feasible As Range = Nothing
For r% = 1 To used.Rows.Count
Dim rowTop% = used.Rows(r).Top
Dim rowBottom% = rowTop + used.Rows(r).Height
If rowTop <= objBottom AndAlso rowBottom >= objTop Then
If feasible Is Nothing Then
feasible = used.Rows(r)
Else
feasible = Application.Union(used.Rows(r), feasible)
End If
ElseIf rowTop > objBottom Then
Exit For
End If
Next r
Todd,
The best solution I could think of was to dump the tops into a range and then dump those range values into a variant array. As you said, the For Next (for 10,000 cells in my test) took a few seconds. So I created a function that returns the top of the cell that it's entered into.
The code below, is mainly a function that copies the usedrange of a sheet you pass to it and then enters the function described above into each cell of the usedrange of the copied sheet. It then transposes and dumps that range into a variant array.
It only takes a second or so for 10,000 cells. Don't know if it's useful, but it was an interesting question. If it is useful you could create a separate function for each property or pass the property you're looking for, or return four arrays(?)...
Option Explicit
Option Private Module
Sub test()
Dim tester As Variant
tester = GetCellProperties(ThisWorkbook.Worksheets(1))
MsgBox tester(LBound(tester), LBound(tester, 2))
MsgBox tester(UBound(tester), UBound(tester, 2))
End Sub
Function GetCellProperties(wsSourceWorksheet As Excel.Worksheet) As Variant
Dim wsTemp As Excel.Worksheet
Dim rngCopyOfUsedRange As Excel.Range
Dim i As Long
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
wsSourceWorksheet.Copy after:=ThisWorkbook.Worksheets(ThisWorkbook.Worksheets.Count)
Set wsTemp = ActiveSheet
Set rngCopyOfUsedRange = wsTemp.UsedRange
rngCopyOfUsedRange.Formula = "=CellTop()"
wsTemp.Calculate
GetCellProperties = Application.WorksheetFunction.Transpose(rngCopyOfUsedRange)
Application.DisplayAlerts = False
wsTemp.Delete
Application.DisplayAlerts = True
Set wsTemp = Nothing
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End Function
Function CellTop()
CellTop = Application.Caller.Top
End Function
Todd,
In answer to your request for a non-custom-UDF I can only offer a solution close to what you started with. It takes about 10 times as long for 10,000 cells. The difference is that your back to looping through cells.
I'm pushing my personal envelope here, so maybe somebody will have a way to to it without a custom UDF.
Function GetCellProperties2(wsSourceWorksheet As Excel.Worksheet) As Variant
Dim wsTemp As Excel.Worksheet
Dim rngCopyOfUsedRange As Excel.Range
Dim i As Long
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
wsSourceWorksheet.Copy after:=ThisWorkbook.Worksheets(ThisWorkbook.Worksheets.Count)
Set wsTemp = ActiveSheet
Set rngCopyOfUsedRange = wsTemp.UsedRange
With rngCopyOfUsedRange
For i = 1 To .Cells.Count
.Cells(i).Value = wsSourceWorksheet.UsedRange.Cells(i).Top
Next i
End With
GetCellProperties2 = Application.WorksheetFunction.Transpose(rngCopyOfUsedRange)
Application.DisplayAlerts = False
wsTemp.Delete
Application.DisplayAlerts = True
Set wsTemp = Nothing
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End Function
I would add to #Doug the following
Dim r as Range
Dim data() as Variant, i as Integer
Set r = Sheet1.Range("A2").Resize(100,1)
data = r.Value
' Alternatively initialize an empty array with
' ReDim data(1 to 100, 1 to 1)
For i=1 to 100
data(i,1) = ...
Next i
r.Value = data
which shows the basic process of getting a range into an array and back again.