Most efficient way to change values conditionally in a range? - vba

I'm just wanting to run through a (large) range and replace certain values (if they're above a given max or below a given min...also one particular character) with a given replacement value.
My first thought is to simply traverse each cell and check/replace when necessary. I have a feeling this procedure would be really slow though, and I'm curious if there's a better way to accomplish this.
Any time I write code that does something similar to this in VBA I watch each cell have its value altered cell by cell and it seems like there must be better way. Thanks in advance.
edit:
I haven't even written this implementation yet because I know what the result will be and I would rather do something different if it's possible, but here's what it would look like
For something
If(Range.Value == condition)
Range.Value = replacement_value
Range = Range.Offset(a, b)
End For

Make a formula in a separate column, and then copy/paste special, values only.
= if(A2 > givenvalue; replace; if(A2< anothergivenvalue; anotherreplace; if (A2 = "particularcharacterortext"; replaceonemore; A2)))
Put the formula in an empty cell in an empty column, drag it or copy/paste to the entire column. After that, if the new values are ok, copy/paste values only to the original position.

The following VBA code provides a simple framework that you can customize to meet your needs. It incorporates many of the optimizations that have been mentioned in the comments to your question, such turning off screen updating and moving the comparison from the worksheet to an array.
You will notice that the macro does a rather large compare and replace. The data set I ran it on was 2.5 million random numbers between 1 and 1000 in the range A1:Y100000. If a number was greater than 250 and less than 500, I replaced it with 0. This required replacing 24.9 percent of all the numbers in the data set.
Sub ReplaceExample()
Dim arr() As Variant
Dim rng As Range
Dim i As Long, _
j As Long
Dim floor as Long
Dim ceiling as Long
Dim replacement_value
'assign the worksheet range to a variable
Set rng = Worksheets("Sheet2").Range("A1:Y100000")
floor = 250
ceiling = 500
replacement_value = 0
' copy the values in the worksheet range to the array
arr = rng
' turn off time-consuming external operations
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
'loop through each element in the array
For i = LBound(arr, 1) To UBound(arr, 1)
For j = LBound(arr, 2) To UBound(arr, 2)
'do the comparison of the value in an array element
'with the criteria for replacing the value
If arr(i, j) > floor And arr(i, j) < ceiling Then
arr(i, j) = replacement
End If
Next j
Next i
'copy array back to worksheet range
rng = arr
'turn events back on
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
End Sub
I did some performance testing on different alternatives for coding this simple compare and replace, with results that I would expect are consistent with VBA performance results by others. I ran each alternative 10 times, calculating the elapsed time for each run, and averaging the 10 elapsed times.
The results reveal the large impact that using arrays can have, especially when the data set is large: Compared to code that tested and changed worksheet cell values one-by-one, the array operation -- copying the data set from the worksheet into an array, comparing and changing the array values, and then writing the array results back to the worksheet -- in this case reduced average run times by 98 percent, from 3.6 minutes to 4 seconds.
While the optimizations that turned off external events made a noticeable difference in worksheet operations, with a 22 percent reduction in run times, those optimizations had very little impact when most of the computational work is array-based.

Related

VBA Get Values from AutoFilter

Let's say I have very large set of data with over 100,000+ rows. In Column A, I want to find each unique number.
I understand this can be done using the .Find feature and Collections/Arrays but those seem to take a good bit of time - especially with 100,000+ rows.
However, after AutoFiltering Column A, when I hit the down arrow it displays only unique variables. Is it possible to simply extract those values out of the selections in this way?
'pseudocode
filter.Count
Dim X As Long
For x = 2 to filter.Count
Cells(x, 14) = filter(x)
Next x
You can use advanced filter, it's pretty darn quick. I tried it with 127k rows, the results were instant.
Columns("A:A").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("D1"), Unique:=True
You can extract the visible cells in to an array. Say your total range (without filter) is A2:A10000. Run your filter, then you can run this macro:
Sub t()
Dim arr() As Variant
arr = Range("A2:A10000").SpecialCells(xlCellTypeVisible)
Dim i As Long
For i = LBound(arr) To UBound(arr)
Debug.Print (arr(i, 1))
' Do things with each entry in array
Next i
End Sub

compare two sheets values in excel

How can I compare one column's all values of a sheet-1 to another column values with different sheet-2 and if match then returns the value corresponding one of the columns of sheet-1 to another column of sheet-2 in excel?
Let's assume your values are in columns A of sheets named Sheet1 and Sheet2. Then, you can place the following formula into B1 of Sheet2 and drag down enough to cover you desired range: =IF(Sheet1!A1=Sheet2!A1,Sheet2!A1,"")
or, if you'e rather use VBA, place this code into a module:
Sub columnCompare()
Dim sh1 As Worksheet, sh2 As Worksheet, r1 As Range, r2 As Range
Set sh1 = Worksheets("Sheet1")
Set sh2 = Worksheets("Sheet2")
Set r1 = sh1.Range("A1")
Set r2 = sh2.Range("A1")
While r1 <> "" And r2 <> ""
If r1 = r2 Then r2.Offset(0, 1) = r1
Set r1 = r1.Offset(1, 0)
Set r2 = r2.Offset(1, 0)
Wend
End Sub
If am understanding correctly, this is what you want,
Sheet1
Sheet2
Enter the below formula in B2 of sheet2 and drag down as in the image,
=INDEX(Sheet1!B:B,MATCH(A2,Sheet1!A:A,0),1)
I can only answer part of your question: comparing two columns and detecting that they differ.
You have an excellent tutorial answer for that in Tony M's answer, above.
However, this will perform very slowly on a large data set, because:
Reading a range one cell at a time is very slow;
Comparing values pair-by-pair is inefficient, especially for strings, when the number of values gets into the tens of thousands,
Point(1) is the important one: it takes the same amount of time for VBA to pick up a single cell using var = Range("A1") as it does to pick up the entire range in one go using var = Range("A1:Z1024"); and every interaction with the sheet takes four times as much time as a string comparison in VBA, and twenty times longer than an comparison between floating-point decimals; and that, in turn, is three times longer than an integer comparison.
So your code is probably four times faster, and possibly a hundred times faster, if you read the entire range in one go, and work on the Range.Value2 array in VBA.
That's in Office 2010 and 2013 (I tested them); for older version of Excel, you'll see quoted times between 1/50th and 1/500th of a second, for each VBA interaction with a cell or range of cells. That'll be *way** slower because, in both old and new versions of Excel, the VBA actions will still be in single-digit numbers of microseconds: your code will run at least a hundred times faster, and probably thousands of times faster, if you avoid cell-by-cell reads from the sheet in older versions of Excel.
So big gains are there to be made - an interval perceptible to the user - in picking up the ranges in a single 'hit' and then performing the comparison on each item of an array in VBA.
arr1 = Range1.Values
arr2 = Range2.Values
' Consider checking that the two ranges are the same size
For i = LBound(arr1, 1) To Ubound(arr1, 2)
For j = LBound(arr1, 2) To Ubound(arr1, 2)
If arr1(i, j) <> arr2(i, j) Then
bMatchFail = True
Exit For
End If
Next j
If bMatchFail Then Exit For
Next i
Erase arr1
Erase arr2
You'll notice that this code sample is generic, for two ranges of the same size taken from anywhere - even from separate workbooks. If you're comparing two adjacent columns, loading a single array of two columns and comparing IF arrX(i, 1) <> arrX(i,2) Then is going to halve the runtime.
Your next challenge is only relevant if you're picking up tens of thousands of values from large ranges: there's no performance gain in this extended answer for anything smaller than that.
What we're doing is:
Using a hash function to compare the values of two large ranges
The idea is very simple, although the underlying mathematics is quite challenging for non-mathematicians: rather than comparing one value at a time, we run a mathematical function that 'hashes' the values into a short identifier for easy comparison.
If you're comparing ranges against a 'reference' copy, you can store the 'reference' hash, and this halves the workload.
There are some fast and reliable hashing functions out there, and they are available in Windows as part of the security and cryptography API. There is a slight problem in that they run on strings, and we have an array to work on; but you can easily find a fast 'Join2D' function that gets a string from the 2D arrays returned by a range's .Value2 property.
So a fast comparison function for two large ranges will look like this:
Public Function RangeCompare(Range1 as Excel.Range, Range2 As Excel.Range) AS Boolean
' Returns TRUE if the ranges are identical.
' This function is case-sensitive.
' For ranges with fewer than ~1000 cells, cell-by-cell comparison is faster
' WARNING: This function will fail if your range contains error values.
RangeCompare = False
If Range1.Cells.Count &LT;&GT; Range2.Cells.Count Then
RangeCompare = False
ElseIf Range1.Cells.Count = 1 then
RangeCompare = Range1.Value2 = Range2.Value2
Else
RangeCompare = MD5(Join2D(Range1.Value2)) = MD5(Join2D(Range2.Value2))
Endif
End Function
I've wrapped the Windows System.Security MD5 hash in this VBA function:
Public Function MD5(arrBytes() As Byte) As String
' Return an MD5 hash for any string
' Author: Nigel Heffernan Excellerando.Blogspot.com
' Note the type pun: you can pass in a string, there's no type conversion or cast
' because a string is stored as a Byte array and VBA recognises this.
Dim oMD5 As Object 'Set a reference to mscorlib 4.0 to use early binding
Dim HashBytes() As Byte
Dim i As Integer
Set oMD5 = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
HashBytes = oMD5.ComputeHash_2((arrBytes))
For i = LBound(HashBytes) To UBound(HashBytes)
MD5 = MD5 & Right("00" & Hex(HashBytes(i)), 2)
Next i
Set oMD5 = Nothing ' if you're doing this repeatedly, declare at module level and persist
Erase HashBytes
End Function
There are other VBA implementations out there, but nobody seems to know about the Byte Array / String type pun - they are not equivalent, they are identical - so everyone codes up unnecessary type conversions.
A fast and simple Join2D function was posted by Dick Kusleika on Daily Dose of Excel in 2015:
Public Function Join2D(ByVal vArray As Variant, Optional ByVal sWordDelim As String = " ", Optional ByVal sLineDelim As String = vbNewLine) As String
Dim i As Long, j As Long
Dim aReturn() As String
Dim aLine() As String
ReDim aReturn(LBound(vArray, 1) To UBound(vArray, 1))
ReDim aLine(LBound(vArray, 2) To UBound(vArray, 2))
For i = LBound(vArray, 1) To UBound(vArray, 1)
For j = LBound(vArray, 2) To UBound(vArray, 2)
'Put the current line into a 1d array
aLine(j) = vArray(i, j)
Next j
'Join the current line into a 1d array
aReturn(i) = Join(aLine, sWordDelim)
Next i
Join2D = Join(aReturn, sLineDelim)
End Function
If you need to excise blank rows before you make the comparison, you'll need the Join2D function I posted in StackOverflow back in 2012.
The most common application of this type of hash comparison is for spreadsheet control - change monitoring - and you'll see Range1.Formula used instead of Range1.Value2: but your question is about comparing values, not formulae.

Using scripting dictionary to find/highlight skips in groups of repeating numbers in Column A using Excel VBA

I'm attempting to use a Scripting Dictionary in a way as to be able to find and ultimately highlight same values or groups of same values where there are inconsistencies (ie blanks or different values in between the two same values or groups of same values). Normally these same values will repeat, but what I'm trying to catch is when they do not repeat together (See example image below taken from my previous post).
Some context that will hopefully help this make a little more sense:
This is a follow-up of sorts to one of my previous questions here. I have a conditional formatting formula:
=NOT(AND(IFERROR(COUNTIF(OFFSET(A1,0,0,-COUNTIF($A$1:$A1,A2)),A2),0)=IFERROR(COUNTIF($A$1:$A1,A2),0),IFERROR(COUNTIF(OFFSET(A3,0,0,COUNTIF($A3:$A$5422,A2)),A2),0)=IFERROR(COUNTIF($A3:$A$5422,A2),0),A2<>""))
Which works perfectly. However, in my tinkering after receiving this formula as the answer to that previous question I realized that using conditional formatting of any sort for the amount of data I typically deal with (15000+ rows with 140 consistent columns) is an extremely slow endeavor, both when applying the formula and when filtering/adjusting afterwards. I've also tried applying this formula via the "helper column" route, but to no surprise, that is just as slow.
So, where I'm at now:
Essentially, I'm trying to translate that formula into a piece of code that does the same thing, but more efficiently, so that's where I starting thinking to use a Scripting Dictionary as a way to speed up my code execution time. I have the steps outlined, so I know what I need to do. However, I feel as though I am executing it wrong, which is why I'm here to ask for assistance. The following is my attempt at using a Scripting Dictionary to accomplish highlighting inconsistencies in Column A (my target column) along with the steps I figured out that I need to do to accomplish the task:
'dump column A into Array
'(Using Scripting.Dictionary) While cycling through check if duplicate
'IF duplicate check to make sure there is the same value either/or/both in the contiguous slot before/after the one being checked
'If not, then save this value (so we can go back and highlight all instances of this value at the end)
'Cycle through all trouble values and highlight all of their instances.
Sub NewandImprovedXIDCheck()
Dim d As Long, str As String, columnA As Variant
Dim dXIDs As Object
Application.ScreenUpdating = False
Set dXIDs = CreateObject("Scripting.Dictionary")
dXIDs.comparemode = vbTextCompare
With ActiveSheet
With .Cells(1, 1).CurrentRegion
With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
'.Value2 is faster than using .Value
columnA = .Columns(1).Value2
For d = LBound(columnA, 1) To UBound(columnA, 1)
str = columnA(d, 1)
If dXIDs.exists(str) Then
'the key exists in the dictionary
'Check if beside its like counterparts
If Not UBound(columnA, 1) Then
If (str <> columnA(d - 1, 1) And str <> columnA(d + 1, 1)) Or str <> columnA(d - 1, 1) Or str <> columnA(d + 1, 1) Then
'append the current row
dXIDs.Item(str) = dXIDs.Item(str) & Chr(44) & "A" & d
End If
End If
Else
'the key does not exist in the dictionary; store the current row
dXIDs.Add Key:=str, Item:="A" & d
End If
Next d
'reuse a variant var to provide row highlighting
Erase columnA
For Each columnA In dXIDs.keys
'if there is more than a single cell address, highlight all
If CBool(InStr(1, dXIDs.Item(columnA), Chr(44))) Then _
.Range(dXIDs.Item(columnA)).Interior.Color = vbRed
Next columnA
End With
End With
End With
dXIDs.RemoveAll: Set dXIDs = Nothing
Application.ScreenUpdating = True
End Sub
I feel like my logic is going wrong somewhere in my code execution, but can't seem to pinpoint where or how to correct it. Any help would be greatly appreciated. If you can provide any sort of code snippet that would also be a great help.
Here's one approach:
Sub HiliteIfGaps()
Dim rng As Range, arr, r As Long, dict As Object, v
Dim num As Long, num2 As Long
Set dict = CreateObject("scripting.dictionary")
With ActiveSheet
Set rng = .Range(.Range("A2"), .Cells(.Rows.Count, 1).End(xlUp))
End With
arr = rng.Value
For r = 1 To UBound(arr, 1)
v = arr(r, 1)
If Len(v) > 0 Then
If Not dict.exists(v) Then
num = Application.CountIf(rng, v) 'how many in total?
'all where expected?
num2 = Application.CountIf(rng.Cells(r).Resize(num, 1), v)
dict.Add v, (num2 < num)
End If
If dict(v) Then rng.Cells(r).Interior.Color = vbRed
Else
'highlight blanks
rng.Cells(r).Interior.Color = vbRed
End If
Next r
End Sub
EDIT: every time a new value is found (i.e. not already in the dictionary) then take a count of how many of those values in total there are in the range being checked. If all of those values are contiguous then they should all be found in the range rng.Cells(r).Resize(num, 1): if we find fewer than expected (num2<num) then that means the values are not contiguous so we insert True into the dictionary entry for that value, and start highlighting that value in the column.
#Tim Williams's approach did the job perfectly! I only made one slight alteration (to suit my needs). I changed
.Cells(.Rows.Count, 1).End(xlUp) to .Range("A" & .UsedRange.Rows.count)
Just because there are instances where the bottom-most row(s) might have missing values (be blank) and in this instance I feel safe enough using the .UsedRange reference because this snippet of code is one of the very first ones ran in my overall macro, so it (.UsedRange) is more likely to be accurate. I also added a Boolean operator (xidError, set to False) to be changed to True whenever we have to highlight. After I'm done looping through the Array I check xidError and if True I prompt the user to fix the error, then end the entire macro since there's no use in continuing until this particular error is corrected.
If xidError Then
'Prompt User to fix xid problem
MsgBox ("XID Error. Please fix/remove problematic XIDs and rerun macro.")
'Stop the macro because we can't continue until the xid problem has been sorted out
End
End If
Again, much thanks to Tim for his very efficient approach!

(Excel 2003 VBA) Delete entire rows based on multiple conditions in a column

I have an interesting issue. I've tried searching this site and Google but there are only slightly related problems, none which actually address this specific need.
I have a spreadsheet with 10 columns (let's call them A-J). I need to delete all the rows that do NOT have a value of "30", "60", "90", "120" or blank within the cells of column H.
Though there are many ways of doing this, all of them have relied on loops, which doesn't work for me as this dataset has over 25k rows and it takes 10+ minutes to run - too long.
I've been looking at autofilter options in conjunction with the .Find function (e.g. find all rows with H cells that don't meet the criteria and delete) but AutoFilter on 2003 only works with 2 criteria, while I have 5 to check against. I'm not sure how to proceed.
Any help is appreciated.
This deleted all matching rows (~10%) in a sample of 25k rows in 20sec
Sub tt()
Dim rw As Range
Dim all As Range
Dim t
Dim b As Boolean
t = Timer
For Each rw In Range("A1").CurrentRegion.Rows
If rw.Cells(8).Value < 1 Then
If b Then
Set all = Application.Union(rw, all)
Else
Set all = rw
b = True
End If
End If
Next rw
If not all is nothing then all.EntireRow.Delete
Debug.Print "elapsed: " & Timer - t
End Sub
You can try Advanced Filter option where you can give more than two criteria to filter the list. After filtering the list matching the criteria you set, the filtered list can be copied to another location (option available) and the remaining deleted.
You can add a column with the condition of your own:
=IF(OR(H1=30;H1=60;H1=90;H1=120;H1="");"DELETE";"")
(the formula is given for row 1, you have to copy-paste it to the entire range)
Then use filtering and sorting to select the rows to delete.
Some speed tips:
When using large data, assign values to array and use array instead of *.Value;
When working with full columns, ignore empty columns at bottom;
When making intensive changes in worksheet, disable screen update and automatic calculation.
Stating this, I would use this code:
Sub Macro1()
Dim su As Boolean, cm As XlCalculation
Dim r As Long, v(), r_offset As Long
su = Application.ScreenUpdating
Application.ScreenUpdating = False 'Disable screen updating
cm = Application.Calculation
Application.Calculation = xlCalculationManual 'Disable automatic calculation
'Only use used values
With Intersect(Range("H:H"), Range("H:H").Worksheet.UsedRange)
v = .Value 'Assign values to array
r_offset = .Row - LBound(v) 'Mapping between array first index and worksheet row number
End With
'Check all row from bottom (so don't need to deal with row number changes after deletion)
For r = UBound(v) To LBound(v) Step -1
Select Case v(r, 1)
Case "30", "60", "90", "120", Empty 'Do nothing
Case Else
Sheet1.Rows(r + r_offset).EntireRow.Delete
End Select
Next
Application.ScreenUpdating = su 'Restore screen updating
Application.Calculation = cm 'Restore calculation mode
End Sub
Thanks to all who've suggested solutions. In the between time I ended up figuring out a way to do this in <1 second - apparently I myself didn't realise that AutoFilter could've supported comparison criteria (greater than, less than etc).
Using a series of autofilters I simply filtered for, then deleted all rows that filtered to "<30", "30120".
Not elegant, but it did the trick.

Excel VBA Length-1 In a Range

I recently got into Excel macro development after a long time of not having the need to.
I have one column with two-hundred rows where each row has a value. I wrote a loop to iterate to each row value, read the current value and then write the value back minus the last character.
Here is some actual (and pseudo) code of what I wrote.
Dim theRow as Long
Dim totRow as Long
Dim fooStr as String
theRow = 2 'we begin on the second row of the colummn
totRow = 201 'there are 200 values
For theRow = 2 to totRow
fooStr = WorkSheets(DestSheet).Cells(theRow,"A").Formula 'read the cell value
fooStr = Left(fooStr,Len(fooStr)-1 'subtract the last character from the value
Cells(theRow,1).Value = fooStr 'write the value back
Next theRow
After I did some reading I learned that it is best practice to read and write values using a Range. Is it possible to rewrite what I am doing using a Range so it willl go faster.
Here is what I came up with so far.
Range("A2:A201").Value = Len(Range.Left("A2:A201").Value)-1
However, this doesn't work.
Any clues on how to do this if this is indeed possible?
Thanks for any tips.
If you want maximum performance (you don't need it for 200 rows, but...) you have to minimize the number of reads and writes (mostly writes) to ranges. That means reading the whole range into an array, manipulating the array, then writing it back to the range. That's one read and one write compared to 200 in a loop. Here's an example.
Sub RemoveLastChar()
Dim vaValues As Variant
Dim i As Long
vaValues = Sheet1.Range("A2").Resize(200).Value
For i = LBound(vaValues, 1) To UBound(vaValues, 1)
vaValues(i, 1) = Left$(vaValues(i, 1), Len(vaValues(i, 1)) - 1)
Next i
Sheet1.Range("A2").Resize(UBound(vaValues, 1), UBound(vaValues, 2)).Value = vaValues
End Sub
You could do something like
Sub StringTrim()
Dim xCell as Range
Range("A1:A201").Select
For Each xCell in Selection
xCell.Value = Left(xCell.Value, Len(xCell.Value) - 1)
Next
End Sub
I don't know what kind of speed improvements you are seeking, but that would also do the job.
You might know this already but putting Application.ScreenUpdating = False at the top of your code can speed it up significantly (unless you like to watch everything flash by as the script works). You should reset the value to True at the end of your code.