Let's say I have very large set of data with over 100,000+ rows. In Column A, I want to find each unique number.
I understand this can be done using the .Find feature and Collections/Arrays but those seem to take a good bit of time - especially with 100,000+ rows.
However, after AutoFiltering Column A, when I hit the down arrow it displays only unique variables. Is it possible to simply extract those values out of the selections in this way?
'pseudocode
filter.Count
Dim X As Long
For x = 2 to filter.Count
Cells(x, 14) = filter(x)
Next x
You can use advanced filter, it's pretty darn quick. I tried it with 127k rows, the results were instant.
Columns("A:A").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("D1"), Unique:=True
You can extract the visible cells in to an array. Say your total range (without filter) is A2:A10000. Run your filter, then you can run this macro:
Sub t()
Dim arr() As Variant
arr = Range("A2:A10000").SpecialCells(xlCellTypeVisible)
Dim i As Long
For i = LBound(arr) To UBound(arr)
Debug.Print (arr(i, 1))
' Do things with each entry in array
Next i
End Sub
Related
I'm attempting to use a Scripting Dictionary in a way as to be able to find and ultimately highlight same values or groups of same values where there are inconsistencies (ie blanks or different values in between the two same values or groups of same values). Normally these same values will repeat, but what I'm trying to catch is when they do not repeat together (See example image below taken from my previous post).
Some context that will hopefully help this make a little more sense:
This is a follow-up of sorts to one of my previous questions here. I have a conditional formatting formula:
=NOT(AND(IFERROR(COUNTIF(OFFSET(A1,0,0,-COUNTIF($A$1:$A1,A2)),A2),0)=IFERROR(COUNTIF($A$1:$A1,A2),0),IFERROR(COUNTIF(OFFSET(A3,0,0,COUNTIF($A3:$A$5422,A2)),A2),0)=IFERROR(COUNTIF($A3:$A$5422,A2),0),A2<>""))
Which works perfectly. However, in my tinkering after receiving this formula as the answer to that previous question I realized that using conditional formatting of any sort for the amount of data I typically deal with (15000+ rows with 140 consistent columns) is an extremely slow endeavor, both when applying the formula and when filtering/adjusting afterwards. I've also tried applying this formula via the "helper column" route, but to no surprise, that is just as slow.
So, where I'm at now:
Essentially, I'm trying to translate that formula into a piece of code that does the same thing, but more efficiently, so that's where I starting thinking to use a Scripting Dictionary as a way to speed up my code execution time. I have the steps outlined, so I know what I need to do. However, I feel as though I am executing it wrong, which is why I'm here to ask for assistance. The following is my attempt at using a Scripting Dictionary to accomplish highlighting inconsistencies in Column A (my target column) along with the steps I figured out that I need to do to accomplish the task:
'dump column A into Array
'(Using Scripting.Dictionary) While cycling through check if duplicate
'IF duplicate check to make sure there is the same value either/or/both in the contiguous slot before/after the one being checked
'If not, then save this value (so we can go back and highlight all instances of this value at the end)
'Cycle through all trouble values and highlight all of their instances.
Sub NewandImprovedXIDCheck()
Dim d As Long, str As String, columnA As Variant
Dim dXIDs As Object
Application.ScreenUpdating = False
Set dXIDs = CreateObject("Scripting.Dictionary")
dXIDs.comparemode = vbTextCompare
With ActiveSheet
With .Cells(1, 1).CurrentRegion
With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
'.Value2 is faster than using .Value
columnA = .Columns(1).Value2
For d = LBound(columnA, 1) To UBound(columnA, 1)
str = columnA(d, 1)
If dXIDs.exists(str) Then
'the key exists in the dictionary
'Check if beside its like counterparts
If Not UBound(columnA, 1) Then
If (str <> columnA(d - 1, 1) And str <> columnA(d + 1, 1)) Or str <> columnA(d - 1, 1) Or str <> columnA(d + 1, 1) Then
'append the current row
dXIDs.Item(str) = dXIDs.Item(str) & Chr(44) & "A" & d
End If
End If
Else
'the key does not exist in the dictionary; store the current row
dXIDs.Add Key:=str, Item:="A" & d
End If
Next d
'reuse a variant var to provide row highlighting
Erase columnA
For Each columnA In dXIDs.keys
'if there is more than a single cell address, highlight all
If CBool(InStr(1, dXIDs.Item(columnA), Chr(44))) Then _
.Range(dXIDs.Item(columnA)).Interior.Color = vbRed
Next columnA
End With
End With
End With
dXIDs.RemoveAll: Set dXIDs = Nothing
Application.ScreenUpdating = True
End Sub
I feel like my logic is going wrong somewhere in my code execution, but can't seem to pinpoint where or how to correct it. Any help would be greatly appreciated. If you can provide any sort of code snippet that would also be a great help.
Here's one approach:
Sub HiliteIfGaps()
Dim rng As Range, arr, r As Long, dict As Object, v
Dim num As Long, num2 As Long
Set dict = CreateObject("scripting.dictionary")
With ActiveSheet
Set rng = .Range(.Range("A2"), .Cells(.Rows.Count, 1).End(xlUp))
End With
arr = rng.Value
For r = 1 To UBound(arr, 1)
v = arr(r, 1)
If Len(v) > 0 Then
If Not dict.exists(v) Then
num = Application.CountIf(rng, v) 'how many in total?
'all where expected?
num2 = Application.CountIf(rng.Cells(r).Resize(num, 1), v)
dict.Add v, (num2 < num)
End If
If dict(v) Then rng.Cells(r).Interior.Color = vbRed
Else
'highlight blanks
rng.Cells(r).Interior.Color = vbRed
End If
Next r
End Sub
EDIT: every time a new value is found (i.e. not already in the dictionary) then take a count of how many of those values in total there are in the range being checked. If all of those values are contiguous then they should all be found in the range rng.Cells(r).Resize(num, 1): if we find fewer than expected (num2<num) then that means the values are not contiguous so we insert True into the dictionary entry for that value, and start highlighting that value in the column.
#Tim Williams's approach did the job perfectly! I only made one slight alteration (to suit my needs). I changed
.Cells(.Rows.Count, 1).End(xlUp) to .Range("A" & .UsedRange.Rows.count)
Just because there are instances where the bottom-most row(s) might have missing values (be blank) and in this instance I feel safe enough using the .UsedRange reference because this snippet of code is one of the very first ones ran in my overall macro, so it (.UsedRange) is more likely to be accurate. I also added a Boolean operator (xidError, set to False) to be changed to True whenever we have to highlight. After I'm done looping through the Array I check xidError and if True I prompt the user to fix the error, then end the entire macro since there's no use in continuing until this particular error is corrected.
If xidError Then
'Prompt User to fix xid problem
MsgBox ("XID Error. Please fix/remove problematic XIDs and rerun macro.")
'Stop the macro because we can't continue until the xid problem has been sorted out
End
End If
Again, much thanks to Tim for his very efficient approach!
I'm just wanting to run through a (large) range and replace certain values (if they're above a given max or below a given min...also one particular character) with a given replacement value.
My first thought is to simply traverse each cell and check/replace when necessary. I have a feeling this procedure would be really slow though, and I'm curious if there's a better way to accomplish this.
Any time I write code that does something similar to this in VBA I watch each cell have its value altered cell by cell and it seems like there must be better way. Thanks in advance.
edit:
I haven't even written this implementation yet because I know what the result will be and I would rather do something different if it's possible, but here's what it would look like
For something
If(Range.Value == condition)
Range.Value = replacement_value
Range = Range.Offset(a, b)
End For
Make a formula in a separate column, and then copy/paste special, values only.
= if(A2 > givenvalue; replace; if(A2< anothergivenvalue; anotherreplace; if (A2 = "particularcharacterortext"; replaceonemore; A2)))
Put the formula in an empty cell in an empty column, drag it or copy/paste to the entire column. After that, if the new values are ok, copy/paste values only to the original position.
The following VBA code provides a simple framework that you can customize to meet your needs. It incorporates many of the optimizations that have been mentioned in the comments to your question, such turning off screen updating and moving the comparison from the worksheet to an array.
You will notice that the macro does a rather large compare and replace. The data set I ran it on was 2.5 million random numbers between 1 and 1000 in the range A1:Y100000. If a number was greater than 250 and less than 500, I replaced it with 0. This required replacing 24.9 percent of all the numbers in the data set.
Sub ReplaceExample()
Dim arr() As Variant
Dim rng As Range
Dim i As Long, _
j As Long
Dim floor as Long
Dim ceiling as Long
Dim replacement_value
'assign the worksheet range to a variable
Set rng = Worksheets("Sheet2").Range("A1:Y100000")
floor = 250
ceiling = 500
replacement_value = 0
' copy the values in the worksheet range to the array
arr = rng
' turn off time-consuming external operations
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
'loop through each element in the array
For i = LBound(arr, 1) To UBound(arr, 1)
For j = LBound(arr, 2) To UBound(arr, 2)
'do the comparison of the value in an array element
'with the criteria for replacing the value
If arr(i, j) > floor And arr(i, j) < ceiling Then
arr(i, j) = replacement
End If
Next j
Next i
'copy array back to worksheet range
rng = arr
'turn events back on
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
End Sub
I did some performance testing on different alternatives for coding this simple compare and replace, with results that I would expect are consistent with VBA performance results by others. I ran each alternative 10 times, calculating the elapsed time for each run, and averaging the 10 elapsed times.
The results reveal the large impact that using arrays can have, especially when the data set is large: Compared to code that tested and changed worksheet cell values one-by-one, the array operation -- copying the data set from the worksheet into an array, comparing and changing the array values, and then writing the array results back to the worksheet -- in this case reduced average run times by 98 percent, from 3.6 minutes to 4 seconds.
While the optimizations that turned off external events made a noticeable difference in worksheet operations, with a 22 percent reduction in run times, those optimizations had very little impact when most of the computational work is array-based.
I have a loop wherein I take the mean of several columns of numbers with the same number of rows each.
The point of the loop is to capture these means in a new vector.
So for each loop I need to indicate "all rows". In matlab this would be easy, just use ":" But I can't figure out what the analogy is in VB. Please help! Thanks.
(Please advise me as to what I put in the code below where I have ALLROWS).
My attempt so far:
For i = 1 To CA
mrCA11(i) = Application.WorksheetFunction.Average(revCA11(**ALLROWS**,i))
Next i
In matlab this would be:
For i = 1:CA
mrCA11(i) = mean(revCA11(:,i));
Next i
EDIT: I've also tried this trick to no avail:
For j = 1 To CA
For i = 1 To s11
temp11(i) = revCA11(i, j)
Next i
mrCA11(j) = Application.WorksheetFunction.Average(temp11)
Next j
I get the error message: "Unable to get the Average property of the Worksheet Function class"
As everybody (Tim and shahkalpesh at least) pointed out, we need to understand what is revCall or more specifically, we need to understand how you want to give them ALL ROWS in argument.
Finding the last row (or column or cell)
A common Excel issue is to find the last used row / column / cell.
This will give you the end of your vector.
Excel give you several methods to deal with this:
xlTypeLastCell
Last cell used in the entire sheet (regardless if it's used in column A or not)
lastRow = ActiveSheet.Cells.SpecialCells(xlCellTypeLastCell).Row
End(xlUp)
Last cell used (including blanks in-between) in Column A is as simple as this:
lastRow = Range("A" & Rows.Count).End(xlUp).Row
End(xlToLeft)
Last cell used (including blanks in-between) in Row 1 is as simple as this:
lastRow = ActiveSheet.Cells(1, Columns.Count).End(xlToLeft).Row
UsedRange
Last cell used in the WorkSheet (according to Excel interpretation):
Set rangeLastCell = ActiveSheet.UsedRange
Using an array as argument
The methods above told you how to find the last row (if this is what you need). You can then easily create your vector and use it in your procedure revCA11.
You can either give an array as argument as Tim pointed out in his answer with this kind of statement:
myArray = ActiveSheet.Range("A1", Cells(lastRow, lastColumn).Value
Or you can use the integer (or long) to build your vector inside your procedure as simple as declaring a range:
Range("A1:A" & lastRow)
You might clarify exactly how revCA11 is declared/created, but maybe something along these lines might work for you:
Sub Tester()
Dim arr, x
arr = ActiveSheet.Range("A1:D5").Value '2-D array
'average each column
Debug.Print "Columns:"
For x = 1 To UBound(arr, 2)
Debug.Print x, Application.Average(Application.Index(arr, 0, x))
Next x
'average each row
Debug.Print "Rows:"
For x = 1 To UBound(arr, 1)
Debug.Print x, Application.Average(Application.Index(arr, x, 0))
Next x
End Sub
Ok I have tried these and grasped some view on variants and I have written these code
Sub main()
Dim Vary As Variant
Vary = Sheet1.Range("A1:D11").Value
For i = 1 To UBound(Vary)
For j = i + 1 To UBound(Vary)
If Vary(i, 1) = Vary(j, 1) Then
'I should delete the vary(j,1) element from vary
'in excel sheet we use selection.entirerow.delete
End If
Next j
Next i
End Sub
This is the sample I tried
A B C D
1 somevalues in BCD columns
2
3
1
Now Delete the 4th row don think I'm working for unique records I'm just learning stuff to do and while I was learning variant I am stuck at this point deleting a complete row stored in variant
I have stored (A1:D11).value in variant
Now how can I delete the A6 element or row in variant so that I can avoid it while I copy the variant to some other sheet?
Can I also delete the C AND B columns in variant so that when i do transpose it wont copy the C and B columns?
I don't know what exactly a variant is - I was thinking to take a set of range and do operations like what we do for an excel sheet then take that variant and transpose it back to sheet.
Is that the right way of thinking or did I misunderstand the use of variants?
`variant(k,1)=text(x)` some array shows mismatch ? whats wrong?
If you are planning on using a varray to look at cells in each row to decide if you should delete the row or not, you should loop through your varray backwards, the same way you would if you did a for loop through the cell range. Since you are starting on row 1, the variable i will always equal the row number the element was located on, so you can use that to delete the proper row.
Here's a sample (more simple than what you are trying to do, though) that will delete each row in which the cells in columns A and B are the same.
Sub test()
Dim varray As Variant
varray = Range("A1:B11").Value
For i = UBound(varray, 1) To 1 Step -1
If varray(i, 1) = varray(i, 2) Then
Cells(i, 1).EntireRow.Delete
End If
Next
End Sub
Notes of interest:
UBound(varray, 1) gives the count of the rows
UBound(varray, 2) gives the count of the columns
One workaround without a second array is to introduce a deliberate error into an element you want to replace, then use SpecialCells to delete the cell after dumping the variant array back over the range. This sample introduces an error into the array position corresponding to A6 (outside the loop as its an example), then when the range is dumped to E1, the SpecialCell error removal shifts F6:H6 into E6:G6. ie
pls save before testing - this code will overwrite E6:H11 in the first worksheet
Sub main()
Dim Vary As Variant
Dim rng1 As Range
Set rng1 = Sheets(1).Range("A1:D11")
Set rng2 = rng1.Offset(0, 4)
Vary = rng1.Value2
For i = 1 To UBound(Vary)
For j = i + 1 To UBound(Vary)
'your test here
Next j
Next i
Vary(6, 1) = "=(1 / 0)"
With rng2
.Value2 = Vary
On Error Resume Next
.SpecialCells(xlFormulas, xlErrors).Delete xlToLeft
End With
End Sub
I recently got into Excel macro development after a long time of not having the need to.
I have one column with two-hundred rows where each row has a value. I wrote a loop to iterate to each row value, read the current value and then write the value back minus the last character.
Here is some actual (and pseudo) code of what I wrote.
Dim theRow as Long
Dim totRow as Long
Dim fooStr as String
theRow = 2 'we begin on the second row of the colummn
totRow = 201 'there are 200 values
For theRow = 2 to totRow
fooStr = WorkSheets(DestSheet).Cells(theRow,"A").Formula 'read the cell value
fooStr = Left(fooStr,Len(fooStr)-1 'subtract the last character from the value
Cells(theRow,1).Value = fooStr 'write the value back
Next theRow
After I did some reading I learned that it is best practice to read and write values using a Range. Is it possible to rewrite what I am doing using a Range so it willl go faster.
Here is what I came up with so far.
Range("A2:A201").Value = Len(Range.Left("A2:A201").Value)-1
However, this doesn't work.
Any clues on how to do this if this is indeed possible?
Thanks for any tips.
If you want maximum performance (you don't need it for 200 rows, but...) you have to minimize the number of reads and writes (mostly writes) to ranges. That means reading the whole range into an array, manipulating the array, then writing it back to the range. That's one read and one write compared to 200 in a loop. Here's an example.
Sub RemoveLastChar()
Dim vaValues As Variant
Dim i As Long
vaValues = Sheet1.Range("A2").Resize(200).Value
For i = LBound(vaValues, 1) To UBound(vaValues, 1)
vaValues(i, 1) = Left$(vaValues(i, 1), Len(vaValues(i, 1)) - 1)
Next i
Sheet1.Range("A2").Resize(UBound(vaValues, 1), UBound(vaValues, 2)).Value = vaValues
End Sub
You could do something like
Sub StringTrim()
Dim xCell as Range
Range("A1:A201").Select
For Each xCell in Selection
xCell.Value = Left(xCell.Value, Len(xCell.Value) - 1)
Next
End Sub
I don't know what kind of speed improvements you are seeking, but that would also do the job.
You might know this already but putting Application.ScreenUpdating = False at the top of your code can speed it up significantly (unless you like to watch everything flash by as the script works). You should reset the value to True at the end of your code.