How can I compare all the values in one column of Sheet1 against the values in a column of Sheet2 and, where they match, return the corresponding value from another column of Sheet1 into a column of Sheet2 in Excel?
Let's assume your values are in column A of sheets named Sheet1 and Sheet2. Then you can place the following formula into B1 of Sheet2 and drag it down far enough to cover your desired range: =IF(Sheet1!A1=Sheet2!A1,Sheet2!A1,"")
Or, if you'd rather use VBA, place this code into a module:
Sub columnCompare()
Dim sh1 As Worksheet, sh2 As Worksheet, r1 As Range, r2 As Range
Set sh1 = Worksheets("Sheet1")
Set sh2 = Worksheets("Sheet2")
Set r1 = sh1.Range("A1")
Set r2 = sh2.Range("A1")
While r1 <> "" And r2 <> ""
If r1 = r2 Then r2.Offset(0, 1) = r1
Set r1 = r1.Offset(1, 0)
Set r2 = r2.Offset(1, 0)
Wend
End Sub
If I am understanding correctly, this is what you want:
Sheet1
Sheet2
Enter the below formula in B2 of sheet2 and drag down as in the image,
=INDEX(Sheet1!B:B,MATCH(A2,Sheet1!A:A,0),1)
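If the lookup has to run in VBA rather than as a worksheet formula, the same idea as the INDEX/MATCH formula can be sketched with a Scripting.Dictionary. This is only a minimal sketch: it assumes keys in Sheet1 column A with the values to return in column B, the keys to look up in Sheet2 column A, a header in row 1 of both sheets, and no error values in the key columns. The procedure name LookupFromSheet1 is made up for illustration.

```vba
Sub LookupFromSheet1()
    ' Hypothetical helper: Sheet1!A:B holds key/value pairs,
    ' Sheet2!A holds the keys to look up, results go to Sheet2!B.
    Dim dict As Object
    Dim src As Variant, keys As Variant, results() As Variant
    Dim lastRow1 As Long, lastRow2 As Long, i As Long

    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare    ' case-insensitive, like MATCH

    With Worksheets("Sheet1")
        lastRow1 = .Cells(.Rows.Count, "A").End(xlUp).Row
        If lastRow1 < 2 Then Exit Sub   ' no data rows
        src = .Range("A2:B" & lastRow1).Value2
    End With

    For i = 1 To UBound(src, 1)
        ' Keep the first occurrence of each key, as MATCH(..., 0) does
        If Not IsEmpty(src(i, 1)) Then
            If Not dict.Exists(src(i, 1)) Then dict.Add src(i, 1), src(i, 2)
        End If
    Next i

    With Worksheets("Sheet2")
        lastRow2 = .Cells(.Rows.Count, "A").End(xlUp).Row
        If lastRow2 < 2 Then Exit Sub
        ' Reading two columns guarantees a 2D array even for a single row
        keys = .Range("A2:B" & lastRow2).Value2
        ReDim results(1 To UBound(keys, 1), 1 To 1)
        For i = 1 To UBound(keys, 1)
            If IsEmpty(keys(i, 1)) Then
                results(i, 1) = ""
            ElseIf dict.Exists(keys(i, 1)) Then
                results(i, 1) = dict(keys(i, 1))
            Else
                results(i, 1) = ""      ' no match: leave blank instead of #N/A
            End If
        Next i
        .Range("B2").Resize(UBound(results, 1), 1).Value2 = results
    End With
End Sub
```

Unlike the formula, this writes static values, so it has to be run again whenever either sheet changes.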
I can only answer part of your question: comparing two columns and detecting that they differ.
You have an excellent tutorial answer for that in Tony M's answer, above.
However, this will perform very slowly on a large data set, because:
1. Reading a range one cell at a time is very slow;
2. Comparing values pair-by-pair is inefficient, especially for strings, when the number of values gets into the tens of thousands.
Point (1) is the important one: it takes the same amount of time for VBA to pick up a single cell using var = Range("A1") as it does to pick up the entire range in one go using var = Range("A1:Z1024"); and every interaction with the sheet takes four times as much time as a string comparison in VBA, and twenty times longer than a comparison between floating-point decimals; and that, in turn, is three times longer than an integer comparison.
So your code is probably four times faster, and possibly a hundred times faster, if you read the entire range in one go, and work on the Range.Value2 array in VBA.
That's in Office 2010 and 2013 (I tested them); for older versions of Excel, you'll see quoted times between 1/50th and 1/500th of a second for each VBA interaction with a cell or range of cells. That'll be way slower because, in both old and new versions of Excel, the VBA actions will still be in single-digit numbers of microseconds: your code will run at least a hundred times faster, and probably thousands of times faster, if you avoid cell-by-cell reads from the sheet in older versions of Excel.
So big gains are there to be made - an interval perceptible to the user - in picking up the ranges in a single 'hit' and then performing the comparison on each item of an array in VBA.
' Value2 returns the whole range as a 2D Variant array in a single read
arr1 = Range1.Value2
arr2 = Range2.Value2
' Consider checking that the two ranges are the same size
For i = LBound(arr1, 1) To UBound(arr1, 1)
    For j = LBound(arr1, 2) To UBound(arr1, 2)
        If arr1(i, j) <> arr2(i, j) Then
            bMatchFail = True
            Exit For
        End If
    Next j
    If bMatchFail Then Exit For
Next i
Erase arr1
Erase arr2
You'll notice that this code sample is generic, for two ranges of the same size taken from anywhere - even from separate workbooks. If you're comparing two adjacent columns, loading a single array of two columns and comparing IF arrX(i, 1) <> arrX(i,2) Then is going to halve the runtime.
Your next challenge is only relevant if you're picking up tens of thousands of values from large ranges: there's no performance gain in this extended answer for anything smaller than that.
What we're doing is:
Using a hash function to compare the values of two large ranges
The idea is very simple, although the underlying mathematics is quite challenging for non-mathematicians: rather than comparing one value at a time, we run a mathematical function that 'hashes' the values into a short identifier for easy comparison.
If you're comparing ranges against a 'reference' copy, you can store the 'reference' hash, and this halves the workload.
There are some fast and reliable hashing functions out there, and they are available in Windows as part of the security and cryptography API. There is a slight problem in that they run on strings, and we have an array to work on; but you can easily find a fast 'Join2D' function that gets a string from the 2D arrays returned by a range's .Value2 property.
So a fast comparison function for two large ranges will look like this:
Public Function RangeCompare(Range1 As Excel.Range, Range2 As Excel.Range) As Boolean
    ' Returns TRUE if the ranges are identical.
    ' This function is case-sensitive.
    ' For ranges with fewer than ~1000 cells, cell-by-cell comparison is faster
    ' WARNING: This function will fail if your range contains error values.
    RangeCompare = False
    If Range1.Cells.Count <> Range2.Cells.Count Then
        RangeCompare = False
    ElseIf Range1.Cells.Count = 1 Then
        RangeCompare = Range1.Value2 = Range2.Value2
    Else
        RangeCompare = MD5(Join2D(Range1.Value2)) = MD5(Join2D(Range2.Value2))
    End If
End Function
I've wrapped the Windows System.Security MD5 hash in this VBA function:
Public Function MD5(arrBytes() As Byte) As String
' Return an MD5 hash for any string
' Author: Nigel Heffernan Excellerando.Blogspot.com
' Note the type pun: you can pass in a string, there's no type conversion or cast
' because a string is stored as a Byte array and VBA recognises this.
Dim oMD5 As Object 'Set a reference to mscorlib 4.0 to use early binding
Dim HashBytes() As Byte
Dim i As Integer
Set oMD5 = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
HashBytes = oMD5.ComputeHash_2((arrBytes))
For i = LBound(HashBytes) To UBound(HashBytes)
MD5 = MD5 & Right("00" & Hex(HashBytes(i)), 2)
Next i
Set oMD5 = Nothing ' if you're doing this repeatedly, declare at module level and persist
Erase HashBytes
End Function
There are other VBA implementations out there, but nobody seems to know about the Byte Array / String type pun - they are not equivalent, they are identical - so everyone codes up unnecessary type conversions.
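The assignment form of that pun can be seen in isolation. The following is a minimal sketch, not part of the answer's code (BytePunDemo is a made-up name): a String assigned to a dynamic Byte array yields the string's two-bytes-per-character storage directly, and assigning the array back yields the original string.

```vba
Sub BytePunDemo()
    ' Hypothetical demo procedure, for illustration only
    Dim b() As Byte
    Dim s As String

    b = "AB"                                ' direct assignment, no StrConv needed
    Debug.Print UBound(b) - LBound(b) + 1   ' 4: two bytes per character
    Debug.Print b(0); b(1); b(2); b(3)      ' 65, 0, 66, 0

    s = b                                   ' and back again
    Debug.Print (s = "AB")                  ' True
End Sub
```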
A fast and simple Join2D function was posted by Dick Kusleika on Daily Dose of Excel in 2015:
Public Function Join2D(ByVal vArray As Variant, Optional ByVal sWordDelim As String = " ", Optional ByVal sLineDelim As String = vbNewLine) As String
Dim i As Long, j As Long
Dim aReturn() As String
Dim aLine() As String
ReDim aReturn(LBound(vArray, 1) To UBound(vArray, 1))
ReDim aLine(LBound(vArray, 2) To UBound(vArray, 2))
For i = LBound(vArray, 1) To UBound(vArray, 1)
For j = LBound(vArray, 2) To UBound(vArray, 2)
'Put the current line into a 1d array
aLine(j) = vArray(i, j)
Next j
'Join the current line into a 1d array
aReturn(i) = Join(aLine, sWordDelim)
Next i
Join2D = Join(aReturn, sLineDelim)
End Function
If you need to excise blank rows before you make the comparison, you'll need the Join2D function I posted in StackOverflow back in 2012.
The most common application of this type of hash comparison is for spreadsheet control - change monitoring - and you'll see Range1.Formula used instead of Range1.Value2: but your question is about comparing values, not formulae.
Related
Let's say I have a very large set of data with over 100,000 rows. In Column A, I want to find each unique number.
I understand this can be done using the .Find feature and Collections/Arrays but those seem to take a good bit of time - especially with 100,000+ rows.
However, after AutoFiltering Column A, when I hit the down arrow it displays only the unique values. Is it possible to simply extract those values out of the selections in this way?
'pseudocode
filter.Count
Dim X As Long
For x = 2 to filter.Count
Cells(x, 14) = filter(x)
Next x
You can use advanced filter, it's pretty darn quick. I tried it with 127k rows, the results were instant.
Columns("A:A").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("D1"), Unique:=True
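You can loop over the visible cells directly. Say your total range (without filter) is A2:A10000. Run your filter, then you can run this macro:
Sub t()
    Dim visibleCells As Range
    Dim cell As Range
    ' After filtering, SpecialCells usually returns a multi-area range;
    ' assigning that to an array would pick up only the first area,
    ' so walk the visible cells one by one instead
    Set visibleCells = Range("A2:A10000").SpecialCells(xlCellTypeVisible)
    For Each cell In visibleCells.Cells
        Debug.Print cell.Value
        ' Do things with each visible entry
    Next cell
End Sub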
My research shows that I need to use Visual Basic. I am a programmer/developer, but have never used VB so if anyone could dumb it down it would be appreciated.
Here's my working excel function:
=IF(MATCH(1,E1:DP1,0),D1,FALSE)
I want to loop a few of those numbers such that:
=IF(MATCH(141,E1:DP378,0),D378,FALSE)
THEN take my answers (which will be strings, because column D are all strings, the rest of the excel file are numbers)
=CONCAT
end goal: have 141 String arrays populated based on the data in my table.
I went ahead and made my first attempt at VBA like this:
Sub myFunc()
'Initialize Variables
Dim strings As Range, nums As Integer, answer() As Variant, listAnswers() As Variant
'set variables
strings = ("C1:C378")
nums = 141
i = 0
j = 0
ReDim Preserve answer(i)
ReDim Preserve listAnswers(j)
'answer() = {""}
'for each in nums
For counter = 0 To nums
ReDim Preserve listAnswers(0 To j)
'set each list of answers
listAnswers(i) = Join(answer(), "insertJSONcode")
j = j + 1
'for each in Stings
For Each cell In strings
If cell <> "" Then
ReDim Preserve answer(0 To i)
answer(i) = 'essentially this: (MATCH(2,E1:DP1,0),D1,FALSE)
i = i + 1
end If
next cell 'end embedded forEach
Next LCounter 'end for loop
'is this possible? or wrong syntax?
Range("A:A").Value = listAnswers() ' should print 141 arrays from A1 to A141
End Sub
EDIT:
Important note: I do NOT need to call the sheet by name. I've successfully written integer values to my Excel sheet in column A without doing so.
Also, the VBA I wrote was never intended to work; I know it's broken at least where answer(i) is supposed to write to something. I'm only putting that code there to show I was able to get within spitting distance of the proper logic, to prove I've put some serious effort into solving the problem, and to give a rough starting point.
Here's an image of the excel format. Column C goes down to 378 and the numbers listed from E through DP are populated by a database. It consists of blank cells and numbers between 1 and 141.
Looking back at my if statement:
=IF(MATCH(2,E2:DP2,0),D2,FALSE)
If I were to type that exactly into cell B2 it would output the correct answer "text2". which is neat and all, but I need every instance of text 2 written out, then CONCAT those results. Easy so far, I could drag that down all the way through column B and have all of my "text" strings in one column, CONCAT that column and there's the answer. However I don't just need #2, I need each number between 1 and 141. Plus I want to avoid writing 141 columns with a CONCAT on top of each one.
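One way the logic described above could be sketched in VBA is to read the table into arrays once and build all 141 concatenated strings in a single pass. This is only a sketch under assumptions taken from the question: labels in D1:D378, numbers in E1:DP378 on the active sheet, and output to A1:A141. The procedure name, the MAX_NUM constant and the ", " delimiter are made up for illustration.

```vba
Sub ConcatLabelsByNumber()
    ' Hypothetical sketch; the layout is assumed from the question.
    Const MAX_NUM As Long = 141
    Dim labels As Variant, nums As Variant
    Dim joined(1 To MAX_NUM, 1 To 1) As Variant
    Dim seen() As Boolean
    Dim r As Long, c As Long, n As Long

    labels = Range("D1:D378").Value2
    nums = Range("E1:DP378").Value2

    For r = 1 To UBound(nums, 1)
        ReDim seen(1 To MAX_NUM)    ' reset per row: add each label once per number
        For c = 1 To UBound(nums, 2)
            If VarType(nums(r, c)) = vbDouble Then      ' skip blanks and text
                If nums(r, c) >= 1 And nums(r, c) <= MAX_NUM Then
                    n = CLng(nums(r, c))
                    If Not seen(n) Then
                        seen(n) = True
                        If Len(joined(n, 1)) > 0 Then joined(n, 1) = joined(n, 1) & ", "
                        joined(n, 1) = joined(n, 1) & labels(r, 1)
                    End If
                End If
            End If
        Next c
    Next r

    ' Row n of the output holds every label whose row contains the number n
    Range("A1").Resize(MAX_NUM, 1).Value2 = joined
End Sub
```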
I'm just wanting to run through a (large) range and replace certain values (if they're above a given max or below a given min...also one particular character) with a given replacement value.
My first thought is to simply traverse each cell and check/replace when necessary. I have a feeling this procedure would be really slow though, and I'm curious if there's a better way to accomplish this.
Any time I write code that does something similar to this in VBA I watch each cell have its value altered cell by cell and it seems like there must be better way. Thanks in advance.
edit:
I haven't even written this implementation yet because I know what the result will be and I would rather do something different if it's possible, but here's what it would look like
For something
If(Range.Value == condition)
Range.Value = replacement_value
Range = Range.Offset(a, b)
End For
Make a formula in a separate column, and then copy/paste special, values only.
= if(A2 > givenvalue; replace; if(A2< anothergivenvalue; anotherreplace; if (A2 = "particularcharacterortext"; replaceonemore; A2)))
Put the formula in an empty cell in an empty column, drag it or copy/paste to the entire column. After that, if the new values are ok, copy/paste values only to the original position.
The following VBA code provides a simple framework that you can customize to meet your needs. It incorporates many of the optimizations that have been mentioned in the comments to your question, such as turning off screen updating and moving the comparison from the worksheet to an array.
You will notice that the macro does a rather large compare and replace. The data set I ran it on was 2.5 million random numbers between 1 and 1000 in the range A1:Y100000. If a number was greater than 250 and less than 500, I replaced it with 0. This required replacing 24.9 percent of all the numbers in the data set.
Sub ReplaceExample()
Dim arr() As Variant
Dim rng As Range
Dim i As Long, _
j As Long
Dim floor as Long
Dim ceiling as Long
Dim replacement_value
'assign the worksheet range to a variable
Set rng = Worksheets("Sheet2").Range("A1:Y100000")
floor = 250
ceiling = 500
replacement_value = 0
' copy the values in the worksheet range to the array
arr = rng.Value
' turn off time-consuming external operations
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
'loop through each element in the array
For i = LBound(arr, 1) To UBound(arr, 1)
For j = LBound(arr, 2) To UBound(arr, 2)
'do the comparison of the value in an array element
'with the criteria for replacing the value
If arr(i, j) > floor And arr(i, j) < ceiling Then
arr(i, j) = replacement_value
End If
Next j
Next i
'copy array back to worksheet range
rng.Value = arr
'turn events back on
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
End Sub
I did some performance testing on different alternatives for coding this simple compare and replace, with results that I would expect are consistent with VBA performance results by others. I ran each alternative 10 times, calculating the elapsed time for each run, and averaging the 10 elapsed times.
The results reveal the large impact that using arrays can have, especially when the data set is large: Compared to code that tested and changed worksheet cell values one-by-one, the array operation -- copying the data set from the worksheet into an array, comparing and changing the array values, and then writing the array results back to the worksheet -- in this case reduced average run times by 98 percent, from 3.6 minutes to 4 seconds.
While the optimizations that turned off external events made a noticeable difference in worksheet operations, with a 22 percent reduction in run times, those optimizations had very little impact when most of the computational work is array-based.
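The timing harness itself is not shown above. A minimal sketch of how such a measurement could be taken follows, assuming VBA's built-in Timer function and the ReplaceExample procedure above; TimeReplaceExample and RUNS are names made up for illustration. Note that the worksheet data would have to be regenerated between runs for every run to do the same amount of replacing, which this sketch does not do.

```vba
Sub TimeReplaceExample()
    ' Hypothetical harness: averages the elapsed time of several runs.
    Const RUNS As Long = 10
    Dim i As Long
    Dim t0 As Double, total As Double

    For i = 1 To RUNS
        t0 = Timer                      ' seconds elapsed since midnight
        ReplaceExample                  ' the procedure being measured
        total = total + (Timer - t0)    ' assumes the run does not cross midnight
    Next i

    Debug.Print "Average seconds per run: " & Format$(total / RUNS, "0.000")
End Sub
```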
I run VBA code that takes a database, processes it and exports it into a sheet. This is working fine. However, I have a sheet that produces graphs depending on the data in that particular sheet. The data does not refresh: I have to enter the cell and press Enter to update it. I'm pretty sure there is an easier way to do this. Calculation is set to automatic but that doesn't seem to change anything.
In my cell, I have my own VBA function that needs to be updated once the report is done. When I click the cell and then press Enter, the result is updated, but I would like this to be done automatically. I hope this is clearer!
Thanks in advance,
Etienne NOEL
Here is the code of my function:
Public Function number_of_appearances(term As String, sheet As String, column As Integer) As Integer
Application.Volatile
Dim number_of_rows As Integer
Dim appearances As Integer
Dim row As Integer
appearances = 0
row = 1
number_of_rows = Worksheets(sheet).UsedRange.Rows.Count
Do While row <= number_of_rows
If Worksheets(sheet).Cells(row, column).Value = term Then
appearances = appearances + 1
End If
row = row + 1
Loop
number_of_appearances = appearances
End Function
An example of the function used in a cell:
=number_of_appearances("test";"sheet1";3)
Sounds like your UDF might not depend on any cells that change value when your DB is processed.
See This MSDN Link
Post your UDF (or just its header if you prefer) and an example of its use...
EDIT:
Yes, none of the parameters to the UDF are cell references, therefore the UDF is not triggered to recalculate when data on the sheet changes.
You have two choices:
1. rewrite your UDF to include parameter(s) that reference cells that change value when the DB is processed
2. make your UDF volatile (include Application.Volatile in the UDF code). WARNING: this can be very inefficient, depending on how many times the UDF is used and how intensive its calculation is
EDIT 2:
Here's a refactor of your UDF using the first option mentioned:
Public Function number_of_appearances(term As String, rng As Range) As Long
Dim v As Variant
Dim i As Long, j As Long
Dim appearances As Long
v = Intersect(rng, rng.Worksheet.UsedRange)
For j = LBound(v, 2) To UBound(v, 2)
For i = LBound(v, 1) To UBound(v, 1)
If v(i, j) = term Then
appearances = appearances + 1
End If
Next i, j
number_of_appearances = appearances
End Function
use like
=number_of_appearances("test";Sheet1!C:C)
EDIT 3:
If all you are doing is counting the number of occurrences of a string in a range, consider using
=COUNTIF(Sheet1!C:C;"test")
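If changing the formulas is not an option, another approach is to force a recalculation from the macro that builds the report. This is a minimal sketch, assuming the export code lives in a procedure of your own, shown here as a commented placeholder named ExportReport (a hypothetical name). Application.CalculateFull forces a full calculation of all open workbooks, so it can be slow on large files.

```vba
Sub RefreshAfterExport()
    ' Hypothetical wrapper: ExportReport stands in for the existing macro
    ' that processes the database and writes it to the sheet.
    ' ExportReport

    ' Recalculate every formula, including UDFs whose arguments did not change
    Application.CalculateFull
End Sub
```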
I recently got into Excel macro development after a long time of not having the need to.
I have one column with two-hundred rows where each row has a value. I wrote a loop to iterate to each row value, read the current value and then write the value back minus the last character.
Here is some actual (and pseudo) code of what I wrote.
Dim theRow as Long
Dim totRow as Long
Dim fooStr as String
theRow = 2 'we begin on the second row of the column
totRow = 201 'there are 200 values
For theRow = 2 to totRow
fooStr = WorkSheets(DestSheet).Cells(theRow,"A").Formula 'read the cell value
fooStr = Left(fooStr,Len(fooStr)-1) 'subtract the last character from the value
Cells(theRow,1).Value = fooStr 'write the value back
Next theRow
After I did some reading I learned that it is best practice to read and write values using a Range. Is it possible to rewrite what I am doing using a Range so it will go faster?
Here is what I came up with so far.
Range("A2:A201").Value = Len(Range.Left("A2:A201").Value)-1
However, this doesn't work.
Any clues on how to do this if this is indeed possible?
Thanks for any tips.
If you want maximum performance (you don't need it for 200 rows, but...) you have to minimize the number of reads and writes (mostly writes) to ranges. That means reading the whole range into an array, manipulating the array, then writing it back to the range. That's one read and one write compared to 200 in a loop. Here's an example.
Sub RemoveLastChar()
Dim vaValues As Variant
Dim i As Long
vaValues = Sheet1.Range("A2").Resize(200).Value
For i = LBound(vaValues, 1) To UBound(vaValues, 1)
vaValues(i, 1) = Left$(vaValues(i, 1), Len(vaValues(i, 1)) - 1)
Next i
Sheet1.Range("A2").Resize(UBound(vaValues, 1), UBound(vaValues, 2)).Value = vaValues
End Sub
You could do something like
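Sub StringTrim()
    Dim xCell As Range
    ' The values start in row 2, so leave the header in A1 alone
    For Each xCell In Range("A2:A201")
        ' Skip empty cells: Left() fails on a length of -1
        If Len(xCell.Value) > 0 Then xCell.Value = Left(xCell.Value, Len(xCell.Value) - 1)
    Next xCell
End Sub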
I don't know what kind of speed improvements you are seeking, but that would also do the job.
You might know this already but putting Application.ScreenUpdating = False at the top of your code can speed it up significantly (unless you like to watch everything flash by as the script works). You should reset the value to True at the end of your code.
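A minimal sketch of that reset, written so that the setting is restored even if the code in the middle raises an error. The procedure name is made up for illustration and the body is left as a placeholder.

```vba
Sub RunWithoutScreenUpdates()
    ' Hypothetical wrapper showing the save/restore pattern
    Dim previous As Boolean
    previous = Application.ScreenUpdating

    On Error GoTo CleanUp
    Application.ScreenUpdating = False

    ' ... the cell-editing work goes here ...

CleanUp:
    ' Runs on both the normal path and the error path
    Application.ScreenUpdating = previous
    If Err.Number <> 0 Then Debug.Print "Stopped: " & Err.Description
End Sub
```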