VBA code to use rather than vlookup - vba

I have two very large tables. One has 12 columns and 280K rows; the other has 12K rows and 33 columns. I am using VLOOKUP to match values between the large table and the small one, and the formulas take forever to calculate. Is there an easy way to do this in VBA? Can someone share sample code for me to adapt?
Thanks

You can use a Collection object to find matches quickly. This works very fast (if not faster than VLOOKUP) because when you supply the key parameter, the Collection hashes / indexes the key with the specific goal of fast lookup later.
Moreover, for a large number of records you populate the Collection once and keep reusing it, while VLOOKUP searches the target range again for every lookup (which is far less efficient, although built-in formulas run in parallel on multiple cores and Microsoft has probably built in some caching for repeated searches). Even so, a single-threaded VBA Collection should still be faster.
See the example below; the in-line comments give more detail.
"Big Table" is on Sheet1:
"Small Table" is on Sheet2:
And the code that matches records in small table to those in the big one:
Option Explicit
Sub matchRows()
' this is where the big table is
Dim w1 As Worksheet
Set w1 = Worksheets("Sheet1")
' this is where the small table is
Dim w2 As Worksheet
Set w2 = Worksheets("Sheet2")
Dim c As New Collection ' list of match keys in big table 1
Dim r As Range
' assume the match key is in col1 in both tables
' enumerate the keys in the big table
For Each r In w1.Range(w1.[a2], w1.[a2].End(xlDown))
c.Add r, CStr(r.Value) ' this stores the range (first param) and
' its key (second param - the cell value converted to a string;
' keys must be unique, a repeated key raises error 457)
Next r
' now lets try to match / vlookup records in small table against
' big table
For Each r In w2.Range(w2.[a2], w2.[a2].End(xlDown))
If contains(c, CStr(r)) Then
' you didn't say what you want to do after a match, so
' I'll just display matched key value and row number in debug console
Debug.Print "Found match """ & r & """ at row number " & r.Row
Else
Debug.Print "No match found for """ & r & """ at row number " & r.Row
End If
Next r
End Sub
Function contains(col As Collection, key As String) As Boolean
On Error Resume Next
col.Item key
contains = (Err.Number = 0)
On Error GoTo 0
End Function
Result in Immediate Window:
Found match "data51" at row number 2
Found match "data61" at row number 3
No match found for "data81" at row number 4
Found match "data91" at row number 5
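If you need the looked-up value written back, as VLOOKUP does, rather than just a match test, a dictionary filled from in-memory arrays is one option. The following is only a sketch under assumptions not stated in the question: the keys are in column A of both sheets, the value to return is in column B of Sheet1, column AH of Sheet2 is free for the results, and each table has at least two data rows. It uses Scripting.Dictionary through late binding, which is available on Windows only.

```vba
Option Explicit

Sub LookupWithDictionary()
    Dim bigData As Variant, smallData As Variant
    Dim results() As Variant
    Dim dict As Object
    Dim i As Long
    Dim key As String

    ' Read both tables into arrays in one go; this is far faster than
    ' touching cells one by one (assumes at least two data rows per table)
    With Worksheets("Sheet1")
        bigData = .Range("A2", .Cells(.Rows.Count, "A").End(xlUp)).Resize(, 2).Value
    End With
    With Worksheets("Sheet2")
        smallData = .Range("A2", .Cells(.Rows.Count, "A").End(xlUp)).Value
    End With

    ' Build the index once: key (column A) -> value (column B, an assumption)
    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare ' case-insensitive keys, like VLOOKUP
    For i = 1 To UBound(bigData, 1)
        key = CStr(bigData(i, 1))
        ' keep the first occurrence only, which mirrors an exact-match VLOOKUP
        If Not dict.Exists(key) Then dict.Add key, bigData(i, 2)
    Next i

    ' Look up every key of the small table
    ReDim results(1 To UBound(smallData, 1), 1 To 1)
    For i = 1 To UBound(smallData, 1)
        key = CStr(smallData(i, 1))
        If dict.Exists(key) Then
            results(i, 1) = dict(key)
        Else
            results(i, 1) = CVErr(xlErrNA) ' shows as #N/A, as VLOOKUP would
        End If
    Next i

    ' Write all results back in a single operation (column AH is hypothetical)
    Worksheets("Sheet2").Range("AH2").Resize(UBound(results, 1), 1).Value = results
End Sub
```

Reading and writing whole ranges as arrays usually matters as much as the dictionary itself, since per-cell access is what tends to make VBA loops slow on tables of this size.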

Related

How to delete unselected columns from range

I am new to VBA and am trying to delete unwanted columns loaded from a .csv file. I am importing a large amount of data but then I ask the user what columns they want to keep going by "ID num.". There are a lot of columns with different ID no. and I want to ask the user what they want to keep and delete the rest.
The problem is I need to delete all the other columns the user didn't want but I still need to keep the first 6 columns and the last two columns as that is different information.
Here is what I have so far:
Sub Select()
'the below will take the users inputs
UserValue = InputBox("Give the ID no. to keep seperating with a comma e.g"12,13,14")
'the below will pass the user inputs to the example to split the values
Call Example(UserValue)
End Sub
Sub Example(UserValue)
TestColArray() = Split(UserValue, ",")
For Each TestCol In TestColArray()
' keep all the columns user wants the delete the rest except the first 6 columns and last 2
Next TestCol
End Sub
That is what I have so far. It is not much, but the user could enter many columns with different ID numbers in the input box. The way the Excel sheet is laid out, all the ID numbers are in row 2, and the first 6 and last 2 columns are blank in row 2 since the ID number does not apply to them. I hope that helps.
Try this (commented) code:
Option Explicit '<--| use this statement: at the cost of having to declare every variable you use, your code will be much easier to debug and maintain
Sub MySelect()
Dim UserValue As String
'the below will take the users inputs
UserValue = Application.InputBox("Give the ID no. to keep, separated by commas, e.g. ""12,13,14""", Type:=2) '<--| use Type:=2 to force a string input
'the below will pass the user inputs to the example to split the values
Example UserValue '<--| syntax 'Call Example(UserValue)' is old
End Sub
Sub Example(UserValue As String)
Dim TestCol As Variant
Dim cellsToKeep As String
Dim firstIDRng As Range, lastIDRng As Range, IDRng As Range, f As Range
Set firstIDRng = Range("A2").End(xlToRight) '<-- first ID cell
Set lastIDRng = Cells(2, Columns.Count).End(xlToLeft) '<-- last ID cell
Set IDRng = Range(firstIDRng, lastIDRng) '<--| IDs range
cellsToKeep = firstIDRng.Offset(, -6).Resize(, 6).Address(False, False) & "," '<--| initialize cells-to-keep addresses list with the first six blank cells at the left of first ID
For Each TestCol In Split(Replace(UserValue, " ", ""), ",") '<--| loop through passed ID's
Set f = IDRng.Find(what:=TestCol, LookIn:=xlValues, lookat:=xlWhole, MatchCase:=False) '<--| search for the current passed IDs range
If Not f Is Nothing Then cellsToKeep = cellsToKeep & f.Address(False, False) & "," '<--| if the current ID is found then update cells-to-keep addresses list
Next TestCol
cellsToKeep = cellsToKeep & lastIDRng.Offset(, 1).Resize(, 2).Address(False, False) '<--| finish cells-to-keep addresses list with the first two blank cells at the right of last ID
Range(cellsToKeep).EntireColumn.Hidden = True '<-- hide columns-to-keep
ActiveSheet.UsedRange.EntireColumn.SpecialCells(xlCellTypeVisible).EntireColumn.Delete '<--| delete only the visible columns, i.e. those not marked to keep
ActiveSheet.UsedRange.EntireColumn.Hidden = False '<-- unhide columns
End Sub
The code assumes you are working with the currently active worksheet.
A simple Google search produces this, on the first page of results too. Perhaps it will suit your needs.
If the data set that needs to be deleted is really large (larger than the ranges you want to keep), then perhaps select only the columns you want while you import the CSV? This Stack Overflow question shows how to import specific columns.
EDIT:
As I understand the OP's problem, a large CSV file is being imported into Excel, and after the import there are a lot of redundant columns that should be deleted. My first thought would be to import only the needed data (columns) in the first place. This is possible in VBA by using the .TextToColumns method with the FieldInfo argument. As stated above, the Stack Overflow question linked above provides a means of doing so.
If selective importing is not an option and you still want to invert the user's selection, one option is to create two ranges (one being the user-selected ranges and the other the entire sheet), perform an intersect check between them, and delete whatever does not intersect (i.e. delete any cell that is not part of the user's selection). This method is provided by the first link I supplied and is quite straightforward.
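The intersect idea could look roughly like the sketch below. It is not the code from the linked page; the procedure name DeleteColumnsOutside and its keepRng parameter are made up for illustration, and it assumes the columns to keep are passed in as a Range on the sheet being cleaned.

```vba
Option Explicit

' Hypothetical helper: deletes every used column that does not intersect keepRng
Sub DeleteColumnsOutside(ByVal keepRng As Range)
    Dim ws As Worksheet
    Dim col As Range
    Dim delRng As Range

    Set ws = keepRng.Worksheet

    ' Collect the columns to delete first; deleting inside the loop
    ' would shift the remaining columns while we iterate over them
    For Each col In ws.UsedRange.Columns
        If Intersect(col, keepRng.EntireColumn) Is Nothing Then
            If delRng Is Nothing Then
                Set delRng = col.EntireColumn
            Else
                Set delRng = Union(delRng, col.EntireColumn)
            End If
        End If
    Next col

    ' One delete call for everything that was not selected
    If Not delRng Is Nothing Then delRng.Delete
End Sub
```

It could be called as, for example, DeleteColumnsOutside Range("A:F,H:H,K:L"), where the address is whatever the user's selection resolves to.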

Manipulating Excel spreadsheet, removing rows based on values in a column and then removing more rows based on values in another column

I have a rather complicated problem.
I have a log file that, when loaded into Excel, has event IDs in column I and, in column J, a custom key that keeps a particular event grouped.
All I want to do is remove any rows that do not contain the value of, say, 102 in the event ID column.
And THEN I need to check the custom key (column J) and remove rows that are duplicates, since any duplicates will falsely skew other statistics I want.
I have gotten as far as retrieving the values from the columns using COM objects, .EntireColumn, cell values etc., but I am completely stumped as to how I can piece together a solid way to remove rows. I could not figure out how to get the row for each value.
To give a bit more clarity this is my thought process on what i need to do:
If cell value in Column I does not = 102 Then delete the row that cell contains.
Repeat for all rows in spreadsheet.
And THEN-
Read every cell in column J and remove all rows containing duplicates based on the values in column J.
Save spreadsheet.
Can any kind persons help me?
Additional Info:
Column I holds a string that is an event ID number, e.g. 1029
Column J holds a string that is a mix of numbers and letters, e.g. 1ASER0X3NEX0S
Ellz, I do agree with Macro Man in that your tags are misleading and, more importantly, I did indeed need to know the details of Column J.
However, I got so sick of rude posts today and yours was polite and respectful so I've pasted some code below that will do the trick ... provided Column J can be a string (the details of which you haven't given us ... see what Macro Man's getting at?).
There are many ways to test for duplicates. One is to try and add a unique key to a collection and see if it throws an error. Many wouldn't like that philosophy but it seemed to be okay for you because it also gives you a collection of all the unique (ie remaining) keys in Column J.
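Sub Delete102sAndDuplicates()
    Dim ws As Worksheet
    Dim uniques As Collection
    Dim rng As Range
    Dim rowPair As Range
    Dim iCell As Range
    Dim jCell As Range
    Dim delRows As Range
    Dim deleteRow As Boolean

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set rng = Intersect(ws.UsedRange, ws.Range("I:J"))
    Set uniques = New Collection

    ' Note: a header row, if there is one, is treated like data and will be deleted
    For Each rowPair In rng.Rows
        Set iCell = rowPair.Cells(1, 1)
        Set jCell = rowPair.Cells(1, 2)

        ' Rule 1: delete every row whose event ID is not 102
        deleteRow = (CStr(iCell.Value2) <> "102")

        ' Rule 2: among the remaining rows, delete repeats of the column J key.
        ' Adding a key that already exists to a Collection raises error 457.
        If Not deleteRow Then
            On Error Resume Next
            uniques.Add jCell.Value2, CStr(jCell.Value2)
            deleteRow = (Err.Number = 457)
            Err.Clear
            On Error GoTo 0
        End If

        If deleteRow Then
            If delRows Is Nothing Then
                Set delRows = rowPair.EntireRow
            Else
                Set delRows = Union(delRows, rowPair.EntireRow)
            End If
        End If
    Next

    If Not delRows Is Nothing Then
        MsgBox delRows.Address(False, False) & " deleted."
        delRows.Delete
    End If
End Sub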
There are a number of ways in which this can be done, and which is best will depend on how frequently you perform this task and whether you want to have it fully automated. Since you've tagged your question with VBA I assume you'll be happy with a VBA-based answer:
Sub removeValues()
Range("I1").Select 'Start at the top of the I column
'We are going to go down the column until we hit an empty row
Do Until IsEmpty(ActiveCell.Value) = True
If ActiveCell.Value <> 102 Then
ActiveCell.EntireRow.Delete 'Then delete the row
Else
ActiveCell.Offset(1).Select 'Select the cell below
End If
Loop
'Now we have removed all non-102 values from the column, let's remove the duplicates from the J column
Range("A:J").RemoveDuplicates Columns:=10, Header:=xlNo
End Sub
The key line there is Range("A:J").RemoveDuplicates. It will remove rows from the range you specify according to duplicates it finds in the column you specify. In that case, it will remove items from the A-J columns based on duplicates in column 10 (which is J). If your data extends beyond the J column, then you'll need to replace "A:J" with the appropriate range. Note that the Columns value is relative to the index of the first column, so while the J column is 10 when that range starts at A (1), it would be 2 for example if the range were only I:J. Does that make sense?
(Note: Using ActiveCell is not really best practice, but it's the method that most obviously translates to what you were trying to do and as it seems you're new to VBA I thought it would be the easiest to understand).
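For comparison, here is a sketch of the same two steps without ActiveCell or Select. It is only an illustration: the name RemoveNon102Rows is made up, it works on the active sheet, and unlike the loop above it runs down to the last used cell in column I instead of stopping at the first empty one (so rows with an empty column I are deleted too).

```vba
Option Explicit

Sub RemoveNon102Rows()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "I").End(xlUp).Row

    Application.ScreenUpdating = False

    ' Loop from the bottom up so that deleting a row never shifts
    ' a row that has not been checked yet
    For r = lastRow To 1 Step -1
        If CStr(ws.Cells(r, "I").Value) <> "102" Then ws.Rows(r).Delete
    Next r

    ' Same duplicate removal as above: column 10 of A:J is column J
    ws.Range("A:J").RemoveDuplicates Columns:=10, Header:=xlNo

    Application.ScreenUpdating = True
End Sub
```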

VBA Word table with unknown number of fused rows/columns

I'm currently trying to work with complex tables in Microsoft Word. My problem is that those tables have fused (merged) cells and rows, and I don't know how many rows or columns I'll have.
Here is a (stupid) example of the kind of table I'll have:
I get my table thanks to a bookmark, and then store the table in a variable for easier access:
Sub SetTable()
Dim tb as Table
Selection.GoTo What:=wdGoToBookmark, Name:="MyTable"
Selection.MoveDown
Set tb = Selection.Tables(1)
End Sub
Now, I'd use that table to write in several tables of a database.
Let's say, I have a table "Destinations", a table "Ways" and a table "Time"
I'm kinda blocked there.
With fused rows and columns, I cannot access a whole column or row. And I don't know how many rows and columns I have (I could have, for example, 5 different ways for "Destination 1", or several distances in "Way 1").
I am a little lost on how I should approach this.
Cell(x,y).Row doesn't work because several rows are fused, and it is the same with Column, so we get errors extremely easily.
I was thinking of putting nested tables in the cells that might get an unknown number of rows/columns, a bit like this:
The problem with this method is that the person who will write in the document won't be me. If they have to create a table each time a new line/column requires it, chances are it will become a problem quickly.
(I haven't yet found a way to put something in a given cell of a table when a new line is created; I'm also open to suggestions on that point.)
I was wondering if there are best practices to apply in this kind of case, and I am looking for advice too.
If you have already had to deal with something similar, how did you do it?
Thanks in advance for your answers
Cordially,
Zawarudio
Note : The example of table here is insanely stupid, and even I don't even know what it's talking about. It was just to put informations in the tables, and have absolutely no link with what I'm trying to do.
If you were lost by the distances/times/whatever, sorry about that
I had some vacations so I didn't work on that question before now.
I just found a way that I felt was relevant, so I come here to share my answer
Note that I have only worked on an unknown number of merged rows so far, so this answer is only about that, though I believe merged columns work the same way. Also note that I'm on Word 2010. I don't know if row/column behavior changed in 2013 or will change in the future.
The big problem was that a merged cell only exists in the first row of the rows it spans. Let's take a simple example.
This table has 2 rows and 2 columns. We merged the rows of the 1st column.
table.Rows.Count will return 2, and so will table.Columns.Count.
table.Cell(1,1).Range.Text will return the content of the merged rows.
We would like table.Cell(2,1).Range.Text to return the value of the merged row, but VBA tells us here that this cell doesn't exist.
There is no problem with table.Cell(1,2).Range.Text and table.Cell(2,2).Range.Text.
In terms of values, that means our table with merged rows is roughly equivalent to this:
Where each empty cell would generate an error 5941.
How to resolve the problem?
Sub ReadAllRows()
Dim NbRows As Integer
Dim NbColumns As Integer
Dim i As Integer, j As Integer 'note: "Dim i, j As Integer" would declare i as Variant
Dim SplitStr() As String
Dim col1 as String
Dim col2 as String
Dim col3 as String
Dim col4 as String
'note : my table here is a public value that i get thanks to bookmarks
NbRows = table.Rows.count
NbColumns = table.Columns.count
For i = 3 To NbRows
'We put each value of each columns in a dim
'We do that to remember previously entered row value if the application encounters an error
'Because of merged rows, some cells on each row will not exist and return an error
'When the application encounters an error, it just proceeds to next column
'As previous existing value of this column was stocked in a Dim, we can get the full row at the end of the column loop
For j = 1 To NbColumns
On Error GoTo ErrorHandler
SplitStr = Split(table.Cell(i, j).Range.Text, Chr(13))
Select Case j
Case 1:
col1 = SplitStr(0)
Case 2:
col2 = SplitStr(0)
Case 3:
col3 = SplitStr(0)
Case 4:
col4 = SplitStr(0)
'etc...
End Select
NextRow:
Next j
'We have here all the values of the line
MsgBox "col1: " & col1 & Chr(10) & _
"col2: " & col2 & Chr(10) & _
"col3: " & col3 & Chr(10) & _
"col4: " & col4 & Chr(10)
Next i
Exit Sub 'do not fall through into the error handler once the loops are done
'This error handler skips the whole Select Case and thus proceeds to the next cell
ErrorHandler:
If Err.Number = 5941 Then
Err.Clear
Resume NextRow
End If
End Sub
That way, when a cell doesn't exist, it means the row is merged, so we want the last known value for that column. Since we skip the whole Select Case when the cell doesn't exist, the variable keeps its previous value, while we still get the right value for rows that aren't merged.
This isn't rocket science, but I first began with a simple On Error Resume Next, and with that, non-existing rows simply got the value of the last existing row, so I also had to work on a function that would try to get the right value for each cell of each row...
Note that I did things the ugly way here, but you can use a one-dimensional array to store an entire row the way Word understands it, or even a two-dimensional array holding your whole table.
Well, I hope it helps someone, someday!
Cordially,
Zawarudio
I think there must be an existing Q/A about this but I didn't find it using a quick search, so for now...
One thing you can do is iterate through the cells of the range of the table. Like this:
Sub iterTable()
Dim r As Range
Set r = ActiveDocument.Tables(1).Range
For i = 1 To r.Cells.Count
Debug.Print r.Cells(i).RowIndex, r.Cells(i).ColumnIndex, r.Cells(i).Range.Text
Next
End Sub
As long as you have predefined texts that will allow you to detect your "Destination" groups, that should be enough for you to make progress...
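Building on that cell iteration, a two-dimensional array of the whole table can be filled without any error handling. This is only a sketch and assumes the table has vertically merged cells only (as in the answer above), so that Rows.Count and Columns.Count describe the full grid; the names TableToArray and CellText are made up for illustration.

```vba
Option Explicit

' Hypothetical helper: Word ends each cell's text with an end-of-cell
' marker (Chr(13) & Chr(7)); this strips those two characters
Private Function CellText(ByVal c As Cell) As String
    Dim s As String
    s = c.Range.Text
    CellText = Left$(s, Len(s) - 2)
End Function

Sub TableToArray()
    Dim tb As Table
    Dim c As Cell
    Dim grid() As String
    Dim present() As Boolean
    Dim nRows As Long, nCols As Long
    Dim r As Long, col As Long

    Set tb = ActiveDocument.Tables(1)
    nRows = tb.Rows.Count
    nCols = tb.Columns.Count
    ReDim grid(1 To nRows, 1 To nCols)
    ReDim present(1 To nRows, 1 To nCols)

    ' Only cells that really exist are visited, so no error 5941 can occur
    For Each c In tb.Range.Cells
        grid(c.RowIndex, c.ColumnIndex) = CellText(c)
        present(c.RowIndex, c.ColumnIndex) = True
    Next c

    ' A position with no cell belongs to a vertical merge:
    ' carry down the value from the row above
    For r = 2 To nRows
        For col = 1 To nCols
            If Not present(r, col) Then grid(r, col) = grid(r - 1, col)
        Next col
    Next r

    ' Example use: print the first column of every row
    For r = 1 To nRows
        Debug.Print r, grid(r, 1)
    Next r
End Sub
```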

VBA - merge set number of rows in first column

I have seen some VBA examples on here allowing one to merge set numbers of cells, but none exactly as I need it.
What I would like to do is go down the entire column A:A and merge every four rows, starting with cell A4. I know this involves changing the reference cell but I'm not skilled enough with the language to know how to do this without screwing up the cycle.
Here is an example of the data I would like to format. Thanks in advance for any and all help with this.
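Simply set Count to the number of four-row groups that you want merged and run MergeColA.
Sub MergeColA()
    Dim Count As Integer
    Count = 10
    MergeCells Count
End Sub
Sub MergeCells(Count As Integer)
    Dim i As Long
    Dim r As Range
    ' Excel warns on each merge that only the upper-left value is kept; suppress the prompt
    Application.DisplayAlerts = False
    For i = 4 To 4 * Count Step 4
        ' rows i to i + 3 of column A form one merged block, starting at A4
        Set r = Range("A" & i, "A" & i + 3)
        r.Merge
    Next i
    Application.DisplayAlerts = True
End Sub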

compare huge text files using vba

I gotta serious problem here.. any kind of help is much appreciated!!
I have two huge text files (130 MB each) with thousands of records in each. I need to compare the two files using VBA, or by any other means, and generate a spreadsheet that includes the header plus two additional columns. The first additional column will be the file name, and the second should show which particular column contains the error. Each record can have multiple discrepancies. One file can contain records that cannot be found in the other file, and this condition should also be recorded in the spreadsheet.
Example:
Media Events: Taking one record from each.
00000018063|112295|000|**0009**|
PROL:
00000018063|112295|000|**0013**|
In the above example, the records are from two files. The highlighted ones are the differences between the records. So the output should be like this..
HH_NUMBER | CLASS_DATE | MV_MIN | DURATION | File | Mismatch Reason
00000018063 | 112295 | 000 | **0009** | Media Events | Mismatches in DURATION
00000018063 | 112295 | 000 | **0013** | PROL | Mismatches in DURATION
00000011861 | 112295 | 002 | 0126 | Media Events | missing in PROL file
It seems there are three problems here:
1) Find matching records (first column) between two files.
2) Compare records that match on the first column - if there is a difference, record what the difference is
3) If a record exists in one file but not the other, record that.
I am going to assume that the two "huge files" are in fact separate sheets in the same excel workbook, and that the records are sorted on the first key. This will speed up processing significantly. But speed is a secondary concern, I assume. I also assume there is a third sheet where you put the output.
Here is an outline of VBA code - you will have to do a bit of work to get it "just right" for your application, but I hope this gets you going.
Sub compare()
    Dim s1 As Worksheet
    Dim s2 As Worksheet
    Dim col1 As Range
    Dim col2 As Range
    Dim c As Range
    Dim record1 As Range, record2 As Range, output As Range
    Dim m As Variant
    Dim numCols As Integer
    numCols = 5 ' however many columns you want to compare over
    Set s1 = Sheets("Media")
    Set s2 = Sheets("Pro")
    Set output = Sheets("output").Range("A2")
    Application.ScreenUpdating = False
    ' qualify each range with its sheet instead of selecting the sheet first
    Set col1 = s1.Range("A2", s1.Range("A2").End(xlDown))
    Set col2 = s2.Range("A2", s2.Range("A2").End(xlDown))
    For Each c In col1.Cells
        ' Application.Match returns an error value (it does not raise) when there is no match
        m = Application.Match(c.Value, col2, 0)
        If IsError(m) Then
            ' you found a record in 1 but not 2
            ' record this in your output sheet
            output.Value = "Record " & c.Value & " does not exist in Pro"
            Set output = output.Offset(1, 0) ' next time you write output it will be in the next line
            ' you will have to do the same thing in the other direction - test all values
            ' in 2 against 1 to see if any records exist in 2 that don't exist in 1
        Else
            ' you found matching records; both ranges are numCols columns wide
            Set record1 = c.Resize(1, numCols)
            Set record2 = col2.Cells(m, 1).Resize(1, numCols)
            ' now you call another function to compare these records and record the result
            ' using the same trick as above to "go to the next line" - using output.Offset(1,0)
        End If
    Next c
    Application.ScreenUpdating = True
End Sub
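The "another function" mentioned in the comments could look something like the sketch below. The name DescribeMismatch and its headers parameter are made up for illustration; it assumes record1, record2 and headers are single-row ranges of the same width, and it compares cell values as text.

```vba
Option Explicit

' Hypothetical helper: returns a comma-separated list of the header names
' whose values differ between the two records ("" if they are identical)
Function DescribeMismatch(ByVal record1 As Range, ByVal record2 As Range, _
                          ByVal headers As Range) As String
    Dim k As Long
    Dim result As String

    For k = 1 To record1.Columns.Count
        ' compare as text so that "0009" and "0013" style fields are not coerced
        If CStr(record1.Cells(1, k).Value) <> CStr(record2.Cells(1, k).Value) Then
            If Len(result) > 0 Then result = result & ", "
            result = result & CStr(headers.Cells(1, k).Value)
        End If
    Next k

    DescribeMismatch = result
End Function
```

Inside the Else branch it might be used as reason = DescribeMismatch(record1, record2, s1.Range("A1").Resize(1, numCols)), assuming the header names sit in row 1 of the first sheet and reason is declared As String. When Len(reason) > 0, write "Mismatches in " & reason to output and move output down one row, as in the outline.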
You could do this with formulas:
See
MS KB: Use Excel to compare two lists of data
MrExcel.com - Creating a list of non-matching values
ExcelExperts.com - Extracting non-matching entries from two columns in a third column
To give you an idea, basically, if you have two lists in columns A & B, you could use formulas like below in columns C and D to show the matching or non-matching:
In C1,
=If(isna(match(A1,B:B,0)),A1,"")
and, in D1
=IF(Isna(Match(B1,A:A,0)),B1,"")
both copied down.
FURTHER READING:
Excel Index Function and Match Function - Contextures MVP
Excel VLOOKUP and Index & Match - Excel User MVP
Excel User MVP - Excel’s Best Lookup Method: INDEX-MATCH