compare huge text files using vba - vba

I gotta serious problem here.. any kind of help is much appreciated!!
I have two huge text files (130 MB)each with thousands of records in each. I need to compare the two files using vba or by any means and generate a spreadsheet which includes the header and with two additional columns. The two additional columns will be the file name and in the next column it should display in which particular column is error. Each record will be having multiple discrepancies. One file can have the records which cannot be found in the other file. So this condition should also be recorded in the spreadsheet.
Example:
Media Events: Taking one record from each.
00000018063|112295|000|**0009**|
PROL:
00000018063|112295|000|**0013**|
In the above example, the records are from two files. The highlighted ones are the differences between the records. So the output should be like this..
HH_NUMBER | CLASS_DATE | MV_MIN DURATION File Mismatc Mismatch Reason
00000018063 | 112295 | 000 **0009** Media Events Mismatches in DURATION
00000018063 | 112295 | 000 **0013** PROL Mismatches in DURATION
00000011861 | 112295 | 002 0126 Media Events missing in PROL file

It seems there are three problems here:
1) Find matching records (first column) between two files.
2) Compare records that match on the first column - if there is a difference, record what the difference is
3) If a record exists in one file but not the other, record that.
I am going to assume that the two "huge files" are in fact separate sheets in the same excel workbook, and that the records are sorted on the first key. This will speed up processing significantly. But speed is a secondary concern, I assume. I also assume there is a third sheet where you put the output.
Here is an outline of VBA code - you will have to do a bit of work to get it "just right" for your application, but I hope this gets you going.
Sub compare()
Dim s1 as Worksheet
Dim s2 as Worksheet
Dim col1 as Range
Dim col2 as Range
Dim c as Range
Dim record1 As Range, record2 As Range, output As Range
Dim m
Dim numCols as Integer
numCols = 5 ' however many columns you want to compare over
Set s1 = Sheets("Media")
Set s2 = Sheets("Pro")
Set output = Sheets("output").Range("A2")
Application.ScreenUpdating = False
s1.Select
Set col1 = Range("A2", [A2].End(xlDown));
s2.Select
Set col2 = Range("A2", [A2].End(xlDown));
On Error Resume Next
For Each c in col1.Cells
m = Application.Match(c.Value, col2, 0);
If isError(m) Then
' you found a record in 1 but not 2
' record this in your output sheet
output.Value = "Record " & c.Value & " does not exist in Pro"
Set output = output.Offset(1,0) ' next time you write output it will be in the next line
' you will have to do the same thing in the other direction - test all values
' in 2 against 1 to see if any records exist in 2 that don't exist in 1
Else
' you found matching records
Set record1 = Range(c, c.offset(0, numCols))
Set record2 = Range(col2.Cells(m,1), col2.Cells(m,numCols))
' now you call another function to compare these records and record the result
' using the same trick as above to "go to the next line" - using output.Offset(1,0)
End If
Next c
End Sub

You could do this with formulas:
See
MS KB: Use Excel to compare two lists of data
Me Excel.com - Creating a list of non-matching values
ExcelExperts.com - Extracting non-matching entries from two columns in a third column
To give you an idea, basically, if you have two lists in columns A & B, you could use formulas like below in columns C and D to show the matching or non-matching:
In C1,
=If(isna(match(A1,B:B,0)),A1,"")
and, in D1
=IF(Isna(Match(B1,A:A,0)),B1,"")
both copied down.
FURTHER READING:
Excel Index Function and Match Function - Contextures MVP
Excel VLOOKUP and Index & Match - Excel User MVP
Excel User MVP - Excel’s Best Lookup Method: INDEX-MATCH

Related

VBA code to use rather than vlookup

I have two very large tables. One of them has 12 columns and 280K rows. The other ones has 12k rowns and 33 columns. I am using vlookup to look for matching values in large table to small one. Vlookups take forever to calculate. Is there an easy way to do this with a VBA code? Can someone share a sample code for me to replicate?
Thanks
You can use Collection object to quickly find matches. This will work very fast (if not faster than VLOOKUP) because when you add key parameter to Collection - it hashes / indexes its value with the specific goal of fast lookup later).
Moreover, for the large number of records you populate Collection once and keep reusing it, while VLOOKUP does search the entire target range repeatedly (which is way less efficient, although built-in formulas run in parallel on multiple cores plus Microsoft definitely built-in some caching for increased efficiency for repeated searches). Even then a single-threaded VBA collections should still be faster.
See example below with more information in the in-line comments.
"Big Table" is on Sheet1:
"Small Table" is on Sheet2:
And the code that matches records in small table to those in the big one:
Option Explicit
Sub matchRows()
' this is where the big table is
Dim w1 As Worksheet
Set w1 = Worksheets("Sheet1")
' this is where the small table is
Dim w2 As Worksheet
Set w2 = Worksheets("Sheet2")
Dim c As New Collection ' list of match keys in big table 1
Dim r As Range
' assume the match key is in col1 in both tables
' enumerate the keys in the big table
For Each r In w1.Range(w1.[a2], w1.[a2].End(xlDown))
c.Add r, r ' this stores the range (first param) and
' its key (second param - taken as string
' (value of the range), must be unique)
Next r
' now lets try to match / vlookup records in small table against
' big table
For Each r In w2.Range(w2.[a2], w2.[a2].End(xlDown))
If contains(c, CStr(r)) Then
' you didn't say what you want to do after a match, so
' I'll just display matched key value and row number in debug console
Debug.Print "Found match """ & r & """ at row number " & r.Row
Else
Debug.Print "No match found for """ & r & """ at row number " & r.Row
End If
Next r
End Sub
Function contains(col As Collection, key As String) As Boolean
On Error Resume Next
col.Item key
contains = (Err.Number = 0)
On Error GoTo 0
End Function
Result in Immediate Window:
Found match "data51" at row number 2
Found match "data61" at row number 3
No match found for "data81" at row number 4
Found match "data91" at row number 5

How do I compare two columns in different sheets for non existing values and then copy them to the main sheet?

Firstly, I am no expert with VBA, just searching for similar situation copying them, changing the code a bit and hoping for the best.
So I need to make a macro that compares two sheets. One of the sheets is the one that contains history information and all the specific names in Column A, where in the other sheet I paste daily information, where the specific names is always in column C and starts with row 7. The existing names could disappear or new names could be added and there will be duplicates.
What I need is for the code to first compare these two Columns for new names, if such are found copy them and past them in the history sheets A columns 2nd row, the existing names get moved down, so that they don't get deleted.
In short words saying If duplicate do nothing, else copy to history sheet.
Thank you in advance for all the help
Not sure of what your logic would be, but here are some VBA Pointers:
To compare columns in different sheets:
If Sheets("Sheet1").Range("ColRow").Value <> Sheets("Sheet2").Range("Col2Row2").Value Then...
Or you could replace the sheet names with (1) and (2) [or whatever order they are in the workbook].
For instance:
If Sheets(1).Range("A2").Value <> Sheets(2).Range("C7").Value Then ...
Assignment to a cell works similarly. You can use a variable as an index:
Dim i1 as integer
Dim i2 as integer
i2 = 7
For i1 = 1 to 50
If Sheets(1).Range("A" + CStr(i1)).Value <> Sheets(2).Range("C" + CStr(i2)).Value Then
Sheets(1).Range("B" + Cstr(i1)).Value = Sheets(2).Range("C" + CStr(i2)).Value
End If
i2 = i2 + 1
Next i1

Excel VB - Multiple If Column Contains Strings Then

I have looked at a bunch of questions like this, but I have only found formulas, and VB examples that don't appear to check the values of cells in a column.
I was able to derive the following formula in Excel:
=IF(AND(ISNUMBER(SEARCH("Cat",R2)),OR(ISNUMBER(SEARCH("5E",R2)),ISNUMBER(SEARCH("6",R2))), ISNUMBER(SEARCH("Patch",R2)), ISNUMBER(SEARCH("Cables",R2))), "CAT 5E Ethernet Cables")
The problem is that this formula only checks for 1 out of 500 possible values. This is not productive. Plus, I have to make it one big formula when I check the entire row, because if I don't, the formula overwrites the result of the previous formula...
So, Visual Basic... I think I may have better luck scripting some kind of IF ELSE or CASE statement. I just do not understand how to do this in excel. I need to achieve the same thing as the formula above, but instead of checking for one set of conditions,
I need to check for multiple, and populate the S & T columns based on the result of each set of conditions.
I found this webpage that just mentions Excel and shows a VB IF - ELSE statement. How can I make this statement check Excel columns?
I tried the selected answer in this post with no luck:
Private Sub UpdateCategories()
Dim x As Long
For x = 1 To 5000
If InStr(1, Sheet1.Range("$B$" & x), "cat") > 0 And InStr(1, Sheet1.Range("$B$" & x), "5e") > 0 Then
Sheet1.Range("$T$" & x) = Sheet1.Range("$T$" & x) & "CAT 5E Ethernet Cables (Test)"
End If
Next
End Sub
Any help is appreciated. Thanks in advance!
Assuming you choose the route of using a data table sheet to compare to your string.
You would need to have a sheet looking like this (Maybe this is not what you want because I didn't thoroughly understand how your data looks like but the idea remains). You could have sub-category if you want, as well as category, in a third column.
column A | column B
keyword |category
CAT |ATX Cases
5e |Mini-ITX Cases
important words |MicroATX Cases
...
This would need to be filled manually. I'm not sure about the amount of data you're looking at. It can be pretty rapid if you can copy/paste stuff efficiently, depending on the form of your data.
When you have that, loop using this code. I assume the data table is in Sheet1, columns A and B and the values are in Sheet2, column A.
dim listLength as integer 'number of values to look at
dim i as integer
dim dataLength as integer 'number of keywords
dim j as integer
dim keyword as string
dim value as string
listlength = Sheet2.Cells(Rows.Count, "A").End(xlUp).Row - 1 'assuming you start on row 2
datalength = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row - 1 'assuming you start on row 2
for i = 2 to listLength + 1
value = Sheet2.Range("A")(i)
for j = 2 to dataLength + 1
keyword = Sheet1.Range("A")(j)
if instr(value, keyword) <> 0 then
' what happens when the keyword is present goes here
end if
next j
next i

Copying Row Info from one sheet to another based on match

I have an excel book that has two sheets: 1) Import 2) Pricing Rules.
Pricing Rules Sheet
The A column is what I need to match on. Example values include STA_PNP4, STA_PST.. and others. There are potentially around 50 different rows in the sheet, and it will continue to grow over time. Then for each row, there are pricing values in columns B to CF.
Import Sheet
This sheet has the same number of columns, but only Column A is filled out. Example values include STA_PNP4_001_00, STA_PNP4_007_00, STA_PST_010_00.. and many more.
What I need to do:
If the text in Import Sheet Column A before the second "_" matches the column identifer in Pricing Rules Sheet Column A, copy the rest of B to CF of Pricing Rules sheet for that row into the Import sheet for the row it matched on.
Any idea on where to begin with this one?
Why don't you do it using formulas only?
Assuming :
1.) Data in Import Sheet is
(col A)
STA_PNP4_007_00
STA_PNP4_001_00
STA_PNP4_001_00
.
.
2.) Data in Pricing Rules Sheet
(Col A) (col B) (ColC) (Col D) .......
STA_PNP4 1 2 3 .....
STA_PST 4 5 6 .....
STA_ASA2 7 8 9 .....
Then write this formula in B1 cell of Import Sheet
=IFERROR(VLOOKUP(LEFT(A1,FIND("",A1,FIND("",A1)+1)-1),PricingRules!$A$1:$CF$100,2,0),"")
Drag it down in column B
and For Column C , D just change index num from 2 to (3 for C) , (4 for D) and like that.
Because it will continue to grow over time you may be best using VBA. However, even with code I would start by applying the ‘groups’ via formula, so as not to have a spreadsheet overburdened with formulae and hence potentially slow and easy to corrupt. Something like part of #xtremeExcel’s solution which I repeat because the underscores have been treated as formatting commands in that answer:
=LEFT(A1,FIND("_",A1,1+FIND("_",A1))-1)
I’d envisage this (copied down) as an additional column in your Import Sheet - to serve as a key field to link to your Pricing Rules Sheet. Say on the extreme left so available for use by VLOOKUP across the entire sheet.
With that as a key field then either:
Write the code to populate Pricing Rules Sheet as frequently as run/desired. Either populating ‘from scratch’ each time (perhaps best for low volumes) or incrementally (likely advisable for high volumes).
Use VLOOKUP (as suggested). However with at least 84 columns and, presumably, many more than 50 rows that is a lot of formulae, though may be viable as a temporary ‘once off’ solution (ie after population Copy/Paste Special/Values).
A compromise. As 2. But preserve a row or a cell with the appropriate formulae/a and copy that to populate the other columns for your additions to your ColumnA and/or ColumnA:B.
Thanks for the input guys.
I got it implemented via a method like this:
{=VLOOKUP(LEFT($A4,7),PricingRules!A3:CF112,{2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84},FALSE)}
That is my ugly function, applied across a whole row, to look up and copy from my pricing rules every column when it finds a match.
Below is the function that I have created for above scenario. Its working as per the requirement that you have mentioned.
Sub CopyData()
Dim wb As Workbook
Dim importws As Worksheet
Dim PricingRulesws As Worksheet
Dim Pricingrowcount As Integer
Dim importRowCount As Integer
Dim FindValue As String
Dim textvalue As String
Dim columncount As Integer
Dim stringarray() As String
'Enter full address of your file ex: "C:\newfolder\datafile.xlsx"
Set wb = Workbooks.Open("C:\newfolder\datafile.xlsx")
'Enter the name of your "import" sheet
Set importws = Sheets("Import")
'Enter the name of your "Pricing" sheet
Set PricingRulesws = Sheets("PricingRules")
For Pricingrowcount = 1 To PricingRulesws.UsedRange.Rows.Count
FindValue = PricingRulesws.Cells(Pricingrowcount, 1)
For importRowCount = 1 To importws.UsedRange.Rows.Count
textvalue = importws.Cells(importRowCount, 1)
stringarray = Split(textvalue, "_")
textvalue = stringarray(0) & "_" & stringarray(1)
If FindValue = textvalue Then
For columncount = 2 To PricingRulesws.UsedRange.Columns.Count
importws.Cells(importRowCount, columncount) = PricingRulesws.Cells(Pricingrowcount, columncount)
Next columncount
End If
Next importRowCount
Next Pricingrowcount
End Sub

VBA for checking rows and returning cells that match criteria to another sheet

I’m struggling trying to develop an automated solution for the following challenge (ie apart from a button press, no user intervention):
I have a master ‘sheet that contains Accrual figures in monthly columns and an associated Receipt Number for this in a column immediately to the right. The Receipt columns have the month in the form: ‘Feb’, ‘Jun’, ‘Dec’ at the top.
I need to return a variety of cell data to another summary ‘sheet in the workbook from each row that matches the following criteria:
Identify the Receipt column based on a match of the month with an entered month in cell ‘x’ in the summary ‘sheet, eg if I type ‘Mar’, find the column headed ‘Mar’ in the master ’sheet.
Ignoring blank rows (ie there are data in Column A), if the cell in the first row of the Receipt column is blank, identify the row, select 6 different cells and return the contents to 6 specific cells in the first blank row (from a given row number) in the summary ‘sheet – then move onto the next row in the column and continue this process until the end (or a row limit). NB: the 'blank' cell will have a fill colour (conditionally formatted).
If the cell contains data, move to the next row down and continue the process.
A VBA routine I can attach to an onscreen button to update the results each time would do the job nicely. Although I've picked up bits of really useful code here, eg for checking each row for given conditions, I can’t get my head around a working solution to include returning the relevant cells to the summary 'sheet...
Any and all feedback greatly appreciated.
(Sorry - can't post images / screenshots yet...)
Update
Thanks for the quick response, and apols I couldn't append/paste-in a sample of the spreadsheet - apparently I need 10 points... :)
Slightly Different summary sheet (Sheet1). The columns are: Month (Col A), then 4 cells (TBD) to be returned to cols B-E, then Receipt No (Col F). The data come from each identified row in the master sheet (Sheet4) and are placed in the next available blank row in Sheet1. I can mod any examples given to match the actual positions.
I have the criteria for selecting the row (this is from a loop I've got working to identify the row, but using a fixed column reference [26 - Col Z] for a specific month):
If Not Cells(TheRow, 3).Value = "" And Cells(TheRow, 26).Value = "" And Cells(TheRow, 6).Value < Cells(TheRow, 25).Value Then... (do the other bit I'm stuck on)
Hope this is a bit clearer. Appreciate your help.
Can't add a comment, or chat (!) so further update:
Thanks Tony
The month column in the summary is to confirm the month reported on, as this will change when the user overwrites the source cell for the month they wish to parse the master sheet with. The person this data goes to will get these 7 columns and nothing else.
The mechanism has to find the Receipt col (month), then identify that row entry is blank AND col 3 on that row is not blank AND that the Accrual value [Rng.Column - 1] is greater than the PO value (Col 4) on that row. Once these criteria are established, the Receipt, Accrual, PO and 3 other data are selected, copied and pasted into the summary sheet on the next available blank row.
The master sheet has Accrual and Receipt for each month, so there is only one column to identify.
Hope this helps...
I am struggling to visualise you worksheets. I see the master sheet as something like:
... |Accrual| Jan|Accrual| Feb|Accrual| Mar|Accrual| Apr| ...
... | 1.23|A12 | | | 23.67|A14 | | |
... | | | 56.78|C34 | 178.12|C56 | | |
... | 2.34|B678 | | | | | 123.82|B821 |
... | | | | | 96.52|D56 | | |
Somewhere within the Summary sheet there is a list of months of interest.
I need a variable:
Dim MonOfInt As String
and a loop within which a month of interest will be loaded from the worksheet into this variable:
With Sheets("Summary")
MonOfInt = .Cells(R, C).Value
End With
The following is a possible skeleton for the code to move values for the month of interest:
Dim ColCrnt As Long
Dim Rng As Range
Dim RowCrnt As Long
Dim RowLast As Long
:
With Sheets("Master")
' Look for month of interest in row 1
Set Rng = .Rows(1).Find(MonOfInt)
End With
If Rng Is Nothing Then
' No columns for this month in Master
Else
' Accrual and Receipt columns for this month found
ColCrnt = Rng.Column - 1 ' Accrual column for month
' Find last used row in Accrual column
RowLast = .Cells(Rows.Count, ColCrnt).End(xlUp).Row
For RowCrnt = 2 To RowLast
' Code to extract values from Master and move to Summary here
Next
End If
End With
If you can add information to your question to confirm my visualisation and to give more information about (1) how to detect is a row is blank (other than testing the whole row which is an option) and (2) the source and destination of the six cells then I will try to pad out this answer.