Compare 2 sheets with different headers - vba

I have 2 different files which have different headers, for example:
OldfileHeaders | NewFileheaders
ID | Test ID
Date | New date
and so on. I am trying to compare the data in both sheets and see if they match. The rows of data may be in different order and the headers may also be in different order.
So what I am trying to do is:
1) define which headers match which headers between the 2 files
2) find the ID from the oldfile and see if it is in the new file, if it is then see if the data under each header matches. If it doesn't then export that row of data to a new sheet add a column and label it "Missing".
The Code So far:
Set testIdData = testIdData.Resize(testIdData.CurrentRegion.Rows.Count)
Do Until sourceId.Value = ""
datacopy = False
' Look for ID in test data
Set cellFound = testIdData.Find(What:=sourceId.Value, LookIn:=xlValues, LookAt:=xlWhole, MatchCase:=False)
If cellFound Is Nothing Then
' This entry not found, so copy to output
datacopy = True
outputRange.Resize(ColumnSize:=NUMCOLUMNS).Interior.Color = vbRed
Else
' This assumes that columns are in same order
For columnNum = 2 To NUM_COLUMNS_DATA
' No need to test the ID column
If sourceId.Cells(ColumnIndex:=columnNum).Value <> cellFound.Cells(ColumnIndex:=columnNum).Value Then
outputRange.Cells(ColumnIndex:=columnNum).Interior.Color = vbYellow
datacopy = True
End If
Next columnNum
End If
If datacopy Then
sourceId.Resize(ColumnSize:=NUMCOLUMNS).Copy
outputRange.PasteSpecial xlPasteValuesAndNumberFormats
Application.CutCopyMode = False
Set outputRange = outputRange.Offset(RowOffset:=1)
difference = difference + 1
End If
Set sourceId = sourceId.Offset(RowOffset:=1)
Loop
This code works depending on me formatting the sheets in the correct order and changing the header names.
I need help in defining which field names match which field names within the 2 sheets, and then searching the new sheet for each ID and seeing if the data in the corresponding cells match. If the ID is not in the sheet then output that row too a different sheet. If the id is present and there are differences in the cells then out put these to the shame sheet. I want to produce a tally of differences in each column.

Matching up data between data sets requires that you give the program some help. In this case, the help needed is which columns are related to each other. You have identified a small table of how headers are related. With this, you can do the various translations from data source 1 to data source 2. It requires heavy usage of Application.Match and Application.VLookup.
I will provide a base example which does the core of what you are trying to do. It is much easier to see it all on one sheet which is what I have done.
Picture of data shows three tables: rng_headers, rng_source, and rng_dest. One is the lookup for the headers, the second is the "source" data, and the third is the data source to compare against which I will call destination = "dest".
Code include steps to: iterate through all the IDs in the source data, check if they exist in the dest data, and, if so, check all the individual values for equality. This code checks the headers on every step (which is slow) but allows for the data to be out of order.
Sub ConfirmHeadersAndMatch()
Dim rng_headers As Range
Set rng_headers = Range("B3").CurrentRegion
Dim rng_dest As Range
Set rng_dest = Range("I2").CurrentRegion
Dim rng_source As Range
Set rng_source = Range("E2").CurrentRegion
Dim rng_id As Range 'first column, below header row
For Each rng_id In Intersect(rng_source.Columns(1).Offset(1), rng_source)
Dim str_header As Variant
str_header = Application.VLookup( _
Intersect(rng_id.EntireColumn, rng_source.Rows(1)), _
rng_headers, 2, False)
'get col number
Dim int_col_id As Integer
int_col_id = Application.Match(str_header, rng_dest.Rows(1), 0)
'find ID in the new column
Dim int_row_id As Variant
int_row_id = Application.Match(rng_id, rng_dest.Columns(int_col_id), 0)
If IsError(int_row_id) Then
'ID missing... do something
rng_id.Interior.Color = 255
Else
Dim rng_check As Range 'all values, same row
For Each rng_check In Intersect(rng_source, rng_id.EntireRow)
'get col number
str_header = Application.VLookup( _
Intersect(rng_check.EntireColumn, rng_source.Rows(1)), _
rng_headers, 2, False)
int_col_id = Application.Match(str_header, rng_dest.Rows(1), 0)
'check value
If rng_check.Value <> rng_dest.Cells(int_row_id, int_col_id).Value Then
'values did not match... do something
rng_dest.Cells(int_row_id, int_col_id).Interior.Color = 255
End If
Next rng_check
End If
Next
End Sub
Notes on the code
Ranges are built on CurrentRegion which picks out the blocks of data. You can swap these out for different ranges on different sheets.
Column header translation is done with Application.VLookup to check the source header and return the destination header. This String is then found in the destination header row using Application.Match. You could abstract this code into a Function to avoid repeating it twice.
Once the column is found, the ID is searched for in the destination table using Application.Match. This will return an error if the ID is not found.
If the ID is found, it then checks all of the other values in the same row, comparing them against the correct columns in the destination table. Non-matching results are colored red.
If all of the columns do not have pairs, you can add additional checks on the VLookup or the column Match to check this.
The vast majority of this code just handles getting to the correct spots in the data using Intersect, Rows, and Columns.
Results show some red values for the ID not found and the values that don't match.

Related

How to delete unselected columns from range

I am new to VBA and am trying to delete unwanted columns loaded from a .csv file. I am importing a large amount of data but then I ask the user what columns they want to keep going by "ID num.". There are a lot of columns with different ID no. and I want to ask the user what they want to keep and delete the rest.
The problem is I need to delete all the other columns the user didn't want but I still need to keep the first 6 columns and the last two columns as that is different information.
Here is what I have so far:
Sub Select()
'the below will take the users inputs
UserValue = InputBox("Give the ID no. to keep seperating with a comma e.g"12,13,14")
'the below will pass the user inputs to the example to split the values
Call Example(UserValue)
End Sub
Sub Example(UserValue)
TestColArray() = Split(UserValue, ",")
For Each TestCol In TestColArray()
' keep all the columns user wants the delete the rest except the first 6 columns and last 2
Next TestCol
End Sub
That is what I have so far, it is not much but the user could put in a lot of columns with different ID number in the input box the way the Excel sheet is laid out all the ID no.s are in row 2 and the first 6 and last 2 columns are blank of row 2 since the ID no. does not apply. I hope that helps.
try this (commented) code:
Option Explicit '<--| use this statament: at the cost of having to declare all used variable, your code will be much easier to debug and maintainable
Sub MySelect()
Dim UserValue As String
'the below will take the users inputs
UserValue = Application.InputBox("Give the ID no. to keep seperating with a comma e.g: ""12,13,14""", Type:=2) '<--| use Type:=2 to force a string input
'the below will pass the user inputs to the example to split the values
Example UserValue '<--| syntax 'Call Example(UserValue)' is old
End Sub
Sub Example(UserValue As String)
Dim TestCol As Variant
Dim cellsToKeep As String
Dim firstIDRng As Range, lastIDRng As Range, IDRng As Range, f As Range
Set firstIDRng = Range("A2").End(xlToRight) '<-- first ID cell
Set lastIDRng = Cells(2, Columns.Count).End(xlToLeft) '<-- last ID cell
Set IDRng = Range(firstIDRng, lastIDRng) '<--| IDs range
cellsToKeep = firstIDRng.Offset(, -6).Resize(, 6).Address(False, False) & "," '<--| initialize cells-to-keep addresses list with the first six blank cells at the left of first ID
For Each TestCol In Split(Replace(UserValue, " ", ""), ",") '<--| loop through passed ID's
Set f = IDRng.Find(what:=TestCol, LookIn:=xlValues, lookat:=xlWhole, MatchCase:=False) '<--| search for the current passed IDs range
If Not f Is Nothing Then cellsToKeep = cellsToKeep & f.Address(False, False) & "," '<--| if the current ID is found then update cells-to-keep addresses list
Next TestCol
cellsToKeep = cellsToKeep & lastIDRng.Offset(, 1).Resize(, 2).Address(False, False) '<--| finish cells-to-keep addresses list with the firts two blank cells at the right of last ID
Range(cellsToKeep).EntireColumn.Hidden = True '<-- hide columns-to-keep
ActiveSheet.UsedRange.EntireColumn.SpecialCells(xlCellTypeVisible).EntireColumn.Delete '<--| delete only visible rows
ActiveSheet.UsedRange.EntireColumn.Hidden = False '<-- unhide columns
End Sub
it's assumed to be working with currently active worksheet
A simple google search produces this. On the first page of results too. Perhaps this will suit your needs.
If the data set that needs to be deleted is really large (larger than the ranges you want to keep too.) Then perhaps only select the columns you want to have whilst you import the csv? This stackoverflow question shows how to import specific columns.
EDIT:
So from what I believe the OP is stating as the problem, there is a large csv file that is being imported into excel. After importing there is alot of redundant columns that should be deleted. My first thought would be to only import the needed data (columns) in the first place. This is possible via VBA by using the .TextToColumns method with the FieldInfo argument. As stated above, the stackoverflow question linked above provides a means of doing so.
If the selective importing is not an option, and you are still keen on making an inverse of the user selection. One option would be to create 2 ranges (one being the user selected Ranges and the second being the entire sheet), you could perform an intersect check between the two ranges and delete the range if there is no intersection present (ie. delete any cell that is not part of the users selection). This method is provided by the first link I supplied and is quite straight forward.

Faster Workflow

I have a table (Table 1) with a whole bunch of well data (versions, MD, HD, etc.) and I want to create another table (Table 2) that will only show the data for the well I am interested in.
I have it set up where you select the well using a drop down list. Then I want Table 2 to be populated with four values for each of the iterations that show up in Table 1....
I tried using vlookup but was having issues when a well had multiple versions. And I also tried using an advanced filter.
Screenshot of the spreadsheet
Let's solve this using a helper column. First, assume column A will be used to the left of your table, to show the row number which each one of these is found in.
A5 would have the following formula:
=MATCH($C$1,K:K,0)
This shows us the row number that Well1 is first matched at. Then A6 and copied down would have the formula:
=A5+MATCH(B6,OFFSET(K1,A5,0,COUNT(M:M),1),0)
This uses OFFSET to create a new range, starting at the cell immediately below the previous match for Well1, and then uses MATCH to find what row that occurs.
So now, column A will always show the row number to pull data from. The rest is simply using the INDEX function to pull from your desired columns. For example, the data in column C pulls the iteration from column L, and can be pulled through formula like so, in cell C5 and copied to the right / down:
=INDEX(L:L,$A5)
If your data is appropriately normalized, you might be better off with a Pivot Table. This would give you the option of filtering by Well ID.
To use a Advanced filter you will need to create a worksheet event. Place this in the code for the sheet on which you want the data.
Private Sub Worksheet_Change(ByVal Target As Range)
If Not Intersect(Target, Range("A2")) Is Nothing Then
Dim dataRng As Range
Dim critRng As Range
Dim CpyToRng As Range
Dim cpytoarr() As Variant
With Worksheets("Sheet1")
Set dataRng = .Range(.Cells(1, 1), .Cells(1, 1).End(xlDown).End(xlToRight))
End With
With Me
.Range("CC1") = .Cells(1, 1).Value
.Range("CC2") = "'=" & .Cells(2, 1).Value
Set critRng = .Range("CC1:CC2")
Set CpyToRng = .Range(.Cells(6, 1), .Cells(6, 1).End(xlToRight))
End With
Debug.Print dataRng.Address
Debug.Print critRng.Address
Debug.Print CpyToRng.Address
dataRng.AdvancedFilter Action:=xlFilterCopy, _
CriteriaRange:=critRng, CopyToRange:=CpyToRng, _
Unique:=False
critRng.ClearContents
End If
End Sub
How this works. This assumes the data is on Sheet1 and starts in "A1" with no blanks in column A or the last row:
On Sheet2 set it up like this:
It is important that the header rows on sheet2 are name identical to the headers on sheet1.
Now every time that the value changes in A2 on sheet 2, your drop down, the requisite data will appear below row 6.

Copy values from multiple columns in same row VBA

I'm a VBA noob with a massive table of analysis results where each row is containing results from a different date. I'm looking for values in certain columns, and they are some times empty. When they are, I'm not interested in them. The values I want are supposed to be copied to a new sheet to be collected in a smaller table.
I have written a script where I loop through the rows in the masterTable, and I am able to identify the rows with the values I'm interested in. However, I am not able to copy the value from the different cells in the identified row to a new sheet.
I've tried using Union to make a range inlcuding the columns that are relevant for copying.
Dim searchCells As Range
Dim masterTable As Range
Set searchCells = Union(Columns("R"), Columns("S"), Columns("T"), Columns("X"), Columns("Z"), Columns("AF"), Columns("AQ"), Columns("AT"), Columns("AY"), Columns("AV"), Columns("BB"), Columns("BD"), Columns("BG"))
Set masterTable = Worksheets("Sheet0").Range("A3:BG2022")
a = 1
For i = 1 To masterTable.Rows.Count
If Application.WorksheetFunction.CountA(searchCells(i).Value) <> 0 Then ' look for values among the relevant columns in row(i).
Debug.Print "Found data at "; i
Worksheets("Sheet0").searchCells.Rows(i).Copy ' copy data from searchCells
Worksheets("Results").Range("C1").Offset(a, 0).paste ' paste data to destination
a = a + 1 ' increment destination row offset
End If
Next
My idea of the searchCells are not working, as I "find data" in all rows, and I'm not able to run the .Copy and the .Paste methods. All help is appreaciated!
EDIT: upon compilation VBA throws the following error on the copy-line:
Run-time error '438':
Object doesn't support this property or method
To copy the relevant rows (for those columns only) where there is data in at least one cell, you could use:
Dim searchCells As Range
Dim masterTable As Range
Dim rRow As Range
Set searchCells = Range("R:T,X:X,Z:Z,AF:Af,AQ:Aq,AT:AT,AV:AV,AY:AY,BB:BB,BD:BD,BG:BG")
Set masterTable = Worksheets("Sheet0").Range("A3:BG2022")
a = 1
For i = 1 To masterTable.Rows.Count
Set rRow = Intersect(searchCells, searchCells.Cells(i, 1).EntireRow)
If Application.WorksheetFunction.CountA(rRow) <> 0 Then ' look for values among the relevant columns in row(i).
Debug.Print "Found data at "; i
rRow.Copy Worksheets("Results").Range("C1").offset(a, 0) ' paste data to destination
a = a + 1 ' increment destination row offset
End If
Next

Manipulating Excel spreadsheet, removing rows based on values in a column and then removing more rows based on values in another column

I have a rather complicated problem.
I have a log file that when put into excel the column "I" contains event IDs, and the column J contains a custom key that keeps a particular even grouped.
All i want to do is remove any rows that do not contain the value of say 102 in the event id column.
And THEN i need to check the custom key (column J) and remove rows that are duplicates since any duplicates will falsely show other statistics i want.
I have gotten as far as being able to retrieve the values from the columns using com objects and .entirecolumn cell value etc, but I am completely stumped as to how i can piece together a solid way to remove rows. I could not figure out how to get the row for each value.
To give a bit more clarity this is my thought process on what i need to do:
If cell value in Column I does not = 102 Then delete the row that cell contains.
Repeat for all rows in spreadsheet.
And THEN-
Read every cell in column J and remove all rows containing duplicates based on the values in column J.
Save spreadsheet.
Can any kind persons help me?
Additional Info:
Column I holds a string that is an event id number e.g = 1029
Column J holds a string that is a mix of numbers and letters = 1ASER0X3NEX0S
Ellz, I do agree with Macro Man in that your tags are misleading and, more importantly, I did indeed need to know the details of Column J.
However, I got so sick of rude posts today and yours was polite and respectful so I've pasted some code below that will do the trick ... provided Column J can be a string (the details of which you haven't given us ... see what Macro Man's getting at?).
There are many ways to test for duplicates. One is to try and add a unique key to a collection and see if it throws an error. Many wouldn't like that philosophy but it seemed to be okay for you because it also gives you a collection of all the unique (ie remaining) keys in Column J.
Sub Delete102sAndDuplicates()
Dim ws As Worksheet
Dim uniques As Collection
Dim rng As Range
Dim rowPair As Range
Dim iCell As Range
Dim jCell As Range
Dim delRows As Range
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set rng = Intersect(ws.UsedRange, ws.Range("I:J"))
Set uniques = New Collection
For Each rowPair In rng.Rows
Set iCell = rowPair.Cells(, 1)
Set jCell = rowPair.Cells(, 2)
On Error Resume Next
uniques.Add jCell.Value2, jCell.Text
If Err = 457 Or iCell.Value2 = 102 Then
On Error GoTo 0
If delRows Is Nothing Then
Set delRows = rowPair.EntireRow
Else
Set delRows = Union(delRows, rowPair.EntireRow)
End If
End If
Next
If Not delRows is Nothing then
MsgBox delRows.Address(False, False) & " deleted."
delRows.Delete
End If
End Sub
There are a number of ways in which this can be done, and which is best will depend on how frequently you perform this task and whether you want to have it fully automated. Since you've tagged your question with VBA I assume you'll be happy with a VBA-based answer:
Sub removeValues()
Range("I1").Select 'Start at the top of the I column
'We are going to go down the column until we hit an empty row
Do Until IsEmpty(ActiveCell.Value) = True
If ActiveCell.Value <> 102 Then
ActiveCell.EntireRow.Delete 'Then delete the row
Else
ActiveCell.Offset(1).Select 'Select the cell below
End If
Loop
'Now we have removed all non-102 values from the column, let`s remove the duplicates from the J column
Range("A:J").RemoveDuplicates Columns:=10, Header:=xlNo
End Sub
The key line there is Range("A:J").RemoveDuplicates. It will remove rows from the range you specify according to duplicates it finds in the column you specify. In that case, it will remove items from the A-J columns based on duplicates in column 10 (which is J). If your data extends beyond the J column, then you'll need to replace "A:J" with the appropriate range. Note that the Columns value is relative to the index of the first column, so while the J column is 10 when that range starts at A (1), it would be 2 for example if the range were only I:J. Does that make sense?
(Note: Using ActiveCell is not really best practice, but it's the method that most obviously translates to what you were trying to do and as it seems you're new to VBA I thought it would be the easiest to understand).

VBA. Comparing values of columns names in two separate worksheets, copy nonmatching columns to third worksheet

So, I've explored a few answered VBA Questions, but I'm still stuck. I have three sheets "By_Oppt_ID", "Top_Bottom" and "Non_Top_Bottom". The first two have a large amount of columns each with a unique name. Now there are some columns in By_Oppt_ID that aren't in "Top_Bottom". So I want to compare each column name in By_Oppt_ID to every column name in "Top_Bottom", and if the column name isn't found, copy that column name and all the rows beneath it, to a third worksheet "Non_Top_Bottom".
So Here's what I have:
Sub Copy_Rows_If()
Dim Range_1 As Worksheet, Range_2 As Worksheet
Dim c As Range
Set Range_1 = Workbooks("Complete_Last_Six_Months_Q_Results.xlsx").Sheets("Top_Bottom")
Set Range_2 = Workbooks("Complete_Last_Six_Months_Q_Results.xlsx").Sheets("By_Oppt_ID")
Application.ScreenUpdating = False ' Stays on the same screen even if referencing different worksheets
For Each c In Range_2.Range("A2:LX2")
' Checks for values not in Range_1
If Application.WorksheetFunction.CountIf(Range_1.Range("A1:CR1"), c.Value) = 0 Then
' If not, copies rows to new worksheet
' LR = .Cells(Row.Count, c).End(xUp).Row
c = ActiveCell
Sheets("By_Oppt_ID").Range("Activecell", "ActiveCell.End(xlDown)").Copy Destination:=Workbooks("Complete_Last_Six_Months_Q_Results.xlsx").Sheets("Non_Top_Bottom").Range("A1:A6745")
Set rgPaste = rgPaste.Offset(0, 1) 'Moves to the next col, but starts at the same row position
End If
Next c
End Sub
I've compiled this many ways and keep getting a series of errors: Subscript Out of Range/ Method "Global_Range" Failure. What am I doing wrong?
If you are going to have this code within the same workbook every time, try using
ThisWorkbook.Sheets("Top_Bottom")
instead of
Workbooks("Complete_Last_Six_Months_Q_Results.xlsx").Sheets("Top_Bottom")
replicate that through your code and see if that fixes the problem.
What do you mean by c = Activecell? Do you mean to say c.activate?
You might then also want to change the next line to
Sheets("By_Oppt_ID").Range(Activecell, ActiveCell.End(xlDown)).Copy Workbooks("Complete_Last_Six_Months_Q_Results.xlsx").Sheets("Non_Top_Bottom").Range("A1")