How do I compare two columns in different sheets for non existing values and then copy them to the main sheet? - vba

First, I am no expert with VBA; I just search for similar situations, copy the code, change it a bit, and hope for the best.
So I need to make a macro that compares two sheets. One sheet contains history information, with all the specific names in column A. In the other sheet I paste daily information, where the specific names are always in column C, starting at row 7. Existing names could disappear or new names could be added, and there will be duplicates.
What I need is for the code to first compare these two columns for new names. If any are found, copy them and paste them into the 2nd row of column A of the history sheet, moving the existing names down so that they don't get deleted.
In short: if duplicate, do nothing; else copy to the history sheet.
Thank you in advance for all the help

Not sure of what your logic would be, but here are some VBA Pointers:
To compare columns in different sheets:
If Sheets("Sheet1").Range("ColRow").Value <> Sheets("Sheet2").Range("Col2Row2").Value Then...
Or you could replace the sheet names with (1) and (2) [or whatever order they are in the workbook].
For instance:
If Sheets(1).Range("A2").Value <> Sheets(2).Range("C7").Value Then ...
Assignment to a cell works similarly. You can use a variable as an index:
Dim i1 As Long
Dim i2 As Long
i2 = 7 ' row in column C of the second sheet
For i1 = 1 To 50
    ' if the two cells differ, write the second sheet's value next to the first sheet's value
    If Sheets(1).Range("A" & i1).Value <> Sheets(2).Range("C" & i2).Value Then
        Sheets(1).Range("B" & i1).Value = Sheets(2).Range("C" & i2).Value
    End If
    i2 = i2 + 1
Next i1
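The loop above compares the two columns row by row, which is not quite what the question asks for. A minimal sketch of the "if duplicate do nothing, else copy to the history sheet" logic could look like the following. The sheet names "History" and "Daily" are assumptions, as is a header in row 1 of the history sheet; Application.Match is case-insensitive and treats * and ? in a name as wildcards, so adjust if that matters for your data.

```vba
Sub AddNewNamesToHistory()
    ' Assumed sheet names; change them to match your workbook.
    Dim wsHist As Worksheet, wsDaily As Worksheet
    Dim lastRow As Long, r As Long
    Dim nm As Variant

    Set wsHist = ThisWorkbook.Worksheets("History")
    Set wsDaily = ThisWorkbook.Worksheets("Daily")

    ' Last used row in column C of the daily sheet
    lastRow = wsDaily.Cells(wsDaily.Rows.Count, "C").End(xlUp).Row

    For r = 7 To lastRow
        nm = wsDaily.Cells(r, "C").Value
        If Not IsError(nm) Then
            If Len(nm) > 0 Then
                ' Match returns an error value when the name is not yet in column A
                If IsError(Application.Match(nm, wsHist.Columns(1), 0)) Then
                    ' Push the existing names down and write the new one into A2
                    wsHist.Rows(2).Insert Shift:=xlDown
                    wsHist.Cells(2, "A").Value = nm
                End If
            End If
        End If
    Next r
End Sub
```

Because each new name is written to the history sheet before the next one is checked, a name that appears several times in the daily list is only added once.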

Related

Filter and copy certain rows from multiple excel sheets to another

I am using a workbook that has various sheets. I want to copy all the rows from the last 5 sheets that have the value "Pending" in their column "J". I want to create a new tab named "Pending week" and paste all these rows there. Any help would be really appreciated.
Thanks
You can create this yourself very easily if you just break it down:
Add a new Sheet
Name the sheet to Pending Week
Find the five latest sheets.
Create some kind of loop that copy paste row if cells in column J contains the value "Pending"
You have not provided any code, so I'll give you a base to work from:
You add a new sheet & name it using:
Worksheets.Add
ActiveSheet.Name = "Pending week"
Find the five latest sheets
To my knowledge, you can't find the latest sheets: a sheet doesn't store the date and time it was created. But if we ignore that and assume the five latest sheets are the ones furthest to the right in the workbook (the default position for newly created sheets), then you need to figure out how many sheets you have and count backwards.
You can use Worksheets.Count to count all the sheets, then count backwards from that number. My first thought would be to use a For loop:
Dim X As Integer
For X = (Worksheets.Count - 4) To Worksheets.Count
Next
X would be the identifier to find our latest sheets. So you should incorporate that into our loop below. You want to place the loop within this For Block.
Loop
There are many ways to find a value in a sheet, but you need to figure out what the last row of your sheets are. Without it we don't know when the code should stop.
You can use a Do Until Loop if there is a value in all J cells. Then you can simply insert the entire row into Pending week
It would look something like:
Dim XLrow As Long
XLrow = 1
Do Until Worksheets(1).Cells(XLrow, "J") = ""
    If Worksheets(1).Cells(XLrow, "J") = "Pending" Then
        ' copy the whole row from the source sheet into the same row of Pending week
        Worksheets("Pending week").Range(XLrow & ":" & XLrow).Value = Worksheets(1).Range(XLrow & ":" & XLrow).Value
    End If
    XLrow = XLrow + 1
Loop
You will need to change the Range to the length of the range you want to copy. Note: the value Pending is case sensitive, so keep that in mind.
Alright, this is what you need to create your code. Of course you need to change values to fit your own workbook, but this is the base.
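Putting the pieces together, one possible sketch is shown below. It assumes the five sheets of interest are the right-most ones, that "Pending week" does not exist yet, and that the status is in column J; CopyPendingRows is just an illustrative name. The new sheet is added at the far left so that it is never counted among the last five.

```vba
Sub CopyPendingRows()
    Dim wsOut As Worksheet, ws As Worksheet
    Dim x As Long, r As Long, lastRow As Long
    Dim outRow As Long, firstIdx As Long

    ' Add the output sheet at the far left so it is not one of the last five
    Set wsOut = Worksheets.Add(Before:=Worksheets(1))
    wsOut.Name = "Pending week"
    outRow = 1

    ' Index of the fifth sheet from the right, never touching the output sheet
    firstIdx = Worksheets.Count - 4
    If firstIdx < 2 Then firstIdx = 2

    For x = firstIdx To Worksheets.Count
        Set ws = Worksheets(x)
        ' Last used row in column J of this sheet
        lastRow = ws.Cells(ws.Rows.Count, "J").End(xlUp).Row
        For r = 1 To lastRow
            If Not IsError(ws.Cells(r, "J").Value) Then
                ' Case-sensitive comparison, as noted above
                If ws.Cells(r, "J").Value = "Pending" Then
                    ws.Rows(r).Copy Destination:=wsOut.Rows(outRow)
                    outRow = outRow + 1
                End If
            End If
        Next r
    Next x
End Sub
```

Using a separate outRow counter keeps the copied rows contiguous on the output sheet instead of leaving gaps where non-matching rows were skipped.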

Copying Row Info from one sheet to another based on match

I have an excel book that has two sheets: 1) Import 2) Pricing Rules.
Pricing Rules Sheet
The A column is what I need to match on. Example values include STA_PNP4, STA_PST.. and others. There are potentially around 50 different rows in the sheet, and it will continue to grow over time. Then for each row, there are pricing values in columns B to CF.
Import Sheet
This sheet has the same number of columns, but only Column A is filled out. Example values include STA_PNP4_001_00, STA_PNP4_007_00, STA_PST_010_00.. and many more.
What I need to do:
If the text in Import Sheet Column A before the second "_" matches the column identifier in Pricing Rules Sheet Column A, copy the rest of B to CF of Pricing Rules sheet for that row into the Import sheet for the row it matched on.
Any idea on where to begin with this one?
Why don't you do it using formulas only?
Assuming :
1.) Data in Import Sheet is
(col A)
STA_PNP4_007_00
STA_PNP4_001_00
STA_PNP4_001_00
.
.
2.) Data in Pricing Rules Sheet
(Col A) (col B) (ColC) (Col D) .......
STA_PNP4 1 2 3 .....
STA_PST 4 5 6 .....
STA_ASA2 7 8 9 .....
Then write this formula in B1 cell of Import Sheet
=IFERROR(VLOOKUP(LEFT(A1,FIND("_",A1,FIND("_",A1)+1)-1),PricingRules!$A$1:$CF$100,2,0),"")
Drag it down in column B
For columns C, D and so on, just change the column index number from 2 to 3 for C, 4 for D, and so on.
Because it will continue to grow over time you may be best using VBA. However, even with code I would start by applying the ‘groups’ via formula, so as not to have a spreadsheet overburdened with formulae and hence potentially slow and easy to corrupt. Something like part of #xtremeExcel’s solution which I repeat because the underscores have been treated as formatting commands in that answer:
=LEFT(A1,FIND("_",A1,1+FIND("_",A1))-1)
I’d envisage this (copied down) as an additional column in your Import Sheet - to serve as a key field to link to your Pricing Rules Sheet. Say on the extreme left so available for use by VLOOKUP across the entire sheet.
With that as a key field then either:
1. Write the code to populate Pricing Rules Sheet as frequently as run/desired. Either populating ‘from scratch’ each time (perhaps best for low volumes) or incrementally (likely advisable for high volumes).
2. Use VLOOKUP (as suggested). However with at least 84 columns and, presumably, many more than 50 rows that is a lot of formulae, though may be viable as a temporary ‘once off’ solution (ie after population Copy/Paste Special/Values).
3. A compromise: as 2, but preserve a row or a cell with the appropriate formula(e) and copy that to populate the other columns for your additions to your ColumnA and/or ColumnA:B.
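If you go the VBA route, the key field described above (the text before the second underscore) can also be computed in code rather than by formula. A minimal sketch follows; GroupKey is a hypothetical helper name, not something from the answers above.

```vba
Public Function GroupKey(ByVal code As String) As String
    ' Returns the text before the second underscore,
    ' e.g. "STA_PNP4_007_00" gives "STA_PNP4".
    Dim p1 As Long, p2 As Long

    p1 = InStr(1, code, "_")                        ' position of the first underscore
    If p1 > 0 Then p2 = InStr(p1 + 1, code, "_")    ' position of the second underscore

    If p2 > 0 Then
        GroupKey = Left$(code, p2 - 1)
    Else
        GroupKey = code   ' fewer than two underscores: return the text unchanged
    End If
End Function
```

Placed in a standard module, it can be called from other macros or used directly in a cell as =GroupKey(A1).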
Thanks for the input guys.
I got it implemented via a method like this:
{=VLOOKUP(LEFT($A4,7),PricingRules!A3:CF112,{2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84},FALSE)}
That is my ugly function, applied across a whole row, to look up and copy from my pricing rules every column when it finds a match.
Below is the function that I have created for the above scenario. It works as per the requirement that you have mentioned.
Sub CopyData()
    Dim wb As Workbook
    Dim importws As Worksheet
    Dim PricingRulesws As Worksheet
    Dim Pricingrowcount As Long
    Dim importRowCount As Long
    Dim FindValue As String
    Dim textvalue As String
    Dim columncount As Long
    Dim stringarray() As String
    'Enter full address of your file ex: "C:\newfolder\datafile.xlsx"
    Set wb = Workbooks.Open("C:\newfolder\datafile.xlsx")
    'Enter the name of your "import" sheet
    Set importws = wb.Sheets("Import")
    'Enter the name of your "Pricing" sheet
    Set PricingRulesws = wb.Sheets("PricingRules")
    For Pricingrowcount = 1 To PricingRulesws.UsedRange.Rows.Count
        FindValue = PricingRulesws.Cells(Pricingrowcount, 1)
        For importRowCount = 1 To importws.UsedRange.Rows.Count
            textvalue = importws.Cells(importRowCount, 1)
            stringarray = Split(textvalue, "_")
            'Skip cells without an underscore (blank cells, headers)
            If UBound(stringarray) >= 1 Then
                'Rebuild the part before the second underscore
                textvalue = stringarray(0) & "_" & stringarray(1)
                If FindValue = textvalue Then
                    'Copy the pricing columns into the matching import row
                    For columncount = 2 To PricingRulesws.UsedRange.Columns.Count
                        importws.Cells(importRowCount, columncount) = PricingRulesws.Cells(Pricingrowcount, columncount)
                    Next columncount
                End If
            End If
        Next importRowCount
    Next Pricingrowcount
End Sub

compare huge text files using vba

I gotta serious problem here.. any kind of help is much appreciated!!
I have two huge text files (130 MB)each with thousands of records in each. I need to compare the two files using vba or by any means and generate a spreadsheet which includes the header and with two additional columns. The two additional columns will be the file name and in the next column it should display in which particular column is error. Each record will be having multiple discrepancies. One file can have the records which cannot be found in the other file. So this condition should also be recorded in the spreadsheet.
Example:
Media Events: Taking one record from each.
00000018063|112295|000|**0009**|
PROL:
00000018063|112295|000|**0013**|
In the above example, the records are from two files. The highlighted ones are the differences between the records. So the output should be like this..
HH_NUMBER | CLASS_DATE | MV_MIN | DURATION | File | Mismatch Reason
00000018063 | 112295 | 000 | **0009** | Media Events | Mismatches in DURATION
00000018063 | 112295 | 000 | **0013** | PROL | Mismatches in DURATION
00000011861 | 112295 | 002 | 0126 | Media Events | missing in PROL file
It seems there are three problems here:
1) Find matching records (first column) between two files.
2) Compare records that match on the first column - if there is a difference, record what the difference is
3) If a record exists in one file but not the other, record that.
I am going to assume that the two "huge files" are in fact separate sheets in the same excel workbook, and that the records are sorted on the first key. This will speed up processing significantly. But speed is a secondary concern, I assume. I also assume there is a third sheet where you put the output.
Here is an outline of VBA code - you will have to do a bit of work to get it "just right" for your application, but I hope this gets you going.
Sub compare()
    Dim s1 As Worksheet
    Dim s2 As Worksheet
    Dim col1 As Range
    Dim col2 As Range
    Dim c As Range
    Dim record1 As Range, record2 As Range, output As Range
    Dim m As Variant
    Dim numCols As Long
    numCols = 5 ' however many columns you want to compare over
    Set s1 = Sheets("Media")
    Set s2 = Sheets("Pro")
    Set output = Sheets("output").Range("A2")
    Application.ScreenUpdating = False
    ' key columns, from A2 down to the last filled cell (no need to Select the sheets)
    Set col1 = s1.Range("A2", s1.Range("A2").End(xlDown))
    Set col2 = s2.Range("A2", s2.Range("A2").End(xlDown))
    For Each c In col1.Cells
        ' Application.Match returns an error value (it does not raise an error) when nothing is found
        m = Application.Match(c.Value, col2, 0)
        If IsError(m) Then
            ' you found a record in 1 but not 2
            ' record this in your output sheet
            output.Value = "Record " & c.Value & " does not exist in Pro"
            Set output = output.Offset(1, 0) ' next time you write output it will be in the next line
            ' you will have to do the same thing in the other direction - test all values
            ' in 2 against 1 to see if any records exist in 2 that don't exist in 1
        Else
            ' you found matching records; both ranges are numCols columns wide
            Set record1 = c.Resize(1, numCols)
            Set record2 = col2.Cells(m, 1).Resize(1, numCols)
            ' now you call another function to compare these records and record the result
            ' using the same trick as above to "go to the next line" - using output.Offset(1,0)
        End If
    Next c
    Application.ScreenUpdating = True
End Sub
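The outline leaves the record comparison to "another function". One way that function might look is sketched below; DescribeMismatch and its headers parameter are hypothetical names, and it assumes both records are single-row ranges of the same width holding ordinary text or numbers.

```vba
Function DescribeMismatch(record1 As Range, record2 As Range, headers As Range) As String
    ' Returns e.g. "Mismatches in DURATION", or an empty string if the records agree.
    ' headers is assumed to be the header row covering the same columns as the records.
    Dim i As Long
    Dim msg As String

    For i = 1 To record1.Columns.Count
        ' Compare as text so that 9 and "9" count as equal
        If CStr(record1.Cells(1, i).Value) <> CStr(record2.Cells(1, i).Value) Then
            If Len(msg) > 0 Then msg = msg & ", "
            msg = msg & headers.Cells(1, i).Value
        End If
    Next i

    If Len(msg) > 0 Then DescribeMismatch = "Mismatches in " & msg
End Function
```

Inside the Else branch it could be called as DescribeMismatch(record1, record2, s1.Range("A1").Resize(1, numCols)), writing a non-empty result to output and then moving output down one row.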
You could do this with formulas:
See
MS KB: Use Excel to compare two lists of data
MrExcel.com - Creating a list of non-matching values
ExcelExperts.com - Extracting non-matching entries from two columns in a third column
To give you an idea, basically, if you have two lists in columns A & B, you could use formulas like below in columns C and D to show the matching or non-matching:
In C1,
=If(isna(match(A1,B:B,0)),A1,"")
and, in D1
=IF(Isna(Match(B1,A:A,0)),B1,"")
both copied down.
FURTHER READING:
Excel Index Function and Match Function - Contextures MVP
Excel VLOOKUP and Index & Match - Excel User MVP
Excel User MVP - Excel’s Best Lookup Method: INDEX-MATCH

VBA Macro: Trying to code "if two cells are the same, then nothing, else shift rows down"

My Goal: To get all data about the same subject from multiple reports (already in the same spreadsheet) in the same row.
Rambling Backstory: Every month I get a new datadump Excel spreadsheet with several reports of variable lengths side-by-side (across columns). Most of these reports have overlapping subjects, but not entirely. Fortunately, when they are talking about the same subject, it is noted by a number. This number tag is always the first column at the beginning of each report. However, because of the variable lengths of reports, the same subjects are not in the same rows. The columns with the numbers never shift (report1's numbers are always column A, report2's are always column G, etc) and numbers are always in ascending order.
My Goal Solution: Since the columns with the ascending numbers do not change, I've been trying to write VBA code for a Macro that compares (for example) the number in column A of the active datarow with the number in column G. If the number is the same, do nothing, else move all the data in that row (and under it) from columns G:J down a line. Then move on to the next datarow.
I've tried: I've written several "For Each"s and a few loops with DataRow + 1 to and calling what I thought would make the comparisons, but they've all failed miserably. I can't tell if I'm just getting the syntax wrong or its a faulty concept. Also, none of my searches have turned up this problem or even parts of it I can maraud and cobble together. Although that may be more of a reflection of my googling skill :)
Any and all help would be appreciated!
Note: In case it's important, the columns have headers. I've just been using DataRow = Found.Row + 1 to circumvent. Additionally, I'm very new at this and self-taught, so please feel free to explain in great detail
I think I understand your objective and this should work. It doesn't use any of the methodology you were using as reading your explanation I had a good idea how to proceed. If it isn't what you are looking for my apologies.
It starts at a predefined row (see FIRST_ROW constant) and goes row by row comparing the two cells (MAIN_COLUMN & CHILD_COLUMN). If MAIN_COLUMN < CHILD_COLUMN it pushes everything between SHIFT_START & SHIFT_END down one row. It continues until it hits an empty row.
Sub AlignData()
Const FIRST_ROW As Long = 2 ' So you can skip a header row, or multiple rows
Const MAIN_COLUMN As Long = 1 ' this is your primary ID field
Const CHILD_COLUMN As Long = 7 ' this is your alternate ID field (the one we want to push down)
Const SHIFT_START As String = "G" ' the first column to push
Const SHIFT_END As String = "O" ' the last column to push
Dim row As Long
row = FIRST_ROW
Dim xs As Worksheet
Set xs = ActiveSheet
Dim im_done As Boolean
im_done = False
Do Until im_done
If WorksheetFunction.CountA(xs.Rows(row)) = 0 Then
im_done = True
Else
If xs.Cells(row, MAIN_COLUMN).Value < xs.Cells(row, CHILD_COLUMN).Value Then
xs.Range(xs.Cells(row, SHIFT_START), xs.Cells(row, SHIFT_END)).Insert Shift:=xlDown
Debug.Print "Pushed row: " & row & " down!"
End If
row = row + 1
End If
Loop
End Sub
I modified the code to work as a macro. You should be able to create it right from the macro dialog and run it from there also. Just paste the code right in and make sure the Sub and End Sub lines don't get duplicated. It no longer accepts a worksheet name but instead runs against the currently active worksheet.

Is there a way to check for duplicate values in Excel WITHOUT using the CountIf function?

A lot of the solutions here on SO involve using CountIf to find duplicates. When I have a list of 100,000+ values however, it will often take minutes for CountIf to search for duplicates.
Is there a quicker way to search for duplicates within an Excel column WITHOUT using CountIf?
Thanks!
EDIT #1:
After reading the comments and replies I realize I need to go into greater detail. Let's pretend I'm a birdwatcher, and after I return from a birdwatching trip I input anywhere from 1 to 25 or 50 new birds that I saw on my trip into my "Master List of Birds Seen". This is really a dynamically growing list, and with each addition I want to make sure I'm not duplicating something that already exists in my list.
So, in column A of my file are the names of the birds. Column B-M might contain other attributes of the birds. I want to know if a bird that I just added in column A after my latest birdwatching trip ALREADY exists somewhere ELSE in my list. And, if it does, I would manually merge the data of the 2 entries and throw away some and keep some after careful review. I clearly don't want to have duplicate entries of the same bird in my database.
So, ultimately I want some indication that there is or isn't a duplicate somewhere else, and if there is duplicate please tell me what row to look in (or highlight or color both of the duplicates).
The fastest way that I know of (in case you are using Excel 2007/2010/2011) is to use Data (In Ribbon) | Remove Duplicates to find the total number of duplicates OR to remove duplicates. You might want to move data to a temp sheet before you test this.
The 2nd fastest way is to use Countif. Now Countif can be used in many ways to find duplicates. Here are two main ways.
1) Inserting a New Column next to the data and putting the formula and simply copying it down.
2) Using Countif in Conditional formatting to highlight cells which are duplicates. For more details, please see this link.
suggestions for a macro to find duplicates in a SINGLE column
EDIT:
My Apologies :)
Countif is the 3rd fastest way!
The 2nd fastest way is to use Pivot Tables ;)
What exactly is your main purpose of finding duplicates? Do you want to delete them? Or Do you want to highlight them? Or something else?
FOLLOWUP
Seems like I made a typo in the formula. Yes for large number of rows, CountIf does take minutes as you suggested.
Let me see if I can come up with a VBA code to suit your exact needs.
Sid
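For the birdwatcher scenario in EDIT #1, a dictionary-based pass is one possible shape for such a macro. The sketch below is not the answerer's code; it assumes the names are in column A of the active sheet, treats names case-insensitively, and relies on Scripting.Dictionary, which is only available on Windows. ReportDuplicateRows is an illustrative name.

```vba
Sub ReportDuplicateRows()
    Dim data As Variant
    Dim dic As Object
    Dim i As Long, lastRow As Long
    Dim key As String

    With ActiveSheet
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        If lastRow < 2 Then Exit Sub            ' nothing to compare
        data = .Range("A1:A" & lastRow).Value   ' read the column into memory once
    End With

    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = vbTextCompare             ' "Robin" and "robin" count as the same bird

    For i = 1 To UBound(data, 1)
        If Not IsError(data(i, 1)) Then
            key = Trim$(CStr(data(i, 1)))
            If Len(key) > 0 Then
                If dic.Exists(key) Then
                    ' dic(key) holds the row where this name first appeared
                    Debug.Print "Row " & i & " duplicates row " & dic(key) & ": " & key
                Else
                    dic.Add key, i
                End If
            End If
        End If
    Next i
End Sub
```

The results appear in the Immediate window (Ctrl+G in the VBA editor); the Debug.Print line could be swapped for code that colors both rows instead.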
You can use VBA - the following function returns a list of unique entries within a list of 100,000 in less than a second. Usage: select a range, type the formula (=getUniqueListFromRange(YourRange)) and validate with CTRL+SHIFT+ENTER.
Public Function getUniqueListFromRange(parRange As Range) As Variant
' Returns a (1 to n,1 to 1) array with all the values without duplicates
Dim i As Long
Dim j As Long
Dim locKey As Variant
Dim locData As Variant
Dim locUniqueDict As Variant
Dim locUniqueList As Variant
On Error GoTo error_handler
locData = Intersect(parRange.Parent.UsedRange, parRange)
Set locUniqueDict = CreateObject("Scripting.Dictionary")
On Error Resume Next
For i = 1 To UBound(locData, 1)
For j = 1 To UBound(locData, 2)
locKey = UCase(locData(i, j))
If locKey <> "" Then locUniqueDict.Add locKey, locData(i, j)
Next j
Next i
If locUniqueDict.Count > 0 Then
ReDim locUniqueList(1 To locUniqueDict.Count, 1 To 1) As Variant
i = 1
For Each locKey In locUniqueDict
locUniqueList(i, 1) = locUniqueDict(locKey)
i = i + 1
Next
getUniqueListFromRange = locUniqueList
End If
error_handler: 'Empty range
End Function
If using Excel 2007 or later (which is likely from the 100,000+ values) you can choose:
Home Tab | Conditional Formatting > Highlight Cell Rules > Duplicate Values...
Right-click a highlighted cell and filter by selected cell color to show just the duplicates (be aware however this can be slow with conditional formatting).
Alternatively run this code and filter for colored cells which takes only a second on 100,000 cells:
Sub HighlightDupes()
Dim i As Long, dic As Variant, v As Variant
Application.ScreenUpdating = False
Set dic = CreateObject("Scripting.Dictionary")
i = 1
For Each v In Selection.Value2
If dic.exists(v) Then dic(v) = "" Else dic.Add v, i
i = i + 1
Next v
Selection.Font.Color = 255
For Each v In dic
If dic(v) <> "" Then Selection(dic(v)).Font.Color = 0
Next v
End Sub
Addendum:
To select only duplicate values without code or formulas, i have found this method useful:
Data Tab | Advanced Filter... Filter in Place, Unique Records Only, OK.
Now select the range of unique values and press Alt+; (Goto Special... Visible cells only). With this selection clear the filter and you will see that all unselected cells are duplicates, you can then press Ctrl+9 (Hide Rows) to show just the duplicates. These rows can be copied to another sheet if needed or marked with an "X".
You do not mention what you want to do when you find them. If you merely want to see where they are...
Sub HighLightCells()
ActiveSheet.UsedRange.Cells.FormatConditions.Delete
ActiveSheet.UsedRange.Cells.FormatConditions.Add Type:=xlCellValue, Operator:=xlEqual, Formula1:=ActiveCell
ActiveSheet.UsedRange.Cells.FormatConditions(1).Interior.ColorIndex = 4
End Sub
Preventing Duplicates with Data Validation
You can use Data Validation to prevent you from entering duplicate bird names. See Debra Dalgleish's site here
Handling existing duplicates
My free Duplicate Master addin will let you
Select
Colour
List
Delete
duplicates.
But more importantly it will let you run more complex matching than exact strings, ie
Case Insensitive / Case Sensitive searches (sample below)
Trim/Clean data
Remove all blank spaces (including CHAR(160)) see the " mapgie" and "magpie" example below
Run regular expression matches (for example the sample below replaces s$ with "" to remove plurals)
Match on any combination of columns (ie Column A, all columns, Column A&B etc)
I'm surprised that no one has mentioned the RemoveDuplicates method.
ActiveSheet.Range("A:A").RemoveDuplicates Columns:=1
This will simply remove any duplicate entries on the active worksheet in column A. It takes milliseconds to run (tested with 200k rows). Mind you, this will strictly delete all the duplicate entries. Although that isn't how the original question was worded, I do believe that this still serves your purpose.
One simple way of finding unique values is to use the Advanced Filter, filter for unique records only, and copy and paste them into another sheet, because when the filter is removed you will get the whole data back with the duplicates in it.
Sort the range
and in next column put =if(a2=a1;1;if(a2=a3;1;0))
"1" will be displayed for duplicates.