Okay, this has been asked multiple times, but I am asking again for the best possible solution:
I have two Excel files (not sheets). The first file is very large, with close to 200,000 records. One of the columns (Gender) is corrupted and I have to fix it.
I have a second Excel file with only around 200 records; these have the correct values for the ones that are messed up.
for eg:
and this is the file that has correct values with only around 200 records (only the corrupted ones).
Now I need a macro that finds these exact 200 records among the 200,000 records (by employee ID) and replaces the Gender value with the correct one.
I found something similar here, but I don't want to loop through 200,000 records 200 times. That feels like a performance overhead.
Is there a better option?
I am thinking an ideal solution would be:
1. Loop through the 200 items, using the employee ID for each iteration
2. Take that employee ID and do a "Find" operation in the Employee ID column of the master Excel file
3. If found, replace the Gender column value
Would there be any better solution? Any input is gladly appreciated.
One way to do this through VBA is to loop through just the 200 corrections and use the MATCH function to find the row each ID belongs on, as opposed to a second loop (a second loop through 200,000 rows would take ages, like you say).
For the sub below I have copied and pasted the 200-row table into columns 5:7 of the sheet holding the 200,000-row table. You can either automate this part easily enough, or just put in the correct sheet references for each part of the code.
I've also put in a checking line to make sure there IS a match for the current ID from the small table, otherwise it would throw an error. You could put an ELSE in front of the END IF in this error catch to highlight any IDs which weren't actually found. Here's the code, hope this method helps!
Sub replace_things()
    Dim x As Long, aMatch As Long, theRow As Long
    Dim cur As Variant
    With ActiveSheet
        For x = 2 To 200 'Change this to however many rows are in the small table
            cur = .Cells(x, 5).Value 'cur is the ID from the small table
            aMatch = Application.WorksheetFunction.CountIf(.Range("A:A"), cur) 'Check to see there's a match in the large table
            If aMatch > 0 Then 'if there's a match then...
                theRow = Application.WorksheetFunction.Match(cur, .Range("A:A"), 0) 'get the row number the match is actually on
                .Cells(theRow, 3).Value = .Cells(x, 7).Value 'when the row is found, replace with the relevant value from col 7 (col 3 of the small table)
            End If
        Next x
    End With
End Sub
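If the two files should stay separate, the same idea can be sketched with a Scripting.Dictionary so the big sheet is only read once. This is only a rough sketch under assumptions that are not in the question: the workbook names "Master.xlsx" and "Corrections.xlsx" are made up, both workbooks are open, both sheets have a header in row 1, the employee ID is in column A and Gender in column C, column C holds plain values (no formulas), and the master has more than one data row.

```vba
Sub ApplyGenderCorrections()
    Dim wsMaster As Worksheet, wsFix As Worksheet
    Dim dict As Object
    Dim ids As Variant, genders As Variant
    Dim r As Long, lastRow As Long, key As String

    ' Hypothetical workbook names; replace with your own
    Set wsMaster = Workbooks("Master.xlsx").Worksheets(1)
    Set wsFix = Workbooks("Corrections.xlsx").Worksheets(1)

    ' Load the ~200 corrections: employee ID -> correct gender
    Set dict = CreateObject("Scripting.Dictionary")
    lastRow = wsFix.Cells(wsFix.Rows.Count, "A").End(xlUp).Row
    For r = 2 To lastRow
        dict(CStr(wsFix.Cells(r, "A").Value)) = wsFix.Cells(r, "C").Value
    Next r

    ' Read the master columns into arrays, patch in memory, write back once
    lastRow = wsMaster.Cells(wsMaster.Rows.Count, "A").End(xlUp).Row
    ids = wsMaster.Range("A2:A" & lastRow).Value
    genders = wsMaster.Range("C2:C" & lastRow).Value
    For r = 1 To UBound(ids, 1)
        key = CStr(ids(r, 1))
        If dict.Exists(key) Then genders(r, 1) = dict(key)
    Next r
    wsMaster.Range("C2:C" & lastRow).Value = genders
End Sub
```

Each master row costs one dictionary lookup, so the work is roughly 200,000 lookups rather than 200 scans of the ID column. Try it on a copy of the master file first.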
A super quick way: copy your CORRECT employee ID list and paste it below the CORRUPT employee ID list, then highlight duplicates and correct the highlighted rows.
Otherwise, a VLOOKUP could label which ones are corrupt: basically take a unique field from your correct list, compare it to your corrupt list, then fix the ~200 errors.
I assume that the employee ID is unique per record, so you can paste the correct ones under the existing ones, sort by employee ID and highlight duplicates to find them easily.
I have a worksheet here with multiple columns full of data, currently ranging from column AA to CT. The first column contains rows with headings and is frozen. Data is added to the end column of this range weekly. Then all columns between the first and the last four are deleted (to show a month's worth of data but keep the headings intact).
I'm very new to VBA and I've been trying to write some code to automate this, so far I've managed to select the 5th column in from the end. But I can't get it to select the columns in between the 4th column from the end and Cell AA:
Sub DeleteColumns()
Dim i
i = Range("KFIData").End(xlToRight).Offset(0,-4).EntireColumn.Select
Columns("AA:ColumnFive").Delete
End Sub
Am I going about this completely the wrong way?
Many thanks
Well if you managed to catch column 5 from the end, use this statement:
Range(Columns("AA"), Columns(ColumnFiveFromEnd)).Delete
ColumnFiveFromEnd can be a number as well as a text identifier.
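Putting that together, a minimal sketch could look like the following. It rests on assumptions that are not in the question: it works on the active sheet, row 1 has a value in every data column (so the last used column can be found from it), and the block to delete starts at column AA as in the statement above.

```vba
Sub DeleteMiddleColumns()
    Dim ws As Worksheet
    Dim firstCol As Long, lastCol As Long

    Set ws = ActiveSheet
    firstCol = ws.Range("AA1").Column                               ' column AA = 27
    lastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column    ' last used column in row 1

    ' Keep the last four columns; delete everything from AA up to the fifth from the end
    If lastCol - 4 >= firstCol Then
        ws.Range(ws.Columns(firstCol), ws.Columns(lastCol - 4)).Delete
    End If
End Sub
```

No Select is needed; the column numbers are worked out first and the range is deleted directly.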
I'm really stuck on this one. I have a spreadsheet with thousands of rows. I use this code to filter them based off of product in the E column.
Sub IsolateCCENCE()
    Dim Operations As Workbook
    Dim Operations_Sheet As Worksheet
    Set Operations = Workbooks("Operations for Macros")
    Set Operations_Sheet = Operations.Worksheets("Operations")
    Operations_Sheet.Range("$A$6:$AH$13108").AutoFilter Field:=5, Criteria1:="=CCE" _
        , Operator:=xlOr, Criteria2:="=NCE"
End Sub
Which works and leaves me with just under 1700 rows. Within these rows, in the A column, there are company names. Each company takes up approximately 20 rows. Each row represents a payment and has a corresponding date, in the D column. I need a macro (I'm assuming with a loop) that will then do the following:
Go through the rows, find the last row for each company
In that row, find the corresponding date
If that date is within 30 days from today, generate an email
Part 3 is easy. But Part 1 and 2 I can't seem to get. The data is always going to be changing.
Maybe it would be easier to copy and paste all of the data into another spreadsheet and then filter through every single company to find the last row (and thus the corresponding date)? But I don't know how I would define a macro to filter through each company when the company names will be changing constantly.
I appreciate any help. Thanks in advance!
If a specific company name is in, say, F1, then:
=MIN(IF(A:A=F1,D:D))
entered with Ctrl+Shift+Enter should give you the earliest date for the company named in F1. If that date is more recent than TODAY()-30 (or less far into the future than TODAY()+30?), you might use it as your e-mail trigger (subject to other filtering etc.).
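If you would rather do it in VBA, here is a rough sketch of parts 1 and 2 that does not depend on knowing the company names in advance. It assumes things the question does not fully spell out: the header is in row 6 (as in the filter range), the filter has already been applied, the company names in column A contain no error values, and "within 30 days" means 30 days either side of today. It only prints the hits to the Immediate window; the e-mail part is left out.

```vba
Sub LastPaymentPerCompany()
    Dim ws As Worksheet, dict As Object
    Dim r As Long, lastRow As Long
    Dim k As Variant, d As Variant

    Set ws = Workbooks("Operations for Macros").Worksheets("Operations")
    Set dict = CreateObject("Scripting.Dictionary")

    ' Part 1: remember the last visible row seen for each company
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 7 To lastRow
        If Not ws.Rows(r).Hidden Then
            If Len(ws.Cells(r, "A").Value) > 0 Then
                dict(CStr(ws.Cells(r, "A").Value)) = r   ' later rows overwrite earlier ones
            End If
        End If
    Next r

    ' Part 2: read the date in column D of that row and test it
    For Each k In dict.Keys
        d = ws.Cells(dict(k), "D").Value
        If IsDate(d) Then
            If Abs(DateDiff("d", Date, d)) <= 30 Then
                Debug.Print k, dict(k), d   ' company, row number, date
            End If
        End If
    Next k
End Sub
```

Because the dictionary is built from whatever is in column A at run time, changing company names are not a problem.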
So I would like to get this table to only display 1 row for each individual with a sum of all of their tickets. The way it is right now, Marci is coming up multiple times because each row represents the amount of tickets for a different department (Ex. # of paramedic tickets, # of FireFighter tickets, etc..) I tried the "Unique Records Only" under the advanced tab, and obviously that didn't work because the rows are not unique.
Can someone please assist me?
You need VBA. Make a loop based on these steps:
1: sort the cells alphabetically
2: check neighbouring cells for equality
3: if they are equal, delete the duplicate and count the occurrences
4: proceed to read the next line
Sub RemoveAndNumberDuplicates()
    'Select the sorted column first; counts are written to the column on its right
    Dim r As Range, N As Long, SumOfDupl As Long
    Set r = Selection
    SumOfDupl = 1
    For N = r.Rows.Count To 2 Step -1 'work upwards so deletions do not shift unread rows
        If r.Cells(N, 1).Value = r.Cells(N - 1, 1).Value Then
            SumOfDupl = SumOfDupl + 1 'number of current record instances
            r.Cells(N, 1).EntireRow.Delete 'removes the whole duplicate row
        Else
            r.Cells(N, 2).Value = SumOfDupl 'show number of occurrences in side column
            SumOfDupl = 1 'restart numbering if no duplicate found
        End If
    Next N
    r.Cells(1, 2).Value = SumOfDupl 'count for the first value in the list
End Sub
This is a general idea for Excel VBA, not a complete code.
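Since the goal is a sum of tickets per person rather than a count of rows, a dictionary-based sketch may be closer to what is needed. The layout here is an assumption, as the question does not give it: names in column A, ticket counts in column B, a header in row 1, and columns D:E free for the summary.

```vba
Sub SumTicketsPerPerson()
    Dim ws As Worksheet, dict As Object
    Dim r As Long, lastRow As Long, outRow As Long
    Dim k As Variant, v As Variant

    Set ws = ActiveSheet
    Set dict = CreateObject("Scripting.Dictionary")

    ' Add up column B for every name in column A
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 2 To lastRow
        k = CStr(ws.Cells(r, "A").Value)
        v = ws.Cells(r, "B").Value
        If Len(k) > 0 And IsNumeric(v) Then dict(k) = dict(k) + CDbl(v)
    Next r

    ' Write one row per person to columns D:E (assumed to be empty)
    ws.Range("D1:E1").Value = Array("Name", "Total tickets")
    outRow = 2
    For Each k In dict.Keys
        ws.Cells(outRow, "D").Value = k
        ws.Cells(outRow, "E").Value = dict(k)
        outRow = outRow + 1
    Next k
End Sub
```

This leaves the original rows untouched and needs no sorting.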
I'm looking for an algorithm for which I do not have the VBA knowledge to script myself. So I'm stuck. It isn't through lack of effort trying because I have given it a go (plus, this bit of code is the last remaining piece of my bigger VBA code) I simply lack the knowledge/experience/skill...
Basically, I have an Excel file. In this file is a sheet, "sheet1". Sheet1 contains many rows of data. The number of rows contained in sheet1 can vary from 1 to n. Sometimes, I may have 50 while other times I may have 30, etc. What is consistent is the layout of the book, i.e. I have codes in column A which identify a product in my database.
What I want to do is this:
1. Scan the sheet for empty rows (due to the way the workbook is generated, I sometimes have blank rows) and remove them. These blank rows are sometimes in-between rows with data while at other times may be trailing at the end of the sheet.
2. After removing the blank rows find the last used row. Store that to a variable. I have found this piece of code useful for doing that:
mylastrow = myBook.Sheets("Results").Cells.Find(what:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
3. Starting from the row determined in (2), I want to take the product code in A(x where x = mylastrow) and find any other occurrences of it (in column A). If any are found, delete that entire row corresponding to it. Importantly, this loop must go in reverse. For example let's say mylastrow = 40, the loop will need to begin at A40 and on the next iteration do A39 (or 38 if a row has been removed?). This is because with any of the product numbers the corresponding data in the row contains more data further down the column (because of the way the sheet was generated). Essentially the entry closest to the last row is the most recent.
Hopefully I've been able to explain the situ properly. But if not and you're willing to take the challenge (my burden?) off me I would be very grateful.
QF
The only way to develop that knowledge and skill is to get in there and code! I'm sure someone may come in and write you the entire procedure, but in the meantime these resources should give you the tools to do it yourself.
First, check out the method here to delete blank rows. It relies on "Selection" for the range, so you can either manually select all the cells of the sheet, then run the macro, or replace it with the following:
Dim r As Range
Set r = Sheet1.Cells 'now use r instead of Selection
OR (even better) use your code for finding the last used row and set the range from row 1 to "mylastrow".
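As a rough illustration of that last suggestion (not the linked method itself), the blank rows can be removed by walking upwards from the last used row. This sketch assumes the data lives on the sheet with code name Sheet1 and that a row counts as blank when CountA finds nothing in it.

```vba
Sub DeleteBlankRows()
    Dim lastCell As Range
    Dim r As Long

    ' Same Find call as in the question, used to locate the last used row
    Set lastCell = Sheet1.Cells.Find(What:="*", SearchOrder:=xlByRows, _
                                     SearchDirection:=xlPrevious)
    If lastCell Is Nothing Then Exit Sub   ' sheet is completely empty

    ' Go bottom-up so that deleting a row never shifts a row not yet checked
    For r = lastCell.Row To 1 Step -1
        If Application.WorksheetFunction.CountA(Sheet1.Rows(r)) = 0 Then
            Sheet1.Rows(r).Delete
        End If
    Next r
End Sub
```

Trailing blank rows below the last used row need no handling, since Find already stops at the last row that contains anything.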
Next, beginning from "mylastrow", start adding the values in Column A to a Dictionary object (example here). You can use a row counter to decrement from "mylastrow" to 1. Here's an example of how it would work. The key is assumed to be in the 1st column ("A").
Dim dict As Object
Dim rowCount As Long
Dim strVal As String
Set dict = CreateObject("Scripting.Dictionary")
rowCount = Sheet1.Range("A1").CurrentRegion.Rows.Count
Do While rowCount > 1
    strVal = Sheet1.Cells(rowCount, 1).Value2
    If dict.exists(strVal) Then
        Sheet1.Rows(rowCount).EntireRow.Delete
    Else
        dict.Add strVal, 0
    End If
    rowCount = rowCount - 1
Loop
Set dict = Nothing
Note that the 1st row hasn't been touched since we stopped when rowCount is 1 (assumes there's a header).
A lot of the solutions here on SO involve using CountIf to find duplicates. When I have a list of 100,000+ values however, it will often take minutes for CountIf to search for duplicates.
Is there a quicker way to search for duplicates within an Excel column WITHOUT using CountIf?
Thanks!
EDIT #1:
After reading the comments and replies I realize I need to go into greater detail. Let's pretend I'm a birdwatcher, and after I return from a birdwatching trip I input anywhere from 1 to 25 or 50 new birds that I saw on my trip into my "Master List of Birds Seen". This is really a dynamically growing list, and with each addition I want to make sure I'm not duplicating something that already exists in my list.
So, in column A of my file are the names of the birds. Column B-M might contain other attributes of the birds. I want to know if a bird that I just added in column A after my latest birdwatching trip ALREADY exists somewhere ELSE in my list. And, if it does, I would manually merge the data of the 2 entries and throw away some and keep some after careful review. I clearly don't want to have duplicate entries of the same bird in my database.
So, ultimately I want some indication that there is or isn't a duplicate somewhere else, and if there is duplicate please tell me what row to look in (or highlight or color both of the duplicates).
The fastest way that I know of (in case you are using Excel 2007/2010/2011) is to use Data (In Ribbon) | Remove Duplicates to find the total number of duplicates OR to remove duplicates. You might want to move data to a temp sheet before you test this.
The 2nd fastest way is to use Countif. Now Countif can be used in many ways to find duplicates. Here are two main ways.
1) Inserting a New Column next to the data and putting the formula and simply copying it down.
2) Using Countif in Conditional formatting to highlight cells which are duplicates. For more details, please see this link.
suggestions for a macro to find duplicates in a SINGLE column
EDIT:
My Apologies :)
Countif is the 3rd fastest way!
The 2nd fastest way is to use Pivot Tables ;)
What exactly is your main purpose of finding duplicates? Do you want to delete them? Or Do you want to highlight them? Or something else?
FOLLOWUP
Seems like I made a typo in the formula. Yes for large number of rows, CountIf does take minutes as you suggested.
Let me see if I can come up with a VBA code to suit your exact needs.
Sid
You can use VBA - the following function returns a list of unique entries within a list of 100,000 in less than a second. Usage: select a range, type the formula (=getUniqueListFromRange(YourRange)) and validate with CTRL+SHIFT+ENTER.
Public Function getUniqueListFromRange(parRange As Range) As Variant
    ' Returns a (1 to n,1 to 1) array with all the values without duplicates
    Dim i As Long
    Dim j As Long
    Dim locKey As Variant
    Dim locData As Variant
    Dim locUniqueDict As Variant
    Dim locUniqueList As Variant
    On Error GoTo error_handler
    locData = Intersect(parRange.Parent.UsedRange, parRange)
    Set locUniqueDict = CreateObject("Scripting.Dictionary")
    On Error Resume Next
    For i = 1 To UBound(locData, 1)
        For j = 1 To UBound(locData, 2)
            locKey = UCase(locData(i, j))
            If locKey <> "" Then locUniqueDict.Add locKey, locData(i, j)
        Next j
    Next i
    If locUniqueDict.Count > 0 Then
        ReDim locUniqueList(1 To locUniqueDict.Count, 1 To 1) As Variant
        i = 1
        For Each locKey In locUniqueDict
            locUniqueList(i, 1) = locUniqueDict(locKey)
            i = i + 1
        Next
        getUniqueListFromRange = locUniqueList
    End If
error_handler: 'Empty range
End Function
If using Excel 2007 or later (which is likely from the 100,000+ values) you can choose:
Home Tab | Conditional Formatting > Highlight Cell Rules > Duplicate Values...
Right-click a highlighted cell and filter by selected cell color to show just the duplicates (be aware however this can be slow with conditional formatting).
Alternatively run this code and filter for colored cells which takes only a second on 100,000 cells:
Sub HighlightDupes()
    Dim i As Long, dic As Variant, v As Variant
    Application.ScreenUpdating = False
    Set dic = CreateObject("Scripting.Dictionary")
    i = 1
    For Each v In Selection.Value2
        If dic.exists(v) Then dic(v) = "" Else dic.Add v, i
        i = i + 1
    Next v
    Selection.Font.Color = 255
    For Each v In dic
        If dic(v) <> "" Then Selection(dic(v)).Font.Color = 0
    Next v
End Sub
Addendum:
To select only duplicate values without code or formulas, i have found this method useful:
Data Tab | Advanced Filter... Filter in Place, Unique Records Only, OK.
Now select the range of unique values and press Alt+; (Goto Special... Visible cells only). With this selection clear the filter and you will see that all unselected cells are duplicates, you can then press Ctrl+9 (Hide Rows) to show just the duplicates. These rows can be copied to another sheet if needed or marked with an "X".
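If what you want is to be told which row to look in, a dictionary can also report where each value was first seen. The following is only a sketch under assumptions that are mine, not the question's: the names are in column A of the active sheet, the comparison ignores case and leading/trailing spaces, and the result is simply printed to the Immediate window.

```vba
Sub ReportDuplicateRows()
    Dim ws As Worksheet, dict As Object
    Dim data As Variant
    Dim r As Long, lastRow As Long, key As String

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 2 Then Exit Sub               ' nothing to compare

    data = ws.Range("A1:A" & lastRow).Value2   ' read the column once into an array
    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare           ' case-insensitive keys

    For r = 1 To lastRow
        If Not IsError(data(r, 1)) Then
            key = Trim$(CStr(data(r, 1)))
            If Len(key) > 0 Then
                If dict.Exists(key) Then
                    ' dict(key) holds the row where this value first appeared
                    Debug.Print "Row " & r & " duplicates row " & dict(key) & ": " & key
                Else
                    dict.Add key, r
                End If
            End If
        End If
    Next r
End Sub
```

The column is read once and each value is looked up once, so there is no CountIf-style rescan of the list for every row.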
You do not mention what you want to do when you find them. If you merely want to see where they are...
Sub HighLightCells()
    ActiveSheet.UsedRange.Cells.FormatConditions.Delete
    ActiveSheet.UsedRange.Cells.FormatConditions.Add Type:=xlCellValue, Operator:=xlEqual, Formula1:=ActiveCell
    ActiveSheet.UsedRange.Cells.FormatConditions(1).Interior.ColorIndex = 4
End Sub
Preventing Duplicates with Data Validation
You can use Data Validation to prevent yourself from entering duplicate bird names. See Debra Dalgleish's site here.
Handling existing duplicates
My free Duplicate Master addin will let you select, colour, list or delete duplicates.
But more importantly it will let you run more complex matching than exact strings, ie
Case Insensitive / Case Sensitive searches (sample below)
Trim/Clean data
Remove all blank spaces (including CHAR(160)) see the " mapgie" and "magpie" example below
Run regular expression matches (for example the sample below replaces s$ with "" to remove plurals)
Match on any combination of columns (ie Column A, all columns, Column A&B etc)
I'm surprised that no one has mentioned the RemoveDuplicates method.
ActiveSheet.Range("A:A").RemoveDuplicates Columns:=1
This will simply remove any duplicate entries on the active worksheet in column A. It takes milliseconds to run (tested with 200k rows). Mind you, this will strictly delete all the duplicate entries. Although that isn't how the original question was worded, I do believe that this still serves your purpose.
One simple way of finding unique values is to use the Advanced Filter, filter for unique records only, and copy and paste the result into another sheet, because once the filter is removed you get the whole data set back with the duplicates in it.
Sort the range, and in the next column put
=if(a2=a1;1;if(a2=a3;1;0))
"1" will be displayed for duplicates.