I have a code that has a variable range with many categories in each column that display data. I need to highlight the least commonly occurring values as a percentage of the total number of cells.
If there are 300 cells in the column, it needs to find the value (out of many possibly repeating values) that occurs least frequently. It is a bonus if the code can anticipate the total number, and give only 5% or 10% of the entire column as a result.
Currently my attempt is to use a function in the top cell that will find the least common occurrence, and the code will simply highlight whatever that value is down the cell as it repeats (and highlight every one of the least common ones.
The difficulty I am having is twofold.
There may be more than one least common value that is still below 10% of the total values
The ability to automate this search so that it may be performed and highlighted for all of more than 100 columns with different categories and different values in each column
If too vague, feel free to ask questions about what I am going for, and I will respond promptly.
This is what the data looks like. As you can see there are merged titles for each column with various blank spaces and sperratically placed data that matches some specific column.
This is the proposed code which is still not highlighting what I would like it to. It has two problems. 1: It will highlight ALL of the data in one range if there is no differing value in the row. 2: It will highlight the titles of the columns.
This is the highlighted data which is still insufficiently complete.
In some cases the column truely do not match the purpose of the code, for example in one column, the number 12 was highlighted down the column (67 occurances) where there are fewer occurances of other numbers. (8 occurs 29 times and is not highlighted)
I just hacked together a seemingly working example. Try this here:
Sub frequenz()
Dim col As Range, cel As Range
Dim letter As String
Dim lookFor As String
Dim frequency As Long, totalRows As Long
Dim relFrequency As Double
Dim ran As Range
ran = ActiveSheet.Range("A1:ZZ65535")
totalRows = 65535
For Each col In ran.Columns
'***get column letter***
letter = Split(ActiveSheet.Cells(1, col.Column).Address, "$")(1)
'*******
For Each cel In col.Cells
lookFor = cel.Text
frequency = Application.WorksheetFunction.CountIf(Range(letter & "2:" & letter & totalRows), lookFor)
relFrequency = frequency / totalRows
If relFrequency <= 0.001 Then
cel.Interior.Color = ColorConstants.vbYellow
End If
Next cel
Next col
End Sub
It seemed to be doing just what you are looking for.
Edit: fixed the address getting.
Related
I am trying to write a formula to take only number formats from a row to in order to use an IF formula that illicit the value of "No" if all cells are equal to each other.
See a snippet of my file here:
The problem I am having is that the formula is taking the blank cells (no formats) and counting them in the equation still, so all of my values are "Yes". How do I let my formula only calculate from cells that have numbers in them for each row? I have tried the IFNUMBER(), IFEMPTY() etc... but I am not sure I am employing these correctly e.g. nesting into my formula. I need to ignore the blanks in each row.
Note: I need to only take the cells with number values. I have cleared the contents of values of the rows that do not have number values.
Here is the equation that I currently have:
=IFERROR(IF(AND(ROUND($E2,3)=ROUND($F2,3),ROUND($F2,3)=ROUND($G2,3),ROUND($G2,3)=ROUND($H2,3),ROUND($H2,3)=ROUND($I2,3),ROUND($I2,3)=ROUND($J2,3),ROUND($J2,3)=ROUND($K2,3),ROUND($K2,3)=ROUND($L2,3),ROUND($L2,3)=ROUND($M2,3),ROUND($M2,3)=ROUND($N2,3)),"No","Yes"),"")
Note: this is taking blanks and counting them (as stated from above). It should produce a "Yes" if there are numbers different in the column and a "No" if there are no numbers differences. Currently, it is always producing a "Yes" because it is counting the blanks in the columns.
I am open to a vba solution as well, I have the following from code, but I do not know how to set the range for each row to only look for number formats:
Here is my vba code:
Dim arng As Range
Dim aworkrng As Range
Dim brng As Range
Dim bworkrng As Range
On Error Resume Next
Set aworkrng = Range("O2:O1550")
Set bworkrng = Range("E2:N1550")
Set brng = Range("E2:N2")
On Error Resume Next
For Each arng In aworkrng
If Not IsEmpty(brng.Value) Then
arng.Formula = _
"=IFERROR(IF(AND(ROUND(RC5,3)=ROUND(RC6,3),ROUND(RC6,3)=ROUND(RC7,3),ROUND(RC7,3)=ROUND(RC8,3),ROUND(RC8,3)=ROUND(RC9,3),ROUND(RC9,3)=ROUND(RC10,3),ROUND(RC10,3)=ROUND(RC11,3),ROUND(RC11,3)=ROUND(RC12,3),ROUND(RC12,3)=ROUND(RC13,3),ROUND(RC13,3)=ROUND(RC14,3)),""No"",""Yes""),"""")"
Range("O3").Select
End If
Next
If anyone can help me on this, I would great appreciate it!
Try:
=IF(MIN(E2:N2)=MAX(E2:N2),"No","Yes")
I am a beginner in VBA.
I have components which always consist from 2 parts. (Rotor and a stator, each has its own number). When work is with them it can be damaging some of these parts, however it is necessary to keep a list of damaged parts, where the result is inventory e.g. 200 rotors, stators 150 with different numbers. Before I could scrap it, I need to complete them as proper sets. I.e. rotor "a" stator "a", "b" with "b", etc. It's crazy to work with many numbers to compare them, copy …to find the result of sets qty.
It is possible to solve it with Macro, what I try to do, but I was stuck.
What is the task: In the column "A" I have a list of all damaged parts (mix of rotors, stators different numbers). In the column "C" an information only with help of VlookUP, what should be a counterpart number.
What do I need to solve: In row 5, column. „A“ I have component number , but I know that in the same column, somewhere from row 6 to xx I have a counterpart. What I need is … according to information from column C, same row(5) where is info about the counterpart num. to find counerpart in column A, when found, took it out and put into cell B5. Thus,I get a complete set. Then the next row (6), same action. Macro reading num. in „C“,searching in „A“, when found, cut, and put to „B“ next row 7,8,9,… The result should be a certain qty of pairs + some single numbers if not second part found.
The problem I have is that cycle is working until always found relared counterpart. If the counterpart in row A is not available (no match betwen C-A), the code will stop on that row.
What I need help with is, that if code did not find the counerpart based to info from C just skip this row, make it red and continue with next row till end, it means stop on first empty cell in C. Thanks a lot to everybody who is helping me.
Dim pn As Range,
Dim a
Dim x
x = 5
Dim i As Long, Dim radek As Long
a = Cells(x, 3)
For i = 1 To 500
Range("A:A").Select
Set pn = Selection.Find(What:=a)
If Not pn Is Nothing Then
pn.Select
End If
Selection.Cut
Cells(x, 2).Select
ActiveSheet.Paste
x = x + 1
Next
End Sub
Hello I have an excel sheet that contains a single massive grid of values, and I want to find the largest gap in the numbers (any of the numbers, not necessarily two adjacent ones). I've seen solutions on here for single columns of data, but I have a massive grid, and I'm almost certain that the number of unique values is greater than the max row count in modern excel, so merging them into a single column won't work. Any ideas on how to make this work even somewhat efficiently?
A few assumptions that can be made in case it's helpful:
Each column contains no blanks or duplicates
Each column is sorted in ascending order
Not all columns are the same length (in fact they may vary wildly)
The largest gap could easily be between two non-adjacent cells
Both formula and VBA methods are appreciated.
I think you should just loop through all of your columns and count the largest gap like this (I hope this works well I don't currently have Excel, so I just wrote this here, if any errors appear comment it and I will correct my code):
Sub LargestGap()
Dim RowCount as Double, ColumnCount as Double, i as Double, j as Double, maxgap as Double
' largest gap
Dim lgc as string
' for performance
Application.Screenupdating=False
' this is just a sample set your columncount to the max columns
ColumnCount = 20
maxgap=0
For i=1 to ColumnCount
j=0
do until Cells(i,j+1)=""
If Cells(i,j)<>Cells(i,j+1) then
If (Cells(i,j+1)-Cells(i,j)) > maxgap then
maxgap=(Cells(i,j+1)-Cells(i,j))
lgc=Cells(i,j+1).address
End if
End if
j=j+1
Loop
Next i
Application.Screenupdating=True
' the messagebox will tell you the adress of the largest gap's cell (the first in the bigger)
msgbox(lgc)
End Sub
I'm looking for an algorithm for which I do not have the VBA knowledge to script myself. So I'm stuck. It isn't through lack of effort trying because I have given it a go (plus, this bit of code is the last remaining piece of my bigger VBA code) I simply lack the knowledge/experience/skill...
Basically, I have an Excel file. In this file is a sheet, "sheet1". Sheet1 contains many rows of data. The number of rows contained in sheet1 can vary from 1 to n. Sometimes, I may have 50 while other times I may have 30, etc. What is consistent is the layout of the book, i.e. I have codes in column A which identify a product in my database.
What I want to do is this:
1. Scan the sheet for empty rows (due to the way the workbook is generated, I sometimes have blank rows) and remove them. These blank rows are sometimes in-between rows with data while at other times may be trailing at the end of the sheet.
2. After removing the blank rows find the last used row. Store that to a variable. I have found this piece of code useful for doing that:
mylastrow = myBook.Sheets("Results").Cells.Find(what:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
3. Starting from the row determined in (2), I want to take the product code in A(x where x = mylastrow) and find any other occurrences of it (in column A). If any are found, delete that entire row corresponding to it. Importantly, this loop must go in reverse. For example let's say mylastrow = 40, the loop will need to begin at A40 and on the next iteration do A39 (or 38 if a row has been removed?). This is because with any of the product numbers the corresponding data in the row contains more data further down the column (because of the way the sheet was generated). Essentially the entry closest to the last row is the most recent.
Hopefully I've been able to explain the situ properly. But if not and you're willing to take the challenge (my burden?) off me I would be very grateful.
QF
The only way to develop that knowledge and skill is to get in there and code! I'm sure someone may come in and write you the entire procedure, but in the meantime these resources should give you the tools to do it yourself.
First, check out the method here to delete blank rows. It relies on "Selection" for the range, so you can either manually select all the cells of the sheet, then run the macro, or replace it with the following:
Dim r as range
set r = Sheet1.Cells 'now use r instead of Selection
OR (even better) use your code for finding the last used row and set the range from row 1 to "mylastrow".
Next, beginning from "mylastrow", start adding the values in Column A to a Dictionary object (example here). You can use a row counter to decrement from "mylastrow" to 1. Here's an example of how it would work. The key is assumed to be in the 1st column ("A").
Dim dict As Object
Dim rowCount As Long
Dim strVal As String
Set dict = CreateObject("Scripting.Dictionary")
rowCount = Sheet1.Range("A1").CurrentRegion.Rows.Count
Do While rowCount > 1
strVal = Sheet1.Cells(rowCount, 1).Value2
If dict.exists(strVal) Then
Sheet1.Rows(rowCount).EntireRow.Delete
Else
dict.Add strVal, 0
End If
rowCount = rowCount - 1
Loop
Set dict = Nothing
Before:
After:
Note that the 1st row hasn't been touched since we stopped when rowCount is 1 (assumes there's a header).
A lot of the solutions here on SO involve using CountIf to find duplicates. When I have a list of 100,000+ values however, it will often take minutes for CountIf to search for duplicates.
Is there a quicker way to search for duplicates within an Excel column WITHOUT using CountIf?
Thanks!
EDIT #1:
After reading the comments and replies I realize I need to go into greater detail. Let's pretend I'm a birdwatcher, and after I return from a birdwatching trip I input anywhere from 1 to 25 or 50 new birds that I saw on my trip into my "Master List of Birds Seen". This is really a dynamically growing list, and with each addition I want to make sure I'm not duplicating something that already exists in my list.
So, in column A of my file are the names of the birds. Column B-M might contain other attributes of the birds. I want to know if a bird that I just added in column A after my latest birdwatching trip ALREADY exists somewhere ELSE in my list. And, if it does, I would manually merge the data of the 2 entries and throw away some and keep some after careful review. I clearly don't want to have duplicate entries of the same bird in my database.
So, ultimately I want some indication that there is or isn't a duplicate somewhere else, and if there is duplicate please tell me what row to look in (or highlight or color both of the duplicates).
The fastest way that I know of (in case you are using Excel 2007/2010/2011) is to use Data (In Ribbon) | Remove Duplicates to find the total number of duplicates OR to remove duplicates. You might want to move data to a temp sheet before you test this.
The 2nd fastest way is to use Countif. Now Countif can be used in many ways to find duplicates. Here are two main ways.
1) Inserting a New Column next to the data and putting the formula and simply copying it down.
2) Using Countif in Conditional formatting to highlight cells which are duplicates. For more details, please see this link.
suggestions for a macro to find duplicates in a SINGLE column
EDIT:
My Apologies :)
Countif is the 3rd fastest way!
The 2nd fastest way is to use Pivot Tables ;)
What exactly is your main purpose of finding duplicates? Do you want to delete them? Or Do you want to highlight them? Or something else?
FOLLOWUP
Seems like I made a typo in the formula. Yes for large number of rows, CountIf does take minutes as you suggested.
Let me see if I can come up with a VBA code to suit your exact needs.
Sid
You can use VBA - the following function returns a list of unique entries within a list of 100,000 in less than a second. Usage: select a range, type the formula (=getUniqueListFromRange(YourRange)) and validate with CTRL+SHIFT+ENTER.
Public Function getUniqueListFromRange(parRange As Range) As Variant
' Returns a (1 to n,1 to 1) array with all the values without duplicates
Dim i As Long
Dim j As Long
Dim locKey As Variant
Dim locData As Variant
Dim locUniqueDict As Variant
Dim locUniqueList As Variant
On Error GoTo error_handler
locData = Intersect(parRange.Parent.UsedRange, parRange)
Set locUniqueDict = CreateObject("Scripting.Dictionary")
On Error Resume Next
For i = 1 To UBound(locData, 1)
For j = 1 To UBound(locData, 2)
locKey = UCase(locData(i, j))
If locKey <> "" Then locUniqueDict.Add locKey, locData(i, j)
Next j
Next i
If locUniqueDict.Count > 0 Then
ReDim locUniqueList(1 To locUniqueDict.Count, 1 To 1) As Variant
i = 1
For Each locKey In locUniqueDict
locUniqueList(i, 1) = locUniqueDict(locKey)
i = i + 1
Next
getUniqueListFromRange = locUniqueList
End If
error_handler: 'Empty range
End Function
If using Excel 2007 or later (which is likely from the 100,000+ values) you can choose:
Home Tab | Conditional Formatting > Highlight Cell Rules > Duplicate Values...
Right-click a highlighted cell and filter by selected cell color to show just the duplicates (be aware however this can be slow with conditional formatting).
Alternatively run this code and filter for colored cells which takes only a second on 100,000 cells:
Sub HighlightDupes()
Dim i As Long, dic As Variant, v As Variant
Application.ScreenUpdating = False
Set dic = CreateObject("Scripting.Dictionary")
i = 1
For Each v In Selection.Value2
If dic.exists(v) Then dic(v) = "" Else dic.Add v, i
i = i + 1
Next v
Selection.Font.Color = 255
For Each v In dic
If dic(v) <> "" Then Selection(dic(v)).Font.Color = 0
Next v
End Sub
Addendum:
To select only duplicate values without code or formulas, i have found this method useful:
Data Tab | Advanced Filter... Filter in Place, Unique Records Only, OK.
Now select the range of unique values and press Alt+; (Goto Special... Visible cells only). With this selection clear the filter and you will see that all unselected cells are duplicates, you can then press Ctrl+9 (Hide Rows) to show just the duplicates. These rows can be copied to another sheet if needed or marked with an "X".
You do not mention what you want to do when you find them. If you merely want to see where they are...
Sub HighLightCells()
ActiveSheet.UsedRange.Cells.FormatConditions.Delete
ActiveSheet.UsedRange.Cells.FormatConditions.Add Type:=xlCellValue, Operator:=xlEqual, Formula1:=ActiveCell
ActiveSheet.UsedRange.Cells.FormatConditions(1).Interior.ColorIndex = 4
End Sub
Preventing Duplicates with Data Validation
You can use Data Validation to prevent you entering duplicate bird names. See Debra Dalgelish's site here
Handling existing duplicates
My free Duplicate Master addin will let you
Select
Colour
List
Delete
duplicates.
But more importantly it will let you run more complex matching than exact strings, ie
Case Insensitive / Case Sensitive searches (sample below)
Trim/Clean data
Remove all blank spaces (including CHAR(160)) see the " mapgie" and "magpie" example below
Run regular expression matches (for example the sample below replaces s$ with "" to remove plurals)
Match on any combination of columns (ie Column A, all columns, Column A&B etc)
I'm surprised that no one has mentioned the RemoveDuplicates method.
ActiveSheet.Range("A:A").RemoveDuplicates Columns:=1
This will simply remove any duplicate entries on the active worksheet in column A. It takes milliseconds to run (tested with 200k rows). Mind you, this will strictly delete all the duplicate entries. Although that isn't how the original question was worded, I do believe that this still serves your purpose.
One simple way of finding unique values is to use the advance filter and filter for unique values only and copy and paste them into other sheet as when the pivot is removed you will get the whole data with the duplicate in them.
Sort the range
and in next column put `=if(a2=a1;1;if(a2=a3;1;0))
"1" will be displayed for duplicates.