I have the following data structure:
As you see in column J, I am trying to merge data into one column from columns A & C & E & G.
I am using this formula:
=IF(ROW()<=COUNTA($A:$A);INDEX($A:$C;ROW();COLUMN(A1));INDEX($A:$C;ROW()-COUNTA($A:$A)+1;COLUMN(C1)))
and I get the values in column K as you see. Currently this formula merges only two columns. How can I modify it to merge all four columns?
And how can I get only the values starting from row 5?
The column height will vary constantly: sometimes there are 10 values in column A and sometimes there are 2 values.
Either any excel formula or any VBA code will be acceptable.
There is a fairly standard method for retrieving unique values from a single column, but not from multiple columns. To retrieve them from multiple columns you need to stack multiple formulas together, with the processing passed to the next column once the earlier formula errors out.
The array formula¹ in J5 is,
=IFERROR(INDEX($A$5:$A$99, MATCH(0, IF(LEN($A$5:$A$99), COUNTIF(J$4:J4, $A$5:$A$99), 1), 0)),
IFERROR(INDEX($C$5:$C$99, MATCH(0, IF(LEN($C$5:$C$99), COUNTIF(J$4:J4, $C$5:$C$99), 1), 0)),
IFERROR(INDEX($E$5:$E$99, MATCH(0, IF(LEN($E$5:$E$99), COUNTIF(J$4:J4, $E$5:$E$99), 1), 0)),
IFERROR(INDEX($G$5:$G$99, MATCH(0, IF(LEN($G$5:$G$99), COUNTIF(J$4:J4, $G$5:$G$99), 1), 0)),
""))))
I have only included columns A, C, E and G as your sample data shows only duplicates in columns B, D, F, and H.
¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. If entered correctly, Excel will wrap the formula in braces (e.g. { and }). You do not type the braces in yourself. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try to reduce your full-column references to ranges more closely representing the extents of your actual data. The calculation load of array formulas grows quickly with the size of the referenced ranges, so it is good practice to narrow those ranges to a minimum. See Guidelines and examples of array formulas for more information.
This answer is another way of thinking about the formulas you could use for this sort of task. It gets to the point made by @Jeeped that it is difficult to find unique values in multiple columns. My first step then is to create a single column.
If you can live with a helper column, these formulas might be a tad easier to maintain than the nested IFERROR already proposed. They are equally difficult to understand though at first glance. The other upside is that it scales nicely if the number of columns involved increases.
It is possible using CHOOSE and some INDEX math to build a single column array of a group of separated columns. The trick is that CHOOSE will join discontinuous ranges side-by-side when given an array as the selecting parameter. If this starts with columns of the same size, you can then use division and mod math to turn it into a single column.
Picture of ranges shows the four groups of data with duplicates colored red.
Formula in F2:F31 is an array formula. This is combining all of the columns into an array and then back into a single column. I selected the columns out of order just to emphasize that it is handling a discontinuous range.
=INDEX(CHOOSE({1,2,3,4}, A2:A7,C2:C7,B2:B7,D2:D7), MOD(ROW(1:30)-1, ROWS(A2:A7))+1,INT((ROW(1:30)-1)/ROWS(A2:A7))+1)
The array formula in H2 and copied down is then the standard formula for unique values. The one exception is that instead of avoiding blanks like normal, I am avoiding 0 values.
=IFERROR(INDEX(F2:F31,MATCH(0,IF(F2:F31=0,1,COUNTIF($H$1:H1,F2:F31)),0)),"")
A couple of other comments about this approach:
In the CHOOSE, I am using {1,2,3,4}. This could be replaced with TRANSPOSE(ROW(1:4)), sized to whatever number of columns you have.
There is also a ROWS(A2:A7) in 2 places; this could just be ROWS(2:7) or ROWS(1:6) or whatever matches the column size. I used one of the data ranges so that the coloring was simplified and to emphasize it needs to match the size of the block.
And the ROW(1:30) is used for the number of total items to collect. It really only needs to be 1:24 since there are 6*4 items, but I made it big while testing.
There are definitely a couple of downsides to this approach, but it may be a good trick to keep in the toolbox. Never know when you might want to make a column out of discontinuous ranges. The largest downside is that the columns of data all need to be the same size (and of course the helper column).
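The division and mod math in the INDEX formula above can be hard to read inside a formula, so here is a minimal VBA sketch of the same arithmetic. The names StackedToRowCol and DemoStackedToRowCol are hypothetical and only illustrate the mapping; they are not part of the formula approach itself.

```vba
' Hypothetical helper: maps a 1-based position k in the stacked column
' to the (row, column) it comes from, given rowsPerCol rows per source column.
Public Sub StackedToRowCol(ByVal k As Long, ByVal rowsPerCol As Long, _
                           ByRef r As Long, ByRef c As Long)
    r = ((k - 1) Mod rowsPerCol) + 1   ' mirrors MOD(ROW(1:30)-1, ROWS(A2:A7))+1
    c = ((k - 1) \ rowsPerCol) + 1     ' mirrors INT((ROW(1:30)-1)/ROWS(A2:A7))+1
End Sub

' Prints the mapping for 6 rows * 4 columns to the Immediate window.
Public Sub DemoStackedToRowCol()
    Dim k As Long, r As Long, c As Long
    For k = 1 To 24
        StackedToRowCol k, 6, r, c
        Debug.Print k, r, c
    Next k
End Sub
```

Positions 1 to 6 map to the first column of the CHOOSE array, 7 to 12 to the second, and so on, which is why every block must have the same number of rows.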
This code will do what you ask:
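Sub MoveData()
    ' Stacks every second column from A up to (not including) column J,
    ' starting at row 5, into column J.
    Const START_ROW As Long = 5
    Const START_COL As Long = 1
    Const STEP_COL As Long = 2
    Const OUTPUT_ROW As Long = 5
    Const OUTPUT_COL As Long = 10
    Dim r As Long, c As Long, outRow As Long

    c = START_COL
    outRow = OUTPUT_ROW
    While c < OUTPUT_COL
        r = START_ROW
        ' Copy values down this column until the first blank cell
        While Cells(r, c).Value <> ""
            Cells(outRow, OUTPUT_COL).Value = Cells(r, c).Value
            outRow = outRow + 1
            r = r + 1
        Wend
        c = c + STEP_COL
    Wend
End Sub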
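As a variation on the loop above, the stacking can also be done on an in-memory array, which avoids cell-by-cell access. This is only a sketch: StackColumns and StackColumnsDemo are hypothetical names, the range A5:G99 is an assumption borrowed from the formula answer, and unlike MoveData this version skips blank cells instead of stopping at the first one.

```vba
' Hypothetical helper: stacks the listed columns of a 2-D array (as returned
' by Range.Value) into one n x 1 array, skipping blank cells.
' colIndexes is a 1-D array of column numbers within data.
Public Function StackColumns(ByVal data As Variant, _
                             ByVal colIndexes As Variant) As Variant
    Dim result() As Variant
    Dim r As Long, k As Long, n As Long, total As Long

    total = (UBound(data, 1) - LBound(data, 1) + 1) * _
            (UBound(colIndexes) - LBound(colIndexes) + 1)
    ReDim result(1 To total, 1 To 1)

    For k = LBound(colIndexes) To UBound(colIndexes)
        For r = LBound(data, 1) To UBound(data, 1)
            If Len(CStr(data(r, colIndexes(k)))) > 0 Then
                n = n + 1
                result(n, 1) = data(r, colIndexes(k))
            End If
        Next r
    Next k

    ' Pad the unused tail with empty strings so it shows as blank
    For r = n + 1 To total
        result(r, 1) = ""
    Next r
    StackColumns = result
End Function

' Assumed layout: data in A5:G99, output starting at J5.
Public Sub StackColumnsDemo()
    Dim out As Variant
    out = StackColumns(Range("A5:G99").Value, Array(1, 3, 5, 7))
    Range("J5").Resize(UBound(out, 1), 1).Value = out
End Sub
```

Cells holding error values would make CStr fail, so this sketch assumes the source columns contain only text, numbers or blanks.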
I think you guys are making this complicated. Just pull the range of data into Power Query, select all the columns and unpivot them. This will bring all the data into a single column.
Related
I'm quite new to VBA and was looking for ways to apply the same formula to all cells in a column. I have 5000 rows and I would simply like to multiply row values in Column 1 with row values in Column 2 and show the results of the same row in Column 3.
I have tried looping through each of the cells:
For i = 1 to 5000
Next i
but the calculation was slow and was wondering if there is another way to calculate all 5000 rows quicker.
I actually have 60 columns with different formulas and conditional statements which slows down the calculation process. I am actually looking for a simpler way to 'BULK' apply a certain formula to the whole column.
The following code sets Column3 = Column1*Column2 using the FormulaR1C1 property of the Range object, without looping:
Sheet1.Range("C1:C5000").FormulaR1C1 = "=RC[-2]*RC[-1]"
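If static values are wanted in column C rather than live formulas, one common alternative is to read the inputs into an array, compute in memory, and write the results back in a single assignment. The sketch below assumes the inputs sit in Sheet1!A1:B5000 and contain only numbers or blanks; MultiplyColumns and FillProducts are hypothetical names.

```vba
' Hypothetical helper: multiplies column 1 by column 2 of a 2-D array
' (as returned by Range.Value) and returns an n x 1 array of products.
Public Function MultiplyColumns(ByVal data As Variant) As Variant
    Dim i As Long
    Dim result() As Variant
    ReDim result(LBound(data, 1) To UBound(data, 1), 1 To 1)
    For i = LBound(data, 1) To UBound(data, 1)
        result(i, 1) = data(i, 1) * data(i, 2)
    Next i
    MultiplyColumns = result
End Function

' Assumed layout: inputs in A1:B5000, results written to C1:C5000.
Public Sub FillProducts()
    Dim data As Variant
    data = Sheet1.Range("A1:B5000").Value                    ' one read
    Sheet1.Range("C1:C5000").Value = MultiplyColumns(data)   ' one write
End Sub
```

The loop still runs 5000 times, but it touches the worksheet only twice, which is usually where the time goes. Text in either input column would raise a type mismatch in this sketch.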
I have the following code that:
copies the column headings (in row 1) from column C across to the second last column
pastes these column headings in row 1 in the column 2 across from the last column with data
pastes the column headings alongside each row of data through to the bottom row
Sub GLDR()
    'use End(xlUp) to determine Last Row with Data, in column A of the GLDRYYPP tab
    Dim lastRowDR As Long
    lastRowDR = Sheets("GLDRYYPP").Range("A" & Rows.Count).End(xlUp).Row
    'copy the cost type categories and paste alongside the cost centres
    CTNameCol = "S2:AF" & lastRowDR
    Sheets("GLDRYYPP").Range("C1", Range("C1").End(xlToRight).Offset(0, -1)).Copy
    Sheets("GLDRYYPP").Paste Destination:=Sheets("GLDRYYPP").Range("C1").End(xlToRight).Offset(0, 2)
    Sheets("GLDRYYPP").Range(Range("C1").End(xlToRight).Offset(0, 2), Range("C1").End(xlToRight).Offset(0, 2).End(xlToRight)).Copy
    Sheets("GLDRYYPP").Paste Destination:=Sheets("GLDRYYPP").Range(CTNameCol)
End Sub
The first two steps have been set to be dynamic for any additional columns added but I am having trouble writing some code that will paste the data through to the bottom row. As you can see the range "S2:AF(last row)" has been written to make use of the result from the lastRowDR dimension.
Is there a way to write the code which will make the copy dynamic across the columns and rows?
Yes, there is a way to write the code so that the copy is dynamic across the columns and rows. What you need to do is identify a distinct quality about the cost centres so that you can locate them using something like ActiveSheet.Cells.Find(Txt).
Then you'll be able to determine what CTNameCol should be. There is some helpful info here: Range.Find Method.
If you add a picture of your spreadsheet I will update this answer with more precise information.
(Current Sort Sample:)
2-1203-4
2-1206-3
2CM-
3-1610-1
3-999
…
AR3021-A-7802
AR3021-A-7802-1
B43570-
B43570-3
I am working on an 8000+ record parts list. The challenge I am running into is that different manufacturers of the parts are using many varying formats for their part numbers. “Part Number” is the field I wish to sort my entire worksheet on. (There are about 10 columns of data in this worksheet.)
My methodology for attacking this challenge was to count the number of characters to the left of any “-” and count the total number of numeric characters in the field. (I also set “Part Numbers” that started with a non-numeric character to a count value of 99 for both count calculations so those would sort after the numeric values.) From this, I was able to sort on the values to the left of the “-” using the MIN of the two counts. (My “Part Numbers” are in Column B and I have a header row, which means that my first “Part Number” is in cell B2.)
This method worked up to a point. My challenge is that I need to subsequently sort values after the “-” character, as is illustrated by the erroneous sort of “3-1610-1” being followed by “3-999”.
One of the limitations I see is that sorting with Data Sort only gives three columns to sort on. To sort on just the characters to the left of the “-” is costing me those three columns. So, I am unable to repeat the whole process of counting values after the “-” character and subsequently sorting with Data Sort after running the primary sort.
Has the sort of many differing formats of a field such as “Part Number” been solved? Is there a macro that can be applied to this challenge? If so, I would be grateful for your input.
This data is continuously updated with new part numbers so the goal here is to be able to add those additional part numbers to the bottom of the worksheet and use a macro to correctly resort the appended list.
For the record, I am not married to my approach. After all, it didn’t solve my challenge!
Thank you,
Darrell
Place this procedure in a standard code module:
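Public Sub PartNumberSortFormat()
    Dim i&, j&, f, vIn, vOut
    ' Read the part numbers from B2 down to the last text cell in column B
    vIn = [b2:index(b:b,match("*",b:b,-1))]
    vOut = vIn
    For i = 1 To UBound(vIn)
        ' Strip spaces, then split into dash-separated segments
        f = Split(Replace(vIn(i, 1), " ", ""), "-")
        For j = 0 To UBound(f)
            If IsNumeric(f(j)) Then
                f(j) = Format$(f(j), "000000")
            ElseIf Len(f(j)) < 6 Then
                ' Left-pad short text segments with zeros; longer ones are left as-is
                f(j) = String$(6 - Len(f(j)), "0") & f(j)
            End If
        Next
        vOut(i, 1) = Join(f, "-")
    Next
    ' Insert a helper column A and write the sortable keys into it
    Columns(1).Insert xlToRight
    [a1] = "SORT COLUMN"
    [a2].Resize(UBound(vOut)) = vOut
    Columns(1).EntireColumn.AutoFit
End Sub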
After running the procedure, you will notice that it has inserted a new column A on your worksheet and your data has been scooted over to the right by one column.
This new column A will contain a copy of your part numbers, reformatted in such a fashion to allow normal sorting.
Now select all of the data INCLUDING this new column A and sort A-Z on column A.
After the sort, you may delete the new column A.
This works by padding each dash-separated segment of the part number with leading zeros to six characters.
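The padding idea can also be packaged as a worksheet function, so the key lives in an ordinary helper column and updates as new part numbers are appended. This is a simplified sketch, not the procedure above: PartSortKey is a hypothetical name, it pads every segment as text (no Format$ call), and it leaves segments that already reach the width unchanged.

```vba
' Hypothetical helper: builds a sortable key from a part number by padding
' each dash-separated segment with leading zeros to a fixed width.
Public Function PartSortKey(ByVal partNumber As String, _
                            Optional ByVal width As Long = 6) As String
    Dim parts() As String
    Dim j As Long

    parts = Split(Replace(partNumber, " ", ""), "-")
    For j = LBound(parts) To UBound(parts)
        ' Segments already at or beyond the width are left unchanged
        If Len(parts(j)) < width Then
            parts(j) = String$(width - Len(parts(j)), "0") & parts(j)
        End If
    Next j
    PartSortKey = Join(parts, "-")
End Function
```

Entered as =PartSortKey(B2) in a spare column and filled down, "3-999" becomes "000003-000999" and "3-1610-1" becomes "000003-001610-000001", so a plain A-Z sort on that column places 3-999 before 3-1610-1.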
My Thoughts:
Excel 2010 onwards lets you sort using as many columns as you like. (Not sure about 2007). Don't know which version you have!
You could use the SUBSTITUTE function to remove all "-" from the part number and then sort on the number that remains, which gives you an order more like the one you want.
e.g.
Value       =SUBSTITUTE(B2,"-","")
3-15        315
3-888       3888
3-999       3999
3-1610      31610
3-2610      32610
3-1610-1    316101
3-2610-3    326103
It's not exactly what you need though!
Combine this with other formulas (or a VBA function) to manipulate your part number to be more sortable.
You could use FIND to find the position of the first "-" and extract the numbers before it into one column.
Similarly, using FIND, MID and LEN you could extract the numbers between a part number's two "-" characters.
I suspect it will be best to write a VBA function to convert a part number into a "sortable value". This might involve splitting the part number into its component bits (i.e. each bit being the text between the "-" characters).
(The VBA function Split might be useful for this. It creates an array.)
If you know the formats of ALL the part numbers that can be delivered, you can code accordingly.
I suspect your code will take numbers like these and convert them as shown:
AB123-456-78 AB12300456007800
AB12-45-7 AB12000450007000
AB12-45 AB12000450000000
ie padding with zeros each component of the part number
The key to sorting the textual values into the order you want is understanding how textual values get sorted! Do some experiments. Then create zero (or "9") padded numbers that sort the way you require.
I hope this helps.
While not a technical answer to the Excel question, I am a logistician working with extremely large data sets of part numbers - always varying in format. The standard approach used in my field is to "ignore" (remove) special characters from the P/N and append the (clean) P/N to the 5-digit CAGE (manufacturer) code to create a "unique" CAGE + (clean) P/N code for sorting, lookup, etc. Create a column for that construct.
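For the Excel side of that construct, a minimal sketch might look like the following. CagePartKey is a hypothetical name, the upper-casing step is an assumption added so that keys compare consistently, and "special characters" is taken here to mean anything other than A-Z and 0-9.

```vba
' Hypothetical helper: keeps only letters and digits from the part number
' (upper-cased, an assumption) and prefixes the CAGE code.
Public Function CagePartKey(ByVal cage As String, _
                            ByVal partNumber As String) As String
    Dim i As Long
    Dim ch As String
    Dim clean As String

    For i = 1 To Len(partNumber)
        ch = UCase$(Mid$(partNumber, i, 1))
        If ch Like "[A-Z0-9]" Then clean = clean & ch
    Next i
    CagePartKey = cage & clean
End Function
```

With the CAGE code in a hypothetical column A and the part number in column B, =CagePartKey(A2,B2) would fill the extra column.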
I am trying to write a VBA to draw data from one sheet to another, but am stuck on something.
I only need some of the data in the original sheet (let's call it s1), in particular, I need data between two rows.
I have these rows written down in another sheet (s2), so I know exactly between which rows I need the data from. As you may expect there are multiple rows between which I need the data.
The problem is now that I am trying to write a VBA that is able to look up these rows in my row sheet (s2), and then goes to the sheet in which all my original data is contained (s1), and then draws out the data between the two rows into a third sheet (s3).
I have not been able to make it draw in the numbers from s2 (can't seem to work out how to tell it that it is these two rows between which I need the data, but from another sheet), and currently have to input the row numbers myself, which is really tedious, since the dataset is large!
Any help would be much appreciated!
Thank you!
Have you tried the .Find method?
You could use something like this:
findrow = w1.Cells.Find(what:="Content", MatchCase:=False).Row
A more extensive example would help.
Just break down your problem into smaller steps.
You can assign a value on a sheet using cell references:
Sheets("S1").Cells(iRow, iCol) = Value
'or
Sheets("S3").Cells(2, 10) = Sheets("S2").Cells(2, 10)
You can use Sheets("S1").Range("J" & iRow) or Range("J2").
If there are conditions that need to be met for something to happen, use If statements.
If (Value1 < Value2) Then
    Sheets("S3").Cells(2, 10) = Sheets("S2").Cells(2, 10)
Else
    Sheets("S3").Cells(2, 10) = Sheets("S1").Cells(2, 10)
End If
Without more information about the specific choices you are using to determine which rows to copy and from which sheet, it's hard to say. But you can do things in smaller batches or copy whole ranges, by looping.
Dim iRow As Long
'This would copy all the data from Sheets("S1").Range("J1:J20") to sheet "S3".
For iRow = 1 To 20
    Sheets("S3").Cells(iRow, 10) = Sheets("S1").Cells(iRow, 10)
Next iRow
You could insert an If statement inside the loop to check whether a certain criterion is met for each row, and then decide which sheet to copy the data to.
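Putting those pieces together for the question as asked, a sketch could read each start/end pair from S2 and copy that block of rows from S1 into S3. The layout is an assumption, since the question does not show it: start rows in S2 column A, end rows in column B, one pair per row from row 1, both ends included. CopyRowBlocks is a hypothetical name.

```vba
' Sketch only. Assumes sheet "S2" lists a start row in column A and an end
' row in column B (numeric, one pair per row from row 1), and that whole rows
' of "S1" are copied, one block under the other, into "S3".
Public Sub CopyRowBlocks()
    Dim wsData As Worksheet, wsRows As Worksheet, wsOut As Worksheet
    Dim lastPair As Long, i As Long
    Dim firstRow As Long, lastRow As Long, outRow As Long

    Set wsData = ThisWorkbook.Worksheets("S1")
    Set wsRows = ThisWorkbook.Worksheets("S2")
    Set wsOut = ThisWorkbook.Worksheets("S3")

    lastPair = wsRows.Cells(wsRows.Rows.Count, "A").End(xlUp).Row
    outRow = 1
    For i = 1 To lastPair
        firstRow = wsRows.Cells(i, "A").Value
        lastRow = wsRows.Cells(i, "B").Value
        ' Skip pairs that are empty or out of order
        If firstRow >= 1 And lastRow >= firstRow Then
            wsData.Rows(firstRow & ":" & lastRow).Copy _
                Destination:=wsOut.Rows(outRow)
            outRow = outRow + (lastRow - firstRow + 1)
        End If
    Next i
End Sub
```

If the row list has a header row or sits in other columns, only the loop start and the two column letters need to change.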
A lot of the solutions here on SO involve using CountIf to find duplicates. When I have a list of 100,000+ values however, it will often take minutes for CountIf to search for duplicates.
Is there a quicker way to search for duplicates within an Excel column WITHOUT using CountIf?
Thanks!
EDIT #1:
After reading the comments and replies I realize I need to go into greater detail. Let's pretend I'm a birdwatcher, and after I return from a birdwatching trip I input anywhere from 1 to 25 or 50 new birds that I saw on my trip into my "Master List of Birds Seen". This is really a dynamically growing list, and with each addition I want to make sure I'm not duplicating something that already exists in my list.
So, in column A of my file are the names of the birds. Column B-M might contain other attributes of the birds. I want to know if a bird that I just added in column A after my latest birdwatching trip ALREADY exists somewhere ELSE in my list. And, if it does, I would manually merge the data of the 2 entries and throw away some and keep some after careful review. I clearly don't want to have duplicate entries of the same bird in my database.
So, ultimately I want some indication that there is or isn't a duplicate somewhere else, and if there is duplicate please tell me what row to look in (or highlight or color both of the duplicates).
The fastest way that I know of (in case you are using Excel 2007/2010/2011) is to use Data (In Ribbon) | Remove Duplicates to find the total number of duplicates OR to remove duplicates. You might want to move data to a temp sheet before you test this.
The 2nd fastest way is to use Countif. Now Countif can be used in many ways to find duplicates. Here are two main ways.
1) Inserting a New Column next to the data and putting the formula and simply copying it down.
2) Using Countif in Conditional formatting to highlight cells which are duplicates. For more details, please see this link.
suggestions for a macro to find duplicates in a SINGLE column
EDIT:
My Apologies :)
Countif is the 3rd fastest way!
The 2nd fastest way is to use Pivot Tables ;)
What exactly is your main purpose of finding duplicates? Do you want to delete them? Or Do you want to highlight them? Or something else?
FOLLOWUP
Seems like I made a typo in the formula. Yes for large number of rows, CountIf does take minutes as you suggested.
Let me see if I can come up with a VBA code to suit your exact needs.
Sid
You can use VBA - the following function returns a list of unique entries within a list of 100,000 in less than a second. Usage: select a range, type the formula (=getUniqueListFromRange(YourRange)) and validate with CTRL+SHIFT+ENTER.
Public Function getUniqueListFromRange(parRange As Range) As Variant
    ' Returns a (1 to n,1 to 1) array with all the values without duplicates
    Dim i As Long
    Dim j As Long
    Dim locKey As Variant
    Dim locData As Variant
    Dim locUniqueDict As Variant
    Dim locUniqueList As Variant
    On Error GoTo error_handler
    locData = Intersect(parRange.Parent.UsedRange, parRange)
    Set locUniqueDict = CreateObject("Scripting.Dictionary")
    On Error Resume Next
    For i = 1 To UBound(locData, 1)
        For j = 1 To UBound(locData, 2)
            locKey = UCase(locData(i, j))
            If locKey <> "" Then locUniqueDict.Add locKey, locData(i, j)
        Next j
    Next i
    If locUniqueDict.Count > 0 Then
        ReDim locUniqueList(1 To locUniqueDict.Count, 1 To 1) As Variant
        i = 1
        For Each locKey In locUniqueDict
            locUniqueList(i, 1) = locUniqueDict(locKey)
            i = i + 1
        Next
        getUniqueListFromRange = locUniqueList
    End If
error_handler: 'Empty range
End Function
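The same dictionary technique can also report where the earlier entry is, which is what the edit to the question asks for. The following is a minimal sketch under stated assumptions: FirstSeenIndex is a hypothetical name, matching is case-insensitive and ignores leading and trailing spaces, and the input is a single column read with Range.Value.

```vba
' Hypothetical helper: for each item in a 1-column 2-D array (as returned by
' Range.Value), returns the index of its first earlier occurrence, or 0 if
' the item is blank or has not been seen before.
Public Function FirstSeenIndex(ByVal values As Variant) As Variant
    Dim dict As Object
    Dim result() As Variant
    Dim i As Long
    Dim key As String

    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare          ' case-insensitive (an assumption)
    ReDim result(LBound(values, 1) To UBound(values, 1), 1 To 1)

    For i = LBound(values, 1) To UBound(values, 1)
        key = Trim$(CStr(values(i, 1)))
        If Len(key) = 0 Then
            result(i, 1) = 0                  ' ignore blanks
        ElseIf dict.Exists(key) Then
            result(i, 1) = dict(key)          ' index of the earlier entry
        Else
            dict.Add key, i
            result(i, 1) = 0
        End If
    Next i
    FirstSeenIndex = result
End Function
```

For example, Range("N1:N100000").Value = FirstSeenIndex(Range("A1:A100000").Value) would put a non-zero number next to every repeated bird name. Column N is an arbitrary free column, and the index equals the worksheet row only when the range starts in row 1.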
If using Excel 2007 or later (which is likely from the 100,000+ values) you can choose:
Home Tab | Conditional Formatting > Highlight Cell Rules > Duplicate Values...
Right-click a highlighted cell and filter by selected cell color to show just the duplicates (be aware however this can be slow with conditional formatting).
Alternatively run this code and filter for colored cells which takes only a second on 100,000 cells:
Sub HighlightDupes()
    Dim i As Long, dic As Variant, v As Variant
    Application.ScreenUpdating = False
    Set dic = CreateObject("Scripting.Dictionary")
    i = 1
    For Each v In Selection.Value2
        If dic.exists(v) Then dic(v) = "" Else dic.Add v, i
        i = i + 1
    Next v
    Selection.Font.Color = 255
    For Each v In dic
        If dic(v) <> "" Then Selection(dic(v)).Font.Color = 0
    Next v
End Sub
Addendum:
To select only duplicate values without code or formulas, I have found this method useful:
Data Tab | Advanced Filter... Filter in Place, Unique Records Only, OK.
Now select the range of unique values and press Alt+; (Goto Special... Visible cells only). With this selection clear the filter and you will see that all unselected cells are duplicates, you can then press Ctrl+9 (Hide Rows) to show just the duplicates. These rows can be copied to another sheet if needed or marked with an "X".
You do not mention what you want to do when you find them. If you merely want to see where they are...
Sub HighLightCells()
    ActiveSheet.UsedRange.Cells.FormatConditions.Delete
    ActiveSheet.UsedRange.Cells.FormatConditions.Add Type:=xlCellValue, Operator:=xlEqual, Formula1:=ActiveCell
    ActiveSheet.UsedRange.Cells.FormatConditions(1).Interior.ColorIndex = 4
End Sub
Preventing Duplicates with Data Validation
You can use Data Validation to prevent yourself from entering duplicate bird names. See Debra Dalgleish's site here.
Handling existing duplicates
My free Duplicate Master addin will let you
Select
Colour
List
Delete
duplicates.
But more importantly it will let you run more complex matching than exact strings, ie
Case Insensitive / Case Sensitive searches (sample below)
Trim/Clean data
Remove all blank spaces (including CHAR(160)) see the " mapgie" and "magpie" example below
Run regular expression matches (for example the sample below replaces s$ with "" to remove plurals)
Match on any combination of columns (ie Column A, all columns, Column A&B etc)
I'm surprised that no one has mentioned the RemoveDuplicates method.
ActiveSheet.Range("A:A").RemoveDuplicates Columns:=1
This will simply remove any duplicate entries on the active worksheet in column A. It takes milliseconds to run (tested with 200k rows). Mind you, this will strictly delete all the duplicate entries. Although that isn't how the original question was worded, I do believe that this still serves your purpose.
One simple way of finding unique values is to use the Advanced Filter, filter for unique records only, and copy and paste them into another sheet, because once the filter is removed you get the whole data set back with the duplicates still in it.
Sort the range
and in the next column put =if(a2=a1;1;if(a2=a3;1;0))
"1" will be displayed for duplicates.