Varying Format "Part Number" sort issue - vba

(Current Sort Sample:)
2-1203-4
2-1206-3
2CM-
3-1610-1
3-999
…
AR3021-A-7802
AR3021-A-7802-1
B43570-
B43570-3
I am working on an 8000+ record parts list. The challenge I am running into is that different manufacturers of the parts are using many varying formats for their part numbers. “Part Number” is the field I wish to sort my entire worksheet on. (There are about 10 columns of data in this worksheet.)
My methodology for attacking this challenge was to count the number of characters to the left of any “-” and count the total number of numeric characters in the field. (I also set “Part Numbers” that started with a non-numeric character to a count value of 99 for both count calculations so those would sort after the numeric values.) From this, I was able to sort on the values to the left of the “-” using the MIN of the two counts. (My “Part Numbers” are in Column B and I have a header row, which means that my first “Part Number” is in cell B2.)
This method worked up to a point. My challenge is that I need to subsequently sort values after the “-” character, as is illustrated by the erroneous sort of “3-1610-1” being followed by “3-999”.
One of the limitations I see is that sorting with Data > Sort only gives three columns to sort on. To sort on just the characters to the left of the “-” is costing me those three columns. So, I am unable to repeat the whole process of counting values after the “-” character and subsequently sorting with Data > Sort after running the primary sort.
Has the sort of many differing formats of a field such as “Part Number” been solved? Is there a macro that can be applied to this challenge? If so, I would be grateful for your input.
This data is continuously updated with new part numbers so the goal here is to be able to add those additional part numbers to the bottom of the worksheet and use a macro to correctly resort the appended list.
For the record, I am not married to my approach. After all, it didn’t solve my challenge!
Thank you,
Darrell

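Place this procedure in a standard code module:
Public Sub PartNumberSortFormat()
    Dim i&, j&, f, vIn, vOut
    ' Read the part numbers from B2 down to the last text entry in column B
    vIn = [b2:index(b:b,match("*",b:b,-1))]
    vOut = vIn
    For i = 1 To UBound(vIn)
        ' Strip spaces, then split the part number on its dashes
        f = Split(Replace(vIn(i, 1), " ", ""), "-")
        For j = 0 To UBound(f)
            If IsNumeric(f(j)) Then
                f(j) = Format$(f(j), "000000")
            ElseIf Len(f(j)) < 6 Then
                ' Left-pad short text segments with zeroes; segments of six or
                ' more characters are left as-is (String$ fails on a negative count)
                f(j) = String$(6 - Len(f(j)), "0") & f(j)
            End If
        Next
        vOut(i, 1) = Join(f, "-")
    Next
    ' Insert a helper column A and write the sortable keys into it
    Columns(1).Insert xlToRight
    [a1] = "SORT COLUMN"
    [a2].Resize(UBound(vOut)) = vOut
    Columns(1).EntireColumn.AutoFit
End Sub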
After running the procedure, you will notice that it has inserted a new column A on your worksheet and your data has been scooted over to the right by one column.
This new column A will contain a copy of your part numbers, reformatted in such a fashion to allow normal sorting.
Now select all of the data INCLUDING this new column A and sort A-Z on column A.
After the sort, you may delete the new column A.
This works by left-padding each dash-separated segment of the part number with zeroes to a width of six characters.

My Thoughts:
Excel 2010 onwards lets you sort using as many columns as you like. (Not sure about 2007). Don't know which version you have!
You could use the SUBSTITUTE function to remove all "-" from the part number, then sort on the number that remains, which gives you an order more like the one you want.
e.g.
Value       =SUBSTITUTE(B2,"-","")
3-15        315
3-888       3888
3-999       3999
3-1610      31610
3-2610      32610
3-1610-1    316101
3-2610-3    326103
It's not exactly what you need though!
Combine this with other formulas (or a VBA function) to manipulate your part number to be more sortable.
You could use FIND to find the position of the first "-" and extract the numbers before it into one column.
Similarly, using FIND, MID and LEN you could extract the numbers between a part number's two "-".
I suspect it will be best to write a VBA function to convert a part number into a "sortable value". This might involve splitting the part number into its component bits (i.e. each bit being the text between the "-").
The VBA function Split might be useful for this; it creates an array.
If you know the formats of ALL the part numbers that can be delivered, you can code accordingly.
I suspect your code will take numbers like these and convert them as shown:
AB123-456-78 AB12300456007800
AB12-45-7 AB12000450007000
AB12-45 AB12000450000000
ie padding with zeros each component of the part number
The key to sorting the TEXTUAL values into the order you want is understanding how textual values get sorted! Do some experiments. Then create zero (or "9") padded numbers that sort the way you require.
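As a rough sketch of the "sortable value" function suggested above (SortableKey and its padWidth parameter are names made up for this example, and the width of 6 is only an assumption about how long a segment can get), something along these lines could be placed in a standard module:

```vba
' Hypothetical helper: turns "3-999" into "000003-000999" so that a plain
' text sort places it before "3-1610-1" ("000003-001610-000001").
Public Function SortableKey(ByVal partNumber As String, _
                            Optional ByVal padWidth As Long = 6) As String
    Dim parts() As String
    Dim i As Long

    ' Remove spaces, then split on the dashes
    parts = Split(Replace(partNumber, " ", ""), "-")

    For i = LBound(parts) To UBound(parts)
        ' Left-pad short segments with zeroes; longer segments are left unchanged
        If Len(parts(i)) < padWidth Then
            parts(i) = String$(padWidth - Len(parts(i)), "0") & parts(i)
        End If
    Next i

    SortableKey = Join(parts, "-")
End Function
```

It could then be used from the worksheet as =SortableKey(B2) in a helper column, filled down and sorted A-Z. Segments longer than padWidth are not padded, so the width would need raising if such part numbers exist.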
I hope this helps.

While not a technical answer to the Excel question, I am a logistician working with extremely large data sets of part numbers - always varying in format. The standard approach used in my field is to "ignore" (remove) special characters from the P/N and append the (clean) P/N to the 5-digit CAGE (manufacturer) code to create a "unique" CAGE + (clean) P/N code for sorting, lookup, etc. Create a column for that construct.
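For anyone who wants to build that construct in Excel, here is a minimal sketch. CagePartKey is a hypothetical name, and it assumes "special characters" means anything other than letters and digits, which may not match every organisation's cleaning rules:

```vba
' Hypothetical helper: builds a CAGE + cleaned part-number key.
' Assumption: every character other than A-Z, a-z and 0-9 is dropped.
Public Function CagePartKey(ByVal cageCode As String, _
                            ByVal partNumber As String) As String
    Dim i As Long
    Dim ch As String
    Dim cleaned As String

    For i = 1 To Len(partNumber)
        ch = Mid$(partNumber, i, 1)
        ' Keep letters and digits only, upper-cased so keys compare consistently
        If ch Like "[A-Za-z0-9]" Then cleaned = cleaned & UCase$(ch)
    Next i

    CagePartKey = UCase$(Trim$(cageCode)) & cleaned
End Function
```

With a made-up CAGE code of 1ABC5, =CagePartKey("1ABC5", B2) would turn AR3021-A-7802 into 1ABC5AR3021A7802.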

Related

Extracting "hidden" data from expanding/collapsing pivot table - Excel

I'm not sure if this is possible but as you can see I have a pivot table with multiple dependent and expandable fields. I am trying to concatenate the data from columns A:D into one cell which works fine in row 2 but doesn't work with blank parent cells, as you can see in column F.
Any ideas for how to achieve this?
Pivot table
This answer assumes that you don't want to just use Repeat All Item Labels from the "Report Layout" drop-down on the PivotTable Tools "Design" tab.
A formula to get the first non-blank value on or above the same row as the current cell from Column B can be constructed with a combination of AGGREGATE, SUMPRODUCT and OFFSET, like so:
=OFFSET($B2,SUMPRODUCT(AGGREGATE(14,6,ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0),1))-ROW(),0)
How does it work?
Starting with the outermost part, OFFSET($B2, VALUE, 0) - this will start in cell B2, then look up or down by VALUE rows to get the value.
Next we need to know how many rows we will need to look up-or-down. Now, if we can work out the bottom-most row with data, we can subtract the current ROW() from that, giving us OFFSET($B2, NON_BLANK-ROW(),0)
So, to finish up we need to work out which rows are not blank, AND which rows are on-or-above our current row, then take the largest of those. This is going to take an ArrayFormula, but we can use SUMPRODUCT to make that calculate properly. To find the largest number we could use MAX or LARGE - but we get fewer errors if we pick AGGREGATE(14,6,..,1). (The 14 means "we want the kth largest number", the 6 means "ignore error values", and the 1 is k - so "we want the largest number, ignoring errors")
But, what list of numbers are we going to look at, I don't hear you ask. Well, we want the ROW for output from our range (I'm using $B$1:$B$100, because using the whole column B would take far too long to calculate repeatedly), a comparison against the current ROW(), and a check that the LENgth is > 0. Those last two are comparisons, so let's write them out first:
ROW($B$1:$B$100)<=ROW()
and
LEN($B$1:$B$100)>0
We want to use -- to convert TRUE and FALSE to 1 and 0 - this means that any "bad" values become 0, and any "good" values are larger than 0:
ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0)
This gives us the Row number when the Row is on-or-before the current row AND Column B is not blank - if either of those are False, then we get 0 instead. Stick that in the AGGREGATE to find the largest number:
AGGREGATE(14, 6, ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0), 1)
Then put it in a SUMPRODUCT to force Excel to treat it as an ArrayFormula, and that's your NON_BLANK. This then gives you that first formula right at the top of the post
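If a macro is acceptable, the same "nearest non-blank cell at or above this row" lookup can also be sketched as a user-defined function. LastFilledAbove is a made-up name, and the sketch assumes the column holds no error values (an error cell would make the function return #VALUE!):

```vba
' Hypothetical UDF: returns the nearest non-blank value at or above `cell`
' in the same column, or "" if every cell up to row 1 is blank.
Public Function LastFilledAbove(ByVal cell As Range) As Variant
    Dim r As Long

    LastFilledAbove = ""
    ' Walk upwards from the cell's own row until something is found
    For r = cell.Row To 1 Step -1
        If Len(cell.Worksheet.Cells(r, cell.Column).Value) > 0 Then
            LastFilledAbove = cell.Worksheet.Cells(r, cell.Column).Value
            Exit Function
        End If
    Next r
End Function
```

Used as =LastFilledAbove($B2), it plays the same role as the OFFSET formula, without the fixed $B$1:$B$100 range.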

Merge values in multiple columns into one

I have the following data structure:
As you see in column J, I am trying to merge data into one column from columns A & C & E & G.
I am using this formula:
=IF(ROW()<=COUNTA($A:$A);INDEX($A:$C;ROW();COLUMN(A1));INDEX($A:$C;ROW()-COUNTA($A:$A)+1;COLUMN(C1)))
and I get the values in column K as you see. Currently this formula is merging only two columns. How to modify it to merge all four columns?
And how to only get those values starting from row 5?
The column height will vary constantly: sometimes there are 10 values in column A and sometimes there are 2 values.
Either any excel formula or any VBA code will be acceptable.
There is a fairly standard method for retrieving unique values from a single column, but not from multiple columns. To retrieve them from multiple columns you need to stack multiple formulas together, with the processing being passed to successive columns once the earlier formula errors out.
      
The array formula¹ in J5 is,
=IFERROR(INDEX($A$5:$A$99, MATCH(0, IF(LEN($A$5:$A$99), COUNTIF(J$4:J4, $A$5:$A$99), 1), 0)),
IFERROR(INDEX($C$5:$C$99, MATCH(0, IF(LEN($C$5:$C$99), COUNTIF(J$4:J4, $C$5:$C$99), 1), 0)),
IFERROR(INDEX($E$5:$E$99, MATCH(0, IF(LEN($E$5:$E$99), COUNTIF(J$4:J4, $E$5:$E$99), 1), 0)),
IFERROR(INDEX($G$5:$G$99, MATCH(0, IF(LEN($G$5:$G$99), COUNTIF(J$4:J4, $G$5:$G$99), 1), 0)),
""))))
I have only included columns A, C, E and G as your sample data shows only duplicates in columns B, D, F, and H.
¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. If entered correctly, Excel will wrap the formula in braces (e.g. { and }). You do not type the braces in yourself. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try to reduce your full-column references to ranges more closely representing the extents of your actual data. Array formulas use many more calculation cycles as the referenced ranges grow, so it is good practice to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
This answer is another way of thinking about the formulas you could use for this sort of task. It gets to the point made by @Jeeped that it is difficult to find unique values in multiple columns. My first step then is to create a single column.
If you can live with a helper column, these formulas might be a tad easier to maintain than the nested IFERROR already proposed. They are equally difficult to understand though at first glance. The other upside is that it scales nicely if the number of columns involved increases.
It is possible using CHOOSE and some INDEX math to build a single column array of a group of separated columns. The trick is that CHOOSE will join discontinuous ranges side-by-side when given an array as the selecting parameter. If this starts with columns of the same size, you can then use division and mod math to turn it into a single column.
Picture of ranges shows the four groups of data with duplicates colored red.
Formula in F2:F31 is an array formula. This is combining all of the columns into an array and then back into a single column. I selected the columns out of order just to emphasize that it is handling a discontinuous range.
=INDEX(CHOOSE({1,2,3,4}, A2:A7,C2:C7,B2:B7,D2:D7), MOD(ROW(1:30)-1, ROWS(A2:A7))+1,INT((ROW(1:30)-1)/ROWS(A2:A7))+1)
The array formula in H2 and copied down is then the standard formula for unique values. The one exception is that instead of avoiding blanks like normal, I am avoiding 0 values.
=IFERROR(INDEX(F2:F31,MATCH(0,IF(F2:F31=0,1,COUNTIF($H$1:H1,F2:F31)),0)),"")
A couple of other comments about this approach:
In the CHOOSE, I am using {1,2,3,4}. This could be replaced with TRANSPOSE(ROW(1:4)), adjusted to whatever number of columns you have.
There is also a ROWS(A2:A7) in 2 places. This could just be ROWS(2:7) or ROWS(1:6) or any reference with the same number of rows as each column. I used one of the data ranges so that the coloring was simplified and to emphasize that it needs to match the size of the block.
And the ROW(1:30) is used for the number of total items to collect. It really only needs to be 1:24 since there are 6*4 items, but I made it big while testing.
There are definitely a couple of downsides to this approach, but it may be a good trick to keep in the toolbox. Never know when you might want to make a column out of discontinuous ranges. The largest downside is that the columns of data all need to be the same size (and of course the helper column).
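To make the MOD / INT index arithmetic easier to follow, here is the same idea sketched in VBA. StackColumns is a hypothetical helper, and it assumes the columns have already been gathered into one rectangular 2-D array (for example Range("A2:D7").Value) rather than joined with CHOOSE:

```vba
' Hypothetical helper: flattens a rows-by-columns block into a single
' column, working down each column in turn.
Public Function StackColumns(ByVal block As Variant) As Variant
    Dim data As Variant
    Dim nRows As Long, nCols As Long, k As Long
    Dim result() As Variant

    ' Accept either a Range or a 2-D array
    If IsObject(block) Then data = block.Value Else data = block

    nRows = UBound(data, 1) - LBound(data, 1) + 1
    nCols = UBound(data, 2) - LBound(data, 2) + 1
    ReDim result(1 To nRows * nCols, 1 To 1)

    For k = 1 To nRows * nCols
        ' The row offset cycles 0..nRows-1 (the MOD part);
        ' the column offset steps up once every nRows items (the INT part)
        result(k, 1) = data(LBound(data, 1) + ((k - 1) Mod nRows), _
                            LBound(data, 2) + ((k - 1) \ nRows))
    Next k

    StackColumns = result
End Function
```

As with the formula version, this only works when every column in the block has the same number of rows.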
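This code will do what you ask:
Sub MoveData()
    ' Source columns start at column A (1) and repeat every 2 columns
    Const START_ROW As Long = 5
    Const START_COL As Long = 1
    Const STEP_COL As Long = 2
    ' Output goes to column J (10), starting at row 5
    Const OUTPUT_ROW As Long = 5
    Const OUTPUT_COL As Long = 10
    Dim Row As Long, Col As Long, Out_Row As Long
    Row = START_ROW
    Col = START_COL
    Out_Row = OUTPUT_ROW
    ' Visit every second column to the left of the output column
    While Col < OUTPUT_COL
        ' Copy values down the current column until the first blank cell
        While Cells(Row, Col).Value <> ""
            Cells(Out_Row, OUTPUT_COL).Value = Cells(Row, Col).Value
            Out_Row = Out_Row + 1
            Row = Row + 1
        Wend
        Row = START_ROW
        Col = Col + STEP_COL
    Wend
End Sub
I think you guys are making this complicated. Just pull the range of data into Power Query, select all the columns and unpivot them. This will bring all the data into a single column.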

Check the length of data in a cell and add a zero

I have a spreadsheet containing data in the following format:
Col1 Col2
ROW1: 21211 Customer 3873721
ROW2: 101111 Customer 2321422
ROW3: 91214 Customer 2834712
ROW4: 231014 Customer 3729123
I need to be able to create a macro that goes through each row and determines the number of characters that make up the row1 data.
For example:
If data contained in the first cell or ROW1 consisted of a total of 6
characters then this will remain the same. If it consisted of 5
characters then a zero needs to be added to the front of it.
I'm using Excel 2003.
VBA is not needed in this case. A simple IF statement should do the trick. Assuming the column you want to evaluate is column A, place this in column B and fill it down the column:
=+IF(LEN(A1)=5,"0" & A1,A1)
This will work provided all values are 5 or 6 characters long as it appears in your sample data.
It seems OP may be happy with a Custom Format of:
#000000
This is easy to apply but mainly may be a good idea because most other ways of prepending a 0 are only possible by conversion of numeric values to strings (as otherwise Excel will automatically strip leading zeros). If that is done selectively (for length 5 but not length 6) a column of what appears to be numbers might end up with mixed types (of different behaviour) and so confuse or inconvenience.
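Since the question asked for a macro, here is a minimal sketch of one, assuming the values live in column A of the active sheet and that storing them as text is acceptable. PadSix and PadColumnA are names made up for this example:

```vba
' Hypothetical helper: left-pads a number with zeroes to six digits.
Public Function PadSix(ByVal value As Variant) As String
    PadSix = Format$(value, "000000")
End Function

' Hypothetical macro: pads every numeric entry in column A in place.
Public Sub PadColumnA()
    Dim cell As Range

    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        If IsNumeric(cell.Value) And Not IsEmpty(cell.Value) Then
            ' Switch the cell to Text first so Excel keeps the leading zero
            cell.NumberFormat = "@"
            cell.Value = PadSix(cell.Value)
        End If
    Next cell
End Sub
```

Unlike the custom format, this converts the values to strings, so the caveat above about mixed types applies if only part of a column is processed.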

Excel formula/macro for finding data with letters and numbers

I have a column of data in Excel that contains different values. I am looking for a formula or macro to distinguish the different types of data. For instance, I have a VLOOKUP for numerical values =VLOOKUP(E2,TECH!B:F,4,FALSE) but this only works for certain values.
For instance, this returns the value of E2 when it's listed as a 4 digit extension in the column. Some data points are listed as "i78990" or "n65778", etc. I want to return a value of "Chicago" when an "i" is before the number and an "Atlanta" if the "n" is before the number, etc.
Use the LEFT function to get the first letter, as follows: LEFT(E2,1) (you can apply that to all of your cells). Store the result in a different column and perform the VLOOKUP from there.
For a more general case of separating numbers from text you can use the following algorithm:
-Break the alphanumeric string into separate characters.
use: MID(A1,ROW($1:$9),1)
-Determine whether there is a number in the decomposed string.
use: ISNUMBER(1*MID(A1,ROW($1:$9),1))
-Determine the position of the number in the alphanumeric string.
use: MATCH(TRUE,ISNUMBER(1*MID(A1,ROW($1:$9),1)),0)
-Count the numbers in the alphanumeric string.
use: COUNT(1*MID(A1,ROW($1:$9),1))
or as a whole:
MID(A1,MATCH(TRUE,ISNUMBER(1*MID(A1,ROW($1:$9),1)),0),COUNT(1*MID(A1,ROW($1:$9),1)))
you can see an example in the following link.
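For the prefix-to-city part of the question, the LEFT idea above could also be wrapped in a small user-defined function. This is only a sketch: CityFromPrefix is a made-up name, and the only mappings taken from the question are "i" for Chicago and "n" for Atlanta:

```vba
' Hypothetical UDF: maps the first letter of an ID such as "i78990" to a city.
Public Function CityFromPrefix(ByVal id As String) As String
    Select Case LCase$(Left$(Trim$(id), 1))
        Case "i"
            CityFromPrefix = "Chicago"
        Case "n"
            CityFromPrefix = "Atlanta"
        Case Else
            ' Unknown prefix, blank cell, or a plain numeric extension
            CityFromPrefix = ""
    End Select
End Function
```

A formula such as =IF(CityFromPrefix(E2)="",VLOOKUP(E2,TECH!B:F,4,FALSE),CityFromPrefix(E2)) would then fall back to the existing lookup for plain extensions.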

Excel countif Pulling apart a cell to do different things

Excel 2007
I have a row of cells with variation of numbers and letters (which all mean something.. not random.)
It's basically a timesheet. If they take a sick day they put in S, if they take a partial sick day they put in PS. The problem is they also put in the hours they did work too. They put it in this format: (number)/PS.
Now if it were just letters I could just do =countif(range,"S") to keep track of how many s / ps cells there are. How would I keep track if they are PS where it also has a number separated by a slash then PS.... I also still need to be able to use that number to add to a total. Is it even possible or will I have to format things different to be able to keep track of all this stuff.
Assuming this is something like what your data looks like:
A B C D E
1 1 2 S 4/PS 8
...then you could do this:
1- add a column that just totals the "S" entries with a COUNTIF function.
2- add a hidden row beneath each real data row that will copy the numerical part of the PS entries only with this function in each column:
=IF(RIGHT(B1,2)="PS",IF(ISERROR(LEFT(B1,SEARCH("/",B1)-1)),"",INT(LEFT(B1,SEARCH("/",B1)-1))),"")
3- add another column to the right that just totals the "PS" entries by summing the hidden row from step 2.
4- add another column that totals everything by just summing the data row. That will ignore the text entries automagically.
5- have a grand total column that adds those three columns up
If you don't want to see the "S" and "PS" total columns, you can of course just hide them.
So in the end, the sheet would look like this:
A B C D E F G H I J
1 1 2 S 4/PS 8 1 4 11 16
2 4 <--- hidden row
HTH...
My quick take on this is:
pass the cell value into the CStr function, so no matter what is entered you will be working with a string.
parse the information. Look for S, PS, or any other code you deem to be valid. Use the Left or Right functions if you need to look at a partial string.
check for a number by testing the ASCII value, or by trying the CInt function, which will only work if the string can be converted to an integer.
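A rough sketch of those steps, assuming entries look like "8", "S" or "4/PS" as described in the question. HoursWorked and EntryCode are hypothetical names, and IsNumeric is used here in place of the ASCII or CInt test:

```vba
' Hypothetical helper: the hours in an entry such as "8" or "4/PS" (0 for "S").
Public Function HoursWorked(ByVal entry As Variant) As Double
    Dim s As String
    Dim slashPos As Long

    s = Trim$(CStr(entry))
    slashPos = InStr(s, "/")
    ' Keep only the part before the slash, if there is one
    If slashPos > 0 Then s = Left$(s, slashPos - 1)
    If IsNumeric(s) Then HoursWorked = CDbl(s)
End Function

' Hypothetical helper: the letter code in an entry ("S", "PS"), or "" if none.
Public Function EntryCode(ByVal entry As Variant) As String
    Dim s As String
    Dim slashPos As Long

    s = UCase$(Trim$(CStr(entry)))
    slashPos = InStr(s, "/")
    ' Keep only the part after the slash, if there is one
    If slashPos > 0 Then s = Mid$(s, slashPos + 1)
    If Not IsNumeric(s) Then EntryCode = s
End Function
```

Looping over the timesheet range and summing HoursWorked, or counting cells where EntryCode returns "PS", would then give the totals without a hidden helper row.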
If you can show a sample of your cells with variation of numbers and letters I can give you more help. Hope this works out.
-- Mike