I would need to split only specific strings to numbers and then maximize these numbers from the specific strings in Google sheets? - sql

Example of my Google Sheets spreadsheet cells:
A1: "AAA C1"
B1: "BBB C4"
C1: "AAA C7"
I need the highest number after the letter "C" among all the strings that start with "AAA". So far I have only managed to get the maximum of all the numbers after "C", using the formula:
=Max(ARRAYFORMULA(VALUE(REGEXREPLACE(a1:c1,"[^[:digit:]]", ""))))
However, with the above formula I cannot select only the strings that start with "AAA". I tried the MAXIFS function, but it does not allow string functions such as MID to be applied to the range of cells.
This is my first question here. I hope it is all clear and someone can help me with this problem.
Thanks!

Suggested Solution #1:
=Max(ARRAYFORMULA(VALUE(REGEXREPLACE(FILTER(A1:C1,ISNUMBER(SEARCH("AAA",A1:C1))),"[^[:digit:]]", ""))))
I left your original formula essentially as it was, since you already understand it. All I did was replace your range with a FILTER of that range.
The FILTER alone, FILTER(A1:C1,ISNUMBER(SEARCH("AAA",A1:C1))), keeps only those entries in the range where searching for "AAA" returns a position number (i.e., every cell that contains "AAA"; for any other cell SEARCH returns an error). Note that SEARCH is case-insensitive.
If the "AAA" must appear first in the string, you can use this version:
=Max(ARRAYFORMULA(VALUE(REGEXREPLACE(FILTER(A1:C1,SEARCH("AAA",A1:C1)=1),"[^[:digit:]]", ""))))
Suggested Solution #2:
=Max(ARRAYFORMULA(IFERROR(VALUE(REGEXEXTRACT(A1:C1,"^AAA.+?(\d+)$")),-9^9)))
This will find the max from REGEXEXTRACTs matching only the "AAA" cells. The lazy .+? lets (\d+) capture the whole trailing number; a greedy .+ would leave it only the last digit. If something doesn't match the "AAA" pattern, the IFERROR returns a very large negative number (which rules it out from ever being the MAX).
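Patterns of this shape are sensitive to greedy versus lazy matching, which is easy to check outside the spreadsheet. The following is only a minimal sketch in Python, assuming Python's `re` module treats these quantifiers the same way as the Sheets regex engine does for this case; the helper name `capture` and the sample strings are made up for illustration.

```python
import re

# Two variants of the pattern: greedy ".+" versus lazy ".+?" before the digits.
GREEDY = re.compile(r"^AAA.+(\d+)$")
LAZY = re.compile(r"^AAA.+?(\d+)$")

def capture(pattern, text):
    """Return the captured digits as an int, or None when the text does not match."""
    m = pattern.search(text)
    return int(m.group(1)) if m else None

if __name__ == "__main__":
    for sample in ["AAA C7", "AAA C12", "BBB C4"]:  # hypothetical cell contents
        print(sample, capture(GREEDY, sample), capture(LAZY, sample))
```

With single-digit numbers both variants agree; the difference only shows up once a number has two or more digits.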

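I think you just need to alter your regular expression.
^A{3}[^A].*?C(?=\d+)
^A{3}[^A] : starts with exactly three uppercase A's (the fourth character must not be an A)
.*?C : finds the first uppercase C
(?=\d+) : the C must be followed by at least one digit
This matches the part you want to replace with an empty string.
E.g. for AAA C1 it matches AAA C, which after the replacement leaves only 1.
Note that Google Sheets' regex functions use RE2, which does not support lookaheads such as (?=\d+); there, capture the digits instead by matching ^A{3}[^A].*?C(\d+) and replacing with "$1".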

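Try (the semicolon is the argument separator in some locales; use a comma if your locale requires it):
=MAX(INDEX(1*IFNA(REGEXEXTRACT(A1:C1; "^AAA.*?(\d+)"))))
The lazy .*? lets (\d+) capture the whole number rather than just its last digit.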

You mention:
...maximize the highest numbers after the letter "C" from all the strings that start with "AAA".
The answers above work for a single row.
To apply the same idea to a range spanning several rows, you can use:
=MAX(INDEX(IFERROR(
REGEXEXTRACT(FLATTEN(A1:C),"^AAA .*C(\d+)")+0)))
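For anyone who wants to sanity-check the logic outside Sheets, the filter-extract-max steps can be sketched in Python. This is an illustration only: the function name `max_after_c` and its `prefix` parameter are assumptions, not part of any answer above, and Python's `re` stands in for the Sheets regex engine.

```python
import re

def max_after_c(cells, prefix="AAA"):
    """Return the largest number following a 'C' among cells starting with
    `prefix` plus a space, or None if no cell matches."""
    # Same shape as the pattern in the formula above: ^AAA .*C(\d+)
    pattern = re.compile(r"^" + re.escape(prefix) + r" .*C(\d+)")
    numbers = []
    for cell in cells:
        m = pattern.search(str(cell))
        if m:  # non-matching cells are skipped, like IFERROR dropping errors
            numbers.append(int(m.group(1)))
    return max(numbers) if numbers else None

if __name__ == "__main__":
    print(max_after_c(["AAA C1", "BBB C4", "AAA C7"]))
```

Flattening a multi-row range first, as FLATTEN does, corresponds to passing every cell of the range as one flat list.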

Related

vba - Get row number of a cell with certain value in a named range

Is there a way to find the row of a cell containing certain value in a named range (Table)?
I have a table named "Table1" from C4 to D10. I want to return the row number of the cell in column C that contains the value "CCC". This value is found in the third row (C6), so I want the code to return the number 3, meaning the third row of the table "Table1", and not the number 6, which would mean it was found in cell C6.
Thank you in advance.
Edit:
I missed the part about "relative to the table" but that's even easier...
Here's one that will work:
=MATCH("CCC",Table1,0)
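or, referring to the cell range instead of the table directly (MATCH needs a single row or column, so use this form when the table spans more than one column, as C4:D10 does):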
=MATCH("CCC",C4:C10,0)
Original Answer:
If you want to know the row number of the first cell in Column C that contains CCC, you could use the MATCH function, either in a worksheet formula, or in VBA.
On the worksheet:
=MATCH("CCC",C:C,0)
or, in VBA:
WorksheetFunction.Match("CCC",Range("C:C"),0)
The fact that it's in a table is irrelevant since you've identified the column.
Incidentally, I can think of at least half a dozen other ways to get the same data just as easily.
Here's one that refers to the table directly:
=ROW(INDEX(Table1,MATCH("CCC",Table1,0)))
...and more variations:
=MATCH("CCC",C4:C10,0)-1+ROW(C4)
or
=MATCH("CCC",Table1,0)-1+ROW(Table1)
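Note that a big difference between MATCH and VLOOKUP is that MATCH returns the relative position of the matched cell within the lookup range (which INDEX can then turn into a cell reference), whereas VLOOKUP only returns the value of the matched cell, and therefore isn't suitable for a task like this.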
More Information:
Office Support : MATCH Function
Microsoft Docs : WorksheetFunction.Match Method (Excel)
Office Support : INDEX Function
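The relationship between the table-relative position and the worksheet row used in the formulas above can be sketched in Python. This is a hedged illustration only: `relative_row`, `absolute_row` and the sample column contents are hypothetical, and the case-insensitive comparison is meant to mimic how MATCH compares text.

```python
def relative_row(values, target):
    """1-based position of the first value equal to target (case-insensitive),
    or None if absent; roughly what MATCH(target, range, 0) reports."""
    for position, value in enumerate(values, start=1):
        if str(value).lower() == str(target).lower():
            return position
    return None

def absolute_row(values, target, first_row):
    """Worksheet row of the match: relative position - 1 + first row of the range."""
    position = relative_row(values, target)
    return None if position is None else position - 1 + first_row

if __name__ == "__main__":
    column_c = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"]  # stands in for C4:C10
    print(relative_row(column_c, "CCC"), absolute_row(column_c, "CCC", first_row=4))
```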

Find Text Duplicates in Excel

I'm an Excel/VBA newbie and I have a question.
Is it possible to tag partial string matches between two columns in Excel?
Let's say I have two columns, A and B, that have text values in them. I want to identify rows where the A cell and B cell has a partial match.
Here are some hypothetical cases of the 'partial matches' that I'm looking for.
Case 1: exact phrase match (Fictional Company Ltd) but one column has extra text
Cell A2: 123456789 Fictional Company Ltd
Cell B2: Fictional Company Ltd
Case 2: exact phrase match (Fictional Company Ltd) but both columns have extra text
Cell A3: 123456789 Fictional Company Ltd
Cell B3: Fictional Company Ltd, 1 Main Street, City, State 12345
Case 3: partial match
Cell A4: Fictional Ltd
Cell B4: Fictional Company Ltd
Case 4: word match
Cell A5: Fictional Company Ltd
Cell B5: Fictional
I would like to identify all of those cases above. However, I don't mind running >1 set of codes to cover them all.
Thanks a lot in advance for your help!
Update: when I first created the cases, I didn't realize that I put the first word in column B as the matching word with column A. It is not the case - sometimes it is the 3rd word in column B and the 5th word in column A that matches.. the data is all over the place!
Update 2: I also want to clarify that the cases are reversible - for example, there are some rows where it's Case 1 but cell B has more info instead of cell A.
This function returns the number of words in text2 that are contained anywhere (not just as a whole word) in text1:
Function CountMatches(text1 As String, text2 As String) As Long
    Dim arr, x As Long
    ' Split text2 into space-separated words
    arr = Split(text2)
    For x = 0 To UBound(arr)
        ' Count each word of text2 that appears anywhere in text1
        If text1 Like "*" & arr(x) & "*" Then CountMatches = CountMatches + 1
    Next x
End Function
...and this one does the same, but also counts each word of text1 that occurs anywhere within text2:
Function CountMatches2(text1 As String, text2 As String) As Long
    Dim arr, x As Long
    ' First pass: words of text1 found anywhere in text2
    arr = Split(text1)
    For x = 0 To UBound(arr)
        If text2 Like "*" & arr(x) & "*" Then CountMatches2 = CountMatches2 + 1
    Next x
    ' Second pass: words of text2 found anywhere in text1
    arr = Split(text2)
    For x = 0 To UBound(arr)
        If text1 Like "*" & arr(x) & "*" Then CountMatches2 = CountMatches2 + 1
    Next x
End Function
Both are susceptible to counting the same match twice, especially CountMatches2.
I'm curious if this suits your needs (as it's obviously not a true "fuzzy match")...
It can be easily modified to return a TRUE/FALSE (ie., TRUE = One or more matches) or to look only for entire word matches as opposed to "anywhere".
Let me know if you have any questions!
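To try the counting idea against the four cases from the question without opening Excel, here is a rough Python port. It is a sketch under assumptions: the names `count_matches` and `count_matches_both` are mine, and the substring test is literal, whereas the VBA Like operator would treat characters such as * or ? in the data as wildcards.

```python
def count_matches(text1, text2):
    """Count the space-separated words of text2 that occur anywhere in text1."""
    if not text2:
        return 0
    # Split on single spaces, like VBA's Split with its default delimiter
    return sum(1 for word in text2.split(" ") if word in text1)

def count_matches_both(text1, text2):
    """Count matches in both directions, so a shared word is counted twice."""
    return count_matches(text1, text2) + count_matches(text2, text1)

if __name__ == "__main__":
    cases = [
        ("123456789 Fictional Company Ltd", "Fictional Company Ltd"),
        ("Fictional Ltd", "Fictional Company Ltd"),
        ("Fictional Company Ltd", "Fictional"),
    ]
    for a, b in cases:
        print(count_matches(a, b), count_matches_both(a, b))
```

A result greater than zero could then serve as the TRUE/FALSE flag mentioned above.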
Case 1 is possible, simply by truncating the length of the longer so that it matches the length of the shorter, and then seeing if they are the same. Use the LEFT function to trim the longer word to the length of the shorter one. (Use the LEN function on the shorter word to work out how long it is).
Case 2 is tricky but possible, because you effectively need to search the longer string for every possible combination of ordered words from the shorter. It's kind of a 'slightly simpler' version of Case 3.
Case 3 is damn tricky: it's pretty much a Fuzzy Match which is computationally expensive, and requires something called tokenisation to do efficiently. Microsoft has a free Fuzzy Match addin but it's kinda sucky...it returns many false positives to the point that you need to eyeball each and every result to make sure it is a valid one. Which completely defeats the purpose. I'm working on putting together a commercial offering in that space myself that returns far fewer false positives, but can't share code. Suffice to say that this is a very difficult thing to do efficiently.
Case 4 is trivial: you just use the SEARCH formula.
Add a whole 'nother layer of trickiness if you have multiple words in each list.
The above answer is enough to point you in the right direction for a Google search. Note that you can simplify things by substituting out things like "Ltd" and "Limited" and other sundry terms using the SUBSTITUTE formula, but you've still got a heck of a challenge on your hands.

Sum numbers that contain a letter in the cell

I'm trying to sum A3:B21, but each of these cells has either a c or an s as its first character, and some cells might contain just the letter itself. The values can range from $1 all the way up to $99999.
I have tried a few different formulas but none are adding the cells. Some of the formulas I have tried are
=SUM(IF((LEFT(A3:B20,1))=C3,(--RIGHT((IF(A3:B20="",0,A3:B20)),7)),0))
=IF(ISNUMBER(A3:B20),M3,VALUE(RIGHT(M3,SEARCH(" ",A3:B20)-1)))
=SUMIF(A3:B20,M2,A3:B20)
=SUMIF(A3:B19, "s*")
For the time being while I try and figure out a formula that works I have m2 for the letter s and m3 for the letter c, for testing different formulas.
If you can add a helper column and sum from that, try this. It acts like a replace() function and removes c and s. You can nest further SUBSTITUTE calls to remove other characters as necessary:
=SUBSTITUTE(SUBSTITUTE(A3,"c",""),"s","")+0
Then copy it down for the other cells.
This worked for me.
=SUMPRODUCT(--(LEFT(A1:A3,1)="c"),IFERROR(--RIGHT(A1:A3,LEN(A1:A3)-1),0))
Enter it as an array formula with Ctrl+Shift+Enter (CSE).
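The same idea (keep cells whose first character is the chosen letter, treat the rest of the cell as a number, and count anything unparsable as zero) can be sketched in Python. The function name `sum_for_letter` and the sample cells are hypothetical; this is only meant to show what the SUMPRODUCT is doing, not to replace it.

```python
def sum_for_letter(cells, letter):
    """Sum the numeric part of every cell whose first character is `letter`.
    Cells holding only the letter, or anything non-numeric after it, add 0."""
    total = 0.0
    for cell in cells:
        text = str(cell).strip()
        if text[:1].lower() != letter.lower():
            continue  # different leading letter, or an empty cell
        try:
            total += float(text[1:])
        except ValueError:
            pass  # e.g. the cell is just "c"; mirrors IFERROR(..., 0)
    return total

if __name__ == "__main__":
    cells = ["c100", "s50", "c", "c25", ""]  # made-up sample data
    print(sum_for_letter(cells, "c"), sum_for_letter(cells, "s"))
```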

Varying Format "Part Number" sort issue

(Current Sort Sample:)
2-1203-4
2-1206-3
2CM-
3-1610-1
3-999
…
AR3021-A-7802
AR3021-A-7802-1
B43570-
B43570-3
I am working on an 8000+ record parts list. The challenge I am running into is that different manufacturers of the parts use many varying formats for their part numbers. “Part Number” is the field I wish to sort my entire worksheet on. (There are about 10 columns of data in this worksheet.)
My methodology for attacking this challenge was to count the number of characters to the left of any “-” and count the total number of numeric characters in the field. (I also set “Part Numbers” that started with a non-numeric character to a count value of 99 for both count calculations so those would sort after the numeric values.) From this, I was able to sort on the values to the left of the “-” using the MIN of the two counts. (My “Part Numbers” are in Column B and I have a header row, which means that my first “Part Number” is in cell B2.)
This method worked up to a point. My challenge is that I need to subsequently sort values after the “-” character, as is illustrated by the erroneous sort of “3-1610-1” being followed by “3-999”.
One of the limitations I see is that sorting with Data > Sort only gives three columns to sort on. To sort on just the characters to the left of the “-” is costing me those three columns. So, I am unable to repeat the whole process of counting values after the “-” character and subsequently sorting with Data > Sort after running the primary sort.
Has the sort of many differing formats of a field such as “Part Number” been solved? Is there a macro that can be applied to this challenge? If so, I would be grateful for your input.
This data is continuously updated with new part numbers so the goal here is to be able to add those additional part numbers to the bottom of the worksheet and use a macro to correctly resort the appended list.
For the record, I am not married to my approach. After all, it didn’t solve my challenge!
Thank you,
Darrell
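Place this procedure in a standard code module:
Public Sub PartNumberSortFormat()
    Dim i&, j&, f, vIn, vOut
    ' Read the part numbers from B2 down to the last text entry in column B
    vIn = [b2:index(b:b,match("*",b:b,-1))]
    vOut = vIn
    For i = 1 To UBound(vIn)
        ' Strip spaces, then split the part number on dashes
        f = Split(Replace(vIn(i, 1), " ", ""), "-")
        For j = 0 To UBound(f)
            If IsNumeric(f(j)) Then
                ' Zero-pad numeric segments to six digits
                f(j) = Format$(f(j), "000000")
            ElseIf Len(f(j)) < 6 Then
                ' Left-pad shorter text segments with zeros; longer ones are left
                ' as-is (String$ raises an error on a negative count)
                f(j) = String$(6 - Len(f(j)), "0") & f(j)
            End If
        Next
        vOut(i, 1) = Join(f, "-")
    Next
    ' Insert a helper column A and write the sortable keys into it
    Columns(1).Insert xlToRight
    [a1] = "SORT COLUMN"
    [a2].Resize(UBound(vOut)) = vOut
    Columns(1).EntireColumn.AutoFit
End Sub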
After running the procedure, you will notice that it has inserted a new column A on your worksheet and your data has been scooted over to the right by one column.
This new column A will contain a copy of your part numbers, reformatted in such a fashion to allow normal sorting.
Now select all of the data INCLUDING this new column A and sort A-Z on column A.
After the sort, you may delete the new column A.
This works by left-padding each dash-separated segment of the part number with zeros to a width of six characters.
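To see why the padded keys sort the way you want, here is a small Python sketch of the same idea. It is an approximation, not a port: `sort_key` and its `width` parameter are names I made up, and every segment is simply left-padded with zeros, whereas the VBA procedure formats numeric segments with Format$.

```python
def sort_key(part_number, width=6):
    """Build a sortable key by removing spaces, splitting on '-', and
    left-padding each segment with zeros to `width` characters."""
    segments = part_number.replace(" ", "").split("-")
    # rjust leaves segments that are already `width` or longer unchanged
    return "-".join(segment.rjust(width, "0") for segment in segments)

if __name__ == "__main__":
    parts = ["3-1610-1", "3-999", "2-1206-3", "2-1203-4"]
    for part in sorted(parts, key=sort_key):
        print(part, sort_key(part))
```

Because every segment has the same width, a plain text comparison puts 999 before 1610, which is the order the question was asking for.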
My Thoughts:
Excel 2010 onwards lets you sort using as many columns as you like. (Not sure about 2007). Don't know which version you have!
You could use the SUBSTITUTE formula to remove all "-" from the part number and then sort on the number that remains, which gives you an order more like the one you want.
E.g.
Value       =SUBSTITUTE(B2,"-","")
3-15        315
3-888       3888
3-999       3999
3-1610      31610
3-2610      32610
3-1610-1    316101
3-2610-3    326103
It's not exactly what you need though!
Combine this with other formulas (or a VBA function) to manipulate your part number into something more sortable.
You could use FIND to find the position of the first "-" and extract the numbers before it into one column.
Similarly, using FIND, MID and LEN you could extract the numbers between a part number's two "-" characters.
I suspect it will be best to write a VBA function to convert a part number into a "sortable value". This might involve splitting the part number into its component bits (i.e. each bit being the text between the "-" characters).
The VBA function Split might be useful for this. It creates an array.
If you know the formats of ALL the part numbers that can be delivered, you can code accordingly.
I suspect your code will take numbers like these and convert them as shown:
AB123-456-78    AB12300456007800
AB12-45-7       AB12000450007000
AB12-45         AB12000450000000
i.e. padding each component of the part number with zeros.
The key to sorting the TEXTUAL values into the order you want is understanding how textual values get sorted! Do some experiments. Then create zero (or "9") padded numbers that sort the way you require.
I hope this helps.
While not a technical answer to the Excel question, I am a logistician working with extremely large data sets of part numbers - always varying in format. The standard approach used in my field is to "ignore" (remove) special characters from the P/N and append the (clean) P/N to the 5-digit CAGE (manufacturer) code to create a "unique" CAGE + (clean) P/N code for sorting, lookup, etc. Create a column for that construct.
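As a rough illustration of that construct, here is a Python sketch that strips special characters from the part number and prefixes the CAGE code. The function name `cage_key` and the CAGE code "1ABC2" are invented for the example, and treating everything other than letters and digits as a "special character" is my assumption.

```python
import re

def cage_key(cage_code, part_number):
    """Concatenate the CAGE code with the part number stripped of
    everything except letters and digits."""
    clean = re.sub(r"[^A-Za-z0-9]", "", part_number)
    return cage_code + clean

if __name__ == "__main__":
    # "1ABC2" is a made-up CAGE code used purely for illustration
    print(cage_key("1ABC2", "AR3021-A-7802"))
    print(cage_key("1ABC2", "3-1610-1"))
```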

Finding a partial match between cells in different sheets

So in sheet "account", I have a column B, and from B2 to B990, I have identification numbers like 00AMSDF4 Corp, and in sheet "positions", from A2:A330 I have just the id, so "00AMSDF4."
I want to check if the ID appears in the column. I've tried Vlookup, which is difficult to do for partial matches (or I'm just bad), and I've tried several matches which show the value N/A.
Does anyone have any advice? Thank you so much
Use the MATCH function with the wildcard. For example, in cell B2 on sheet positions:
=MATCH(A2 & "*", account!B$2:B$990,0)
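What the wildcard lookup does can be sketched in Python as a case-insensitive "starts with" search that reports the 1-based position of the first hit. This is only an illustration under assumptions: `match_prefix` and the sample data are hypothetical, and unlike MATCH the sketch treats characters such as * or ? in the ID literally.

```python
def match_prefix(prefix, cells):
    """Return the 1-based position of the first cell that starts with
    `prefix` (case-insensitive), or None when nothing matches."""
    needle = str(prefix).lower()
    for position, cell in enumerate(cells, start=1):
        if str(cell).lower().startswith(needle):
            return position
    return None

if __name__ == "__main__":
    account_column = ["11XYZ Corp", "00AMSDF4 Corp", "22QRS Inc"]  # made-up stand-in for account!B2:B990
    print(match_prefix("00AMSDF4", account_column))
    print(match_prefix("99NOPE", account_column))
```

A None result corresponds to the #N/A that MATCH returns when the ID is not found.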