Extract substring with criteria - vba

I have several rows of information pulled from a report in columns C and D. Each cell is basically a free-text description of what someone wants done with an account, and it includes the account number. I want to extract that account number as a substring. My criteria are that it must start with the letter A and be at least 17 characters long. Account numbers are a combination of letters and numbers, but they all start with the letter A, e.g. A8H66P66FHDSJ2YNTP. Some account numbers have up to 25 characters, some have 19, some 17. So again, I'm looking to extract a substring that starts with the letter A and is at least 17 characters long.

Try to use RegEx as shown in the below example:
Sub Test()
    Dim oCell, oMatch
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "\bA[A-Z0-9]{16,24}\b"
        For Each oCell In ThisWorkbook.Sheets("Sheet1").Range("C1:D1000")
            For Each oMatch In .Execute(oCell.Value)
                Debug.Print oMatch.Value
            Next
        Next
    End With
End Sub
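If the account number is needed in a cell rather than in the Immediate window, the same pattern could be wrapped in a user-defined function. This is only a sketch: the name ExtractAccount and its parameter are made up for illustration, and it assumes an account number is a run of letters and digits delimited by word boundaries, as in the pattern above.

```vba
' Hypothetical helper (name chosen for illustration): returns the first
' token that starts with "A" and is 17 to 25 letters/digits long.
Public Function ExtractAccount(ByVal cellText As String) As String
    Dim matches As Object
    With CreateObject("VBScript.RegExp")
        .Global = False          ' the first match is enough
        .IgnoreCase = True       ' assumption: lower-case input is acceptable
        .Pattern = "\bA[A-Z0-9]{16,24}\b"
        Set matches = .Execute(cellText)
    End With
    If matches.Count > 0 Then
        ExtractAccount = matches(0).Value
    Else
        ExtractAccount = ""      ' no account number found
    End If
End Function
```

Placed in a standard module, it can be called from the sheet as =ExtractAccount(C1).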

Formula solution:
=IFERROR(TRIM(MID(SUBSTITUTE(C1," ",REPT(" ",LEN(C1))),LEN(C1)*(MATCH(TRUE,INDEX(ISNUMBER(SEARCH("A"&REPT("?",16),TRIM(MID(SUBSTITUTE(C1," ",REPT(" ",LEN(C1))),LEN(C1)*(ROW($1:$100)-1)+1,LEN(C1))))),),0)-1)+1,LEN(C1))),"No Account Number")

Related

Extract first two digits that comes after some string in Excel

I have a row with values like the ones below. How can I extract the first two digits that come after the text 'ABCD' into another cell, using a formula or VBA? There may be a few characters in between, or sometimes none.
ABCD 10 sadkf sdfas
ABCD-20sdf asdf
ABCD 40
ABCD50 asdf
You can do this with a worksheet formula. No need for VBA.
Assuming you do not need to test for the presence of two digits:
=MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2)
If you need to test for the presence of two digits, you can try:
=IF(ISNUMBER(-RIGHT(MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2),1)),MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2),"Invalid")
In general, it is always a good idea to show some code in StackOverflow. Thus, you show that you have tried something and you give some directions for the answer.
Concerning the first two digits extract, there are many ways to do this. Starting from RegEx and finishing with a simple looping of the chars and checking each one of them.
This is the loop option:
Public Function ExtractTwoDigits(inputString As String) As Long
    Application.Volatile
    Dim cnt As Long
    Dim curChar As String
    Dim digits As String                  ' digits collected so far
    ExtractTwoDigits = -1                 ' default: fewer than two digits found
    For cnt = 1 To Len(inputString)
        curChar = Mid(inputString, cnt, 1)
        If IsNumeric(curChar) Then
            digits = digits & curChar
            If Len(digits) = 2 Then
                ExtractTwoDigits = CLng(digits)
                Exit Function
            End If
        End If
    Next cnt
End Function
Application.Volatile makes sure that the formula recalculates every time;
-1 is the answer if no two digits exist in the inputString;
IsNumeric checks whether the string inside is numeric;
As a further step, you may try to make the function a bit more robust, extracting the first 1, 3, 4 or 5 digits depending on a parameter that you pass in. Something like =ExtractTwoDigits("tarato123ra2",4), returning 1232.
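One possible shape for that parametrised version is sketched below. The function name ExtractDigits and the digitCount parameter are illustrative, not part of the original answer; the sketch collects digit characters even when other characters sit between them and limits the count to 9 so the result always fits in a Long.

```vba
' Hypothetical generalisation (name and digitCount parameter are illustrative):
' collects the first digitCount digit characters, skipping anything in between.
Public Function ExtractDigits(ByVal inputString As String, _
                              Optional ByVal digitCount As Long = 2) As Long
    Dim cnt As Long
    Dim collected As String
    Dim curChar As String

    ExtractDigits = -1                    ' returned when too few digits exist
    ' Up to 9 digits always fit in a Long; reject anything else
    If digitCount < 1 Or digitCount > 9 Then Exit Function

    For cnt = 1 To Len(inputString)
        curChar = Mid$(inputString, cnt, 1)
        If curChar Like "[0-9]" Then
            collected = collected & curChar
            If Len(collected) = digitCount Then
                ExtractDigits = CLng(collected)
                Exit Function
            End If
        End If
    Next cnt
End Function
```

With this sketch, =ExtractDigits("tarato123ra2",4) gives 1232, and omitting the second argument falls back to two digits.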
RegEx Version:
Public Function GetFirstTwoNumbers(ByVal strInput As String) As Integer
    Dim reg As New RegExp, matches As MatchCollection
    With reg
        .Global = True
        .Pattern = "(\d{2})"
    End With
    Set matches = reg.Execute(strInput)
    If matches.Count > 0 Then
        GetFirstTwoNumbers = matches(0)
    Else
        GetFirstTwoNumbers = -1
    End If
End Function
You have to enable "Microsoft VBScript Regular Expressions 5.5" under Tools -> References in the VBA editor. The pattern (\d{2}) matches two consecutive digits; the return value is that number, or -1 if there is no match.
Note: it only extracts 2 successive numbers.
If you place this function into a module, you can use it like normal formula.
Here is a great site to get into RegEx.

How to sort excel values with numbers in end

I have a macro which reads file names from a folder. The problem is that when the file names form a series like A1, A2, ..., A200.pdf, Excel reads them in the order A1, A10, A100, A101, ..., A109, A11, A110, ..., A119, A20.
How can I sort this so that the values in Excel appear in the same order as the file names in the folder, or is there a way I can sort in Excel itself?
You can sort this in Excel with a helper column. Create a new column and calculate the length of your filenames in that "=LEN(A1)". Then use two-level sort to sort your filenames. Data -> Sort: Use length in the first level and the filenames in the second level.
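The helper-column approach can also be scripted. The macro below is a sketch under several assumptions that are not in the original answer: the file names are in column A of a sheet named "Sheet1", starting at row 1 with no header, and column B is free for the helper values. The macro name is made up for illustration.

```vba
' Sketch only: assumes file names in column A of "Sheet1" from row 1,
' no header row, and that column B can be overwritten.
Sub SortByLengthThenName()
    Dim lastRow As Long
    With ThisWorkbook.Sheets("Sheet1")
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        ' Helper column: length of each file name (relative reference fills down)
        .Range("B1:B" & lastRow).Formula = "=LEN(A1)"
        ' Two-level sort: length first, then the name itself
        .Range("A1:B" & lastRow).Sort _
            Key1:=.Range("B1"), Order1:=xlAscending, _
            Key2:=.Range("A1"), Order2:=xlAscending, _
            Header:=xlNo
    End With
End Sub
```

Sorting by length only gives the folder order when all names share the same prefix and extension, as in the A1 ... A200.pdf series.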
Another option, you can use the RegEx object to extract the Numeric digits "captured" inside the file name.
Option Explicit
Sub SortFileNames()
    Dim i As Long
    With Sheets("Sheet1") ' replace "Sheet1" with your sheet's name
        For i = 1 To .Cells(.Rows.Count, "A").End(xlUp).Row
            .Range("B" & i).Value = RetractNumberwithRegex(.Range("A" & i)) ' call the Regex function
        Next i
    End With
End Sub
'========================================================================
Function RetractNumberwithRegex(Rng As Range) As String
    ' function uses the Regex object to extract the first numeric value in the cell
    Dim Reg1 As Object
    Dim Matches As Object
    Set Reg1 = CreateObject("vbscript.regexp")
    With Reg1
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "[0-9]{1,20}" ' any size numeric string (up to 20 digits length)
    End With
    Set Matches = Reg1.Execute(Rng.Value2)
    If Matches.Count <> 0 Then
        RetractNumberwithRegex = Matches.Item(0)
    End If
End Function
This happens, of course, because Windows Explorer and Excel use different sorting algorithms. Refer to this article if you want to understand why.
To solve your problem, one of the ways is to pull out only the numeric part of file names in a different cell (say column B) and then sort based on those numbers.
If I can assume that the pattern of the file names is AXXX.pdf, i.e. one letter A, then a number, then 4 characters for the file extension, you can use this formula:
=VALUE(MID(A1,2,LEN(A1)-5))
This works by pulling a number of characters out of the middle of the string. By assumption, the number starts at the 2nd position, which is why the second parameter is 2. To decide how many characters to pull, note that everything except 'A' (1 char) and '.pdf' (4 chars) makes up the number. So take the length of the whole name and subtract 5 characters. You get the number part, which you can sort on.
The best way is to change the file names in your Excel list to have leading zeroes. Instead of A19, refer to the file as A019 and it will sort correctly. Convert the file names using this formula in a helper column.
=Left($A2, 1) & Right("000" & Mid($A2, 2, 3), 3)
Note that the 3 zeroes and string lengths of 3 are all related to each other. To create fixed length numbers of 4 digits, just use 4 zeroes and increase both string lengths to 4.
Copy the formula down from row 2 to the end. Copy the helper column, paste Values in place and, when everything is perfect, replace the original column with the helper.
To accommodate a fixed number of characters following the number, the above formula may be tweaked. The formula below will accommodate 4 extra characters which follow the number, for example ".pdf" (including the period).
=Left($A2, 1) & Right("000" & Mid($A2, 2, 7), 7)

Split Text and number IN VBA

I have column header which I want to split
Heading
XA 2009
WW YY 2010
XXA 2011
I Want output like
XA,
WW YY,
XXA
Earlier I was using find function in excel which was working fine
=MID("XA 2009",1,FIND(" ","XA 2009",FIND(" ","XA 2009")+1)-1)
OUTPUT AS XA,
WW YY
Now the requirement has changed and I need to do this in VBA.
I was trying to use InStr() instead of FIND, as FIND does not work in VBA:
Mid("XA 2009", 1, InStr(1, "XA 2009", " ", InStr(1, "XA 2009", "2")) - 1)
Now the output is XA,
WW instead of WW YY.
Can anyone suggest what I am doing wrong? I am pretty new to VBA.
I am using excel 2013
First, for the general approach and prerequisites for using Regex in VBA, see the answers to the following SO question:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Now, as for your specific requirement, try the following pattern:
(\D*)\s+\d*\s+(\D*)\s+\d*\s+(\D*)\s+\d*
It will work for your precise example, but if you need the input string to be a bit more general you might need to modify the pattern.
Some explanations:
\D* will match zero or more non-digit characters (here, the text part)
\s+ will match at least one whitespace character
\d* will match zero or more numerical digits
the (parentheses) are for grouping sets of results, so I used them to surround what you wish to extract from the input string.
If for example you know for sure that there's only one white-space character you can use:
[\s]
So the pattern might look like:
(\D*)[\s]\d*[\s](\D*)[\s]\d*[\s](\D*)[\s]\d*
Also, this is a great tool for online pattern testing:
https://regex101.com/
This is the solution for your edited requirement:
In the VBA editor, go to tools=>references, find and select the checkbox next to "Microsoft VBScript Regular Expressions 5.5", press ok
add this code to "ThisWorkbook" module:
Private Sub solution()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim myInput As Range
    Dim myOutput As Range
    Set myInput = ActiveSheet.Range("A1")
    Set myOutput = ActiveSheet.Range("A2")
    strPattern = "(\D*)[\s]\d*[\s](\D*)[\s]\d*[\s](\D*)[\s]\d*"
    strInput = myInput.Value
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With
    If regEx.Test(strInput) Then
        ' write the three captured text groups to the output cell
        myOutput.Value = regEx.Replace(strInput, "$1, $2, $3")
    End If
End Sub
You probably had that formula in a specific cell, right? I think the following should do:
Range("yourcell").Formula = "=MID(""XA 2009"",1,FIND("" "",""XA 2009"",FIND("" "",""XA 2009"")+1)-1)"
Just replace "yourcell" with the address of the cell you had the formula in; note that quotes inside a VBA string must be doubled. So if you had it in cell A1, for example, it should be Range("A1").
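For headings that always end in a space followed by the year, as in the examples from the question, a plain string function may be enough and avoids the nested InStr calls. The sketch below assumes exactly that layout; the function name TextBeforeLastSpace is made up for illustration.

```vba
' Hypothetical helper: returns everything before the last space,
' assuming the heading always ends in " <number>" as in the examples.
Public Function TextBeforeLastSpace(ByVal heading As String) As String
    Dim pos As Long
    heading = Trim$(heading)
    pos = InStrRev(heading, " ")          ' position of the last space
    If pos > 0 Then
        TextBeforeLastSpace = Left$(heading, pos - 1)
    Else
        TextBeforeLastSpace = heading     ' no space: nothing to strip
    End If
End Function
```

InStrRev searches from the end of the string, so "WW YY 2010" is cut at the space before the year rather than at the first space.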

Pull Site Code from Location Name (VBA)

So I have a customer that needs a specific code isolated from the name of each location. I have the following formula that I have been manually editing, but I was wondering if there is a way to have it count the characters in a cell and pull the code into a new cell.
Example Location Name: MRI-LENOX HILL RADIOLOGY 150/14101
=RIGHT(A1,FIND("/",A1)-19)
The code format is 0123/01234 (3 to 4 characters in front of the slash and 5 after)
Any help in this regard would be much appreciated.
Thanks,
Justin Hames
You can use a regex to find and extract the code from the cell value. For example:
With CreateObject("VBScript.RegExp")
    .Pattern = "\d{3,4}/\d{5}"
    If .Test(Range("A1")) Then
        Range("B1") = .Execute(Range("A1"))(0)
    End If
End With
This will extract the code from A1 and place it into B1.
Edit, with respect to comments:
To run on a range of cells:
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "\d{3,4}/\d{5}"
Dim r As Range
For Each r In Range("A1:A100")
    If re.Test(r) Then r.Offset(0, 1) = re.Execute(r)(0)
Next

Extract 5-digit number from one column to another

I need help with extracting 5-digit numbers only from one column to another in Excel 2010. These numbers can be in any position of the string (beginning of the string, anywhere in the middle, or at the end). They can be within brackets or quotes like:
(15478) or "15478" or '15478' or [15478]
I need to ignore any numbers that are less than 5 digits and include numbers that start with 1 or more leading zeros (like 00052, 00278, etc.) and ensure that leading zeros are copied over to the next column. Could someone help me with either creating a formula or UDF?
Here is a formula-based alternative that will extract the first 5 digit number found in cell A1. I tend to prefer reasonably simple formula solutions over VBA in most situations as formulas are more portable. This formula is an array formula and thus must be entered with Ctrl+Shift+Enter. The idea is to split the string up into every possible 5 character chunk and test each one and return the first match.
=MID(A1,MIN(IF(NOT(ISERROR(("1"&MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE)),5)&".1")*1))*ISERROR(MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))+5,1)*1)*ISERROR(MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))-1,1)*1),ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE)),9999999999)),5)
Let's break this down. First we have an expression, used several times in the formula, that returns an array of numbers from 1 up to 4 less than the length of your initial text. So if you have a string of length 10, the following will return {1,2,3,4,5,6}. Hereafter the below formula will be referred to as rowlist. I used R1C1 notation to avoid potential circular references.
ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))
Next we will use that array to split the text into an array of 5 letter chunks and test each chunk. The test being performed is to prepend a "1" and append ".1" then verify the chunk is numeric. The prepend and append eliminate the possibility of white space or decimals. We can then check the character before and the character after to make sure they are not numbers. Hereafter the below formula will be referred to as isnumarray.
NOT(ISERROR(("1"&MID(A1,rowlist,5)&".1")*1))
*ISERROR(MID(A1,rowlist+5,1)*1)
*ISERROR(MID(A1,rowlist-1,1)*1)
Next we need to find the first valid 5 digit number in the string by returning the current index from a duplicate of the rowlist formula and returning a large number for non-matches. Then we can use the MIN function to grab that first match. Hereafter the below will be referred to as minindex.
MIN(IF(isnumarray,rowlist,9999999999))
Finally we need to grab the numeric string that started at the index returned by the MIN function.
MID(A1,minindex,5)
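The same sliding-window idea can be written as a short VBA loop, which may be easier to follow than the array formula. This is a sketch, not a translation of the formula: the function name FirstFiveDigitRun is made up for illustration, and it uses the Like operator instead of the "1"&chunk&".1" numeric test.

```vba
' Sketch of the sliding-window test in VBA (function name is illustrative).
Public Function FirstFiveDigitRun(ByVal s As String) As String
    Dim i As Long
    Dim precededByDigit As Boolean
    For i = 1 To Len(s) - 4
        If Mid$(s, i, 5) Like "#####" Then               ' five digits in a row
            precededByDigit = False
            If i > 1 Then precededByDigit = (Mid$(s, i - 1, 1) Like "#")
            ' Mid$ past the end of the string returns "", which is not a digit
            If Not precededByDigit And Not (Mid$(s, i + 5, 1) Like "#") Then
                FirstFiveDigitRun = Mid$(s, i, 5)        ' string keeps leading zeros
                Exit Function
            End If
        End If
    Next i
    FirstFiveDigitRun = ""                               ' no match
End Function
```

Returning a String rather than a number is what preserves leading zeros such as 00052.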
The following UDF will return the first five-digit number in the string, including any leading zeros. If you need to detect whether there is more than one five-digit number, the modifications are trivial. It will return a #VALUE! error if there are no five-digit numbers.
Option Explicit
Function FiveDigit(S As String, Optional index As Long = 0) As String
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Pattern = "(?:\b|\D)(\d{5})(?:\b|\D)"
        .Global = True
        FiveDigit = .Execute(S)(index).submatches(0)
    End With
End Function
As you may see from the discussion between Mark and myself, some of your specifications are unclear. But if you would want to exclude decimal numbers, when the decimal portion has five digits, then the regex pattern in my code above should be changed:
.Pattern = "(?:\d+\.\d+)|(?:\b|\D)(\d{5})(?:\b|\D)"
I just wrote this UDF for you. It's basic, but it will do the job.
It finds the first 5 consecutive digits in a string. The error checking is very crude: it just returns "Error" if no such run is found.
Public Function GET5DIGITS(value As String) As String
    Dim sResult As String
    Dim iLen As Integer
    Dim i As Long
    sResult = ""
    iLen = 0
    For i = 1 To Len(value)
        If IsNumeric(Mid(value, i, 1)) Then
            ' extend the current run of digits
            sResult = sResult & Mid(value, i, 1)
            iLen = iLen + 1
        Else
            ' non-digit: start a new run
            sResult = ""
            iLen = 0
        End If
        If iLen = 5 Then Exit For
    Next
    If iLen = 5 Then
        GET5DIGITS = Format(sResult, "00000")
    Else
        GET5DIGITS = "Error"
    End If
End Function