How to find and copy specific text inside a string in VBA?

I have a bunch of strings that I need to extract phone numbers from. How can I pull them out of a string and paste them into a worksheet, given that they all have the format
(??) ????-???? where each ? is a digit from 0 to 9, and that a single string may contain multiple phone numbers?
Example:
"Acreaves Alimentos. Rodovia Do Pacifico, (68) 3546-4754 Br 317, Km 8, S/N - Zona Rura... Brasileia - AC | CEP: 69932-000. (68) 3546-5544. Enviar "
would return (68) 3546-4754 and (68) 3546-5544

Here is a snippet that sets up a regular expression for the format you specified, searches the string, and shows a MsgBox for each match it finds.
You need to add the Microsoft VBScript Regular Expressions 5.5 reference (Tools->References); otherwise the RegExp object cannot be created.
The pattern matches an opening bracket (escaped with a \ because it otherwise has special meaning in a regular expression), two digits in the range 0-9, a closing bracket (escaped again), \s for a whitespace character, four digits in the range 0-9, a dash (also escaped) and the final four digits.
Don't forget to set the regex Global property to True so that it returns all matches.
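Sub FindPhoneNumbers()
    Dim sString As String, oReg As RegExp
    Dim Matches As MatchCollection, oMatch As Match
    sString = "Acreaves Alimentos. Rodovia Do Pacifico, (68) 3546-4754 Br 317, Km 8, S/N - Zona Rura... Brasileia - AC | CEP: 69932-000. (68) 3546-5544 . Enviar"
    Set oReg = New RegExp
    oReg.Global = True 'return every match, not only the first
    'Matches the format (dd) dddd-dddd
    oReg.Pattern = "\([0-9]{2}\)\s[0-9]{4}\-[0-9]{4}"
    Set Matches = oReg.Execute(sString)
    For Each oMatch In Matches
        MsgBox oMatch.Value
    Next
End Sub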
That should do what you require, based on your details and the string you provided.
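Since the question also asks to paste the numbers into a worksheet, here is a minimal sketch of the same regex approach that writes each match to a cell instead of showing a MsgBox. The procedure name, the target (column A of the active sheet) and the use of late binding via CreateObject (so that no reference has to be set) are assumptions for illustration, not part of the answer above.

```vba
Sub ExtractPhonesToSheet()
    ' Hypothetical helper: writes every phone number found to column A.
    Dim re As Object, matches As Object, m As Object
    Dim s As String, r As Long

    s = "Acreaves Alimentos. Rodovia Do Pacifico, (68) 3546-4754 Br 317, " & _
        "Km 8, S/N - Zona Rura... Brasileia - AC | CEP: 69932-000. (68) 3546-5544. Enviar "

    ' Late binding: works without the Tools->References entry
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "\(\d{2}\)\s\d{4}-\d{4}"

    Set matches = re.Execute(s)
    r = 1
    For Each m In matches
        ' Assumed output location: A1, A2, ... on the active sheet
        ActiveSheet.Cells(r, 1).Value = m.Value
        r = r + 1
    Next m
End Sub
```

With the example string this should fill A1 and A2 with the two phone numbers.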

If the format actually stays the same throughout you can try something like this:
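a = "Acreaves Alimentos. Rodovia Do Pacifico, (68) 3546-4754 Br 317, Km 8, S/N - Zona Rura... Brasileia - AC | CEP: 69932-000. (68) 3546-5544. Enviar "
arrNums = Split(a, "(")
'Element 0 is the text before the first "(", so start at 1
For i = 1 To UBound(arrNums)
    'Restore the "(" removed by Split and keep the 13 characters that follow it
    num = "(" & Left(arrNums(i), 13)
    Debug.Print num
Next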

This function will return an array containing the numbers:
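Function ReturnNumbers(s As String) As Variant
    's is the parameter, so it must not be declared again here
    Dim a As Variant, r As Variant, i As Long
    a = Split(s, "(")
    'No "(" in the string: return an empty array
    If UBound(a) < 1 Then
        ReturnNumbers = Array()
        Exit Function
    End If
    ReDim r(1 To UBound(a, 1))
    For i = 1 To UBound(a, 1)
        r(i) = "(" & Left(a(i), 13)
    Next
    ReturnNumbers = r
End Function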

Related

How to split an unicode-string to readable characters?

I have a VBA function to split a string and add a space between each character. It works fine only for an ASCII string, but I want to do the same for the Tamil language. Since it is Unicode, the result is not readable. It splits even the auxiliary characters (upper dots, prefix and suffix auxiliary characters), which should not be separated in Tamil/Hindi/Kannada/Malayalam/all Indian languages. So how do I write a function to split a Tamil word into readable characters?
Function AddSpace(Str As String) As String
Dim i As Long
For i = 1 To Len(Str)
AddSpace = AddSpace & Mid(Str, i, 1) & " "
Next i
AddSpace = Trim(AddSpace)
End Function
Adding Space is not the important point of this question. Splitting the Unicode string into an array from any of those languages is the requirement.
For example, the word, "பார்த்து" should be separated as "பா ர் த் து", not as "ப ா ர ் த ் த ு". As you can see, the first two letters "பா" (ப + ா) are combined. If I try to manually put a space in between them, I can't do it in any word processor. If you want to test, please put it in Notepad and add space between each character. It won't allow you to separate as ("ப ா"). So "பார்த்து" should be separated as "பா ர் த் து". It is the correct separation in Tamil like languages. This is the one that I am struggling to achieve in VBA.
The Character Code table for Tamil is here.
Tamil, Hindi and many other Indian languages have (1) consonants, (2) independent vowels, (3) dependent vowel signs and (4) two-part dependent vowel signs. Of these four types, the first two are each a separate letter, so there are no issues with them. The last two are dependent and should not be separated from the character they are joined to. For example, the letter பா (ப + ா) contains one independent (ப) and one dependent (ா) letter.
If this info is not enough, please comment with what else I should post.
(Note: It is possible in C#.Net using the code from the MS link by @Codo)
You can assign a string to a Byte array, so the following might work:
Dim myBytes() As Byte
myBytes = "Tamilstring"
which generates two bytes for each character. You could then create a second byte array twice the size of the first, using Space$ to create a suitable string, and then use a For loop (Step 4) to copy two bytes at a time from the first array to the second. Finally, assign the byte array back to a string.
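The byte-array idea described above can be sketched as follows. The function name AddSpaceViaBytes is hypothetical. Note that this works on individual UTF-16 code units, so it inserts a space after every code unit exactly like the Mid-based loop in the question; on its own it does not keep dependent vowel signs attached to their base letter.

```vba
Function AddSpaceViaBytes(ByVal s As String) As String
    ' Hypothetical helper illustrating the byte-array approach.
    Dim src() As Byte, dst() As Byte
    Dim i As Long, result As String

    If Len(s) = 0 Then Exit Function

    src = s                      ' 2 bytes per character (UTF-16)
    dst = Space$(Len(s) * 2)     ' twice as many characters, pre-filled with spaces

    ' Copy each 2-byte character into every other 2-byte slot of dst,
    ' so the destination index advances by 4 bytes per character
    For i = 0 To UBound(src) - 1 Step 2
        dst(2 * i) = src(i)
        dst(2 * i + 1) = src(i + 1)
    Next i

    result = dst                 ' byte array back to a string
    AddSpaceViaBytes = RTrim$(result)
End Function
```

For an input such as "AB" this produces "A B".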
The problem you have is you are looking for what Unicode calls an extended grapheme cluster.
For a Unicode compatible regex engine that is simply /\X/
Not sure how you do that in VBA.
This is based on the link mentioned by @ScottCraner in the comments on the question and on the character codes for Tamil.
Check the result in cell A2; the cells highlighted in yellow are the dependent vowel signs used in the DepVow string.
Sub Split_Unicode_String()
'https://stackoverflow.com/questions/68774781/how-to-split-an-unicode-string-to-readable-characters
Dim my_string As String
'input string
Dim buff() As String
'array of input string characters
Dim DepVow As String
'Create string of Dependent vowel signs
Dim newStr As String
'result string with spaces as desired
Dim i As Long
my_string = Range("A1").Value
ReDim buff(Len(my_string) - 1) 'array of my_string characters
For i = 1 To Len(my_string)
buff(i - 1) = Mid$(my_string, i, 1)
Cells(1, i + 2) = buff(i - 1)
Cells(2, i + 2) = AscW(buff(i - 1)) 'used this for creating DepVow below
Next i
'Create a comma-separated string of dependent vowel signs
DepVow = "," & Join(Array(ChrW$(3006), ChrW$(3021), ChrW$(3009)), ",")
newStr = ""
i = LBound(buff)
Do While i <= UBound(buff)
    newStr = newStr & buff(i)
    'Only look ahead when there is a next character (avoids an out-of-range subscript)
    If i < UBound(buff) Then
        If InStr(1, DepVow, buff(i + 1), vbTextCompare) > 0 Then
            'Keep the dependent vowel sign attached to the preceding character
            i = i + 1
            newStr = newStr & buff(i)
        End If
    End If
    newStr = newStr & " "
    i = i + 1
Loop
'result string in range A2
Cells(2, 1) = Left(newStr, Len(newStr) - 1)
End Sub
Try the algorithm below, which concatenates each mark character with the letter character that precedes it.
Dim letters() As String, ch As String, i As Long, n As Long
n = -1 'index of the last element filled so far
For i = 1 To Len(Str)
    ch = Mid(Str, i, 1)
    If AscW(ch) > 3005 And AscW(ch) < 3022 And n >= 0 Then
        'Mark character: append it to the current letter
        letters(n) = letters(n) & ch
    Else
        'Letter character: start a new element
        n = n + 1
        ReDim Preserve letters(n)
        letters(n) = ch
    End If
Next
MsgBox Join(letters, ", ") 'shows பா, ர், த், து

Split cells into 2 columns when any number

I can see numerous posts around this topic but none that specifically solves the problem I have.
I have a string that has text and numbers. I need to split the string into 2 columns when it first sees a number.
Example:
Ballyvic Boru5/6
First Drift2/1
Sizing Cusimanoin15/2
Becomes:
You can use a simple formula to find the first number, along with LEFT and MID to split the string.
Part 1:
=LEFT(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890"))-1)
Part 2:
=MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),99)
Here's a regex method:
You must set a reference to Microsoft VBScript Regular Expressions x.x, where x.x is the highest version you have (mine is 5.5)
Option Explicit
Sub splitCells()
Dim RegEx As New RegExp, i As Long, tempStr As String
With RegEx
.Global = True
.IgnoreCase = True
.Pattern = "(([a-z]*\s?)*\D)(\d.*)"
End With
With ThisWorkbook.Worksheets(1)
For i = 1 To .Cells(.Rows.Count, 1).End(xlUp).Row
If RegEx.Test(.Cells(i, 1)) Then
tempStr = .Cells(i, 1)
.Cells(i, 1) = RegEx.Replace(tempStr, "$1")
.Cells(i, 2) = RegEx.Replace(tempStr, "$3")
End If
Next i
End With
End Sub
Breaking down the Regular Expression:
(([a-z]*\s?)*\D)(\d.*)
[a-z]* matches any letter of the alphabet, with the * quantifier allowing zero or more occurrences
\s? matches any whitespace character, with the ? quantifier allowing 0-1 occurrences (meaning there may or may not be a white space)
Both of the above are enclosed in a group (), followed by another * to match zero or more occurrences
\D matches a single character that is not a digit
All of the above is enclosed in the first group, (([..])*\D)
The final group, (\d.*), matches the first digit and everything after it.
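As a variation on the breakdown above, the two parts can also be read directly from the SubMatches of a single match rather than through two Replace calls. This is only a sketch: the function name SplitAtFirstDigit, the simpler anchored pattern ^(\D*)(\d.*)$ and the late-bound CreateObject call are assumptions, not part of the answer.

```vba
Function SplitAtFirstDigit(ByVal s As String) As Variant
    ' Hypothetical helper: returns Array(textBeforeFirstDigit, restOfString).
    Dim re As Object, m As Object

    Set re = CreateObject("VBScript.RegExp")
    ' Group 1: everything that is not a digit; group 2: first digit onwards
    re.Pattern = "^(\D*)(\d.*)$"

    If re.Test(s) Then
        Set m = re.Execute(s)(0)
        SplitAtFirstDigit = Array(m.SubMatches(0), m.SubMatches(1))
    Else
        ' No digit found: keep the whole string in the first part
        SplitAtFirstDigit = Array(s, "")
    End If
End Function

Sub DemoSplitAtFirstDigit()
    Dim parts As Variant
    parts = SplitAtFirstDigit("Sizing Cusimanoin15/2")
    Debug.Print parts(0) & "|" & parts(1)   ' prints Sizing Cusimanoin|15/2
End Sub
```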
Here's a pair of functions you can use on the worksheet (as opposed to having to run a VBA procedure to 'fix' the cells one time):
Public Function splitNum1(str As String) As String
Dim p As Long
For p = 1 To Len(str)
If Mid(str, p, 1) Like "#" Then Exit For
Next
splitNum1 = Left(str, p - 1)
End Function
Public Function splitNum2(str As String) As String
splitNum2 = Right(str, Len(str) - Len(splitNum1(str)))
End Function
splitNum1 returns the string on the "left" side of the number.
splitNum2 returns the string beginning with the first number.

vba excel - Find and replace on condition + multiple occurrences within same cell

I am trying to write some VBA code. I have three days of experience as a VBA programmer, so I am doing my best based on my Pascal programming experience.
The task: find a number in a hexadecimal string in Excel and check its position. If the position is odd, replace the number with a new number; if it is not odd, continue searching for other occurrences within the same string.
I have 15,000 hex strings that I need to search this way, in Range("B1:B15000").
Example:
Hexa string - Cell B1 - 53706167686574746920616c6c9261676c696f2c206f6c696f20652070657065726f63696e692537
translates to text - Spaghetti all�aglio, olio e peperocini
I want to replace 92 (�) with 65 (e). In the hex string there are multiple occurrences of the number 92, but only one of them falls at an odd position and should be replaced.
In Excel I tried the following:
=IF(ISODD(IF(ISERROR(SEARCH(92,B5)),0,SEARCH(92,B5)))=TRUE,SUBSTITUTE(B5,92,"27"),"no 92")
This works only for the first occurrence in the cell.
I tried modifying it to search further, but had no luck:
=IF(ISODD(IF(ISERROR(SEARCH(92,B6)),0,SEARCH(92,B6)))=TRUE,SUBSTITUTE(B6,92,"27"),IF(ISODD(IF(ISERROR(SEARCH(92,B6,SEARCH(92,B6)+1)),0,SEARCH(92,B6,SEARCH(92,B6)+1)))=TRUE,SUBSTITUTE(B6,92,"27"),"no 92"))
Any suggestions are welcome.
How about a small UDF, looking only at every second position?
Function replaceWhenAtOddPos(ByVal s As String, findStr As String, replaceStr As String) As String
    replaceWhenAtOddPos = s
    'Only two-character (one-byte) hex pairs are supported
    If Len(findStr) <> 2 Or Len(replaceStr) <> 2 Then Exit Function
    Dim i As Long
    'Step 2 visits only odd positions, i.e. the start of each hex pair
    For i = 1 To Len(s) Step 2
        If Mid(s, i, 2) = findStr Then s = Left(s, i - 1) & replaceStr & Mid(s, i + 2)
    Next i
    replaceWhenAtOddPos = s
End Function
call:
replaceWhenAtOddPos("53706167686574746920616c6c9261676c696f2c206f6c696f20652070657065726f63696e692537", "92", "65")
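To apply the same every-second-position idea to all 15,000 strings at once, a sketch like the following could be used. The procedure name, the output column C and the hard-coded "92"/"65" pair are assumptions for illustration; it also assumes that column B contains no error values. The range is read into an array and written back in one go, and the output cells are formatted as text so that a string consisting only of digits is not converted to a number.

```vba
Sub ReplaceAtOddPositionsInRange()
    ' Hypothetical batch version: B1:B15000 in, C1:C15000 out.
    Dim data As Variant, r As Long, i As Long, s As String

    ' Read the whole column into a 2-D array (1 To 15000, 1 To 1)
    data = ActiveSheet.Range("B1:B15000").Value

    For r = 1 To UBound(data, 1)
        s = CStr(data(r, 1))
        ' Odd positions only: each hex pair starts at 1, 3, 5, ...
        For i = 1 To Len(s) - 1 Step 2
            If Mid$(s, i, 2) = "92" Then Mid$(s, i, 2) = "65"
        Next i
        data(r, 1) = s
    Next r

    With ActiveSheet.Range("C1:C15000")
        .NumberFormat = "@"   ' keep the results as text
        .Value = data
    End With
End Sub
```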
Please put the following UDF in a standard module:
Public Function replace_hex(str As String, srch As Integer, repl As Integer) As String
    Dim pos As Long
    pos = InStr(pos + 1, str, CStr(srch))
    Do Until pos = 0
        'InStr positions are 1-based, so a hex pair starts at an odd position
        If pos Mod 2 = 1 Then str = Left(str, pos - 1) & CStr(repl) & Mid(str, pos + 2)
        pos = InStr(pos + 1, str, CStr(srch))
    Loop
    replace_hex = str
End Function
and call it in your worksheet like that:
=replace_hex(A1,92,65)

Substitute wildcard characters (? and *) in Excel 2010 vba macro

Using a macro in Excel 2010, I am trying to replace all "invalid" characters (as defined by a named range) with spaces.
Dim sanitisedString As String
sanitisedString = Application.WorksheetFunction.Clean(uncleanString)
Dim validCharacters As Range
Set validCharacters = ActiveWorkbook.Names("ValidCharacters").RefersToRange
Dim pos As Integer
For pos = 1 To Len(sanitisedString)
If WorksheetFunction.CountIf(validCharacters, Mid(sanitisedString, pos, 1)) = 0 Then
sanitisedString = WorksheetFunction.Replace(sanitisedString, pos, 1, " ")
End If
Next
It works for all characters except * and ?, because CountIf is interpreting those as wildcard characters.
I have tried escaping all characters in the CountIf, using:
If WorksheetFunction.CountIf(validCharacters, "~" & Mid(sanitisedString, pos, 1)) = 0
but this led to all characters being replaced, regardless of whether they are in the list or not.
I then tried doing two separate Substitute commands, placed after the for loop using "~*" and "~?":
sanitisedString = WorksheetFunction.Substitute(sanitisedString, "~*", " ")
sanitisedString = WorksheetFunction.Substitute(sanitisedString, "~?", " ")
but the * and ? still make it through.
What am I doing wrong?
Since there are only two wildcards to worry about, you can test for those explicitly:
Dim character As String
For pos = 1 To Len(sanitisedString)
character = Mid(sanitisedString, pos, 1)
If character = "*" Or character = "?" Then character = "~" & character
If WorksheetFunction.CountIf(validCharacters, character) = 0 Then
Mid$(sanitisedString, pos, 1) = " "
End If
Next
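An alternative that sidesteps wildcard handling altogether is to skip CountIf and test each character with InStr against a plain string of allowed characters. This is a different technique from the answer above, shown only as a sketch: the function name SanitiseWithInStr and the idea of passing the valid characters as one string (for example built by concatenating the cells of the ValidCharacters range) are assumptions. Unlike CountIf, InStr with vbBinaryCompare is case-sensitive.

```vba
Function SanitiseWithInStr(ByVal s As String, ByVal validChars As String) As String
    ' Hypothetical helper: replaces every character not found in validChars with a space.
    Dim pos As Long
    For pos = 1 To Len(s)
        ' InStr does a literal search, so * and ? have no special meaning here
        If InStr(1, validChars, Mid$(s, pos, 1), vbBinaryCompare) = 0 Then
            Mid$(s, pos, 1) = " "
        End If
    Next pos
    SanitiseWithInStr = s
End Function

Sub DemoSanitiseWithInStr()
    ' Only a, b and c are allowed, so * and ? become spaces
    Debug.Print SanitiseWithInStr("a*b?c", "abc")   ' prints a b c
End Sub
```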

VB.net Question with array search

I have an array of 10 lines, each consisting of a first name, a space, a last name, a space and a zip code. All the zip codes start with different digits. Is there a way to replace the "1" in the IndexOf call below so that it searches for any digit instead?
'open file
inFile = IO.File.OpenText("Names.txt")
'process the loop instruct until end of file
intSubscript = 0
Do Until inFile.Peek = -1 OrElse intSubscript = strLine.Length
strLine(intSubscript) = inFile.ReadLine
intSubscript = intSubscript + 1
Loop
inFile.Close()
intSubscript = 0
strFound = "N"
Do Until strFound = "Y" OrElse intSubscript = strLine.Length
intIndex = strLine(intSubscript).IndexOf("1")
strName = strLine(intSubscript).Substring(0, intIndex - 1)
If strName = strFullname Then
strFound = "Y"
strZip = strLine(intSubscript).Substring(strLine(intSubscript).Length - 5, 5)
txtZip.Text = strZip
End If
Loop
End Sub
Use a regular expression.
Regular expressions allow you to do pattern matching on text. It's like String.IndexOf() with wildcard support.
For example, suppose your source data looks like this:
James Harvey 10939
Madison Whittaker 33893
George Keitel 22982
...and so on.
Expressed in English, the pattern each line follows is this:
the beginning of the string, followed by
a sequence of 1 or more alphabetic characters, followed by
a sequence of one or more spaces, followed by
a sequence of 1 or more alphabetic characters, followed by
a sequence of one or more spaces, followed by
a sequence of 5 numeric digits, followed by
the end of the string
You can express that very precisely and succinctly in regex this way:
^([A-Za-z]+) +([A-Za-z]+) +([0-9]{5})$
Apply it in VB this way:
Dim sourceData As String = _
    "James Harvey 10939" & _
    vbCrLf & _
    "Madison Whittaker 33893" & _
    vbCrLf & _
    "George Keitel 22982"
'Named pattern rather than regex to avoid clashing with the Regex type
Dim pattern = "^([A-Za-z]+) +([A-Za-z]+) +([0-9]{5})$"
Dim re = New Regex(pattern)
Dim lineData As String() = sourceData.Split(vbCrLf.ToCharArray(), _
    StringSplitOptions.RemoveEmptyEntries)
For i As Integer = 0 To lineData.Length - 1
    System.Console.WriteLine("'{0}'", lineData(i))
    Dim matchResult As Match = re.Match(lineData(i))
    'Groups(3) is the third parenthesised group: the five-digit zip code
    If matchResult.Success Then
        System.Console.WriteLine(" zip: {0}", matchResult.Groups(3).ToString())
    End If
Next i
To get that code to compile, you must import the System.Text.RegularExpressions namespace at the top of your VB module, to get the Regex and Match types.
If your input data follows a different pattern, then you will need to adjust your regex.
For example if it could be "Chris McElvoy III 29828", then you need to adjust the regex accordingly, to handle the name suffix.