How do I remove the characters? - vba

How do I remove special characters and alphabets in a string ?
qwert1234*90)! ' this might be my cell value
I have to convert it to
123490 ' I mean I have to remove everything but keep only the numbers in string
but it should allow spaces !
qwe123 4567*. 90 ' String with spaces
123 4567 90 ' output should be
I found the vba Replace - but writing a replace for each character makes my code big. Well let me tell you clearly without hiding anything from you:
input: qwe123 4567*. 90 ' String with spaces cells(1,"A").value
My idea to do these next: 123 4567 90 ' remove characters first keeping white spaces
final output in A1:A3
123
4567
90
(for every space it should insert row and fill that)
Could you tell me how do remove all characters except numbers and spaces in string?
Thanks In advance

You need to use a regular expression.
See this example:
Option Explicit
Sub Test()
Const strTest As String = "qwerty123 456 uiops"
MsgBox RE6(strTest)
End Sub
Function RE6(strData As String) As String
Dim RE As Object, REMatches As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False
.Global = True
.IgnoreCase = True
.Pattern = "([0-9]| )+"
End With
Set REMatches = RE.Execute(strData)
RE6 = REMatches(0)
End Function
Explanation:
Pattern = "([0-9]| )+" will match any 0 or more group (+) containing a number ([0-9]) or (|) a space ().
Some more info on the regexp:
a thread on ozgrid
a very good reference about regexp

Non-re alternative;
Public Function fmt(sValue As String) As String
Dim i As Long
For i = 1 To Len(sValue) '//loop each char
Select Case Mid$(sValue, i, 1) '//examine current char
Case "0" To "9", " " '//permitted chars
'//ok
Case Else
Mid$(sValue, i, 1) = "!" '//overwrite char in-place with "!"
End Select
Next
fmt = Replace$(sValue, "!", "") '//strip invalids & return
End Function
For:
?fmt("qwe123 4567*. 90")
123 4567 90

Those two funny codes will do both of your whishes..
Sub MySplitter(strInput As String)
Row = 10 ' Start row
Col = "A" ' Column Letter
Range(Col & Row) = "" ' Clean the start cell
For i = 1 To Len(strInput) ' Do with each Character in input string...
c = Mid(strInput, i, 1) ' Get actual char
If IsNumeric(c) Then Range(Col & Row) = Range(Col & Row) & c ' If numeric then append to actual cell
If (c = " ") And (Range(Col & Row) <> "") Then 'If space and actual row is not empty then...
Row = Row + 1 ' Jump to next row
Range(Col & Row) = "" ' Clean the new cell
End If
Next
End Sub
Function KeepNumbersAndSpaces(ByVal strInput As String)
For i = 1 To Len(strInput) ' Do with each Character in input string...
c = Mid(strInput, i, 1) ' Get actual char
If IsNumeric(c) Or c = " " Then ' If numeric or a space then append to output
KeepNumbersAndSpaces = KeepNumbersAndSpaces & c
End If
Next
End Function
Sub Test()
strInput = "qwert1234*90)! qwe123 4567*. 90"
MySplitter (strInput)
Range("A5") = KeepNumbersAndSpaces(strInput)
End Sub

Something like this to
split the string using a regexp
place the matches into an array
dump the array to an automatically sized spreadsheet range
main sub
Sub CleanStr()
Dim strOut As String
Dim Arr
strOut = Trim(KillChar("qwe123 4567*. 90 "))
Arr = Split(strOut, Chr(32))
[a1].Resize(UBound(Arr) + 1, 1) = Application.Transpose(Arr)
End Sub
function
Function KillChar(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[^\d\s]+"
KillChar = .Replace(strIn, vbNullString)
End With
End Function

Related

Excel VBA - Split string with variable delimiter count

How I can split the string with variable delimiter count:
s = "a1 b2 c d e"
into array:
arr(1) = "a1"
arr(2) = "b2"
arr(4) = "c"
arr(5) = "d"
arr(6) = "e"
The split-function does not give a desired result:
arr = Split(s, " ")
Thanks!
Use WorksheetFunction.Trim to remove leading and trailing spaces, as well as extra inner spaces.
Dim s As String '<~ don't use Str
s = "a b c d e"
s = WorksheetFunction.Trim(s)
The pure VBA approach is to use a loop and the replace function
Public Function Dedup(ByVal ipSource As String, ByVal ipDedup As String) As String
Dim mySource As String
mySource = ipSource
Dim MyDedupDedup As String
MyDedupDedup = ipDedup & ipDedup
Do
DoEvents ' Always put a doevents in a Do loop
Dim myLen As Long
myLen = Len(mySource)
mySource = Replace(mySource, MyDedupDedup, ipDedup)
Loop Until myLen = Len(mySource)
Dedup = mySource
End Function
If you have leading or trailing characters you can use a more flexible trim function
Public Function Trimmer(ByVal ipString As String, Optional ByVal ipTrimChars As String = " ,;" & vbCrLf & vbTab) As String
Dim myString As String
myString = ipString
Dim myIndex As Long
For myIndex = 1 To 2
If VBA.Len(myString) = 0 Then Exit For
Do While VBA.InStr(ipTrimChars, VBA.Left$(myString, 1)) > 0
DoEvents ' Always put a do event statement in a do loop
myString = VBA.Mid$(myString, 2)
Loop
myString = VBA.StrReverse(myString)
Next
Trimmer = myString
End Function

Insert space between text and number for cells within column

I've already written a code that inserts a space between text and numbers, separating 'unspaced' days and months from dates, and it works as it's supposed to.
The only problem is that I'm using an If then structure to determine which Regular Expressions pattern I should use.
If the first character of the date is a number, then knowing that it is in the 'DayMonth' sequence, I use this pattern: "(.*\d)(?! )(\D.*)". Otherwise, assuming that it isn't in the 'DayMonth' sequence but rather in the 'MonthDay' sequence, I use the other pattern: "(.*\D)(?! )(\d.*)".
Is there any way to use two patterns at once for the Regular Expressions object to scan through so that I can get rid of the If Then structure?
My code below:
Sub SpaceMonthDayIf()
Dim col As Range
Dim i As Long
Set col = Application.InputBox("Select Date Column", "Obtain Object Range", Type:=8)
With CreateObject("VBScript.RegExp")
For i = 1 To Cells(Rows.Count, col.Column).End(xlUp).Row
If IsNumeric(Left(Cells(i, col.Column).Value, 1)) Then
.Pattern = "(.*\d)(?! )(\D.*)"
Cells(i, col.Column) = .Replace(Cells(i, col.Column), "$1 $2")
Else
.Pattern = "(.*\D)(?! )(\d.*)"
Cells(i, col.Column) = .Replace(Cells(i, col.Column), "$1 $2")
End If
Next
End With
End Sub
For clarity, here's what happens when I run my code:
Try this code
Sub Test()
Dim a, i As Long
With Range("A2", Range("A" & Rows.Count).End(xlUp))
a = .Value
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "(\d+)"
For i = 1 To UBound(a, 1)
a(i, 1) = Application.Trim(.Replace(a(i, 1), " $1 "))
Next i
End With
.Columns(2).Value = a
End With
End Sub
You can avoid that by inserting your space differently. Here is a Function written with early-binding, but you can change that to late-binding.
Match the junction between a letter and a number, then construct a string, inserting a space appropriately.
Option Explicit
Function InsertSpace(S As String) As String
Const sPat As String = "[a-z]\d|\d[a-z]"
Dim RE As RegExp, MC As MatchCollection
Set RE = New RegExp
With RE
.Global = False
.Pattern = sPat
.IgnoreCase = True
If .Test(S) = True Then
Set MC = .Execute(S)
With MC(0)
InsertSpace = Left(S, .FirstIndex + 1) & " " & Mid(S, .FirstIndex + 2)
End With
End If
End With
End Function
You can also accomplish this without using Regular Expressions:
EDIT Pattern change for Like operator
Option Explicit
Option Compare Text
Function InsertSpace2(S As String) As String
Dim I As Long
For I = 1 To Len(S)
If Mid(S, I, 2) Like "#[a-z]" Or Mid(S, I, 2) Like "[a-z]#" Then
InsertSpace2 = Left(S, I) & " " & Mid(S, I + 1)
Exit Function
End If
Next I
End Function

using VBA, how to remove the addidtional repeated characters at last

I have multiple rows, I need to join with "##" characters at end of every cell value,
I am able add this characters but at the end it is printing extra characters(##)
My excel file: From this excel file, I need to join the values by ## for each cell and feed into notepad
Excel File for Input
My output should be:(Actual and Expected)
Actual and Expected Output
Here Is my code:
sub join()
dim LRow as long
dim LCol as long
Dim str1 as string
Dim str2 as string
Dim ws1 As Worksheet
Set ws1 = ActiveWorkbook.Worksheets(1)
plik = ThisWorkbook.Path & "\" & "BL2ASIS" & ws1.Name & ".txt"
Open plik For Output As 2
With ws1
LRow = .Cells(.Rows.Count, 1).End(xlUp).Row
LCol = .Cells(1, .Columns.Count).End(xlToLeft).Column
LCol = LCol - 2
slast = vbNullString
str2 = Join(Application.Transpose(Application.Transpose(.Cells(n, "A").Resize(1, 2).Value)), "")
str1 = str2 & Join(Application.Transpose(Application.Transpose(.Cells(n, "C").Resize(1, LCol).Value)), "##") & "##"
str1 = Replace(str1, "=", vbNullString)
str1 = Replace(str1, "####", "##")
Print #2, str1
End with
end sub
You could replace your line:
str1 = Replace(str1, "####", "##")
with:
Do Until Len(str1) = Len(Replace(str1, "####", "##"))
str1 = Replace(str1, "####", "##")
Loop
which will keep applying the replace until there is no point in doing so (i.e. the length doesn't change)
EDIT
Sorry to alter an accepted answer, but I've noticed that you might want to keep instances of #### if they occur somewhere other than at the end of the row. If you do then the following would be better, as it only trims the right-most characters:
Do Until Right(str1, 4) <> "####"
str1 = Left(str1, Len(str1) - 2)
Loop
The reason you are getting repeated characters is because you are joining empty array elements. An alternative to removing repeated delimiters is to use a UDF to only join non null values. Please see below for such a function.
Sub TestJoin()
Dim r As Range: Set r = Worksheets("Sheet1").Range("B1:B12")
Dim arr() As Variant
arr = Application.Transpose(r)
Debug.Print NonNullJoin(arr, "#") & "#"
End Sub
Function NonNullJoin(SourceArray() As Variant, Optional Delimiter As String = " ") As String
On Error Resume Next
Dim i As Long: For i = 0 To UBound(SourceArray)
If CStr(SourceArray(i)) <> "" Then NonNullJoin = _
IIf(NonNullJoin <> "", NonNullJoin & Delimiter & CStr(SourceArray(i)), CStr(SourceArray(i)))
Next i
End Function
Use regex to replace more than one instance.
Note:
If you want to replace repeats only at the end of the string then change regex pattern to (##){2,}$ . This will deal with 2 or more occurrences.
If only worried about two occurrences at end, use (##)\1$
Code:
Option Explicit
Sub TEST()
Dim testString As String, pattern As String
testString = "xxxxx####"
testString = RemoveChars(testString)
Debug.Print testString
End Sub
Public Function RemoveChars(ByVal inputString As String) As String
Dim regex As Object, tempString As String
Set regex = CreateObject("VBScript.RegExp")
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = "(##){2,}"
End With
If regex.TEST(inputString) Then
RemoveChars = regex.Replace(inputString, "##")
Else
RemoveChars = inputString
End If
End Function

Deleting duplicate text in a cell in excel

I was wondering how to remove duplicate names/text's in a cell. For example
Jean Donea Jean Doneasee
R.L. Foye R.L. Foyesee
J.E. Zimmer J.E. Zimmersee
R.P. Reed R.P. Reedsee D.E. Munson D.E. Munsonsee
While googling, I stumbled upon a macro/code, it's like:
Function RemoveDupes1(pWorkRng As Range) As String
'Updateby20140924
Dim xValue As String
Dim xChar As String
Dim xOutValue As String
Set xDic = CreateObject("Scripting.Dictionary")
xValue = pWorkRng.Value
For i = 1 To VBA.Len(xValue)
xChar = VBA.Mid(xValue, i, 1)
If xDic.exists(xChar) Then
Else
xDic(xChar) = ""
xOutValue = xOutValue & xChar
End If
Next
RemoveDupes1 = xOutValue
End Function
The macro is working, but it is comparing every letter, and if it finds any repeated letters, it's removing that.
When I use the code over those names, the result is somewhat like this:
Jean Dos
R.L Foyes
J.E Zimers
R.P edsDEMuno
By looking at the result I can make out it is not what I want, yet I got no clue how to correct the code.
The desired output should look like:
Jean Donea
R.L. Foye
J.E. Zimmer
R.P. Reed
Any suggestions?
Thanks in Advance.
Input
With the input on the image:
Result
The Debug.Print output
Regex
A regex can be used dynamically iterating on the cell, to work as a Find tool. So it will extract only the shortest match. \w*( OUTPUT_OF_EXTRACTELEMENT )\w*, e.g.: \w*(Jean)\w*
The Regex's reference must be enabled.
Code
Function EXTRACTELEMENT(Txt As String, n, Separator As String) As String
On Error GoTo ErrHandler:
EXTRACTELEMENT = Split(Application.Trim(Mid(Txt, 1)), Separator)(n - 1)
Exit Function
ErrHandler:
' error handling code
EXTRACTELEMENT = 0
On Error GoTo 0
End Function
Sub test()
Dim str As String
Dim objMatches As Object
Set objRegExp = CreateObject("VBScript.RegExp") 'New regexp
lastrow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
For Row = 1 To lastrow
str = Range("A" & Row)
F_str = ""
N_Elements = UBound(Split(str, " "))
If N_Elements > 0 Then
For k = 1 To N_Elements + 1
strPattern = "\w*(" & EXTRACTELEMENT(CStr(str), k, " ") & ")\w*"
With objRegExp
.Pattern = strPattern
.Global = True
End With
If objRegExp.test(strPattern) Then
Set objMatches = objRegExp.Execute(str)
If objMatches.Count > 1 Then
If objRegExp.test(F_str) = False Then
F_str = F_str & " " & objMatches(0).Submatches(0)
End If
ElseIf k <= 2 And objMatches.Count = 1 Then
F_str = F_str & " " & objMatches(0).Submatches(0)
End If
End If
Next k
Else
F_str = str
End If
Debug.Print Trim(F_str)
Next Row
End Sub
Note that you can Replace the Debug.Print to write on the target
cell, if it is column B to Cells(Row,2)=Trim(F_str)
Explanation
Function
You can use this UDF, that uses the Split Function to obtain the element separated by spaces (" "). So it can get every element to compare on the cell.
Loops
It will loop from 1 to the number of elements k in each cell and from row 1 to lastrow.
Regex
The Regex is used to find the matches on the cell and Join a new string with the shortest element of each match.
This solution operates on the assumption that 'see' (or some other three-letter string) will always be on the end of the cell value. If that isn't the case then this won't work.
Function RemoveDupeInCell(dString As String) As String
Dim x As Long, ct As Long
Dim str As String
'define str as half the length of the cell, minus the right three characters
str = Trim(Left(dString, WorksheetFunction.RoundUp((Len(dString) - 3) / 2, 0)))
'loop through the entire cell and count the number of instances of str
For x = 1 To Len(dString)
If Mid(dString, x, Len(str)) = str Then ct = ct + 1
Next x
'if it's more than one, set to str, otherwise error
If ct > 1 Then
RemoveDupeInCell = str
Else
RemoveDupeInCell = "#N/A"
End If
End Function

RegExp - Replace everything in string that does NOT match pattern with nothing

I am looking for a way to negate a previously set matching pattern in order to pull out everything that is in between two characters.
I have the following code matching comments in SQL code in the "/* comment */" format. It will pick up the original code in column A and then strip the comments, placing the trimmed string in column B:
Sub FindComments()
Dim xOutArr As Variant
Dim RegEx As Object
Dim xOutRg As Range
Dim SQLString As Variant
Dim i As Integer
Dim lr As Long
lr = Worksheets("Sheet1").Cells(Rows.count, "A").End(xlUp).Row
For i = 2 To lr
SQLString = Worksheets("Sheet1").Cells(i, "A").Value
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "(/\*(.*?)\*/)"
End With
If RegEx.test(SQLString) Then
SQLString = RegEx.replace(SQLString, "")
End If
Set RegEx = Nothing
xOutArr = VBA.Split(SQLString, ";")
Set xOutRg = Worksheets("Sheet1").Range("B" & (Worksheets("Sheet1").Cells(Rows.count, "B").End(xlUp).Row + 1))
xOutRg.Range("A1").Resize(UBound(xOutArr) + 1, 1) = Application.WorksheetFunction.Transpose(xOutArr)
Next i
End Sub
The code above will find anything written in between "/* " and " */" and then remove it, but I want to be able to also pull out anything that is in between two characters. I need to be able to match everything that does not satisfy that pattern (or some other pattern like "< comment >"). This includes line breaks, etc etc. This is specifically for VBA, and it needs to be able to search the entire string for any and all instances that that pattern appears. My goal is to put the contents in between those characters (in the pattern) into column C.
What would be the RegExp pattern for this?
Examples of SQLString would be:
1) /* Step 1 */ Select * from dual ;
2) /* Step 2 */ Select * from dual ; /* Step 3 */ Select * from Table
I am capturing the SQL code by removing the "/* Step # */" but I want to capture what is in those comments as well (in Column C). 1) and 2) are single rows. 2) has multiple queries. Each row is getting split by ";" in order to run queries one by one.
Instead of using Test you can use Match to get all matching strings from the SQL: loop over the match collection, storing each one in Col C and use Replace() to remove it from the original SQL:
Sub Tester()
ExtractComments Range("A1")
End Sub
Sub ExtractComments(c As Range)
Dim re As Object
Dim allMatches, m, txt, comm
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "(/\*(.*?)\*/)"
re.ignorecase = True
re.MultiLine = True
re.Global = True
txt = c.Value
Set allMatches = re.Execute(txt)
For Each m In allMatches
comm = comm & IIf(Len(comm) > 0, vbLf, "") & m
txt = Replace(txt, m, "")
Debug.Print Trim(m)
Next m
c.Offset(0, 1).Value = txt
c.Offset(0, 2).Value = comm
End Sub