RegExp - Replace everything in string that does NOT match pattern with nothing - vba

I am looking for a way to negate a previously set matching pattern in order to pull out everything that is in between two characters.
I have the following code matching comments in SQL code in the "/* comment */" format. It will pick up the original code in column A and then strip the comments, placing the trimmed string in column B:
Sub FindComments()
Dim xOutArr As Variant
Dim RegEx As Object
Dim xOutRg As Range
Dim SQLString As Variant
Dim i As Integer
Dim lr As Long
lr = Worksheets("Sheet1").Cells(Rows.count, "A").End(xlUp).Row
For i = 2 To lr
SQLString = Worksheets("Sheet1").Cells(i, "A").Value
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "(/\*(.*?)\*/)"
End With
If RegEx.test(SQLString) Then
SQLString = RegEx.replace(SQLString, "")
End If
Set RegEx = Nothing
xOutArr = VBA.Split(SQLString, ";")
Set xOutRg = Worksheets("Sheet1").Range("B" & (Worksheets("Sheet1").Cells(Rows.count, "B").End(xlUp).Row + 1))
xOutRg.Range("A1").Resize(UBound(xOutArr) + 1, 1) = Application.WorksheetFunction.Transpose(xOutArr)
Next i
End Sub
The code above will find anything written in between "/* " and " */" and then remove it, but I want to be able to also pull out anything that is in between two characters. I need to be able to match everything that does not satisfy that pattern (or some other pattern like "< comment >"). This includes line breaks, etc etc. This is specifically for VBA, and it needs to be able to search the entire string for any and all instances that that pattern appears. My goal is to put the contents in between those characters (in the pattern) into column C.
What would be the RegExp pattern for this?
Examples of SQLString would be:
1) /* Step 1 */ Select * from dual ;
2) /* Step 2 */ Select * from dual ; /* Step 3 */ Select * from Table
I am capturing the SQL code by removing the "/* Step # */" but I want to capture what is in those comments as well (in Column C). 1) and 2) are single rows. 2) has multiple queries. Each row is getting split by ";" in order to run queries one by one.

Instead of using Test you can use Match to get all matching strings from the SQL: loop over the match collection, storing each one in Col C and use Replace() to remove it from the original SQL:
Sub Tester()
ExtractComments Range("A1")
End Sub
Sub ExtractComments(c As Range)
Dim re As Object
Dim allMatches, m, txt, comm
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "(/\*(.*?)\*/)"
re.ignorecase = True
re.MultiLine = True
re.Global = True
txt = c.Value
Set allMatches = re.Execute(txt)
For Each m In allMatches
comm = comm & IIf(Len(comm) > 0, vbLf, "") & m
txt = Replace(txt, m, "")
Debug.Print Trim(m)
Next m
c.Offset(0, 1).Value = txt
c.Offset(0, 2).Value = comm
End Sub

Related

Insert space between text and number for cells within column

I've already written a code that inserts a space between text and numbers, separating 'unspaced' days and months from dates, and it works as it's supposed to.
The only problem is that I'm using an If then structure to determine which Regular Expressions pattern I should use.
If the first character of the date is a number, then knowing that it is in the 'DayMonth' sequence, I use this pattern: "(.*\d)(?! )(\D.*)". Otherwise, assuming that it isn't in the 'DayMonth' sequence but rather in the 'MonthDay' sequence, I use the other pattern: "(.*\D)(?! )(\d.*)".
Is there any way to use two patterns at once for the Regular Expressions object to scan through so that I can get rid of the If Then structure?
My code below:
Sub SpaceMonthDayIf()
Dim col As Range
Dim i As Long
Set col = Application.InputBox("Select Date Column", "Obtain Object Range", Type:=8)
With CreateObject("VBScript.RegExp")
For i = 1 To Cells(Rows.Count, col.Column).End(xlUp).Row
If IsNumeric(Left(Cells(i, col.Column).Value, 1)) Then
.Pattern = "(.*\d)(?! )(\D.*)"
Cells(i, col.Column) = .Replace(Cells(i, col.Column), "$1 $2")
Else
.Pattern = "(.*\D)(?! )(\d.*)"
Cells(i, col.Column) = .Replace(Cells(i, col.Column), "$1 $2")
End If
Next
End With
End Sub
For clarity, here's what happens when I run my code:
Try this code
Sub Test()
Dim a, i As Long
With Range("A2", Range("A" & Rows.Count).End(xlUp))
a = .Value
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "(\d+)"
For i = 1 To UBound(a, 1)
a(i, 1) = Application.Trim(.Replace(a(i, 1), " $1 "))
Next i
End With
.Columns(2).Value = a
End With
End Sub
You can avoid that by inserting your space differently. Here is a Function written with early-binding, but you can change that to late-binding.
Match the junction between a letter and a number, then construct a string, inserting a space appropriately.
Option Explicit
Function InsertSpace(S As String) As String
Const sPat As String = "[a-z]\d|\d[a-z]"
Dim RE As RegExp, MC As MatchCollection
Set RE = New RegExp
With RE
.Global = False
.Pattern = sPat
.IgnoreCase = True
If .Test(S) = True Then
Set MC = .Execute(S)
With MC(0)
InsertSpace = Left(S, .FirstIndex + 1) & " " & Mid(S, .FirstIndex + 2)
End With
End If
End With
End Function
You can also accomplish this without using Regular Expressions:
EDIT Pattern change for Like operator
Option Explicit
Option Compare Text
Function InsertSpace2(S As String) As String
Dim I As Long
For I = 1 To Len(S)
If Mid(S, I, 2) Like "#[a-z]" Or Mid(S, I, 2) Like "[a-z]#" Then
InsertSpace2 = Left(S, I) & " " & Mid(S, I + 1)
Exit Function
End If
Next I
End Function

Deleting duplicate text in a cell in excel

I was wondering how to remove duplicate names/text's in a cell. For example
Jean Donea Jean Doneasee
R.L. Foye R.L. Foyesee
J.E. Zimmer J.E. Zimmersee
R.P. Reed R.P. Reedsee D.E. Munson D.E. Munsonsee
While googling, I stumbled upon a macro/code, it's like:
Function RemoveDupes1(pWorkRng As Range) As String
'Updateby20140924
Dim xValue As String
Dim xChar As String
Dim xOutValue As String
Set xDic = CreateObject("Scripting.Dictionary")
xValue = pWorkRng.Value
For i = 1 To VBA.Len(xValue)
xChar = VBA.Mid(xValue, i, 1)
If xDic.exists(xChar) Then
Else
xDic(xChar) = ""
xOutValue = xOutValue & xChar
End If
Next
RemoveDupes1 = xOutValue
End Function
The macro is working, but it is comparing every letter, and if it finds any repeated letters, it's removing that.
When I use the code over those names, the result is somewhat like this:
Jean Dos
R.L Foyes
J.E Zimers
R.P edsDEMuno
By looking at the result I can make out it is not what I want, yet I got no clue how to correct the code.
The desired output should look like:
Jean Donea
R.L. Foye
J.E. Zimmer
R.P. Reed
Any suggestions?
Thanks in Advance.
Input
With the input on the image:
Result
The Debug.Print output
Regex
A regex can be used dynamically iterating on the cell, to work as a Find tool. So it will extract only the shortest match. \w*( OUTPUT_OF_EXTRACTELEMENT )\w*, e.g.: \w*(Jean)\w*
The Regex's reference must be enabled.
Code
Function EXTRACTELEMENT(Txt As String, n, Separator As String) As String
On Error GoTo ErrHandler:
EXTRACTELEMENT = Split(Application.Trim(Mid(Txt, 1)), Separator)(n - 1)
Exit Function
ErrHandler:
' error handling code
EXTRACTELEMENT = 0
On Error GoTo 0
End Function
Sub test()
Dim str As String
Dim objMatches As Object
Set objRegExp = CreateObject("VBScript.RegExp") 'New regexp
lastrow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
For Row = 1 To lastrow
str = Range("A" & Row)
F_str = ""
N_Elements = UBound(Split(str, " "))
If N_Elements > 0 Then
For k = 1 To N_Elements + 1
strPattern = "\w*(" & EXTRACTELEMENT(CStr(str), k, " ") & ")\w*"
With objRegExp
.Pattern = strPattern
.Global = True
End With
If objRegExp.test(strPattern) Then
Set objMatches = objRegExp.Execute(str)
If objMatches.Count > 1 Then
If objRegExp.test(F_str) = False Then
F_str = F_str & " " & objMatches(0).Submatches(0)
End If
ElseIf k <= 2 And objMatches.Count = 1 Then
F_str = F_str & " " & objMatches(0).Submatches(0)
End If
End If
Next k
Else
F_str = str
End If
Debug.Print Trim(F_str)
Next Row
End Sub
Note that you can Replace the Debug.Print to write on the target
cell, if it is column B to Cells(Row,2)=Trim(F_str)
Explanation
Function
You can use this UDF, that uses the Split Function to obtain the element separated by spaces (" "). So it can get every element to compare on the cell.
Loops
It will loop from 1 to the number of elements k in each cell and from row 1 to lastrow.
Regex
The Regex is used to find the matches on the cell and Join a new string with the shortest element of each match.
This solution operates on the assumption that 'see' (or some other three-letter string) will always be on the end of the cell value. If that isn't the case then this won't work.
Function RemoveDupeInCell(dString As String) As String
Dim x As Long, ct As Long
Dim str As String
'define str as half the length of the cell, minus the right three characters
str = Trim(Left(dString, WorksheetFunction.RoundUp((Len(dString) - 3) / 2, 0)))
'loop through the entire cell and count the number of instances of str
For x = 1 To Len(dString)
If Mid(dString, x, Len(str)) = str Then ct = ct + 1
Next x
'if it's more than one, set to str, otherwise error
If ct > 1 Then
RemoveDupeInCell = str
Else
RemoveDupeInCell = "#N/A"
End If
End Function

Match Individual Words In String In VBA

Excel 2013 here - and am attempting to match the value in cell D to the value in cell C. The part that is leaving me pulling my hair out, is the fact that if a single word exists in column C it should be removed from column D.
For example
Column C Column D
Red Hairy Hats Hairy Cowpies
Since both fields contain the word Hairy it should be updated to read like so
Column C Column D
Red Hairy Hats Cowpies
I can not uncover how to do a wildcard match on string comparison in Excel VBA. I have this syntax which does an Exact match, but how could I do single words from the string like in my example above?
Dim i As Long
Dim resArry
dataArry = Cells(1).CurrentRegion
ReDim resArry(UBound(dataArry, 1) - 1, 1)
For i = 2 To UBound(dataArry, 1)
If InStr(1, dataArry(i, 3), dataArry(i, 4), vbBinaryCompare) Then
resArry(i - 2, 0) = ""
Else
resArry(i - 2, 0) = dataArry(i, 4)
End If
Next
Range("D2").Resize(UBound(resArry, 1)) = resArry
A RegExp option with variant arrays.
Create a pattern for each C string against each D string for a whole word only replacement
\b(Red|Hairy|Hats)\b
etc
Sub Interesting()
Dim rng1 As Range
Dim X, Y
Dim lngCnt As Long
Dim ObjRegex As Object
Set rng1 = Range([c1], Cells(Rows.Count, "c").End(xlUp))
X = rng1.Value2
Y = rng1.Offset(0, 1).Value2
Set ObjRegex = CreateObject("vbscript.regexp")
With ObjRegex
.Global = True
For lngCnt = 1 To UBound(X, 1)
.Pattern = "\b(" & Join(Split(X(lngCnt, 1), Chr(32)), "|") & ")\b"
Y(lngCnt, 1) = .Replace(Y(lngCnt, 1), vbNullString)
Next
End With
rng1.Offset(0, 1).Value2 = Y
End Sub
This is not a complete answer, since I’m a bit rusty with VBA, but rather than use instr to look for matches, you might have more success splitting both strings into arrays.
The process would be something like this:
split both strings using space
for each element in the second array
test whether it’s in the first array
if it is, remove the element
Join the second array back into a string using spaces
Repeat and rinse
Private Sub Test()
Dim C As String, D As String
C = "Red Hairy Hats"
D = "hairy cowpies"
Debug.Print RemoveMatches(C, D)
End Sub
Private Function RemoveMatches(C As String, D As String) As String
Dim Sp() As String
Dim i As Integer
Sp = Split(C)
For i = 0 To UBound(Sp)
If InStr(1, D, Sp(i), vbTextCompare) Then
D = Trim(Replace(D, Sp(i), "", Compare:=vbTextCompare))
End If
Next i
RemoveMatches = D
End Function

Getting individual numbers from a string

I want to extract individual numbers from a string. So for:
x = " 99 1.2 99.25 "
I want to get three individual numbers: 99, 1.2, and 99.25.
Here is my current code. It extracts the first occurring number, but I do not know how to use loops to get the three individual numbers.
Sub ExtractNumber()
Dim rng As Range
Dim TestChar As String
Dim IsNumber As Boolean
Dim i, StartChar, LastChar, NumChars As Integer
For Each rng In Selection
IsNumber = False
i = 1
Do While IsNumber = False And i <= Len(rng)
TestChar = Mid(rng, i, 1)
If IsNumeric(TestChar) = True Then
StartChar = i
IsNumber = True
End If
i = i + 1
Loop
IsNumber = False
Do While IsNumber = False And i <= Len(rng)
TestChar = Mid(rng, i, 1)
If IsNumeric(TestChar) = False Or i = Len(rng) Then
If i = Len(rng) Then
LastChar = i
Else
LastChar = i - 1
End If
IsNumber = True
End If
i = i + 1
Loop
NumChars = LastChar - StartChar + 1
rng.Offset(0, 1).Value = Mid(rng, StartChar, NumChars)
Next rng
End Sub
My previous attempt (input is stored in cell A6):
Dim x, y, z As String
x = Range("A6")
y = Len(x)
For i = 1 To Len(x)
If IsNumeric(Mid(x, i, 1)) Then
z = z & Mid(x, i, 1)
End If
Next i
MsgBox z
If speed is not an issue (if the task is not intensive, etc) then you can use this
Public Sub splitme()
Dim a As Variant
Dim x As String
Dim i, j As Integer
Dim b() As Double
x = "1.2 9.0 0.8"
a = Split(x, " ")
j = 0
ReDim b(100)
For i = 0 To UBound(a)
If (a(i) <> "") Then
b(j) = CDbl(a(i))
j = j + 1
End If
Next i
ReDim Preserve b(j - 1)
End Sub
Error checking needs to be included for b(100), to suit your particular needs - and with CDbl.
If this is to be used as part of a loop, or for large x - or both, consider other options like RegEx (previous answer) - as repeated calls to ReDim Preserve are generally best avoided.
Rather than writing your own code to extract the numbers, why not try using Regular Expressions? This website has a lot of great info and tutorials on regular expressions. It can be a bit baffling at first but once you get the hang of it it's a very powerful tool for solving problems of this type.
Below is an example of extracting the information you're after using a regular expression object.
Public Sub ExtractNumbers()
'Regular Expression Objects
Dim objRegEx As Object
Dim objMatches As Object
Dim Match As Object
'String variable for source string
Dim strSource As String
'Iteration variable
Dim i As Integer
'Create Regular Expression Object
Set objRegEx = CreateObject("VBScript.RegExp")
'Set objRegEx properties
objRegEx.Global = True '<~~ We want to find all matches
objRegEx.MultiLine = True '<~~ Allow line breaks in source string
objRegEx.IgnoreCase = False '<~~ Not strictly necessary for this example
'Below pattern matches an integer or decimal number 'word' within a string
' \b matches the start of the word
' [+-]? optionally matches a + or - symbol
' [0-9]+ matches one or more digits in sequence
' (\.[0-9]+)? optionally matches a period/decimal point followed by one or more digits
' \b matches the end of the word
objRegEx.Pattern = "\b[+-]?[0-9]+(\.[0-9]+)?\b"
'Example String
strSource = "x= 99 10.1 20.6 Aardvark"
'Ensure that at least one match exists
If objRegEx.Test(strSource) Then
'Capture all matches in objMatches
Set objMatches = objRegEx.Execute(strSource)
'TODO: Do what you want to do with them
'In this example I'm just printing them to the Immediate Window
'Print using Match object and For..Each
For Each Match In objMatches
Debug.Print Match.Value
Next Match
'Print using numeric iteration (objMatches.Items is a 0-based collection)
For i = 0 To (objMatches.Count - 1)
Debug.Print objMatches.Item(i)
Next i
End If
End Sub
Both of the print variations shown in this example would print the following output to the Immediate window
99
10.1
20.6

How do I remove the characters?

How do I remove special characters and alphabets in a string ?
qwert1234*90)! ' this might be my cell value
I have to convert it to
123490 ' I mean I have to remove everything but keep only the numbers in string
but it should allow spaces !
qwe123 4567*. 90 ' String with spaces
123 4567 90 ' output should be
I found the vba Replace - but writing a replace for each character makes my code big. Well let me tell you clearly without hiding anything from you:
input: qwe123 4567*. 90 ' String with spaces cells(1,"A").value
My idea to do these next: 123 4567 90 ' remove characters first keeping white spaces
final output in A1:A3
123
4567
90
(for every space it should insert row and fill that)
Could you tell me how do remove all characters except numbers and spaces in string?
Thanks In advance
You need to use a regular expression.
See this example:
Option Explicit
Sub Test()
Const strTest As String = "qwerty123 456 uiops"
MsgBox RE6(strTest)
End Sub
Function RE6(strData As String) As String
Dim RE As Object, REMatches As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False
.Global = True
.IgnoreCase = True
.Pattern = "([0-9]| )+"
End With
Set REMatches = RE.Execute(strData)
RE6 = REMatches(0)
End Function
Explanation:
Pattern = "([0-9]| )+" will match any 0 or more group (+) containing a number ([0-9]) or (|) a space ().
Some more info on the regexp:
a thread on ozgrid
a very good reference about regexp
Non-re alternative;
Public Function fmt(sValue As String) As String
Dim i As Long
For i = 1 To Len(sValue) '//loop each char
Select Case Mid$(sValue, i, 1) '//examine current char
Case "0" To "9", " " '//permitted chars
'//ok
Case Else
Mid$(sValue, i, 1) = "!" '//overwrite char in-place with "!"
End Select
Next
fmt = Replace$(sValue, "!", "") '//strip invalids & return
End Function
For:
?fmt("qwe123 4567*. 90")
123 4567 90
Those two funny codes will do both of your whishes..
Sub MySplitter(strInput As String)
Row = 10 ' Start row
Col = "A" ' Column Letter
Range(Col & Row) = "" ' Clean the start cell
For i = 1 To Len(strInput) ' Do with each Character in input string...
c = Mid(strInput, i, 1) ' Get actual char
If IsNumeric(c) Then Range(Col & Row) = Range(Col & Row) & c ' If numeric then append to actual cell
If (c = " ") And (Range(Col & Row) <> "") Then 'If space and actual row is not empty then...
Row = Row + 1 ' Jump to next row
Range(Col & Row) = "" ' Clean the new cell
End If
Next
End Sub
Function KeepNumbersAndSpaces(ByVal strInput As String)
For i = 1 To Len(strInput) ' Do with each Character in input string...
c = Mid(strInput, i, 1) ' Get actual char
If IsNumeric(c) Or c = " " Then ' If numeric or a space then append to output
KeepNumbersAndSpaces = KeepNumbersAndSpaces & c
End If
Next
End Function
Sub Test()
strInput = "qwert1234*90)! qwe123 4567*. 90"
MySplitter (strInput)
Range("A5") = KeepNumbersAndSpaces(strInput)
End Sub
Something like this to
split the string using a regexp
place the matches into an array
dump the array to an automatically sized spreadsheet range
main sub
Sub CleanStr()
Dim strOut As String
Dim Arr
strOut = Trim(KillChar("qwe123 4567*. 90 "))
Arr = Split(strOut, Chr(32))
[a1].Resize(UBound(Arr) + 1, 1) = Application.Transpose(Arr)
End Sub
function
Function KillChar(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[^\d\s]+"
KillChar = .Replace(strIn, vbNullString)
End With
End Function