Replace all text in text file using regular expression - vba

I have a text file with following text
161624.406 : Send:[sometext1]
161624.437 : Send:[sometext2]
161624.468 : Send:[sometext3]
161624.499 : Send:[sometext4]
161624.531 : Send:[sometext5]
I want to keep only the sometext part in that file. Desired output is
sometext1
sometext2
sometext3
sometext4
sometext5
I am using the following code in Excel-VBA
Public Sub testa()
a = "C:\Users\pankaj.jaju\Desktop\test.log"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTxtFile = objFSO.OpenTextFile(a, 1)
strText = objTxtFile.ReadAll
objTxtFile.Close
Set objTxtFile = Nothing
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.Global = True
.MultiLine = True
.Pattern = "\[([^]]+)\]"
Set objRegMC = .Execute(strText)
b = objRegMC(0).SubMatches(0)
End With
Set objRegEx = Nothing
Debug.Print b
End Sub
The problem is the output is displayed as sometext1 only. How do I ReplaceAll in the text file and save the file with the desired text only.

The regex.Replace method should do the trick.
Separate your pattern into groups like this: "(.*?)(\[)([^]]+)(\])(.*?)"
And now you can replace your input string with the matching group which is group three in this case: objRegEx.Replace(strText, "$3")
Here is a helpful link to different examples of Regex within Excel.

I know you've already got your answer. But for a simple program like this, why even utilize regular expressions? Here's a 4 line script that can do the same thing - yes it has been tested.
Dim a, text: a = "C:\testenv\test.log"
text = split(CreateObject("Scripting.FileSystemObject").OpenTextFile(a, 1).ReadAll, vbcrlf)
for i=0 to ubound(text) : text(i) = replace(split(text(i), "[")(1), "]", "") : next
CreateObject("Scripting.FileSystemObject").OpenTextFile(a, 2).Write(Join(text, vbcrlf))

The regex obviously only grabbing the first match of the string. I'm not proficient enough with regex to come up with a regex only solution that takes into account \n
A simple workaround would be to use objTxtFile.ReadLine instead of ReadAll
Read each string in one by one, apply regex and get output.
Public Sub testa()
a = "C:\Users\pankaj.jaju\Desktop\test.log"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTxtFile = objFSO.OpenTextFile(a, 1)
Do Until (objTxtFile.AtEndOfStream) 'Loop till EOF
strText = objTxtFile.ReadLine 'Single line read instead of ReadAll
Set objRegex = CreateObject("VBScript.RegExp")
With objRegex
.Global = True
.MultiLine = True
.Pattern = "\[([^]]+)\]"
Set objRegMC = .Execute(strText)
b = objRegMC(0).SubMatches(0)
End With
Set objRegex = Nothing
Debug.Print b 'replace this with the code to output the string where you want etc
Loop
objTxtFile.Close
End Sub

OK ... found a way to do it (Thanks MSDN ... sometimes the docs are useful :-)). Sharing the answer
Public Sub testa()
a = "C:\Users\pankaj.jaju\Desktop\test.log"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTxtFile = objFSO.OpenTextFile(a, 1)
strText = objTxtFile.ReadAll
objTxtFile.Close
Set objTxtFile = Nothing
b = ""
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.Global = True
.MultiLine = True
.Pattern = "\[([^]]+)\]"
Set objMatches = .Execute(strText)
For Each Match In objMatches
For Each SubMatch In Match.SubMatches
b = b & SubMatch & vbCrLf
Next SubMatch
Next Match
End With
Set objRegEx = Nothing
Set objTxtFile = objFSO.OpenTextFile(a, 2)
objTxtFile.Write b
objTxtFile.Close
Set objTxtFile = Nothing
End Sub
EDIT - Based on #PortlandRunner's suggestion
Public Sub testa()
a = "C:\Users\pankaj.jaju\Desktop\test.log"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTxtFile = objFSO.OpenTextFile(a, 1)
strText = objTxtFile.ReadAll
objTxtFile.Close
Set objTxtFile = Nothing
b = ""
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.Global = True
.MultiLine = True
.Pattern = "(.*?)(\[)([^]]+)(\])(.*?)"
b = objRegEx.Replace(strText, "$3")
End With
Set objRegEx = Nothing
Set objTxtFile = objFSO.OpenTextFile(a, 2)
objTxtFile.Write b
objTxtFile.Close
Set objTxtFile = Nothing
End Sub

Related

split pdf based on the text using vba acrobat api

I am trying to split pdf, based on the pages where it finds ".pdf" however when I try to save the pdf with a dynamic string variable, it do not save the pdf but when I write hardcode file path it output the pdf. do not know what is going on here.
the following code is not finished yet I am stuck in creating new pdf with deleted pages:
Function Extract_PDF()
Dim aApp As Acrobat.CAcroApp
Dim av_Doc As Acrobat.CAcroAVDoc
Dim pdf_Doc As Acrobat.CAcroPDDoc '
Dim newPDFdoc As Acrobat.CAcroPDDoc
Dim Sel_Text As Acrobat.CAcroPDTextSelect
Dim i As Long, j As Long
Dim pageNum, Content
Dim pageContent As Acrobat.CAcroHiliteList
Dim found As Boolean
Dim foundPage As Integer
Dim PDF_Path As String
Dim pdfName As String
Dim folerPath As String
Dim FileExplorer As FileDialog
Set FileExplorer = Application.FileDialog(msoFileDialogFilePicker)
With FileExplorer
.AllowMultiSelect = False
.InitialFileName = ActiveDocument.Path
.Filters.Clear
.Filters.Add "PDF File", "*.pdf"
If .Show = -1 Then
PDF_Path = .SelectedItems.Item(1)
Else
PagesLB = "Catch me Next Time ;)"
PDF_Path = ""
Exit Function
End If
End With
Set aApp = CreateObject("AcroExch.App")
Set av_Doc = CreateObject("AcroExch.AVDoc")
If av_Doc.Open(PDF_Path, vbNull) <> True Then Exit Function
While av_Doc Is Nothing
Set av_Doc = aApp.GetActiveDoc
Wend
av_Doc.BringToFront
aApp.Show
Set pdf_Doc = av_Doc.GetPDDoc
For i = pdf_Doc.GetNumPages - 1 To 0 Step -1
Set pageNum = pdf_Doc.AcquirePage(i)
Set pageContent = CreateObject("AcroExch.HiliteList")
If pageContent.Add(0, 9000) <> True Then Exit Function
Set Sel_Text = pageNum.CreatePageHilite(pageContent)
Content = ""
found = False
For j = 0 To Sel_Text.GetNumText - 1
Content = Content & Sel_Text.GetText(j)
If InStr(1, Content, ".pdf") > 0 Then
found = True
foundPage = i
pdfName = Content
Exit For
End If
Next j
If found Then
PDF_Path = Left(PDF_Path, InStrRev(PDF_Path, "\")) & ValidWBName(pdfName)
Set newPDFdoc = CreateObject("AcroExch.PDDoc")
Set newPDFdoc = av_Doc.GetPDDoc
If newPDFdoc.DeletePages(0, i - 1) = False Then
Debug.Print "Failed"
Else
Debug.Print "done"
End If
If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then
Debug.Print "Failed to save pdf "
Else
Debug.Print "Saved"
End If
newPDFdoc.Close
End If
Next i
av_Doc.Close False
aApp.Exit
Set av_Doc = Nothing
Set pdf_Doc = Nothing
Set aApp = Nothing
End Function
ValidWBName:
Function ValidWBName(agr As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "[\\/:\*\?""<>\|]"
.Global = True
ValidWBName = .Replace(agr, "")
End With
End Function
in above function when it finds the word pdf it try to create a new instance of pdf and remove previous pages.
If found Then
PDF_Path = Left(PDF_Path, InStrRev(PDF_Path, "\")) & ValidWBName(pdfName)
Set newPDFdoc = CreateObject("AcroExch.PDDoc")
Set newPDFdoc = av_Doc.GetPDDoc
If newPDFdoc.DeletePages(0, i - 1) = False Then
Debug.Print "Failed"
Else
Debug.Print "done"
End If
If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then
Debug.Print "Failed to save pdf "
Else
Debug.Print "Saved"
End If
newPDFdoc.Close
End If
this line "Failed to save pdf"
If newPDFdoc.Save(PDSaveFull, PDF_Path) = False Then
but when I write hardcode path it create the pdf
If newPDFdoc.Save(PDSaveFull, "C:\Users\MBA\Desktop\PDF Project 2\Murdoch_Michael__Hilary_PIA_19.pdf") = False Then
the culprit HAD to be in ValidWBName() function, which didn't handle all possible not allowed chars for a valid file name
since it came out vbCr char was one of them, you could change it as follows:
Function ValidWBName(agr As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Pattern = "[\\/:\*\?""<>\|" & Chr(13) & "]" ' <-- added vbCr
.Global = True
ValidWBName = .Replace(agr, "")
End With
End Function

find a complex string in a substring in VBA

how can I get in VBA a complex string in a substring, e.g. if i = InStr("test-VBA", " this is a test") then msgbox "a part of searching Item exist"
with function "Instr" didn't work because the seraching word is "test-VBA" and ofcourse dosen't
exist as a one word but what I search for if a complete part of the searching item ("test" in the
example as part of "test-VBA") exists should I get a msgbox like described above
Thanks a lot.
Function IsInStr_IgnoreCase(ByVal Str As String, ByVal Value As String) As Boolean
Dim objRegEx as Object
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True
objRegEx.pattern = Value
IsInStr_IgnoreCase = objRegEx.test(Str) ' objRegEx.test(Str) returns True if Sustr exists.
End Function
Sub test()
' this returns TRUE if you get a match
MsgBox IsInStr_IgnoreCase_AsBoolean("CHINA-Country", "Move the dev from China to Newseeland")
' this returns all matches of 'Value' in 'Str'
MsgBox IsInStr_IgnoreCase_AsString("CHINA-Country", "Move the dev from China to Newseeland")
' you need to adjust objRegEx.Pattern if you need to get a specific match. See RegEx.
End Sub
Function IsInStr_IgnoreCase_AsBoolean(ByVal Value As String, ByVal Value As String) As Boolean
Dim objRegEx As Object
Dim tStr, tVal, iStr, iVal
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "(\b\w+)|(\b\d+)"
Set tStr = objRegEx.Execute(Str)
Set tVal = objRegEx.Execute(Value)
objRegEx.IgnoreCase = True
For Each iStr In tStr
For Each iVal In tVal
objRegEx.Pattern = iVal
If objRegEx.test(iStr) Then
IsInStr_IgnoreCase_AsBoolean = True
Exit Function
End If
Next
Next
End Function
Function IsInStr_IgnoreCase_AsString(ByVal Str As String, ByVal Value As String) As String
Dim objRegEx As Object
Dim tStr, tVal, iStr, iVal
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "(\b\w+)|(\b\d+)"
Set tStr = objRegEx.Execute(Str)
Set tVal = objRegEx.Execute(Value)
objRegEx.IgnoreCase = True
For Each iStr In tStr
For Each iVal In tVal
objRegEx.Pattern = iVal
If objRegEx.test(iStr) Then
IsInStr_IgnoreCase_AsString = IsInStr_IgnoreCase_AsString & iStr & "; "
End If
Next
Next
End Function

VBA - Find string in email body or subject

I am trying to create a simple macro, which reads the active email and checks whether or not a certain string is present. Now, the string can have two possible formats, and will only contains digits.
The two formats:
xxx-xxxxxxxx or xxxxxxxxxxx
(x will always be a digit)
I am unsure on how to do this. Below I have a macro, which reads the mail - but it can only find a specific string:
Sub AutomateReplyWithSearchString()
Dim myInspector As Outlook.Inspector
Dim myObject As Object
Dim myItem As Outlook.MailItem
Dim myDoc As Word.Document
Dim mySelection As Word.Selection
Dim strItem As String
Dim strGreeting As String
Set myInspector = Application.ActiveInspector
Set myObject = myInspector.CurrentItem
'The active inspector is displaying a mail item.
If myObject.MessageClass = "IPM.Note" And myInspector.IsWordMail = True Then
Set myItem = myInspector.CurrentItem
'Grab the body of the message using a Word Document object.
Set myDoc = myInspector.WordEditor
myDoc.Range.Find.ClearFormatting
Set mySelection = myDoc.Application.Selection
With mySelection.Find
.Text = "xxx-xxxxxxxx"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
If mySelection.Find.Execute = True Then
strItem = mySelection.Text
'Mail item is in compose mode in the inspector
If myItem.Sent = False Then
strGreeting = "With reference to " + strItem
myDoc.Range.InsertBefore (strGreeting)
End If
Else
MsgBox "There is no item number in this message."
End If
End If
End Sub
You can use regex pattern:
(\d{11}|\d{3}-\d{8})
Try it.
This example is copied from here. I have not tested it.
Option Explicit
Sub GetValueUsingRegEx()
' Set reference to VB Script library
' Microsoft VBScript Regular Expressions 5.5
Dim olMail As Outlook.MailItem
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Set olMail = Application.ActiveExplorer().Selection(1)
' Debug.Print olMail.Body
Set Reg1 = New RegExp
With Reg1
.Pattern = "(\d{11}|\d{3}-\d{8})"
.Global = True
End With
If Reg1.test(olMail.body) Then
Set M1 = Reg1.Execute(olMail.body)
For Each M In M1
Debug.Print M.SubMatches(1)
Next
End If
End Sub

oMath from excel to word

I am creating a word report through excel VBA. I need math equation to be written but unfortunately, the word document do not autocorrect \pi and \times. Otherwise the equation is printed. Can someone suggest me what is the way forward. Below is the code
Sub AreaSolidBolt(wrdApp As Object, wrdDoc As Object, d As Variant)
Dim objRange As Object
Dim objEq As OMath
Dim aCorrect As OMathAutoCorrectEntry
wrdApp.OMathAutoCorrect.UseOutsideOMath = True
Set objRange = wrdDoc.Range
objRange.Text = "A = \pi/4 \times d^2"
Set objRange = wrdApp.Selection.OMaths.Add(objRange)
For Each aCorrect In wrdApp.OMathAutoCorrect.Entries
With objRange
If InStr(.Text, aCorrect.Name) > 0 Then
.Text = Replace(.Text, aCorrect.Name, aCorrect.Value)
End If
End With
Next aCorrect
Set objEq = objRange.OMaths(1)
objEq.BuildUp
Set objRange = Nothing
End Sub
I have defined the objects as below in the calling function. Can you please suggest me the way forward.
Set fso = CreateObject("Scripting.FileSystemObject")
Set wrdApp = CreateObject("Word.Application")
If Not fso.FileExists(wrdFileName) Then
Set wrdDoc = wrdApp.Documents.Add
wrdApp.Visible = False
With wrdDoc
.SaveAs FileName:=wrdFileName
End With
Else
Set wrdDoc = wrdApp.Documents.Open(wrdFileName)
wrdApp.Visible = False
wrdDoc.Content.InsertAfter vbLf
End If
If the AutoCorrect isn't recognizing something you need to use, then you can write in the corresponding (unicode) character code as part of the equation, using ChrW(). PI is 960, for example.
It's not clear what you consider a "times" character, whether an "x" an "*" or something else. I note that most Math equations don't actually use a character for multiplication, which may be why AutoCorrect isn't picking that up. But you can certainly just type those in?
For example:
objRange.Text = "A = " & ChrW(960) & "/4 * d^2"
I have found the answer myself. The was slightly modified. The error was with the placement of the code line
Set objRange = wrdApp.Selection.OMaths.Add(objRange)
Below is the modified code.
Sub AreaSolidBolt(wrdApp As Object, wrdDoc As Object, d As Variant)
Dim objRange As Object
Dim objEq As OMath
Dim aCorrect As OMathAutoCorrectEntry
wrdApp.OMathAutoCorrect.UseOutsideOMath = True
Set objRange = wrdDoc.Range
objRange.Text = "A = \pi/4 \times d^2"
For Each aCorrect In wrdApp.OMathAutoCorrect.Entries
With objRange
If InStr(.Text, aCorrect.Name) > 0 Then
.Text = Replace(.Text, aCorrect.Name, aCorrect.Value)
End If
End With
Next aCorrect
Set objRange = wrdApp.Selection.OMaths.Add(objRange)
Set objEq = objRange.OMaths(1)
objEq.BuildUp
Set objRange = Nothing
End Sub

Loop Through Cells & Return Regex Search in Message Box

Question with a possibly very simple answer, but I'm still potty training with VBA. I'm looping through an excel array and searching a very long string for the regex pattern "\.\w*?_\w*?_Tag_\w*?". There is a similar example of this in this post under the accepted answer "Example 3: Loop Through Range."
Instead of finding and replacing text though, I want to display the matched search pattern from each cell of the array in a single message box. I've been searching for several hours but most of the VBA Regex examples I've found use built in functions (which don't loop through an array), but since this will be used by other folks I need to use a macro.
Here's what I have so far. I'm thinking I need another loop outside of my For Each loop but not sure how to start executing on this.
Sub TagNameList()
Dim strPattern As String: strPattern = "\.\w*?_\w*?_Tag_\w*?"
Dim Regx As New RegExp
Dim StrInput As String
Dim Rng As range
Dim LastRow As Long: LastRow = ActiveSheet.UsedRange.Rows.Count
' Set Rng = ActiveSheet.range(Cells(2, 16), Cells(LastRow, 16))
' Set RegxMatch = Regx.Execute(StrInput)
For Each cell In Rng
StrInput = cell.Value
With Regx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
' If Regx.Test(StrInput) Then
' MsgBox (Regx.Replace(StrInput, strReplace))
' Else
' MsgBox ("Not matched")
' End If
Next
End Sub
I'm not clear if you mean a message box for the whole range or one for each cell, but you need to use the matches collection, e.g.
Sub TagNameList()
Dim strPattern As String: strPattern = "\.\w*?_\w*?_Tag_\w*?"
Dim Regx As New RegExp
Dim StrInput As String
Dim Rng As Range
Dim LastRow As Long: LastRow = ActiveSheet.UsedRange.Rows.Count
Dim oMatches As Object, s As String
Set Rng = ActiveSheet.Range(Cells(2, 16), Cells(LastRow, 16))
Set RegxMatch = Regx.Execute(StrInput)
For Each cell In Rng
StrInput = cell.Value
With Regx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
If .Test(cell) Then
Set oMatches = .Execute(cell)
s = s & "," & oMatches(0).Value
End If
End With
Next
MsgBox Mid(s, 2)
End Sub