I need to clean up text in a MS Word file.
I receive text from a web form like this and then get it as a Word file.
Confirmed: Something
Confirmed: Else
Confirmed: every
Confirmed: time
I would like to get rid of all the "Confirmed" but for the first one, to get something like the following.
Confirmed:
Something
Else
every
time
I count all the words with
Function CountOccurrences(ByVal strToCount As String) As Integer
Dim iCount As Integer
iCount = 0
With ActiveDocument.Content.Find
.Text = strToCount
.Format = False
.Wrap = wdFindStop
Do While .Execute
iCount = iCount + 1
Loop
End With
CountOccurrences = iCount
End Function
I found articles on how to delete just the first one, or the last one, but can't figure out, how to delete all but the first one.
Try this:
Sub tester()
Dim col As Collection, i As Long
Set col = AllOccurrences("Confirmed:")
For i = 2 To col.Count
col(i).Text = vbTab 'replace the text with a tab
Next i
End Sub
'Return a collection with all instances of strToMatch in the activedocument
Function AllOccurrences(ByVal strToMatch As String) As Collection
Dim rv As New Collection, rng As Range
Set rng = ActiveDocument.Range
With rng.Find
.Text = strToMatch
.Format = False
.Wrap = wdFindStop
Do While .Execute
rv.Add ActiveDocument.Range(rng.Start, rng.End)
Loop
End With
Set AllOccurrences = rv
End Function
Related
I have a hundreds of word documents that have multiple tables. Each table row has a specific custom style that was applied that identifies the data that goes in the cell. The need is to iterate through the word document, find the style, and add a ContentControl on that item. The issue that I have is the Selection.Find command restarts at the beginning of the document, so it ends up nesting ContentControls. I have tried adding in some counting mechanism, but while it fixes most of the issues, it leaves off at least some of the ContentControls and does have a few nests. I have tried only searching on a specific table but the Selection.Find overrides the selected table. Is there a way to iterate from the beginning of the document to the end so that I can dynamically add the content controls? Each document has 2 different types of tables. There will be only 1 of the following tables:
There can be 1 to 100 of this table:
The contentControl is supposed to encapsulate the data in the Document Level Metadata column. Here is the code I have up to this point
Option Explicit
Sub FindStyleReplaceWithCC()
Dim CCtrl As ContentControl
Do While ActiveDocument.ContentControls.Count > 0
For Each CCtrl In ActiveDocument.ContentControls
If CCtrl.LockContentControl = True Then CCtrl.LockContentControl = False
CCtrl.Delete False
Next
Loop
'For Each CCtrl In ActiveDocument.ContentControls
'For Each CCtrl In ActiveDocument.ContentControls
' MsgBox (CCtrl.Range)
'Next
'Dim CCtrl As ContentControl
Dim sty As Style
Dim oTbl As Table
''''''''''''''''''''''''''''''''''''''''
'Table 1
Dim thearray(1 To 13, 1 To 2)
Dim element As Variant
Dim arrWsNames() As Variant
Dim I As Integer
arrWsNames = Array("Sensitive Information Protection", "Applies To", "Functional Org", "Functional Process Owner", _
"Topic Owner", "Subject Matter Experts", "Author", "Corporate Source ID", "Superior Source", "CIPS Legacy Document", _
"Meta-Roles(DocLvl)", "SME Reviewer", "SourceDocs")
For I = 1 To 13
thearray(I, 1) = arrWsNames(I - 1)
thearray(I, 2) = 0
Next
Dim howmany As Integer
howmany = 0
For Each element In arrWsNames
Dim iterations As Integer
With Selection.Find
.ClearFormatting
.Style = ActiveDocument.Styles(element)
.Replacement.ClearFormatting
.Text = ""
.Replacement.Text = ""
.Forward = False
.Wrap = wdFindContinue
End With
Selection.Find.Execute
Selection.Range.ContentControls.Add (wdContentControlRichText)
Selection.ParentContentControl.Title = element
Next
'''''''''''''''''''''''''''''''''''''
'Table 2
Dim thearray2(1 To 8, 1 To 2)
Dim arrWsNames2() As Variant
arrWsNames2 = Array("Meta-ReqType", "Meta-Roles", "Meta-Input", "Meta-Output", "Meta-Toolset", _
"Meta-Sources", "Meta-Traced", "Meta-Objective_Evidence")
For I = 1 To 8
thearray2(I, 1) = arrWsNames2(I - 1)
thearray2(I, 2) = 0
Next
howmany = 0
For Each element In arrWsNames2
iterations = 1
For Each oTbl In ActiveDocument.Tables
oTbl.Select
With Selection.Find
.ClearFormatting
.Style = ActiveDocument.Styles(element)
.Replacement.ClearFormatting
.Text = ""
.Replacement.Text = ""
.Forward = False
.Wrap = wdFindContinue
End With
Selection.Find.Execute
If howmany + 1 = iterations Then
Selection.Range.ContentControls.Add (wdContentControlRichText)
Selection.ParentContentControl.Title = element
howmany = howmany + 1
iterations = iterations - 1
Else
iterations = iterations + 1
End If
Next
Next
MsgBox ("Done")
End Sub
If this can't be done in VBA, can it be done in .net?
This can definitely be done in VBA.
The first thing you need to do is to stop using the Selection object. Although there are occasions when Selection has to be used most things can be accomplished by using Range instead.
The next thing I recommend is breaking your code down into separate routines that only perform one element of the solution. This will not only enable you to simplify your code it will result in reusable routines.
I have edited your code as below and tested it in O365 on a document with a subset or your styles.
Sub AddContentControlsForMetadata()
RemoveContentControls ActiveDocument
Dim element As Variant
Dim arrWsNames() As Variant
arrWsNames = Array("Sensitive Information Protection", "Applies To", "Functional Org", "Functional Process Owner", _
"Topic Owner", "Subject Matter Experts", "Author", "Corporate Source ID", "Superior Source", "CIPS Legacy Document", _
"Meta-Roles(DocLvl)", "SME Reviewer", "SourceDocs", "Meta-ReqType", "Meta-Roles", "Meta-Input", "Meta-Output", "Meta-Toolset", _
"Meta-Sources", "Meta-Traced", "Meta-Objective_Evidence")
For Each element In arrWsNames
FindStyleReplaceWithCC ActiveDocument, CStr(element)
Next element
End Sub
Sub RemoveContentControls(docTarget As Document)
Dim ccIndex As Long
For ccIndex = docTarget.ContentControls.Count To 1 Step -1
With docTarget.ContentControls(ccIndex)
If .LockContentControl = True Then .LockContentControl = False
.Delete False
End With
Next ccIndex
End Sub
Sub FindStyleReplaceWithCC(searchDoc As Document, styleName As String)
Dim findRange As Range
Dim ccRange As Range
Set findRange = searchDoc.Range
With findRange.Find
.ClearFormatting
.Style = ActiveDocument.Styles(styleName)
.Replacement.ClearFormatting
.Text = ""
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
Do While .Execute = True
If findRange.Information(wdWithInTable) Then
findRange.Expand wdCell
End If
Set ccRange = findRange.Duplicate
AddContentControlToRange ccRange, styleName
'need to collapse the findRange so that Find can continue without finding the same location again
findRange.Collapse wdCollapseEnd
Loop
End With
End Sub
Sub AddContentControlToRange(ByVal ccLocation As Range, ByVal ccTitle As String)
ccLocation.ContentControls.Add(wdContentControlRichText).Title = ccTitle
End Sub
EDIT:
To add both a tag and a title to the content control:
Sub AddContentControlToRange(ByVal ccLocation As Range, ByVal ccTitle As String, ByVal ccTag as String)
With ccLocation.ContentControls.Add(wdContentControlRichText)
.Title = ccTitle
.Tag = ccTag
End With
End Sub
I'm using a VBA script to try to find the starting number of a paragraph (they are list items not formatted as such - not trying to format, just find the numbers).
1. First Item
2. Second Item
No number - don't include despite 61.5 in paragraph.
25 elephants should not be included
12. Item Twelve, but don't duplicate because of Susie's 35 items
Is there any way to say in VBA "If start of paragraph has 1-2 numbers, return those numbers". In regex, what I'm looking for is ^(\d\+)\.
Here is a working bit of VBA code - haven't figured out how to CREATE the excel file yet, so if you go to test create a blank test.xslx in your temp folder. Of course this may be simple enough that testing isn't necessary.
Sub FindWordCopySentence()
On Error Resume Next
Dim appExcel As Object
Dim objSheet As Object
Dim aRange As Range
Dim intRowCount As Integer
intRowCount = 1
' Open Excel File
If objSheet Is Nothing Then
Set appExcel = CreateObject("Excel.Application")
'Change the file path to match the location of your test.xls
Set objSheet = appExcel.workbooks.Open("C:\temp\test.xlsx").Sheets("Sheet1")
intRowCount = 1
End If
' Word Document Find
Set aRange = ActiveDocument.Range
With aRange.Find
Do
.ClearFormatting
' Find 1-2 digit number
.Text = "[0-9]{1,2}"
.MatchWildcards = True
.Execute
If .Found Then
' Copy to Excel file
aRange.Expand Unit:=wdSentence
aRange.Copy
aRange.Collapse wdCollapseEnd
objSheet.Cells(intRowCount, 1).Select
objSheet.Paste
intRowCount = intRowCount + 1
End If
Loop While .Found
End With
Set aRange = Nothing
If Not objSheet Is Nothing Then
appExcel.workbooks(1).Close True
appExcel.Quit
Set objSheet = Nothing
Set appExcel = Nothing
End If
End Sub
Thanks!
I would go quite a bit simpler and just check the first few characters of the paragraph:
Option Explicit
Sub test()
Dim para As Paragraph
For Each para In ThisDocument.Paragraphs
With para.Range
If (.Characters(2) = ".") Or (.Characters(3) = ".") Then
If IsNumeric(para.Range.Words(1)) Then
Debug.Print "Do something with paragraph number " & _
para.Range.Words(1) & "."
End If
End If
End With
Next para
End Sub
A more efficient approach, which obviates the need to test every paragraph:
Sub Demo()
Application.ScreenUpdating = False
Dim StrOut As String
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "^13[0-9.]{1,}" ' or: .Text = "^13[0-9]{1,}
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Execute
End With
Do While .Find.Found
StrOut = StrOut & .Text
' or: MsgBox Split(.Text, vbCr)(1)
.Collapse wdCollapseEnd
.Find.Execute
Loop
End With
Application.ScreenUpdating = True
MsgBox StrOut
End Sub
As coded, the macro returns the entire list strings where there may be multiple levels (e.g. 1.2). Comments show how to find just the first number where there may be multiple levels and how to extract that number for testing (the Find expression includes the preceding paragraph break).
I have a word document that is stuffed with codes and URLs that are lying all over the document.
I've been trying to find out how I can extract all the URLs in this word document and have them pasted on another document?
the URLs all have the same website which starts with https://subdomain.domain.com ..
the problem is .. i will need the full URL link which usually ends with a .jpg
i have tried googling but all i find are solutions on how URLs of hyperlinks can be extracted. Couldnt find a solution on my situation so i hope you guys can help!
I edited the code to send the results to C:\temp\my_links.txt. You can edit the code to change the destination.
Public Sub GetUrls()
Dim r As Range
Dim outfile As String
outfile = "C:\temp\my_links.txt"
Open outfile For Output As #1
Set r = ActiveDocument.Range
r.Select
With Selection.Find
.ClearFormatting
.Text = "https://subdomain.domain.com/*.jpg"
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
End With
Do While Selection.Find.Execute
Write #1, Selection.Text
Loop
Close #1
End Sub
When I run this on a test file, I get this in the the output file:
"https://subdomain.domain.com/res1/joe.jpg"
"https://subdomain.domain.com/res2/cat.jpg"
Hope that helps.
This will parse the document and find all the URL's for you and output the array to a new Document:
Option Explicit
Sub FindLinks()
Dim p As Paragraph
Dim vSplit As Variant
Dim nIndex As Integer
Dim sURLs() As String
ReDim sURLs(0)
' find each URL and add it to an array
For Each p In ActiveDocument.Paragraphs
vSplit = Split(p.Range, " ")
For nIndex = 0 To UBound(vSplit)
If InStr(vSplit(nIndex), "https://stackoverflow.com") > 0 Then
ReDim Preserve sURLs(UBound(sURLs) + 1)
sURLs(UBound(sURLs)) = Replace(vSplit(nIndex), "src=", "")
End If
Next
Next
' create a new document and output the array
Dim sURL As Variant
Documents.Add
For Each sURL In sURLs
Selection.TypeText sURL
Selection.TypeParagraph
Next
End Sub
I'm entirely new to VBA. I need to write a macro to do as the following pseudo code describes. Any references to VBA code are from looking at examples I've found so far from googling. Many thanks for any guidance you can offer.
Dim myText as string;
Dim myAutoTextFieldValue as string;
Set myText='Figure';
Set myAutoTextFieldValue = 'fignum';
// fignum is a autotext value that will insert a sequence type field
.Find text which matches this Word expression \[[0-9]*[0-9]*[0-9]\]
// this expression works in the Find what function in Word, not strictly regex
For each
.InsertBefore (myText + myTextAutoFieldValue);
// I'm guessing I'll need a With expression and a Do While.
EDIT:
I now have the following but I get "Method or Data Member not found" when I try to run it.
Sub EditFindLoop()
'find text where the string equals [00:00:00] or numeric sequence as per input mask
'then insert myText and myAutoTextFieldValue before it
Dim myText As String
Dim myAutoTextFieldValue As String
Dim myFind As String
myFind = "\[[0-9]*[0-9]*[0-9]\]"
myAutoTextFieldValue = "fignum"
myText = "Figure"
With ActiveDocument.Content.Find
'.Text = myFind
'.ClearFormatting
.MatchWildcards = True
Do While .Execute(findText:=myFind, Forward:=True) = True
.InsertBefore myText & myAutoTextFieldValue
Loop
End With
End Sub
And here's the answer to my own question, should anyone else require a similar piece of code.
Sub EditFindLoop()
Dim myText As String
Dim myFind As String
Dim x As Integer
myFind = "\[[0-9]*[0-9]*[0-9]\]"
myText = "Figure "
mySpace = ". "
x = 1
Dim oRange As Word.Range
Set oRange = ActiveDocument.Range
With oRange.Find
.Text = myFind
.ClearFormatting
.MatchWildcards = True
.MatchCase = False
.MatchWholeWord = False
Do While .Execute = True
If .Found Then
oRange.InsertBefore (myText & x & mySpace)
End If
oRange.Start = oRange.End
oRange.End = ActiveDocument.Range.End
x = x + 1
Loop
End With
End Sub
I am writing a vba macro to search a word document line by line and trying to find certain names in the document. The looping works fine except for when it gets to the end of the document, it just continues from the top and starts over. Here is the code:
Application.ScreenUpdating = False
Dim i As Integer, Rng As Range
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "?"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchWildcards = True
.Execute
End With
Do While .Find.found
i = i + 1
Set Rng = .Duplicate
Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\line")
MsgBox "Line " & i & vbTab & Rng.Text
If Rng.Bookmarks.Exists("\EndOfDoc") Then Exit Do
.start = Rng.End
.Collapse wdCollapseEnd
.Find.Execute
Loop
End With
Set Rng = Nothing
Application.ScreenUpdating = True
I have also tried this piece of code:
Dim appWD As Word.Application
Dim docWD As Word.Document
Dim rngWD As Word.Range
Dim strDoc As String
Dim intVal As Integer
Dim strLine As String
Dim bolEOF As Boolean
bolEOF = False
' Set strDoc here to include the full
' file path and file name
On Error Resume Next
Set appWD = GetObject(, "Word.Application")
If Err.Number <> 0 Then
Set appWD = CreateObject("Word.Application")
End If
Err.Clear
On Error GoTo 0
strDoc = "c:\KenGraves\Project2\output\master.doc"
Set docWD = appWD.Documents.Open(strDoc)
appWD.Visible = True
docWD.Characters(1).Select
Do
appWD.Selection.MoveEnd Unit:=wdLine, Count:=1
strLine = appWD.Selection.Text
Debug.Print strLine
intVal = LineContainsDescendant(strLine)
If intVal = 1 Then
MsgBox strLine
End If
appWD.Selection.Collapse wdCollapseEnd
If appWD.Selection.Bookmarks.Exists("\EndOfDoc") Then bolEOF = True
Loop Until bolEOF = True
Neither seem to recognize the bookmark ("\EndOfDoc"). It doesn't matter which one gets working. Is it possible that my document does not contain this bookmark?
Not terribly elegant, but this change to one line of your first procedure seems to stop it at the appropriate time. I believe you actually have to insert bookmarks into your document if you want to reference them. They aren't automatically generated.
If i >= ActiveDocument.BuiltInProperties("NUMBER OF LINES") Then Exit Do
Cheers, LC
Unless you have a corrupted document, all Word documents should have the \EndOfDoc bookmark. You can check using simply ActiveDocument.Range.Bookmarks("\EndOfDoc").Exists. If it doesn't then you'll need to supply more details on the version of Word and if possible supply a sample document via Dropbox or the like.
I'm not sure why you're looping to the start of the Word document, when I run the code it works fine. However, if I put a footnote at the end of the document it runs into an endless loop, depending on your documents you may run into additional situations like this where your code fails to handle the document setup.
I would suggest modifying slightly how you check for the end of the document to make your code a bit more robust. I'd still use the bookmark "\EndOfDoc", however I'd check the limits of the range against your current search range.
So at the top of your code declare a range variable and set it to range of the end of the document eg:
Dim rEnd As Range
Set rEnd = ActiveDocument.Bookmarks("\EndOfDoc").Range
and then in your loop, instead of this line:
If Rng.Bookmarks.Exists("\EndOfDoc") Then Exit Do
use this line:
If Rng.End >= rEnd.End Then Exit Do