Count words in a Microsoft Word document by document style? - vba

In analogy to this question, I would like to script a VBA script that counts words formatted with a certain document style, more precisely: a certain paragraph style.
Sub CountTypeface()
Dim lngWord As Long
Dim lngCountIt As Long
Const Typeface As String = "Calibri"
Const MyFormat As String = "My Paragraph Style"
'Ignore any document "Words" that aren't real words (CR, LF etc)
For lngWord = 1 To ActiveDocument.Words.Count
If Len(Trim(ActiveDocument.Words(lngWord))) > 1 Then
If ActiveDocument.Styles(lngWord) = MyFormat Then
lngCountIt = lngCountIt + 1
End If
End If
Next lngWord
MsgBox "Number of " & Typeface & " words: " & lngCountIt
End Sub
But running this code results in the runtime error:
runtime error "5941".: the requested member of the collection does not exist
Why does this happen and how to fix it?

You're using your word count iterator as the index for the style collection. You have more words than Styles has indices, and the If would only be true one time, since you aren't checking the word's style.
Replace:
If ActiveDocument.Styles(lngWord) = MyFormat Then
With:
If ActiveDocument.Words(lngWord).Style = MyFormat Then

Related

Finding and Replacing with VBA for Word overwrites previous style

I'm writing a VBA script to generate word documents from an already defined template. In it, I need to be able to write headings along with a body for each heading. As a small example, I have a word document that contains only <PLACEHOLDER>. For each heading and body I need to write, I use the find-and-replace feature in VBA to find <PLACEHOLDER> and replace it with the heading name, a newline, and then <PLACEHOLDER> again. This is repeated until each heading name and body is written and then the final <PLACEHOLDER> is replaced with a newline.
The text replacing works fine, but the style I specify gets overwritten by the next call to the replacement. This results in everything I just replaced having the style of whatever my last call to my replacement function is.
VBA code (run main)
Option Explicit
Sub replace_stuff(search_string As String, replace_string As String, style As Integer)
With ActiveDocument.Range.Find
.Text = search_string
.Replacement.Text = replace_string
.Replacement.style = style
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchWholeWord = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
End Sub
Sub main()
Dim section_names(2) As String
section_names(0) = "Introduction"
section_names(1) = "Background"
section_names(2) = "Conclusion"
Dim section_bodies(2) As String
section_bodies(0) = "This is the body text for the introduction! Fetched from some file."
section_bodies(1) = "And Background... I have no issue fetching data from the files."
section_bodies(2) = "And for the conclusion... But I want the styles to 'stick'!"
Dim i As Integer
For i = 0 To 2
' Writes each section name as wsStyleHeading2, and then the section body as wdStyleNormal
Call replace_stuff("<PLACEHOLDER>", section_names(i) & Chr(11) & "<PLACEHOLDER>", wdStyleHeading2)
Call replace_stuff("<PLACEHOLDER>", section_bodies(i) & Chr(11) & "<PLACEHOLDER>", wdStyleNormal)
Next i
Call replace_stuff("<PLACEHOLDER>", Chr(11), wdStyleNormal)
End Sub
Input document: A word document with only <PLACEHOLDER> in it.
<PLACEHOLDER>
Expected Output:
I expect that each heading will be displayed in the style I specified and can be viewed from the navigation pane like this:
Actual Output: However what I actually get is everything as wdStyleNormal style like this:
I think the problem can be solved by inserting a paragraph break between every style transition, but when I try using vbCrLF or Chr(10) & Chr(13) or vbNewLine instead of the chr(11) I am using now, Each line begins with a boxed question mark like this:
Update from discussion in comments on another answer. The problem described below applies to Word 2016 and earlier. Starting in Office 365 (and probably Word 2019, but that's not been confirmed) the Replace behavior has been changed to "convert" ANSI 13 to a "real" paragraph mark, so the problem in the question would not occur.
Answer
The reason for the odd formatting behavior is the use of Chr(11), which inserts a new line (Shift + Enter) instead of a new paragraph. So a paragraph style applied to any part of this text formats the entire text with the same style.
In this particular case (working with Replace), vbCr or the equivalent Chr(13) also don't work because these are not really Word's native paragraph. A paragraph is much more than just ANSI code 13 - it contains paragraph formatting information. So, while the code is running, Word is not really recognizing these as true paragraph marks and the paragraph style assignment is being applied to "everything".
What does work is to use the string ^p, which in Word's Find/Replace is the "alias" for a complete paragraph mark. So, for example:
replace_stuff "<PLACEHOLDER>", section_names(i) & "^p" & "<PLACEHOLDER>", wdStyleHeading2
replace_stuff "<PLACEHOLDER>", section_bodies(i) & "^p" & "<PLACEHOLDER>", wdStyleNormal
There is, however, a more efficient way to build a document than inserting a placeholder for each new item and using Find/Replace to replace the placeholder with the document content. The more conventional approach is to work with a Range object (think of it like an invisible selection)...
Assign content to the Range, format it, collapse (like pressing right-arrow for a selection) and repeat. Here's an example that returns the same result as the (corrected) code in the question:
Sub main()
Dim rng As Range
Set rng = ActiveDocument.content
Dim section_names(2) As String
section_names(0) = "Introduction"
section_names(1) = "Background"
section_names(2) = "Conclusion"
Dim section_bodies(2) As String
section_bodies(0) = "This is the body text for the introduction! Fetched from some file."
section_bodies(1) = "And Background... I have no issue fetching data from the files."
section_bodies(2) = "And for the conclusion... But I want the styles to 'stick'!"
Dim i As Integer
For i = 0 To 2
BuildParagraph section_names(i), wdStyleHeading2, rng
BuildParagraph section_bodies(i), wdStyleNormal, rng
Next i
End Sub
Sub BuildParagraph(para_text As String, para_style As Long, rng As Range)
rng.Text = para_text
rng.style = para_style
rng.InsertParagraphAfter
rng.Collapse wdCollapseEnd
End Sub
The problem is caused by your use of Chr(11) which is a manual line break. This results in all of the text being in a single paragraph. When the paragraph style is applied it applies to the entire paragraph.
Replace Chr(11) with vbCr to ensure that each piece of text is in a separate paragraph.

What does a hyperlink range.start and range.end refer to?

I'm trying to manipulate some text from a MS Word document that includes hyperlinks. However, I'm tripping up at understanding exactly what Range.Start and Range.End are returning.
I banged a few random words into an empty document, and added some hyperlinks. Then wrote the following macro...
Sub ExtractHyperlinks()
Dim rHyperlink As Range
Dim rEverything As Range
Dim wdHyperlink As Hyperlink
For Each wdHyperlink In ActiveDocument.Hyperlinks
Set rHyperlink = wdHyperlink.Range
Set rEverything = ActiveDocument.Range
rEverything.TextRetrievalMode.IncludeFieldCodes = True
Debug.Print "#" & Mid(rEverything.Text, rHyperlink.Start, rHyperlink.End - rHyperlink.Start) & "#" & vbCrLf
Next
End Sub
However, the output between the #s does not quite match up with the hyperlinks, and is more than a character or two out. So if the .Start and .End do not return char positions, what do they return?
This is a bit of a simplification but it's because rEverything counts everything before the hyperlink, then all the characters in the hyperlink field code (including 1 character for each of the opening and closing field code braces), then all the characters in the hyperlink field result, then all the characters after the field.
However, the character count in the range (e.g. rEverything.Characters.Count or len(rEverything)) only includes the field result if TextRetrievalMode.IncludeFieldCodes is set to False and only includes the field code if TextRetrievalMode.IncludeFieldCodes is set to True.
So the character count is always smaller than the range.End-range.Start.
In this case if you change your Debug expression to something like
Debug.Print "#" & Mid(rEverything.Text, rHyperlink.Start, rHyperlink.End - rHyperlink.Start - (rEverything.End - rEverything.Start - 1 - Len(rEverything))) & "#" & vbCrLf
you may see results more along the lines you expect.
Another way to visualise what is going on is as follows:
Create a very short document with a piece of text followed by a short hyperlink field with short result, followed by a piece of text. Put the following code in a module:
Sub Select1()
Dim i as long
With ActiveDocument
For i = .Range.Start to .Range.End
.Range(i,i).Select
Next
End With
End Sub
Insert a breakpoint on the "Next" line.
Then run the code once with the field codes displayed and once with the field results displayed. You should see the progress of the selection "pause" either at the beginning or the end of the field, as the Select keeps "selecting" something that you cannot actually see.
Range.Start returns the character position from the beginning of the document to the start of the range; Range.End to the end of the range.
BUT everything visible as characters are not the only things that get counted, and therein lies the problem.
Examples of "hidden" things that are counted, but not visible:
"control characters" associated with content controls
"control characters" associated with fields (which also means hyperlinks), which can be seen if field result is toggled to field code display using Alt+F9
table structures (ANSI 07 and ANSI 13)
text with the font formatting "hidden"
For this reason, using Range.Start and Range.End to get a "real" position in the document is neither reliable nor recommended. The properties are useful, for example, to set the position of one range relative to the position of another.
You can get a somewhat more accurate result using the Range.TextRetrievalMode boolean properties IncludeHiddenText and IncludeFieldCodes. But these don't affect the structural elements involved with content controls and tables.
Thank you both so much for pointing out this approach was doomed but that I could still use .Start/.End for relative positions. What I was ultimately trying to do was turn a passed paragraph into HTML, with the hyperlinks.
I'll post what worked here in case anyone else has a use for it.
Function ExtractHyperlinks(rParagraph As Range) As String
Dim rHyperlink As Range
Dim wdHyperlink As Hyperlink
Dim iCaretHold As Integer, iCaretMove As Integer, rCaret As Range
Dim s As String
iCaretHold = 1
iCaretMove = 1
For Each wdHyperlink In rParagraph.Hyperlinks
Set rHyperlink = wdHyperlink.Range
Do
Set rCaret = ActiveDocument.Range(rParagraph.Characters(iCaretMove).Start, rParagraph.Characters(iCaretMove).End)
If RangeContains(rHyperlink, rCaret) Then
s = s & Mid(rParagraph.Text, iCaretHold, iCaretMove - iCaretHold) & "" & IIf(wdHyperlink.TextToDisplay <> "", wdHyperlink.TextToDisplay, wdHyperlink.Address) & ""
iCaretHold = iCaretMove + Len(wdHyperlink.TextToDisplay)
iCaretMove = iCaretHold
Exit Do
Else
iCaretMove = iCaretMove + 1
End If
Loop Until iCaretMove > Len(rParagraph.Text)
Next
If iCaretMove < Len(rParagraph.Text) Then
s = s & Mid(rParagraph.Text, iCaretMove)
End If
ExtractHyperlinks = "<p>" & s & "</p>"
End Function
Function RangeContains(rParent As Range, rChild As Range) As Boolean
If rChild.Start >= rParent.Start And rChild.End <= rParent.End Then
RangeContains = True
Else
RangeContains = False
End If
End Function

How to select part of a text field using ms word macros

I am trying to build template invoices for Xero. Xero looks for specific fields in your MS Word template and inputs the variable assigned to that text field name in your given format. In word you can toggle the field code to view as just the field name:
«InvoiceNumber»
or the name with format:
{ MERGEFIELD InvoiceNumber \* MERGEFORMAT}
This outputs: INV1234 successfully into the template. Now what I need to do is output just the last 4 characters.
This post seems to imply it must be done with a VBA. I put together a macro with Visual Basic in word and this is where I have hit trouble:
Sub InvoiceNumber()
Dim MyInv As FormFields
Set MyInv = ActiveDocument.FormFields
If MyInv("Text1").Result = "InvoiceNumber" Then
MyInv("Text1").CheckBox.Value = Right(MyInv("Text1"), 4)
End If
End Sub
This returns with
error 5941: The requested member of the selection does not exist
I am quite a beginner with VB macros in word, what am I doing wrong and how should I instead be trying to call the InvoiceNumber Field?
Please try with the following solution:
Sub InvoiceNumber()
Dim MyInv As Field
Set MyInv = GetFieldByName("InvoiceNumber")
If Not MyInv Is Nothing Then
'do something with field result...
'here... debug to Immediate window
Debug.Print Right(MyInv.Result, 4)
End If
End Sub
Function GetFieldByName(fName As String) As Field
Dim F As Field
For Each F In ActiveDocument.Fields
'if not working try with (1) istead of (2) in line below
If Split(Replace(F.Code, " ", " "), " ")(2) = fName Then
Set GetFieldByName = F
Exit Function
End If
Next F
Set GetFieldByName = Nothing
End Function

Word Macro to Add Comments to a Document Failing at Tables

I'm writing a Microsoft Word VBA macro that runs through every paragraph of a word document and adds a comment to every paragraph. That comment contains the style for that paragraph. This way a coworker can print out the document with comments and know how to style similar documents in the future.
I'm almost there, the code adds the comments to every paragraph, but dies at the first row of a table:
"This method or property is not available because the object refers to the end of a table row."
Here is the code:
Sub aa_AddStylesComment()
'
' aa_AddStylesComment Macro
' Author: Me!
'
Dim strParaStyle As String
Dim cmtNewComment As Comment
'Run through word file and delete any comments with author set to a space character (that is the author of the comments added by the script)
For J = ActiveDocument.Comments.Count To 1 Step -1
With ActiveDocument
If .Comments(J).Author = " " Then
.Comments(J).Delete
End If
End With
Next J
'Running through every paragraph
For i = 1 To ActiveDocument.Paragraphs.Count
With ActiveDocument
'Get paragraph style
strParaStyle = .Paragraphs(i).Style
'Create a new comment and collect it - then change the author to space character
Set cmtNewComment = Selection.Comments.Add(.Range(.Paragraphs(i).Range.Words(1).Start, (.Paragraphs(i).Range.Words(1).End - 1)), strParaStyle)
cmtNewComment.Author = " "
End With
Next
End Sub
You can add a check if it is a table, and then if the paragraph has cells, as follows:
If .Paragraphs(i).Range.Tables.Count = 0 Then
Set cmtNewComment = .Paragraphs(i).Range.Comments.Add(.Range(.Paragraphs(i).Range.Words(1).Start, (.Paragraphs(i).Range.Words(1).End - 1)), strParaStyle)
cmtNewComment.Author = " "
ElseIf .Paragraphs(i).Range.Cells.Count > 0 Then
Set cmtNewComment = .Paragraphs(i).Range.Comments.Add(.Range(.Paragraphs(i).Range.Words(1).Start, (.Paragraphs(i).Range.Words(1).End - 1)), strParaStyle)
cmtNewComment.Author = " "
End If
Note that you don't need to use the Selection as you never change it.

better method for accenting every word in Word document?

I am new to programming, but I am trying to adapt an existing script as a MS Word 2010/2013 addin to add correct stress accentuation to every Latin word in an open document.
The script "DoAccentuate" returns an accented word for any unaccented Latin word I send it as a string. I just need help doing a better job of looping through all the words, and then stopping the loop when the last word is reached. My current method is a bit goofy...I insert a nonesense word at the end of the document and then loop until it gets selected and accented.
Perhaps there's a better or more efficient way to go about the whole thing.
Public Sub Button5_Click(sender As Object, e As EventArgs) Handles Button5.Click
Dim document As Word.Document
document = Globals.ThisAddIn.Application.ActiveDocument
Dim mySelection = document.Application.Selection
'make sure cursor is at start of document
document.Application.Selection.HomeKey(Unit:=Microsoft.Office.Interop.Word.WdUnits.wdStory)
'insert fake word at end to stop the loop
Dim range As Word.Range
range = document.Range()
range.InsertAfter(" documentendatoris")
Do
'use MS Word's wildcard to select the first individual word as trimmed string
mySelection.Find.Text = "<*>"
mySelection.Find.MatchWildcards = True
mySelection.Find.Execute()
'replace the selected word that has been found with its accented counterpart
mySelection.Text = Accentuate.Accentuate.DoAccentuate(mySelection.Text)
Loop Until mySelection.Text = "documentendatóris"
End Sub
Well, I don't realy know if its more efficient way but you could use document.Content and range.Words collection to check all words in main story range
document = Globals.ThisAddIn.Application.ActiveDocument
Dim range As Word.Range
range = document.Content
Dim current As Integer
current = 0
Dim words As Word.Words
words = range.Words
Dim word As Word.Range
Do
current = current + 1
If current < words.Count Then
word = words(current)
If word.Text.EndsWith(" ") Then
word.Text = word.Text.Trim() + "'s "
'replace the selected word that has been found with its accented counterpart
'mySelection.Text = Accentuate.Accentuate.DoAccentuate(mySelection.Text)
Else
word.Text = word.Text.Trim() + "'s"
End If
End If
Loop Until current = words.Count