Counting words in Word document, including footnores - vba

I periodically receive long documents that include footnotes and am trying to find a way using VBA to count the number of words on each page, including footnotes. It doesn't matter if a footnote spills over onto the next page, I just the word count including footnotes that are anchored on the page.
I have a macro that correctly counts the number of words in the body of the text, using the command:
WordCount = ActiveDocument.Range(Start:=pos1, End:=pos2).ComputeStatistics(wdStatisticWords)
The variables pos1 and pos2 have been set to the first and last characters of the page being counted.
However, when I add the True parameter to ComputeStatistics(wdStatisticWords, True), to IncludeFootnotesAndEndnotes, as in:
WordCount = ActiveDocument.Range(Start:=pos1, End:=pos2).ComputeStatistics(wdStatisticWords, True)
it doesn't work, giving an error that there are too many parameters. It appears that when using a Range, the IncludeFootnotesAndEndnotes parameter is not available.
How do you count the words within footnotes contained in a range?

I think what you will need to do is iterate into each of the StoryRanges and update a counter. Here is a small example that should serve as an example, however, you will likely need to tweak it for your specific case (review my note about the enum for StoryRanges)
Here's the code:
Public Sub Count_All_Words()
Dim Story_Ranges As Variant: Set Story_Ranges = ActiveDocument.StoryRanges
Dim Story_Range As Object
Dim WordCount As Long
'Loop through each story range and only include the footer and Main story to the word count
For Each Story_Range In Story_Ranges
'You may need to check additional types, lookup the enumerations for StoryType here:
'https://msdn.microsoft.com/en-us/library/bb238219(v=office.12).aspx
If Story_Range.StoryType = wdMainTextStory Or Story_Range.StoryType = wdFootnoteSeparatorStory Then
'Add to the word count
WordCount = WordCount + Story_Range.ComputeStatistics(wdStatisticWords)
End If
Next
Debug.Print "The word count is: " & WordCount
End Sub

Related

Finding Endnote number with wdRestartSection NumberingRule

I am writing a VBA script to convert endnotes to plain text. This is fairly straightforward when the endnotes have continuous numbers (copy all the end notes to the end of the text, number them using the index, and replace all the references with the indexes).
In this case, however, the endnote numbers are configured to reset every section (NumberingRule=wdRestartSection). This means the index is not the number. I've tried to get the number using endnote.Reference.Text, but this is empty. I haven't found anywhere in the object model that has the actual number for each Endnote.
Is this information available?
Is there a way to walk Endnotes per-section rather than for the entire document so that I could track the index myself?
I'm currently trying to fetch it this way:
For Each objEndnote In ActiveDocument.Endnotes
print(objEndnote.Reference.Text)
Next
This just prints empty strings.
Looks like there is no number per section - weird. So you have to count it yourself per section:
Option Explicit
Sub getAllEndnotesWithNumbers()
Dim e As Endnote, section As Long, eCounter As Long
For Each e In ThisDocument.Endnotes
If section <> endnoteSection(e) Then
section = endnoteSection(e)
eCounter = 1
Debug.Print "--- Section " & section & " ----------"
End If
Debug.Print eCounter, e.Range.Text
eCounter = eCounter + 1
Next
End Sub
Private Function endnoteSection(e As Endnote) As Long
endnoteSection = e.Range.Sections(1).Index
End Function

How do i split a massive string into smaller parts of itself?

Hey so i have a school project in which i need to split a massive word into smaller words.
This is the massive sequence of letters :
'GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKEGH
HEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHRPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG'
and then i need to split it into other smaller separate parts of itself which would look like this :
'GLSDGEWQQVLNVWGK'
'VEADIAGHGQEVLIR'
'LFTGHPETLEK'
'FDK'
'FK'
'HLK'
'TEAEMK'
'ASEDLK'
'K'
'HGTVVLTALGGILK'
'K'
'K'
'EGHHEAELKPLAQSHATK'
'HK'
'IPIK'
'YLEFISDAIIHVLHSK'
'HRPGDFGADAQGAMTK'
'ALELFR'
'NDIAAK'
'YK'
'ELGFQG'
i have no idea how to start on this if you could help pls and thanks
Different digestion enzymes cut at different positions within a protein sequence; the most commonly used one is trypsin. It follows the following rules: 1) Cuts the sequence after an arginine (R) 2) Cuts the sequence after a lysine (K) 3) Does not cut if lysine (K) or arginine (R) is followed by proline (P).
Okay, hooray, rules! Let's turn this into pseudo-code to describe the same algorithm in a half-way state between the original prose and code. (While a Regex.Split approach would still work, this might be a good time to explore some more fundamentals.)
let the list of words be an empty array
let current word be the empty string
for each letter in the input:
if the letter is R or K and the next letter is NOT P then:
add the letter to the current word
save the current word to the list of words
reset the current word to the empty string
otherwise:
add the letter to the current word
if after the loop the current word is not the empty string then:
add the current word to the list of words
Then let's see how some of these translate. This is incomplete and quite likely contains minor errors1 beyond that which has been called out in comments.
Dim words As New List(Of String)
Dim word = ""
' A basic loop works suitably for string input and it can also be
' modified for a Stream that supports a Peek of the next character.
For i As Integer = 0 To input.Length - 1
Dim letter = input(i)
' TODO/FIX: input(i+1) can access an element off the string. Reader exercise.
If (letter = "R"C OrElse letter = "K"C) AndAlso Not input(i+1) = "P"C
' StringBuilder is more efficient for larger strings
Set word = word & letter
words.Add(word) ' or perhaps just WriteLine the word somewhere?
Set word = ""
Else
Set word = word & letter
End If
Next
' TODO/FIX: consider when last word is not added to words list yet
1As I use C# (and not VB.NET) the above code comes warranty Free. Here are some quick reference links I used to 'stitch' this together:
https://www.dotnetperls.com/loop-string-vbnet
https://learn.microsoft.com/en-us/dotnet/visual-basic/programming-guide/language-features/operators-and-expressions/concatenation-operators
https://www.dotnetperls.com/list-vbnet
https://learn.microsoft.com/en-us/dotnet/visual-basic/language-reference/statements/dim-statement
How do you declare a Char literal in Visual Basic .NET?

Get Index of Specific Content Control based on ID or Title

I would like to retrieve the word content control's index via VBA given a specific title without having to loop through all content controls looking for the title.
I know the title of the content control that I would like to select therefore I can set the ID of that specific content control to a variable for instance
a = ActiveDocument.SelectContentControlsByTitle("123").Item(1).ID
Now, I would like to know what the index of this item is among the other content controls in the document (over 450 content controls in the file template) such that I can refer to the content control index as a variable.
In lay terms I would like something along the lines of "b=getIndexOfA(a)" such that I can perform later process such as:
for i=b to ActiveDocument.ContentControls.Count
.....
next i
I am running Word 2016 on Windows 10.
The following approach works for just about object in the document body that's part of the text (as opposed to a floating image). Get a Range for the object, then set its starting point to the beginning of the document. Count all the objects of that type within the range:
Dim lIndexCC as Long
Dim cc as Word.ContentControl
Dim rng as Word.Range
Set cc = ActiveDocument.SelectContentControlsByTitle("123").Item(1)
Set rng = cc.Range
rng.Start = ActiveDocument.Content.Start
lIndexCC = rng.ContentControls.Count
Debug.Print lIndexCC

vb.net list confusion

I am looping through a list for a spellchecker in vb.net (using vs 2010). I want to go through a wrongly spelled word list. Each time the code picks the index that's one higher than the index of the last checked word.
In my version of notquiteVB/Pythonese I think it would translate something like:
(start loop)
dim i as Integer = 0
dim word as String
word = words_to_check_at_spellcheck.Item(0 + i)
i = i+1
(end loop)
But this doesn't work at all...when it gets to the last item in the list and reaches 'word = ' it throws the error of 'out of range -- must be less than the size of the collection'.
How do you get the last item in a list? Maybe lists aren't what VB uses for this kind of thing?
If you're collection of misspelled words is named mispelled:
For Each word As String In mispelled
'Do something
Next

GetCrossReferenceItems in msword and VBA showing only limited content

I want to make a special list of figures with use of VBA and here I am using the function
myFigures = ActiveDocument.GetCrossReferenceItems(Referencetype:="Figure")
In my word document there are 20 figures, but myFigures only contains the first 10 figures (see my code below.).
I search the internet and found that others had the same problem, but I have not found any solutions.
My word is 2003 version
Please help me ....
Sub List()
Dim i As Long
Dim LowerValFig, UpperValFig As Integer
Dim myTables, myFigures as Variant
If ActiveDocument.Bookmarks.Count >= 1 Then
myFigures = ActiveDocument.GetCrossReferenceItems(Referencetype:="Figure")
' Test size...
LowerValFig = LBound(myFigures) 'Get the lower boundry number.
UpperValFig = UBound(myFigures) 'Get the upper boundry number
' Do something ....
For i = LBound(myFigures) To UBound(myFigures) ‘ should be 1…20, but is onlu 1…10
'Do something ....
Next i
End If
MsgBox ("Done ....")
End Sub*
Definitely something flaky with that. If I run the following code on a document that contains 32 Figure captions, the message boxes both display 32. However, if I uncomment the For Next loop, they only display 12 and the iteration ceases after the 12th item.
Dim i As Long
Dim myFigures As Variant
myFigures = ActiveDocument.GetCrossReferenceItems("Figure")
MsgBox myFigures(UBound(myFigures))
MsgBox UBound(myFigures)
'For i = 1 To UBound(myFigures)
' MsgBox myFigures(i)
'Next i
I had the same problem with my custom cross-refference dialog and solved it by invoking the dialog after each command ActiveDocument.GetCrossReferenceItems(YourCaptionName).
So you type:
varRefItemsFigure1 = ActiveDocument.GetCrossReferenceItems(g_strCaptionLabelFigure1)
For k = 1 To UBound(varRefItemsFigure1)
frmBwtRefDialog.ListBoxFigures.AddItem varRefItemsFigure1(k)
Next
and then:
frmBwtRefDialog.Show vbModeless
Thus the dialog invoked several times instead of one, but it works fast and don't do any trouble. I used this for one year and didn't see any errors.
Enjoy!
Frankly I feel bad about calling this an "answer", but here's what I did in the same situation. It would appear that entering the debugger and stepping through the GetCrossReferenceItems always returns the correct value. Inspired by this I tried various ways of giving control back to Word (DoEvents; running next segment using Application.OnTime) but to no avail. Eventually the only thing I found that worked was to invoke the debugger between assignments, so I have:
availRefs =
ActiveDocument.GetCrossReferenceItems(wdRefTypeNumberedItem):Stop
availTables =
ActiveDocument.GetCrossReferenceItems(wdCaptionTable):Stop
availFigures = ActiveDocument.GetCrossReferenceItems(wdCaptionFigure)
It's not pretty but, as I'm the only person who'll be running this, it kind of works for my purposes.