How can I prevent VBA code, used to draw up a list of unique words from a word document, from slowing down as the document get's longer - vba

I have used some code from the internet, modified slightly for my specific use case, to draw up a list of unique words from a word document, the code works without a problem, but the time to execute the code seems to grow exponentially as the document length increases. Can anyone give me any suggestions to speed up the code when working with very long documents?
Sub UniqueWordList()
Dim wList As New Collection
Dim wrd
Dim chkwrd
Dim sTemp As String
Dim k As Long
Dim cWrd As Long
Dim tWrd As Long
Dim nWrd As String
Dim Flag As Boolean
Flag = False
tWrd = ActiveDocument.Range.Words.Count
cWrd = 0
For Each wrd In ActiveDocument.Range.Words
cWrd = cWrd + 1
If cWrd Mod 100 = 0 Then
Application.StatusBar = "Updating: " & (cWrd)
End If
If Flag Then
Flag = False
GoTo nw
End If
If cWrd < tWrd Then
nWrd = ActiveDocument.Words(cWrd + 1)
nWrd = Trim(LCase(nWrd))
End If
sTemp = Trim(LCase(wrd))
If sTemp = "‘" Then
sTemp = sTemp & nWrd
Flag = True
End If
If sTemp Like "*[a-zA-Z]*" Then
k = 0
For Each chkwrd In wList
k = k + 1
If chkwrd = sTemp Then GoTo nw
If chkwrd > sTemp Then
wList.Add Item:=sTemp, Before:=k
GoTo nw
End If
Next chkwrd
wList.Add Item:=sTemp
End If
nw:
Next wrd
sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
sTemp = sTemp & "in the document, before this summary, but there "
sTemp = sTemp & "are only " & wList.Count & " unique words."
ActiveDocument.Range.Select
Selection.Collapse Direction:=wdCollapseEnd
Selection.TypeText vbCrLf & sTemp & vbCrLf
For Each chkwrd In wList
Selection.TypeText chkwrd & vbCrLf
Next chkwrd
End Sub
After some suggestions I modified my code to use a scripting dictionary, this however does not seem to have solved the problem. Also to answer the concern regarding my message at the end, I understand that the wording is off, what I want is a list of words from the document but each word only once.
Sub UniqueWordListMi()
Dim wList() As String
Dim sTemp As String
Dim cWrd As Long
Dim tWrd As Long
Dim nWrd As String
Dim Flag As Boolean
Dim IsInArray As Boolean
Dim arrsize As Long
Dim rra2 As Variant
arrsize = 0
Flag = False
tWrd = ActiveDocument.Range.Words.Count
cWrd = 1
ReDim Preserve wList(0 To arrsize)
wList(arrsize) = "UNQ"
For Each wrd In ActiveDocument.Range.Words
If cWrd Mod 100 = 0 Then
Application.StatusBar = "Updating" & (cWrd)
End If
If Flag Then
Flag = False
GoTo nw
End If
If cWrd < tWrd Then
nWrd = ActiveDocument.Words(cWrd + 1)
nWrd = Trim(LCase(nWrd))
End If
sTemp = Trim(LCase(wrd))
If sTemp = "‘" Then
sTemp = sTemp & nWrd
Flag = True
End If
If sTemp Like "*[a-zA-Z]*" Then
ReDim Preserve wList(0 To arrsize)
wList(arrsize) = sTemp
arrsize = arrsize + 1
End If
nw:
cWrd = cWrd + 1
Next wrd
Set Dict = CreateObject("scripting.dictionary")
For i = 0 To UBound(wList)
If (Not Dict.Exists(CStr(wList(i)))) Then Dict.Add CStr(wList(i)), wList(i) 'Next i
Next i
rra2 = Dict.Items
sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
sTemp = sTemp & "in the document, before this summary, but there "
sTemp = sTemp & "are only " & UBound(wList) & " unique words."
ActiveDocument.Range.Select
Selection.Collapse Direction:=wdCollapseEnd
Selection.TypeText vbCrLf & sTemp & vbCrLf
For u = 0 To UBound(rra2)
Selection.TypeText vbCrLf & rra2(u) & vbCrLf
Next u
End Sub

#AlexK beat me to it with a comment on using a Scripting.Dictionary.
Something like this might help
Option Explicit
Public Function CountUniqueWords(ByRef ipRange As Word.Range) As Scripting.Dictionary
Dim myUniqueWords As Scripting.Dictionary
Set myUniqueWords = New Scripting.Dictionary
Dim myPara As Variant
For Each myPara In ipRange.Paragraphs
Dim myWord As Variant
For Each myWord In Split(myPara.Range.Text)
If myUniqueWords.Exists(myWord) Then
myUniqueWords.Item(myWord) = myUniqueWords.Item(myWord) + 1
Else
myUniqueWords.Add myWord, 1
End If
Next
Next
Set CountUniqueWords = myUniqueWords
End Function
Some polishing might be required to meet your specific requirements.
You can't help some increase in processing time as the document gets longer but as the access to the document is limited to paragraphs rather than words is should proceed somewhat faster.

Try the following code. It uses the dictionary directly with the rules of your code.
Note that this will only improve your code. But still, the longer the document will get, the more words need to be checked and the more time it will need. That fact will not change, you can just optimize it by using the dictionary directly but more words need more time to check.
Option Explicit
Sub UniqueWordListMi()
Dim wList As Object
Set wList = CreateObject("scripting.dictionary")
Dim sTemp As String
Dim cWrd As Long
Dim tWrd As Long
Dim nWrd As String
Dim Flag As Boolean
Dim IsInArray As Boolean
Dim arrsize As Long
Dim rra2 As Variant
arrsize = 0
Flag = False
tWrd = ActiveDocument.Range.Words.Count
cWrd = 1
Dim wrd As Variant
For Each wrd In ActiveDocument.Range.Words
If cWrd Mod 100 = 0 Then
Application.StatusBar = "Updating" & (cWrd)
End If
If Flag Then
Flag = False
GoTo nw
End If
If cWrd < tWrd Then
nWrd = ActiveDocument.Words(cWrd + 1)
nWrd = Trim(LCase(nWrd))
End If
sTemp = Trim(LCase(wrd))
If sTemp = "‘" Then
sTemp = sTemp & nWrd
Flag = True
End If
If sTemp Like "*[a-zA-Z]*" Then
If Not wList.Exists(sTemp) Then
wList.Add sTemp, 1
Else
wList.Item(sTemp) = wList.Item(sTemp) + 1
End If
cWrd = cWrd + 1
End If
nw:
Next wrd
sTemp = "There are " & (cWrd - 1) & " words "
sTemp = sTemp & "in the document, before this summary, but there "
sTemp = sTemp & "are only " & wList.Count & " distinct words."
ActiveDocument.Range.Select
Selection.Collapse Direction:=wdCollapseEnd
Selection.TypeText vbCrLf & sTemp & vbCrLf
Dim chkwrd As Variant
For Each chkwrd In wList
Selection.TypeText chkwrd & vbTab & wList.Item(chkwrd) & " times" & vbCrLf
Next chkwrd
End Sub
The following example:
This is an example test where every word is unique except one.
There are 12 words in the document, before this summary, but there are only 11 distinct words.
this 1 times
is 2 times
an 1 times
example 1 times
test 1 times
where 1 times
every 1 times
word 1 times
unique 1 times
except 1 times
one 1 times

With everyone's help and some additional reading, as well as some help from a reddit user this code work's perfectly:
Sub UniqueWordListFast()
Dim WordDictionary As Object
Dim SourceText As Document
Dim objWord As Object
Dim sTemp As String, strWord As String, nxtWord As String
Dim count As Long
count = 0
Set WordDictionary = CreateObject("Scripting.Dictionary")
Set SourceText = Application.ActiveDocument
For Each objWord In SourceText.Range.Words
count = count + 1
strWord = Trim(objWord.Text)
If strWord = nxtWord Then GoTo nw
If strWord Like "*[a-z]*" Then WordDictionary(strWord) = strWord
If strWord Like "‘" Then
nxtWord = Trim(SourceText.Words(count + 1))
strWord = strWord & nxtWord
WordDictionary(strWord) = strWord
End If
nw:
Next
sTemp = "[DOCUMENT] " & vbTab & SourceText.Name & vbCrLf & vbCrLf & _
"There are " & SourceText.Range.Words.count & " words in the document, " & _
"before this summary, but there are only " & WordDictionary.count & " unique words."
Dim NewDocument As Document
Set NewDocument = Documents.Add
NewDocument.Range.Text = sTemp & vbCrLf & Join(WordDictionary.Keys, vbCrLf)
End Sub
Extremely fast and efficient. Thank you everyone!

Related

copy paste not working in windows 10 office 365 while splitting word document using VBA

The following code split the document by section breaks. however it is working correctly in windows 7 but not in windows 10 office 365, having "run time error 4605 : the command is not available." on windows 10? while I try to paste the copied content using oNewDoc.Range.Paste. I came to know it was due to oNewDoc windows not activate or pasting take place without waiting to oNewDoc to be created. because when I press debug and wait for 1 second then run again it executes correctly.
Private Sub GenerateFiles_Click()
'Pages Update 1.0 By M.B.A.
Dim oNewDoc As Document
Dim oDoc As Document
Dim CR As Range
Dim firstLine As String
Dim strLine As String
Dim DocName As String
Dim pdfName As String
Dim arrSplit As Variant
Dim Counter As Integer
Dim i As Integer
Dim PS As String
PS = Application.PathSeparator
'Progress
pBarCurrent 0
If pdfCheck.Value = False And docCheck.Value = False Then
PagesLB = "**Please Select at least one check boxes!"
Beep
Exit Sub
End If
Set oDoc = ActiveDocument
Set CR = oDoc.Range
Letters = oDoc.Range.Information(wdActiveEndSectionNumber)
Counter = 1
While Counter < Letters + 1
With oDoc.Sections.First.Range
.MoveEnd wdSection, 0
.MoveEnd wdCharacter, -1
.Copy
'.Select
Set oNewDoc = Documents.Add(Visible:=True)
oNewDoc.Range.Paste 'Run-time error '4605': This command is not available
End With
firstLine = oNewDoc.Paragraphs(1).Range.Text
For i = 1 To 2
strLine = oNewDoc.Paragraphs(i).Range.Text
If InStr(strLine, ".pdf") > 0 Then
arrSplit = Split(strLine, ".pdf")
DocName = arrSplit(0) & ".pdf"
Exit For
End If
Next i
If i = 3 Then
DocName = Left(firstLine, 45)
DocName = Replace(DocName, vbCr, "")
End If
DocName = Replace(DocName, Chr(11), "")
pdfName = Counter & " - " & DocName & IIf(i = 3, ".pdf", "")
DocName = Counter & " - " & IIf(i < 2, Replace(DocName, ".pdf", ""), DocName) & ".docx"
'Debug.Print pdfName; vbNewLine; DocName
If docCheck Then
oNewDoc.SaveAs FileName:=oDoc.Path & PS & ValidWBName(DocName), AddToRecentFiles:=False
End If
If pdfCheck Then
oNewDoc.SaveAs FileName:=oDoc.Path & PS & ValidWBName(pdfName), FileFormat:=wdFormatPDF
End If
oDoc.Sections.First.Range.Cut
'== Progress Bar =='
DoEvents
PagesLB = " Letter " & Counter & " of " & Letters & vbCr & " " & Int((Counter / (Letters)) * 100) & "% Completed..."
pBarCurrent Int((Counter / (Letters)) * 100)
oNewDoc.Close False
Counter = Counter + 1
Wend
PagesLB = Letters & " Letters has been Created..."
oDoc.Close wdDoNotSaveChanges
Beep
End Sub
You can avoid using the clipboard by using the FormattedText property
With oDoc.Sections.First.Range
.MoveEnd wdSection, 0
.MoveEnd wdCharacter, -1
Set oNewDoc = Documents.Add(Visible:=True)
oNewDoc.Range.FormattedText = .FormattedText
End With

Debugging MS Word Macro for importing JPGs that is returning duplicate images

I'm looking through the following macro I inherited and trying to figure out why it's importing duplicate images when it pulls unique photos from the same folder. Any help would be much appreciated, I don't have a lot of experience with VBA.
The purpose of the macro is to pull all image files in the same folder as the word document and embed them in the word document itself. Right now it's taking the first image in the folder and embedding it multiple times. I think it's an issue with the loop logic but I'm pretty new to VBA and having trouble fixing it.
Option Explicit
Dim msPath As String
Dim msPictures() As String
Dim mlPicturesCnt As Long
Public Sub ImportJPGFiles()
On Error GoTo Err_ImportJPGFiles
Dim lngCount As Long
Dim lngPicture As Long
Dim strMsg As String
Dim sngBEGTime As Single
Dim sngENDTime As Single
'Assume JPG files are in same directory as
'as the Word document containing this macro.
msPath = Application.ActiveDocument.Path & "\"
lngCount = LoadPicturesArray
'Let user browse to correct folder if pictures aren't in the same
'folder as Word document
While lngCount < 0
strMsg = "Unable to find any JPG files in the following" & vbCrLf & _
"directory:" & vbCrLf & vbCrLf & _
msPath & vbCrLf & vbCrLf & _
"Press the 'OK' button if you want to browse to" & vbCrLf & _
"the directory containing your JPG files. Press" & vbCrLf & _
"the 'Cancel' button to end this macro."
If (MsgBox(strMsg, vbOKCancel + vbInformation, "Technical Difficulties")) = vbOK Then
With Application
.WindowState = wdWindowStateMinimize
msPath = BrowseForDirectory
.WindowState = wdWindowStateMaximize
End With
If LenB(msPath) <> 0 Then
If Right$(msPath, 1) <> "\" Then
msPath = msPath & "\"
End If
lngCount = LoadPicturesArray
Else
Exit Sub
End If
Else
Exit Sub
End If
Wend
Application.ScreenUpdating = False
sngBEGTime = Timer
For lngPicture = 0 To lngCount
Application.StatusBar = "Importing picture " & _
CStr(lngPicture + 1) & " of " & _
CStr(lngCount + 1) & " pictures..."
With Selection
.EndKey Unit:=wdStory
.MoveUp Unit:=wdLine, Count:=21, Extend:=wdExtend
.Copy
.EndKey Unit:=wdStory
.InsertBreak Type:=wdPageBreak
.Paste
.MoveUp Unit:=wdLine, Count:=24
.InlineShapes.AddPicture FileName:=msPath & msPictures(lngPicture), _
LinkToFile:=False, _
SaveWithDocument:=True
End With
Next lngPicture
sngENDTime = Timer
strMsg = "Import Statistics: " & vbCrLf & vbCrLf & _
"Pictures Imported: " & CStr(lngCount + 1) & vbCrLf & _
"Total Seconds: " & Format((sngENDTime - sngBEGTime), "###0.0") & vbCrLf & _
"Seconds/Picture: " & Format((sngENDTime - sngBEGTime) / (lngCount + 1), "###0.00")
MsgBox strMsg, , "Finished"
Exit_ImportJPGFiles:
With Application
.StatusBar = "Ready"
.ScreenUpdating = True
End With
Exit Sub
Err_ImportJPGFiles:
MsgBox Err.Number & " - " & Err.Description, , "ImportJPGFiles"
Resume Exit_ImportJPGFiles
End Sub
Public Function LoadPicturesArray() As Long
On Error GoTo Err_LoadPicturesArray
Dim strName As String
strName = Dir(msPath)
mlPicturesCnt = 0
ReDim msPictures(0)
Do While strName <> ""
If strName <> "." And strName <> ".." _
And strName <> "pagefile.sys" Then
If UCase(Right$(strName, 3)) = "JPG" Then
msPictures(mlPicturesCnt) = strName
mlPicturesCnt = mlPicturesCnt + 1
ReDim Preserve msPictures(mlPicturesCnt)
'Debug.Print strName
End If
End If
strName = Dir
Loop
Call QSort(msPictures, 0, mlPicturesCnt - 1)
' Dim i As Integer
' Debug.Print "----AFTER SORT----"
' For i = 0 To mlPicturesCnt - 1
' Debug.Print msPictures(i)
' Next i
LoadPicturesArray = mlPicturesCnt - 1
Exit_LoadPicturesArray:
Exit Function
Err_LoadPicturesArray:
MsgBox Err.Number & " - " & Err.Description, , "LoadPicturesArray"
Resume Exit_LoadPicturesArray
End Function
Public Sub QSort(ListArray() As String, lngBEGOfArray As Long, lngENDOfArray As Long)
Dim i As Long
Dim j As Long
Dim strPivot As String
Dim strTEMP As String
i = lngBEGOfArray
j = lngENDOfArray
strPivot = ListArray((lngBEGOfArray + lngENDOfArray) / 2)
While (i <= j)
While (ListArray(i) < strPivot And i < lngENDOfArray)
i = i + 1
Wend
While (strPivot < ListArray(j) And j > lngBEGOfArray)
j = j - 1
Wend
If (i <= j) Then
strTEMP = ListArray(i)
ListArray(i) = ListArray(j)
ListArray(j) = strTEMP
i = i + 1
j = j - 1
End If
Wend
If (lngBEGOfArray < j) Then QSort ListArray(), lngBEGOfArray, j
If (i < lngENDOfArray) Then QSort ListArray(), i, lngENDOfArray
End Sub

Splitting Word document into multiple .txt files using a macro

I am splitting a single MS Word document into multiple using a custom delimiter. I am able to create multiple files in MS Word format, but I want to create multiple .txt files instead.
The code that I am using now is:
Sub SplitNotes(delim As String, strFilename As String)
Dim doc As Document
Dim arrNotes
Dim I As Long
Dim X As Long
Dim Response As Integer
arrNotes = Split(ActiveDocument.Range, delim)
Response = MsgBox("This will split the document into " &
UBound(arrNotes) + 1 & " sections. Do you wish to proceed?", 4)
If Response = 7 Then Exit Sub
For I = LBound(arrNotes) To UBound(arrNotes)
If Trim(arrNotes(I)) <> "" Then
X = X + 1
Set doc = Documents.Add
doc.Range = arrNotes(I)
doc.SaveAs ThisDocument.Path & "\" & strFilename & Format(X, "000")
doc.Close True
End If
Next I
End Sub
Sub test()
' delimiter & filename
SplitNotes "%%%%%%%%%%%%%%", "Notes "
End Sub
Can anyone help me with this please?
Try this and see if it does what you want.
Sub SplitNotes(delim As String, strFilename As String)
Dim doc As Document
Dim arrNotes
Dim I As Long
Dim X As Long
Dim Response As Integer
arrNotes = Split(ActiveDocument.Range, delim)
Response = MsgBox("This will split the document into " & UBound(arrNotes) + 1 & " sections. Do you wish to proceed?", 4)
If Response = 7 Then Exit Sub
For I = LBound(arrNotes) To UBound(arrNotes)
If Trim(arrNotes(I)) <> "" Then
X = X + 1
Set doc = Documents.Add
doc.Range = arrNotes(I)
doc.SaveAs ThisDocument.Path & "\" & strFilename & ".txt"
doc.Close True
End If
Next I
End Sub
Sub test()
' delimiter & filename
SplitNotes "///", "Notes "
End Sub

Word VBA discrepancy between Instr() actual starting position of a string

I am using regex to find all pattern matches in a Word doc, which I will then manipulate.
The file I'm searching is ~330 pages long and includes copy/pasted emails. My problem is that when I use InStr(startPos, objRange.Text, match.submatches(0)) to find the starting position of each match, the result is actually offset by some amount. For the document in its original state, that offset happened to be 324 characters.
On a hunch, I decided to remove all the hyperlinks in the document to see what that would do. The RemoveHyperlinks sub found and removed 24 hyperlinks, after which the Instr() return value was off by only 20 characters (so that subtracting the magic number matchStart = matchStart - 1 - 20 gives the correct starting position). Obviously I want to avoid all magic numbers, but I cannot figure out where the last 20 characters are coming from.
I tried unlinking all fields, but there weren't any to unlink after the hyperlinks were removed.
Any thoughts on why
matchStart = InStr(startPos, objRange.Text, match.submatches(0))
matchEnd = matchStart + Len(match.submatches(0))
Set subRange0 = objDoc.Range(matchStart, matchEnd)
give me subRange0.Text different from match.submatches(0)? Or where the other hidden characters may be found (to be removed)?
Sub FixHighlightedText()
Dim objDoc As Document
Dim objRange As Range, subRange0 As Range
Dim matchStart As Long, matchEnd As Long, startPos As Long
Dim regex As Object
Dim matches
Set objDoc = ActiveDocument
Set objRange = objDoc.Range(0, objDoc.Content.End)
startPos = 1
Set regex = CreateObject("VBScript.RegExp")
Call RemoveHyperlinks
With regex
.Pattern = "((\([a-zA-Z]*?[-]?Time:.*?\})[a-zA-Z0-9]{0,3})"
.Global = True
End With
If regex.test(objRange.Text) Then
Set matches = regex.Execute(objRange.Text)
Debug.Print "Document has " & matches.Count & " matches"
Debug.Print "Document range is " & objRange.Start & " to " & objRange.End
Debug.Print "FirstIndex = " & matches(0).FirstIndex
For Each match In matches
matchStart = InStr(startPos, objRange.Text, match.submatches(0))
startPos = matchStart + Len(match.submatches(0))
If matchStart > 0 Then
matchStart = matchStart - 1
matchEnd = matchStart + Len(match.submatches(0))
Set subRange0 = objDoc.Range(matchStart, matchEnd)
Debug.Print "Match starts at " & matchStart & " and ends at " & (matchStart + Len(match.submatches(1)))
Debug.Print " match0 text = " & match.submatches(0)
Debug.Print " subrange0 text = " & subRange0.Text
Else
Debug.Print "Match mysteriously not found in text"
End If
Next match
Else
Debug.Print "No regex matches"
End If
End Sub
Sub RemoveHyperlinks()
Dim link, cnt As Long, linkRange As Range, i As Long
cnt = 0
For i = ActiveDocument.Hyperlinks.Count To 1 Step -1
With ActiveDocument.Hyperlinks(i)
.TextToDisplay = .TextToDisplay & " (" & .Address & ")"
Set linkRange = .Range
End With
ActiveDocument.Hyperlinks(i).Delete
With linkRange.Font
.Underline = wdUnderlineNone
.ColorIndex = wdAuto
End With
cnt = cnt + 1
Next i
Debug.Print "Removed " & cnt & " link(s)"
End Sub
Sub RemoveFields()
Dim cnt As Long, i As Long
cnt = 0
For i = ActiveDocument.Fields.Count To 1 Step -1
ActiveDocument.Fields(i).Unlink
cnt = cnt + 1
Next i
Debug.Print "Removed " & cnt & " field(s)"
End Sub
I ended up finding the hint to my answer in the selected answer to this question: vbscript: replace text in activedocument with hyperlink.
Essentially, Instr() does not play well with the WYSIWYG feature of Word, but the Find method will give selections with the proper ranges. No need to remove hyperlinks nor worry about other mysterious, hidden text.
The code would look like:
Sub FixHighlightedText()
Dim objDoc As Document
Dim objRange As Range
Dim startPos As Long
Dim regex As Object
Dim matches
Set objDoc = ActiveDocument
Set objRange = objDoc.Range
startPos = 1
Set regex = CreateObject("VBScript.RegExp")
With regex
.Pattern = "((\([a-zA-Z]*?[-]?Time:.*?\})[a-zA-Z0-9]{0,3})"
.Global = True
End With
If regex.test(objRange.Text) Then
Set matches = regex.Execute(objRange.Text)
Debug.Print "Document has " & matches.Count & " matches"
Debug.Print "Document range is " & objRange.Start & " to " & objRange.End
Debug.Print "FirstIndex = " & matches(0).FirstIndex
For Each match In matches
Set objRange = objDoc.Range(startPos, objDoc.Content.End)
With objRange.Find
.Text = match.submatches(0)
.MatchWholeWord = True
.MatchCase = True
.Wrap = wdFindStop
.Execute
End With
startPos = objRange.End
Debug.Print "Match starts at " & objRange.Start & " and ends at " & objRange.End
Debug.Print " match0 text = " & match.submatches(0)
Debug.Print " subrange text = " & objRange.Text
Next match
Else
Debug.Print "No regex matches"
End If
End Sub

Word VBA heading text cut short and function produces reversed results when called from Sub

Sorry for the two fold question in one post.
This indirectly relates to a question I posted recently here: vba: return page number from selection.find using text from array which was solved
Program purpose:
Firstly: add a footer with custom page numbers to documents (i.e. 0.0.0, Chapter.Section,Page representative) in a selected folder and sub folders.
Secondly: create a TOC with the custom page numbers saved as roottoc.docx in the root folder selected.
I now have two new problems before I can fully clean and finally put this to bed, I will post the full code at the end of this post.
Solved First of all, from what I have discovered and just read elsewhere too the getCrossReferenceItems(refTypeHeading) method will only return the text upto a certain length from what of finds. I have some pretty long headings which means this is quite an annoyance for the purpose of my code. So the first question I have is is there something I can do with the getCrossReferenceItems(refTypeHeading) method to force it to collect the full text from any referenced headings or is there an alternative way round this problem.
Solved Secondly the createOutline() function when called in ChooseFolder() produces the correct results but in reverse order, could someone point the way on this one too please.
Unfortunately the actual results I am recieving will be difficulty to exactly replicate but if a folder is made containing a couple of documents with various headings. The directory name should be the the same as what is in the Unit Array i.e. Unit(1) "Unit 1", the file names are made up of two parts i.e. Unit(1) & " " & Criteria(1) & ext becoming "Unit 1 p1.docx" etc, the arrays Unit and Criteria are in the ChooseFolder Sub. chapArr is a numerical representative of the Unit array contents soley for my page numbering system, I used another array because of laziness at this point in time. I could have used some other method on the Unit array to achieve the same result which I might look at when cleaning up.
When running the ChooseFolder Sub if the new folder with documents in is located in My Document then My Documents will be the folder to locate and select in the file dialogue window. This should produce results that are similar and will give an example of what I am talking about.
Complete code:
Public Sub ChooseFolder()
'Declare Variables
'|Applications|
Dim doc As Word.Document
'|Strings|
Dim chapNum As String
Dim sResult As String
Dim Filepath As String
Dim strText As String
Dim StrChapSec As String
'|Integers|
Dim secNum As Integer
Dim AckTime As Integer
Dim FolderChosen As Integer
'|Arrays|
Dim Unit() As Variant
Dim ChapArray() As Variant
Dim Criteria() As Variant
'|Ranges|
Dim rng As Range
'|Objects|
Dim InfoBox As Object
'|Dialogs|
Dim fd As FileDialog
'Constants
Const ext = ".docx"
'Set Variable Values
secNum = 0 'Set Section number start value
AckTime = 1 'Set the message box to close after 1 seconds
Set InfoBox = CreateObject("WScript.Shell") 'Set shell object
Set fd = Application.FileDialog(msoFileDialogFolderPicker) 'Set file dialog object
FolderChosen = fd.Show 'Display file dialogue
'Set Array Values
'ToDo: create form to set values for Arrays
'Folder names
Unit = Array("Unit 1", "Unit 2")
'Chapter Numbers
chapArr = Array("1", "2")
'Document names
Criteria = Array("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "M1", "M2", "M3", "M4", "D1", "D2", "D3")
If FolderChosen <> -1 Then
'didn't choose anything (clicked on CANCEL)
MsgBox "You chose cancel"
Else
'Set sResult equal to selected file/folder in file dialogue
sResult = fd.SelectedItems(1)
End If
' Loop through unit array items
For i = LBound(Unit) To UBound(Unit)
unitName = Unit(i)
' Test unit folder being looked at and concatenate sResult with
' unitName delimited with "\"
If unitName = "Unit 105" Then
Filepath = sResult & "\unit 9"
Else
Filepath = sResult & "\" & unitName
End If
' Loop through criteria array items
For j = LBound(Criteria) To UBound(Criteria)
criteriaName = Criteria(j)
' Set thisFile equal to full file path
thisfile = Filepath & "\" & unitName & " " & criteriaName & ext 'Create file name by concatenating filePath with "space" criteriaName and ext
' Test if file exists
If File_Exists(thisfile) = True Then
' If file exists do something (i.e. process number of pages/modify document start page number)
' Inform user of file being processed and close popup after 3 seconds
Select Case InfoBox.Popup("Processing file - " & thisfile, AckTime, "This is your Message Box", 0)
Case 1, -1
End Select
' Open document in word using generated filePath in read/write mode
' Process first section footer page number and amend to start as intPages (total pages) + 1
Set doc = Documents.Open(thisfile)
With doc
With ActiveDocument.Sections(1)
chapNum = chapArr(i)
secNum = secNum + 1
' Retrieve current footer text
strText = .Footers(wdHeaderFooterPrimary).Range.Text
.PageSetup.DifferentFirstPageHeaderFooter = False
' Set first page footer text to original text
.Footers(wdHeaderFooterFirstPage).Range.Text = strText
' Set other pages footer text
.Footers(wdHeaderFooterPrimary).Range.Text = Date & vbTab & "Author: Robert Ells" & vbTab & chapNum & "." & secNum & "."
Set rng = .Footers(wdHeaderFooterPrimary).Range.Duplicate
rng.Collapse wdCollapseEnd
rng.InsertBefore "{PAGE}"
TextToFields rng
End With
ActiveDocument.Sections(1).Footers(1).PageNumbers.StartingNumber = 1
Selection.Fields.Update
Hide_Field_Codes
ActiveDocument.Save
CreateOutline sResult, chapNum & "." & secNum & "."
End With
Else
'If file doesn't exist do something else (inform of non existant document and close popup after 3 seconds
Select Case InfoBox.Popup("File: " & thisfile & " - Does not exist", AckTime, "This is your Message Box", 0)
Case 1, -1
End Select
End If
Next
Filepath = ""
secNum = 0
Next
End Sub
Private Function TextToFields(rng1 As Range)
Dim c As Range
Dim fld As Field
Dim f As Integer
Dim rng2 As Range
Dim lFldStarts() As Long
Set rng2 = rng1.Duplicate
rng1.Document.ActiveWindow.View.ShowFieldCodes = True
For Each c In rng1.Characters
DoEvents
Select Case c.Text
Case "{"
ReDim Preserve lFldStarts(f)
lFldStarts(f) = c.Start
f = f + 1
Case "}"
f = f - 1
If f = 0 Then
rng2.Start = lFldStarts(f)
rng2.End = c.End
rng2.Characters.Last.Delete '{
rng2.Characters.First.Delete '}
Set fld = rng2.Fields.Add(rng2, , , False)
Set rng2 = fld.Code
TextToFields fld.Code
End If
Case Else
End Select
Next c
rng2.Expand wdStory
rng2.Fields.Update
rng1.Document.ActiveWindow.View.ShowFieldCodes = True
End Function
Private Function CreateOutline(Filepath, pgNum)
' from https://stackoverflow.com/questions/274814/getting-the-headings-from-a-word-document
'Declare Variables
'|Applications|
Dim App As Word.Application
Dim docSource As Word.Document
Dim docOutLine As Word.Document
'|Strings|
Dim strText As String
Dim strFileName As String
'|Integers|
Dim intLevel As Integer
Dim intItem As Integer
Dim minLevel As Integer
'|Arrays|
Dim strFootNum() As Integer
'|Ranges|
Dim rng As Word.Range
'|Variants|
Dim astrHeadings As Variant
Dim tabStops As Variant
'Set Variable values
Set docSource = ActiveDocument
If Not FileLocked(Filepath & "\" & "roottoc.docx") Then
If File_Exists(Filepath & "\" & "roottoc.docx") Then
Set docOutLine = Documents.Open(Filepath & "\" & "roottoc.docx", ReadOnly:=False)
Else
Set docOutLine = Document.Add
End If
End If
' Content returns only the
' main body of the document, not
' the headers and footer.
Set rng = docOutLine.Content
minLevel = 5 'levels above this value won't be copied.
astrHeadings = returnHeaderText(docSource) 'docSource.GetCrossReferenceItems(wdRefTypeHeading)
docSource.Select
ReDim strFootNum(0 To UBound(astrHeadings))
For i = 1 To UBound(astrHeadings)
With Selection.Find
.Text = Trim(astrHeadings(i))
.Wrap = wdFindContinue
End With
If Selection.Find.Execute = True Then
strFootNum(i) = Selection.Information(wdActiveEndPageNumber)
Else
MsgBox "No selection found", vbOKOnly 'Or whatever you want to do if it's not found'
End If
Selection.Move
Next
docOutLine.Select
With Selection.Paragraphs.tabStops
'.Add Position:=InchesToPoints(2), Alignment:=wdAlignTabLeft
.Add Position:=InchesToPoints(6), Alignment:=wdAlignTabRight, Leader:=wdTabLeaderDots
End With
For intItem = LBound(astrHeadings) To UBound(astrHeadings)
' Get the text and the level.
' strText = Trim$(astrHeadings(intItem))
intLevel = GetLevel(CStr(astrHeadings(intItem)))
' Test which heading is selected and indent accordingly
If intLevel <= minLevel Then
If intLevel = "1" Then
strText = " " & Trim$(astrHeadings(intItem)) & vbTab & pgNum & strFootNum(intItem) & vbCr
End If
If intLevel = "2" Then
strText = " " & Trim$(astrHeadings(intItem)) & vbTab & pgNum & strFootNum(intItem) & vbCr
End If
If intLevel = "3" Then
strText = " " & Trim$(astrHeadings(intItem)) & vbTab & pgNum & strFootNum(intItem) & vbCr
End If
If intLevel = "4" Then
strText = " " & Trim$(astrHeadings(intItem)) & vbTab & pgNum & strFootNum(intItem) & vbCr
End If
If intLevel = "5" Then
strText = " " & Trim$(astrHeadings(intItem)) & vbTab & pgNum & strFootNum(intItem) & vbCr
End If
' Add the text to the document.
rng.Collapse (False)
rng.InsertAfter strText & vbLf
docOutLine.SelectAllEditableRanges
' tab stop to set at 15.24 cm
'With Selection.Paragraphs.tabStops
' .Add Position:=InchesToPoints(6), _
' Leader:=wdTabLeaderDots, Alignment:=wdAlignTabRight
' .Add Position:=InchesToPoints(2), Alignment:=wdAlignTabCenter
'End With
rng.Collapse (False)
End If
Next intItem
docSource.Close
docOutLine.Save
docOutLine.Close
End Function
Function returnHeaderText(doc As Word.Document) As Variant
Dim returnArray() As Variant
Dim para As Word.Paragraph
Dim i As Integer
i = 0
For Each para In doc.Paragraphs
If Left(para.Style, 7) = "Heading" Then
ReDim Preserve returnArray(i)
returnArray(i) = para.Range.Text
i = i + 1
End If
Next
returnHeaderText = returnArray
End Function
Function FileLocked(strFileName As String) As Boolean
On Error Resume Next
' If the file is already opened by another process,
' and the specified type of access is not allowed,
' the Open operation fails and an error occurs.
Open strFileName For Binary Access Read Write Lock Read Write As #1
Close #1
' If an error occurs, the document is currently open.
If Err.Number <> 0 Then
' Display the error number and description.
MsgBox "Error #" & Str(Err.Number) & " - " & Err.Description
FileLocked = True
Err.Clear
End If
End Function
Private Function GetLevel(strItem As String) As Integer
' from https://stackoverflow.com/questions/274814/getting-the-headings-from-a-word-document
' Return the heading level of a header from the
' array returned by Word.
' The number of leading spaces indicates the
' outline level (2 spaces per level: H1 has
' 0 spaces, H2 has 2 spaces, H3 has 4 spaces.
Dim strTemp As String
Dim strOriginal As String
Dim intDiff As Integer
' Get rid of all trailing spaces.
strOriginal = RTrim$(strItem)
' Trim leading spaces, and then compare with
' the original.
strTemp = LTrim$(strOriginal)
' Subtract to find the number of
' leading spaces in the original string.
intDiff = Len(strOriginal) - Len(strTemp)
GetLevel = (intDiff / 2) + 1
End Function
Private Function File_Exists(ByVal sPathName As String, Optional Directory As Boolean) As Boolean
'Returns True if the passed sPathName exist
'Otherwise returns False
On Error Resume Next
If sPathName <> "" Then
If IsMissing(Directory) Or Directory = False Then
File_Exists = (Dir$(sPathName) <> "")
Else
File_Exists = (Dir$(sPathName, vbDirectory) <> "")
End If
End If
End Function
Sub Hide_Field_Codes()
Application.ActiveWindow.View.ShowFieldCodes = False
End Sub
Kevin's Solutions:
Question part 1, Answer
I thought initially that something went wrong when I added your function, but it was due to a blank heading on the following line after the actual heading in the documents. I suppose an If statement to test if there is text present could solve this. :-)
I haven't tested this bit yet (due to being tired), but if the heading is inline with normal text, would this function pick up only the heading or both heading and normal text?
Question part 2, Answer
Just worked, although with one niggle (the list produced is no longer indented as desired in the main CreateOutline function). Time is getting on now so will have to pick this up again tomorrow :-)
Thanks yet again kevin, this is where I should have concentrated more during programming at uni instead of thinking about the pub.
Phil :-)
welcome back! :-)
For the reversed data from the CreateOutline function - change your Collapse function to have a false parameter. Collapse defaults to putting the cursor at the beginning of the selection, but this will put it at the end so you're adding to the end of the doc instead of the beginning:
' Add the text to the document.
rng.Collapse(False) 'HERE'
rng.InsertAfter strText & vbLf
docOutLine.SelectAllEditableRanges
rng.Collapse(False) 'AND HERE'
For the CrossReferenceItems issue, try this and let me know if there's any data missing from what it returns. Call this instead of the CrossReferenceItems method:
Function returnHeaderText(doc As Word.Document) As Variant
Dim returnArray() As Variant
Dim para As Word.Paragraph
Dim i As Integer
i = 0
For Each para In doc.Paragraphs
If Left(para.Style, 7) = "Heading" Then
ReDim Preserve returnArray(i)
returnArray(i) = para.Range.Text
i = i + 1
End If
Next
returnHeaderText = returnArray
End Function