Iterate through paragraphs and trim spaces in MS Word - vba

I need to create a macros which removes whitespaces and indent before all paragraphs in the active MS Word document. I've tried following:
For Each p In ActiveDocument.Paragraphs
p.Range.Text = Trim(p.range.Text)
Next p
which sets macros into eternal loop. If I try to assign string literal to the paragraphs, vba always creates only 1 paragraph:
For Each p In ActiveDocument.Paragraphs
p.Range.Text = "test"
Next p
I think I have a general misconception about paragraph object. I would appreciate any enlightment on the subject.

The reason the code in the question is looping is because replacing one paragraph with the processed (trimmed) text is changing the paragraphs collection. So the code will continually process the same paragraph at some point.
This is normal behavior with objects that are getting deleted and recreated "behind the scenes". The way to work around it is to loop the collection from the end to the front:
For i = ActiveDocument.Paragraphs.Count To 1 Step -1
Set p = ActiveDocument.Paragraphs(i)
p.Range.Text = Trim(p.Range.Text)
Next
That said, if the paragraphs in the document contain any formatting this will be lost. String processing does not retain formatting.
An alternative would be to check the first character of each paragraph for the kinds of characters you consider to be "white space". If present, extend the range until no more of these characters are detected, and delete. That will leave the formatting intact. (Since this does not change the entire paragraph a "normal" loop works.)
Sub TestTrimParas()
Dim p As Word.Paragraph
Dim i As Long
Dim rng As Word.Range
For Each p In ActiveDocument.Paragraphs
Set rng = p.Range.Characters.First
'Test for a space or TAB character
If rng.Text = " " Or rng.Text = Chr(9) Then
i = rng.MoveEndWhile(" " + Chr(9))
Debug.Print i
rng.Delete
End If
Next p
End Sub

You could, of course, do this in a fraction of the time without a loop, using nothing fancier than Find/Replace. For example:
Find = ^p^w
Replace = ^p
and
Find = ^w^p
Replace = ^p
As a macro this becomes:
Sub Demo()
Application.ScreenUpdating = False
With ActiveDocument.Range
.InsertBefore vbCr
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchWildcards = False
.Text = "^p^w"
.Replacement.Text = "^p"
.Execute Replace:=wdReplaceAll
.Text = "^w^p"
.Execute Replace:=wdReplaceAll
End With
.Characters.First.Text = vbNullString
End With
Application.ScreenUpdating = True
End Sub
Note also that trimming text the way you're doing is liable to destroy all intra-paragraph formatting, cross-reference fields, and the like; it also won't change indents. Indents can be removed by selecting the entire document and changing the paragraph format; better still, modify the underlying Styles (assuming they've been used correctly).

Entering "eternal" loop is a bit unpleasant. Only Chuck Norris can exit one. Anyway, try to make a check before trimming and it will not enter:
Sub TestMe()
Dim p As Paragraph
For Each p In ThisDocument.Paragraphs
If p.Range <> Trim(p.Range) Then p.Range = Trim(p.Range)
Next p
End Sub

As has been said by #Cindy Meister, I need to prevent endless creation of another paragraphs by trimming them. I bear in mind that paragraph range contains at least 1 character, so processing range - 1 character would be safe. Following has worked for me
Sub ProcessParagraphs()
Set docContent = ActiveDocument.Content
' replace TAB symbols throughout the document to single space (trim does not remove TAB)
docContent.Find.Execute FindText:=vbTab, ReplaceWith:=" ", Replace:=wdReplaceAll
For Each p In ActiveDocument.Paragraphs
' delete empty paragraph (delete operation is safe, we cannot enter enternal loop here)
If Len(p.range.Text) = 1 Then
p.range.Delete
' remove whitespaces
Else
Set thisRg = p.range
' shrink range by 1 character
thisRg.MoveEnd wdCharacter, -1
thisRg.Text = Trim(thisRg.Text)
End If
p.LeftIndent = 0
p.FirstLineIndent = 0
p.Reset
p.range.Font.Reset
Next
With Selection
.ClearFormatting
End With
End Sub

I saw a number of solutions here are what worked for me. Note I turn off track changes and then revert back to original document tracking status.
I hope this helps some.
Option Explicit
Public Function TrimParagraphSpaces()
Dim TrackChangeStatus: TrackChangeStatus = ActiveDocument.TrackRevisions
ActiveDocument.TrackRevisions = False
Dim oPara As Paragraph
For Each oPara In ActiveDocument.StoryRanges(wdMainTextStory).Paragraphs
Dim oRange As Range: Set oRange = oPara.Range
Dim endRange, startRange As Range
Set startRange = oRange.Characters.First
Do While (startRange = Space(1))
startRange.Delete 'Remove last space in each paragraphs
Set startRange = oRange.Characters.First
Loop
Set endRange = oRange
' NOTE: for end range must select the before last characted. endRange.characters.Last returns the chr(13) return
endRange.SetRange Start:=oRange.End - 2, End:=oRange.End - 1
Do While (endRange = Space(1))
'endRange.Delete 'NOTE delete somehow does not work for the last paragraph
endRange.Text = "" 'Remove last space in each paragraphs
Set endRange = oPara.Range
endRange.SetRange Start:=oRange.End - 1, End:=oRange.End
Loop
Next
ActiveDocument.TrackRevisions = TrackChangeStatus
End Function

Related

How to print row of found string?

I'd like to find several strings within Word document and for each string found, I like to print (debug.print for example) the whole row content where the string is found, not the paragraph.
How can I do this? Thanks
Sub FindStrings
Dim StringsArr (1 to 3)
StringsArr = Array("string1","string2","string3")
For i=1 to 3
With
Selection.Find
.ClearFormatting
.Text = Strings(i)
Debug.Print CurrentRow 'here I need help
End With
Next
End Sub
The term Row in Word is used only in the context of a table. I assume the term you mean is Line, as in a line of text.
The Word object model has no concept of "line" (or "page") due to the dynamic layout algorithm: anything the user does, even changing the printer, could change where a line or a page breaks over. Since these things are dynamic, there's no object.
The only context where "line" can be used is in connection with a Selection. For example, it's possible to extend a Selection to the start and/or end of a line. Incorporating this into the code in the question it would look something like:
Sub FindStrings()
Dim StringsArr As Variant
Dim bFound As Boolean
Dim rng As Word.Range
Set rng = ActiveDocument.content
StringsArr = Array("string1", "string2", "string3")
For i = LBound(StringsArr) To UBound(StringsArr)
With rng.Find
.ClearFormatting
.Text = StringsArr(i)
.Wrap = wdFindStop
bFound = .Execute
'extend the selection to the start and end of the current line
Do While bFound
rng.Select
Selection.MoveStart wdLine, -1
Selection.MoveEnd wdLine, 1
Debug.Print Selection.Text
rng.Collapse wdCollapseEnd
bFound = .Execute
Loop
End With
Set rng = ActiveDocument.content
Next
End Sub
Notes
Since it's easier to control when having to loop numerous times, a Range object is used as the basic search object, rather than Selection. The found Range is only selected for the purpose of getting the entire line as these "Move" methods for lines only work on a Selection.
Before the loop can continue, the Range (or, if we were working with a selection, the selection) needs to be "collapsed" so that the code does not search and find the same instance of the search term, again. (This is also the reason for Wrap = wdFindStop).

Identify Invalid Characters

I need to loop through each letter of a word document and compare each against a list of valid characters. If the current character is not in the list, its font color should be changed.
I am a newbie to VBA and have written a small looping code but it takes very long to go through even a small word file. Code is below -
Sub LoopThruFile()
Dim doc As Document
Dim CurrChar As String
Application.ScreenUpdating = False
Set doc = ActiveDocument
For i = 1 To doc.Range.Characters.Count
CurrChar = doc.Range.Characters(i)
If InStr("01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,.-_()/#:&\%", CurrChar) = 0 Then
doc.Range.Characters(i).Font.ColorIndex = wdRed
End If
Next
Application.ScreenUpdating = True
End Sub
Is there a better and faster code to do this?
Looping individual characters is slow. One thing that could speed performance in your example is to reduce the number of hierarchy levels working against the Range directly:
Dim doc as Document
Dim docRange as Range
Dim CurrChar as String
Set doc = ActiveDocument
Set docRange = doc.Content
Note that Document.Range is actually a method that expects two parameters; Document.Content automatically returns the entire Range as a property, so would be more correct.
Then, the Character object is actually a Range object. The VBA has to convert your line of code to include the Text property in order to assign CurrChar to a String. Probably doesn't make a lot of difference, but more correct and probably a little faster:
CurrChar = docRange.Characters(i).Text
Sometimes a loop can be faster if you run from the end of the document to the beginning:
For i = docRange.Characters.Count to 1 Step -1
CurrChar = docRange.Characters(i).Text
If InStr("01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,.-_()/#:&\%", CurrChar) = 0 Then
docRange.Characters(i).Font.ColorIndex = wdRed
End If
Next
You could also try using For Each, which might be the fastest in this scenario since you save multiple calls of doc.Range.Characters(i) each of which consumes resources.
Dim CurrChar as Range
For Each CurrChar in docRange.Characters
If InStr("01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,.-_()/#:&\%", CurrChar.Text) = 0 Then
CurrChar.Font.ColorIndex = wdRed
End If
Next
The following worked for me. Try this:
With ActiveDocument.Content.Find
.ClearFormatting
.Replacement.ClearFormatting
.text = "[!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0-9\,\.\-_\(\)\/\\\#\:\&\%]"
.Replacement.text = ""
.Replacement.Font.Color = wdColorRed
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With

Smart quotations aren't recognized by InStr()

I have a code like so:
Sub MoveToBeginningSentence()
Application.ScreenUpdating = False
Dim selectedWords As Range
Dim selectedText As String
Const punctuation As String = " & Chr(145) & "
On Error GoTo ErrorReport
' Cancel macro when there's no text selected
Selection.Cut
Selection.MoveLeft Unit:=wdSentence, Count:=1, Extend:=wdMove
Selection.MoveRight Unit:=wdCharacter, Count:=1, Extend:=wdExtend
Set selectedWords = Selection.Range
selectedText = selectedWords
If InStr(selectedText, punctuation) = 0 Then
Selection.MoveLeft Unit:=wdSentence, Count:=1, Extend:=wdMove
Selection.Paste
Else
Selection.MoveLeft Unit:=wdSentence, Count:=1, Extend:=wdMove
Selection.Paste
Selection.Paste
Selection.Paste
Selection.Paste
End If
ErrorReport:
End Sub
Basically, it help me move whatever text I have selected to the beginning of the sentence in Word. If there's no quotation mark, then paste once. If there is a quote mark, paste 4 times.
The problem is regardless of whether there's any quotation there or not, it will only paste once. If I set the macro to detect any other character, it will work fine. But every single time I try to force it to detect smart quotations, it will fail.
Is there any way to fix it?
Working with the Selection object is always a bit chancy; on the whole, it's better to work with a Range object. You can have only one Selection; you can have as many Ranges as you need.
Because your code uses the Selection object it's not 100% clear what the code does. Based on my best guess, I put together the following example which you can tweak if it's not exactly right.
At the beginning, I check whether there's something in the selection, or it's a blinking insertion point. If no text is selected, the macro ends. This is better than invoking Error handling, then not handling anything: If other problems crop up in your code, you wouldn't know about them.
A Range object is instantiated for the selection - there's no need to "cut" it, as you'll see further along. Based on this, the entire sentence is also assigned to a Range object. The text of the sentence is picked up, then the sentence's Range is "collapsed" to its starting point. (Think of this like pressing the left arrow on the keyboard.)
Now the sentence's text is checked for the character Chr(145). If it's not there, the original selection's text (including formatting) is added at the beginning of the sentence. If it's there, then it's added four times.
Finally, the original selection is deleted.
Sub MoveToBeginningSentence()
Application.ScreenUpdating = False
Dim selectedText As String
Dim punctuation As String
punctuation = Chr(145) ' ‘ "smart" apostrophe
Dim selRange As word.Range
Dim curSentence As word.Range
Dim i As Long
' Cancel macro when there's no text selected
If Selection.Type = wdSelectionIP Then Exit Sub
Set selRange = Selection.Range
Set curSentence = selRange.Sentences(1)
selectedText = curSentence.Text
curSentence.Collapse wdCollapseStart
If InStr(selectedText, punctuation) = 0 Then
curSentence.FormattedText = selRange.FormattedText
Else
For i = 1 To 4
curSentence.FormattedText = selRange.FormattedText
curSentence.Collapse wdCollapseEnd
Next
End If
selRange.Delete
End Sub
Please check out this code.
Sub MoveToBeginningSentence()
' 19 Jan 2018
Dim Rng As Range
Dim SelText As String
Dim Repeats As Integer
Dim i As Integer
With Selection.Range
SelText = .Text ' copy the selected text
Set Rng = .Sentences(1) ' identify the current sentence
End With
If Len(SelText) Then ' Skip when no text is selected
With Rng
Application.ScreenUpdating = False
Selection.Range.Text = "" ' delete the selected text
Repeats = IIf(IsQuote(.Text), 4, 1)
If Repeats = 4 Then .MoveStart wdCharacter, 1
For i = 1 To Repeats
.Text = SelText & .Text
Next i
Application.ScreenUpdating = True
End With
Else
MsgBox "Please select some text.", _
vbExclamation, "Selection is empty"
End If
End Sub
Private Function IsQuote(Txt As String) As Boolean
' 19 Jan 2018
Dim Quotes
Dim Ch As Long
Dim i As Long
Quotes = Array(34, 147, 148, -24143, -24144)
Ch = Asc(Txt)
' Debug.Print Ch ' read ASCII code of first character
For i = 0 To UBound(Quotes)
If Ch = Quotes(i) Then Exit For
Next i
IsQuote = (i <= UBound(Quotes))
End Function
The approach taken is to identify the first character of the selected sentence using the ASC() function. For a normal quotation mark that would be 34. In my test I came up with -24143 and -24144 (opening and closing). I couldn't identify Chr(145) but found MS stating that curly quotation marks are Chr(147) and Chr(148) respectively. Therefore I added a function that checks all of them. If you enable the line Debug.Print Ch in the function the character code actually found will be printed to the immediate window. You might add more character codes to the array Quotes.
The code itself doesn't consider spaces between words. Perhaps Word will take care of that, and perhaps you don't need it.
You need to supply InStr with the starting position as a first parameter:
If InStr(1, selectedText, punctuation) = 0 Then
Also
Const punctuation As String = " & Chr(145) & "
is going to search for space-ampersand-space-Chr(145)-space-ampersand-space. If you want to search for the smart quote character then use
Const punctuation As String = Chr(145)
Hope that helps.

Deleting Empty Paragraphs in Word Using VBA: Not All Empty Paragraphs Deleted

I wrote a macro to delete all the empty paragraphs in my document, but it exhibits weird behavior: If there are a number of empty paragraphs at the very end of the document, about half of them are deleted. Repeatedly running the macro gradually eliminates the empty paragraphs until only one empty paragraph remains. Even if there is a boundary condition so that I need a line of code to delete the last paragraph, but I still don't understand why only half of the empty paragraphs at the end are deleted. Can anyone explain why this is happening and how to correct this behavior? As an aside, I searched online and saw numerous posts about detecting paragraph markers (^p, ^13, and others, but only searching vbCr worked, which is another minor puzzle.)
Sub Delete_Empty__Paras_2() 'This macro looks for empty paragraphs and deletes them.
Dim original_num_of_paras_in_doc As Integer
Dim num_of_deleted_paras As Integer
original_num_of_paras_in_doc = ActiveDocument.Paragraphs.Count 'Count the number of paragraphs in the document to start
num_of_deleted_paras = 0 'In the beginning, no paragraphs have been deleted
Selection.HomeKey Unit:=wdStory 'Go to the beginning of the document.
For current_para_number = 1 To original_num_of_paras_in_doc 'Process each paragraph in the document, one by one.
If current_para_number + num_of_deleted_paras > original_num_of_paras_in_doc Then 'Stop processing paragraphs when the loop has processed every paragraph.
Exit For
Else 'If the system just deleted the 3rd paragraph of the document because
' it's empty, the next paragraph processed is the 3rd one again,
'so when we iterate the counter, we have to subtract the number of deleted paragraphs to account for this.
Set paraRange = ActiveDocument.Paragraphs(current_para_number - num_of_deleted_paras).Range
paratext = paraRange.Text
If paratext = vbCr Then 'Is the paragraph empty? (By the way, checking for vbCr is the only method that worked for checking for empty paras.)
paratext = "" 'Delete the paragraph.
ActiveDocument.Paragraphs(current_para_number - num_of_deleted_paras).Range.Text = paratext
num_of_deleted_paras = num_of_deleted_paras + 1 'Iterate the count of deleted paras.
End If
End If
Next current_para_number
End Sub
This code will delete all blank paragraphs...
Sub RemoveBlankParas()
Dim oDoc As Word.Document
Dim i As Long
Dim oRng As Range
Dim lParas As Long
Set oDoc = ActiveDocument
lParas = oDoc.Paragraphs.Count ' Total paragraph count
Set oRng = ActiveDocument.Range
For i = lParas To 1 Step -1
oRng.Select
lEnd = lEnd + oRng.Paragraphs.Count ' Keep track of how many processed
If Len(ActiveDocument.Paragraphs(i).Range.Text) = 1 Then
ActiveDocument.Paragraphs(i).Range.Delete
End If
Next i
Set para = Nothing
Set oDoc = Nothing
Exit Sub
End Sub
You can replace the paragraph marks:
ActiveDocument.Range.Find.Execute FindText:="^p^p", ReplaceWith:="^p", Replace:=wdReplaceAll
ActiveDocument.Range.Find.Execute "^p^p", , , , , , , , , "^p", wdReplaceAll ' might be needed more than once

Word VBA: Get Range between Consecutive Headings

I looked up some examples, but I cannot quite understand how the Range object works. I am trying to loop through each of my headings (of level 4) and have a nested loop that looks through all the tables in between the headings. I cannot figure out how to set that specific range, so any help will be greatly appreciated.
Dim myHeadings As Variant
myHeadings = ActiveDocument.GetCrossReferenceItems(wdRefTypeHeading)
For iCount = LBound(myHeadings) To UBound(myHeadings)
level = getLevel(CStr(myHeadings(iCount)))
If level = 4 Then
'This is where I want to set a range between myHeadings(iCount) to myHeadings(iCount+1)
set aRange = ??
End If
Next iCount
You are on the right track here. The myHeadings variable you have simply gives a list of the strings of the Level 4 Headings in the document. What you need to do is then search the document for those strings to get the range of the Level 4 Headings.
Once you have the range of each of the headings you can check for the tables in the range between these headings. I've modified your code slightly to do this. Also note its good practice to put Option Explicit at the top of your module to ensure all variables are declared.
My code will tell you how many tables are between each of the Level 4 headings. NOTE: It does not check between the last heading and the end of the document, I'll leave that up to you ;)
Sub DoMyHeadings()
Dim iCount As Integer, iL4Count As Integer, Level As Integer, itabCount As Integer
Dim myHeadings As Variant, tbl As Table
Dim Level4Heading() As Range, rTableRange As Range
myHeadings = ActiveDocument.GetCrossReferenceItems(wdRefTypeHeading)
'We want to move to the start of the document so we can loop through the headings
Selection.HomeKey Unit:=wdStory
For iCount = LBound(myHeadings) To UBound(myHeadings)
Level = getLevel(CStr(myHeadings(iCount)))
If Level = 4 Then
'We can now search the document to find the ranges of the level 4 headings
With Selection.Find
.ClearFormatting 'Always clear find formatting
.Style = ActiveDocument.Styles("Heading 4") 'Set the heading style
.Text = VBA.Trim$(myHeadings(iCount)) 'This is the heading text (trim to remove spaces)
.Replacement.Text = "" 'We are not replacing the text
.Forward = True 'Move forward so we can each consecutive heading
.Wrap = wdFindContinue 'Continue to the next find
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute
End With
'Just make sure the text matches (it should be I have a habit of double checking
If Selection.Text = VBA.Trim$(myHeadings(iCount)) Then
iL4Count = iL4Count + 1 'Keep a counter for the L4 headings for redim
ReDim Preserve Level4Heading(1 To iL4Count) 'Redim the array keeping existing values
Set Level4Heading(iL4Count) = Selection.Range 'Set the range you've just picked up to the array
End If
End If
Next iCount
'Now we want to loop through all the Level4 Heading Ranges
For iCount = LBound(Level4Heading) To UBound(Level4Heading) - 1
'Reset the table counter
itabCount = 0
'Use the start of the current heading and next heading to get the range in between which will contain the tables
Set rTableRange = ActiveDocument.Range(Level4Heading(iCount).Start, Level4Heading(iCount + 1).Start)
'Now you have set the range in the document between the headings you can loop through
For Each tbl In rTableRange.Tables
'This is where you can work your table magic
itabCount = itabCount + 1
Next tbl
'Display the number of tables
MsgBox "You have " & itabCount & " table(s) between heading " & Level4Heading(iCount).Text & " And " & Level4Heading(iCount + 1).Text
Next iCount
End Sub
You could jump from one heading to the next using Goto. See below how to loop through level 4 headings.
Dim heading As Range
Set heading = ActiveDocument.Range(start:=0, End:=0)
Do ' Loop through headings
Dim current As Long
current = heading.start
Set heading = heading.GoTo(What:=wdGoToHeading, Which:=wdGoToNext)
If heading.start = current Then
' We haven't moved because there are no more headings
Exit Do
End If
If heading.Paragraphs(1).OutlineLevel = wdOutlineLevel4 Then
' Now this is a level 4 heading. Let's do something with it.
' heading.Expand Unit:=wdParagraph
' Debug.Print heading.Text
End If
Loop
Don't look specifically for "Heading 4" because,
one may use non built-in styles,
it would not work with international versions of Word.
Check the wdOutlineLevel4 instead.
Now, to get the range for the whole level 4, here is a little known trick:
Dim rTableRange as Range
' rTableRange will encompass the region under the current/preceding heading
Set rTableRange = heading.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
This will work better for the last heading 4 in the document or the last one below a heading 3.