VBA emulation of WinWord's file compare for strings - vba

I'm replacing certain strings throughout a WinWord document with strings of a slightly different spelling with revision tracking being enabled.
Revision tracking will mark the whole original string as being deleted and the whole replacement string as being inserted. Anybody who is reviewing the text and wanting to know why a certain string has been replaced will have to visually compare both strings, even if they differ only in one or two characters.
I would much prefer if only the differing characters are being marked as revisions. That would probably mean that I have to emulate WinWords file compare function, albeit not for whole files but for strings within a given file. Has anybody already tried anything like that or a good idea of how to approach this task? (I know, it is possible to copy original and replacement string into 2 empty files, compare the files and use the result, but with hundreds of strings within a single file this is no practical solution.)

One way to do this is to be more selective in how you're actually replacing the words. The built in Find/Replace works by replacing the entire word, but by utilising VBA you can be more specific about what is being replaced. So we still use the built in Find function to identify the words to be replaced, but then you could iterate through each character in the word and compare against your replacement text therefore only replacing what is necessary. I've commented to below code to explain how an example of this would work.
This would produce output as in the below examples, only highlighting the actual characters that have been changed.
Sub replaceDifferencesOnly()
Dim findText As String: findText = "Analyze"
Dim replaceText As String: replaceText = "Analyse"
Dim found As Boolean: found = True
' Using the built-in 'Find' Function, loop through each instance of
' findText within the Document.
While found
With Selection.Find
.Text = findText
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = True
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
' If findText is found within the Document...
If Selection.Find.Execute = True Then
Dim match As Range
Dim char As Range
Dim position As Integer: position = 1
' Transfer the Selections Range into a seperate Range object.
Set match = Selection.Range
For Each char In match.Characters
Select Case Len(replaceText)
Case Is >= position
' If findText and replaceText are currently of an equal length...
' Simply compare and replace differing character.
If Not char.Text = Mid(replaceText, position, 1) Then
char.InsertBefore Mid(replaceText, position, 1)
' Inserting a Character will extend the size of the current 'character' or Range
' Move the start position of the range on so as to only delete the original unwanted character.
char.MoveStart
char.Delete wdCharacter
End If
Case Is < position
' If replaceText is shorter than findText...
' Simply delete the remaining unwanted characters in findText.
char.Delete
End Select
position = position + 1
Next char
If position <= Len(replaceText) Then
' Finally if replaceText is longer than findText ie. we've processed each original character in
' findText but there are still more characters in replaceText...
' Simply append the remaining characetrs within replaceText to the end of the Range
match.InsertAfter Mid(replaceText, position, (Len(replaceText) - position) + 1)
End If
Else
' No match from Find so exit the routine as there is nothing more to replace.
found = False
End If
Wend
' Clear the current Selected text.
Selection.Collapse
End Sub

Related

Find/Replace an Inserted Check Box Symbol with a Check Box Content Control

I would like to find/replace all inserted check box symbols with checkbox content controls. The symbol's font is Wingdings (either 111 or 168). Below is the code I started with, but I hit a wall when I realized that Word find doesn't recognize the symbol. I appreciate any help or guidance. Thank you.
Sub ReplaceUnicode168()
Dim objContentControl As ContentControl
With ActiveDocument
Set objContentControl = ActiveDocument.ContentControls.Add(wdContentControlCheckBox)
objContentControl.Cut
With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Forward = True
.Wrap = wdFindContinue
.MatchCase = False
.MatchWholeWord = True
.MatchWildcards = False
.Text = Chr(168)
.Replacement.Text = "^c"
.Execute Replace:=wdReplaceAll
End With
End With
End Sub
I suggest that you try to find/replace these two particular characters using
.Text = ChrW(61551)
for the "111" WingDings Character and
.Text = ChrW(61608)
for the "168" WingDings character.
Be aware that the way Word encodes these characters is not very helpful. As far as Find/Replace is concerned, you have to use these Unicode Private Use Area encodings.
If you actually select the character and use VBA to discover its code using e.g.
Debug.Print AscW(Selection)
the answer is always 40 (and the Font of the character will probably be the same as the Surrounding font) Pretty useless. In older versions of Word you used to be able to look for the 40 character and find these characters, but I don't think that's possible now. But if you select the character and use
Sub SymInfo()
With Dialogs(wdDialogInsertSymbol)
' You won't see .Font and .CharNum listed under the
' properties of a Word.Dialog - some older Dialogs add
' per-Dialog properties at runtime.
Debug.Print .Font
Debug.Print .CharNum
End With
End Sub
Then you get the font name (Wingdings in this case) and the private use area character number, except it's expressed as a negative number (-3928 for Wingdings 168). The character to use in the Find/Replace is 65536-3928 = 61608.
Alternatively, you can find the private use area code by selecting the character, getting its WordOpenXML code, then finding the XML element that gives the code (and the font). Ideally use MSXML to look for the element but the following gives the general idea.
Sub getSymElement
Dim finish As Long
Dim start As Long
Dim x As String
x = Selection.WordOpenXML
start = Instr(1,x,"<w:sym")
' Should check for start = 0 (not found) here.
finish = Instr(start,x,">")
Debug.Print Mid(x,start, finish + 1 - start)
and for the 168 character you should see something like
<w:sym w:font="Wingdings" w:char="F0A8"/>
(Hex F0A8 is 61608)
There may be a problem where Word could potentially map more than one font/code to the same unicode private use area codepoint. There is some further code by Dave Rado here but I do not think you will need it for this particular problem.
After some follow-up, the following seems to work reasonably well here:
Sub replaceWingdingsWithCCs()
Dim cc As Word.ContentControl
Dim charcode As Variant
Dim ccchecked As Variant
Dim i As Integer
Dim r As Word.Range
' Make sure the selection point is not in the way
' (If the selection contains one of the characters you are trying to
' replace, Word will raise an error about the selection being in a
' plain text content control.
' If the first item in the document is not a CC,
' it's enouugh to do this:
ActiveDocument.Select
Selection.Collapse WdCollapseDirection.wdCollapseStart
' Put the character codes you need to look for here. Maybe you have some checked boxes too?
charcode = Array(61551, 61608)
' FOr each code, say whether you want a checked box (True) or an unchecked one.
ccchecked = Array(False, False)
For i = LBound(charcode) To UBound(charcode)
Set r = ActiveDocument.Range
With r.Find
.ClearFormatting
With .Replacement
.ClearFormatting
.Text = ""
End With
.Forward = True
.Wrap = wdFindStop
.MatchCase = False
.MatchWholeWord = True
.MatchWildcards = False
.Text = ChrW(charcode(i))
Do While .Execute(Replace:=True)
Set cc = r.ContentControls.Add(WdContentControlType.wdContentControlCheckBox)
cc.Checked = ccchecked(i)
r.End = r.Document.Range.End
r.Start = cc.Range.End + 1
Set cc = Nothing
Loop
End With
Next
Set r = Nothing
End Sub

VBA Word - .Find "[space]" always find matches outside the selection range thus loops undefinitely

When converting a table from PDF to word, I ended up with a format similar to the following:
([space] is a space character)
Text [space.spacing 10pts] Text [space.spacing 30pts] Text
Text [space.spacing 14pts] Text [space.spacing 31pts] Text
Text [space.spacing 12pts] Text [space.spacing 33pts] Text
Instead of a regular table with 3 columns and 3 rows containing each « Text » such as below
Text
Text
Text
Text
Text
Text
Text
Text
Text
In other words, instead of creating a column, the PDF conversion has created a regular paragraph, mimicking columns by adjusting [spaces].spacing according to the length of the text within the column.
So my inital thought was that it should be possible to recreate a table by identifing the spacing of each space for each paragraph of the converted table, eventually replacing them with identifiable symbols so I can convert the text into a table later on.
My idea was somewhat the following :
' For each paragraph of the selected text (which is the converted table)
' Find all [space] within the paragraph range
' If a [space] is found, check its spacing
' 1st case : [space].spacing is <= 1 pts (so a normal space)
' Do nothing
' 2nd case : [space].spacing is >= 10 pts (so previous Text is supposed to be within a small column)
' insert ££ (symbol for small column)
' 3rd case [space].spacing is >= 30 pts (so previous Text is supposed to be within a small column)
' insert §§ (symbol for large column)
' Once all [space] are found within the current paragraph, do the same with the next paragraph, until the last paragraph of the selected text
My current code is the following :
Private Sub Test()
Dim RngSearch As Range
Dim RngCurrent As Range
Dim Paragraph As Paragraph
For Each Paragraph In ActiveDocument.Paragraphs
Set RngCurrent = Paragraph.Range
RngCurrent.Select 'For testing purposes
With RngCurrent.Find
.Text = " "
Do While RngCurrent.Find.Execute
RngCurrent.Select 'For testing purposes
Select Case RngCurrent.Font.Spacing
Case Is >= 30
RngCurrent.Font.Spacing = 1
RngCurrent.InsertAfter ("§§")
Case Is >= 10
RngCurrent.Font.Spacing = 1
RngCurrent.InsertAfter ("¤")
Case Else
' Do Nothing
End Select
Loop
End With
Next Paragraph
End Sub
So it kinda word with one issue : it loops infinitely. Each time the text is finished, it goes back again indefinitely.
I managed to track the issue to the following code :
With RngCurrent.Find
.Text = " "
Do While RngCurrent.Find.Execute
RngCurrent.Select
' Use Case function
Loop
End With
Without it, the looping through paragraphs works normally (it ends at the last paragraph)
For Each Paragraph In ActiveDocument.Paragraphs
Set RngCurrent = Paragraph.Range
RngCurrent.Select
' Code here
Next Paragraph
But once .find.text (" ") is injected, it actually doesn't look within each Paragraphs.Range anymore as I supposed Do While RngCurrent.Find.Execute should have established.
I feel like the solution is something very stupid, but I've been searching for the reason why or alternatives for 2 days now. Everytime, it stops acting as per my understading when I'm using .find(" ").
I already tried using .wrap = wdFindStop, but it stops at the first match within the paragraph, and goes to the next paragraph prematurely.
With RngCurrent.Find
.Text = " "
.wrap = wdFindStop
Do While RngCurrent.Find.Execute
RngCurrent.Select
' Use Case function
Loop
End With
Strangely .wrap = wdFindAsk doesn't ask me anything... maybe that means something.
I believe it's because there are always spaces within each paragraph ? So it can loops indefinitely?
You're way over-complicating things:
Sub MakeTable()
Application.ScreenUpdating = False
Dim i As Single
With Selection
i = .Characters.First.Font.Size
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Wrap = wdFindStop
.Text = " "
.Replacement.Text = "^t"
.Replacement.Font.Size = i
.Font.Size = 10
.Execute Replace:=wdReplaceAll
.Font.Size = 30
.Execute Replace:=wdReplaceAll
End With
.ConvertToTable Separator:=vbTab
End With
Application.ScreenUpdating = True
End Sub
So I finally found not exactly a solution but a workaround for anyone who may need a similar solution. Instead of using a .find =" ", I decided to go the "hard" path and check for every word in a paragraph (which in MS Word, seems to end with a [space] character). Then, I check for the last character of a word (which is often a space) if its spacing is superior to a value. It the case, do something.
For Each RngWord In Paragraph.Range.Words
Set RngChar = RngWord.Characters.Last
Select Case RngChar.Font.Spacing
Case Is > 300
RngChar.Font.Spacing = 1
RngChar.InsertAfter ("£")
Case Is > 100
RngChar.Font.Spacing = 1
RngChar.InsertAfter ("#")
Case Is > 15
RngChar.Font.Spacing = 1
RngChar.InsertAfter ("¤")
Case Else
' Do Nothing
End Select
Next RngWord
It does the job, and isn't that slow, but I guess there are better solution :)

MS Word VBA Find Variable-Length Pattern String

Question: Is there a way to specify a repeating pattern of variable but bounded length in the Find.Text argument?
Background:
I have a collection of Word documents, each containing several hundred pages of numbered text blocks. I want to copy each block of text into its own cell in a spreadsheet, but the text blocks aren't in Ordered or Multi-Level Lists and each block of text may contain multiple paragraphs, so I can't simply select and copy each paragraph in the document. To work around this, I've tried to use the Range.Find method to locate two adjacent number headings and copy all the characters between them. For testing purposes, I'm using the following sample document:
The paragraph header numbers can be 2-5 levels deep, with 1-2 digits in each level (i.e. "x.x." through "xx.xx.xx.xx.xx."). I'm using a wildcard search of the form "xx.xx.", relying on the placement of the decimal points to identify the headers. Here's my code:
'Open the Word document
Doc = CStr(folderPath & objFile.Name)
Set wDoc = wApp.Documents.Open(Doc)
Set wRange = wDoc.Range
RngEnd = wRange.End
'Search for text block
With wRange
Do While i < 7 And subRngStart2 < RngEnd
With .Find 'Search for starting keyword
.ClearFormatting
.Text = "[0-9]{1,2}.[0-9]{1,2}."
.Forward = True
'.Format = True
.MatchWildcards = True
.MatchCase = False
.Execute
End With
If .Find.Found = True Then
subRngStart1 = wRange.Start 'Mark starting position
wRange.SetRange Start:=subRngStart1 + 6, End:=RngEnd 'Reset range starting at end of keyword
contentFlag = True
Else
contentFlag = False
End If
With .Find 'Search for ending keyword
.ClearFormatting
.Text = "[0-9]{1,2}.[0-9]{1,2}."
.Forward = True
.MatchWildcards = True
.MatchCase = False
.Execute
End With
If .Find.Found = True Then
subRngStart2 = wRange.Start 'Mark ending position
Else
subRngStart2 = RngEnd
End If
wRange.SetRange Start:=subRngStart1, End:=subRngStart2 'Set range between first and second keywords
'Copy text in range to Excel
If contentFlag = True Then
Cells(i + 1, 1) = wRange.Text
End If
wRange.SetRange Start:=subRngStart2 - 3, End:=RngEnd 'Reset range starting at last keyword
i = i + 1
Loop
End With
This works fine for headers up to 3 levels but breaks down beyond that: the "Long Headers" example gets split in half because the search thinks the first two levels in the string form a complete text block (Row 7 in the output sample below).
I could just increase the starting offset (first IF statement, second line) from 6 to 10 to "skip over" long number strings, but this can cause problems with very short headers. I think the proper way to fix this is to search for a pattern of the form "xx.xx." which may repeat up to 4 consecutive times. I've tried a couple of variations on the wildcard string to achieve this, including:
.Text = "[0-9]{1,2}.*[0-9]{1,2}."
.Text = "[0-9]{1,2}.[0-9]{0,2}[.]{0,1}[0-9]{1,2}."
But these either don't do what I want or fail to compile (I'm guessing a min length of zero isn't allowed in wildcard charlists). Is it possible to specify variable-length patterns in Find.Text, or do I need to take a completely different approach?

How can I replace multiple tables and text style within a range/selection in Word-VBA?

So, I am working with VBA on a word template which for every item (requirements in this case) contains a table with different specifications (all the tables are in the same format) and some other information. Below each table I have a text which shows the status of each item like: status: Approved or Work, or Rejected etc. I am asked to delete all the other statuses in the template and keep only the "Rejected" status and the whole information and table with that has this status to format in a light grey. Does anybody has any idea how to navigate to all tables, information, and specify the section I need to Format? I am very new to this and I am completely stucked! Here's some code I wrote:
Sub DeleteWorkflow()
Selection.Find.ClearFormatting
Selection.Find.Style = ActiveDocument.Styles("Normal")
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Font.Italic = False
With Selection.Find.Replacement.ParagraphFormat
.SpaceBefore = 0
.SpaceBeforeAuto = False
.SpaceAfter = 0
.SpaceAfterAuto = False
End With
With Selection.Find
.Text = "Status: Approved"
.Text = "Status: Work"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute
'Finds status "Rejected" and changes the font color
Selection.Find.ClearFormatting
With Selection.Find
.Text = "Status: Rejected"
.Forward = True
.Wrap = Word.WdFindWrap.wdFindContinue
.Font.ColorIndex = wdGray50
Selection.Find.Execute
End With
The code to find the rejected status and to change its color is not working and I am not getting it why. Any idea?
Basis of the idea
The idea is to look through the sentences of the word document. Sentences comprise regular text and also text contained within tables.
As you load all the sentences in a single object in VBA, you can look through the content of the document sentences by sentences and perform an action on it.
We can also apply that type of search to tables within the document, if the text they contain match the characters you want.
The code
For sentences
Sub SENTENCE_CHANGE_COLOR()
Dim i As Long
Dim oSentences As Sentences
'Here we instantiate the variable oSentences to store all the values of the current opened document
Set oSentences = ThisDocument.Sentences
' We loop through every fields of the document
For i = 1 To oSentences.Count
' The property .Text contains the text of the item in it
' Then we just have to look for the text within the string of characters
If InStr(oSentences.Item(i).Text, "Status: Rejected") Then
'Do some stuff, like changing the color
oSentences.Item(i).Font.ColorIndex = wdGray50
else
' Do some other things like changing the color to a different color
oSentences.Item(i).Font.ColorIndex = wdGray25
End If
Next i
End Sub
For tables
Sub TABLE_CHANGE_COLOR()
Dim i As Long
Dim oTables As Tables
'Here we instantiate the variable oTables to store all the tables of the current opened document
Set oTables = ThisDocument.Tables
' We loop through every fields of the document
For i = 1 To oTables.Count
' Finding the occurence of the text in the table
If Not InStr(oTables.Item(i).Range.Text, "Status: Rejected") = 0 Then
'Do some stuff, like changing the color
oTables.Item(i).Range.Font.ColorIndex = wdGray50
End If
Next i
End Sub
Combination of the above methods
After we found the occurrence of a "Status: Rejected" document we can select the table right before it by comparing the table's end to the start of the occurrence.
Beware since the following code would modify any table before "Status: rejected". So if "Status: rejected" is input in an incorrect location, it will modify the previous table wherever this table will be in the document.
Sub REJECTED_TABLE_CHANGE_COLOR()
Dim i As Long, j As Long
Dim oSentences As Sentences
Dim oTables As Tables
'Here we instantiate the variable oSentences to store all the values of the current opened document
Set oSentences = ThisDocument.Sentences
'Here we instantiate the variable oTables to store all the tables of the current opened document
Set oTables = ThisDocument.Tables
' We loop through every fields of the document
For i = 1 To oSentences.Count
' The property .Text contains the text of the item in it
' Then we just have to look for the text within the string of characters
If InStr(oSentences.Item(i).Text, "Status: Rejected") Then
' When we have found the correct text, we try to find the table just above it
' We start from the last table
' This condition ensures we do not start looking for before the first table
If oTables.Item(1).Range.End < oSentences.Item(i).Start Then
j = oTables.Count
While oTables.Item(j).Range.End > oSentences.Item(i).Start
j = j - 1
Wend
oTables.Item(j).Range.Font.ColorIndex = wdGray50
End If
End If
Next i
End Sub
This solution would provide you the basis to edit the document when the matching criteria is found within an item.

How Do I Find All Acronyms in an MS Word Document Using a Macro?

I have a document with many acronyms that need to be captured and put into an acronyms table at the end of the document.
The term acronym has various meanings. I'd like to create a table that has all of the words that are initialized; two or more capitalized letters that are short for a longer meaning. I.e., CD-ROM, USB, SYNC, MMR, ASCAP, etc.
How do I create a macro to do this?
Something like this might get you started. Add a reference to "Microsoft VBScript Regular Expressions" (Edit Macro: Tools > References). This library is the file, "vbscript.dll".
You may need to adjust the regexp if all your acronyms aren't only upper-case letters (eg some may contain numbers).
Sub Acronyms()
Dim dict, k, tmp
Dim regEx, Match, Matches
Dim rngRange As Range
Set regEx = New RegExp
Set dict = CreateObject("scripting.dictionary")
regEx.Pattern = "[A-Z]{2,}" '2 or more upper-case letters
regEx.IgnoreCase = False
regEx.Global = True
Set Matches = regEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
tmp = Match.Value
If Not dict.Exists(tmp) Then dict.Add tmp, 0
dict(tmp) = dict(tmp) + 1
Next
For Each k In dict.Keys
Debug.Print k, dict(k)
Next k
End Sub
Thanks Tim, your code works great!
If it will be of any use to others, the pattern [A-Z]{1,}([a-z]*|\&|\.*)[A-Z]{1,} will find more acronyms...
(I do not have permission to post comments, hence adding this as answer)
Edit (still no way to add comments): \b[A-Z]{1,}([a-z*]|\&|\.|\-)[A-Z]{1,}\b is more robust, but will fail if the last character of the acronym is not capitalized.
I have found the following works well (where some business name acronyms are tolerable). I use this to test data entries in Access, it should also work for a Word document range.
objRegExp.Pattern = "([A-Z]{1,}((\&(?![A-Z]\s[\w]{3})\w*)+|\.\w*)+)|[A-Z]{2,}(?![A-Z]*\s[A-Z]{1}[a-z])"
J&K =Match
JK&S =Match
J.S.S =Match
JK&S.K =Match
JSK =Match
JK =Match
DKD And Sons =No Match
J&K Engineering =No Match
PKF Rogers and Associates =No Match
I use RegExHero to test my expressions
I used the following to find abbreviations in my PhD thesis. They were all in "()".
regEx.Pattern = "\([A-Z]{1,}([a-z]*|\&|\.|\-*)[A-Z]{1,}\)"
You will be running a macro on the main Word document. Open a separate Word document that is blank. This will be used to store discovered the acronyms.
Press "Record Macro". Choose a unique name, and assign a shortcut key such
as CTRL + ALT + A.
Open the Find dialogue (CTRL + F). Paste the following search text:
<[A-Z]{2,}>. In the Find dialogue, choose "More" > check the box for "Use Wildcards". Click the Find Next button.
Right-click on the selected text, being careful not to change the
highlight. Select Copy from the context menu.
Navigate to the separate Word document (ALT + TAB, select the Word
document). Paste the copied text, and hit Enter. ALT + TAB back to
the original Word document.
Close the find dialogue and click the right arrow once. This moves
the cursor off the highlighted text, and readies it for the next
search.
Stop the macro recording.
You now have a macro that finds a word containing two or more capitalized letters, and saves the text to a separate document. In order to search for the remaining acronyms, press CTRL + ALT + A continuously until the end of the document has been reached. Or, edit the macro, and add while a loop.
Here is what the macro looks like (without the loop):
Sub GetAcronyms()
Selection.Find.ClearFormatting
With Selection.Find
.Text = "<[A-Z]{2,}>"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute
Selection.Copy
Windows("Document1.docx").Activate
Selection.PasteAndFormat (wdPasteDefault)
Selection.TypeParagraph
Windows("TheOriginalDocument.docx").Activate
Selection.MoveRight Unit:=wdCharacter, Count:=1
End Sub