How Do I Find All Acronyms in an MS Word Document Using a Macro? - vba

I have a document with many acronyms that need to be captured and put into an acronyms table at the end of the document.
The term acronym has various meanings. I'd like to create a table that has all of the words that are initialized; two or more capitalized letters that are short for a longer meaning. I.e., CD-ROM, USB, SYNC, MMR, ASCAP, etc.
How do I create a macro to do this?

Something like this might get you started. Add a reference to "Microsoft VBScript Regular Expressions" (Edit Macro: Tools > References). This library is the file, "vbscript.dll".
You may need to adjust the regexp if all your acronyms aren't only upper-case letters (eg some may contain numbers).
Sub Acronyms()
Dim dict, k, tmp
Dim regEx, Match, Matches
Dim rngRange As Range
Set regEx = New RegExp
Set dict = CreateObject("scripting.dictionary")
regEx.Pattern = "[A-Z]{2,}" '2 or more upper-case letters
regEx.IgnoreCase = False
regEx.Global = True
Set Matches = regEx.Execute(ActiveDocument.Range.Text)
For Each Match In Matches
tmp = Match.Value
If Not dict.Exists(tmp) Then dict.Add tmp, 0
dict(tmp) = dict(tmp) + 1
Next
For Each k In dict.Keys
Debug.Print k, dict(k)
Next k
End Sub

Thanks Tim, your code works great!
If it will be of any use to others, the pattern [A-Z]{1,}([a-z]*|\&|\.*)[A-Z]{1,} will find more acronyms...
(I do not have permission to post comments, hence adding this as answer)
Edit (still no way to add comments): \b[A-Z]{1,}([a-z*]|\&|\.|\-)[A-Z]{1,}\b is more robust, but will fail if the last character of the acronym is not capitalized.

I have found the following works well (where some business name acronyms are tolerable). I use this to test data entries in Access, it should also work for a Word document range.
objRegExp.Pattern = "([A-Z]{1,}((\&(?![A-Z]\s[\w]{3})\w*)+|\.\w*)+)|[A-Z]{2,}(?![A-Z]*\s[A-Z]{1}[a-z])"
J&K =Match
JK&S =Match
J.S.S =Match
JK&S.K =Match
JSK =Match
JK =Match
DKD And Sons =No Match
J&K Engineering =No Match
PKF Rogers and Associates =No Match
I use RegExHero to test my expressions

I used the following to find abbreviations in my PhD thesis. They were all in "()".
regEx.Pattern = "\([A-Z]{1,}([a-z]*|\&|\.|\-*)[A-Z]{1,}\)"

You will be running a macro on the main Word document. Open a separate Word document that is blank. This will be used to store discovered the acronyms.
Press "Record Macro". Choose a unique name, and assign a shortcut key such
as CTRL + ALT + A.
Open the Find dialogue (CTRL + F). Paste the following search text:
<[A-Z]{2,}>. In the Find dialogue, choose "More" > check the box for "Use Wildcards". Click the Find Next button.
Right-click on the selected text, being careful not to change the
highlight. Select Copy from the context menu.
Navigate to the separate Word document (ALT + TAB, select the Word
document). Paste the copied text, and hit Enter. ALT + TAB back to
the original Word document.
Close the find dialogue and click the right arrow once. This moves
the cursor off the highlighted text, and readies it for the next
search.
Stop the macro recording.
You now have a macro that finds a word containing two or more capitalized letters, and saves the text to a separate document. In order to search for the remaining acronyms, press CTRL + ALT + A continuously until the end of the document has been reached. Or, edit the macro, and add while a loop.
Here is what the macro looks like (without the loop):
Sub GetAcronyms()
Selection.Find.ClearFormatting
With Selection.Find
.Text = "<[A-Z]{2,}>"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute
Selection.Copy
Windows("Document1.docx").Activate
Selection.PasteAndFormat (wdPasteDefault)
Selection.TypeParagraph
Windows("TheOriginalDocument.docx").Activate
Selection.MoveRight Unit:=wdCharacter, Count:=1
End Sub

Related

VBA Word - .Find "[space]" always find matches outside the selection range thus loops undefinitely

When converting a table from PDF to word, I ended up with a format similar to the following:
([space] is a space character)
Text [space.spacing 10pts] Text [space.spacing 30pts] Text
Text [space.spacing 14pts] Text [space.spacing 31pts] Text
Text [space.spacing 12pts] Text [space.spacing 33pts] Text
Instead of a regular table with 3 columns and 3 rows containing each « Text » such as below
Text
Text
Text
Text
Text
Text
Text
Text
Text
In other words, instead of creating a column, the PDF conversion has created a regular paragraph, mimicking columns by adjusting [spaces].spacing according to the length of the text within the column.
So my inital thought was that it should be possible to recreate a table by identifing the spacing of each space for each paragraph of the converted table, eventually replacing them with identifiable symbols so I can convert the text into a table later on.
My idea was somewhat the following :
' For each paragraph of the selected text (which is the converted table)
' Find all [space] within the paragraph range
' If a [space] is found, check its spacing
' 1st case : [space].spacing is <= 1 pts (so a normal space)
' Do nothing
' 2nd case : [space].spacing is >= 10 pts (so previous Text is supposed to be within a small column)
' insert ££ (symbol for small column)
' 3rd case [space].spacing is >= 30 pts (so previous Text is supposed to be within a small column)
' insert §§ (symbol for large column)
' Once all [space] are found within the current paragraph, do the same with the next paragraph, until the last paragraph of the selected text
My current code is the following :
Private Sub Test()
Dim RngSearch As Range
Dim RngCurrent As Range
Dim Paragraph As Paragraph
For Each Paragraph In ActiveDocument.Paragraphs
Set RngCurrent = Paragraph.Range
RngCurrent.Select 'For testing purposes
With RngCurrent.Find
.Text = " "
Do While RngCurrent.Find.Execute
RngCurrent.Select 'For testing purposes
Select Case RngCurrent.Font.Spacing
Case Is >= 30
RngCurrent.Font.Spacing = 1
RngCurrent.InsertAfter ("§§")
Case Is >= 10
RngCurrent.Font.Spacing = 1
RngCurrent.InsertAfter ("¤")
Case Else
' Do Nothing
End Select
Loop
End With
Next Paragraph
End Sub
So it kinda word with one issue : it loops infinitely. Each time the text is finished, it goes back again indefinitely.
I managed to track the issue to the following code :
With RngCurrent.Find
.Text = " "
Do While RngCurrent.Find.Execute
RngCurrent.Select
' Use Case function
Loop
End With
Without it, the looping through paragraphs works normally (it ends at the last paragraph)
For Each Paragraph In ActiveDocument.Paragraphs
Set RngCurrent = Paragraph.Range
RngCurrent.Select
' Code here
Next Paragraph
But once .find.text (" ") is injected, it actually doesn't look within each Paragraphs.Range anymore as I supposed Do While RngCurrent.Find.Execute should have established.
I feel like the solution is something very stupid, but I've been searching for the reason why or alternatives for 2 days now. Everytime, it stops acting as per my understading when I'm using .find(" ").
I already tried using .wrap = wdFindStop, but it stops at the first match within the paragraph, and goes to the next paragraph prematurely.
With RngCurrent.Find
.Text = " "
.wrap = wdFindStop
Do While RngCurrent.Find.Execute
RngCurrent.Select
' Use Case function
Loop
End With
Strangely .wrap = wdFindAsk doesn't ask me anything... maybe that means something.
I believe it's because there are always spaces within each paragraph ? So it can loops indefinitely?
You're way over-complicating things:
Sub MakeTable()
Application.ScreenUpdating = False
Dim i As Single
With Selection
i = .Characters.First.Font.Size
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Wrap = wdFindStop
.Text = " "
.Replacement.Text = "^t"
.Replacement.Font.Size = i
.Font.Size = 10
.Execute Replace:=wdReplaceAll
.Font.Size = 30
.Execute Replace:=wdReplaceAll
End With
.ConvertToTable Separator:=vbTab
End With
Application.ScreenUpdating = True
End Sub
So I finally found not exactly a solution but a workaround for anyone who may need a similar solution. Instead of using a .find =" ", I decided to go the "hard" path and check for every word in a paragraph (which in MS Word, seems to end with a [space] character). Then, I check for the last character of a word (which is often a space) if its spacing is superior to a value. It the case, do something.
For Each RngWord In Paragraph.Range.Words
Set RngChar = RngWord.Characters.Last
Select Case RngChar.Font.Spacing
Case Is > 300
RngChar.Font.Spacing = 1
RngChar.InsertAfter ("£")
Case Is > 100
RngChar.Font.Spacing = 1
RngChar.InsertAfter ("#")
Case Is > 15
RngChar.Font.Spacing = 1
RngChar.InsertAfter ("¤")
Case Else
' Do Nothing
End Select
Next RngWord
It does the job, and isn't that slow, but I guess there are better solution :)

Find String and then apply formatting to preceding number of characters

I have a large document in word, around 800 pages, which I need to apply some formating after finding a person's name within the text.
The text will look like the below.
01/01/1999, 09:00 - Joe Bloggs:
I want to find "Joe Bloggs" in the entire document and then apply formatting to make it bold and Tahoma font to not only the name but the date and time before the name which I believe is 20 characters.
Can anybody help with what may be a simple problem?
Just using Find/Replace in Word (no VBA) you can
Open Find/Replace
Click the More button to expose the Use Wildcards option
Select the Use Wildcards option
For Find What enter ([!¤]{20,20}Joe Bloggs)
For Replace with enter \1 then click the Format button to change to Bold Tahoma. You should see Tahoma, bold below the replace box.
Now do normal Find Next, Replace, Replace All as you see fit.
Note: the ¤ character in the Find What search can be any character you're sure is NOT in your target text.
This will work for what you described (Joe Bloggs: and move 20 characters to the left).
Public Sub Main()
Dim oRng As Range
With Selection
'''move selectiont to start of document
.HomeKey Unit:=wdStory
With .Find
.Text = "Joe Bloggs:"
.ClearFormatting
.MatchWildcards = False
.Wrap = wdFindStop
Do While .Execute
Set oRng = Selection.Range
With oRng
.MoveStart Unit:=wdCharacter, Count:=-20
.Font.Name = "Tahoma"
End With
Selection.MoveRight
Loop
End With
End With
End Sub

VBA - WORD Deleting rows after and before specific word

I'm trying to clean up my Word document using VBA.
What i need to do is to find a specific word (usually a website) then select the line it is in and then select and then remove text line above(only 1 line), the lines under that website line as well (sometimes more than 2 - if the text is longer). I'll try to show you how the line looks now.
Something happend at someplace!
website.com 08.01.2019
Something happend at someplace and it was a bad person doing it.
He used spaces instead of tabs in his code.
TAG-important stuff
The website 99% of times doesn't show in the 1st line, so im trying to find the 2nd line.
There are other websites and texts i would like to keep (so it would skip newsbetter.com)
In every document there are about 30-100 pharagraphs like the one I've typed earlier (the ones do delete)
I've been searching on the internet for a possible solution but they usually are for Excel. I think that strings are not working for me here.
Sub ScratchMacroII()
Dim oRng As Word.Range
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = "news.pl"
While .Execute
While oRng.Find.Found
oRng.Select
Selection.Expand Unit:=wdParagraph
Selection.Delete
Wend
End With
End Sub
I expected the result to delete the whole pharagraph, but it justs deletes one line and leaves the other ones. I need some pointers since I'm new at VBA.
The following code, based on the sample in the question, searches the term from the beginning to the end of the document. When found, the paragraphs following and preceding the term are deleted. The search Range is then set to the document content following the found instance so that the same instance is not picked up repeatedly.
Note that I included Find.Wrap = wdFindStop to prevent the code from cycling through the document again. It's also necessary to repeat the Execute method within the loop, rather than trying to loop on it. While...Wend is an old type of loop; preferred is Do While...Loop.
Sub ScratchMacroII()
Dim oRng As Word.Range
Dim para As Word.Paragraph
Dim found As Boolean
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = "news.pl"
.wrap = wdFindStop
found = .Execute
Do While found
Set para = oRng.Next(wdParagraph, 1).Paragraphs(1)
para.Range.Delete
Set para = oRng.Next(wdParagraph, -1).Paragraphs(1)
para.Range.Delete
oRng.Collapse wdCollapseEnd
oRng.End = ActiveDocument.content.End
found = oRng.Find.Execute
Loop
End With
End Sub

VBA Compare positions of two words in a Word document

I'm writing a module for defining acronyms in Word documents. The script gets the acronym and definition from an Excel document. What I'm having trouble with is comparing the location of the first instance of the acronym with the location of the first instance of the full definition.
Ultimately, I need to make sure the first instance of the acronym occurs immediately after the first instance of the definition, enclosed in parentheses. After this is done, the script will need to remove subsequent instances of the definition, so I also need to figure out how to remove all but the first instance of a definition.
The end result should look something like this:
....This document is about software as a service (SaaS). SaaS is software that is hosted by someone else. Rather than installing it on your own computer, you access it through a Web browser. There are many types of SaaS.
....
How can I get the positions of these two elements and or compare their positions?
In the example above, how would I find the first instance of "SaaS" and make sure it occurred exactly two positions after (space, open parentheses) the definition (assuming the definition actually appears in the document)?
'Selects first instance of acronym. Get start and end positions of first instance of acronym.
Selection.HomeKey Unit:=wdStory
Selection.Find.Execute Acronym 'Acronym is a variable. Now that it's selected, I need to get it's start position (or the position of the cursor if the cursor is at the start of the acronym) or find a way to compare it's position to the UserSelection variable.
'Is the definition in the document?
'If no, add definition before first instance of acronym.
'If yes, get start and end positions of first instance of definition.
'Is end position of first instance of definition adjacent to start position of first instance of acronym? If not, which is first?
'If definition is first, add acronym behind definition.
'If acronym is first, add definition in front of acronym and delete remaining instances of definition.
'Highlights all instances of the acronym in green
With ActiveDocument.Content.Find
.MatchCase = True
.MatchWholeWord = False
With .Replacement
.Highlight = True
End With
.Execute Replace:=wdReplaceAll, Wrap:=wdFindContinue, FindText:=Acronym, ReplaceWith:=Acronym
End With
Any help or insight would be appreciated, as I'm completely at a loss and having no luck with Google.
-Vince
I think the following code snippet can help you:
Sub example(acronym, definition)
Selection.Find.ClearFormatting
With Selection.Find
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchAllWordForms = False
End With
ActiveDocument.Range(0, 0).Select ' go to beginning of document
Selection.Find.Text = acronym
Selection.MatchSoundsLike = False
If (Not Selection.Find.Execute) Then
' acronym not found in this document
Exit Sub
End If
ActiveDocument.Range(0, 0).Select ' go to beginning of document
Selection.Find.Text = definition
Selection.MatchCase = False
Selection.MatchSoundsLike = True
While (Selection.Find.Execute)
'
Selection.Delete ' delete all definitions
'
Wend
ActiveDocument.Range(0, 0).Select ' go to beginning of document
Selection.Find.Text = acronym
Selection.MatchSoundsLike = False
If (Selection.Find.Execute) Then
Selection.InsertBefore "(" & definition & ")"
End If
End Sub
Note: I also found that authors make mistakes in the definitions (minor variations) and even an extra unintended space messes up the find.

deleting certain lines in ms word 2007

I would like to delete certain lines from my word document using a VBA macro. Basically the (block of) text to be deleted (and replaced by "***") follows a certain pattern (below).
Bottom of Form
perma-link
Top of Form
save
Bottom of Form
[+] ....
[–] ....
Top of Form
"...." represents text that changes every block, but for sure the line starts with "[+]" or "[-]".
Please suggest a suitable macro
EDIT: In the screenshot, I would like to keep the text in yellow and delete the rest. (in the actual file, the text isn't in yellow)
PS-FYI, I tried using the example looping a find and delete row macro (for line by line deletion) but i get a runtime error 5941 with debugging option highlighting the line "selection.row.delete" in the macro.
What does this mean?
Assuming that the example list is a list of paragraphs beginnings the following code should do the trick. What you have to do is to place all 'paragraphs starting' into array arrRemove as I did for the test. If any of the mark is a special marks (see this link for additional information) you need to add \ in front of it as I did for [+] and [-]. Hope this is what you are looking for.
Sub Macro2()
Dim arrRemove As Variant
arrRemove = Array("Bottom of Form", "perma -link", "Top of Form", _
"\[+\]", "\[\-\]", "Donec", "In")
Dim i!
For i = 0 To UBound(arrRemove)
Activedocument.Range(0,0).select
Selection.Find.ClearFormatting
With Selection.Find
.Text = arrRemove(i) & "*^13"
.Replacement.Text = "" 'replace with nothing
.Forward = True
.Wrap = wdFindContinue
.MatchCase = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
Next i
End Sub
The above macro will remove all yellow paragraph in the following document.