Scraping Word with Excel VBA - vba

I have a Word document full of newspaper articles. Each newspaper article is preceded by the article title and the string "Length:", which is followed by the number of words in the article (i.e. "Length: 1500 words"). I simply need an Excel Macro that will comb the Word document and extract the length value for each article - placing these values in an Excel column.
Through my Googling, I found this: Extract Data from Word Document to an Excel SpreadSheet
This is almost what I need, but it only returns the first article length value found by the search. How do I modify the code to find every article length value, return these values to an Excel column and then terminate?

The code to which you link is not particularly robust. I've extracted the assignment to the cell in Excel (ExR(1, 1) = WDR ' place at Excel cursor) and built more robust Word code around it.
The code uses the Word Range object instead of Selection. This is more efficient, more predictable and the screen won't jump around. The Find uses a wildcard search for the specific text, plus the digits between "Length " and " words". Since a successful Find includes the found Range, all that's necessary is to assign the Range's Text to the cell in Excel.
The Find plus assignment is built into a LOOP, which runs as long as Find.Execute is successful. For the cell assignment in Excel a COUNTER is incremented in each loop so you don't need to hard-code the target cell indices.
Dim strFind As String
Dim rngFind As word.Range 'or As Object if you don't set a Reference to the Word object library
Dim bFound As Boolean
Dim iCellCounter As Long
strFind = "Length: [0-9]{1;} words"
bFound = False
iCellCounter = 1
Set rngFind = WApp.ActiveDocument.Content
With rngFind.Find
.ClearAllFuzzyOptions
.ClearFormatting
.ClearHitHighlight
.Format = False
.MatchWildcards = True
.Text = strFind
.wrap = wdFindStop '0 if you don't use a Reference to the Word object library
Do
bFound = .Execute
If bFound Then
ExR(1, iCellCounter) = rngFind.Text
iCellCounter = iCellCounter + 1
End If
Loop While bFound
End With

Related

How to print row of found string?

I'd like to find several strings within Word document and for each string found, I like to print (debug.print for example) the whole row content where the string is found, not the paragraph.
How can I do this? Thanks
Sub FindStrings
Dim StringsArr (1 to 3)
StringsArr = Array("string1","string2","string3")
For i=1 to 3
With
Selection.Find
.ClearFormatting
.Text = Strings(i)
Debug.Print CurrentRow 'here I need help
End With
Next
End Sub
The term Row in Word is used only in the context of a table. I assume the term you mean is Line, as in a line of text.
The Word object model has no concept of "line" (or "page") due to the dynamic layout algorithm: anything the user does, even changing the printer, could change where a line or a page breaks over. Since these things are dynamic, there's no object.
The only context where "line" can be used is in connection with a Selection. For example, it's possible to extend a Selection to the start and/or end of a line. Incorporating this into the code in the question it would look something like:
Sub FindStrings()
Dim StringsArr As Variant
Dim bFound As Boolean
Dim rng As Word.Range
Set rng = ActiveDocument.content
StringsArr = Array("string1", "string2", "string3")
For i = LBound(StringsArr) To UBound(StringsArr)
With rng.Find
.ClearFormatting
.Text = StringsArr(i)
.Wrap = wdFindStop
bFound = .Execute
'extend the selection to the start and end of the current line
Do While bFound
rng.Select
Selection.MoveStart wdLine, -1
Selection.MoveEnd wdLine, 1
Debug.Print Selection.Text
rng.Collapse wdCollapseEnd
bFound = .Execute
Loop
End With
Set rng = ActiveDocument.content
Next
End Sub
Notes
Since it's easier to control when having to loop numerous times, a Range object is used as the basic search object, rather than Selection. The found Range is only selected for the purpose of getting the entire line as these "Move" methods for lines only work on a Selection.
Before the loop can continue, the Range (or, if we were working with a selection, the selection) needs to be "collapsed" so that the code does not search and find the same instance of the search term, again. (This is also the reason for Wrap = wdFindStop).

Adding matching comments to a word document for every match on a list of terms

I have a document with various terms, which are used throughout the document, and their definitions.
Format: term = term definition
I want to be able to add terms and definitions to this this list and programatically add a comment which states the definition for all instances of that term in the rest of the document to have the definition on hand whilst I read the document.
For example:
[somewhere in the document].... "Term" ....[rest of paragraph]
Highlight Term and add comment with the definition from the list of terms and definitions.
I'm hoping I've explained this in sufficient details but please do let me know if you need anything else clarified. Many thanks in advance for any help on this.
If you use a two-column table for your Terms and Definitions, you could use a macro like the following:
Sub Demo()
Application.ScreenUpdating = False
Dim strFnd As String, strTip As String, r As Long
With ActiveDocument
For r = 2 To .Tables(1).Rows.Count
strFnd = Split(.Tables(1).Cell(r, 1).Range.Text, vbCr)(0)
strTip = Split(.Tables(1).Cell(r, 2).Range.Text, vbCr)(0)
With .Range(.Tables(1).Range.End, .Range.End)
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Text = strFnd
.Wrap = wdFindStop
.MatchWholeWord = True
.MatchWildcards = False
.MatchCase = True
.Execute
End With
Do While .Find.Found
.Hyperlinks.Add Anchor:=.Duplicate, Address:=.Duplicate, ScreenTip:=strTip, TextToDisplay:=.Text
.Start = .Hyperlinks(1).Range.End
.Find.Execute
Loop
End With
Next
End With
Application.ScreenUpdating = True
End Sub
Note that the macro assumes: (a) the Terms and Definitions is the first table in the document, with a separate row for each Term and its Definition, and only terms after that table are to be checked; (b) only exact matches are to be processed (meaning plurals will be skipped); (c) the Terms are in the table's first column and do not have double quotes around them - quoted terms could be catered for, but we'd have to know whether you're using smart quotes or plain quotes; and (d) the Terms and Definitions occupy only the first paragraph in their cells.
While your question mentions that you want to add a comment for each term, I think it would be cleaner to add hyperlinks instead. This should be an acceptable replacement because hyperlinks have a ScreenTip which displays when you mouse over a link. Therefore we just have to add the definition for each term as a ScreenTip for it to be just as accessible as a comment.
My code below will temporarily save the terms and definitions from the first table in the document as a pair and then loop over all words in the document and add a link for each term that has the definition as a ScreenTip.
If you only have the excel list of terms and their definitions, you can just paste it back over to the beginning of your contract as a table. As long as you have it set up as [term][definition] this should do the trick.
Example Before
Example After
Sub AddDefinitionHyperlink()
Dim defined As Object
Set defined = CreateObject("Scripting.Dictionary")
For Each Row In ActiveDocument.Tables(1).Rows
'left cell
Dim term As String
term = Trim(Left(Row.Cells(1).Range.Text, Len(Row.Cells(1).Range.Text) - 2))
'right cell
Dim definition As String
definition = Trim(Left(Row.Cells(2).Range.Text, Len(Row.Cells(2).Range.Text) - 2))
'connect term and definition
defined.Add LCase(term), definition
If Len(term) > 0 And Len(definition) > 0 Then
'add bookmarks for each word
With ActiveDocument.Bookmarks
If Not .Exists(term) Then
.Add Range:=Row.Cells(1).Range, Name:=term
.DefaultSorting = wdSortByName
.ShowHidden = False
End If
End With
End If
Next Row
'browse all words in the document
For Each para In ActiveDocument.Paragraphs
For Each wrd In para.Range.Words
'check if current word has a definition (bookmark)
If ActiveDocument.Bookmarks.Exists(wrd.Text) Then
If wrd.Hyperlinks.count = 0 Then
'add mouseover definition (screentip) to current term
ActiveDocument.Hyperlinks.Add _
Anchor:=wrd, _
Address:="", _
SubAddress:=wrd.Text, _
ScreenTip:=defined(LCase(wrd.Text)), _
TextToDisplay:=wrd.Text
End If
End If
Next wrd
Next para
End Sub

Paste and Merge Formatting in Word

I am trying to work on a paste and merge formatting macro in microsoft word. I am constantly copying from a website and then pasting from that website into Word. Unfortunately, the website format is always:
Text
Citation
I want the format to be:
"Text." Citation.
My code currently is
Sub Paste_Citation()
' Paste_Citation Macro
On Error Resume Next
Selection.PasteAndFormat (wdFormatSurroundingFormattingWithEmphasis)
Selection.TypeBackspace
Selection.TypeBackspace
Selection.TypeText Text:="."
End Sub
I cannot figure out how to a) to not have a paragraph space between the text and the citation, b)put parentheses around the text. If the text does not have a period at the end, then format like "blah blah". Otherwise, "blah blah." and c) not include the text if the text only includes spaces and a period. I know that I need to do an if statement for c, but I am not very familiar with VBA in Word. Could someone walk through the process with me?
Edit #1
x = Selection.PasteAndFormat(wdFormatSurroundingFormattingWithEmphasis)
'trying to set x equal to the pasted formatted value
Dim Txt As String
Dim Cit As String
Txt = Split(x, Chr(182))
'trying to split the text based on the paragraph symbol
'I am confused on what I need to do from there
Edit #2
I have been working on it longer, and I think that I am pretty close. I figured out how to paste the info and merge the formats, as well as how to delete the paragraph break. My issue now is that I cannot figure out how to have it put quotations around the first paragraph, bring the cursor to the end of the paste, and add a period at the end.
Sub PasteCitation()
'Modified code from http://www.vbaexpress.com/forum/archive/index.php/t-46321.html
Application.ScreenUpdating = False
Dim Txt As Range
Set Txt = Selection.Range
Txt.PasteAndFormat (wdFormatSurroundingFormattingWithEmphasis)
With Txt.Find
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
'Replace single paragraph breaks with a space
.Text = "([!^13])([^13])([!^13])"
.Replacement.Text = "\1 \3"
.Execute Replace:=wdReplaceAll
End With
Application.ScreenUpdating = True
End Sub
The red line of your programming should be to assign the downloaded 'Text' and 'Citation' to variables, say
Dim Txt As String
Dim Cit As String
Txt = "Whatever you scraped from tre website"
Cit = "Whatever else you scraped from the web"
With that accomplished you can start manipulating each of the strings and then place them in your document where you want, perhaps at the location of your current selection.
When you follow this method you may come to individual questions which can be answered quickly. Meanwhile, avoid the use of On Error Resume Next while you don't know what error might crop up and how you want to deal with it.

Word VBA: finding a set of words and inserting predefined comments

I need to automate the insertion of comments into a word document: searching for a predefined set of words (sometimes word strings, and all non case-sensitive) each to which I add a predefined comment.
There are two word sets, with two goals:
Wordset 1: identical comment for each located word
Wordset 2: individual comments (I suggest new text based on the word identified)
I have been semi-automating this with a code that IDs all identified words and highlights them, helping me through the process (but I still need to enter all the comments manually - and I've also been able to enter comments - but only on one word at a time.) As my VBA skills are limited, my attempts to compile a robust macro from bits of other code with similar purposes has unfortunately led me nowhere.
Below are the bits of code I've been using.
Sub HighlightWordList()
Dim range As range
Dim i As Long
Dim TargetList
TargetList = Array("word1", "word2", "word3")
For i = 0 To UBound(TargetList)
Set range = ActiveDocument.range
With range.Find
.Text = TargetList(i)
.Format = True
.MatchCase = True
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
Do While .Execute(Forward:=True) = True
range.HighlightColorIndex = wdYellow
Loop
End With
Next
End Sub
The following code has been able to get me to insert bubbles directly
Sub CommentBubble()
'
'
Dim range As range
Set range = ActiveDocument.Content
Do While range.Find.Execute("Word x") = True
ActiveDocument.Comments.Add range, "my comment to enter in the bubble"
Loop
End Sub
I've tried to have the process repeat itself by doing as shown below, but for reasons I'm certain are evident to many of you (and completely unknown to me) - this strategy has failed, working for "word x" but failing to function for all subsequent words:
Sub CommentBubble()
'
'
Dim range As range
Set range = ActiveDocument.Content
Do While range.Find.Execute("Word x") = True
ActiveDocument.Comments.Add range, "my 1st comment to enter in the bubble"
Loop
Do While range.Find.Execute("Word y") = True
ActiveDocument.Comments.Add range, "my 2nd comment to enter in the bubble"
Loop
End Sub
I've mixed and matched bits of these codes to no avail. Any ideas to help me with either wordset?
Thanks for everyone's help!
Best regards
Benoit, you're almost there! All you need to do is redefine the range object after your first loop (because it would have been exhausted at that point). Like so:
Sub CommentBubble()
Dim rng As range
Set rng = ActiveDocument.Content
Do While rng.Find.Execute("Word x") = True
ActiveDocument.Comments.Add rng, "my 1st comment to enter in the bubble"
Loop
Set rng = ActiveDocument.Content ' <---------------Add This.
Do While rng.Find.Execute("Word y") = True
ActiveDocument.Comments.Add rng, "my 2nd comment to enter in the bubble"
Loop
End Sub
That should do the trick for you (it works on my end). If not, let me know.

Search sentence and replace for hyperlink in Word VB.NET

I trying to replace or search and add hyperlink to specyfy sentence in Word document. I try using this codes. Anyway code is changing only first finding word, not all in document:
Dim r As Word.Range
r = Globals.ThisAddIn.Application.ActiveDocument.Content
With r.Find
.ClearFormatting()
.Text = ("MyWordA MyWordB")
.MatchWholeWord = True
.Forward = True
.Execute()
'If .Found = True Then r.Hyperlinks.Add(r, "http:\\www.whatever", , "Displayed text")
Do While .Execute(Forward:=True) = True
r.Hyperlinks.Add(r, "http:\\www.whatever", , "Displayed text")
'r.Font.ColorIndex = Word.WdColorIndex.wdBlue 'works for all(?)
Loop
End With
Eaven when I want to find only single word in loop for, then code find first one:
doc = Globals.ThisAddIn.Application.ActiveDocument
Dim r As Word.Range = doc.Range
Dim ww As Word.Range
For Each ww In r.Words
If ww.Text = "MyWord" Then _
ww.Hyperlinks.Add(ww, "http:\\www.whatever", , "Displayed text")
Next
Anyone could tell me how I can search all text to replace/add hyperlinks to all text I was looking for?
The problem is that you keep finding the same text over and over again. Within your loop, after adding the hyperlink, you need to move the range after the added hyperlink. The simplest way to do this is to collapse the range by calling
r.Collapse(WdCollapseDirection.wdCollapseEnd)
To troubleshoot issues like this it is helpful to select the current range so that you can see what is going on.
Do While .Execute(Forward:=True) = True
' select range for troubleshooting
r.Select()
r.Hyperlinks.Add(r, "http:\\www.whatever", , "Displayed text")
' move the range after the link
r.Collapse(WdCollapseDirection.wdCollapseEnd)
Loop