Word Macro to Mass Hyperlink variable length strings - vba

I've been looking through the forums for a while now trying to find an answer to my problem, and either I'm dense or it hasn't been answered, so here I am.
Long story short, my job involves writing up word documents that list building deficits and provides hyperlinks to images of said deficits. The visible hyperlink text always follows the same format: '[site abbreviation][(image number)].JPG'. For example, if we are looking at 'Administrative Building', our images will be named 'AB(1).JPG', 'AB(2).JPG', etc, often into the mid-hundreds or thousands. In the word document, they are referenced as 'AB1', 'AB2' etc.
Currently, I have a macro that allows me to automatically create a hyperlink once I've selected the text, but I am trying to create a macro that will look through a document (or better yet, a highlighted selection) and assign hyperlinks to any text that starts with the site's abbreviation all at once.
My current attempt at a mass-hyperlinking macro is frustratingly close, but has one major error: while it will correctly hyperlink the first image name it finds, all subsequent images are linked with the next two characters included in the link. For example, if a sentence were to say "This is not correct (AB33), but this is correct (AB34)', my macro will hyperlink the text 'AB34' (which is correct) and 'AB33) ' (which is incorrect).
This is the macro I've been working with thus far (note that the text between the lines of 'XXXX...' are basic instructions for my coworkers to change the link destination as needed)
Option Explicit
Sub Mass_Hyperlink_v_1_1()
'incomplete: selects incorrect text after first link
Dim fileName As String
Dim filePath As String
Dim rng As Range
Dim tag As String
Dim FileType As String
Dim folder As String
Dim space As String
Dim start As String
Dim report_type As String
Dim temp As String
'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
'Do not touch anything above this line
'Answer the following for the current document. Leave all quotations.
report_type = "CL" 'CL = Checklist
'SR = Site Report
folder = "Doors" 'The name of the folder you are linking images from
'Must match folder exactly
tag = "FS" 'Put file prefix here (ex. if link says "AB123", put "AB")
space = "No" 'Does the image file have a space in it? (ex. if file name is "AB (23)", put "yes")
FileType = ".JPG" 'make sure filetype extensions match
'Do not touch anything below this line
'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
If space = "Yes" Then
start = "%20("
Else: start = "("
End If
If report_type = "CL" Then
folder = "..\Images\" & folder
Else: folder = folder
End If
If report_type = "SR" Then
folder = "Images\" & folder
Else: folder = folder
End If
Set rng = ActiveDocument.Range
With rng.find
.MatchWildcards = True
Do While .Execute(findText:=tag, Forward:=False) = True
rng.MoveStartUntil (tag)
rng.Select
Selection.Extend
Selection.MoveRight Unit:=wdWord, Count:=1, Extend:=wdExtend
'I believe the issue is created here
Selection.start = Selection.start + Len(tag)
ActiveDocument.Range(Selection.start - Len(tag), Selection.start).Delete
fileName = Selection.Text
filePath = folder & "\" & tag & start & fileName & ")" & FileType
ActiveDocument.Hyperlinks.Add Anchor:=Selection.Range, address:= _
filePath, SubAddress:="", ScreenTip:="", TextToDisplay:= _
tag & Selection.Text
rng.Collapse wdCollapseStart
Loop
End With
End Sub
If I've explained this terribly wrong or not provided enough information, please let me know and I'll try to be more clear. And if there is a helpful resource that I'm simply too dense to have found, please let me know! thank you!
edit: if anyone knows how to only select words that start with the tag as opposed to words with the tag text in them, I'd be incredibly appreciative as well!

If you want to match a fixed tag followed by a variable number of digits:
Sub Tester()
TagMatches ActiveDocument, "AB"
End Sub
Sub TagMatches(doc As Document, tag As String)
Dim rng
Set rng = doc.Range
With rng.Find
.Text = tag & "[0-9]{1,}"
.Forward = True
.MatchWildcards = True
Do While .Execute
Debug.Print rng.Text
Loop
End With
End Sub
See: http://word.mvps.org/faqs/general/usingwildcards.htm

Related

Extracting images from Word document using VBA

I need to loop over some word documents, and extract images from a word document and save them in a separate folder.
I've tried the method of saving them as an HTML document, but it is not a good fit for my requirement.
Now, I'm looping through the images using inlineshapes object and then copy-pasting them on a publisher document and then saving them as an image. However, I'm facing a Runtime Automation error when I'm running the script.
For using the Publisher runtime library I've tried both early and late binding but I'm facing the error on both of them.
Can anyone please let me know what is the problem? Also, if anyone can explain why I'm facing this error, that'd be great. As per my understanding, it is due to memory allocation, but I'm not sure.
Here is the code block that I've been working on (fp, dp are folder paths, while filename is the word document name. I'm calling this sub in another sub that is looping over all the files in a folder):
Sub test(ByVal fp As String, ByVal dp As String, ByVal filename As String)
Dim doc As Document
Dim pubdoc As New Publisher.Document
Dim shp As InlineShape
'Application.Screenupdating = False
'Dim pubdoc As Object
'Set pubdoc = CreateObject("Publisher.Document")
Set doc = Documents.Open(fp)
With doc
i = .InlineShapes.Count
Debug.Print i
End With
For j = 1 To i
Set shp = doc.InlineShapes(j)
shp.Select
Selection.CopyAsPicture
pubdoc.Pages(1).Shapes.Paste
pubdoc.Pages(1).Shapes(1).SaveAsPicture (dp & Application.PathSeparator & j & ".jpg")
pubdoc.Pages(1).Shapes(1).Delete
Next
doc.Close (wdDoNotSaveChanges)
pubdoc.Close
'Application.Screenupdating = True
End Sub
Apart from this, if anyone has any suggestions to make this faster, I'm all ears. Thanks in advance!
Just add .zip to the end of the file name, expand the file and look in the word/media folder. All the files will be there, no programming necessary.
Extracting the pictures from a Filtered HTML document that was created from your original source document would be faster. However, you said that was not a good fit for you needs so ... here is example code that will locate pictures in your source document and paste them into a second document.
The speed problem of this type of code is caused by the CopyPicture working from a Selection command, so I recommend using a range instead. Of course the For/Next loop that is required is slower no matter what.
Sub CopyPasteAsPicture()
Dim doc As Word.Document, iShp As Word.InlineShape, shp As Word.Shape
Dim i As Integer, nDoc As Word.Document, rng As Word.Range
Set doc = ActiveDocument
If doc.Shapes.Count > 0 Then
For i = 1 To doc.Shapes.Count
Set shp = doc.Shapes(i)
If shp.Type = msoLinkedPicture Or shp.Type = msoPicture Then
'if you want only pictures extracted then you have
'to specify the type
shp.ConvertToInlineShape
'if you want all extracted pictures to be in the sequence
'they appear in the document then you have to convert
'floating shapes to inline shapes
End If
Next
End If
If doc.Content.InlineShapes.Count > 0 Then
Set nDoc = Word.Documents.Add
Set rng = nDoc.Content
For i = 1 To doc.Content.InlineShapes.Count
doc.Content.InlineShapes(i).Range.CopyAsPicture
rng.Paste
rng.Collapse Word.WdCollapseDirection.wdCollapseEnd
rng.Paragraphs.Add
rng.Collapse Word.WdCollapseDirection.wdCollapseEnd
Next
End If
End Sub
If you want to place all shapes (floating or inline) into a folder as image files, then the best way is to save the source document as a filtered HTML document. Here is the command:
htmDoc.SaveAs2 FileName:=LGPWorking & strFileName, AddToRecentFiles:=False, FileFormat:=Word.WdSaveFormat.wdFormatFilteredHTML
In the above the active document is assigned to the variable htmDoc. I am giving this new document a specific name and location. The output from this is not only the HTML file but also a directory by the same name with an appended "_Files" label. In the "x_Files" directory are all the image files.
If you only want selective images pulled from your original source document, or if you want images pulled from multiple source documents ... then you need to use the above code that I shared for placing only the images you want from one or more source document into a new Word document and then save that new document as an Filtered HTML.
When your routine is done, you can Kill the HTML document and only leave the Files directory.
I had to change a few things around, but this will allow to save a single image on a word document and go through a couple of cycles before it turns into a jpg on the other side, without any white space
filename = ActiveDocument.FullName
saveLocaton = "z:\temp\"
FolderName = "test"
On Error Resume Next
Kill "z:\temp\test_files\*" 'Delete all files
RmDir "z:\temp\test_files" 'Delete folder
ActiveDocument.SaveAs2 filename:="z:\temp\test.html", FileFormat:=wdFormatHTML
ActiveDocument.Close
Kill saveLocaton & FolderName & ".html"
Kill saveLocaton & FolderName & "_files\*.xml"
Kill saveLocaton & FolderName & "_files\*.html"
Kill saveLocaton & FolderName & "_files\*.thmx"
Name saveLocaton & FolderName & "_files\image00" & 1 & ".png" As saveLocaton & FolderName & "_files\" & test2 & "_00" & x & ".jpg"
Word.Application.Visible = True
Word.Application.Activate

Find and Replace Multiple Text Strings on Multiple Text Files from a folder

I'm working on the vba code to accomplish the following tasks
Word Document Open a Text file from the folder
Find and replace the text (multiple Text) based on a excel sheet (which have find what and replace with)
Process all text files in the folder and save it.
I would like to customize the below code for the above requirement,
I'm using Office 2016 and I think I have to replace Application.FileSearch in the script to ApplicationFileSearch for 2003 and prior office editions.
I try to accomplish using the word macro recorder and also used notepad++, I've recorded in Notepad++ also and it works for one file, I would like to do batch process all files in the folder and save it after replacing the text.
As there is too many lines there to replace more than 30 or more lines to replace, I would like the code to look for the text from a excel file like find what and replace with columns.
Sub FindReplaceAllDocsInFolder( )
Dim i As Integer
Dim doc As Document
Dim rng As Range
With Application.FileSearch
.NewSearch
.LookIn = "C:\My Documents"
.SearchSubFolders = False
.FileType = msoFileTypeWordDocuments
If Not .Execute( ) = 0 Then
For i = 1 To .FoundFiles.Count
Set doc = Documents.Open(.FoundFiles(i))
For Each rng In doc.StoryRanges
With rng.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "Dewey & Cheatem"
.Replacement.Text = "Dewey, Cheatem & Howe"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
Next rng
doc.Save
doc.Close
Set rng = Nothing
Set doc = Nothing
Next i
Else
MsgBox "No files matched " & .FileName
End If
End With
End Sub
Thanks
Jay
Borrowed from https://social.msdn.microsoft.com/Forums/en-US/62fceda5-b21a-40b6-857c-ad28f12c1b23/use-excel-vba-to-open-a-text-file-and-search-it-for-a-specific-string?forum=isvvba
Sub SearchTextFile()
Const strFileName = "C:\MyFiles\TextFile.txt"
Const strSearch = "Some Text"
Dim strLine As String
Dim f As Integer
Dim lngLine As Long
Dim blnFound As Boolean
f = FreeFile
Open strFileName For Input As #f
Do While Not EOF(f)
lngLine = lngLine + 1
Line Input #f, strLine
If InStr(1, strLine, strSearch, vbBinaryCompare) > 0 Then
MsgBox "Search string found in line " & lngLine, vbInformation
blnFound = True
Exit Do
End If
Loop
Close #f
If Not blnFound Then
MsgBox "Search string not found", vbInformation
End If
End Sub
This is simply just finding the match. You can use the built in function "Replace" which land the total fix. You would also have to fit in the "loop through files" code, which here is a snippet.
Dim StrFile As String
StrFile = Dir(pathtofile & "\*" & ".txt")
Do While Len(StrFile) > 0
Debug.Print StrFile
StrFile = Dir
Loop
I wouldve made this a comment, but it was too much text. This isnt meant to be a full blown answer, just giving you the pieces you need to put it all together on your own.
Thanks for all your help. I have found alternate solution using the below EXE. FART.exe (FART - Find and Replace Text). I have create a batch file with the below command example.
https://emtunc.org/blog/03/2011/farting-the-easy-way-find-and-replace-text/
http://fart-it.sourceforge.net/
Examples:
fart "C:\APRIL2011\Scripts\someFile.txt" oldText newText
This line instructs FART to simply replace the string oldText with newText.
fart -i -r "C:\APRIL2011\Scripts*.prm" march2011 APRIL2011
This line will instruct FART to recursively (-r) search for any file with the .prm extension in the \Scripts folder. The -i flag tells FART to ignore case-sensitivity when looking for the string.

Text Replacement in MS Word

In my MS-Word template, my client sends template with an error in the header section, as below,
VERMONT, ROBIN S. Date: 10/21/2017
File No: 312335 DOB: 05/02/1967Claim#: RE155B53452
DOI: 06/21/2017
The error being the ‘Claim#’ coming up right next to the DOB value, while it should come on the next line, as below:
VERMONT, ROBIN S. Date: 10/21/2017
File No: 312335 DOB: 05/02/1967
Claim#: RE155B53452 DOI: 06/21/2017
Also, this error comes up occasionally in some files and not all, so to solve it, I created a word macro, as below:
Sub ShiftClaimNumber2NextLine()
Dim rngStory As Word.Range
Dim lngJunk As Long
'Fix the skipped blank Header/Footer problem as provided by Peter Hewett
lngJunk = ActiveDocument.Sections(1).Headers(1).Range.StoryType
'Iterate through all story types in the current document
For Each rngStory In ActiveDocument.StoryRanges
'Iterate through all linked stories
Do
With rngStory.Find
.text = "Claim#:"
.Replacement.text = "^pClaim#:"
.Wrap = wdFindContinue
.Execute Replace:=wdReplaceAll
End With
'Get next linked story (if any)
Set rngStory = rngStory.NextStoryRange
Loop Until rngStory Is Nothing
Next
End Sub
And I run the above macro through a Group Macro, as below:
Sub ProcessAllDocumentsInFolder()
Dim file
Dim path As String
' Path to folder
path = "D:\Test\"
file = Dir(path & "*.doc")
Do While file <> ""
Documents.Open FileName:=path & file
' Call the macro to run on each file in the folder
Call ShiftClaimNumber2NextLine
' Saves the file
ActiveDocument.Save
ActiveDocument.Close
' set file to next in Dir
file = Dir()
Loop
End Sub
When I run the macro ProcessAllDocumentsInFolder(), for all files with such an error, it shifts the ‘Claim#’ to the next line.
However, the problem is, it also does so for files that do not have such a problem, thereby adding one enter line below the DOB, as below (as depicted by the yellow line):
What changes should I make to my macro ShiftClaimNumber2NextLine() , so that it does not make any change to files which DO NOT HAVE the ‘Claim#’ problem ?
This is a situation best suited to regular expressions see below link for similar issue, refer vba solution
https://superuser.com/questions/846646/is-there-a-way-to-search-for-a-pattern-in-a-ms-word-document
Sub RegexReplace()
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
On Error Resume Next
RegEx.Global = True
RegEx.Pattern = "[0-9]Claim#:"
ActiveDocument.Range = _
RegEx.Replace(ActiveDocument.Range, "\rClaim#:")
End Sub

How to extract / delete first word of each page?

I did a mailmerge to create dynamic word pages with customer information.
Then I did (by looking on the net) a macro to split the result file into several pages, each page being saved as one file.
Now I'm looking to give those files some names containing customer info. I googled that and I think the (only?) way is to create a mergefield with that info, at the very beginning of the page, then extract and delete it from the page with a macro to put it in file names.
Example: If I have a customer named Stackoverflow I would like to have a file named Facture_Stackoverflow.doc.
I found nowhere how to select, extract and then delete this first word from my page.
Here is my "splitting macro", which currently names the files just with an incremented ID:
Sub DecouperDocument()
Application.Browser.Target = wdBrowsePage
For i = 1 To ActiveDocument.BuiltInDocumentProperties("Number of Pages")
ActiveDocument.Bookmarks("\page").Range.Copy
Documents.Add
Selection.Paste
Selection.TypeBackspace
ChangeFileOpenDirectory "C:\test\"
DocNum = DocNum + 1
ActiveDocument.SaveAs FileName:="Facture_" & DocNum & ".doc"
ActiveDocument.Close
Application.Browser.Next
Next i
ActiveDocument.Close savechanges:=wdDoNotSaveChanges
End Sub
The function below will enable you to extract the first word (and optionally remove it) of a Word document.
Public Function GetFirstWord(Optional blnRemove As Boolean = True) As String
Dim rng As Range
Dim intCharCount As Integer
Dim strWord As String
With ThisDocument
Set rng = .Characters(1)
intCharCount = rng.EndOf(wdWord, wdMove)
With .Range(0, intCharCount - 1)
strWord = .Text
If blnRemove Then
.Delete
End If
End With
End With
GetFirstWord = strWord
End Function
I hope this helps.

Word macro that adds the file name to the body text of a series of Word documents in a folder

I need a Word macro that adds the file name of a document to the first line of that Word document. But not just one document at a time. I would like the Macro to add these file names to a series of Word documents in a particular folder (with each document getting their own file name).
A Macro that adds the file name to document is simple:
Selection.HomeKey Unit:=wdStory
Selection.Fields.Add Range:=Selection.Range, Type:=wdFieldEmpty, _
Text:= "FILENAME "
Selection.TypeParagraph
But how can I get it to add file names to an entire series of documents in a folder? I suppose the folder could be named in the Macro. For example: C:\Users\username\Desktop\somefolder\. I also suppose that a Loop could be used to go through the folder until the loop gets to the end of the documents in the folder.
This should at least get you started. I haven't tested it, but I've written this type of thing a few times before, so the basic idea works.
Dim filePath As String
Dim thisDoc As Document
Dim wordApp As Word.Application
Set wordApp = New Word.Application
'wordApp.Visible = True ' uncomment if you want to watch things happen
'Get first file that matches
filePath = Dir("C:\Users\username\Desktop\somefolder\*.doc") ' or *.whatever
Do While filePath <> ""
Set thisDoc = wordApp.Documents.Open(filePath)
'Add filename at top of document
Selection.HomeKey Unit:=wdStory
Selection.InsertAfter filePath
'Selection.InsertAfter ThisDocument.FullName ' alternative
'Your way using .Fields.Add is also fine if you want the file name as
' a field instead of
' plain text.
thisDoc.Close SaveChanges:=True
'Get next file that matches and start over
filePath = Dir()
Loop
wordApp.Quit
Set wordApp = Nothing