Extracting images from Word document using VBA

I need to loop over several Word documents, extract the images from each one, and save them in a separate folder.
I've tried saving the documents as HTML, but that is not a good fit for my requirements.
Now I'm looping through the images using the InlineShapes collection, copy-pasting each one into a Publisher document, and then saving it as an image. However, I get a run-time Automation error when I run the script.
To use the Publisher object library I've tried both early and late binding, but I get the error with both.
Can anyone tell me what the problem is, and explain why I'm getting this error? My understanding is that it is due to memory allocation, but I'm not sure.
Here is the code I've been working on (fp and dp are folder paths, and filename is the Word document name; this sub is called from another sub that loops over all the files in a folder):
Sub test(ByVal fp As String, ByVal dp As String, ByVal filename As String)
Dim doc As Document
Dim pubdoc As New Publisher.Document
Dim shp As InlineShape
'Application.Screenupdating = False
'Dim pubdoc As Object
'Set pubdoc = CreateObject("Publisher.Document")
Set doc = Documents.Open(fp)
With doc
i = .InlineShapes.Count
Debug.Print i
End With
For j = 1 To i
Set shp = doc.InlineShapes(j)
shp.Select
Selection.CopyAsPicture
pubdoc.Pages(1).Shapes.Paste
pubdoc.Pages(1).Shapes(1).SaveAsPicture (dp & Application.PathSeparator & j & ".jpg")
pubdoc.Pages(1).Shapes(1).Delete
Next
doc.Close (wdDoNotSaveChanges)
pubdoc.Close
'Application.Screenupdating = True
End Sub
Apart from this, if anyone has any suggestions to make this faster, I'm all ears. Thanks in advance!

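Just add .zip to the end of the file name, expand the file, and look in the word/media folder. All the image files will be there; no programming necessary. (This works for .docx files, which are zip archives; it does not work for the older binary .doc format.)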
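Since the question is about looping over many documents, the same trick can be scripted. The following is only a minimal sketch: the sub name ExtractDocxMedia, the temporary file name, and both parameters are hypothetical, and it assumes that the Windows Shell can open a subfolder inside a zip archive by path (which it normally can). The .docx must not be open in Word while it is copied, and destFolder must already exist.

```vba
Option Explicit

' Hypothetical helper: copies the images stored inside a .docx to destFolder.
Sub ExtractDocxMedia(ByVal docxPath As String, ByVal destFolder As String)
    Dim zipPath As String
    Dim shellApp As Object
    Dim mediaFolder As Object

    ' Work on a copy with a .zip extension so the Shell treats it as an archive.
    zipPath = destFolder & "\~docx_copy.zip"
    FileCopy docxPath, zipPath

    Set shellApp = CreateObject("Shell.Application")

    ' Namespace needs Variant arguments when called late-bound, hence CVar.
    ' Assumption: the Shell resolves a path that points inside the zip file.
    Set mediaFolder = shellApp.Namespace(CVar(zipPath & "\word\media"))

    ' A document without pictures has no word\media folder.
    If Not mediaFolder Is Nothing Then
        shellApp.Namespace(CVar(destFolder)).CopyHere mediaFolder.Items
    End If

    ' The temporary zip is left in place on purpose: CopyHere may still be
    ' running when this sub returns, so delete the zip later.
End Sub
```

A call such as ExtractDocxMedia "C:\docs\report.docx", "C:\docs\images" (both paths are made up) would leave the pictures in their original format, which avoids the clipboard entirely.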

Extracting the pictures from a Filtered HTML document created from your original source document would be faster. However, you said that was not a good fit for your needs, so here is example code that locates the pictures in your source document and pastes them into a second document.
The speed problem with this type of code comes from calling CopyAsPicture on the Selection, so I recommend using a Range instead. Of course, the For/Next loop that is required is slower no matter what.
Sub CopyPasteAsPicture()
    Dim doc As Word.Document, shp As Word.Shape
    Dim i As Long, nDoc As Word.Document, rng As Word.Range
    Set doc = ActiveDocument
    'if you want all extracted pictures to be in the sequence
    'they appear in the document then you have to convert
    'floating shapes to inline shapes.
    'Loop backwards: each conversion removes an item from doc.Shapes,
    'so a forward loop would skip shapes or run past the end.
    For i = doc.Shapes.Count To 1 Step -1
        Set shp = doc.Shapes(i)
        'if you want only pictures extracted then you have
        'to specify the type
        If shp.Type = msoLinkedPicture Or shp.Type = msoPicture Then
            shp.ConvertToInlineShape
        End If
    Next
    If doc.Content.InlineShapes.Count > 0 Then
        Set nDoc = Word.Documents.Add
        Set rng = nDoc.Content
        For i = 1 To doc.Content.InlineShapes.Count
            doc.Content.InlineShapes(i).Range.CopyAsPicture
            rng.Paste
            rng.Collapse Word.WdCollapseDirection.wdCollapseEnd
            rng.Paragraphs.Add
            rng.Collapse Word.WdCollapseDirection.wdCollapseEnd
        Next
    End If
End Sub
If you want to place all shapes (floating or inline) into a folder as image files, then the best way is to save the source document as a filtered HTML document. Here is the command:
htmDoc.SaveAs2 FileName:=LGPWorking & strFileName, AddToRecentFiles:=False, FileFormat:=Word.WdSaveFormat.wdFormatFilteredHTML
In the above the active document is assigned to the variable htmDoc. I am giving this new document a specific name and location. The output from this is not only the HTML file but also a directory by the same name with an appended "_Files" label. In the "x_Files" directory are all the image files.
If you only want selected images pulled from your original source document, or if you want images pulled from multiple source documents, then use the code I shared above to place only the images you want from one or more source documents into a new Word document, and then save that new document as Filtered HTML.
When your routine is done, you can Kill the HTML document and only leave the Files directory.
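Put together, the Filtered HTML route could look roughly like the sketch below. The sub name and both parameters are hypothetical, the document is assumed to have a file extension, and the name of the companion image folder (typically the base name plus "_files") can vary with the Office language, so check it on your machine.

```vba
Option Explicit

' Hypothetical helper: saves a copy of docPath as Filtered HTML in outFolder
' so that Word writes the pictures into a companion "<name>_files" folder.
Sub ExportImagesViaFilteredHtml(ByVal docPath As String, ByVal outFolder As String)
    Dim doc As Word.Document
    Dim baseName As String
    Dim htmlPath As String

    ' Open read-only so the source document itself is never modified.
    Set doc = Documents.Open(FileName:=docPath, ReadOnly:=True, _
                             AddToRecentFiles:=False)

    ' Strip the extension, e.g. "report.docx" -> "report".
    baseName = Left$(doc.Name, InStrRev(doc.Name, ".") - 1)
    htmlPath = outFolder & "\" & baseName & ".htm"

    doc.SaveAs2 FileName:=htmlPath, _
                FileFormat:=wdFormatFilteredHTML, _
                AddToRecentFiles:=False
    doc.Close SaveChanges:=wdDoNotSaveChanges

    ' Only the image folder is wanted, so remove the HTML file itself.
    Kill htmlPath
End Sub
```

Calling this once per file from the outer loop keeps each document's pictures in their own folder, which makes it easy to tell which image came from which document.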

I had to change a few things around, but this will save a single image from a Word document and, after a couple of intermediate steps, turn it into a .jpg on the other side without any white space:
filename = ActiveDocument.FullName
saveLocaton = "z:\temp\"
FolderName = "test"
On Error Resume Next
Kill "z:\temp\test_files\*" 'Delete all files
RmDir "z:\temp\test_files" 'Delete folder
ActiveDocument.SaveAs2 filename:="z:\temp\test.html", FileFormat:=wdFormatHTML
ActiveDocument.Close
Kill saveLocaton & FolderName & ".html"
Kill saveLocaton & FolderName & "_files\*.xml"
Kill saveLocaton & FolderName & "_files\*.html"
Kill saveLocaton & FolderName & "_files\*.thmx"
Name saveLocaton & FolderName & "_files\image00" & 1 & ".png" As saveLocaton & FolderName & "_files\" & test2 & "_00" & x & ".jpg"
Word.Application.Visible = True
Word.Application.Activate

Related

VBA Issues with Inserting Multiple PDF Objects Within a Loop

My set-up is that I have a bunch of blank templates in a folder. Inside each blank template is a fund code (it is the only thing in the template)
The below macro I created (in an external workbook) goes through the folder with the templates, opens each template, and "fills it out" via a loop.
Basically my macro opens each template, assigns the fund code to a variable and then uses that variable in combination with some text strings to pull in other worksheets/PDF objects related to that specific fund code.
My issue is that in a meatier version of the code below, I added maybe four or five more PDF objects to insert. It goes through some of the templates and then stops on a random fund code at a random PDF object insert line, saying either "object cannot be found" or "object cannot be inserted".
If I press Debug and then press F8 to run that line again, it inserts the object with no problem. So perhaps my code is running too fast for Adobe to handle? I am unsure. Perhaps my code isn't doing things as efficiently as possible. This would save so much time for my team, but I can't have it working only half the time.
(Also, the file names have definitely been correct, so that is not the issue.)
Public Sub test()
Set currentbook = ActiveWorkbook
Application.AskToUpdateLinks = False
Application.DisplayAlerts = False
Application.ScreenUpdating = False
Dim wbk As Workbook
Dim filename1 As String
Dim Path As String
Dim a As Long
Path = "C:\Users\Bob\Desktop\Workbooks\"
filename1 = Dir(Path & "*.xlsm")
'--------------------------------------------
'OPEN EXCEL FILES
Do While Len(filename1) > 0 'IF NEXT FILE EXISTS THEN
Set wbk = Workbooks.Open(Path & filename1)
wbk.Activate
'Gets Fund Code
Sheets("Initialize").Select
Dim FdCode As String
FdCode = Worksheets("Initialize").Range("D8")
'--------------------------- PDF ADDS
'Add PDF TB----------------------------------------------------
Worksheets("F.a - Working TB").OLEObjects.Add filename:="C:\Users\Bob\Desktop\Raw Reports\R122 04.30.16 - 04.30.17\" & FdCode & " 04.30.16 TB.PDF", Link:=False, DisplayAsIcon:=False, Left:=40, Top:=40, Width:=150, Height:=10
On Error GoTo 0
'Add PDF Closed Options----------------------------------------------------'
Worksheets("T300.1 - Options (Closed)").OLEObjects.Add filename:="C:\Users\Bob\Raw Reports\Other Reports 04.30.16-04.30.17\Breakout\" & FdCode & " other 04.30.17_ CLOSED OPTIONS POSITION REPORT.PDF", Link:=False, DisplayAsIcon:=False, Left:=40, Top:=40, Width:=150, Height:=10
On Error GoTo 0
ActiveWorkbook.Save
wbk.Close False
filename1 = Dir
Loop
Application.ScreenUpdating = True
End Sub
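If the timing guess in the question is right (the same line succeeds when re-run by hand), one thing to try is wrapping each insert in a small retry loop. This is only a sketch under that assumption, not a confirmed fix; the function name, the retry count, and the two-second pause are all made up for illustration.

```vba
Option Explicit

' Hypothetical helper: tries to embed a PDF and retries after a short pause.
' Returns True if the object was inserted within maxTries attempts.
Function AddPdfWithRetry(ByVal ws As Worksheet, ByVal pdfPath As String, _
                         Optional ByVal maxTries As Long = 3) As Boolean
    Dim attempt As Long

    For attempt = 1 To maxTries
        On Error Resume Next
        ws.OLEObjects.Add Filename:=pdfPath, Link:=False, _
                          DisplayAsIcon:=False, Left:=40, Top:=40, _
                          Width:=150, Height:=10
        ' Check Err before any other On Error statement resets it.
        If Err.Number = 0 Then
            On Error GoTo 0
            AddPdfWithRetry = True
            Exit Function
        End If
        Err.Clear
        On Error GoTo 0

        ' Give the PDF handler time to settle before the next attempt.
        DoEvents
        Application.Wait Now + TimeSerial(0, 0, 2)
    Next attempt
End Function
```

In the loop from the question, a call would look like: If Not AddPdfWithRetry(Worksheets("F.a - Working TB"), pdfPath) Then Debug.Print "Failed: " & pdfPath, where pdfPath is the full file name built from FdCode. Logging failures instead of stopping lets the run finish and shows which files need a second pass.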

How to keep original word document open when saving copy in HTML?

The problem I'm having is that when I run my macro to save the current Word document as HTML, the document stays open, but in the .htm format rather than the original .docx format.
If I edit the document after the macro has run, the changes don't end up in the original .docx file.
I would appreciate feedback on how to stay in the original format while also saving a copy in a different format. Thanks.
Here is my docx to html code in VBA
Sub DocToHTML()
Dim slice As String
Dim strDocName As String
Dim PathOrg As String
On Error Resume Next
strDocName = ActiveDocument.Name
slice = Left(strDocName, InStrRev(strDocName, ".") - 1)
strDocName = ActiveDocument.Path + "\" + slice
ActiveDocument.SaveAs2 FileName:=strDocName, FileFormat:=wdFormatHTML
End Sub
Before you write code to do things like this, stop and think about how you would do it in the UI without code. Any code that you write will simply automate that process.
So what would you do in the UI?
1. Save the original document to preserve any changes that you have made.
2. Save a copy as HTML.
3. Reopen the original document.
4. Possibly close the HTML version.
So your code can be rewritten as follows:
Sub DocToHTML()
    Dim origName As String
    Dim saveName As String
    Dim docHTML As Document
    If Not ActiveDocument.Saved Then ActiveDocument.Save
    origName = ActiveDocument.FullName
    saveName = Left(origName, InStrRev(origName, ".") - 1)
    ActiveDocument.SaveAs2 FileName:=saveName, FileFormat:=wdFormatHTML
    Set docHTML = ActiveDocument
    Documents.Open origName
    docHTML.Close wdDoNotSaveChanges
End Sub

VBA check if file exists in sub folders

I am relatively amateur at VBA and am using a code provided by tech on the net.
I have an Excel document with files names in column B (not always one file type) which I am trying to ensure I have copies and the correct revision in a designated folder.
Currently, the code works perfectly for a specific folder location, but the files referenced in the Excel spreadsheet exist in various other folders and thus I need to create a code that can search a main folder and loop through the various sub-folders.
See current code below for reference.
Sub CheckIfFileExists()
Dim LRow As Integer
Dim LPath As String
Dim LExtension As String
Dim LContinue As Boolean
'Initialize variables
LContinue = True
LRow = 8
LPath = "K:\location\main folder\sub folder \sub folder"
LExtension = ".pdf"
'Loop through all column B values until a blank cell is found
While LContinue
'Found a blank cell, do not continue
If Len(Range("B" & CStr(LRow)).Value) = 0 Then
LContinue = False
'Check if file exists for document title
Else
'Place "No" in column E if the file does NOT exist
If Len(Dir(LPath & Range("B" & CStr(LRow)).Value & LExtension)) = 0 Then
Range("E" & CStr(LRow)).Value = "No"
'Place "Yes" in column E if the file does exist
Else
Range("E" & CStr(LRow)).Value = "Yes"
End If
End If
LRow = LRow + 1
Wend
End Sub
There are over 1000 documents, so simple windows searches is not ideal, and I have reviewed several previous questions and cannot find an answer that helps.
Okay, my answer is going to revolve around 2 comments from your question. This will serve only as a basis for you to improve upon and adapt to how you need it.
N.B SKIP TO THE BOTTOM OF MY ANSWER TO SEE THE FULL WORKING CODE
The first comment is:
I need to create a code that can search a main folder and loop through the various sub-folders.
The code I will explain below takes a MAIN FOLDER, which you need to specify, and loops through ALL subfolders of that parent directory, so you do not need to worry about specific subfolders. As long as you know the name of the file you want to access, the code will find it.
The second is a line of your code:
LPath = "K:\location\main folder\sub folder \sub folder"
This line of code will form part of the routine that I show below.
Step 1
Rename LPath to what is called the "Host Folder". This is the MAIN FOLDER.
For example: Host Folder = "K:\User\My Documents\" (note that the backslash at the end is needed)
Step 2
Get access to the Microsoft Scripting Runtime:
i) In the code
Set FileSystem = CreateObject("Scripting.FileSystemObject")
ii) In the VBA Editor (do a basic Google search on how to find the reference library in the VBA editor). Because the code uses CreateObject (late binding), this reference is optional.
Step 3
This is the main element: a subroutine that will find the file no matter where it is, provided that a file name and a host folder have been supplied.
Sub DoFolder(Folder)
    Dim SubFolder
    For Each SubFolder In Folder.SubFolders
        DoFolder SubFolder
    Next
    Dim File
    For Each File In Folder.Files
        If File.Name = "Specify Name.pdf" Then
            Workbooks.Open (Folder.path & "\" & File.Name), UpdateLinks:=False
            Workbooks(File.Name).Activate
            Exit Sub
        End If
    Next
End Sub
The code above will simply open the file once it has found it. This was just my own specific use; adapt as necessary.
MAIN CODE
Option Explicit
Dim FileSystem As Object
Dim HostFolder As String

Sub FindFile()
    HostFolder = "K:\User\My Documents\"
    Set FileSystem = CreateObject("Scripting.FileSystemObject")
    DoFolder FileSystem.GetFolder(HostFolder)
End Sub

Sub DoFolder(Folder)
    Dim SubFolder
    For Each SubFolder In Folder.SubFolders
        DoFolder SubFolder
    Next
    Dim File
    For Each File In Folder.Files
        If File.Name = "Specify Name.pdf" Then
            Workbooks.Open (Folder.path & "\" & File.Name), UpdateLinks:=False
            Workbooks(File.Name).Activate
            Exit Sub
        End If
    Next
End Sub
You can chop this up however you see fit; you can probably fold it into your sub CheckIfFileExists() or just use it on its own.
Let me know how you get along so I can help you understand this further.
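To connect the recursive search back to the Yes/No check in the question, here is a minimal sketch. The names FileExistsInTree and CheckIfFileExistsInSubfolders are hypothetical, the root path is a placeholder based on the one in the question, and the row and column layout (names in column B from row 8, result in column E, ".pdf" extension) is copied from the question's code.

```vba
Option Explicit

' Returns True if a file called targetName exists in startFolder
' or in any folder below it.
Function FileExistsInTree(ByVal fso As Object, ByVal startFolder As Object, _
                          ByVal targetName As String) As Boolean
    Dim subFolder As Object

    ' Check the current folder first.
    If fso.FileExists(fso.BuildPath(startFolder.Path, targetName)) Then
        FileExistsInTree = True
        Exit Function
    End If

    ' Then recurse into each subfolder, stopping at the first hit.
    For Each subFolder In startFolder.SubFolders
        If FileExistsInTree(fso, subFolder, targetName) Then
            FileExistsInTree = True
            Exit Function
        End If
    Next subFolder
End Function

Sub CheckIfFileExistsInSubfolders()
    Dim fso As Object
    Dim root As Object
    Dim rowNum As Long
    Dim fileName As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set root = fso.GetFolder("K:\location\main folder")   ' placeholder path

    ' Walk column B from row 8 until the first blank cell.
    rowNum = 8
    Do While Len(Range("B" & rowNum).Value) > 0
        fileName = Range("B" & rowNum).Value & ".pdf"
        If FileExistsInTree(fso, root, fileName) Then
            Range("E" & rowNum).Value = "Yes"
        Else
            Range("E" & rowNum).Value = "No"
        End If
        rowNum = rowNum + 1
    Loop
End Sub
```

This walks the folder tree once per row, which is simple but may be slow for over 1000 documents on a network drive; collecting all file names into a Scripting.Dictionary in one pass and looking each row up there would be the usual next step.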

Word Macro to Mass Hyperlink variable length strings

I've been looking through the forums for a while now trying to find an answer to my problem, and either I'm dense or it hasn't been answered, so here I am.
Long story short, my job involves writing up word documents that list building deficits and provides hyperlinks to images of said deficits. The visible hyperlink text always follows the same format: '[site abbreviation][(image number)].JPG'. For example, if we are looking at 'Administrative Building', our images will be named 'AB(1).JPG', 'AB(2).JPG', etc, often into the mid-hundreds or thousands. In the word document, they are referenced as 'AB1', 'AB2' etc.
Currently, I have a macro that allows me to automatically create a hyperlink once I've selected the text, but I am trying to create a macro that will look through a document (or better yet, a highlighted selection) and assign hyperlinks to any text that starts with the site's abbreviation all at once.
My current attempt at a mass-hyperlinking macro is frustratingly close, but has one major error: while it correctly hyperlinks the first image name it finds, all subsequent images are linked with the next two characters included in the link. For example, if a sentence were to say 'This is not correct (AB33), but this is correct (AB34)', my macro will hyperlink the text 'AB34' (which is correct) and 'AB33) ' (which is incorrect).
This is the macro I've been working with thus far (note that the text between the lines of 'XXXX...' are basic instructions for my coworkers to change the link destination as needed)
Option Explicit
Sub Mass_Hyperlink_v_1_1()
'incomplete: selects incorrect text after first link
Dim fileName As String
Dim filePath As String
Dim rng As Range
Dim tag As String
Dim FileType As String
Dim folder As String
Dim space As String
Dim start As String
Dim report_type As String
Dim temp As String
'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
'Do not touch anything above this line
'Answer the following for the current document. Leave all quotations.
report_type = "CL" 'CL = Checklist
'SR = Site Report
folder = "Doors" 'The name of the folder you are linking images from
'Must match folder exactly
tag = "FS" 'Put file prefix here (ex. if link says "AB123", put "AB")
space = "No" 'Does the image file have a space in it? (ex. if file name is "AB (23)", put "yes")
FileType = ".JPG" 'make sure filetype extensions match
'Do not touch anything below this line
'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
If space = "Yes" Then
start = "%20("
Else: start = "("
End If
If report_type = "CL" Then
folder = "..\Images\" & folder
Else: folder = folder
End If
If report_type = "SR" Then
folder = "Images\" & folder
Else: folder = folder
End If
Set rng = ActiveDocument.Range
With rng.find
.MatchWildcards = True
Do While .Execute(findText:=tag, Forward:=False) = True
rng.MoveStartUntil (tag)
rng.Select
Selection.Extend
Selection.MoveRight Unit:=wdWord, Count:=1, Extend:=wdExtend
'I believe the issue is created here
Selection.start = Selection.start + Len(tag)
ActiveDocument.Range(Selection.start - Len(tag), Selection.start).Delete
fileName = Selection.Text
filePath = folder & "\" & tag & start & fileName & ")" & FileType
ActiveDocument.Hyperlinks.Add Anchor:=Selection.Range, address:= _
filePath, SubAddress:="", ScreenTip:="", TextToDisplay:= _
tag & Selection.Text
rng.Collapse wdCollapseStart
Loop
End With
End Sub
If I've explained this terribly wrong or not provided enough information, please let me know and I'll try to be more clear. And if there is a helpful resource that I'm simply too dense to have found, please let me know! thank you!
edit: if anyone knows how to only select words that start with the tag as opposed to words with the tag text in them, I'd be incredibly appreciative as well!
If you want to match a fixed tag followed by a variable number of digits:
Sub Tester()
    TagMatches ActiveDocument, "AB"
End Sub

Sub TagMatches(doc As Document, tag As String)
    Dim rng As Range
    Set rng = doc.Range
    With rng.Find
        .Text = tag & "[0-9]{1,}"   'tag followed by one or more digits
        .Forward = True
        .MatchWildcards = True
        Do While .Execute
            Debug.Print rng.Text    'rng now covers the matched text
        Loop
    End With
End Sub
See: http://word.mvps.org/faqs/general/usingwildcards.htm
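Building on the wildcard search above, the matches could be turned into hyperlinks roughly as follows. This is a sketch, not tested against the asker's documents: the sub name LinkTags and the folder parameter are hypothetical, and the target path layout (AB34 links to AB(34).JPG) is taken from the example in the question. The leading "<" in the pattern is Word's start-of-word wildcard, which is one way to match only words that begin with the tag.

```vba
Option Explicit

' Hypothetical helper: links every word of the form <tag><digits>
' to <folder>\<tag>(<digits>).JPG.
Sub LinkTags(ByVal doc As Document, ByVal tag As String, ByVal folder As String)
    Dim rng As Range
    Dim hl As Hyperlink
    Dim num As String

    Set rng = doc.Content
    With rng.Find
        .ClearFormatting
        .Text = "<" & tag & "[0-9]{1,}"   ' "<" = match at the start of a word
        .Forward = True
        .MatchWildcards = True
        .Wrap = wdFindStop                ' stop at the end of the document
        Do While .Execute
            ' rng now covers exactly the match, e.g. "AB34".
            num = Mid$(rng.Text, Len(tag) + 1)   ' digits after the tag
            Set hl = doc.Hyperlinks.Add(Anchor:=rng, _
                Address:=folder & "\" & tag & "(" & num & ").JPG", _
                TextToDisplay:=rng.Text)
            ' Continue searching just past the new link so that its
            ' display text is not matched a second time.
            rng.SetRange hl.Range.End, hl.Range.End
        Loop
    End With
End Sub
```

A call might look like LinkTags ActiveDocument, "AB", "..\Images\Doors". Because the range returned by Find covers only the tag and its digits, there is no need for Selection.MoveRight, which is where the extra ") " characters in the question's version come from.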

Word VBA code for saving forms

I have Word survey files, each containing forms filled by subjects. Until now I have manually exported the forms data by saving as txt and choosing the option "save form data as delimited text file".
I want to programmatically save as delimited text file all the .doc documents in a given directory. Alternatively, if this were to be too complicated, it would be sufficient to save one file at a time. The new txt files must have the same name as the original .doc files.
Thanks for your input, Jan Schejbal. I've reached a solution with this piece of code, so I'm sharing it for those who encounter the same problem. I received help from here
Sub Save_Forms_Data()
    Application.ScreenUpdating = False
    Dim strFolder As String, strFile As String, wdDoc As Document, strDocName As String
    strFolder = CurDir
    If strFolder = "" Then Exit Sub
    strFile = Dir(strFolder & "\*.doc", vbNormal)
    While strFile <> ""
        Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, AddToRecentFiles:=False, Visible:=False)
        With wdDoc
            strDocName = Left(.FullName, InStrRev(.FullName, ".")) & "txt"
            .SaveAs2 FileName:=strDocName, FileFormat:=wdFormatText, AddToRecentFiles:=False, _
                SaveFormsData:=True, Encoding:=1252, InsertLineBreaks:=False, LineEnding:=wdCRLF
            .Close SaveChanges:=False
        End With
        strFile = Dir()
    Wend
    Set wdDoc = Nothing
    Application.ScreenUpdating = True
    Application.Quit SaveChanges:=wdDoNotSaveChanges
End Sub
You can record a macro, which means you start the recording, do certain actions, then stop the recording, and VBA code for said actions is automatically generated. The code may not be very clean, but it should give you a good start to show you how the syntax looks and what commands you need for your actions. For certain things (e.g. dynamically specifying the file name), you will need to consult the documentation, but if you have any programming experience in any common language, this should not pose a significant problem once you have the "skeleton" provided by the macro recorder.
The more you want to automate, the more VBA you will need to learn. As VBA really isn't difficult, and it seems like you have a lot of repetitive work in front of you if you don't automate it, I'd suggest you learn it and Google what you need. This way, you will get your work done in a similar timeframe (or less, especially if this is not just a one-off thing), you will have a macro to do it next time, it will be less boring, and you will have learned a bit of VBA.