How to pull text from a shape in a Word document using VBA? - vba

I'm trying to grab the text from inside a shape on a Word document.
Sub textgrab()
MsgBox ActiveDocument.Shapes("Rectangle 85").TextFrame.TextRange.Text
End Sub
I get the error:
Run-time error '-2147024809 (80070057)':
The item with the specified name wasn't found.
In the Word document when I go to the top menu, hit the shape format tab, and in the arrange section, I select 'selection pane', I get a list of all the shapes, 'Rectangle 85' is there.
When I select it, it highlights the box i'm trying to grab the value from.
This is a pdf that I've opened in Word. I'm trying to automate a process that will open a pdf invoice, grab the dollar total, and pull it into Excel.

Solution for those that stumble upon this later. I used the following:
ActiveDocument.ActiveWindow.Panes(1).Pages(1).Rectangles.Item(i).Range

Word can only extract text from Drawing objects. These are inserted in the UI, for example, from Insert/Shapes. Shape.TextFrame.TextRange has no OCR capabilities, so can't be used to get text "embedded" in other kinds of graphic objects, such as an embedded PDF file or a JPG or anything similar.
When uncertain whether a particular Shape supports reading or writing text, right-click it in the UI and see if the menu selection Add Text or Edit Text is available.

Related

Call a subprocedure with an input argument inside a shape

My goal is to call a macro (with input of specific information from the shape) inside a shape in the MS VISIO application.
The macro is to open a PDF datasheet of the specific shapes (the element in my block diagram).
I have an idea to trigger the macro from the specific shape, which is inserting action section and using runaddon function to run the subprocedure (using shell to open a specific PDF file).
Question:
How do I pass an argument (type of string) of the shape (in my case it is a PDF file name for this shape) to the subprocedure? If that can be achieved, I can open each shape's corresponding datasheet.
How do I save the input argument value (type string) inside each shape?
And how do I call them inside VBA subprocedure?
Visio has hyperlinks. You can just add a hyperlink to your shape pointing to your PDF, and the menu item action as well as click behavior will be added automatically.
So, click Ctrl+K to add a hyperlink, or use "insert" menu.
You can also add hyperlink programmatically, or by linking to a data source (like excel). Also you can build URL of the link as a formula, out of shape data, like shape text.
If you still want to do it with vba macro, you can try CALLTHIS function: https://learn.microsoft.com/en-us/previous-versions/office/developer/office-2003/aa212649(v=office.11)?redirectedfrom=MSDN#example-1

MS Word Ignores Content Control Inside a Rich Text Box

Is there a reason why my MS Word VBA macro is ignoring a dropdown list I placed inside a shape (a rich text box)? I've tried referring to it by tag, name, number, etc. I even had the macro tell me the count of content controls:
MsgBox(ActiveDocument.ContentControls.Count)
I get 0.
Nothing works. If I take it out of the shape, it works fine. MS Word gives me a count of 1 item. But for some reason MS Word won't acknowledge it inside the shape. Any help on how to do this?
Edited as my previous post was completely wrong.
Each textbox in the main text story is a Shape which you can access using an index number. A shape has various properties but text etc. is in its Textframe, if it has one. But in that case the Range you need is not called Range but TextRange. So, e.g. the first contentControl in Shape 2 is
ActiveDocument.Shapes(2).TextFrame.TextRange.ContentControls(1)
You will probably need to iterate through your shapes and you may need to verify that a given shape is a textbox and/or that it has a TextFrame.
If your text box is in another Story such as a header or footer, you will probably need to identify the relevant StoryRange.

Microsoft Word MacroButton - placeholder text visibility

I have a Microsoft Office 2013 Word template, in which I have some text-field elements, created by using Quick Parts -> Field -> MACROBUTTON noname [Type your text here].
If I fill only some of these fields (i.e. "[Name]", "[Address]") and I print or save as PDF, all the fields that I have not filled will display as [Insert your text here] in the printed paper or PDF. To be clear, the placeholder text must be manually removed (or replaced with the text you want).
I've readed somewhere, that you can create a macro, which will not display the placeholder text in the PFD- or printed version of the document, if there is no text written manually to that specific field (you leave it as it was). As this would be handy in cases, where you don't fill all the neccessery fields, my question is:
Q: Can this be achieved only by using Macro Button, and if not, what is needed to create text fields as described below that are not included in the printed or PDF saved version of the document?
This cannot be achieved without using actual macro code. Right now your solution contains no macro code, the fields simply function as "targets" and when the user types on the field it is deleted. Where the user does not type, the prompt remains. You'd need code to delete these fields from the document.
Given your requirement, the code would have to fire in the DocumentBeforeSave and the DocumentBeforePrint events. These events require a class and supporting code in a standard module. The basic information on how to set these up is in the Word object model language reference: https://msdn.microsoft.com/en-us/library/office/ff821218.aspx
An alternative to MacroButton fields would be to use ContentControls. But here, again, code and the same events would be required to remove/hide placeholder text.

Convert text to image in Microsoft Word

I have a large book written in Microsoft Word and want to create a macro that will find all text using a predefined style and convert that text to an inline image. This text will be in Arabic and generally no longer than 4-5 lines. Is this possible?
UPDATE: Here's an example to show what I'm referring to:
I want to replace that entire line in Arabic with an image (as if I cropped this attached image to only include the Arabic and then replaced the line in Arabic with the image).
The reason I want a macro or script to do this is because there are hundreds of such lines and updating them one by one is cumbersome plus that will make modifications difficult later on.
UPDATE2: I found an interesting option here: http://windowssecrets.com/forums/showthread.php/31344-Convert-Text-to-an-Image-of-Text-in-VBA-(Office-2000-Sr1a)
It looks like you can cut a piece of text and then "Paste Special" as an image. So if there's a way to automate that that might work.
This is not an answer although I hope it will grow into a community answer. At the moment it is an exploration of what is required to solve the problem.
I know from the discussion when this question was posted on Super User that Abdullah wishes to publish his book on Kindle. So the question is really about how to get a document in English and Arabic ready for publication as an e-Book.
The Kindle does not support Arabic. The number of languages it does support is slowly increasing but there is no evidence I can find that Amazon has plans to add Arabic in the foreseeable future.
The format behind an Amazon e-Book is a cut down version of HTML. If a Word document containing Arabic letters is exported to HTML, the Arabic letters are included as character entities; for example: “ﭐ &#amp;64337; ﭒ ﭓ”. Importing the original Word or the HTML version to Kindle, results in the leading bits being discarded so these characters are displayed as P, Q, R and S instead of “ﭐ ﭑ ﭒ ﭓ (Alef Wasla isolated form, Alef Wasla final form, Beeh Wasla isolated form and Beeh Wasla final form).
I have tried Abdullah’s idea of saving some Arabic letters in a PNG file and creating an HTML file containing <p> … </p> <img src= “Arabic.png” > <p> … </p>. The appearance of this file on my Kindle 2 is perfectly acceptable so this has the potential to be a solution. The question is: how can the necessary conversions be performed?
We need to extract each Arabic string from either the Word document or its HTML equivalent and import it into a program that can convert them to PNG files.
The only way that I know of automating this would be to copy each string to a slide within PowerPoint. With PowerPoint’s SaveAs option it is possible to save each slide as a separate PNG file. The slides are named: SLIDE1.PNG, SLIDE2.PNG, SLIDE3.PNG and so on in sequence which would allow a macro to relate the results to the original strings. It would then be possible to replace the Arabic strings in the HTML file with the image elements. None of this would be too difficult to automate but there is a problem with the slides all being the size of the PowerPoint page. The page could be made smallish but what we need is for each slide to be cropped to just bigger than that slide’s text. I cannot think of any way of automating this cropping.
Does anyone have a better approach than converting each Arabic phrase to a PNG file?
I have been looking for PNG editors with some sort of command line interface but can find nothing that would be easier than using PowerPoint. Does anyone know of an alternative to PowerPoint?
Does anyone have any suggestions for automating the cropping of each image? When a string is placed in a PowerPoint slide it is possible to set its width to, say, 6.5cm (which looks good on my Kindle) and get the height determined by PowerPoint. This could be saved for later use if anyone knows how to use it.
Implementing solution
Pending any suggestions for improving the approach described above, the following outlines how I would implement it.
I would not attempt to process the Word document. I would save it as a Web Page, Filtered HTML file, which is a required step on the way to creating a Kindle eBook, and process that.
Within the HTML file created from my test document, the Arabic phrase comes out as:
<p class="MsoNormal"></p>
<p class="MsoNormal" align="center" style="text-align:center"><span dir="RTL"
style="font-size:24.0pt;font-family:Arial">
&#64336;&#64337;&#64338;&#64339;&#64340;&#64341;
&#64342;&#64343;&#65153;&#65154;&#65276;&#65275;
&#65274;&#65273;&#65246;&#65226;&#65227;&#65228;
</span><span style="font-size:24.0pt"></span></p>
<p class="MsoNormal"></p>
<p class="MsoNormal"></p>
I assume Abdullah's document will result in something similar. Note 1: the above is a random collection of Arabic letters. Note 2: they are held left-to-right in reading sequence even though, when displayed or printed, they are read right-to-left.
The whole of this block will have to be replaced with something like:
<br><imc src="xxxx.png"><br>
where the file xxxx.png holds an image of the Arabic text.
The file names, such as xxxx.png, could be systematic (A001.png, A002.png, ...) but I would have thought that transliterating the first ten or twenty characters of the phrase from the Arabic to English alphabets and using the result, with a numeric suffix, as the file name would be more convenient.
I would hold the records necessary to manage the process in an Excel worksheet. I would place the VBA code in the same workbook.
The steps in the conversion process that I envisage are:
VBA macro to extract Arabic strings from latest HTML file and add new strings to the Excel worksheet. (More about the Excel worksheet later.)
VBA macro to create PowerPoint file, with one slide per new string, and use SaveAs in PNG format to create one PNG file per slide before discarding the PowerPoint file.
Human to crop each PNG file. (There appears to be no way of automating the cropping so this task will be minimised by use of data in the Excel worksheet.)
VBA macro to rename each slide from SLIDEnnn.PNG to its permanent name and to record the permanent name in the Excel worksheet.
VBA macro to update the latest HTML file by replacing the block containing the Arabic phrase with the appropriate HTML IMG element.
The Excel worksheet needs two columns: Arabic phrase and PNG file name. If there is any risk of the worksheet being sorted between steps 2 and 4, we may need a sequence number as well.
Macro 1 will extract an Arabic phrase from the HTML file, look down the list in the worksheet for this phrase and add the phrase at the bottom if it is not already present.
Macro 2 will look for phrases in the worksheet that do not have a PNG file name. These new phrases are the ones to be written to the PowerPoint presentation. That is, a phrase only goes into this process once.
Task 3, cropping each PNG file, will be a pain. All I can say is that it will only be once per phrase.
Macro 4 will assume that the SLIDE001.PNG, SLIDE002.PNG, … are in the sequence of phrases without PNG files in the worksheet. If this might not be true (because the worksheet has been sorted) we will either need a sequence number or to retain the PowerPoint file. The macro will assign a unique name to each new phrase, record this name in the worksheet and rename the PNG file.
Macro 5 creates a new copy of the latest HTML file using the contents of the worksheet to determine which phrase to replace with which PNG file.
This process is not ideal but it will achieve the desired result and has no obvious complications. Any suggestions for improving it?
Before you begin these instructions, press record in the Microsoft Word macro editor, so you can see what the VBA code is.
I'm wondering if this will be easier if you convert the docx file to .rtf (rich text format) and replace that line with an image? Go to File > Save As.. > name it "old.rtf", then replace the line with an image and Save As.. again and name it "new.rtf" and then download Beyond Compare or your favorite diff program to see what happened. It should be easy to do this pro-grammatically if you choose to. I think working in text would be easier than Microsoft's binary format unless you can find a good library to modify their doc or docx formats.
Sub CopySelPasteAsPicture()
' Take a picture of a selection and paste it at the
' document end
With Selection
.CopyAsPicture
End With
ActiveDocument.Content.Select
With Selection
.Collapse Direction:=wdCollapseEnd
.TypeParagraph
.TypeParagraph
.PasteSpecial DataType:=wdPasteMetafilePicture
End With
End Sub

Using VBA in MS Word 2007 to define page elements?

I'd like to be able to create a page element which I can feed text and it will form itself into the preferred layout. For instance:
{MACRO DocumentIntro("Introduction to Business Studies", "FP015", "Teachers' Guide")}
with that as a field, the output should be a line, the first two strings a certain size and font, centred, another line and then the third string fonted, sized and centred.
I know that's sort of TeX-like and perhaps beyond the scope of VBA, but if anyone's got any idea how it might be possible, please tell!
EDIT:
Ok, if I put the required information into Keyword, as part of the document properties, with some kind of unique separator, then that gets that info in, and the info will be unique to each document. Next one puts a bookmark where the stuff is going to be displayed. Then one creates an AutoOpen macro that goes to that bookmark, pulls the relevants out of the keywords, and forms the text appropriately into the bookmark's .Selection.
Is that feasible?
You're certainly on the right track here for a coding solution. However, there is a simpler way with no code - this is the type of scenario that Content Controls in Word 2007 were built for and with Fields/Properties, you can bind to content controls (CC). These CC can hold styles (like centered, bold, etc.). No VBA required.
The very easiest thing to do is to pick 3 built-in document properties that you will always want these to be. For example, "Title" could be your first string, "Subject" your second string and "Keywords" your third. Then, just go to the Insert ribbon, Quick Parts, Document Properties and insert, place and format those how you like. Then go to Word's start button (the orb thingy) and then under Prepare choose Properties. Here you can type, for example "Introduction to Business Studies", into the Title box and then just deselect it somehow (like click in another box). The Content Control for Title will be filled in automatically with your text.
If you want to use this for multiple files, just create this file as a .dotx (after CC insertion/placement/formatting and before updating the Document Properties' text). Then every time all you'll have to do is set these three properties with each new file.
Well, yes, it did turn out to be feasible.
Sub autoopen()
Dim sKeywords As String
sKeywords = ActiveDocument.BuiltInDocumentProperties(4)
ActiveDocument.Bookmarks("foo").Select
Selection.Text = sKeywords
End Sub
Okay, I have some filling out to do, but at least the guts of it are there.