Section Header Range.Text Returning Empty String Instead of Actual Text - vba

I have a PDF file that I am trying to parse text out of. I opened the file using Microsoft Word, and text I need is in the header. On the first page, the header is justified left with a center tab that has the text (plain English name document title instead of the complicated reference name) that I am trying to grab. There is a right tab that has a page number control that I don't care about.
When I try to run the following:
Debug.Print ThisDocument.Sections(1).Headers(wdHeaderFooterPrimary).Exists
it gives me True, so I know the header exists. However, when I try to run
Debug.Print ThisDocument.Sections(1).Headers(wdHeaderFooterPrimary).Range.Text
it prints what looks like an empty string, yet wrapping the call in Len(…) returns 1, so the range is not truly empty: it holds a single character (presumably just the paragraph mark). How can I get the text out of the header?
Of note, I tried using some Adobe SDK functions which would have been easier, but I do not have the professional Acrobat suite so I do not have access to those tools. Hence the MS Word workaround.

Related

How to pull text from a shape in a Word document using VBA?

I'm trying to grab the text from inside a shape on a Word document.
Sub textgrab()
    MsgBox ActiveDocument.Shapes("Rectangle 85").TextFrame.TextRange.Text
End Sub
I get the error:
Run-time error '-2147024809 (80070057)':
The item with the specified name wasn't found.
In the Word document when I go to the top menu, hit the shape format tab, and in the arrange section, I select 'selection pane', I get a list of all the shapes, 'Rectangle 85' is there.
When I select it, it highlights the box I'm trying to grab the value from.
This is a PDF that I've opened in Word. I'm trying to automate a process that will open a PDF invoice, grab the dollar total, and pull it into Excel.
Solution for those that stumble upon this later. I used the following:
ActiveDocument.ActiveWindow.Panes(1).Pages(1).Rectangles.Item(i).Range
Word can only extract text from Drawing objects. These are inserted in the UI, for example, from Insert/Shapes. Shape.TextFrame.TextRange has no OCR capabilities, so can't be used to get text "embedded" in other kinds of graphic objects, such as an embedded PDF file or a JPG or anything similar.
When uncertain whether a particular Shape supports reading or writing text, right-click it in the UI and see if the menu selection Add Text or Edit Text is available.

Unable to extract and re-insert MS Word Content Control using VBA and InsertXML

This question is related to my other question: Range.InsertXML using Transform
In MS Word it is easy to insert a content control using VBA, for example:
ThisDocument.ContentControls.Add wdContentControlRichText, Selection.Range
I've recently started exploring more in the XML side of things, e.g.:
Debug.Print ThisDocument.Range.XML seems to (or actually does) produce the XML for a Word document. However, if I create a NEW, BLANK document and add a Content Control I am unable to extract and reinsert the Content Control (oCC).
My steps:
added 2 blank paragraphs to a new document
added oCC to the 2nd paragraph
selected the oCC paragraph
immediate window: thisdocument.Paragraphs(1).Range.InsertXML selection.Range.XML
At first glance it LOOKS like the Content Control was duplicated, BUT on closer inspection, it was deleted and only the formatted text remains (see image, top paragraph is actually just formatted text).
Thinking I could outsmart MS Word, I set the properties of the Content Control to '...cannot be deleted', but that didn't help.
I've also tried to insert into a separate document in case the issue had something to do with duplication of something that ought to have been unique.
In a nutshell:
To answer this question I need a way to insert a Content Control to a document using a combination of VBA and XML (or confirmation that what I am attempting is not possible).
Just realized I should use Selection.Range.WordOpenXML instead of Selection.Range.XML

Microsoft Word MacroButton - placeholder text visibility

I have a Microsoft Office 2013 Word template, in which I have some text-field elements, created by using Quick Parts -> Field -> MACROBUTTON noname [Type your text here].
If I fill only some of these fields (i.e. "[Name]", "[Address]") and I print or save as PDF, all the fields that I have not filled will display as [Insert your text here] in the printed paper or PDF. To be clear, the placeholder text must be manually removed (or replaced with the text you want).
I've read somewhere that you can create a macro which will not display the placeholder text in the PDF or printed version of the document if no text has been typed into that specific field (you leave it as it was). As this would be handy in cases where you don't fill in all the necessary fields, my question is:
Q: Can this be achieved using only MacroButton fields, and if not, what is needed to create text fields as described above that are not included in the printed or PDF-saved version of the document?
This cannot be achieved without using actual macro code. Right now your solution contains no macro code, the fields simply function as "targets" and when the user types on the field it is deleted. Where the user does not type, the prompt remains. You'd need code to delete these fields from the document.
Given your requirement, the code would have to fire in the DocumentBeforeSave and the DocumentBeforePrint events. These events require a class and supporting code in a standard module. The basic information on how to set these up is in the Word object model language reference: https://msdn.microsoft.com/en-us/library/office/ff821218.aspx
An alternative to MacroButton fields would be to use ContentControls. But here, again, code and the same events would be required to remove/hide placeholder text.

How to parse text from a plain text file and use the result to highlight a PDF file

Back in 2010, some guy claimed to be capable of doing this:
http://www.mobileread.com/forums/showthread.php?t=103847
"The Kindle stores its annotations in a Mobipocket (".mobi") file for each document and in one long text file named "My Clippings.txt." In this post I describe a system that synchronizes these annotations with PDF versions of the corresponding documents on a computer.
Overview
This system is embodied in an Applescript that parses the My Clippings file and controls the Skim PDF reader. The script first parses the clippings file. It then searches through the clippings and isolates any that come from documents on the kindle matching the filename of the currently open PDF file (the "pertinent clippings"). The script then iterates through each of the pertinent clippings, locating the matching text or location in the PDF document and applying highlights or adding notes where appropriate. The end result is an annotated, printable PDF document that matches the document on the kindle.
You can download the script here: http://dl.dropbox.com/u/2541109/KindleClippings.scpt. Before running the script, be sure to change the value of MyEmail to match your sending address and to verify that the Kindle mount point defined in MyClippingsFile is correct. You'll also need the free Skim PDF Reader.
To use it, send or copy a document file to your kindle. Remember, the kindle supports RTF, DOC, TXT and other common text formats and it will convert them into MobiPocket files internally for easier reading. Make some notes. Then take the same document that you just sent to the kindle and convert it to a PDF, e.g. by using the print to PDF feature in Mac OS X. Be sure to keep the filename the same. Open that same PDF in Skim and run the script. The highlights and notes should appear in the PDF.
If you're interested in how this works, read more on my blog here:
[no longer available]
Sadly, his script is no longer available, nor his blog.
Do you guys know if this is possible? I've been looking for this kind of functionality but can't find it anywhere.
This code, using python and PyMuPDF, works:
import fitz  # PyMuPDF

# the document to annotate
doc = fitz.open("text_to_highlight.pdf")

# the text to be marked
text_list = [
    "first piece of text",
    "second piece of text",
    "third piece of text",
]

for page in doc:
    for text in text_list:
        # quads=True returns quadrilaterals, which follow the text more closely than rectangles
        rl = page.search_for(text, quads=True)
        if rl:  # skip texts that do not occur on this page
            page.add_highlight_annot(rl)

# save to a new PDF
doc.save("text_annotated.pdf")
The original 'My Clippings.txt' has to be reduced to the highlighted passages first. An R package such as stringr could work, but I found it more useful to edit the text with multiple selections in Sublime Text. The goal is a list of highlights in the form of text_list above.
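As an alternative to editing by hand, the list could be built with a short script. The following is only a sketch: it assumes each entry consists of a title line, a metadata line, a blank line and the clipped text, with entries separated by a line of ten equals signs, and that highlight entries carry '- Your Highlight ' in the metadata line. The function name extract_highlights and the sample data are made up for illustration.

```python
SEPARATOR = "=========="  # assumed entry separator in 'My Clippings.txt'


def extract_highlights(clippings_text, book_title):
    """Return the highlighted passages recorded for one book.

    Assumes each entry is: title line, metadata line, blank line, text.
    """
    highlights = []
    for entry in clippings_text.split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().splitlines()]
        if len(lines) < 3:
            continue  # empty or incomplete entry
        title, meta = lines[0], lines[1]
        if book_title not in title:
            continue  # entry belongs to a different document
        if "- Your Highlight " not in meta:
            continue  # assumed marker; skips notes and bookmarks
        body = " ".join(line for line in lines[2:] if line)
        if body:
            highlights.append(body)
    return highlights


# Hypothetical sample in the assumed format
sample = (
    "Book Title (Some Author)\n"
    "- Your Highlight on Page 15 | Added on Monday, January 5, 2015\n"
    "\n"
    "mammals are animals\n"
    "==========\n"
    "Book Title (Some Author)\n"
    "- Your Note on Page 15 | Added on Monday, January 5, 2015\n"
    "\n"
    "check this later\n"
    "==========\n"
    "Another Book (Other Author)\n"
    "- Your Highlight on Page 3 | Added on Monday, January 5, 2015\n"
    "\n"
    "unrelated text\n"
    "==========\n"
)

text_list = extract_highlights(sample, "Book Title")
print(text_list)
```

With a real file you would read the contents first (for example with open(..., encoding="utf-8")) and pass the resulting string in place of sample.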
I am trying to do this using Python + a Windows macro creator (I'm a Win 7 user). You can use this approach to save the file as RTF, DOCX, PDF, etc. So far, it's been reasonably effective. Do note 2 things first:
1- the 'My Clippings' file only saves the text and the page; it does not save the location on the page (e.g., if you highlighted "mammals are animals" on page 15, it will give you this line and the page number, but if there is more than one "mammals are animals" on page 15, it's impossible to know which one you highlighted). This is especially bad when you've highlighted a generic word, like "animals" or "the". And if you made comments by pressing on a word, this word is the only information you'll get about what on that page the comment refers to (e.g., I pressed on "animals", the menu popped up, and I selected 'Comment'. If "animals" appears 20 times on page 15, I cannot know which of them my comment refers to).
2- The only way to retrieve the location on the page would be to analyze the *.pds and *.pdt files, inside the *.sdr folder in Kindle's drive ('Documents'). I can make no sense of these files.
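One way to cope with the first point is to flag ambiguous highlights before annotating, so they can be reviewed by hand. This is a minimal sketch, assuming you already have the text of the page as a string (how you obtain it depends on your PDF tool); the function names and the sample page are hypothetical.

```python
def count_occurrences(page_text, snippet):
    """Count non-overlapping, case-insensitive occurrences of snippet."""
    if not snippet:
        return 0
    return page_text.lower().count(snippet.lower())


def classify(page_text, snippet):
    """Label a highlight as 'missing', 'unique' or 'ambiguous' on a page."""
    n = count_occurrences(page_text, snippet)
    if n == 0:
        return "missing"
    if n == 1:
        return "unique"
    return "ambiguous"


# Hypothetical page text for illustration
page_15 = (
    "Mammals are animals. Some animals fly. "
    "Mammals are animals that nurse their young."
)

for snippet in ["mammals are animals", "nurse their young", "reptiles"]:
    print(snippet, "->", classify(page_15, snippet))
```

Only the 'unique' highlights can be placed safely; the 'ambiguous' ones still need a human decision, because the clippings file does not say which occurrence was meant.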
In Python, you can run a short script to extract the information you want from "My Clippings". Then you can use a macro creator to automate the process of copying the text and annotating the PDF with it (using Adobe Acrobat, for example), and then saving the PDF file.
Exemplifying with Adobe Acrobat:
Say I want to save all my highlights to the PDF file. First, I'll create a *.txt file on Python and run a script to copy all the strings related to the highlights to this new txt file (i.e., the highlighted text & the page number). Here's an example of such code (but first, copy and paste the "My Clippings.txt" file to the IDE start folder, e.g.: C:\Python27):
#for python 2.7.6
with open('My Clippings.txt','r') as rf:
    with open('My Clippings Output.txt','w') as wf:
        access = 0
        bookTitle = 'Book Title'  #put the book file's name as it's written in "My Clippings.txt"
        for x in rf:
            if access == 1:
                wf.write(x)
            if bookTitle in x:
                access = 1
            #for highlights only, instead of all annotations, include this if statement:
            if (' | Added on ' in x) and ('- Your Note ' in x) or ('- Your Bookmark ' in x):
                access = 0
            if x == '==========\n':
                access = 0
Then I'll create a macro to copy the page number in the "My Clippings Output.txt" file (it's inside the same folder you put the "My Clippings.txt" file), paste in Acrobat "page window", find (ctrl+f) the string in the page, then press "highlight". Done!
There's a catch in Acrobat though, the search/find function has a limit of ~28 chars, so your highlighted text can't be longer than that. I still don't know how to circumvent this limitation... I raised this problem here https://superuser.com/questions/884221/how-to-search-and-highlight-long-passages-in-a-pdf-file . As a bypass to the 28 chars limit on Acrobat, you can program the macro to copy using "shift"+"right arrow 28 times", and then use "cut" instead of "copy".
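The same truncation can be done in the script that prepares the output file, so the macro always receives a string short enough for the search box. A minimal sketch, assuming the limit really is 28 characters (the figure above is approximate); search_key and SEARCH_LIMIT are hypothetical names.

```python
SEARCH_LIMIT = 28  # approximate limit reported above; adjust if yours differs


def search_key(highlight, limit=SEARCH_LIMIT):
    """Return the leading part of a highlight that fits in the search box."""
    # Collapse line breaks and repeated spaces so the key matches running text
    collapsed = " ".join(highlight.split())
    return collapsed[:limit]


long_highlight = "the quick brown fox\njumps over the lazy dog"
print(search_key(long_highlight))
print(search_key("mammals are animals"))
```

The macro would then search for the key to find the start of the passage; extending the selection to the full highlight still has to be done separately.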
There are many free-to-use and libre macro creators out there; just google and choose the one you like best. For Windows, my favorite is Pulover's Macro Creator. If you have any questions about the process, you can comment here or PM me. I'd prefer that you comment here, so that I can improve the answer.

Insert image from URL bookmark Microsoft word

I have an image URL stored in my SQL database.
I create a bookmark for that column in the word document (this works fine).
Now I want to use the image URL that is passed from the database to insert an image.
I have tried hyperlink (does not work and does not display image).
I have tried Quick Parts - IncludePicture (does not work).
I have been Googling and have not found anything that works.
OK, let me simplify this.
I want to insert an image using a URL.
You can do this in a lot of different ways, I know.
For instance, using Quick Parts and then selecting IncludePicture, you would paste the URL of the picture and BAM, the image is inserted.
Now I want to do exactly that with one exception: the URL is in a Microsoft Word bookmark that I fill from my database.
For some reason this does not want to work. I have also checked the bookmark data and it is correct and yes it is a valid URL because if I copy and paste it from the database in the way I described above it works.
So is there any other way to do this?
To be honest, I still don't know exactly where your problem is. I assume that you have the knowledge and code to get both the bookmark name and the URL from your database using VBA. If so, fairly simple code will let you load a picture from the web into a bookmark in your Word document.
Below is the code I have tested with partial success. It works fine with an ordinary picture URL, but it does not work with the URL of an active Google map. I have no idea what you mean by 'static google map' (in the comment); you didn't provide an example, so you will need to run your own test.
Before you run this for test be sure you have two bookmarks in your active document: bookmark_logo and bookmark_poland. Hope this will help a bit.
Sub Insert_picture_To_Bookmark()
    Dim mapURL As String
    Dim soLOGO As String

    soLOGO = "http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png"
    ActiveDocument.Bookmarks("bookmark_logo"). _
        Range.InlineShapes.AddPicture _
        soLOGO, True, True

    mapURL = "https://maps.google.pl/maps?q=poland&hl=pl&sll=50.046766,20.004863&sspn=0.22047,0.617294&t=h&hnear=Polska&z=6"
    ActiveDocument.Bookmarks("bookmark_poland"). _
        Range.InlineShapes.AddPicture _
        mapURL, True, True
End Sub