Generate a url for a specific text or area in a pdf (like a bookmark) - pdf

So that the link can be used in some external applications, and users in that application can click the link and navigate to that specific location (text or area) in the pdf.
Is this possible?

There are several different URL # (fragment) constructs that were introduced by Adobe for use with their web browser PDf viewer plug-in. The current accessible list is at https://pdfobject.com/pdf/pdf_open_parameters_acro8.pdf
Many newer competing PDF enabled browsers may use similar, but there is no guarantee one URL.pdf#Fits all. Most will respect #page=number&zoom view or fit but it is very variable, thus you need to check each feature across browsers.
Important to switch off any PDF remember my view settings
For your previous request using Bookmarks can be unreliable (as can the comments ID) unless you are using Acrobat and can check the bookmark function. By far the more reliable may be the use of zoom at a scroll location so here using MS Edge:-
"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" file:///C:/Users/.../Desktop/CEI-PID-Sample-Rev4annotsC.pdf#zoom=300,200,200
I support SumatraPDF however its browser plug-in whilst still functional is way past EOL thus unsupported, it had a limited support for those fragments even though the CLI was recently extended for some Acrobat /Action commands.
However related to your desire the one single portable EXE can jump to well formed text blocks. Here I call "V 4508" in a fairly similar fashion
"C:\Program Files\SumatraPDF\SumatraPDF.exe" "C:\Users\WDAGUtilityAccount\Desktop\SandBox\apps\Graphics\CEI-PID-Sample-Rev4annotsC.pdf" -zoom 300 -search "V 4508"
And if I press A then the text will be highlighted for comments

Related

Download pdf - accessibility for screen readers

I'm curious how to make an accessible button for screen readers which downloads PDF.
I know that there is an option using href and pass there an URL to the pdf file, and even a download attribute inside an anchor to open a download window.
But it's not a good way for a screen reader. The screen reader reads it as a link but actually, this is not a link because it triggers downloading a pdf file rather than redirect to another page. So this can be confusing for people with vision disorders who rely on their screen readers.
Is it a good accessibility way to create such a button? Or relying on <a href='path-to-pdf'>...</a> is completely enough and not confusing for people with disabilities ?
General answer and basics of file download
Both a link and a button are perfectly fine, it doesn't make much difference.
IN any case, it's very important to explicitly indicate that the link or button is going to download a file rather than open a page, to avoid surprise.
The simplest and most reliable is just to write it textually, i.e. "View the report (PDF)".
You may also put a PDF icon next to the link to indicate it, but make sure to use a real image, i.e. <img alt="PDF" /> and not CSS stuff, since the later may not be rendered to screen readers and/or don't give you the opportunity to set alt text (which is very important).
A good practice is also to indicate the file size if its size is big (more than a few megabytes), so that users having a slow or limited connection won't get stuck or burn their mobile data subscription needlessly.
It's also good to indicate the number of pages if it's more than just a few, so that people can have an idea on how big it is, and if they really can take the required time to read it.
Example: "View the report (PDF, 44 pages, 17 MB)"
Note that similarly, that's a good practice to indicate the duration of a podcast or video beforehand.
Additional considerations with PDF
First of all, you should make sure that your PDF is really accessible. Most aren't by default, sadly.
You should easily find resources on how to proceed to make a PDF accessible if you don't know.
Secondly, for an accessible PDF files to be effectively read accessibly, it has to be opened inside a real PDF reading program which supports tagged PDFs, like Adobe Reader.
The problem is that nowadays, most browsers have an integrated PDF viewer. These viewers usually don't support tagged PDFs, and so, even if you make an accessible PDF, it won't be accessible to the user if it is open inside that integrated browser viewer.
So you must make sure that your link or button triggers an effective download or opening in a true PDF reading program, rather than opening in an integrated viewer of the browser.
Several possibilities that may or may not work depending on OS/browser to bypass the integrated viewer. They have to be tested to make sure they work:
Send a header Content-Disposition: attachment; filename="something.pdf"
Send a Content-Type different than "application/pdf" or "text/pdf", e.g. "application/octet-stream" to fake out basic type detection
Make the link don't ends with .pdf
Use the download attribute of <a>
The most reliable are response headers. Most browsers don't rely only on file extension alone to decide what to do.
Either a link or a button is fine. The most important thing is that the user is informed about what the element does - i.e. it downloads/opens a PDF file. So, this should be reflected in the element's label, whether that is a visible text label or an icon that uses alt text or aria-label to explicitly describe the element's purpose.
I agree with Quentinc's suggestion to also inform the user upfront about the number of pages and size of the document - that's a nice touch that I don't see very often!
PDF accessibility is a whole other topic, but again as QuentinC points out, there's not much good in allowing a user to download or view a PDF that isn't accessible, so it's a good idea to ensure the PDF has been tested against JAWS/NVDA/VoiceOver/TalkBack to ensure it is readable.

Can you embed a separate pdf into Indesign and open it after exporting to PDF?

I would like to ask the following if possible. We have a client that wants a separate pdf document, embedded in a main pdf document and opens when you click it. Like the function in MS Word where you can attach another Word document inside a Word document (Word-ception, lol) and you can still open it.
I've tried it in Acrobat Pro with the Attachment and Link tools. Another option was to put the link document in an ftp server for accessibility. but our client really wants this functionality. Is this possible in Indesign?
Thank you!
Using Word as your example vehicle there are several ways to link 2 documents.
One is an appendix to the other, in PDF terms is a merge or binding but its one flowing document with separate sequential sections/chapters.
Another way is to link to an external file, in PDF terms a hyperlink to a relative second file, which can be locally folder relative or a web absolute reference. You have tried that.
In Word we can add objects internally with icons, in PDF that can be an annotation comment attachment to save externally and action accordingly. You also seem to discount that approach.
Finally PDF offers an Adobe Specific Structure where multiple PDFs attachments can be imbedded in an overall PDF wrapper. These are called Portfolios and not! to be confused with their portfolio service
They are unpopular since in a browser without Adobe Reader they should only offer the cover page.
Whilst in securer offline readers the files may well be shown as attachments that you need to save or independently open to view them.
Only some non Acrobat viewers may view them as a collection. And in the past that required runing insecure SWFlash, But I understand that has changed ?
Here is how the 3 internal PDF files seen above were shown in older Acrobat 9.
Possibly the best experience is using Foxit Reader

Are there any Linux scriptable pdf readers?

I would like to make a call to a local server running a REST interface from within a pdf reader, passing the selected text as argument. The first option that came to mind was to write a simple bash script with a curl call inside, and use a script from within the pdf app to execute it.
Are there any scriptable pdf readers that would allow that? It seems that none of the most common ones (e.g. Okular, Evince, gv, xpdf) would make it possible.
Added after #DanielH comment:
I'm not asking for a method to launch arbitrary code when the a pdf file is opened. Rather, I am asking if the user can choose to launch an external app from within the pdf reader (or script).
Okular (KDE reader) has a limited form of this functionality: using KDE web shortcuts, the user can select a portion of text and then launch a browser with a predefined url that contain the text in the URL.
For instance, when an Okular user selects a word and then chooses the "Google" shortcut, Okular will launch a browser and put the following pseudo-URI in the location bar:
https://www.google.com/search?q=\{#}&ie=UTF-8
with {#} replaced by the selected text. KDE Web shortcuts are user-editable and can result in rather complex queries.
This functionality is very close to what I am after, except that the only http request that can be managed this way (as far as I know), is GET. Instead, I would need to start a POST http request, which as far as I know cannot be done from the location bar. Hence, my question about using using bash+curl.
I should have known...anything related to text on Unix? The answer is always emacs.
In particular:
the pdf-tools package can be used to read pdf files in an emacs buffer. It is much more efficient than PdfView and gives full access to annotations and all the usual emacs editing command (selecting regions, etc).
The standard emacs command shell-command-on-region can be used to launch a command with the selected region passed to it as standard input.
The command (as always in emacs) can be called interactively or from an emacs function.
All it's needed is to write a simple shell script that wraps up the needed command and passes on the input it receives.
In my case, I needed to call curl with a few parameters and a JSON string encapsulating the POST request. Iencountered a minor problem wince emacs passes the selected text (region) as standard input and not as argument, but this SX question (and #andy answer especially) helped me understand how to read piped standard input into a variable.

PDF downloading instead of opening in new tab

This is not a back-end programming question. I can only modify the markup or script (or the document itself). The reason I'm asking here is because all my searches for appropriate terms inevitably lead to questions and solutions about programming this functionality. I'm not trying to force it via progrmaming; I have to find out why this PDF is behaving differently.
So:
I have a bunch of links to PDFs on a page. Most of them open in new tabs, but one of them, the most recent, starts to open in a tab, but then the tab closes and the PDF gets downloaded as a file instead. All markup is consistent - there's nothing differnt about the odd-man-out except the actual URL.
You can see this here:
http://calwater.mwnewsroom.com/Investor-Relations/Financial-Reports/Annual-Reports
All annual reports up to 2012 open in a new tab, but 2013 downloads instead.
This leads me to believe that there is some meta-data property of the PDF itself that tells it how to open, and that, in this case, the 2013 PDF was created using different settings.
Apparently, the PDF was saved out to PDF from InDesign.
Does anyone have any insight?
Problem solved. There was simply an error in the string (like an extra period) that references the attachment such that it couldn't tell it was a PDF. Fixing the reference fixed the problem.

PDF Outline Text - Automation of Acrobat Sequences

I have built an application that automates the filling out of form fields inside a pdf. It then takes various assets and combines them together to generate a "print ready" product. All of this is accomplished using the magic of iTextSharp. When form fields are populated, they are then flattened to text. The problem is that even with the fonts embedded they aren't really attached to the form fields in a meaningful way (like straight text elements are) and the printers are complaining that the pdf is generating licensing errors due to this. I researched this a bit and it just seems to be the nature of how form fields are.
The artists we are working with requested that we research a way to "outline" the text that is created from flattening the form fields. I found that running the PDF Optimizer with a custom preset allows for Text Outlining in Acrobat, and even better I can generate an Acrobat Sequence that runs this command on the pdf. The problem is that Sequences can not be automated, at all.
I found a plug-in called AutoBatch that allows for the execution of Sequences on the command line through a batch file. The downside is that this would require installing Acrobat Pro and the Plug-in on the server this application will be running on. Further it seems like an overkill solution just to outline the text in the pdf. For all I know at this point iTextSharp may allow me to do this programmatic, but searching for such a thing on google returns little results and nothing relevant.
So the question: Is there a better way to outline text in a pdf than the current solution I have implemented or am I kind of stuck?
TLDR; PDF is generated w/ non-standard fonts. I need to "outline" this text to send it to the printer. Currently using AutoBatch Acrobat Plug-In to execute Acrobat Sequence from the Command Line. Seems excessive, wondering if anyone knows a better way to automate font outlining.
I am also in a printing environment and have used forms for "Box Covers" plenty of times to shorten the code used to produce box covers.
I simple us "pdfStamper.FormFlattening = true;" and the printers (Xerox DP180 and DC5000) has no problems in using the PDF.
The moment I leave out FormFlattening the printer gives a lot of errors regarding the PDF.
If you are using FormFlattening then check if the printer has the font locally installed in order for it to reference the font from the print engine instead of the PDF resources.