Are there any Linux scriptable pdf readers? - pdf-reader

I would like to make a call to a local server running a REST interface from within a pdf reader, passing the selected text as argument. The first option that came to mind was to write a simple bash script with a curl call inside, and use a script from within the pdf app to execute it.
Are there any scriptable pdf readers that would allow that? It seems that none of the most common ones (e.g. Okular, Evince, gv, xpdf) would make it possible.
Added after @DanielH's comment:
I'm not asking for a method to launch arbitrary code when a pdf file is opened. Rather, I am asking whether the user can choose to launch an external app (or script) from within the pdf reader.
Okular (KDE reader) has a limited form of this functionality: using KDE web shortcuts, the user can select a portion of text and then launch a browser with a predefined URL that contains the selected text.
For instance, when an Okular user selects a word and then chooses the "Google" shortcut, Okular will launch a browser and put the following pseudo-URI in the location bar:
https://www.google.com/search?q=\{@}&ie=UTF-8
with \{@} replaced by the selected text. KDE Web shortcuts are user-editable and can result in rather complex queries.
This functionality is very close to what I am after, except that the only HTTP request that can be managed this way (as far as I know) is GET. Instead, I need to send a POST request, which as far as I know cannot be done from the location bar. Hence my question about using bash+curl.

I should have known...anything related to text on Unix? The answer is always emacs.
In particular:
The pdf-tools package can be used to read pdf files in an emacs buffer. It is much more efficient than PdfView and gives full access to annotations and all the usual emacs editing commands (selecting regions, etc).
The standard emacs command shell-command-on-region can be used to launch a command with the selected region passed to it as standard input.
The command (as always in emacs) can be called interactively or from an emacs function.
All that's needed is to write a simple shell script that wraps the required command and passes on the input it receives.
In my case, I needed to call curl with a few parameters and a JSON string encapsulating the POST request. I encountered a minor problem since emacs passes the selected text (region) as standard input and not as an argument, but this SX question (and @andy's answer especially) helped me understand how to read piped standard input into a variable.
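A minimal sketch of such a wrapper, written in Python rather than bash+curl. It reads the region from standard input, which is how shell-command-on-region delivers it, and wraps it in a JSON POST body. The endpoint URL and the "text" field name are hypothetical placeholders, not details of the actual setup described above.

```python
import json
import sys
import urllib.request

# Hypothetical endpoint of the local REST server; adjust to your setup.
SERVER_URL = "http://localhost:8080/api/selection"


def build_request(selected_text, url=SERVER_URL):
    """Wrap the selected text in a JSON body and return a POST request object."""
    # The "text" field name is an assumption; use whatever the server expects.
    body = json.dumps({"text": selected_text.strip()}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_selection(selected_text, url=SERVER_URL):
    """Send the selection to the server and return the response body as text."""
    with urllib.request.urlopen(build_request(selected_text, url)) as response:
        return response.read().decode("utf-8")


# When saved as an executable script and used as the shell command, emacs
# pipes the region to standard input, so the last line of the script would be:
#     print(post_selection(sys.stdin.read()))
```

Reading everything from standard input avoids having to quote the selection as a command-line argument, which matters when the region contains quotes or newlines.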

Related

Generate a url for a specific text or area in a pdf (like a bookmark)

So that the link can be used in some external applications, and users in that application can click the link and navigate to that specific location (text or area) in the pdf.
Is this possible?
There are several different URL # (fragment) constructs that were introduced by Adobe for use with their web browser PDF viewer plug-in. The current accessible list is at https://pdfobject.com/pdf/pdf_open_parameters_acro8.pdf
Many newer PDF-enabled browsers use similar constructs, but there is no guarantee that one URL.pdf#fragment fits all. Most will respect #page=number and a zoom, view or fit parameter, but support is very variable, so you need to check each feature across browsers.
It is important to switch off any "remember my view" settings in the PDF viewer.
For your previous request, using bookmarks can be unreliable (as can comment IDs) unless you are using Acrobat and can check the bookmark function. Far more reliable may be zooming to a scroll location, shown here using MS Edge:
"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" file:///C:/Users/.../Desktop/CEI-PID-Sample-Rev4annotsC.pdf#zoom=300,200,200
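If the links are generated by a program, the fragment can be assembled with a small helper. This is only a sketch: the function name is made up for illustration, it covers just the page and zoom parameters mentioned above, and whether a given viewer honours them still has to be checked per browser.

```python
def pdf_fragment_url(base_url, page=None, zoom=None):
    """Append Adobe-style open parameters (#page=..&zoom=..) to a PDF URL."""
    params = []
    if page is not None:
        params.append(f"page={page}")
    if zoom is not None:
        # zoom is either a single scale or (scale, left, top), e.g. (300, 200, 200)
        values = zoom if isinstance(zoom, (tuple, list)) else (zoom,)
        params.append("zoom=" + ",".join(str(v) for v in values))
    # Leave the URL untouched when no parameters were requested.
    return base_url + ("#" + "&".join(params) if params else "")
```

For example, pdf_fragment_url("file:///C:/docs/sample.pdf", zoom=(300, 200, 200)) produces a link of the same shape as the Edge command line above (the path here is a placeholder).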
I support SumatraPDF; however, its browser plug-in, whilst still functional, is way past EOL and thus unsupported. It had limited support for those fragments, even though the CLI was recently extended for some Acrobat /Action commands.
However, related to what you want, the single portable EXE can jump to well-formed text blocks. Here I search for "V 4508" in a fairly similar fashion:
"C:\Program Files\SumatraPDF\SumatraPDF.exe" "C:\Users\WDAGUtilityAccount\Desktop\SandBox\apps\Graphics\CEI-PID-Sample-Rev4annotsC.pdf" -zoom 300 -search "V 4508"
And if I press A then the text will be highlighted for comments
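An external application could launch the same command programmatically. The sketch below only assembles the argument list, using the -zoom and -search switches exactly as in the command line above; the helper name and the example paths are assumptions for illustration.

```python
import subprocess


def sumatra_command(exe_path, pdf_path, search_text, zoom=None):
    """Build the SumatraPDF argument list for opening a PDF at a text match."""
    args = [exe_path, pdf_path]
    if zoom is not None:
        args += ["-zoom", str(zoom)]
    # Passing arguments as a list means no manual quoting of spaces is needed.
    args += ["-search", search_text]
    return args


# Example launch (Windows only; paths are placeholders):
#     subprocess.run(sumatra_command(
#         r"C:\Program Files\SumatraPDF\SumatraPDF.exe",
#         r"C:\docs\sample.pdf", "V 4508", zoom=300))
```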

Download pdf - accessibility for screen readers

I'm curious how to make an accessible button for screen readers which downloads a PDF.
I know that one option is to use an href pointing to the URL of the pdf file, optionally with a download attribute on the anchor to open a download dialog.
But it's not a good way for a screen reader. The screen reader announces it as a link, but it is not really a link, because it triggers a pdf download rather than redirecting to another page. So this can be confusing for people with vision disorders who rely on their screen readers.
Is there a good accessible way to create such a button? Or is relying on <a href='path-to-pdf'>...</a> completely enough and not confusing for people with disabilities?
General answer and basics of file download
Both a link and a button are perfectly fine, it doesn't make much difference.
In any case, it's very important to explicitly indicate that the link or button is going to download a file rather than open a page, to avoid surprise.
The simplest and most reliable is just to write it textually, i.e. "View the report (PDF)".
You may also put a PDF icon next to the link to indicate it, but make sure to use a real image, i.e. <img alt="PDF" /> and not CSS stuff, since the latter may not be rendered to screen readers and/or doesn't give you the opportunity to set alt text (which is very important).
It is also good practice to indicate the file size if it is big (more than a few megabytes), so that users with a slow or limited connection won't get stuck or burn their mobile data subscription needlessly.
It's also good to indicate the number of pages if there are more than just a few, so that people have an idea of how big the document is and whether they can really take the time required to read it.
Example: "View the report (PDF, 44 pages, 17 MB)"
Note that, similarly, it's good practice to indicate the duration of a podcast or video beforehand.
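If the link text is generated by a template, the label can be built from the file's metadata. A small sketch, with a made-up helper name and with the size rounded to whole decimal megabytes (both are assumptions, not a standard):

```python
def download_label(text, file_type, pages=None, size_bytes=None):
    """Build link text such as 'View the report (PDF, 44 pages, 17 MB)'."""
    details = [file_type]
    if pages is not None:
        details.append(f"{pages} page" if pages == 1 else f"{pages} pages")
    if size_bytes is not None:
        # Whole megabytes are precise enough for a hint to the user.
        details.append(f"{round(size_bytes / 1_000_000)} MB")
    return f"{text} ({', '.join(details)})"
```

Because the details end up in the visible text itself, screen readers announce them without any extra ARIA attributes.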
Additional considerations with PDF
First of all, you should make sure that your PDF is really accessible. Most aren't by default, sadly.
You should easily find resources on how to proceed to make a PDF accessible if you don't know.
Secondly, for an accessible PDF file to actually be read accessibly, it has to be opened inside a real PDF reading program which supports tagged PDFs, like Adobe Reader.
The problem is that nowadays, most browsers have an integrated PDF viewer. These viewers usually don't support tagged PDFs, and so, even if you make an accessible PDF, it won't be accessible to the user if it is opened inside that integrated browser viewer.
So you must make sure that your link or button triggers an actual download or opening in a true PDF reading program, rather than opening in the browser's integrated viewer.
There are several possibilities for bypassing the integrated viewer, which may or may not work depending on OS/browser. They have to be tested to make sure they work:
Send a header Content-Disposition: attachment; filename="something.pdf"
Send a Content-Type other than "application/pdf" or "text/pdf", e.g. "application/octet-stream", to fake out basic type detection
Make the link not end with .pdf
Use the download attribute of <a>
The most reliable are response headers. Most browsers don't rely on the file extension alone to decide what to do.
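As an illustration of the response-header approach, here is a minimal sketch using Python's standard http.server. The file name report.pdf and the handler name are placeholders; a real site would set the same headers in its own web framework or server configuration.

```python
from http.server import BaseHTTPRequestHandler


def download_headers(filename, size):
    """Headers asking the browser to save the file instead of displaying it."""
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(size),
    }


class PdfDownloadHandler(BaseHTTPRequestHandler):
    PDF_PATH = "report.pdf"  # hypothetical file served for every GET request

    def do_GET(self):
        with open(self.PDF_PATH, "rb") as pdf_file:
            data = pdf_file.read()
        self.send_response(200)
        for name, value in download_headers("report.pdf", len(data)).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(data)


# To try it locally:
#     from http.server import HTTPServer
#     HTTPServer(("localhost", 8000), PdfDownloadHandler).serve_forever()
```

As said above, whether the browser really bypasses its integrated viewer still has to be tested per OS/browser.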
Either a link or a button is fine. The most important thing is that the user is informed about what the element does - i.e. it downloads/opens a PDF file. So, this should be reflected in the element's label, whether that is a visible text label or an icon that uses alt text or aria-label to explicitly describe the element's purpose.
I agree with QuentinC's suggestion to also inform the user upfront about the number of pages and size of the document - that's a nice touch that I don't see very often!
PDF accessibility is a whole other topic, but again, as QuentinC points out, there's not much good in allowing a user to download or view a PDF that isn't accessible, so it's a good idea to test the PDF against JAWS/NVDA/VoiceOver/TalkBack to ensure it is readable.

docx to pdf conversion with libreoffice

I downloaded LibreOffice to try to convert docx to pdf, and I'm having a hard time getting it to run. I've looked around on forums and it seems the command is
soffice --convert-to pdf filename.docx
or
libreoffice --convert-to: pdf:writer_pdf_Export filename.docx
soffice is the command that works for me, as soffice.exe. I'm navigating to where the .exe is and trying to run it there. I'm getting the following error.
C:\Program Files\LibreOffice\program> soffice convert-to pdf C:\Users\mwolfe\OneDrive - Company Inc\doc_converter\test_file.doc
LibreOffice 6.1.4.2 9d0f32d1f0b509096fd65e0d4bec26ddd1938fd3
Error in option: -
Usage: soffice [argument...]
argument - switches, switch parameters and document URIs (filenames).
Using without special arguments:
Opens the start center, if it is used without any arguments.
{file} Tries to open the file (files) in the components
suitable for them.
{file} {macro:///Library.Module.MacroName}
Opens the file and runs specified macros from
the file.
Getting help and information:
--help | -h | -? Shows this help and quits.
--helpwriter Opens built-in or online Help on Writer.
--helpcalc Opens built-in or online Help on Calc.
--helpdraw Opens built-in or online Help on Draw.
--helpimpress Opens built-in or online Help on Impress.
--helpbase Opens built-in or online Help on Base.
--helpbasic Opens built-in or online Help on Basic scripting
language.
--helpmath Opens built-in or online Help on Math.
--version Shows the version and quits.
--nstemporarydirectory
(MacOS X sandbox only) Returns path of the temporary
directory for the current user and exits. Overrides
all other arguments.
General arguments:
--quickstart[=no] Activates[Deactivates] the Quickstarter service.
--nolockcheck Disables check for remote instances using one
installation.
--infilter={filter} Force an input filter type if possible. For example:
--infilter="Calc Office Open XML"
--infilter="Text (encoded):UTF8,LF,,,"
--pidfile={file} Store soffice.bin pid to {file}.
--display {display} Sets the DISPLAY environment variable on UNIX-like
platforms to the value {display} (only supported by a
start script).
User/programmatic interface control:
--nologo Disables the splash screen at program start.
--minimized Starts minimized. The splash screen is not displayed.
--nodefault Starts without displaying anything except the splash
screen (do not display initial window).
--invisible Starts in invisible mode. Neither the start-up logo nor
the initial program window will be visible. Application
can be controlled, and documents and dialogs can be
controlled and opened via the API. Using the parameter,
the process can only be ended using the taskmanager
(Windows) or the kill command (UNIX-like systems). It
cannot be used in conjunction with --quickstart.
--headless Starts in "headless mode" which allows using the
application without GUI. This special mode can be used
when the application is controlled by external clients
via the API.
--norestore Disables restart and file recovery after a system crash.
--safe-mode Starts in a safe mode, i.e. starts temporarily with a
fresh user profile and helps to restore a broken
configuration.
--accept={UNO-URL} Specifies an UNO-URL connect-string to create an UNO
acceptor through which other programs can connect to
access the API. UNO-URL is string the such kind
uno:connection-type,params;protocol-name,params;ObjectName.
--unaccept={UNO-URL} Closes an acceptor that was created with --accept. Use
--unaccept=all to close all open acceptors.
--language={lang} Uses specified language, if language is not selected
yet for UI. The lang is a tag of the language in IETF
language tag.
Developer arguments:
--terminate_after_init
Exit after initialization complete (no documents loaded).
--eventtesting Exit after loading documents.
New document creation arguments:
The arguments create an empty document of specified kind. Only one of them may
be used in one command line. If filenames are specified after an argument,
then it tries to open those files in the specified component.
--writer Creates an empty Writer document.
--calc Creates an empty Calc document.
--draw Creates an empty Draw document.
--impress Creates an empty Impress document.
--base Creates a new database.
--global Creates an empty Writer master (global) document.
--math Creates an empty Math document (formula).
--web Creates an empty HTML document.
File open arguments:
The arguments define how following filenames are treated. New treatment begins
after the argument and ends at the next argument. The default treatment is to
open documents for editing, and create new documents from document templates.
-n Treats following files as templates for creation of new
documents.
-o Opens following files for editing, regardless whether
they are templates or not.
--pt {Printername} Prints following files to the printer {Printername},
after which those files are closed. The splash screen
does not appear. If used multiple times, only last
{Printername} is effective for all documents of all
--pt runs. Also, --printer-name argument of
--print-to-file switch interferes with {Printername}.
-p Prints following files to the default printer, after
which those files are closed. The splash screen does
not appear. If the file name contains spaces, then it
must be enclosed in quotation marks.
--view Opens following files in viewer mode (read-only).
--show Opens and starts the following presentation documents
of each immediately. Files are closed after the showing.
Files other than Impress documents are opened in
default mode , regardless of previous mode.
--convert-to OutputFileExtension[:OutputFilterName]
[--outdir output_dir] [--convert-images-to]
Batch convert files (implies --headless). If --outdir
isn't specified, then current working directory is used
as output_dir. If --convert-images-to is given, its
parameter is taken as the target MIME format for *all*
images written to the output format. If --convert-to is
used more than once, the last value of OutputFileExtension
[:OutputFilterName] is effective. If --outdir is used more
than once, only its last value is effective. For example:
--convert-to pdf *.odt
--convert-to epub *.doc
--convert-to pdf:writer_pdf_Export --outdir /home/user *.doc
--convert-to "html:XHTML Writer File:UTF8" *.doc
--convert-to "txt:Text (encoded):UTF8" *.doc
--print-to-file [--printer-name printer_name] [--outdir output_dir]
Batch print files to file. If --outdir is not specified,
then current working directory is used as output_dir.
If --printer-name or --outdir used multiple times, only
last value of each is effective. Also, {Printername} of
--pt switch interferes with --printer-name.
--cat Dump text content of the following files to console
(implies --headless). Cannot be used with --convert-to.
--script-cat Dump text content of any scripts embedded in the files to console
(implies --headless). Cannot be used with --convert-to.
-env:<VAR>[=<VALUE>] Set a bootstrap variable. For example: to set
a non-default user profile path:
-env:UserInstallation=file:///tmp/test
I can't figure out what I'm doing wrong.
You need to add -- before options and put quotes around paths.
First ensure you can open the doc from the command line. Start with:
"C:\Program Files\LibreOffice\program\soffice.exe" "C:\Users\mwolfe\OneDrive - Company Inc\doc_converter\test_file.doc"
Then try the export command:
"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to pdf "C:\Users\mwolfe\OneDrive - Company Inc\doc_converter\test_file.doc"
However, with the latest LibreOffice I'm not able to get any conversion working on the Windows command line.
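If the conversion is driven from a script, passing the arguments as a list avoids the quoting problem with spaces in paths altogether. The sketch below is untested against any particular LibreOffice version; it only uses the --headless, --convert-to and --outdir switches listed in the help output above, and the install path and function names are assumptions.

```python
import subprocess
from pathlib import Path

# Default install location on Windows; adjust if LibreOffice lives elsewhere.
SOFFICE = r"C:\Program Files\LibreOffice\program\soffice.exe"


def convert_command(source, outdir, soffice=SOFFICE):
    """Build the soffice argument list for a headless conversion to PDF."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_pdf(source, outdir):
    """Run the conversion and return the path where the PDF is expected."""
    subprocess.run(convert_command(source, outdir), check=True)
    return Path(outdir) / (Path(source).stem + ".pdf")
```

Each list element reaches soffice as one argument, so a path such as "OneDrive - Company Inc" needs no extra quotes.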

VBA: set txt to be printed in Portrait mode

This time I'm fighting against a .txt file which doesn't want to be (programmatically) set to be printed in Portrait mode instead of Landscape mode (which is apparently the default).
The thing is, I know how to do that with applications like Word or Excel, but sadly enough I'm working on a device that has no Office at all.
I'm not providing any code since my problem is pretty straightforward, and I think I need a simple command in order to solve it. What I basically (programmatically) do in my subroutine is:
Open the file as #1 (I know this looks very '80s, but I don't want to modify an up-and-running system and risk introducing errors)
Write text to the file
Close #1
Save the file
Call text editor shell to show the file to the user
How can I then automatically set the print format to Portrait?
P.S.: I do not have the possibility to insert a userform or an object to print the txt file in "special ways"; the user has to print the file from the text editor itself (WordPad, just in case).
First to state the obvious: there are no print settings stored in text files (or indeed anything else except for the text). Print settings would be controlled within whatever you are using to print - in this case Notepad or Wordpad.
There are only very limited command line switches for Notepad and Wordpad, which unfortunately don't include page setup. In theory you may be able to automate setting portrait using SendKeys (see here and here) but if it is possible at all it's likely to be difficult and unreliable (focus and timing are two issues).
I can't see a good way round this within the parameters of your question. Adding an object within your application would probably have been the best solution. You might try looking for an alternative text editor you could install that is easier to automate. The only other alternative might be to set defaults within the printer drivers and hope that those stick when the user opens Notepad.

How to convert XDP to PDF in Adobe LiveCycle ES3 via HTTP REST request

I have: LiveCycle server (ES3, JBOSS), Workbench, Designer.
Using LC Designer I converted a PDF to XDP - it's a template now.
Now I need to convert that XDP file to PDF.
So, I guess I should somehow call the LiveCycle server with an HTTP request, and in the body of this request I can send the body of the XDP document. All I need from LC is the PDF.
It looks like a simple task, but I can't find ANY information on how to do this. I see a lot of examples of how to do this in Java, but I don't need Java, I need to do it via HTTP (a REST endpoint, or SOAP if REST is not possible).
Maybe I need to create some "application" in Workbench? If so, is there any step-by-step documentation? Or maybe somebody can explain to me how to do this. Maybe there is already a built-in application in the ES3 Server - I think it's a very common and simple case.
UPD: I've opened a job at oDesk for this issue; I promise to post the solution here to share the knowledge with the community.
As was promised, here is how to solve this issue:
It's not enough to just put a PDF into LiveCycle Designer. You really need to design a form in LC Designer. You can use your PDF as a template, but for everything that you want to fill in with your custom data, you need to add objects in LC Designer (take a look at the "Insert" menu, try Table or Text Field) and add a Data Connection in the "Data View" tab. I think it's a pretty easy step for professionals, but it can take beginners some time to get it. Save the result of your work as, for instance, a Template.xdp file.
You now also have an example XML file - the source of the data. Let's name it DataSource.xml
Now we can install LiveCycle Server. The best OS for LC ES3 is RHEL 5.5 (we spent around 2 days just finding the correct combination of OS and settings). You'll need a clever system administrator (or just one experienced in Adobe LiveCycle :))
The server is working now and you can see the web interface, so let's create an application in Adobe LiveCycle Workbench ES3. Add an application with a new name and add a process to that application. It would take too many words to describe all the steps of the process, so just take a look at the screenshots of the result (and notice the variables too):
Now the easiest part - calling this app with an HTTP request. But we can't just send a usual POST request to Adobe LiveCycle :) We have to send the content of 2 files (Template.xdp and DataSource.xml) as multipart/form-data, and the names of the parts are the names of the input variables (in my example, xmlTemplate and xmlData). And don't forget the Authorization header with Basic authorization credentials.
In the response you'll receive the base64-encoded body of the PDF document.
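The request described above can be sketched in Python with only the standard library. The part names xmlTemplate and xmlData and the base64-encoded response come from the description above; the helper names, the per-part filename and Content-Type headers, and the service URL are assumptions, so treat this as a starting point rather than a tested client.

```python
import base64
import urllib.request
import uuid


def encode_multipart(parts, boundary=None):
    """Encode a dict of part name -> bytes as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, content in parts.items():
        chunks.append(f"--{boundary}\r\n".encode("ascii"))
        # Sending a filename and a generic content type per part is an assumption.
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{name}"\r\n'.encode("ascii")
        )
        chunks.append(b"Content-Type: application/octet-stream\r\n\r\n")
        chunks.append(content)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_render_request(url, user, password, template_xdp, data_xml):
    """Build the POST request carrying both files and Basic credentials."""
    body, content_type = encode_multipart(
        {"xmlTemplate": template_xdp, "xmlData": data_xml}
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Authorization": f"Basic {token}"},
        method="POST",
    )


def render_pdf(url, user, password, template_xdp, data_xml):
    """Call the service and decode the base64 response into PDF bytes."""
    request = build_render_request(url, user, password, template_xdp, data_xml)
    with urllib.request.urlopen(request) as response:
        return base64.b64decode(response.read())
```

A caller would read Template.xdp and DataSource.xml as bytes, pass them to render_pdf together with the URL of the deployed process, and write the returned bytes to a .pdf file.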
Thanks to Thierry Stortenbeker for this application and for help and patience.
Yes, you have to create an LC application using Workbench. Here is how to do it:
Create a new application in Workbench using File -> New -> Application.
Create a new process using the right-click menu on the application.
Drop in renderPDF form activity from the activity and name it "renderPDFForm".
Select the renderPDF form activity to add variables using the variables pane at the bottom.
Add a variable of "Document" type and name it "inputXDP". We will use this to pass the xdp file. Mark it as an "input" variable.
Add a variable of "Document" type and name it "outPDF". Mark it as an "output" variable.
Now double-click the renderPDFForm activity; this will open a property editor on the left side.
Expand the "Input" section if it is not expanded already. Make sure "Form" is set to be picked up from a variable. Then choose "inputXDP" from the dropdown.
Expand the "Output" section if it is not expanded already. Make sure "Rendered Form" is set to be picked up from a variable. Then choose the "outPDF" variable from the dropdown.
Now deploy your application using the right-click menu on the application.
That is it. You are ready to go. Now save the process and double-click the "default start point" to get the REST URL where this service is exposed. The REST URL should look like http://localhost:8080/rest/services/RestFormRender/renderForm:1.0. Here RestFormRender is the name of the application and renderForm is the name of the process. Now make a GET/POST call to this REST URL and specify the XDP bytes in the "inputXDP" request parameter.
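The URL pattern above can be captured in a tiny helper for client code. This is just a sketch of the pattern as shown in the example URL; the function name and the default host and version are assumptions taken from that one example.

```python
def rest_service_url(application, process, version="1.0", base="http://localhost:8080"):
    """Build the REST URL for a deployed LiveCycle process."""
    # Pattern taken from the example: /rest/services/<application>/<process>:<version>
    return f"{base}/rest/services/{application}/{process}:{version}"
```

With the names used in this answer, rest_service_url("RestFormRender", "renderForm") gives the URL shown above.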