How to extract SVG from a PDF on the command line using Inkscape?

Is there a command line option for asking inkscape to extract svg from pdf page 3 (for example)?
The command I use now is
$ inkscape -f test.pdf -l test.svg
but I would also like the option to export a specific page from this PDF.

What about extracting the page you need with pdftk (or in fact, any other suitable tool) first:
mypage=$(mktemp -u XXXXXX.pdf)
pdftk test.pdf cat 3 output "$mypage"
inkscape -l test.svg "$mypage"
rm "$mypage"
(It would be nice to be able to pipe the output from pdftk directly to inkscape. Unfortunately, when provided from stdin, data are expected by inkscape to be svg. A named pipe doesn't help either, because inkscape seems to attempt to traverse pdf files more than once.)
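The steps above can be wrapped in a small function. This is only a sketch: `pdf_page_to_svg` is a hypothetical helper name, it assumes `pdftk` and the same Inkscape command-line syntax as in the question (`-l` for plain SVG export), and it uses `mktemp -d` instead of `mktemp -u` so the temporary name cannot collide with an existing file.

```shell
# Sketch only: pdf_page_to_svg is a hypothetical helper, not an inkscape feature.
# Usage: pdf_page_to_svg INPUT.pdf PAGE OUTPUT.svg
pdf_page_to_svg() {
    input=$1
    page=$2
    output=$3
    # A private temporary directory avoids the name race of `mktemp -u`.
    tmpdir=$(mktemp -d) || return 1
    single="$tmpdir/page.pdf"
    # Extract the requested page, then export that one-page PDF as plain SVG.
    pdftk "$input" cat "$page" output "$single" &&
        inkscape -l "$output" "$single"
    status=$?
    # Clean up whether or not the conversion succeeded.
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, `pdf_page_to_svg test.pdf 3 test.svg` would export page 3.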

Related

Create Landscape PDF with Pandoc using wkhtmltopdf

I am trying to create PDF with landscape orientation in Pandoc.
I am using WkHtmlToPdf as a PDF Engine. I chose not to use LaTeX. Here is the command I am using:
pandoc test.md -t html -o test.pdf
But it creates a PDF in portrait orientation. How can I create a PDF in landscape mode?
Things I have tried without success:
pandoc -V geometry:landscape test.md -t html -o test.pdf
pandoc -O landscape test.md -t html -o test.pdf
Please help.
Note: I do not want to use LaTeX as my PDF engine.
Please try this:
pandoc test.md -t html \
--pdf-engine-opt="-O" --pdf-engine-opt="Landscape" \
-o test.pdf
Pandoc doesn't know the -O option; it must be passed to wkhtmltopdf. Use --pdf-engine-opt once for each option or argument that you want pandoc to pass to the PDF engine.
The above was tested and succeeds on:
Ubuntu 20.04
pandoc 2.5
wkhtmltopdf 0.12.5
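To make the "one --pdf-engine-opt per token" rule concrete, here is a minimal sketch of a wrapper with a selectable orientation. The function name `md_to_pdf` is hypothetical; the pandoc and wkhtmltopdf options are the ones used in the answer above.

```shell
# Sketch only: md_to_pdf is a hypothetical wrapper name.
# Usage: md_to_pdf INPUT.md OUTPUT.pdf [Landscape|Portrait]
md_to_pdf() {
    input=$1
    output=$2
    orientation=${3:-Landscape}
    # Each token meant for wkhtmltopdf gets its own --pdf-engine-opt:
    # one for the option (-O) and one for its argument.
    pandoc "$input" -t html \
        --pdf-engine-opt=-O --pdf-engine-opt="$orientation" \
        -o "$output"
}
```

Calling `md_to_pdf test.md test.pdf` would then run the same command as above, and `md_to_pdf test.md test.pdf Portrait` would switch back to portrait.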

How to create a PDF/A from command line with Libre Office Draw in headless mode?

LibreOffice Draw allows you to open a non-PDF/A file and export it as a PDF/A-1b or PDF/A-2b file.
The same is possible from the command line by calling on macOS
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless \
--convert-to pdf:draw_pdf_Export \
--outdir ./pdfout \
./input-non-pdfa.pdf
or on Linux simply
libreoffice --headless \
--convert-to pdf:draw_pdf_Export \
--outdir ./pdfout \
./input-non-pdfa.pdf
On the command line, --convert-to pdf:draw_pdf_Export tells LibreOffice to create a PDF and to use the LibreOffice Draw export filter to do so.
Is there also a way to tell LibreOffice to produce a PDF/A document in headless mode?
For PDF/A-1 (does this mean PDF/A-1b?):
soffice --headless --convert-to pdf:"writer_pdf_Export:SelectPdfVersion=1" --outdir outdir input.pdf
Change the value from 1 to 2 for PDF/A-2. See the LibreOffice source code: Common.xcs, pdfexport.cxx and pdffilter.cxx.
(Maybe outdated) API/Tutorials/PDF export - Apache OpenOffice Wiki
Python Guide - PDF export filter data - The Document Foundation Wiki
excel->pdf変換 command のdpi設定 (DPI setting for the excel->pdf conversion command) - Ask LibreOffice
Change default resolution in batch PNG conversion [closed] - Ask LibreOffice
With LibreOffice this is possible only via the GUI, so for a command-line solution use gs.
First convert the PDF to PostScript:
pdftops input.pdf input.ps
Then convert the PostScript file to PDF/A, the archival format of PDF:
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sOutputFile=input-A.pdf input.ps
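The two steps can be combined so that the intermediate PostScript file is cleaned up automatically. This is a sketch under assumptions: `pdf_to_pdfa` is a hypothetical helper name, the Ghostscript flags are taken from the answer above, and PDFACompatibilityPolicy is written with -d because the Ghostscript documentation describes it as an integer parameter. Some of these flags may behave differently on newer Ghostscript releases, so validate the result with a PDF/A checker.

```shell
# Sketch only: pdf_to_pdfa is a hypothetical helper name.
# Usage: pdf_to_pdfa INPUT.pdf OUTPUT.pdf
pdf_to_pdfa() {
    input=$1
    output=$2
    tmpdir=$(mktemp -d) || return 1
    ps="$tmpdir/input.ps"
    # Step 1: PDF to PostScript. Step 2: PostScript to PDF/A via pdfwrite.
    pdftops "$input" "$ps" &&
        gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor \
           -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite \
           -dPDFACompatibilityPolicy=1 -sOutputFile="$output" "$ps"
    status=$?
    # Remove the intermediate PostScript file in every case.
    rm -rf "$tmpdir"
    return "$status"
}
```

For example: `pdf_to_pdfa input.pdf input-A.pdf`.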

How can I convert pdf to asciidoc using pandoc?

I am trying to convert a PDF book to an AsciiDoc document. I have tried the following command:
pandoc -s s.pdf -t asciidoc -o example28.txt
I got an "Unknown reader" error.
q#q-ABRA-A5-V12-1:~/Downloads$ pandoc -s s.pdf -t asciidoc -o example28.txt
pandoc: Unknown reader: pdf
Pandoc can convert to PDF, but not from PDF.
How can I fix this or is there another way to convert from pdf to asciidoc?
Have you tried pdf2txt?
https://pypi.org/project/pdfminer/
It's one of the tools provided there.
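One possible way to chain this with pandoc is sketched below. It rests on assumptions: `pdf_to_asciidoc` is a hypothetical helper name, `pdf2txt.py` is the script installed by pdfminer, and the extracted plain text is simply read by pandoc as Markdown. Headings, tables and layout from the PDF are not recovered this way, so expect to clean up the result by hand.

```shell
# Sketch only: pdf_to_asciidoc is a hypothetical helper name.
# Usage: pdf_to_asciidoc INPUT.pdf OUTPUT.adoc
pdf_to_asciidoc() {
    input=$1
    output=$2
    tmpdir=$(mktemp -d) || return 1
    text="$tmpdir/extracted.txt"
    # Step 1: pdf2txt.py extracts plain text; layout and headings are lost.
    # Step 2 (assumption): pandoc reads that text as Markdown, writes AsciiDoc.
    pdf2txt.py -o "$text" "$input" &&
        pandoc -f markdown -t asciidoc -o "$output" "$text"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

For example: `pdf_to_asciidoc s.pdf example28.txt`.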

Convert PDF to PCL using Ghostscript 9.15

Requirement is to convert PDF to PCL with a macro embedded (currently testing this on Windows, however I will need to use this runtime in the application and print it from UNIX). The macro will be used later in another document to embed this cropped image and printed on one single page. I will be using PCL escape codes to call the MacroNumber and then the image will be printed. (You can consider this as a logo image.)
I am able to convert the PDF with whitespace to just the PDF without any whitespace by using CropBox.
"c:\progra~1\gs\gs9.15\bin\gswin64.exe" -o _sourcePDFcropped.pdf \
-sDEVICE=pdfwrite -c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f _sourcePDF.pdf
However, when I convert this _sourcePDFcropped.pdf to PCL, it still adds whitespace.
"c:\progra~1\gs\gs9.15\bin\gswin64c.exe" -dBATCH -dNOPAUSE \
-sDEVICE=pxlcolor -g100x200 -sOutputFile=_sourceFedGroundCroppedTest.pcl \
-f _sourceFedGroundCropped.pdf
I tried using MKPCL and it does the job. Because it doesn't have much support, I am trying to use Ghostscript.
MKPCL.EXE -c4 -t -m 100 -p Image.jpg Image.MAC
I also tried ImageMagick which internally uses Ghostscript. So I am guessing, if I use the right switches in GS, I should be able to achieve my goal.
Input PDF File: Click Here
P.S.: I have seen other PDF-to-PCL questions on Stack Overflow, but those are more straightforward PDF-to-PCL conversions. Mine is to crop the PDF, and the output should be PCL.
Question continued: Link
I processed the sample input PDF with the following command line, using a self-compiled Ghostscript v9.16 (unreleased, from current GhostPDL GIT sources):
gs -o - \
-sDEVICE=pdfwrite \
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f source.pdf \
\
| gs -o tst.pcl \
-sDEVICE=pxlcolor \
-dUseCropBox \
-f -
(As you may well have noticed, I'm connecting 2 different Ghostscript commands through a pipe in order to save writing a temporary PDF file to disk.)
If you want to do the same on Windows, the command line in a cmd.exe/DOS box would be:
gswin64c.exe -o - ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" ^
-f source.pdf ^
^
| gswin64c.exe -o tst.pcl ^
-sDEVICE=pxlcolor ^
-dUseCropBox ^
-f -
Then I opened it with the self-compiled PCL viewer (also from GhostPDL sources), pcl6:
pcl6 tst.pcl
This is a screenshot showing the pcl6 window:
As KenS also pointed out: it is important to use -dUseCropBox when processing the cropped PDF intermediate data!
Adding a CropBox doesn't really do much: it leaves the PDF exactly the same but adds a CropBox entry for the page. GS will usually use the MediaBox, not the CropBox, so adding a CropBox to a PDF has no effect by itself.
You could try adding -dUseCropBox. If the white space you think is being added is in fact present in the original PDF, but masked by the CropBox, then using -dUseCropBox will have GS use the CropBox when rendering the PDF.
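To check which boxes a PDF actually carries before and after the pdfmark step, something like the following may help. It is a sketch that assumes poppler's `pdfinfo` is installed and prints its default `-box` output format; `show_page_boxes` is a hypothetical helper name.

```shell
# Sketch only: show_page_boxes is a hypothetical helper name.
# Usage: show_page_boxes FILE.pdf
show_page_boxes() {
    # pdfinfo -box prints the page boxes; keep only the two discussed here.
    pdfinfo -box "$1" | grep -E '^(MediaBox|CropBox):'
}
```

If the CropBox line differs from the MediaBox line, the crop is recorded in the file, and -dUseCropBox is what makes Ghostscript honour it.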

Ghostscript loses font while extracting the page from PDF

I split a PDF into pages with the following command line:
for G in $(seq 1 $(pdfinfo 47.pdf | sed -n 's/Pages:[^0-9]*\([0-9]*\).*/\1/p')) ; do
gs \
-dSAFER \
-sDEVICE=pdfwrite \
-dBATCH \
-dNOPAUSE \
-dFirstPage=$G \
-dLastPage=$G \
-o $G.pdf \
47.pdf ;
done
But some pages come out without text (graphics are still present).
So I tried to extract the embedded fonts from the PDF:
gs -q -dNODISPLAY extractFonts.ps -c "(47.pdf) extractFonts quit"
I installed these fonts in the system Fonts folder.
After that I repeated the split, and nothing changed.
I have no idea now how to make sure the pages are extracted correctly.
Ghostscript and pdfwrite are not actually intended for splitting PDF files; there are other tools which will probably work better. Why not try pdftk?
If you really want to use Ghostscript, then I would advise you to get hold of the latest bleeding-edge code from the Git repository. In that code the pdfwrite device will accept an output file name containing a '%d' and will write one file per page.
Beyond that, it seems most likely to me that you are simply experiencing a bug rather than 'losing the font'. If the font were missing, the text would still be there but in a different font. Which version of GS are you using?
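Both suggestions can be sketched in one small function. The name `split_pdf` is hypothetical. The pdftk branch uses its burst operation, and the Ghostscript branch assumes a version whose pdfwrite device accepts '%d' in the output file name, as described above.

```shell
# Sketch only: split_pdf is a hypothetical helper name.
# Usage: split_pdf INPUT.pdf PREFIX  (writes PREFIX-1.pdf, PREFIX-2.pdf, ...)
split_pdf() {
    input=$1
    prefix=$2
    if command -v pdftk >/dev/null 2>&1; then
        # pdftk's burst operation writes one file per page.
        pdftk "$input" burst output "$prefix-%d.pdf"
    else
        # Needs a Ghostscript whose pdfwrite accepts %d in the output name.
        gs -dSAFER -sDEVICE=pdfwrite -o "$prefix-%d.pdf" "$input"
    fi
}
```

For example: `split_pdf 47.pdf page`.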