How can I convert pdf to asciidoc using pandoc?

I am trying to convert a PDF book to an AsciiDoc document. I tried the following command:
pandoc -s s.pdf -t asciidoc -o example28.txt
I got an "Unknown reader" error:
q#q-ABRA-A5-V12-1:~/Downloads$ pandoc -s s.pdf -t asciidoc -o example28.txt
pandoc: Unknown reader: pdf
Pandoc can convert to PDF, but not from PDF.
How can I fix this or is there another way to convert from pdf to asciidoc?

Have you tried pdf2txt? It's one of the command-line tools provided by pdfminer:
https://pypi.org/project/pdfminer/
It extracts the text from a PDF, which you can then turn into AsciiDoc.
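A rough sketch of that route, assuming the pdfminer script is installed on the PATH as pdf2txt.py and that pandoc is available. The helper name pdf_to_asciidoc is made up for illustration, and treating the extracted text as Markdown input is an assumption, not something either tool prescribes. PDF carries little structural information, so expect to clean up headings and lists by hand afterwards.

```shell
#!/bin/sh
# Sketch: PDF -> plain text (pdfminer) -> AsciiDoc (pandoc).
# pdf_to_asciidoc is a hypothetical helper name, not part of either tool.
pdf_to_asciidoc() {
    in_pdf=$1
    out_adoc=$2
    if [ ! -f "$in_pdf" ]; then
        echo "no such file: $in_pdf" >&2
        return 1
    fi
    # pdf2txt.py prints the extracted text to stdout when no -o is given.
    # pandoc has no PDF reader, so it reads that text from stdin,
    # here interpreted as Markdown (an assumption), and writes AsciiDoc.
    pdf2txt.py "$in_pdf" | pandoc -f markdown -t asciidoc -o "$out_adoc"
}
```

With the function loaded, the command from the question would become something like: pdf_to_asciidoc s.pdf example28.adoc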

Related

Image file path error while using pandoc for markdown to pdf conversion

I have a markdown file with images in the same directory as the .md file, e.g. with ![Image test ](test.png) in the .md file.
I convert it to PDF with pandoc:
pandoc --standalone --pdf-engine=xelatex file.md -o output.pdf
and I get the following warning:
[WARNING] Could not fetch resource 'test.png': replacing image with description
However, in the same directory I have no issues converting the .md file to HTML output. Can someone suggest what the issue with pandoc is?

Create Landscape PDF with Pandoc using wkhtmltopdf

I am trying to create PDF with landscape orientation in Pandoc.
I am using WkHtmlToPdf as a PDF Engine. I chose not to use LaTeX. Here is the command I am using:
pandoc test.md -t html -o test.pdf
But it creates a PDF in portrait orientation. How can I create the PDF in landscape mode?
Things I have tried without success:
pandoc -V geometry:landscape test.md -t html -o test.pdf
pandoc -O landscape test.md -t html -o test.pdf
Please help.
Note: I do not want to use LaTeX as my PDF engine.

Please try this:
pandoc test.md -t html \
--pdf-engine-opt="-O" --pdf-engine-opt="Landscape" \
-o test.pdf
Pandoc doesn't know the -O option; it must be passed on to wkhtmltopdf. Use --pdf-engine-opt once for each option or argument that you want pandoc to pass to the PDF engine.
The above was tested and succeeds on:
Ubuntu 20.04
pandoc 2.5
wkhtmltopdf 0.12.5

How to create a PDF/A from command line with Libre Office Draw in headless mode?

LibreOffice Draw allows you to open a non-PDF/A file and export it as a PDF/A-1b or PDF/A-2b file.
The same is possible from the command line by calling on macOS
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless \
--convert-to pdf:draw_pdf_Export \
--outdir ./pdfout \
./input-non-pdfa.pdf
or, on Linux, simply
libreoffice --headless \
--convert-to pdf:draw_pdf_Export \
--outdir ./pdfout \
./input-non-pdfa.pdf
On the command line, passing pdf:draw_pdf_Export to --convert-to tells LibreOffice to create a PDF using the LibreOffice Draw export filter.
Is there also a way to tell LibreOffice to produce a PDF/A document in headless mode?

For PDF/A-1 (presumably this means PDF/A-1b):
soffice --headless --convert-to pdf:"writer_pdf_Export:SelectPdfVersion=1" --outdir outdir input.pdf
Change the value from 1 to 2 for PDF/A-2. The relevant LibreOffice source files are Common.xcs, pdfexport.cxx and pdffilter.cxx.
(Maybe outdated) API/Tutorials/PDF export - Apache OpenOffice Wiki
Python Guide - PDF export filter data - The Document Foundation Wiki
DPI setting for the excel->pdf conversion command - Ask LibreOffice
Change default resolution in batch PNG conversion [closed] - Ask LibreOffice

Since with LibreOffice this is possible only via the GUI, use gs (Ghostscript) for a command-line solution.
First convert the PDF to PostScript:
pdftops input.pdf input.ps
Then convert the PostScript file to PDF/A, the archival format of PDF:
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=input-A.pdf input.ps

How to command line extract svg from pdf using inkscape?

Is there a command line option for asking inkscape to extract svg from pdf page 3 (for example)?
The command I use now is
$ inkscape -f test.pdf -l test.svg
but I would like also the option to export a specific page from this pdf.

What about extracting the page you need with pdftk (or in fact, any other suitable tool) first:
mypage=$(mktemp -u XXXXXX.pdf)
pdftk test.pdf cat 3 output "$mypage"
inkscape -l test.svg "$mypage"
rm "$mypage"
(It would be nice to be able to pipe the output from pdftk directly to inkscape. Unfortunately, when provided from stdin, data are expected by inkscape to be svg. A named pipe doesn't help either, because inkscape seems to attempt to traverse pdf files more than once.)

GhostScript alternative

I'm currently using CentOS 5.6 (Ghostscript 8, ImageMagick 6.2.8)
and I'm trying to convert the first image of the PDF to a JPG file.
I understand that my current setup is unable to convert compressed pdf files, but is there an alternative that it can use with the same functionality?

The 'understanding' that Ghostscript is unable to convert 'compressed PDF' is wrong. Where did you pick it up?
PDF by default uses compression internally for most of its objects. It's rather unusual to find a PDF 'in the wild' which is completely uncompressed.
Which exact version of Ghostscript are you using? (Try gs -v).
BTW, you do not need ImageMagick to convert (multipage) PDF to a series of JPEGs. Try this command:
gs \
-o img_%03d.jpeg \
-sDEVICE=jpeg \
input.pdf
or, for a resolution of 300 dpi (instead of the default 72 dpi):
gs \
-o img_%03d.jpeg \
-sDEVICE=jpeg \
-r300 \
input.pdf
The _%03d-part of the output filename will attach a 3-digit number to the img-name that increments with each PDF page.
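If only the first page is wanted, as the question suggests (reading "first image" as "first page", which is an assumption), Ghostscript's -dFirstPage and -dLastPage switches can limit the range. A minimal sketch; the helper name first_page_to_jpeg and the default output name are made up for illustration:

```shell
#!/bin/sh
# Sketch: render only page 1 of a PDF to a JPEG with Ghostscript.
# first_page_to_jpeg is a hypothetical helper name.
first_page_to_jpeg() {
    in_pdf=$1
    out_jpg=${2:-first_page.jpeg}
    if [ ! -f "$in_pdf" ]; then
        echo "no such file: $in_pdf" >&2
        return 1
    fi
    # -dFirstPage/-dLastPage restrict rendering to the given page range;
    # -r300 renders at 300 dpi, as in the command above.
    gs -o "$out_jpg" -sDEVICE=jpeg -r300 \
       -dFirstPage=1 -dLastPage=1 "$in_pdf"
}
```

Called as first_page_to_jpeg input.pdf cover.jpeg, this should write a single JPEG instead of one file per page, so no %03d counter is needed in the output name.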
