I can easily convert a pdf to an odt file using:
soffice --infilter="writer_pdf_import" --convert-to odt a.pdf
But when I try to do:
soffice --infilter="writer_pdf_import" --convert-to odg a.pdf
I get an error:
no export filter
TL;DR the answer is at the bottom but do read the following as to why there can be issues
ODG is a multi-part graphics file usually a blank template, often similar to an ORA, however there are many ways they can be structured and converted TO a set of PDF page printouts, as they contain thumbnails, plus one or more high resolution images or scalable vector graphics. Common variants can be used with Inkscape, Krita possibly Scribus / OODraw and other more GRAPHIC apps.
PDF is a page document output format thus not a suitable candidate for converting to professional images with scalar graphics. *Except see the last comment
ODG or ORA may be done well in image conversion but the reverse is not usually true.
Open Office Graphic is like a DocX, a zip wrapper around a core object, here it is a Jpeg but could be PNG SVG etc.
However the contents of the zip are not simple potentially running to thousands of lines of coding. Thus you need to use a more appropriate method to hand build an ODG not simple command line conversion from cruder PDF.
The real strength of a EXPORT from draw as PDF is the hybrid use of embedding ODFG content thus opening such a PDF you can edit it in Draw.
And it will look just as good in any PDF viewer. However it is too specialist to be simply translated without the app settings. In reality the PDF is the chimera/polyglot ODG.
But if you wish to try with simple files the command line is for a.pdf to a.odg
soffice --infilter="draw_pdf_import" --convert-to odg a.pdf
Related
I am looking for a rough overview of how one would go about embedding graphics (coming from a PDF file) into another PDF file when writing a C++ document processor.
Background: I work on the LilyPond music typesetter, and recently added Cairo output to the system. Now I would like to support adding externally provided graphics to the PDF files that we generate (eg. adding a logo onto page laid out). This is trivial with EPS for PS output.
I can see how you could hook up Poppler to read the PDF, and render the PDF contents onto a Cairo surface, but I wonder if there is a simpler shortcut (eg. embed the PDF file as a binary stream, and then point directly to that stream).
If you need to go via an external route, like reading the PDF and writing it into an existing PDF using Cairo, that would be simpler. To do it manually:
A PDF page consists of a stream of operators for drawing it, and a dictionary of external resources (fonts, images etc.). To stamp one PDF page onto another, you would need to:
a) Find all objects for external resources in the stamp which are needed, and add them to the destination PDF.
b) Convert the page to a "Form Xobject", which is a sort of reusable piece of content. Add this to the /XObjects entry in the destination page, making sure to pick a fresh name.
c) Add some operators to the page content in the destination page to invoke the new xobject
To see how this might work, you could play with -stamp-as-xobject and -postpend-content "/XObjName Do" from section 8.4 of the cpdf manual.
Making this work for arbitrary PDFs is really not for the faint of heart, I'm afraid.
when I convert pdf to image in linux command line, it seems inkscape gets the best result (better quality than gs with same dpi). Unfortunately, it only converts the first page to png. How to convert every pdf page to different png file? Do I have to extract one PDF page and store to a new pdf file , then do inkscape concert, and so on?
This isn't solely using Inkscape, but you could use e.g. pdftk to split up the pdf-file into separate pages and convert every page into a png with Inkscape. For example, like this:
pdftk file.pdf burst;
l=$(ls pg_*.pdf)
for i in $l; do inkscape "$i" -z --export-dpi=300 --export-area-page --export-png="$i.png"; done
Note that pdftk burst creates pdf-files called pg_0001.pdf, etc., so if you have any files named like that, they'll be overwritten. You can remove them afterwards easily using
rm pg_*.pdf
Lu Kas' answer threw warnings for me without doing the conversion. Probably because I'm running Inkscape 1.1
However, i got it running by replacing some deprecated commands:
inkscape pdfFile.pdf --export-dpi=300 --export-area-page --export-filename=imageFile.png;
For batch processing rather than slowly looping through file by file inkscape has a shell mode for command file scripting. See https://wiki.inkscape.org/wiki/index.php/Using_the_Command_Line#Shell_mode
However like all other #file.txt scripts you need to write a custom text file. and for Windows users run against higher ranking inkscape.com not .exe
Since version 1.0 (currently 1.2) a multipage pdf of contents can be addressed for multiple outputs. for some other examples see https://inkscape.org/doc/inkscape-man.html#EXAMPLES
Commands get replaced over time so currently to export png use --export-type="xxx" to batch export a list of input files to type xxx. Thus in this case --export-type="png"
Also for pdf related inputs and support see https://wiki.inkscape.org/wiki/index.php/Using_the_Command_Line#New_options
For windows users there is a handy batchfile converter here https://gist.github.com/JohannesDeml/779b29128cdd7f216ab5000466404f11
I am exploring tools to convert PDF documents to PDF/A. Ghostscript seems to give out of the box support for such a conversion. One issue seems to be that some true type fonts that are a part of the original PDF document are not converted correctly. If I copy a text from the converted PDF/A document, and paste it in notepad, the copied text appears to be garbled text.
The original document text can be copied to notepad just fine.
I am using the following script:
gswin64 -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=FilteredOutput.pdf Filtered1Page.pdf
I have uploaded a sample 1 page source PDF in Google Drive:
SampleInput
A sample output PDF/A document generated from the command is in Google drive here:
SampleOutput
Running the above query on this PDF in a windows machine will reproduce the issue.
Are there any settings / commands make the PDF/A conversion to be handled properly?
Copy and paste from a PDF is not guaranteed. Subset fonts will not have a usable Encoding (such as ASCII or UTF-8), in which case they will only be amenable to cut/paste/search if they have an associated ToUnicode CMap, many PDF files do not contain ToUnicode CMaps.
Of course, the PDF/A specification states (oddly in my opinion) that you should not use subset fonts, but its not always possible to tell whether a font is subset (not all creators follow the XXXXX+ convention), and even if the font isn't subset there still isn't any guarantee that its Encoding is one that is usable.
Looking at the file you have posted, it does not contain one of the fonts it uses (Arial,Bold) and so Ghostscript substitutes with DroidSansFallback, and the font it does contain (FreeSansBold) is a subset (FWIW this font doesn't actually seem to be used....). The fallback font is a CIDFont, so there is no real prospect of the text being 'correct'.
I believe that if you make a real font available to Ghostscript to replace Arial,Bold then it will probably work correctly. This would also fix the rather more obvious problem of the spacing of the characters being incorrect (in one place, wildly incorrect), which is caused by the fallback font having different widths to the original.
NB as the warning messages have already told you don't use -dUseCIEColor.
The fact that you cannot copy/paste/search a PDF does not mean that it is not a valid PDF/A-1b file though, so thsi does not mean that the creation (NOT conversion) of the PDF/A-1b is not 'proper'.
How can I easily crop a PDF page in a given PDF file? I prefer using as little coding as possible, and guess border geometries as little as possible...
There are several options:
Crop by point-and-click using a GUI front-end:
pdf-quench
krop
briss
PDF scissors
Crop by using the command line:
pdfcrop command (provided by texlive-extra-utils), using the following arguments: pdfcrop --margins '-30 -30 -250 -150' --clip input.pdf output.pdf (-left -top -right -bottom format).
PDFCrop
convert -crop command (provided by imagemagick)
Ghostscript
Crop by writing your own script:
Python
LaTeX
For quick, GUI-aided PDF cropping tasks, try pdfarranger (available in Debian repos, formerly known as PDF-Shuffler).
For precise point-and-click cropping, one option is to use LibreOffice Draw.
The instructions below assume you want to crop part of a single-page PDF:
Start with a blank document
Select the Insert > Image... menu
Navigate to the PDF you wish to crop
The contents of the PDF will show up as an image
Right-click on the PDF content in your document and select the "Crop" menu item.
Use the handles to resize the viewable area of the PDF to the section you want to remain after cropping
Click outside of the PDF to disable the crop handles
Click again on the PDF content to position it however you want by:
Dragging it around the page
Using the arrow keys to move it
Use the Draw positioning tools to align or center the PDF content.
When you're happy with the result, save, export it to PDF, or print it.
For multi-page PDFs, You'll have to work page by page by first splitting the PDF into multiple pages using some other tool like PDF Arranger (or simply "Printing to PDF" each page of the PDF you want to crop in your PDF viewer), cropping them one by one with Draw, then recombining them into a single PDF (using PDF Arranger again).
You could try using the pdfCropMargins Python program (https://pypi.org/project/pdfCropMargins/) with the -pg option to select the particular page. The command-line program offers many options, and also has an optional GUI.
You can use Inkscape to losslessly crop PDFs. This uses Inkscape's built-in SVG-PDF conversion.
Open your file in Inkscape: File -> Open -> select your file -> Open
Resize PDF:
Using user-input values: File -> Document properties -> Page -> Custom size
Using auto resize to content: File -> Document properties -> Page -> Custom size -> Resize page to content... -> set desired margin -> Resize page to drawing or selection
Inkscape is a particularly good option as often PDF crop utilities (such as krop, mentioned in other answers) do not change the actual size of the object, instead adjusting how much of the object (e.g. an A4 page) is displayed.
E.g. from krop homepage:
Unfortunately, there is no simple way to eliminate
unnecessary/invisible parts of a PDF file. krop only adjusts which
parts of a PDF are displayed; the original content is still there in
the file and will, for instance, show up when editing the file in
inkscape
Editing directly in Inkscape does exactly what this says is impossible.
The list of tools provided by #sparkler was interesting, but did not help me very much.
Some of the tools provided, actually cropped my pages, but usually they involved some conversion to an image which made pdf files blurry and hard to read.
In the end I used podofocrop of PoDoFo tools which was able to retain all the graphics at full resolution and the text as real text.
It will crop all pages to the minimal size (i.e. without a border).
The command is: podofocrop input.pdf output.pdf
To install on MacOS use brew install podofo
What are the alternatives to process illustrator files or PDFs into XAML. My Current workflow works like this:
Open the PDF file in Adobe illustrator
Save the file as .ai (Adobe Illustrator) file
Open in Expression Design
Do some processing, mainly separating elements to layers and removing unneeded parts.
Save as XAML
Add XAML to Blend project
My only problem is that this way the text gets converted to paths. I would like to keep my text in XAML as well instead of paths.
Is there any other way to do this, so I keep the text? Any other tools?
I think what you want is to have Glyphs elements instead of Paths.
The problem is that Glyphs elements require you to specify the URI of the font file. Also, Glyphs elements reference glyphs by their index into a font file (it may happen that a converter that generates Glyphs elements - like the Microsoft XPS Document Writer - uses indices into font subset files: so these indices may not be the right indices to the same glyphs as defined in the original font file). I have been able to "solve" this problem in two ways with my own PDF to XAML conversion tools.
1. approach: Embed the font-subset file, BASE64 coded, in the generated XAML code and have the application implement a class that, upon loading, extracts and decodes an embedded font-subset file to a temporary location and hands a valid URI to that temporary file back to the XAML loader.
or, 2. approach: Have most font files already installed along with my application and, again, adding some support by my application that replaces the font name by an URI to the installed font file upon loading of the XAML code. The problem with this second approach is that glyph indices need to be correctly mapped to the installed font file, which may not be all that trivial to do. (You can find a link to an example file that has been generated for this way of loading on my blog: in particular take a peek at the file truncatedcone-xaml.txt)
In short: both solutions require a special PDF to XAML converter and support by the loading application. The reason I wanted to do it this way instead of just having my PDFs converted to Paths only is that my application is a shared whiteboard: thus I want my vector graphics to be as small as possible. (Conversion to paths tends to blow up the XAML code by a factor of 10 or more in most cases).
I am contemplating the implementation of a third approach: this would consist in generating the outline for every glyph that is used only once and then add support by my application to transform and position these glyph outlines in a way closely analogous to what Glyphs elements do that would otherwise have to be generated. The advantage would be that the generated XAML would still be relatively small (comparable to the second approach described above) without requiring the relevant font files to be installed along with the application and without having to map glyph indices from a subset file to the installed font file. The reason I have not yet tried to implement this in earnest is twofold: first, my current (second) approach already works very well for what I currently need; second, there might be performance problems with this third approach as reagards loading and / or rendering.
There's a (free) Adobe Illustrator plugin to export to XAML. Not sure it does exactly what you are looking for, though.
Find it at http://www.mikeswanson.com/XAMLExport/
Well an XPS file is actually a ZIP file. So if you open it with a ZIP-archiver or if you rename its extension to ZIP you can see what is inside. It already contains the pages as XAML code (those files have the form [pagenumber].fpage). However, that XAML code may refer to other files (like raster images and font subset files, those are typically odttf files - basically encrypted true type files) that are included in that ZIP archive as well. Which means, that the XAML code that you find in an XPS document may not be directly usable as pure XAML in your application. I have written python scripts to do the conversion of XAML taken from XPS documents (generated by the Microsoft XPS Document Writer) to get XAML files that my application can load (see approaches 1 and 2 above). I could send you copies of those python scripts (they are not particularly great code, which is no problem for me since I am now using a different approach to convert PDFs to XAML anyway).
#gyurisc: Keeping the font file should work but keeping the text might turn out to be a problem, because, you see, glyphs are not characters. It might be that you could figure out the character by examining the font file that a given glyph is part of, but that would involve parsing the font file. If you are unlucky, your PDF to XPS converter does even not keep enough information in the font subset files to figure out the character a given glyph (very likely) represents.
For example: If I convert a PDF file to XPS with the help of Microsoft's XPS Document Writer, and then try to select a piece of text from that XPS document, I can (only apparently) copy it to the clipboard. However, if I then paste it back into a Word document, I get garbage. Whereas if I select that same piece of text in the original PDF document and paste it into the same Word document, I get reasonably meaningful text. So Microsoft's XPS Document Writer apparently does not care about the interpretation of a "glyph run" as text, and thus it seems very likely to me that the link between the glyph indices that one finds in the generated XPS code and the characters they are meant to represent is already broken at that point. (But, admittedly, that's just a guess.)
A representation of text (as opposed to a run of glyphs) would be a TextBlock element in XAML, I suppose. However, my guess is that a typical PDF to XPS converter is unlikely to generate TextBlock elements. XPS is mainly meant to be rendered - on screen or on paper - it doesn't suggest itself as a file format that is particularly suitable for data exchange (exchange of text in your case).