Extract pdf_tex graphics from pdf-file - pdf

I have created a lot of vector graphics with Inkscape, in which I put a lot of math code directly into the image (see example below). I normally export a pdf_tex from the SVG document and include the pdf_tex in my LaTeX code. After compiling the file in my TeX editor, a perfect vector graphic appears.
My question:
I would like to make a presentation that uses some images from my LaTeX file. Is there any way to extract the compiled pdf_tex images from the compiled PDF file, or maybe directly out of Inkscape?

Related

Embedding PDF graphics in PDF output file programmatically

I am looking for a rough overview of how one would go about embedding graphics (coming from a PDF file) into another PDF file when writing a C++ document processor.
Background: I work on the LilyPond music typesetter, and recently added Cairo output to the system. Now I would like to support adding externally provided graphics to the PDF files that we generate (e.g. adding a logo onto a laid-out page). This is trivial with EPS for PS output.
I can see how you could hook up Poppler to read the PDF and render the PDF contents onto a Cairo surface, but I wonder if there is a simpler shortcut (e.g. embed the PDF file as a binary stream, and then point directly to that stream).
If you can go via an external route, like reading the PDF and drawing it into the existing PDF using Cairo, that would be simpler. To do it manually:
A PDF page consists of a stream of operators for drawing it, and a dictionary of external resources (fonts, images etc.). To stamp one PDF page onto another, you would need to:
a) Find all objects for external resources in the stamp which are needed, and add them to the destination PDF.
b) Convert the page to a "Form XObject", which is a sort of reusable piece of content. Add this to the /XObject entry in the resources of the destination page, making sure to pick a fresh name.
c) Add some operators to the page content in the destination page to invoke the new XObject.
To see how this might work, you could play with -stamp-as-xobject and -postpend-content "/XObjName Do" from section 8.4 of the cpdf manual.
Making this work for arbitrary PDFs is really not for the faint of heart, I'm afraid.
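As a rough illustration of steps b) and c) only, here is a minimal Python sketch that picks an unused XObject name and builds the operators to append to the destination page's content stream. The function names and the "Stamp" prefix are made up for this example, and nothing here copies resources or writes a PDF; it only shows the shape of the `q ... cm /Name Do Q` sequence that paints a Form XObject.

```python
def fresh_xobject_name(existing_names, prefix="Stamp"):
    """Return a name such as /Stamp0 that is not yet used in the page's /XObject dictionary."""
    n = 0
    while f"/{prefix}{n}" in existing_names:
        n += 1
    return f"/{prefix}{n}"


def stamp_operators(name, scale=1.0, tx=0.0, ty=0.0):
    """Build content-stream operators that paint the Form XObject `name`.

    q / Q save and restore the graphics state, cm sets a scale-and-translate
    matrix, and Do invokes the XObject.
    """
    return f"q {scale:g} 0 0 {scale:g} {tx:g} {ty:g} cm {name} Do Q"


# Hypothetical destination page that already uses /Stamp0 and /Im1
name = fresh_xobject_name({"/Stamp0", "/Im1"})
print(stamp_operators(name, scale=0.5, tx=72, ty=36))
```

Wrapping the call in q/Q keeps the transformation from leaking into whatever content follows on the page.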

How to convert a "pdf" to "odg" file with OpenOffice cmd

I can easily convert a pdf to an odt file using:
soffice --infilter="writer_pdf_import" --convert-to odt a.pdf
But when I try to do:
soffice --infilter="writer_pdf_import" --convert-to odg a.pdf
I get an error:
no export filter
TL;DR: the answer is at the bottom, but do read the following to see why there can be issues.
ODG is a multi-part graphics file, usually built from a blank template and often similar to an ORA. There are many ways such files can be structured and converted to a set of PDF page printouts, as they contain thumbnails plus one or more high-resolution images or scalable vector graphics. Common variants can be used with Inkscape, Krita, possibly Scribus or OODraw, and other more graphics-oriented apps.
PDF is a page-oriented document output format and thus not a suitable candidate for converting to professional images with scalable graphics (except as noted in the last comment).
Converting ODG or ORA to an image may work well, but the reverse is not usually true.
An OpenOffice graphic is like a DOCX: a zip wrapper around a core object. Here it is a JPEG, but it could be PNG, SVG, etc.
However, the contents of the zip are not simple, potentially running to thousands of lines of code. Thus you need a more appropriate method to hand-build an ODG, not a simple command-line conversion from a cruder PDF.
The real strength of an export from Draw as PDF is the hybrid option of embedding the ODG content, so that when you open such a PDF you can edit it in Draw.
It will look just as good in any PDF viewer. However, it is too specialist to be simply translated without the app settings. In reality the PDF is the chimera/polyglot ODG.
But if you wish to try with simple files, the command line to convert a.pdf to a.odg is:
soffice --infilter="draw_pdf_import" --convert-to odg a.pdf
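If you want to script this, a small Python sketch could build the argument list and choose the import filter to match the target format. The helper name is hypothetical; the filter names and flags are only the ones quoted in this thread, and whether the conversion succeeds still depends on the PDF.

```python
def soffice_convert_args(pdf_path, target="odg"):
    """Build the soffice argument list for converting a PDF to ODG or ODT."""
    # The import filter has to match the application that owns the target
    # format: Draw for ODG, Writer for ODT (filter names as used above).
    infilter = {"odg": "draw_pdf_import", "odt": "writer_pdf_import"}[target]
    return ["soffice", f"--infilter={infilter}", "--convert-to", target, pdf_path]


print(" ".join(soffice_convert_args("a.pdf")))
```

To actually run the conversion, pass the list to `subprocess.run(args, check=True)`; this assumes `soffice` is on your PATH.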

White gradient artifacts left over after converting an SVG file to PDF

I have an SVG file of a bar plot that I need to convert to a PDF. The bar plot was made in matplotlib, saved as a PDF and imported into Inkscape. I used Inkscape to add annotations to the figure and then exported it back to a PDF to be used in a final document.
This is what the PDF file looks like going into Inkscape
After adding text elsewhere on the figure and saving as a PDF I get the same plot with these white lines:
These are not your typical PDF render artifacts, rather a closer inspection shows that they have a gradient to them.
I think this is somehow a product of the SVG file. I have used an online SVG-to-PDF converter and the lines are still present. Additionally, I use this method (Matplotlib to Inkscape to PDF) to make all my figures, and I have not had this issue with any other figure.
I've found that Inkscape does this when you import a bar graph whose shading type is not the same as any of the preset Inkscape patterns. I've seen this exact issue when importing graphs from the R programming language and Excel, so I don't think it's specific to Matplotlib. I don't know the root cause. However, since I experience this problem a lot, I'll share the workaround options I typically employ. One is not necessarily better than another; which one I use depends on the situation.
Option 1) Convert the PDF to a .png bitmap image in some other program (Gimp, Photoshop, PowerPoint, ...), then embed the image in Inkscape. Make your changes, then export from Inkscape as a PDF. This has the disadvantage that the graph will no longer be a vector graphic. Use option 2 or 3 to keep it a vector graphic.
Option 2) Import the PDF into Inkscape, ungroup the PDF object, delete the striped fill in the bar graph, then recreate the fill using an Inkscape-made fill. In the worst cases I've actually made custom bar graph patterns in Inkscape to exactly match the pattern that I had before. This process is a pain.
Option 3) Create shapes that cover the artifacts, remove the border lines from the shapes and use the eye dropper to make them exactly the same color as the good parts.
Like I said, these workarounds don't come from an academic understanding of the problem or of how to avoid it, but I hope they can help you accomplish your task.

PDFClown image extraction images inverted

I'm working with PDFClown and I'm trying to extract images from a PDF file. I use the example code ImageExtractionSample.java that ships with the source code, which can be found at http://pdfclown.org.
The problem is the images are negative and flipped horizontally. Does anyone know how to resolve this problem?
Check other PDF files to see whether they also give rotated or flipped images. ImageExtractionSample.java does not check the rotation or matrix-defined transformations for the image object; it just writes the content to a file as is (so it will work for JPG images but not for CCITT-encoded images, for example).
So there are things to consider when you extract image from PDF:
image can be rotated using the attached transformation matrix (CTM);
image can be rotated/transformed as part of the form which is transformed;
image can be placed without transformation on a page but the page itself is rotated;
image may contain the overlaid Mask on top of it (and the Mask can be rotated and transformed);
JPG image is stored pretty much as is, but there are other formats supported by PDF, like CCITT compression, LZW-compressed images etc.
But the general suggestion is that when you extract a JPG image from a PDF using PDFClown, you should just flip and rotate the extracted images, as suggested on the SourceForge project discussion page.
If you could point to the particular PDF sample file, it would be easier to suggest a solution.
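For the specific symptom in the question (negative and mirrored horizontally), the post-processing can be sketched in plain Python as below. This assumes the extracted image has already been decoded into rows of 8-bit grayscale values; the function name is invented for the example and it does not use PDFClown at all. Rotation or a vertical flip would need a different transform, depending on what the transformation matrix in the PDF actually says.

```python
def undo_negative_and_mirror(rows):
    """Mirror each row left-to-right and invert every 8-bit sample.

    `rows` is a list of rows, each a list of ints in 0..255 (grayscale).
    """
    return [[255 - value for value in reversed(row)] for row in rows]


# Tiny 2x2 example image
extracted = [[0, 255],
             [10, 20]]
print(undo_negative_and_mirror(extracted))
```

Applying the function twice gives back the original pixels, which is a quick sanity check that the correction itself loses no data.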
If you're on Windows then you may use this free PDF Multitool utility to compare non-transformed and transformed images from PDF using "Extract raw images (without transformation)" option in images extraction dialog.
Disclaimer: I work for ByteScout, the PDF Multitool utility is free for both commercial and non-commercial purposes.

Import vector graphics from PDF to GIMP

I need to extract vector graphics from a PDF image and import them into GIMP, either as paths or as high-resolution raster images. Specifically, I need to get contour lines from USGS topographical maps and overlay them on satellite images. Any suggestions?
So far I have tried:
--Using GIMP's native PDF importing function to import them as raster images. Problem: Doing so at high resolution crashes my computer. A possible solution would be to import only a selected area of a PDF, but as far as I can tell this is not possible.
--Using ImageMagick to convert the PDF to a raster image. Problem: Used with the "-scale" parameter, "convert" appears to rasterize the PDF and then upscale it, leading to a choppy image.
--Using Inkscape to extract the necessary vector elements from the PDF. Problem: Inkscape freezes when I try to open a moderately large (25 MB) PDF.
Any other ideas?
Many thanks,
treacl
The option you didn't mention above is to try to use the ghostscript program directly to render your output - ghostscript is used internally by GIMP to import PDF files, so you likely have it installed already.
There are dozens of command switches you can pass to ghostscript to render a file into another format - the ones you need determine the output size, the resolution and which page to print. I didn't find any switch to select a portion of the page to be rendered - so, if your document is a single page, the generated file may still be too big for GIMP - but you will likely be able to crop it with ImageMagick, at least.
I guess the relevant command line for you would be something along these lines:
gs -dNOPAUSE -dBATCH -sDEVICE=png16m -sOutputFile=page.png -dFirstPage=<pagenumber> -dLastPage=<pagenumber> -r<dpiresolution> -f<filename.pdf>
If the resulting image is still too large to be generated or operated upon, you can try changing the output format to use a smaller color depth (this one, png16m, is 3 bytes per pixel). It should be possible to pass PostScript commands to transform the device, so that the area of interest is scaled up to your page size (and the remaining parts are cropped out of the rendering) - that would be the definitive fix for you - but off the top of my head, I don't know how to do that with ghostscript.
Alternatively, you can try passing ImageMagick the -density parameter as suggested in the comments.
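If you end up trying several resolutions or devices, the ghostscript command line above can be assembled with a small Python helper like this sketch. The helper name and defaults are my own; the switches are the ones from the command above, except that the input file is passed as a plain trailing argument instead of via -f. pnggray is ghostscript's 8-bit grayscale PNG device, i.e. one byte per pixel instead of three.

```python
def gs_render_args(pdf_path, page, dpi, out_path="page.png", device="png16m"):
    """Build a ghostscript argument list that renders one page of a PDF to PNG."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        f"-sDEVICE={device}",        # png16m = 24-bit color, pnggray = 8-bit gray
        f"-sOutputFile={out_path}",
        f"-dFirstPage={page}",       # render only this page
        f"-dLastPage={page}",
        f"-r{dpi}",                  # resolution in dots per inch
        pdf_path,
    ]


print(" ".join(gs_render_args("map.pdf", page=1, dpi=600, device="pnggray")))
```

Run it with `subprocess.run(args, check=True)`, assuming `gs` is on your PATH; "map.pdf" is just a placeholder file name.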