I have an SVG file of a bar plot that I need to convert to a PDF. The bar plot was made in matplotlib, saved as a PDF and imported into Inkscape. I used Inkscape to add annotations to the figure and then export it back to a PDF to be used in a final document.
This is what the PDF file looks like going into Inkscape
After adding text elsewhere on the figure and saving as a PDF I get the same plot with these white lines:
These are not your typical PDF render artifacts, rather a closer inspection shows that they have a gradient to them.
I think this is somehow a product of the SVG file. I have used an online SVG-to-PDF converter and the lines are still present. Additionally, I use this workflow (Matplotlib to Inkscape to PDF) for all my figures, and I have not had this issue with any other figure.
I've found that Inkscape does this when you import a bar graph whose fill (shading) type does not match any of the preset Inkscape patterns. I've seen this exact issue when importing graphs from the R programming language and from Excel, so I don't think it's specific to Matplotlib. I don't know the root cause. However, since I run into this problem a lot, I'll share the workarounds I typically use. No one option is necessarily better than another; which one I use depends on the situation.
Option 1) Convert the PDF to a .png bitmap image in some other program (Gimp, Photoshop, PowerPoint, ...), then embed the image in Inkscape. Make your changes, then export from Inkscape as a PDF. The disadvantage is that the graph will no longer be a vector graphic. Use option 2 or 3 to keep it as vector graphics.
Option 2) Import the PDF into Inkscape, ungroup the PDF object, delete the striped fill in the bar graph, then recreate the fill using one made in Inkscape. In the worst cases I've actually made custom bar graph patterns in Inkscape to exactly match the pattern I had before. This process is a pain.
Option 3) Create shapes that cover over the artifacts, remove border lines from the shapes and use the eye dropper to make them exactly the same color as the good parts.
As I said, these are workarounds rather than a real understanding of the root cause that would let you avoid the problem, but I hope they help you accomplish your task.
Related
I am using python to create a bunch of plots of my data and then Inkscape to layout individual plots and schematics in panels of a master figure. I also add some extra elements such as panel names and titles.
Every time I modify something in my Python code, I have to manually paste the svg/pdf plots into Inkscape. I have noticed that if I create the plots as raster images instead of vectors, I can insert a link in the Inkscape document, and have the figure panel update every time I regenerate the plots in python, which is phenomenal!
Currently I am using this approach with a high DPI, but ideally, I would like to insert a link to a svg/pdf file so that the entire figure is a vector instead of a high DPI, big sized raster. I have seen that it is possible to include pdfs (don't think svgs work) in this manner in Adobe Illustrator .ai-files, and I wonder if there is any way to do it in Inkscape as well?
When I insert image links, Inkscape creates a tag in the svg similar to this
<image
sodipodi:absref="path/to/image.png"
xlink:href="./image.png"
y="14.014872"
x="5.9285898"
id="image12160"
preserveAspectRatio="none"
height="441.91879"
width="466.55328" />
I can modify the path to point to an svg file, but Inkscape will automatically convert the svg to a low resolution raster. If I change the path to a pdf, I get an error. Is there anything I can modify in the svg-code to be able to link pdf/svg-files and have them render in Inkscape as vector files?
I want to create a visualization of a matrix for some academic work. I decided to go about this by having the pixels in the image correspond to the values in the matrix. I created the nice small png that follows:
When properly scaled up, you get a very reasonable image:
This is a screenshot from within Inkscape. However, when I export this as a PDF, both evince and Chrome do a terrible job of upscaling what should be very trivial, and instead I get something that looks like:
The pdf itself seems to scale appropriately well for printing, but unfortunately I do a lot of my editing without printing, and this looks unacceptable. I did find this incredibly old thread about people seeming to have a similar issue with chrome's pdf viewer, and the "solution" was to just upscale the raster graphics. This is a solution, but is terribly inefficient.
Is anyone aware of a way to change the pdf so that it gets upscaled appropriately? Maybe a config change in evince or chrome that will render these properly? Even a nice way to go from a raster image to a vector image might be suitable?
The comments aggregated into an answer...
An image dictionary in a PDF has an (optional) boolean entry Interpolate. It is specified as a flag indicating whether image interpolation shall be performed by a conforming reader.
The program used by the OP to create the PDF, Inkscape, seems to have explicitly set this flag to true. Editing the PDF to unset this flag creates a file which looks as desired by the OP.
(Another solution, proposed in an Inkscape forum thread that the OP eventually found, is to save the PDF with high-resolution bitmaps embedded: File -> Inkscape Preferences -> Bitmaps -> Resolution for Create Bitmap Copy, set to 6000 dpi.)
The fact that interpolation looks different in different viewers and on different output media is by design. The PDF specification states on interpolation:
A conforming Reader may choose to not implement this feature of PDF, or may use any specific implementation of interpolation that it wishes.
A different way to get around this problem (especially as some PDF viewers have the tendency to not really live up to the specification and e.g. interpolate ignoring that flag) would be to use vector graphics here, drawing the bitmap pixels as rectangles. The result should be optimal.
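As a rough sketch of the rectangle idea, the snippet below writes one SVG rect per matrix entry, so there is no raster image left for a viewer to interpolate. The function name matrix_to_svg, the cell size and the grayscale mapping are illustrative assumptions, not something taken from the original thread.

```python
def matrix_to_svg(matrix, cell=10):
    """Return SVG markup that draws each matrix entry as a filled square.

    Hypothetical helper: values are mapped linearly to gray levels,
    with the smallest value black and the largest value white.
    """
    flat = [v for row in matrix for v in row]
    if not flat:
        raise ValueError("matrix must contain at least one value")
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero for a constant matrix
    rows, cols = len(matrix), len(matrix[0])

    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{cols * cell}" height="{rows * cell}" '
        'shape-rendering="crispEdges">'
    ]
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            gray = round(255 * (value - lo) / span)
            parts.append(
                f'<rect x="{c * cell}" y="{r * cell}" '
                f'width="{cell}" height="{cell}" '
                f'fill="rgb({gray},{gray},{gray})"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the returned string to a .svg file gives a document that can be opened in Inkscape and exported to PDF like any other vector drawing. For large matrices the file grows with the number of cells, so this is probably only practical for small images like the one in the question.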
I need to crop a PDF document using the Linux shell and then extract the text from just that cropped PDF.
My idea was to crop the PDF with the pdfcrop Linux tool and then use a txt2pdf text extractor tool to extract the text in just the cropped area. But I've realized that I was thinking in terms of images: when I try this, the result is the same as running it on the original, uncropped PDF.
I guess it's a layer problem. Since the PDF format works with layers, if I don't "crop" all the layers, the result will include the information from all of them, which I don't want.
I would really appreciate any idea of how to do a real "all layers" crop of a PDF, or whether that is possible at all or I should start thinking about another solution.
TY
It's not layers, it's the fact that cropping a PDF usually involves simply setting the CropBox, which doesn't alter the actual contents of the PDF (other than the CropBox) at all. Most text extraction code will ignore the CropBox and extract all the text....
You could, with some effort, use Ghostscript to produce a genuinely cropped PDF (though note that partially cropped glyphs will still be included) and then extract the text from that. But that's pretty ugly.
Alternatively Ghostscript and MuPDF can both extract text with co-ordinate information, which may be enough for your needs.
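Once text has been extracted together with coordinates, the "crop" can be applied afterwards as a simple filter. The sketch below assumes the extraction tool's output has already been parsed into spans; the Span tuple, its field names and the midpoint rule are hypothetical choices for illustration, not the output format of Ghostscript or MuPDF.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Span(NamedTuple):
    """A piece of text with its bounding box (hypothetical structure)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def text_inside(spans: Iterable[Span],
                crop: Tuple[float, float, float, float]) -> List[str]:
    """Keep the text of spans whose midpoint lies inside the crop rectangle.

    crop is (x0, y0, x1, y1) in the same coordinate system as the spans.
    """
    cx0, cy0, cx1, cy1 = crop
    kept = []
    for span in spans:
        mid_x = (span.x0 + span.x1) / 2
        mid_y = (span.y0 + span.y1) / 2
        if cx0 <= mid_x <= cx1 and cy0 <= mid_y <= cy1:
            kept.append(span.text)
    return kept
```

Using the midpoint is only one possible policy. Requiring the whole box to lie inside the rectangle would instead drop text that straddles the crop edge.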
I need to extract vector graphics from a PDF image and import them into GIMP, either as paths or as high-resolution raster images. Specifically, I need to get contour lines from USGS topographical maps and overlay them on satellite images. Any suggestions?
So far I have tried:
--Using GIMP's native PDF importing function to import them as raster images. Problem: To do so at high resolution crashes my computer. Possible solution would be to import only a selected area of a PDF, but as far as I can tell this is not possible.
--Using ImageMagick to convert the PDF to a raster image. Problem: Used with the "-scale" parameter, "convert" appears to rasterize the PDF and then upscale it, leading to a choppy image.
--Using Inkscape to extract the necessary vector elements from the PDF. Problem: Inkscape freezes when I try to open a moderately large (25 MB) PDF.
Any other ideas?
Many thanks,
treacl
The option you didn't mention above is to try to use the ghostscript program directly to render your output - ghostscript is used internally by GIMP to import PDF files, so you likely have it installed already.
There are dozens of command-line switches you can pass to ghostscript to make it render a file into another format. The ones you need determine the output size, the resolution and which page to print. I didn't find any switch to select only a portion of the page to be rendered, so if your document is a single page, the generated file may still be too big for GIMP. You will likely be able to crop it with ImageMagick, though.
I guess the relevant command line for you would be something along:
gs -dNOPAUSE -dBATCH -sDEVICE=png16m -sOutputFile=page.png -dFirstPage=<pagenumber> -dLastPage=<pagenumber> -r<dpiresolution> -f<filename.pdf>
If the resulting image is still too large to be generated or operated upon, you can try changing the output format to use a smaller color depth (this one, png16m, is 3 bytes per pixel). It should be possible to pass PostScript commands to transform the device so that the area of interest is scaled up to your page size (and the remaining parts are cropped out of the rendering). That would be the definitive fix for you, but off the top of my head I don't know how to do that with ghostscript.
Alternatively, you can try passing ImageMagick the -density parameter as suggested in the comments.
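To get a feel for why a high-resolution render can overwhelm GIMP, the sketch below estimates the uncompressed size of the raster and assembles the ghostscript command line shown above. The helper names and the page dimensions used in the usage note are made-up illustrations; only the switches themselves come from the command in this answer.

```python
def raster_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Estimate the uncompressed size in bytes of a page rendered at dpi.

    bytes_per_pixel=3 corresponds to 24-bit color such as png16m.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def gs_render_command(pdf_path, page, dpi,
                      out_path="page.png", device="png16m"):
    """Build the ghostscript argument list for rendering one page.

    Hypothetical helper: it only assembles the arguments and does not
    run ghostscript itself.
    """
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        f"-sDEVICE={device}",
        f"-sOutputFile={out_path}",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-r{dpi}",
        f"-f{pdf_path}",
    ]
```

For example, a hypothetical 24 x 30 inch sheet at 600 dpi gives raster_bytes(24, 30, 600) = 777,600,000 bytes in 24-bit color, and a third of that at one byte per pixel (for instance with ghostscript's pnggray device). The list returned by gs_render_command can be passed to subprocess.run if ghostscript is installed.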
I'm trying to use pdf content (mathematics) in my webpage. I basically want to convert the pdf to some vector image. Converting the pdf to swf does the job very well, but as flash isn't supported on every platform, I'm trying to find another solution.
I read about svg, but as those pdf's contain a lot of mathematics, the result of the converters I found is really ugly and incorrect.
I've also thought about retyping the latex, and displaying it using mathjax, in some way this is the best solution, but also very time consuming.
The only thing I want is to convert it to a nice vector image, I don't want to change the content, or anything else. Besides converting to swf or retyping it, is there any other solution ?
Edit:
this is svg output
and here original pdf
The only solution I could find is illustrator.
Just open the pdf, save as svg, and choose to embed all used glyphs.
Result is perfect:
https://dl.dropboxusercontent.com/u/58922976/Sol-10.1.svg
What about using Flash, with a raster image as a fallback on platforms without Flash, if Flash mostly works for you?
Your PDF is a little difficult for reasons that are probably not apparent to you.
The core problem with it is that some of the graphics in the document are actually drawn using custom glyphs. You can see this if you copy and paste the text out of Acrobat. There are a variety of unusual characters in there that don't seem to serve any useful purpose. That's those squares at the bottom of your SVG with EEs and FFs in them.
However these characters are actually custom glyphs for things like the braces around the matrices at the bottom of the page. So they are both fairly important and also very specific to this document.
I tried ABCpdf .NET to convert your PDF to SVG. It worked fine apart from these custom glyphs at the bottom. The output was about 90KB. It looked very similar to your inkscape SVG output but just a bit smaller (the inkscape one is 160KB).
The only way to get rid of these non-Unicode glyphs is to vectorize the text. I did this using ABCpdf and the output looked fine in SVG. But... vectorized text is big and SVG isn't a particularly efficient medium. The output was about 1MB! Zipped it goes down to half that, but it's still nowhere near as efficient as the original PDF.
The problems I am seeing here are going to be universal whatever format you use. These custom characters are always going to be problematic whether you output to SVG, SWF, HTML canvas, VML or indeed any vector format.
So what would I suggest? Well the obvious vector format that is widely used on the web is... PDF!
I know it's not quite what you're looking for but I think this is the realistic solution given the constraints above. :-)