.png and .eps files generated by matplotlib won't print on RHEL 5 - matplotlib

I'm using matplotlib.pyplot to plot some data, and after running plt.show() I save the image as either a PNG image or encapsulated postscript.
When I open these saved files with evince and try to print them, a job is sent to the printer but nothing is printed. The plots display on the screen with no problems.
Is there something specific I need to do in matplotlib to generate printable PNGs and EPSes? Is this a bug in matplotlib?

Here are a few steps to start debugging this situation:
(1) Stop your cupsd (CUPS daemon).
(2) In cupsd.conf, set LogLevel debug (instead of LogLevel info).
(3) Delete your log file /var/log/cups/error_log.
(4) Start your cupsd again.
(5) Print your problem EPS.
(6) Check the log file /var/log/cups/error_log for errors.
Report the errors here. (Errors may be recognized by the prefixed E on each line.) Also, what is the result of the Auto-typing step?
Alternatively: can you provide a link to a sample of your problem EPS?
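Once the job has been printed with debug logging on, the error lines can be pulled out of the log with a few lines of Python. This is only a sketch: it assumes the usual error_log layout, where each line starts with a one-letter level followed by a space, and the helper names are made up for illustration.

```python
def cups_error_lines(log_lines):
    """Return the lines that CUPS marked as errors (level letter 'E')."""
    return [line.rstrip("\n") for line in log_lines if line.startswith("E ")]


def read_cups_errors(path="/var/log/cups/error_log"):
    # Reading the CUPS log usually requires root privileges.
    with open(path, errors="replace") as log:
        return cups_error_lines(log)
```

Calling read_cups_errors() after step (5) returns the lines to report here.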

Related

org-mode inline images not working (remotely with TRAMP)

I am working with emacs org-mode on a remote machine using TRAMP. I connect code cells to a jupyter server (on that remote machine) where I start a python 3 kernel. Code execution works perfectly fine, I can also create plots with matplotlib. While a .png is generated in the right temp file location, the output of the code cell is a (relative) link to the file without displaying it inline as expected.
An example code block looks like this:
#+BEGIN_SRC jupyter-python :session /jpy:localhost#9090:TEST
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(range(10), range(10))
#+END_SRC
#+RESULTS:
:RESULTS:
| <matplotlib.lines.Line2D | at | 0x7f1c43a289a0> |
[[file:./.ob-jupyter/e1eecf5d59de9bfa1d3468867a64aadf4b1a6261.png]]
:END:
C-c C-x C-v gives the message: 'No images to display inline'
C-c o opens and displays the file in a different buffer correctly.
I would expect the file to display correctly inline in the org-mode buffer.
I tried to change the link manually to a TRAMP path, looking something like this:
[[file:/ssh:MYSERVER:/PATH_TO_TEMP_FOLDER/.ob-jupyter/e1eecf5d59de9bfa1d3468867a64aadf4b1a6261.png]]
which also allows me to open the file with C-c o, but won't display the file inline (same behavior as described above).
If I open the jupyter-repl session directly and type fig (after executing the above code block), the figure displays as expected in the jupyter-repl buffer.
If I run the jupyter session locally, inline plotting works as expected.
Update:
I realized if I C-f on the link to open the file, the link expands to an invalid tramp link, which throws the following error message:
File is missing: /ssh:bih:/PATH_TO_CORRECT_FOLDER/00_test/file:./.ob-jupyter/
Note the file:./ at the end of the link that doesn't belong there. So I think something is going wrong somewhere between TRAMP and org-mode (or emacs-jupyter). Any ideas how to fix this?
C-h v org-display-remote-inline-images says:
org-display-remote-inline-images is a variable defined in ‘org.el’.
Its value is ‘skip’
How to display remote inline images.
Possible values of this option are:
skip Don’t display remote images.
download Always download and display remote images.
cache Display remote images, and open them in separate buffers
for caching. Silently update the image buffer when a file
change is detected.
Check the value and maybe customize the variable to do something other than skip.

Crop PDF Content

I have a pdf that I would like to impose. It has 8.5x11" pages, media box, and crop box. I want the pdf to have 17x11" pages, by merging adjacent pages. Unfortunately, most pages have content either completely outside or straddling the crop box. Because each page can only have a single stream and crop box, when imposed, the overlapping content becomes visible. This is bad.
I don't want to rasterize my pdf because that would fix the DPI ahead-of-time. So I won't consider exporting pages as images, appending the images (imagemagick), then embedding these paired images into a new pdf.
I've also had problems imposing in postscript - issues with transparency, font rasterization, and other visual glitches during the pdf->ps->pdf conversions.
The answer should be scriptable.
So far I've tried:
podofo imposition scripts (lua)
PyPDF2 (python)
ghostscript
latex
The question "Ghostscript removes content outside the crop box?" suggests that ghostscript's pdfwrite module, when generating an output pdf file, will rasterize and crop content according to the crop box. So I'd only have to pipe my pdf through ghostscript's pdfwrite module. Unfortunately, this doesn't work.
I was about to give up when I tried printing the pdf to another pdf through evince. It works perfectly - text & vector elements within the crop box are not rasterized, and elements outside the crop box are removed (I haven't tested straddling elements yet). The quality is high - resolution (page size) and appearance are identical. In fact, everything seems to be the same except for the metadata.
So:
the question is possible
the answer already exists
How can I access it?
I think this functionality might be provided by the pdftopdf binary that ships with CUPS. I don't have any problems calling an external binary, but I can't figure out how to use pdftopdf.
Edit: Link to test pdf. It contains raster, vector, and text items - some partially occluded by partially transparent items - that span as well as abut adjacent pages. Once again, printing this PDF through cups appears to crop all content outside the crop box. However, opening the filtered pdf in inkscape shows that the off-page items are individually masked, not cropped - except text, which is trimmed.
The trick is to use Form XObjects to impose multiple pages within a single page. Form XObjects can reference entire PDF pages, and maintain independent clips. PyPDF2 doesn't support Form XObjects, so merging unifies the stream of all input pages such that they share the clip/media box of the output page. I've been successful in using both pdflatex and pdfrw (python) - test programs are inlined below. Since Form XObjects are derived from a similar postscript level 2 feature, as suggested by KenS it should be possible to achieve the same goal in ghostscript using "page clips". In fact he shared a ghostscript 2x1 imposition script in another answer, but it appears horrendously complicated. Combined with the font rasterization issues of poppler's pdftops (even with compatibility level > 1.4), I've abandoned the ghostscript approach.
Latex script derived from How to stitch two PDF pages together as one big page?. Requires pdflatex:
\documentclass{article}
\usepackage{pdfpages}
\usepackage[paperwidth=8.5in, paperheight=11in]{geometry}
\usepackage[multidot]{grffile}
\pagestyle{plain}
\begin{document}
\setlength\voffset{+0.0in}
\setlength\hoffset{+0.0in}
\includepdf[ noautoscale=true
, frame=false
, pages={1}
]
{<file.pdf>}
\eject \paperwidth=17in \pdfpagewidth=17in \paperheight=11in \pdfpageheight=11in
\includepdf[ nup=2x1
, noautoscale=true
, frame=false
, pages={2-,}
]
{<file.pdf>}
\end{document}
pdfrw (python script) derived from pdfrw:examples:booklet. Requires pdfrw >= 0.2:
#!/usr/bin/env python3
# Copyright:
#   Yclept Nemo
#   2016
# License:
#   GPLv3
import itertools
import argparse
import pdfrw
# from the itertools recipes in the python documentation
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
def pagemerge(page, *pages):
    merged = pdfrw.PageMerge() + page
    # keep only the trailing run of pages that are not None
    for page in reversed(list(itertools.takewhile(lambda i: i is not None, reversed(pages)))):
        merged = merged + page
        # place each added page directly to the right of the previous one
        merged[-1].x = merged[-2].x + merged[-2].w
    return merged.render()
parser = argparse.ArgumentParser(description='Impose PDF files using Form XObjects')
parser.add_argument\
    ( "source"
    , help="PDF, source path"
    , type=pdfrw.PdfReader
    )
parser.add_argument\
    ( "-s", "--spacer"
    , help="PDF, spacer path"
    , type=lambda fp: next(iter(pdfrw.PdfReader(fp).pages), None)
    )
parser.add_argument\
    ( "target"
    , help="PDF, target path"
    )
args = parser.parse_args()
# the first page stays on its own; the remaining pages are imposed in pairs
pages = args.source.pages[:1]
for pair in grouper(args.source.pages[1:], 2):
    assert pair[0] is not None
    pages.append(pagemerge(pair[0], args.spacer, pair[1]))
# include metadata in target
target = pdfrw.PdfWriter()
target.addpages(pages)
target.trailer.Info = args.source.Info
target.write(args.target)
Some idiosyncrasies as of pdfrw 0.2:
Note that the operations +=, append and extend are not defined for pdfrw.PageMerge, even though it behaves like a list. Furthermore + acts like += in that it modifies the left-hand-side object.
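The page pairing done by the script can be tried out without pdfrw. The sketch below uses plain page numbers in place of page objects; impose_layout is a hypothetical helper that only mirrors the grouping logic (first page alone, the rest in left/right pairs), not the actual merging.

```python
import itertools


def grouper(iterable, n, fillvalue=None):
    # Same itertools recipe as in the script above.
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)


def impose_layout(page_numbers):
    """Return one tuple per output sheet: first page alone, then pairs."""
    sheets = [tuple(page_numbers[:1])]
    for left, right in grouper(page_numbers[1:], 2):
        sheets.append((left, right))
    return sheets


print(impose_layout([1, 2, 3, 4, 5, 6]))
# prints [(1,), (2, 3), (4, 5), (6, None)]
```

A trailing None marks a sheet whose right half stays empty, which is the case pagemerge handles by skipping None pages.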
Ghostscript and the pdfwrite device do not, in general, rasterise the content of input PDF files (the caveat is for cases involving transparent input and the output being < PDF 1.4).
Objects which are entirely clipped out are not preserved into the output.
So the short answer is that this should be entirely feasible using Ghostscript and the pdfwrite device, with the advantage that it's possible to impose the pages as well in a single operation. I do have an open bug report about clipping in a similar situation (reverse imposition) but have not yet had time to address it.
Note that Ghostscript normally uses the MediaBox for the clip region, if you want to use the CropBox then you need to add -dUseCropBox to the command line.
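For a scriptable version, a pdfwrite invocation with -dUseCropBox can be built and run from Python. This is a sketch only: the file names are placeholders, the helper names are invented for illustration, and whether the result drops all of the straddling content in a given file still has to be checked on that file.

```python
import subprocess


def gs_cropbox_command(source, target, gs="gs"):
    """Build a Ghostscript command that rewrites a PDF using its CropBox."""
    return [
        gs,
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-dUseCropBox",
        "-sOutputFile=" + target,
        source,
    ]


def rewrite_with_cropbox(source, target):
    # Raises CalledProcessError if Ghostscript exits with an error.
    subprocess.run(gs_cropbox_command(source, target), check=True)
```

For example, rewrite_with_cropbox("input.pdf", "cropped.pdf") writes the filtered copy next to the original.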

Inkscape "PDF + Latex" export

I'm using inkscape to produce vector figures, save them in SVG format to export them later as "PDF + Latex" much in the vein of TUG inkscape+pdflatex guide.
Trying to produce a simple figure, however, turns out to be extremely frustrating.
The first figure is an example of the figure I would like to export in the form of "PDF + Latex" (shown here in PNG format).
If I export this to a PDF figure without latex macros the PDF produced looks exactly the same, except for some minor differences with the fonts used to render the text.
When I try to export this using the "PDF + Latex" option, the PDF file produced is a 2-page document (again as .png here):
This, of course, does not look good when compiling my latex document. So far the guide at TUG has been very helpful, but I still can't produce a working "PDF + Latex" export from inkscape.
What am I doing wrong?
I worked around this by putting all the text in my drawing at the top: select the text and then choose Object -> Raise to top.
Inkscape only generates the separate pages if the text is below another object.
I asked this question on the Inkscape online discussion page and got some very helpful guidance from one of the users there.
This is a known bug https://bugs.launchpad.net/ubuntu/+bug/1417470 which was inadvertently introduced in Inkscape 0.91 in an attempt to fix a previous bug https://bugs.launchpad.net/inkscape/+bug/771957.
It seems this bug does two things:
The *.pdf_tex file will have an extra \includegraphics statement which needs to be deleted manually as described in the link to the bug above.
The *.pdf file may be split into multiple pages, regardless of the size of the image. In my case the line objects were split off onto their own page. I worked around this by turning off the text objects (opacity to zero) and then doing a standard PDF export.
If you can execute linux commands, this works:
# Generate the .pdf and .pdf_tex files
inkscape -z -D --file="$SVGFILE" --export-pdf="$PDFFILE" --export-latex
# Fix the number of pages
sed -i 's/\\\\/\n/g' "${PDFFILE}_tex"
MAXPAGE=$(pdfinfo "$PDFFILE" | grep -oP "(?<=Pages:)\s*[0-9]+" | tr -d " ")
sed -i "/page=$((MAXPAGE+1))/,\${/page=/d}" "${PDFFILE}_tex"
with:
$SVGFILE: path of the svg
$PDFFILE: path of the pdf
It is possible to include these commands in a script and execute it automatically when compiling your tex file (so that you don't have to manually export from inkscape each time you modify your svg).
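The page fix can also be written in Python, which may be easier to call from a build script. This is a rough sketch, not a drop-in replacement for the sed commands: it assumes each \includegraphics statement sits on its own line of the .pdf_tex file, and drop_extra_pages is a hypothetical helper name.

```python
import re

PAGE_RE = re.compile(r"page=(\d+)")


def drop_extra_pages(pdf_tex, max_page):
    """Remove lines that reference a page number beyond max_page."""
    kept = []
    for line in pdf_tex.splitlines():
        match = PAGE_RE.search(line)
        if match and int(match.group(1)) > max_page:
            continue  # this line points at a page the PDF does not have
        kept.append(line)
    return "\n".join(kept)
```

Here max_page is the page count of the exported PDF, obtained for instance from pdfinfo as in the shell version.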
Try it with an illustration that is less wide.
Alternatively, use a wider paperwidth setting.

Convert PDF text into outlines?

Does anybody know a way to vectorize the text in a PDF document? That is, I want each letter to be a shape/outline, without any textual content. I'm using a Linux system, and open source or a non-Windows solution would be preferred.
The context: I'm trying to edit some old PDFs, for which I no longer have the fonts. I'd like to do it in Inkscape, but that will replace all the fonts with generic ones, and that's barely readable. I've also been converting back and forth using pdf2ps and ps2pdf, but the font info stays there. So when I load it into Inkscape, it still looks awful.
Any ideas? Thanks.
To achieve this, you will have to:
Split your PDF into individual pages;
Convert your PDF pages into SVG;
Edit the pages you want
Reassemble the pages
This answer will omit step 3, since that's not programmable.
Splitting the PDF
If you don't need a programmatic way to split documents, the modern way would be to use stapler. In your favorite shell:
stapler burst file.pdf
This generates {file_1.pdf,...,file_N.pdf}, where 1...N are the PDF pages. Stapler itself uses PyPDF2 and the code for splitting a PDF file is not that complex. The following function splits a file and saves the individual pages in the current directory. (shamelessly copying from the commands.py file)
import os
from PyPDF2 import PdfFileWriter, PdfFileReader
def split(filename):
    # PDF is a binary format, so the file has to be opened in binary mode
    with open(filename, 'rb') as inputfp:
        inputpdf = PdfFileReader(inputfp)
        numpages = inputpdf.getNumPages()
        base, ext = os.path.splitext(os.path.basename(filename))
        # Prefix the output template with zeros so that ordering is preserved
        # (page 10 after page 09). The pad width is the number of digits
        # in the page count.
        output_template = ''.join([
            base,
            '_',
            '%0',
            str(len(str(numpages))),
            'd',
            ext
        ])
        for page in range(numpages):
            outputpdf = PdfFileWriter()
            outputpdf.addPage(inputpdf.getPage(page))
            outputname = output_template % (page + 1)
            with open(outputname, 'wb') as fp:
                outputpdf.write(fp)
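Zero-padding only keeps the files in order if the pad width matches the number of digits in the page count. A minimal sketch of one way to build such names follows; page_name is a hypothetical helper, not part of stapler or PyPDF2.

```python
def page_name(base, page, total, ext=".pdf"):
    """Build a zero-padded file name for one page out of `total` pages."""
    width = len(str(total))  # digits needed for the largest page number
    return "%s_%0*d%s" % (base, width, page, ext)


names = [page_name("file", n, 10) for n in (1, 9, 10)]
print(names)
# prints ['file_01.pdf', 'file_09.pdf', 'file_10.pdf']
```

With this width, sorting the names alphabetically gives the same order as sorting by page number.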
Converting the individual pages to SVG
Now to convert the PDFs to editable files, I'd probably use pdf2svg.
pdf2svg input.pdf output.svg
If we take a look at the pdf2svg.c file, we can see that the code in principle is not that complex (assuming the input filename is in the filename variable and the output file name is in the outputname variable). A minimal working example in python follows. It requires the pycairo and pypoppler libraries:
import os
import cairo
import poppler
def convert(inputname, outputname):
    # Convert the input file name to an URI to please poppler
    uri = 'file://' + os.path.abspath(inputname)
    pdffile = poppler.document_new_from_file(uri, None)
    # We only have one page, since we split prior to converting. Get the page
    page = pdffile.get_page(0)
    # Get the page dimensions
    width, height = page.get_size()
    # Open the SVG file to write on
    surface = cairo.SVGSurface(outputname, width, height)
    context = cairo.Context(surface)
    # Now we finally can render the PDF to SVG
    page.render_for_printing(context)
    context.show_page()
    # Flush the SVG to disk
    surface.finish()
At this point you should have an SVG in which all text has been converted to paths, and will be able to edit with Inkscape without rendering issues.
Combining steps 1 and 2
You can call pdf2svg in a for loop to do that, but you would need to know the number of pages beforehand. The code below figures out the number of pages and does the conversion in a single step. It requires only pycairo and pypoppler:
import os
import cairo
import poppler
def convert(inputname, base=None):
    '''Converts a multi-page PDF to multiple SVG files.
    :param inputname: Name of the PDF to be converted
    :param base: Base name for the SVG files (optional)
    '''
    if base is None:
        base, ext = os.path.splitext(os.path.basename(inputname))
    # Convert the input file name to an URI to please poppler
    uri = 'file://' + os.path.abspath(inputname)
    pdffile = poppler.document_new_from_file(uri, None)
    pages = pdffile.get_n_pages()
    # Prefix the output template with zeros so that ordering is preserved
    # (page 10 after page 09). The pad width is the number of digits
    # in the page count.
    output_template = ''.join([
        base,
        '_',
        '%0',
        str(len(str(pages))),
        'd',
        '.svg'
    ])
    # Iterate over all pages
    for nthpage in range(pages):
        page = pdffile.get_page(nthpage)
        # Output file name based on template
        outputname = output_template % (nthpage + 1)
        # Get the page dimensions
        width, height = page.get_size()
        # Open the SVG file to write on
        surface = cairo.SVGSurface(outputname, width, height)
        context = cairo.Context(surface)
        # Now we finally can render the PDF to SVG
        page.render_for_printing(context)
        context.show_page()
        # Free some memory
        surface.finish()
Assembling the SVGs into a single PDF
To reassemble you can use the pair inkscape / stapler to convert the files manually. But it is not hard to write code that does this. The code below uses rsvg and cairo. To convert from SVG and merge everything into a single PDF:
import rsvg
import cairo
def convert_merge(inputfiles, outputname):
    # We have to create a PDF surface and inform a size. The size is
    # irrelevant, though, as we will define the sizes of each page
    # individually.
    outputsurface = cairo.PDFSurface(outputname, 1, 1)
    outputcontext = cairo.Context(outputsurface)
    for inputfile in inputfiles:
        # Open the SVG
        svg = rsvg.Handle(file=inputfile)
        # Set the size of the page itself
        outputsurface.set_size(svg.props.width, svg.props.height)
        # Draw on the PDF
        svg.render_cairo(outputcontext)
        # Finish the page and start a new one
        outputcontext.show_page()
    # Free some memory
    outputsurface.finish()
PS: It should be possible to use the command pdftocairo, but it doesn't seem to call render_for_printing(), which makes the output SVG maintain the font information.
I'm afraid to vectorize the PDFs you would still need the original fonts (or a lot of work).
Some possibilities that come to mind:
dump the uncompressed PDF with pdftk and discover what the font names are, then look for them on FontMonster or other font service.
use some online font recognition service to get a close match with your font, in order to preserve kerning (I guess kerning and alignment are what's making your text unreadable)
try replacing the fonts manually (again pdftk to convert the PDF to a PDF which is editable with sed. This editing will break the PDF, but pdftk will then be able to recompress the damaged PDF to a useable one).
Here's what you really want - font substitution. You want some code/app to be able to go through the file and make appropriate changes to the embedded fonts.
This task is doable and is anywhere from easy to non-trivial. It's easy when you have a font that matches the metrics of the font in the file and the encoding used for the font is sane. You could probably do this with iText or DotPdf (the latter is not free beyond the evaluation, and is my company's product). If you modified pdf2ps, you could probably manage changing the fonts on the way through too.
If the fonts used in the file are font subsets that have creative reencoding, then you are in hell and will likely have all manner of pain doing the change. Here's why:
PostScript was designed at a point when there was no Unicode. Adobe used a single byte for characters and whenever you rendered any string, the glyph to draw was taken from a 256 entry table called the encoding vector. If a standard encoding didn't have what you wanted, you were encouraged to make fonts on the fly based on the standard font that differed only in encoding.
When Adobe created Acrobat, they wanted to make the transition from PostScript as easy as possible, so that font mechanism was carried over. When the ability to embed fonts into PDFs was added, it was clear that this would bloat the files, so PDF also included the ability to have font subsets. Font subsets are made by taking an existing font, removing all the glyphs that won't be used, and re-encoding it into the PDF. There may be no standard relationship between the encoding vector and the code points in the file - all those may be changed. Instead, there may be an embedded /ToUnicode CMap (written in PostScript syntax) which will translate encoded characters to a Unicode representation.
So yeah, non-trivial.
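To make the re-encoding problem concrete, here is a toy model in Python. It is not how any particular PDF producer works: it simply assumes a subset font that hands out codes 1, 2, 3, ... in order of first use (and fewer than 256 distinct characters), which is enough to show why the bytes in the file mean nothing without the font's own mapping.

```python
def subset_encode(text):
    """Assign codes 1, 2, 3, ... to characters in order of first use."""
    encoding = {}  # character -> code in the subset font
    codes = []
    for ch in text:
        if ch not in encoding:
            encoding[ch] = len(encoding) + 1
        codes.append(encoding[ch])
    # The reverse table plays the role of a ToUnicode mapping.
    to_unicode = {code: ch for ch, code in encoding.items()}
    return bytes(codes), to_unicode


data, to_unicode = subset_encode("hello")
print(data)
# prints b'\x01\x02\x03\x03\x04'
print("".join(to_unicode[b] for b in data))
# prints hello
```

A replacement font would have to be re-encoded with exactly the same table, and if the reverse table is missing from the file, the text cannot be recovered from the bytes alone.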
For the folks who come after me:
The best solutions I found were to use Evince to print as SVG, or to use the pdf2svg program that's accessible via Synaptic on Mint. However, Inkscape wasn't able to cope with the resulting SVGs--it entered an infinite loop with the error message:
File display/nr-arena-item.cpp line 323 (?): Assertion item->state & NR_ARENA_ITEM_STATE_BBOX failed
I'm giving up this quest for now, but maybe I'll try again in a year or two. In the meantime, maybe one of these solutions will work for you.

Generated corrupt large ply file - how to find the error

I just wrote a java class to generate meshes from a cylinder list stored to a ply file. I tested the files with a hand generated list of 3 cylinders. The resulting file I can open both in Meshlab and Cloudcompare.
When I use the class in my real program I have to write a mesh for more than 13000 cylinders.
Cloudcompare gives me the following error : Reading error(no access right?)
Meshlab this one : error details, unexptected eof
I already checked that my ply file contains exactly the number of vertices and faces defined in the header. I also made sure that it contains no nan values (I searched for 'n', 'a', etc. in winedit).
I can reproduce the errors with my test file from the 3 hand made cylinder file by deleting the last line. But as mentioned earlier, I already checked if the line numbers are correct (might be an empty line not caught by my eyes though, as scrolling down half a million lines is impossible).
So are there any programs available to parse the ply file for errors? Open source tools would be appreciated here. Or are the files just too large? 436302 lines to be exact. I use the ascii version of ply.
Found a non open source tool called nugraf, which provides information about the corrupted line numbers.
Java seems to print NaN as '?'. I had not checked for that character, so the problem seems to be solved and I can go back to debugging my java software.
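For anyone without access to such a tool, a small script can do the same kind of check. The sketch below is an assumption-laden example, not a full PLY parser: it only handles ascii files with one vertex or face per line, and check_ascii_ply is a made-up name. It reports the line number of every empty line, every token that is not a finite number (such as '?' or nan), and a mismatch between the header counts and the number of data lines.

```python
import math


def check_ascii_ply(lines):
    """Return a list of (line_number, message) problems; line 0 means whole file."""
    problems = []
    expected = 0
    body_start = None
    for i, line in enumerate(lines, start=1):
        tokens = line.split()
        if tokens[:1] == ["element"] and len(tokens) == 3:
            expected += int(tokens[2])  # e.g. "element vertex 1234"
        if tokens == ["end_header"]:
            body_start = i + 1
            break
    if body_start is None:
        return [(0, "no end_header line found")]
    data_lines = 0
    for lineno, line in enumerate(lines[body_start - 1:], start=body_start):
        tokens = line.split()
        if not tokens:
            problems.append((lineno, "empty line"))
            continue
        data_lines += 1
        for tok in tokens:
            try:
                ok = math.isfinite(float(tok))
            except ValueError:
                ok = False
            if not ok:
                problems.append((lineno, "bad token %r" % tok))
                break
    if data_lines != expected:
        problems.append((0, "expected %d data lines, found %d" % (expected, data_lines)))
    return problems
```

It can be run on a file with check_ascii_ply(open("mesh.ply").read().splitlines()), where mesh.ply stands for the generated file.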