How to avoid gray outline artefacts when converting an eps image to pdf?

To generate vector graphics figures with LaTeX labels, I use gnuplot and the cairolatex terminal, creating the image via plot "data.txt" u 1:2:3 matrix with image notitle followed by:
latex figuregen.tex
dvips -E -ofile.eps figuregen
# Correct the bounding box automatically:
epstool --copy --bbox file.eps filename.eps
# Create a PDF:
ps2pdf -dPDFSETTINGS=/prepress -dSubsetFonts=true -dEmbedAllFonts=true -dMaxSubsetPct=100 -dCompatibilityLevel=1.3 -dEPSCrop filename.eps filename.pdf
Here is a zoom on a specific region of the original eps image:
White regions actually correspond to NaN values in the data file.
Now using the pdf file converted from eps:
In the pdf version, there are now unwanted outlines around all the NaN pixels, creating an awful lot of noise in the higher portion of the image.
I want to have these images as pdf, free of artefacts, and preserve high-quality LaTeX labels. I suspect that there might be a ps2pdf option to deactivate this kind of unwanted behaviour but I just cannot find it.
I tried things such as: -dGraphicsAlphaBits=1, -dNOINTERPOLATE, -dALLOWPSTRANSPARENCY, -dNOTRANSPARENCY, -dCompatibilityLevel=1.4 or -dCompatibilityLevel=1.5, but without success.
I also tried fixing this directly in gnuplot, but without success (see e.g. below).
Would any of you know how to solve this issue?
Thank you very much for your time!
EDIT
What's even more surprising and problematic is that these artefacts also appear when printed.
Note however that they do not appear at extreme levels of zoom in evince when only a small part of the data set is plotted.
MWE:
# plot.plt
set size ratio -1
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
#set yr [300:0] ### no artefacts if zoom is higher than 1310% in evince
set yr [400:100] ### no artefacts if zoom is higher than 1780% in evince
#set yr [450:0] ### artefacts at all zoom levels if we show more data, or all of it
set term cairolatex dashed color; set output "temp.tex"
plot "data.txt" u 1:2:3 matrix with image notitle
set output #Closes the temporary output file.
!sed -e 's|/Title|%/Title|' -e 's|/Subject|%/Subject|' -e 's|/Creator|%/Creator|' -e 's|/Author|%/Author|' < temp.tex > graph.tex
and, for completeness:
% figuregen.tex
\documentclass[dvips]{article}
\pagestyle{empty}
\usepackage[dvips]{graphicx} %
\begin{document}
\input graph.tex
\end{document}
If needed, part of the data (enough to reproduce the issue) is available in text form here: https://paste.nomagic.uk/?e0343cc8f759040a#DkRxNiNrH6d3QMZ985CxhA21pG2HpVEShrfjg84uvAdt
EDIT 2
In fact, the same artefacts appear when using the cairolatex pdf terminal,
set terminal cairolatex standalone pdf size 16cm,10.5cm dashed transparent
set output "plot.tex"
and compiling directly with pdflatex:
gnuplot<plot.plt
pdflatex plot.tex
(Note: this is with gnuplot version 5.2 patchlevel 6.)

The actual problem is that NaN values are set to transparent black pixels (#00000000).
The transparency causes these gray outline artifacts, depending on the zoom level: if you zoom in close enough, you see no artifacts.
But as soon as the image pixels become smaller than your monitor pixels, the values are interpolated for screen display. It seems that PDF viewers like evince (I also tested okular and mupdf) interpolate both the color and the alpha channels, so the alpha value around the NaN pixels changes and the underlying black shows through as a gray border around the colored pixels.
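To make the interpolation explanation concrete, here is a small Python sketch. It is an illustration only and assumes a viewer blends two neighbouring pixels 50/50 when downscaling and then composites the result over a white page; nothing here is taken from the source code of evince, okular or mupdf. It compares interpolating the straight RGBA values with interpolating premultiplied color; only the straight version darkens the edge pixel.

```python
# Hypothetical sketch: why interpolating straight (non-premultiplied) RGBA
# produces a dark fringe next to transparent-black NaN pixels.
# Assumption: the viewer averages two neighbouring pixels 50/50 when
# downscaling, then composites the result over a white page.

WHITE = (255.0, 255.0, 255.0)

def lerp_straight(p, q, t=0.5):
    """Interpolate each RGBA channel independently (straight alpha)."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def lerp_premultiplied(p, q, t=0.5):
    """Interpolate premultiplied color, then divide alpha back out."""
    def premul(px):
        r, g, b, a = px
        return (r * a, g * a, b * a, a)
    r, g, b, a = lerp_straight(premul(p), premul(q), t)
    if a == 0:
        return (0.0, 0.0, 0.0, 0.0)
    return (r / a, g / a, b / a, a)

def over_white(px):
    """Composite an RGBA pixel (alpha in 0..1) over a white background."""
    r, g, b, a = px
    return tuple(a * c + (1 - a) * w for c, w in zip((r, g, b), WHITE))

data_pixel = (215.0, 48.0, 39.0, 1.0)  # '#D73027' from the palette, opaque
nan_pixel = (0.0, 0.0, 0.0, 0.0)       # NaN pixel: transparent black

straight = over_white(lerp_straight(data_pixel, nan_pixel))
correct = over_white(lerp_premultiplied(data_pixel, nan_pixel))

print("straight alpha:", straight)  # darker on every channel
print("premultiplied: ", correct)
```

In this model the straight-alpha result is darker on every channel than the premultiplied one, which matches the gray border described above.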
I tried several approaches. The easiest one that actually worked for me was to use the tikz terminal with the externalimages option, which saves images created with "with image" as separate PNG files.
These PNG files also contain transparency, so the final result shows the same artifacts.
But you can use ImageMagick's convert to change the transparent NaN pixels of the PNG to white:
convert temp.001.png -alpha off -fill white -opaque black temp.001.png
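The two steps of that convert call can be modelled on plain pixel tuples. This hypothetical Python sketch uses no image library; it only illustrates why the approach works and where it could go wrong: dropping alpha first turns every #00000000 NaN pixel into opaque black, and -opaque black then whitens all exactly-black pixels, including any genuinely black data pixel. The second function, which keys on alpha instead, is an illustration of that trade-off, not a convert option.

```python
# Hypothetical model of
#   convert temp.001.png -alpha off -fill white -opaque black temp.001.png
# on plain RGBA tuples (0..255). No image library is involved.

WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

def alpha_off_then_opaque(pixels):
    """Drop the alpha channel, then turn every exactly-black pixel white."""
    rgb = [(r, g, b) for r, g, b, _a in pixels]
    return [WHITE if px == BLACK else px for px in rgb]

def replace_transparent(pixels):
    """Alternative: only pixels whose alpha is 0 become white."""
    return [WHITE if a == 0 else (r, g, b) for r, g, b, a in pixels]

row = [
    (215, 48, 39, 255),  # data pixel
    (0, 0, 0, 0),        # NaN pixel (#00000000)
    (0, 0, 0, 255),      # a genuinely black, opaque data pixel
]

print(alpha_off_then_opaque(row))  # the opaque black pixel is whitened too
print(replace_transparent(row))    # only the NaN pixel is whitened
```

With the palette used here no data pixel is black, so the simpler two-step command is safe.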
So, a fully working plot file is
# plot.plt
set size ratio -1
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
set ytics 100
set yrange reverse
set term tikz standalone externalimages background "white"; set output "temp.tex"
plot "data.txt" u 1:2:3 matrix with image notitle
# temp.001.png is the external image which contains only the 'with image' part
# We must remove the #00000000 color, which represents the NaN pixels
# I couldn't replace the colors directly, but I could first remove the alpha channel
# and then change black to white, because no other black pixels appear
!convert temp.001.png -alpha off -fill white -opaque black temp.001.png
set output #Closes the temporary output file.
!sed -e 's|/Title|%/Title|' -e 's|/Subject|%/Subject|' -e 's|/Creator|%/Creator|' -e 's|/Author|%/Author|' < temp.tex > graph.tex
!pdflatex graph.tex
MuPDF screenshot of graph.pdf:
Note that I used standalone so that I could compile the resulting file directly and check the result.
A more cumbersome alternative would be to "manually" plot with image to a PNG file and include that in a second plot, as I described in "Big data surface plots: Call gnuplot from tikz to generate bitmap and include automatically?". That gives you more control over how the PNG is generated.

Just for the record, "with image pixels" seems to do the trick: it creates a file without a gray border around the NaN data points. Tested with gnuplot 5.2.6.
plot FILE u 1:2:3 matrix with image pixels notitle
Code:
### avoid shading around NaN datapoints
reset session
set size ratio -1
FILE = "data.txt"
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
set term cairolatex dashed color
set output "temp.tex"
plot FILE u 1:2:3 matrix with image pixels notitle
set output
### end of code
Result: (a PNG of a screenshot, since it looks like I cannot add a PDF here)

Related

How to convert a rectangular region from one page of a multipage PDF to PNG? Clipping/Cropping problem

I can convert an entire page of a PDF with ghostscript to PNG, but clipping a rectangular region does not work. Here is what I currently have:
gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=png16m -r200 -dFirstPage=45 -dLastPage=45 -sOutputFile=outfile.png -q -c 0 0 640 150 rectclip -f infile.pdf
This does convert the entire page 45 to a PNG file, but it does not crop or clip it to the specified region.
Later I found out that the -g option sets the size of the resulting PNG file. For example, adding -g640x150 makes the output file exactly that size in pixels, clipping the lower-left corner of the page. And with -c "<> setpagedevice" I can move the clipped rectangle to the right by 100 pixels and up by 200 pixels.
There is one remaining problem. I don't want the clipped area to go beyond the page boundaries. How can I make sure to stay inside the page boundaries?
The clip operator intersects the current path with the existing clipping path; consequently, the clipped area can only be reduced, never expanded.
If the -g option sets a larger size than the page boundaries, then there may be undrawn portions in the final output.
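Staying inside the page boundaries is mostly arithmetic: at -r dpi, a page of W by H points (1/72 inch) renders to W * dpi / 72 by H * dpi / 72 pixels, and the requested rectangle can be shrunk to fit before choosing -g. The helper below is a hypothetical Python sketch, not a Ghostscript feature; the page size (US Letter) is an assumption because the question does not give it, and applying the offset is left to the setpagedevice approach from the question.

```python
# Hypothetical helper (not part of Ghostscript): clamp a requested crop
# rectangle, given in device pixels from the lower-left corner, so it never
# extends past the rendered page.

def clamp_crop(page_w_pt, page_h_pt, dpi, x, y, w, h):
    """Return (x, y, w, h) in pixels, clipped to the rendered page."""
    page_w_px = round(page_w_pt * dpi / 72)  # points -> pixels at this dpi
    page_h_px = round(page_h_pt * dpi / 72)
    x = max(0, min(x, page_w_px))
    y = max(0, min(y, page_h_px))
    w = max(0, min(w, page_w_px - x))        # shrink width to the right edge
    h = max(0, min(h, page_h_px - y))        # shrink height to the top edge
    return x, y, w, h

# Assumed US Letter page (612 x 792 pt) at -r200 renders to 1700 x 2200 px.
x, y, w, h = clamp_crop(612, 792, 200, 1500, 200, 640, 150)
print(f"-g{w}x{h}")  # width shrinks to 1700 - 1500 = 200, so -g200x150
```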

GraphicsMagick crop PDF

I've got an 8.5x11 in PDF at 300 dpi. It has a single UPC label in the top-left corner. Imagine that there could be 30 labels on one sheet, but we only have 1 label.
I'm trying to crop the PDF to the size of that one label. So far I've got this:
gm convert -density 300 single.pdf out.pdf
That doesn't do any cropping. When I crop to, say, 300x100, it produces a 20 MB file with 30000 pages.
I have not a clue how to use -crop to actually crop to the correct size. I need it to be 3.5 inches by 1.125 inches.
Using the following input PDF (here converted to a PNG):
the following command will crop the label:
gm convert wiz.pdf -crop 180x50+1+1 cropped.pdf
This label is sized 180x50 pixels.
For an 8.5x11 in PDF at 300 PPI you'd have a 2550x3300 pixel image (which I doubt you do, but that's another question), and you'd need to use -crop 1050x337+0+0 (more exactly, 1050x337.5+0+0, but you cannot crop half pixels!).
Note that the +0+0 part selects the top-left corner. If you need an offset of N pixels to the right and M pixels down, use +N+M.
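The conversion from a physical size to a crop geometry is just inches times PPI, rounded down to whole pixels. A minimal Python sketch (the function names are made up for illustration) that reproduces the numbers above and also shows the reverse direction: the same pixels rendered at another PPI give a different size in inches.

```python
import math

# Hypothetical helpers: turn a physical size into an ImageMagick /
# GraphicsMagick -crop geometry string, and go back from pixels to inches.

def crop_geometry(width_in, height_in, ppi, x_off=0, y_off=0):
    """Return a WxH+X+Y geometry; fractional pixels are rounded down."""
    w_px = math.floor(width_in * ppi)
    h_px = math.floor(height_in * ppi)
    return f"{w_px}x{h_px}+{x_off}+{y_off}"

def size_in_inches(width_px, height_px, ppi):
    """Physical size of a pixel grid when rendered at the given PPI."""
    return width_px / ppi, height_px / ppi

label = crop_geometry(3.5, 1.125, 300)   # 1050 x 337.5 -> "1050x337+0+0"
page = crop_geometry(8.5, 11, 300)       # "2550x3300+0+0"
print(label, page)
print(size_in_inches(2550, 3300, 150))   # same pixels at 150 PPI: 17 x 22 in
```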
Using ImageMagick instead...
You could also use ImageMagick's convert command:
convert wiz.pdf[180x50+1+1] cropped.pdf
Comment about image sizes...
One additional comment about this remark:
"I have not a clue how to use -crop to actually crop to the correct size."
The only real size a raster image has is in pixels: ABC pixels wide and XYZ pixels high.
There is no such thing as an absolute, real size in inches for a digital image... unless you also state the resolution at which the image is rendered on a display or a print device!
An 8.50x11in sized image at 300 PPI will translate to 2550x3300 pixels.
However, if your image does not contain this number of pixels (which is the real, absolute size of any raster image), you may still be able to render it at 300 PPI, but its size in inches will be different from 8.5x11in!
So, whenever you want to crop, use the absolute number of pixels you want. Don't use resolution/density at all on your command line!

Raw pdf color conversion (with known conversion formula) from RGB to CMYK

This question is related to
Script (or some other means) to convert RGB to CMYK in PDF?
but is much more specific. Note that I am not an expert in print production ;)
Situation: For printing I am only allowed to use two colors, Cyan and Black. The print shop requires the final PDF to be in DeviceCMYK with only the channels C and K used.
pdflatex does that automatically (with the xcolor package) for all fonts and drawn objects; however, I have more than 100 sketches/figures in PDF format embedded in the manuscript. Due to an admittedly badly designed workflow (late realization that Inkscape cannot export CMYK PDFs), all these figures were created in Inkscape and are thus RGB PDFs.
However, the only used colors within Inkscape were RGB complements of CMY(K), e.g. 100% Cyan is (0,255,255) RGB and 50% K is (127,127,127) etc.
Problem: I need to convert all these PDF figures from RGB to DeviceCMYK (or alternatively the whole PDF of the final manuscript) with a specific conversion formula.
I did a lot of Google research and tried the often-suggested approaches using, e.g., Ghostscript or various print production tools in Adobe Acrobat; however, all the conversion techniques I found so far either wanted to use ICC color profiles or used some other conversion strategy that, for example, filled the M and Y channels and reduced C and K.
I know the exact conversion formula for the raw color numbers from our Inkscape-RGBs to the channels C and K, however I do not know or find any program or tool that allows me to manually specify conversion formulas.
Question: Is there any workflow to convert my PDFs from RGB to C(MY)K manually with my own specific conversion formula for the raw numbers with the converted PDF being in DeviceCMYK using a tool, script or Adobe product?
Due to the large number of figures I would prefer a batched solution which doesn't require too much coding from my side, but if it should be the only solution, I'd also be open minded for a workflow like "load/convert/save" within a program for every single figure or writing a small program with an easy-to-handle C++ PDF API for example.
Limitations and additional info: A different file format (like TikZ figures) is no longer possible, since it does not work perfectly and the necessary adaptations to the figures would create too much overhead. Possibly helpful information: since the figures were created in Inkscape, there are no raster images within the PDFs. I also do not want all figures to be converted to raster images during the color conversion.
Edit:
I have created an example of an RGB PDF figure created with Inkscape.
I also did a manual object-by-object color conversion to a CMYK PDF with Illustrator, to show what the result should look like. Illustrator stores the axial shading in a DeviceN colorspace with the colors cyan and black, which is close enough.
Here is an idea, I think it will work if your PDF files are using exclusively the colorspaces DeviceGray, DeviceRGB and DeviceCMYK:
1- Convert all your PDF files to PostScript (with pdf2ps from Ghostscript, for example)
2- Write a PostScript program that redefines the operators setrgbcolor, setgray and setcolor with your own implementation in the PostScript language. Your implementation will internally use setcmykcolor and compute the values using your custom formula.
Here is an example for redefining the setgray operator:
% The operator setcmykcolor expects 4 values on the stack.
% When setgray is called, we can expect 1 value (the gray level g) on the stack.
% In setgray 1 is white and 0 is black, whereas K = 1 is full black, so we
% first replace g with 1 - g, then push 3 zeros and roll the top 4 elements
% of the stack by 3 positions, leaving 0 0 0 (1-g) for setcmykcolor
/setgray { 1 exch sub 0 0 0 4 3 roll setcmykcolor } bind def
3- Paste your PostScript program at the beginning of each resulting ps file from step 1.
4- Convert all your files back to PDF (with ps2pdf for example)
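The roll step in the setgray redefinition can be checked with a tiny emulation of the PostScript operand stack. This Python sketch is purely illustrative: a list models the stack with its last element on top, and roll follows the documented semantics of n j roll, where a positive j moves elements toward the top.

```python
# Hypothetical emulation of the PostScript operand stack: a Python list
# whose last element is the top of the stack.

def roll(stack, n, j):
    """PostScript 'n j roll': rotate the top n elements by j positions.
    A positive j moves elements toward the top of the stack."""
    if n == 0:
        return list(stack)
    top = stack[-n:]
    j %= n
    rotated = top[-j:] + top[:-j] if j else top
    return stack[:-n] + rotated

# 'g 0 0 0 4 3 roll' leaves the gray value in the K slot for setcmykcolor.
g = 0.5
print(roll([g, 0, 0, 0], 4, 3))  # [0, 0, 0, 0.5]: C, M, Y = 0 and K = g
```

Keep in mind that setgray uses 1 for white and 0 for black, whereas K = 1 means full black, so a mapping that preserves appearance places 1 - g rather than g in the K slot.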
See it in action by saving this piece of code as sample.ps:
/setgray { 1 exch sub 0 0 0 4 3 roll setcmykcolor } bind def
0.5 setgray
0 0 moveto
600 600 lineto
stroke
showpage
Convert it to PDF with ghostscript using this command line (I used version 9.14):
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=sample.pdf sample.ps
The resulting PDF will have the following page content:
q 0.1 0 0 0.1 0 0 cm
/R7 gs
10 w
% The K operator is the PDF equivalent of setcmykcolor in PostScript (for stroking)
0 0 0 0.5 K
0 0 m
3000 3000 l
S
Q
As you can see, the PS-to-PDF conversion preserves the CMYK colors specified in PostScript with the setcmykcolor operator.
Maybe you can post your formula as a new question and someone could help you out translating it to postscript.
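As an example of what such a formula might look like before translating it to PostScript: the question does not state the actual formula, so the Python sketch below assumes a naive complement model in which each Inkscape color is a cyan tint c combined with a black tint k, giving R = (1 - c)(1 - k) and G = B = 1 - k. Under that assumption, c and k can be recovered from the RGB values; a redefined setrgbcolor would perform the same arithmetic before calling setcmykcolor.

```python
# Hypothetical RGB -> (C, M, Y, K) mapping, NOT the asker's formula.
# Assumed model (all channels in 0..1):
#   R = (1 - c) * (1 - k)
#   G = B = 1 - k
# so k = 1 - G and c = 1 - R / G, with M = Y = 0.

def rgb_to_ck(r, g, b):
    """Convert 0..255 RGB to a CMYK tuple in 0..1 that uses only C and K."""
    red = r / 255
    gb = (g + b) / 2 / 255                  # G and B should agree in the model
    k = 1 - gb
    c = 0.0 if gb == 0 else 1 - red / gb    # pure black: cyan does not matter
    # Clamp, since 8-bit inputs need not fit the model exactly.
    c = min(max(c, 0.0), 1.0)
    k = min(max(k, 0.0), 1.0)
    return (c, 0.0, 0.0, k)

print(rgb_to_ck(0, 255, 255))    # 100% cyan -> (1.0, 0.0, 0.0, 0.0)
print(rgb_to_ck(127, 127, 127))  # about 50% black: C = 0.0, K about 0.502
```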
Since you have access to Illustrator, you might want to try importing the PDF into Illustrator and using Illustrator's scripting capabilities to iterate over the elements and replace fill/stroke RGB colors with their CMYK replacement colors.
The difficulty will be with the shading patterns (Gradients) used in the PDF; if they are imported as GradientColor, then in theory it's a matter of digging into the GradientColor to find the base RGB colors and substitute their CMYK replacement.
A very similar problem was solved using ActivePDF.dll with C++ (or C#?).

disturbing artifacts in pdf

I'm struggling with a problem when making plots with filledcurves. Between the filled areas there seems to be a "gap". These artifacts do not appear in print, but depend on the viewer and zoom options. In Gnuplot I use the eps terminal; the eps files look great, but the lines appear when I convert to pdf. The conversion is done either directly after plotting or when converting the LaTeX document from dvi to pdf. Since most documents are read on screen nowadays, this is an issue. The problem also appears when I use the pdfcairo terminal directly in Gnuplot, so it is not caused by the conversion (I tried epstopdf and ps2pdf) alone.
I attached a screenshot of a plot displayed in "acroread" (same problem in other pdf viewers).
Does anybody have an idea how to get rid of it while keeping the graphic vectorized?
I just ran into the same issue. Apparently the filling between two curves is done as a set of polygons that do not exactly touch one another, hence the thin white lines visible in some PDF viewers.
One way to fix the issue is to draw over these polygon boundaries. First define min and max functions in gnuplot:
min(x, y) = x < y ? x : y
max(x, y) = x > y ? x : y
Then, assuming that column 1 of "datafile" contains your x values and that columns 2 and 3 contain the y values of curves 2 and 3, write:
plot "datafile" using 1:2:3 with filledcurves lc rgb "gray", \
"" using 1:2:(min($2, $3)):(max($2, $3)) with yerrorbars ps 0 lt 1 \
lc rgb "gray" lw 0.5
The first plot instruction fills the spaces between the curves in gray. The second plot instruction draws points of zero size (ps 0) at each x value (1) on curve (2) with thin (lw 0.5), continuous (lt 1), gray (lc rgb "gray"), vertical error bars (yerrorbars) from the lower to the higher of curves 2 and 3.
This covers the white lines. To get the best results you may need to experiment with the thickness of the bars (e.g., lw 0.6, lw 0.2).
This issue is fixed with gnuplot 5.2, see https://sourceforge.net/p/gnuplot/patches/749/
The actual problem was that filled curves were previously plotted as many quadrilaterals, which leads to artifacts (white stripes) in many viewers due to antialiasing.
Since version 5.2, filled curves are rendered as a single polygon, which prevents these problems (see the issue linked above).
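Why touching quadrilaterals leave a light seam can be shown with a few numbers. The Python sketch below is a simplified model, not how any particular viewer is implemented: it assumes the shared edge crosses the middle of a pixel, so an antialiasing renderer gives each polygon 50% coverage there and blends them one after the other. The gray value 190 is an arbitrary choice.

```python
# Hypothetical model of the seam between two adjacent filled polygons.
# Assumption: the shared edge splits a screen pixel in half, so each polygon
# gets 50% coverage there and is blended over what was drawn before.

def over(src, coverage, dst):
    """Blend a color with the given coverage (0..1) over dst, per channel."""
    return tuple(coverage * s + (1 - coverage) * d for s, d in zip(src, dst))

WHITE = (255.0, 255.0, 255.0)
GRAY = (190.0, 190.0, 190.0)   # arbitrary fill color for the example

# Two quadrilaterals, each covering half of the edge pixel:
pixel = over(GRAY, 0.5, WHITE)   # 222.5 after the first polygon
pixel = over(GRAY, 0.5, pixel)   # 206.25 after the second, not 190

# One polygon covering the whole pixel (single-polygon rendering):
single = over(GRAY, 1.0, WHITE)

print(pixel, single)  # the seam pixel stays lighter than the fill
```

Because coverage is treated like opacity, two half-covered passes never reach full coverage, which is why a single polygon avoids the stripes.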
The problem is still present in Gnuplot 5.0.4 and at least the cairolatex terminal which I use to output PDFs.
I also wanted to color the area between two curves, in my case defined as functions.
When I used something like
f(x) = 2 + sin(x)
g(x) = cos(x)
plot '+' using 1:(f($1)):(g($1)) with filledcurves closed
I got the same vertical white lines as in the question.
A simple solution for curves where one is always above the other is to let Gnuplot fill the area from the upper curve to the x-axis with the desired color and then paint it over with white from the lower curve downwards:
f(x) = 2 + sin(x)
g(x) = cos(x)
plot f(x) with filledcurves x1, g(x) w filledc x1 fs solid lc rgb "white"
Apparently this filledcurves style (not between curves but between a curve and an axis) avoids the trapezoid artifacts.
This can readily be extended to data files and multiple stacked curves like in the question. Just paint from top to bottom and finish with white for the empty area between the lowest curve and the x-axis.
For overlapping curves, constructing minimum and maximum curves as in the answer from François Tonneau might do the trick.
If you're talking about the red and cyan bits, the gap could be an illusion caused by Red + Cyan = White on an RGB screen. Maybe there's no gap, but the border areas appear white due to the proximity of the pixels.
Take the screenshot and blow it up so you can see the individual pixels around the perceived gap.
If this is the case, selecting a different colour scheme for the adjacent colours might get rid of the effect. I certainly can't see anything matching your description anywhere but the red and cyan bits.
From https://groups.google.com/forum/#!topic/comp.graphics.apps.gnuplot/ivRaKpu5cJ8, it seemed to be a pure Ghostscript issue.
Using the eps terminal of Gnuplot and converting the eps file to pdf with
epstopdf -nogs <file.eps> -o <file.pdf>
solved the problem on my system. According to the corresponding man page, the "-nogs" option instructs epstopdf not to use Ghostscript.

Why is the '.pdf' figure produced by gnuplot larger than MATLAB's?

I used gnuplot to plot a stacked bar figure, and the resulting .pdf file is 50k, whereas figures produced by MATLAB are usually 1-9k.
When I insert the gnuplot-produced pdf figure in LaTeX (pdflatex) and open the paper, the figure seems to be redrawn slowly (the stacked columns appear one by one).
Here are the commands:
set style histogram rowstacked#errorbars gap 1 lw 3#clustered
set style data histograms
set bar
set term postscript eps color "Helvetica" 24
set output "file.eps"
plot
The size of the output 'file.eps' is 26k; the size of 'file.pdf' produced by epstopdf is 65k.
Thank you.