why the '.pdf' figure produced by gnuplot is large than matlab? - pdf

I used gnuplot to plot a stacked bar figure, the produced .pdf file is 50k. But the figure produced by matlab is 1-9k usually.
When I inserted the gnuplot-produced pdf figure in latex (pdflatex), and open the paper, the figure seems need re-produced (the stacked columns show one by one).
here is the command
----+
set style histogram rowstacked#errorbars gap 1 lw 3#clustered
set style data histograms
set bar
set term postscript eps color "Helvetica" 24
set output "file.eps"
plot
---+
the size of the output 'file.eps' is 26k. the size of 'file.pdf' produced by epstopdf is 65k.
Thank you.

Related

How to avoid gray outline artefacts when converting an eps image to pdf?

To generate vector graphics figures with LaTeX labels, I use gnuplot and the cairolatex terminal, creating the image via plot "data.txt" u 1:2:3 matrix with image notitle followed by:
latex figuregen.tex
dvips -E -ofile.eps figuregen
# Correct the bounding box automatically:
epstool --copy --bbox file.eps filename.eps
## Create a pdf:
ps2pdf -dPDFSETTINGS=/prepress -dSubsetFonts=true -dEmbedAllFonts=true -dMaxSubsetPct=100 -dCompatibilityLevel=1.3 -dEPSCrop filename.eps filename.pdf
Here is a zoom on a specific region of the original eps image:
White regions actually correspond to NaN values in the data file.
Now using the pdf file converted from eps:
In the pdf version, there are now unwanted outlines around all the NaN pixels, creating an awful lot of noise in the higher portion of the image.
I want to have these images as pdf, free of artefacts, and preserve high-quality LaTeX labels. I suspect that there might be a ps2pdf option to deactivate this kind of unwanted behaviour but I just cannot find it.
I tried things such as: -dGraphicsAlphaBits=1, -dNOINTERPOLATE, -dALLOWPSTRANSPARENCY, -dNOTRANSPARENCY, -dCompatibilityLevel=1.4 or -dCompatibilityLevel=1.5, but without success.
I also tried fixing this directly in gnuplot, but without success (see e.g. below).
Would any of you know how to solve this issue?
Thank you very much for your time!
EDIT
What's even more surprising and problematic is that these artefacts also appear when printed.
Note however that they do not appear at extreme levels of zoom in evince when only a small part of the data set is plotted.
MWE:
# plot.plt
set size ratio -1
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
#set yr [300:0] ### no artefacts if zoom is higher than 1310% in evince
set yr [400:100] ### no artefacts if zoom is higher than 1780% in evince
#set yr [450:0] ### artefacts at all zoom levels if we show more data, or all of it
set term cairolatex dashed color; set output "temp.tex"
plot "data.txt" u 1:2:3 matrix with image notitle
set output #Closes the temporary output file.
!sed -e 's|/Title|%/Title|' -e 's|/Subject|%/Subject|' -e 's|/Creator|%/Creator|' -e 's|/Author|%/Author|' < temp.tex > graph.tex
and, for completeness:
% figuregen.tex
\documentclass[dvips]{article}
\pagestyle{empty}
\usepackage[dvips]{graphicx} %
\begin{document}
\input graph.tex
\end{document}
If needed, part of the data can be found in text form here; enough to reproduce the issue: https://paste.nomagic.uk/?e0343cc8f759040a#DkRxNiNrH6d3QMZ985CxhA21pG2HpVEShrfjg84uvAdt
EDIT 2
In fact, same artefact issues appear when using set terminal cairolatex pdf
set terminal cairolatex standalone pdf size 16cm,10.5cm dashed transparent
set output "plot.tex"
directly with pdflatex
gnuplot<plot.plt
pdflatex plot.tex
(Note, this is using Gnuplot Version 5.2 patchlevel 6).
The actual problem is, that NaN values are set to transparent black pixels (#00000000).
The transparency causes these gray outline artifacts, depending on the zooming level. If you zoom close enough, then you see no artifacts.
But as soon as the image pixels are smaller than your monitor pixels, the values are interpolated for screen display. Its seems that pdf viewers like evince (I tested also okular and mupdf) interpolate both color and alpha channels, so that the alpha value of the Nan pixels is changed, and the underlying black appears as gray border around the color pixels.
I tried several ways. The easiest one, which actually worked for me was to use the tikz terminal with option externalimages which saves images created with image as separate png file.
These png file also contains transparency, and the final result has the same artifacts.
But you can use imagemagick's convert to change the transparent NaN pixels of the png to white with
convert temp.001.png -alpha off -fill white -opaque black temp.001.png
So, a fully working plot file is
# plot.plt
set size ratio -1
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
set ytics 100
set yrange reverse
set term tikz standalone externalimages background "white"; set output "temp.tex"
plot "data.txt" u 1:2:3 matrix with image notitle
# temp.001.png is the external image which contains only the 'with image' part
# We must remove the #00000000 color, which represents the NaN pixels
# I couldn't replace the colors directly, but I could first remove the alpha channel
# and then change black to white, because no other black pixel appear
!convert temp.001.png -alpha off -fill white -opaque black temp.001.png
set output #Closes the temporary output file.
!sed -e 's|/Title|%/Title|' -e 's|/Subject|%/Subject|' -e 's|/Creator|%/Creator|' -e 's|/Author|%/Author|' < temp.tex > graph.tex
!pdflatex graph.tex
Mupdf screen shot for graph.pdf:
Note, that I used standalone to be able to directly compile the resulting file, so that I could check the result.
A more cumbersome alternative would be to "manually" plot with image to a png file, and include that in a second plot, like I described in Big data surface plots: Call gnuplot from tikz to generate bitmap and include automatically? Then you can have more influence on how the png is generated.
Just for the records, with image pixels seems to do the "trick" and will create a file without grey surrounding of NaN datapoints. Tested with gnuplot 5.2.6.
plot FILE u 1:2:3 matrix with image pixels notitle
Code:
### avoid shading around NaN datapoints
reset session
set size ratio -1
FILE = "data.txt"
set palette defined ( 0 '#D73027', 1 '#F46D43', 2 '#FDAE61', 3 '#FEE090', 4 '#FFFFD9', 5 '#E0F3F8', 6 '#ABD9E9', 7 '#74ADD1', 8 '#4575B4' )
set term cairolatex dashed color
set output "temp.tex"
plot FILE u 1:2:3 matrix with image pixels notitle
set output
### end of code
Result: (a PNG of a screenshot, since it looks like I cannot add a PDF here)

Matplotlib multiple scatter subplots - reduce svg file size

I generated a plot in Matplotlib which consists of 50 subplots. In each of these subplots I have a scatterplot with about 3000 datapoints. I'm doing this, because I just want to have an overview of the different scatter plots in a document I'm working on.
This also works so far and looks nice, but the problem is obviously that the SVG file that I'm getting is really big (about 15 MB). And Word just can't handle such a big SVG file.
So my question: is there a way to optimize this SVG file? A lot of my datapoints in the scatter plots are overlapping each other, so I guess it should be possible remove many "invisible" ones of them without changing the visible output. (so something like this in illustrator seems to be what I want to do: Link) Is it also possible to do something like this in Inkscape? Or even directly in Matplotlib?
I know that I can just produce a PNG file, but I would prefer to have the plot as a vector graphic in my document.
If you want to keep all the data points as vector graphics, its unlikely you'll be able to reduce the file size.
While not ideal, one potential option is to rasterize only the data points created by ax.scatter, and leave the axes, labels, titles, etc. all as vector elements on your figure. This can dramatically reduce the file size, and if you set the dpi high enough, you probably won't lose any useful information from the plot.
You can do this by setting rasterized=True when calling ax.scatter.
You can then control the dpi of the rasterized elements using dpi=300 (or whatever dpi you want) when you fig.savefig.
Consider the following:
import matplotlib.pyplot as plt
figV, axesV = plt.subplots(nrows=10, ncols=5)
figR, axesR = plt.subplots(nrows=10, ncols=5)
for ax in figV.axes:
ax.scatter(range(3000), range(3000))
for ax in figR.axes:
ax.scatter(range(3000), range(3000), rasterized=True)
figV.savefig('bigscatterV.svg')
figR.savefig('bigscatterR.svg', dpi=300)
bigscatterV.svg has a file size of 16MB, while bigscatterR.svg has a file size of only 250KB.

Raw pdf color conversion (with known conversion formula) from RGB to CMYK

This question is related to
Script (or some other means) to convert RGB to CMYK in PDF?
however way more specific. Consider that I am not an expert in print production ;)
Situation: For printing I am only allowed to use two colors, Cyan and Black. The printery requests the final PDF to be in DeviceCMYK with only the Channels C and K used.
pdflatex automatically does that (with the xcolor package) for all fonts and drawn objects, however I have more than 100 sketches/figures in PDF format which are embedded in the manuscript. Due to an admittedly badly designed workflow (late realization that Inkscape cannot export CMYK PDFs), all these figures were created in Inkscape, and thus are RGB PDFs.
However, the only used colors within Inkscape were RGB complements of CMY(K), e.g. 100% Cyan is (0,255,255) RGB and 50% K is (127,127,127) etc.
Problem: I need to convert all these PDF figures from RGB to DeviceCMYK (or alternatively the whole PDF of the final manuscript) with a specific conversion formula.
I did a lot of google research and tried the often suggested ways of using e.g. Ghostscript or various print production tools in Adobe Acrobat, however all of the conversion techniques I found so far wanted to use ICC color profiles or used some other conversion strategy which filled the channels MY and spared some C and K, for example.
I know the exact conversion formula for the raw color numbers from our Inkscape-RGBs to the channels C and K, however I do not know or find any program or tool that allows me to manually specify conversion formulas.
Question: Is there any workflow to convert my PDFs from RGB to C(MY)K manually with my own specific conversion formula for the raw numbers with the converted PDF being in DeviceCMYK using a tool, script or Adobe product?
Due to the large number of figures I would prefer a batched solution which doesn't require too much coding from my side, but if it should be the only solution, I'd also be open minded for a workflow like "load/convert/save" within a program for every single figure or writing a small program with an easy-to-handle C++ PDF API for example.
Limitations and additional info: A different file format (like TikZ figures) is not possible any more since it does not work perfectly and the necessary adaptions to the figures would create too much overhead. A maybe helpful information: Since the figures are created in Inkscape, there are no raster images within the PDFs. I also do not want all figures to be converted to raster images during the color conversion.
Edit:
I have created an example of a RGB PDF-figure created with inkscape.
I also did a manual object-by-object color conversion to a CMYK-PDF with Illustrator, to show how the result should look like. Illustrator stores the axial shading in a DeviceN colorspace with the colors cyan and black, which is close enough^^
Here is an idea, I think it will work if your PDF files are using exclusively the colorspaces DeviceGray, DeviceRGB and DeviceCMYK:
1- Convert all your PDF files to Postscript (with pdf2ps from ghostscript for example)
2- Write a Postscript program that redefines the operators setrgbcolor, setgray and setcolor with your own implementation in the Postscript language, your implementation will internally use setcmykcolor and it will compute the values using your custom formula.
Here is an example for redefining the setgray operator:
% The operator setcmykcolor expects 4 values in the stack
% When setgray is called, we can expect to have 1 value in the stack, we will
% use it for the black component of cmyk by adding 3 zeros and rolling the
% top 4 elements of the stack 3 times
/setgray { 0 0 0 4 3 roll setcmykcolor } bind def
3- Paste your Postcript program at the begining of each resulting ps file from step 1.
4- Convert all your files back to PDF (with ps2pdf for example)
See it in action by saving this piece of code as sample.ps:
/setgray { 0 0 0 4 3 roll setcmykcolor } bind def
0.5 setgray
0 0 moveto
600 600 lineto
stroke
showpage
Convert it to PDF with ghostscript using this command line (I used version 9.14):
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=sample.pdf sample.ps
The resulting PDF will have the following page content:
q 0.1 0 0 0.1 0 0 cm
/R7 gs
10 w
% The K operator is the PDF equivalent of setcmykcolor in postscript
0 0 0 0.5 K
0 0 m
3000 3000 l
S
Q
As you can see, the ps-> pdf conversion will preserve the cmky colors specified in postscript with the setcmykcolor operator.
Maybe you can post your formula as a new question and someone could help you out translating it to postscript.
Since you have access to Illustrator, you might want to try importing the PDF into Illustrator and using Illustrator's scripting capabilities to iterate over the elements and replace fill/stroke RGB colors with their CMYK replacement colors.
The difficulty will be with the shading patterns (Gradients) used in the PDF; if they are imported as GradientColor, then in theory it's a matter of digging into the GradientColor to find the base RGB colors and substitute their CMYK replacement.
A very similar problem was solved using the ActivePDF.dll with C++ (or C#??).

How to get Gnuplot not to crop scientific notation

I'm using gnuplot to generate the following surface plot.
The important part of the command file I'm using is:
set terminal pdfcairo size 3,3;
In particular the size 3,3 resizes the plot to the way I want but crops out part of the z-axis label in the process. If I use a wider size like size 4,3 or don't use the size option at all then the z-axis labels fit as follows:
It seems that gnuplot doesn't take the width of the label into consideration when resizing the plot.
Is there a way to maybe move the plot to the right before resizing to 3,3 so that there's room to scientific notation?
You can set the lmargin option:
set lmargin 10
(or whatever size doesn't crop the label).

disturbing artifacts in pdf

I'm struggling with a problem when making plots with filledcurves. Between the filled areas, there seems to be a "gap". However, these artifacts do not appear on the print, but depend on the viewer and zoom-options. In Gnuplot I use the eps terminal, the eps-files look great, but the lines appear when I'm converting to pdf. The conversion it either done directly after plotting or when converting the latex-document from dvi to pdf. As most of the documents are here on the display nowadays, this is an issue. The problem also appears when I'm directly using the pdfcairo terminal in Gnuplot, so it's not caused by the conversion (tried epstopdf and ps2pdf) alone.
I attached a SCREENSHOT of a plot displayed in "acroread" (same problem in other pdf-viewers).
Has anybody an idea how to get rid of it but keeping the graphic vectorized?
I just ran into the same issue. Apparently the filling between two curves
is done as a set of polygons that do not exactly touch one another, thus
the thin white lines visible on some PDF viewers.
One way to fix the issue is to draw over these polygon boundaries. First
define min and max functions in gnuplot:
min(x, y) = x < y ? x : y
max(x, y) = x > y ? x : y
Then, assuming that column 1 of "datafile" contains your x values and
that columns 2 and 3 contain the y values of curves 2 and 3, write:
plot "datafile" using 1:2:3 with filledcurves lc rgb "gray", \
"" using 1:2:(min($2, $3)):(max($2, $3)) with yerrorbars ps 0 lt 1 \
lc rgb "gray" lw 0.5
The first plot instruction fills the spaces between the curves in gray.
The second plot instruction draws points of zero size (ps 0) at each
x value (1) on curve (2) with thin (lw 0.5), continuous (lt 1), gray
(lc rgb "gray"), vertical errorbars (yerrorbars) from the lower to
the higher of curves 2 and 3.
This covers the white lines. To get best results you may need to
experiment with the thickness of the bars (e.g., lw 0.6, lw 0.2).
This issue is fixed with gnuplot 5.2, see https://sourceforge.net/p/gnuplot/patches/749/
The actual problem was, that filled curves were previously plotted as many quadrilaterals, which leads to artifacts (white stripes) in many viewers due to antialiasing.
Since version 5.2 filled curves are rendered as single polygon, which prevents these problems (see issue linked above).
The problem is still present in Gnuplot 5.0.4 and at least the cairolatex terminal which I use to output PDFs.
I also wanted to color the area between two curves, in my case defined as functions.
When I used something like
f(x) = 2 + sin(x)
g(x) = cos(x)
plot '+' using 1:(f($1)):(g($1)) with filledcurves closed
I got the same vertical white lines as in the question.
A simple solution for curves where one is always above the other is to let Gnuplot fill the area from the upper curve to the x-axis with the desired color and then paint it over with white from the lower curve downwards:
f(x) = 2 + sin(x)
g(x) = cos(x)
plot f(x) with filledcurves x1, g(x) w filledc x1 fs lc rgb "white"
Apparently this filledcurves style (not between curves but between a curve and an axis) avoids the trapezoid artifacts.
This can be readily extended for plotting data files and multiple stacked cures like in the question. Just paint from top to bottom and finish with white for the empty area between the lowest curve and the x-axis.
For overlapping curves a construction of minimum and maximum curves like in the answer from françois-tonneau might do the trick.
If you're talking about the red and cyan bits the gap could be an illusion caused by the Red + Cyan = White on a RGB screen. Maybe there's no gap, but the border areas appear as white due to the proximity of the pixels.
Take the screenshot and blow it up so you can see the individual pixels around the perceived gap.
If this is the case, maybe selecting a different colour scheme for the adjacent colurs would get rid of the effect. I certainly can't see anything matching your description on anywhere but the red and cyan bits.
From https://groups.google.com/forum/#!topic/comp.graphics.apps.gnuplot/ivRaKpu5cJ8, it seemed to be a pure Gostscript issue.
Using the eps terminal of Gnuplot and converting the eps file to pdf with
epstopdf -nogs <file.eps> -o <file.pdf>
solved the problem on my system. From the corresponding Man page, the "-nogs" option instructs epstopdf not to use Gostscript.