gnuplot: Create multiple boxplots from different data files on the same output - iteration

I have a set of files that contain data that I want to produce a set of box plots for in order to compare them. I can get the data into gnuplot, but I don't know the correct format to separate each file into its own plot.
I have tried reading all the required files into a variable, which does work, but when the plot is produced, all the boxplots are on top of each other. I need gnuplot to shift each boxplot one position along the x-axis for each new data file.
For example, this produces the output with overlaying plots:
FILES = system("ls -1 /path/to/files/*")
plot for [data in FILES] data using (1):($4) with boxplot notitle
I know the X position is being stated explicitly there with the (1), but I'm not sure what to replace it with to get the position to move for each plot. This isn't a problem with other chart types, since they don't have the same field locating them.

You can try the following.
You can access the file in your file list by index via word(FILES,i). Check help word and help words. The code below assumes that you have some datafiles Data0*.dat in your directory. Maybe there is a smarter/shorter way to implement the xtic labels.
Code:
### boxplots from a list of files
reset session
# get a list of files (Windows)
FILES = system('dir /B "C:\Data\Data0*.dat"')
# set tics as filenames
set xtics () # remove xtics
set yrange [-2:27]
do for [i=1:words(FILES)] {
    set xtics add (word(FILES,i) i) rotate by 45 right
}
plot for [i=1:words(FILES)] word(FILES,i) u (i):2 w boxplot notitle
### end of code
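For readers less familiar with gnuplot's string functions, the indexing idea can be sketched in Python. This is only an illustration of what `words(FILES)` and `word(FILES,i)` do, assuming a whitespace-separated listing; the file names are hypothetical.

```python
# Hypothetical listing, standing in for the output of the system() call.
FILES = "Data01.dat Data02.dat Data03.dat"

# words(FILES) counts whitespace-separated entries; word(FILES, i) is 1-based.
names = FILES.split()

# Each file gets its own x position i, which is what `u (i):2` does in the plot command.
positions = {i: name for i, name in enumerate(names, start=1)}

for i, name in positions.items():
    print(i, name)
```

Because the list is split on whitespace, file names that contain spaces would be broken into several entries.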

Matplotlib video creation

EDIT: ImportanceOfBeingErnest provided the answer, however I am still inviting you all to explain, why is savefig logic different from animation logic.
I want to make a video in matplotlib. I went through manuals and examples and I just don't get it. (Regarding matplotlib, I always copy examples, because after five years of Python and two years of matplotlib I still understand 0.0% of matplotlib syntax.)
After half a dozen hours, here is what I came up with. I get an empty video and have no idea why.
import os
import math
import matplotlib
matplotlib.use("Agg")
from matplotlib import pyplot as plt
import matplotlib.animation as animation
# Set up formatting for the movie files
Writer = animation.writers['ffmpeg']
writer = Writer(fps=15, metadata=dict(artist='Me'), bitrate=1800)
numb=100
temp=[0.0]*numb
cont=[0.0]*numb
for i in range(int(4*numb/10),int(6*numb/10)):
    temp[i]=2
    cont[i]=2
fig = plt.figure()
plts=fig.add_subplot(1,1,1)
plts.set_ylim([0,2.1])
plts.get_xaxis().set_visible(False)
plts.get_yaxis().set_visible(False)
ims = []
for i in range(1,10):
    line1, = plts.plot(range(0,numb),temp, linewidth=1, color='black')
    line2, = plts.plot(range(0,numb),cont, linewidth=1, color='red')
    # savefig is here for testing, works perfectly!
    # fig.savefig('test'+str(i)+'.png', bbox_inches='tight', dpi=300)
    ims.append([line1,line2])
    plts.lines.remove(line1)
    plts.lines.remove(line2)
    for j in range(1,10):
        tempa=0
        for k in range (1,numb-1):
            tempb=temp[k]+0.51*(temp[k-1]-2*temp[k]+temp[k+1])
            temp[k-1]=tempa
            tempa=tempb
        temp[numb-1]=0
    for j in range(1,20):
        conta=0
        for k in range (1,numb-1):
            contb=cont[k]+0.255*(cont[k-1]-2*cont[k]+cont[k+1])
            cont[k-1]=conta
            conta=contb
        cont[numb-1]=0
im_ani = animation.ArtistAnimation(fig, ims, interval=50, repeat_delay=3000,blit=True)
im_ani.save('im.mp4', writer=writer)
Can someone help me with this?
If you want to have a plot which is not empty, the main idea would be not to remove the lines from the plot.
That is, delete the two lines
plts.lines.remove(line1)
plts.lines.remove(line2)
If you delete these two lines, the output is no longer empty.
Now one might ask, why do I not need to remove the artist in each iteration step, as otherwise all of the lines would populate the canvas at once?
The answer is that the ArtistAnimation takes care of this. It will only show those artists in the supplied list that correspond to the given time step. So while at the end of the for loop you end up with all the lines drawn to the canvas, once the animation starts they will all be removed and only one set of artists is shown at a time.
In such a case it is of course not a good idea to use the loop for saving the individual images, as the final image would contain all of the drawn lines at once.
The solution is then either to make two runs of the script, one for the animation and one where the lines are removed in each time step, or, maybe better, to use the animation itself to create the images:
im_ani.save('im.png', writer="imagemagick")
will create the images as im-<nr>.png in the current folder. This requires ImageMagick to be installed.
I'm trying here to answer the two questions from the comments:
1. I have appended line1 and line2 before deleting them. Still they disappeared in the final result. How come?
You have appended the lines to a list. After that you removed the lines from the axes. Now the lines are in the list but not part of the axes. When doing the animation, matplotlib finds the lines in the list and makes them visible. But they are not in the axes (because they have been removed) so the visibility of some Line2D object, which does not live in any axes but only somewhere in memory, is changed. But that isn't reflected in the plot because the plot doesn't know this line any more.
2. If I understand right, when you issue line1, = plts.plot... command then the line1 plot object is added to the plts graph object. However, if you change the line1 plot object by issuing line1, = plts.plot... command again, matplotlib does change line1 object but before that saves the old line1 to the plts graph object permanently. Is this what caused my problem?
No. The first step is correct, line1, = plts.plot(..) adds a Line2D object to the axes. However, in a later loop step line1, = plts.plot() creates another Line2D object and puts it to the canvas. The initial Line2D object is not changed and it doesn't know that there is now some other line next to it in the plot. Therefore, if you don't remove the lines they will all be visible in the static plot at the end.
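The two answers above can be checked with a small experiment. The following is a minimal sketch, assuming a headless Agg backend as in the question; the variable names are illustrative. It uses `Artist.remove()` rather than `plts.lines.remove(...)`, because recent matplotlib versions no longer allow modifying the `lines` list directly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as in the question
from matplotlib import pyplot as plt

fig, ax = plt.subplots()

# Each plot() call creates a new, independent Line2D and adds it to the axes.
line_a, = ax.plot([0, 1], [0, 1], color="black")
line_b, = ax.plot([0, 1], [1, 0], color="red")
n_before = len(ax.lines)

# This is the kind of list that would be handed to ArtistAnimation.
frames = [[line_a], [line_b]]

# Removing an artist detaches it from the axes; the list still holds it.
line_a.remove()

still_in_list = frames[0][0] is line_a
still_in_axes = line_a in ax.lines
owner_after_remove = line_a.axes  # no owning axes once detached
n_after = len(ax.lines)

plt.close(fig)
```

After the removal the line object still exists in `frames`, but it no longer belongs to any axes, so toggling its visibility has no effect on the figure.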

Efficiently Plotting Many Lines in VisPy

From all example code/demos I have seen in the VisPy library, I only see one way that people plot many lines, for example:
for i in range(N):
    pos = pos.copy()
    pos[:, 1] = np.random.normal(scale=5, loc=(i+1)*30, size=N)
    line = scene.visuals.Line(pos=pos, color=color, parent=canvas.scene)
    lines.append(line)
canvas.show()
My issue is that I have many lines to plot (each several hundred thousand points). Matplotlib proved too slow because the total number of points plotted was in the millions, hence I switched to VisPy. But VisPy is even slower when you plot thousands of lines each with thousands of points (the speed-up comes when you have millions of points).
The root cause is in the way lines are drawn. When you create a plot widget and then plot a line, each line is rendered to the canvas. In matplotlib you can explicitly state to not show the canvas until all lines are drawn in memory, but there doesn't appear to be the same functionality in VisPy, making it useless.
Is there any way around this? I need to plot multiple lines so that I can change properties interactively, so flattening all the data points into one plot call won't work.
(I am using a PyQt4 to embed the plot in a GUI. I have also considered pyqtgraph.)
You should pass an array to the "connect" parameter of the Line() function.
xy = np.random.rand(5,2) # 2D positions
# Create an array of point connections :
toconnect = np.array([[0,1], [0,2], [1,4], [2,3], [2,4]])
# Point 0 in your xy will be connected with 1 and 2, point
# 1 with 4 and point 2 with 3 and 4.
line = scene.visuals.Line(pos=xy, connect=toconnect)
You only add one object to your canvas, but the control per line is more limited.
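To apply this to many separate polylines, one option is to stack all vertices into a single array and build the index pairs so that no segment joins the end of one polyline to the start of the next. The sketch below is an assumption about how that could be organised; `merge_polylines` is a hypothetical helper, not part of VisPy, and only NumPy is needed to run it.

```python
import numpy as np

def merge_polylines(polylines):
    """Stack several (n_i, 2) vertex arrays into one array and build the
    (M, 2) index pairs joining consecutive points within each polyline."""
    pos = np.vstack(polylines)
    pairs = []
    offset = 0
    for line in polylines:
        n = len(line)
        idx = np.arange(offset, offset + n - 1)  # start index of each segment
        pairs.append(np.column_stack([idx, idx + 1]))
        offset += n
    connect = np.vstack(pairs)
    return pos, connect

# Three independent polylines with 3, 2 and 4 points.
lines = [np.random.rand(3, 2), np.random.rand(2, 2), np.random.rand(4, 2)]
pos, connect = merge_polylines(lines)

# Assumed usage, mirroring the call above:
# line = scene.visuals.Line(pos=pos, connect=connect, parent=canvas.scene)
```

Keeping the per-polyline offsets around would make it possible to update one polyline's slice of `pos` later, which partly compensates for the reduced per-line control.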

Using two arguments in sprintf function in gnuplot

I've been trying to graph several files on the same gnuplot graph using sprintf to read in the filenames. I can read in one argument o.k. if I write:
filename(n) = sprintf("oi04_saxs_%05d_ave_div_sub.dat", n)
plot for [i=1:10] filename(i) u 1:2
then my graph is o.k and I get all files with that argument plotted on the same graph. However I have a string of characters that changes near the end of my filename and when I try to reflect this in
filename(n,m) = sprintf("oi04_saxs_%05d_0001_%04s_ave_div_sub.dat",n,m)
plot for [i=1:10] filename(i,m) u 1:2
I get the following error message: 'undefined variable m'. I've tried removing the loop and just running
plot for filename(m)
and this results in the same error message. Help in understanding what's going wrong and how to fix it would be much appreciated :)
This is my full script:
unset multiplot
reset
set termoption enhanced
set encoding utf8
set term pdf size 18cm,18cm font 'Arial'
set pointsize 0.25
set output 'StoppedFlowResults.pdf'
set logscale
set xlabel '{/:Italic r} / [Q]'
set ylabel '{/:Italic Intensity}'
filename(n) = sprintf("./Result_curve-%d.txt/estimate.d", n)
myColorGradient(n) = sprintf("#%02x00%02x", 256-(n-1)*8-1, (n-1)*8)
set key off
set multiplot layout 2,1
filename(n,m) = sprintf("oi04_saxs_%05d_0001_%04s_ave_div_sub.dat",n,m);
plot for [i=1:10] filename(i,m) u 1:2 not
unset multiplot
set output
Based on help for, you can have nested iterations, e.g.:
plot for [i=1:3] for [j=1:3] sin(x*i)+cos(x*j)
In your case you can combine this with iteration over strings (you still have to define the possible string values), with something like:
plot for [i=1:3] for [m in "A B C D"] filename(i,m) u 1:2 not
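To see which file names such a nested iteration requests, the expansion can be sketched in Python, whose `%` operator follows the same printf-style conventions. The four-character labels are hypothetical, since the real values are not given in the question.

```python
def filename(n, m):
    # Same format string as in the question; n is an integer, m a string.
    return "oi04_saxs_%05d_0001_%04s_ave_div_sub.dat" % (n, m)

# Hypothetical labels standing in for the changing part of the file name.
labels = "aaaa bbbb"

# Equivalent of: plot for [i=1:3] for [m in "aaaa bbbb"] filename(i,m) ...
# The outer loop runs over i, the inner loop over m.
names = [filename(i, m) for i in range(1, 4) for m in labels.split()]

for name in names:
    print(name)
```

The error in the question arises because `m` is never given a value; iterating over it, as here, defines it for every call.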

matplotlib: plot or scatter without line through marker

Is there a simple way to have scatter() plots (or just plots) with data points shown by some marker and connected by lines but, when markerfacecolor='none' (or facecolor='none'), without the line showing inside the area of the marker?
E.g.:
xx = arange(0.0,10.0,0.5)
yy = sin(xx)
plt.plot(xx,yy,'k-',marker='o',markerfacecolor='none')
results in the following figure.
But I would like the lines connecting data points to start not from the center of each marker but from its borders.

Gnuplot: plotting histograms on Y-axis

I'm trying to plot a histogram for the following data:
<text>,<percentage>
--------------------
"Statement A",50%
"Statement B",20%
"Statement C",30%
I used the set datafile separator "," to obtain the corresponding columns. The plot should have percentage on the X-axis and the statements on the Y-axis (full character string). So each histogram is horizontal.
How can I do this in gnuplot?
Or are there other tools for plotting good vector images?
The gnuplot histogram and boxes plotting styles are for vertical boxes. To get horizontal boxes, you can use boxxyerrorbars.
For the strings as y-labels, I use yticlabels and place the boxes at the y-values 0, 1 and 2 (according to the row in the data file, which is accessed with $0).
I let gnuplot treat the second column as numerical value, which strips the % off. It is added later in the formatting of the xtics:
set datafile separator ','
set format x '%g%%'
set style fill solid
plot 'data.txt' using ($2*0.5):0:($2*0.5):(0.4):yticlabels(1) with boxxyerrorbars t ''
The result with version 4.6.4 is:
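The four values passed to boxxyerrorbars are the box centre in x, the centre in y, the half-width and the half-height. As a rough check of that geometry, the following Python sketch computes the box extents for the sample data; `box_for` is a hypothetical helper written only for this illustration.

```python
rows = [("Statement A", "50%"), ("Statement B", "20%"), ("Statement C", "30%")]

def box_for(row_index, percent_text, half_height=0.4):
    # Use only the numeric part of the field, as the answer describes for gnuplot.
    value = float(percent_text.rstrip("%"))
    x_center = value * 0.5  # corresponds to ($2*0.5) as the x value
    x_half = value * 0.5    # corresponds to ($2*0.5) as the x delta
    return (x_center - x_half, x_center + x_half,
            row_index - half_height, row_index + half_height)

# Row numbers start at 0, like column 0 in the plot command.
boxes = [box_for(i, pct) for i, (_, pct) in enumerate(rows)]

for (label, _), (x0, x1, y0, y1) in zip(rows, boxes):
    print(label, x0, x1, y0, y1)
```

Centring each box at half its value with a half-width of the same size is what makes every bar start at zero and end at the percentage.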
@Christoph Thank you. Your answer helped me.
@Slayer Regarding your question about adding labels: this uses gnuplot v5.2 patchlevel 6 and @Christoph's sample.
Sample Code:
# set the data file delimiter
set datafile separator ','
# set the x-axis labels to show percentage
set format x '%g%%'
# set the x-axis min and max range
set xrange [ 0 : 100]
# set the style of the bars
set style fill solid
# set the textbox style with a blue line colour
set style textbox opaque border lc "blue"
# plot the data graph and place the labels on the bars
plot 'plotv.txt' using ($2*0.5):0:($2*0.5):(0.3):yticlabels(1) with boxxyerrorbars t '', \
'' using 2:0:2 with labels center boxed notitle column
Sample Data Provided:(plotv.txt)
<text>,<percentage>
--------------------
"Statement A",50%
"Statement B",20%
"Statement C",30%
Reference(s):
gnuplot 5.2 demo sample - textbox and the related sample data