Gnuplot: How to load and display single numeric value from data file - sum

My data file has this content
# data file for use with gnuplot
# Report 001
# Data as of Tuesday 03-Sep-2013
total 1976
case1 522 278 146 65 26 7
case2 120 105 15 0 0 0
case3 660 288 202 106 63 1
I am making a histogram from the case... lines using the script below - and that works. My question is: how can I load the grand total value 1976 (next to the word 'total') from the data file and either (a) store it into a variable or (b) use it directly in the title of the plot?
This is my gnuplot script:
reset
set term png truecolor
set terminal pngcairo size 1024,768 enhanced font 'Segoe UI,10'
set output "output.png"
set style fill solid 1.00
set style histogram rowstacked
set style data histograms
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
plot for [i=3:7] 'mydata.dat' every ::1 using i:xticlabels(1) with histogram \
notitle, '' every ::1 using 0:2:2 \
with labels \
title "My Title"
For the benefit of others trying to label histograms, in my data file, the column after the case label represents the total of the rest of the values on that row. Those total numbers are displayed at the top of each histogram bar. For example for case1, 522 is the total of (278 + 146 + 65 + 26 + 7).
I want to display the grand total somewhere on my chart, say as the second line of the title or in a label. I can get a variable into sprintf into the title, but I have not figured out syntax to load a "cell" value ("cell" meaning row column intersection) into a variable.
Alternatively, if someone can tell me how to use the sum function to total up 522+120+660 (read from the data file, not as constants!) and store that total in a variable, that would obviate the need to have the grand total in the data file, and that would also make me very happy.
Many thanks.

Let's start with extracting a single cell at (row,col). If it is a single value, you can use the stats command to extract it. The row and column are specified with every and using, as in a plot command. In your case, to extract the total value, use:
# extract the 'total' cell
stats 'mydata.dat' every ::::0 using 2 nooutput
total = int(STATS_min)
To sum up all values in the second column, use:
stats 'mydata.dat' every ::1 using 2 nooutput
total2 = int(STATS_sum)
And finally, to sum up all values in columns 3:7 in all rows (i.e. the same as the previous command, but without using the saved totals) use:
# sum all values from columns 3:7 from all rows
stats 'mydata.dat' every ::1 using (sum[i=3:7] column(i)) nooutput
total3 = int(STATS_sum)
These commands require gnuplot 4.6 to work.
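As a cross-check outside gnuplot, the three quantities above can be recomputed in a few lines of Python. This is only a sketch: the data is pasted inline from the question, and the variable names mirror the gnuplot ones purely for readability.

```python
# Data copied from the question; '#' lines are comments, as gnuplot treats them.
DATA = """\
# data file for use with gnuplot
# Report 001
# Data as of Tuesday 03-Sep-2013
total 1976
case1 522 278 146 65 26 7
case2 120 105 15 0 0 0
case3 660 288 202 106 63 1
"""

# Keep only non-empty, non-comment lines and split them into fields.
rows = [line.split() for line in DATA.splitlines()
        if line.strip() and not line.startswith("#")]

total = int(rows[0][1])                                  # the 'total' cell (row 0, column 2)
total2 = sum(int(r[1]) for r in rows[1:])                # sum of column 2 over the case rows
total3 = sum(int(v) for r in rows[1:] for v in r[2:7])   # sum of columns 3..7 over the case rows

print(total, total2, total3)  # prints: 1976 1302 1302
```

Note that the stored grand total (1976) does not equal the sum of the per-case totals (1302) for the sample data shown.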
So, your plotting script could look like the following:
reset
set terminal pngcairo size 1024,768 enhanced
set output "output.png"
set style fill solid 1.00
set style histogram rowstacked
set style data histograms
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
# extract the 'total' cell
stats 'mydata.dat' every ::::0 using 2 nooutput
total = int(STATS_min)
plot for [i=3:7] 'mydata.dat' every ::1 using i:xtic(1) notitle, \
'' every ::1 using 0:(s = sum [i=3:7] column(i), s):(sprintf('%d', s)) \
with labels offset 0,1 title sprintf('total %d', total)
which gives the following output:

For Linux and similar systems.
If you don't know the row number where your data is located, but you know it is in the n-th column of a row where the value of the m-th column is x, you can define a function
get_data(m,x,n,filename)=system('awk "\$'.m.'==\"'.x.'\"{print \$'.n.'}" '.filename)
and then use it, for example, as
y = get_data(1,"case2",4,"datafile.txt")
using data provided by user424855
print y
should return 15
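For readers without awk, the same lookup rule (match a value in column m, return column n) can be sketched in Python. The function below is a hypothetical stand-in written for illustration, not part of gnuplot; it takes the lines directly rather than a file name so the example is self-contained.

```python
def get_data(m, x, n, lines):
    """Return column n of every row whose column m equals x (columns are 1-based)."""
    hits = []
    for line in lines:
        fields = line.split()
        # Skip rows that are too short to contain both columns.
        if len(fields) >= max(m, n) and fields[m - 1] == x:
            hits.append(fields[n - 1])
    return hits


lines = [
    "total 1976",
    "case1 522 278 146 65 26 7",
    "case2 120 105 15 0 0 0",
    "case3 660 288 202 106 63 1",
]

print(get_data(1, "case2", 4, lines))  # prints: ['15']
```

As with the awk version, the result is text, so convert it with int() or float() before doing arithmetic on it.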

It's not clear to me where your "grand total" of 1976 comes from. If I calculate 522+120+660 I get 1302 not 1976.
Anyway, here is a solution which works even without stats and sum which were not available in gnuplot 4.4.0.
In the data you don't necessarily need the "grand total" or the sum of each row, because gnuplot can calculate these for you. This is done by (not) plotting the file as a matrix while accumulating the row sums in the string variable S0 and the total sum in the variable Total. gnuplot will print the warning "warning: matrix contains missing or undefined values", which you can ignore. The labels are added by plotting '+' ... with labels, extracting the desired values from the S0 string.
Data: SO18583180.dat
So, the reduced input data looks like this:
# data file for use with gnuplot
# Report 001
# Data as of Tuesday 03-Sep-2013
case1 278 146 65 26 7
case2 105 15 0 0 0
case3 288 202 106 63 1
Script: (works for gnuplot>=4.4.0, March 2010 and gnuplot 5.x)
### histogram with sums and total sum
reset
FILE = "SO18583180.dat"
set style histogram rowstacked
set style data histograms
set style fill solid 0.8
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
set key top left noautotitle
set grid y
set xrange [0:2]
set offsets 0.5,0.5,0,0
Total = 0
S0 = ''
addSums(v) = S0.sprintf(" %g",(M=$2,(N=$1+1)==1?S1=0:0,S1=S1+v))
plot for [i=2:6] FILE u i:xtic(1) notitle, \
'' matrix u (S0=addSums($3),Total=Total+$3,NaN) w p, \
'+' u 0:(real(S2=word(S0,int($0*N+N)))):(S2) every ::::M w labels offset 0,0.7 title sprintf("Total: %g",Total)
### end of script
Result: (created with gnuplot 4.4.0, Windows terminal)


How to connect points with different indices (one data file) in gnuplot

I have a file "a_test.dat" with two data blocks that I can select via the corresponding index.
# first
x1 y1
3 1
6 2
9 8
# second
x2 y2
4 5
8 2
2 7
Now I want to connect the data points of both indices with an arrow.
set arrow from (x1,y1) to (x2,y2).
I can plot both blocks with one plot statement. But I cannot get the points to set the arrows.
plot "a_test.dat" index "first" u 1:2, "" index "second" u 1:2
From version 5.2 you can use gnuplot arrays:
stats "a_test.dat" nooutput
array xx[STATS_records]
array yy[STATS_records]
# save all data into two arrays
i = 1
fnset(x,y) = (xx[i]=x, yy[i]=y, i=i+1)
# parse data ignoring output
set table $dummy
plot "" using (fnset($1,$2)) with table
unset table
# x2,y2 data starts at midpoint in array
numi = int((i-1)/2)
plot for [i=1:numi] $dummy using (xx[i]):(yy[i]):(xx[numi+i]-xx[i]):(yy[numi+i]-yy[i]) with vectors
Use stats to count the number of lines in the file, so that the array can
be large enough. Create an array xx and another yy to hold the data.
Use plot ... with table to read the file again, calling your function
fnset() for each data line with the x and y column values. The function
saves them at the current index i, which it increments. It was
initialised to 1.
For 3+3 data lines, i ends up at 7, so we set numi to (i-1)/2 i.e. 3.
Use plot for ... vectors to draw the arrows. Each arrow needs 4 data
items from the array. Note that the second x,y must be a relative delta,
not an absolute position.
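The pairing and delta arithmetic described above can be sketched in Python, assuming the six numeric points from the question are already read into one list in file order (the list and variable names are illustrative only):

```python
# The six data points from the question, in file order:
# first block (x1, y1), then second block (x2, y2).
points = [(3, 1), (6, 2), (9, 8), (4, 5), (8, 2), (2, 7)]

numi = len(points) // 2  # the second block starts at the midpoint

# Each arrow is (x, y, dx, dy): start point plus a relative delta,
# which is the form "with vectors" expects.
vectors = []
for k in range(numi):
    x1, y1 = points[k]
    x2, y2 = points[numi + k]
    vectors.append((x1, y1, x2 - x1, y2 - y1))

print(vectors)  # prints: [(3, 1, 1, 4), (6, 2, 2, 0), (9, 8, -7, -1)]
```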

How to plot a chart so it adds to the value of previous value instead of plotting it over a zero line

In this code I have plotted pct_day. Since the value does not increase the way a stock price would, is it possible to plot this data so that each value is added to the previous one before plotting? That way the line graph would increase over time, as opposed to the image below, where the chart is plotted around a zero line.
High Low Open Close Volume Adj Close year pct_day
month day
1 2 794.913004 779.509998 788.783002 789.163007 6.372860e+08 789.163007 1997.400000 0.002211
3 833.470005 818.124662 823.937345 828.889339 9.985193e+08 828.889339 1997.866667 0.004160
4 863.153573 849.154299 858.737861 853.571429 1.042729e+09 853.571429 1997.714286 -0.003345
5 900.455715 888.571429 895.716426 894.472137 1.022023e+09 894.472137 1998.357143 -0.001216
6 847.453076 837.161537 840.123847 844.383843 8.889831e+08 844.383843 1998.076923 0.003679
... ... ... ... ... ... ... ... ... ...
12 27 909.735997 900.942000 905.528664 904.734009 7.485793e+08 904.734009 1998.133333 -0.000308
28 946.635010 940.440016 942.995721 944.127147 7.552150e+08 944.127147 1998.071429 0.001251
29 950.723837 941.625390 944.760775 947.200773 6.830400e+08 947.200773 1998.076923 0.002899
30 891.501671 883.954989 887.031665 887.819181 6.010675e+08 887.819181 1997.833333 0.001844
31 918.943857 910.320763 916.251549 913.786154 6.879523e+08 913.786154 1997.923077 -0.002772
363 rows × 8 columns
The plot in the Jupyter notebook is shown below:
You need the cumulative sum of the column pct_day. First, create a new column holding that value, computed with numpy's cumsum:
import numpy as np
pct_day_list = df['pct_day'].tolist()
pct_day_cumsum = list(np.cumsum(pct_day_list))
df['pct_day_cumsum'] = pct_day_cumsum
After that you can plot it with df.plot(y='pct_day_cumsum')
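To see what a cumulative sum does to the series, here is a small standard-library sketch using the first five pct_day values from the table in the question. It uses itertools.accumulate instead of numpy so it runs without extra packages; in pandas, df['pct_day'].cumsum() should give the same running total.

```python
from itertools import accumulate

# First five pct_day values from the question's table.
pct_day = [0.002211, 0.004160, -0.003345, -0.001216, 0.003679]

# Each element is the sum of all values up to and including that position.
cumulative = list(accumulate(pct_day))

for daily, running in zip(pct_day, cumulative):
    print(f"{daily:+.6f} -> {running:.6f}")
```

The running total only moves down when a day's change is negative, which gives the stock-like curve asked for.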

Graph to show departure and arrival times between stations

I have the start and end times of trips made by a bus, with the times in an Excel sheet. I want to make the graph as below :
I tried with Matlab nodes and graphs but did not get the exact figure. Below is the Matlab code I tried as an example:
A = [1 4]
B = [2 3]
weights = [5 5];
G = digraph(A,B,weights,4)
plot(G)
And the figure it generates:
I have got many more than 4 points in the Excel sheet, and I want them to all be displayed as in the first image.
Overview
You don't need any sort of complicated graph package for this, just use normal line plots! Here are methods in Excel and Matlab.
Excel
Give each bus stop a number, and list the bus stop number by the time it arrives/leaves there. I'll use stops number 0 and 1 for this example.
0 04:41
1 05:35
1 05:40
0 06:34
0 06:51
1 07:45
1 15:21
0 16:15
Then simply highlight the data and insert a "scatter with straight lines" chart.
The rest is formatting. You can format the y-axis and tick "values in reverse order" to get the time increasing as in your desired plot. You can change the x-axis tick marks to just show integer stop numbers, get rid of the legend etc.
Final output:
Matlab
Here is the Matlab documentation for converting Excel formatted dates into Matlab datetime arrays: Convert Excel Date Number to Datetime.
Once you have the datetime objects, you can do this easily with the standard plot function.
% Set times up as a datetime array, could do this any number of ways
times = datetime(strcat({'1/1/2000 '}, {'04:41', '05:35', '05:40', '06:34', '06:51', '07:45', '15:21', '16:15'}, ':00'), 'format', 'dd/MM/yyyy HH:mm:ss');
% Set up the location of the bus at each of the above times
station = [0,1,1,0,0,1,1,0];
% Plot
plot(station, times) % Create plot
set(gca, 'xtick', [0,1]) % Limit to just ticks at the 2 stops
set(gca, 'ydir', 'reverse') % Reverse y axis to have earlier at top
set(gca,'XTickLabel',{'R', 'L'}) % Name the stops
Output:

gnuplot, how to label only certain points?

I'm using the following gnuplot commands to create a plot:
#!/bin/bash
gnuplot << 'EOF'
set term postscript portrait color enhanced
set output 'out.ps'
plot 'data_file' u 3:2 w points , '' u 3:2:($4!=-3.60 ? $1:'aaa') w labels
EOF
where data_file looks like this:
O4 -1.20 -0.33 -5.20
O9.5 -1.10 -0.30 -3.60
B0 -1.08 -0.30 -3.25
B0.5 -1.00 -0.28 -2.60
B1.5 -0.90 -0.25 -2.10
B2.5 -0.80 -0.22 -1.50
B3 -0.69 -0.20 -1.10
I want gnuplot to label all points with the strings found in column 1, except the one where column 4 is equal to -3.60 in which case I want the aaa string. What I'm getting is that the $4=-3.60 data point is being labeled correctly as aaa, but the rest are not being labeled at all.
Update: gnuplot has no problem showing numbers as labels using the conditional statement, i.e. any column but 1 is correctly displayed as a label for each point, respecting the conditions imposed. That is, this line displays column 2 (numbers) as point labels, respecting the conditional statement:
plot 'data_file' u 3:2 w points , '' u 3:2:($4!=-3.60 ? $2:'aaa') w labels
Update 2: It also has no problem in plotting column 1 as point labels if I plot it as a whole, ie not using a conditional statement. That is, this line plots correctly all the point labels in column 1 (strings):
plot 'data_file' u 3:2 w points , '' u 3:2:1 w labels
So clearly the problem is in using the conditional statement together with the strings column. Any of these used separately works just fine.
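A cleaner way may be the following. The problem is that $1 evaluates column 1 as a number, so a string such as O4 cannot be used as a label that way. stringcolumn(1) returns the field as a string, so both branches of the conditional yield strings, which is what with labels expects.
#!/bin/bash
gnuplot << 'EOF'
set term postscript portrait color enhanced
set output 'out.ps'
plot 'data_file' u 3:2 w points , '' u 3:2:($4!=-3.60 ? stringcolumn(1):'aaa') w labels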
EOF
Is this what you want?
#!/bin/bash
gnuplot << 'EOF'
set term postscript portrait color enhanced
set output 'out.ps'
plot 'data_file' u 3:2 w points , \
'' u (($4 == -3.60)? 1/0 : $3):2:1 w labels
EOF
All I do here is set (x) points where the column 4 equals -3.6 to NaN (1/0). Since gnuplot ignores those points, life is good. I think the problem with your script is that you were filtering a column where gnuplot expects string input -- although I haven't played around with it enough to verify that. I just switched the filter to a column where gnuplot expects numbers (the x position) and it works just fine.
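For reference, the labelling rule the question asks for (the column 1 string everywhere, except "aaa" where column 4 equals -3.60) can be written out in Python as a check on what the plot should show. The rows are copied from the question's data file; the variable names are illustrative.

```python
# Rows copied from data_file in the question: (label, col2, col3, col4).
rows = [
    ("O4",   -1.20, -0.33, -5.20),
    ("O9.5", -1.10, -0.30, -3.60),
    ("B0",   -1.08, -0.30, -3.25),
    ("B0.5", -1.00, -0.28, -2.60),
    ("B1.5", -0.90, -0.25, -2.10),
    ("B2.5", -0.80, -0.22, -1.50),
    ("B3",   -0.69, -0.20, -1.10),
]

# Both branches of the conditional are strings, as a label has to be.
labels = [name if col4 != -3.60 else "aaa" for name, _, _, col4 in rows]

print(labels)  # prints: ['O4', 'aaa', 'B0', 'B0.5', 'B1.5', 'B2.5', 'B3']
```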

How to Resize using Lanczos

I can easily calculate the values for sinc(x) curve used in Lanczos, and I have read the previous explanations about Lanczos resize, but being new to this area I do not understand how to actually apply them.
To resample with lanczos imagine you overlay the output and input over each other, with points signifying where the pixel locations are. For each output pixel location you take a box +- 3 output pixels from that point. For every input pixel that lies in that box, calculate the value of the lanczos function at that location with the distance from the output location in output pixel coordinates as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that multiply each input pixel value with the corresponding scaling value and add the results together to get the value of the output pixel.
For example, what does "overlay the input and output" actually mean in programming terms?
In the equation given
lanczos(x) = {
0 if abs(x) > 3,
1 if x == 0,
else sin(x*pi)/x
}
what is x?
As a simple example, suppose I have an input image with 14 values (i.e. in addresses In0-In13):
20 25 30 35 40 45 50 45 40 35 30 25 20 15
and I want to scale this up by 2, i.e. to an image with 28 values (i.e. in addresses Out0-Out27).
Clearly, the value in address Out13 is going to be similar to the value in address In7, but which values do I actually multiply to calculate the correct value for Out13?
What is x in the algorithm?
If the values in your input data are at t coordinates [0 1 2 3 ...], then your output (which is scaled up by 2) has t coordinates at [0 .5 1 1.5 2 2.5 3 ...]. So to get the first output value, you center your filter at 0 and multiply by all of the input values. Then to get the second output, you center your filter at 1/2 and multiply by all of the input values, and so on. In other words, x is the distance between the output position t and each input position.
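A minimal 1-D sketch of this in Python, using the 14-value example from the question, is given below. It rests on several assumptions that are not from the thread. The kernel is the commonly used Lanczos form sinc(x) * sinc(x/a) with a = 3, which is not quite the formula quoted in the question. Distances are measured in input sample units, which suits upscaling. Taps that fall outside the input are dropped and the remaining weights renormalized. The function names are illustrative.

```python
import math


def lanczos(x, a=3):
    """Lanczos kernel sinc(x) * sinc(x / a) for |x| < a, and 0 outside."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resample(values, scale, a=3):
    """Resample a 1-D list by 'scale'; output j sits at input position t = j / scale."""
    out = []
    for j in range(int(len(values) * scale)):
        t = j / scale                      # e.g. 0, 0.5, 1, 1.5, ... for scale = 2
        base = math.floor(t)
        # Input samples within +-a of t; x = t - i is the kernel argument.
        taps = [(i, lanczos(t - i, a))
                for i in range(base - a + 1, base + a + 1)
                if 0 <= i < len(values)]
        weight_sum = sum(w for _, w in taps)   # normalize so weights add up to 1
        out.append(sum(values[i] * w for i, w in taps) / weight_sum)
    return out


image = [20, 25, 30, 35, 40, 45, 50, 45, 40, 35, 30, 25, 20, 15]
result = resample(image, 2)
print(len(result), round(result[13], 2), round(result[14], 2))
```

Here Out13 sits at t = 6.5, so it is a weighted mix of In4 through In9 with x = ±0.5, ±1.5 and ±2.5, while Out14 sits exactly on In7 and reproduces its value.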