gnuplot store one number from data file into variable - variables

OSX v10.6.8 and Gnuplot v4.4
I have a data file with 8 columns. I would like to take the first value from the 6th column and make it the title. Here's what I have so far:
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
K = #read in data here
graph(n) = sprintf("K=%.2e",n)
set term aqua enhanced font "Times-Roman,18"
plot file using 1:3 title graph(K)
And here is what the first few rows of my data file looks like:
1.00e-07 1.00e-07 1.00e+00 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
1.11e-06 1.00e-07 9.02e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
2.12e-06 1.00e-07 4.72e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
3.13e-06 1.00e-07 3.20e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12090.00
I don't know how to correctly read in the data or if this is even the right way to go about this.
EDIT #1
Ok, thanks to mgilson I now have
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
set term aqua enhanced font "Times-Roman,18"
K = "`head -1 datafile | awk '{print $6}'`"
print K+0
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
but I get the error: Non-numeric string found where a numeric expression was expected
EDIT #2
file = "testPlot.txt"
K = "`head -1 file | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number #this is line 9
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
This gives the error--> head: file: No such file or directory
"testPlot.gnu", line 9: Non-numeric string found where a numeric expression was expected

You have a few options...
FIRST OPTION:
use columnheader
plot file using 1:3 title columnheader(6)
I haven't tested it, but this may prevent the first row from actually being plotted.
SECOND OPTION:
use an external utility to get the title:
TITLE="`head -1 datafile | awk '{print $6}'`"
plot 'datafile' using 1:3 title TITLE
If the variable is numeric, and you want to reformat it, in gnuplot, you can cast strings to a numeric type (integer/float) by adding 0 to them (e.g).
print "36.5"+0
Then you can format it with sprintf or gprintf as you're already doing.
It's weird that there is no float function. (int will work if you want to cast to an integer).
EDIT
The script below worked for me (when I pasted your example data into a file called "datafile"):
K = "`head -1 datafile | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number
graph(n) = sprintf("K=%.2e",n)
plot "datafile" using 1:3 title graph(K)
EDIT 2 (addresses comments below)
To expand a variable in backtics, you'll need macros:
set macro
file="mydatafile.txt"
#THE ORDER OF QUOTES (' and ") IS CRUCIAL HERE.
cmd='"`head -1 ' . file . ' | awk ''{print $6}''`"'
# . is string concatenation. (this string has 3 pieces)
# to get a single quote inside a single quoted string
# you need to double. e.g. 'a''b' yields the string a'b
data=#cmd
To address your question 2, it is a good idea to familiarize yourself with shell utilities -- sed and awk can both do it. I'll show a combination of head/tail:
cmd='"`head -2 ' . file . ' | tail -1 | awk ''{print $6}''`"'
should work.
EDIT 3
I recently learned that in gnuplot, system is a function as well as a command. To do the above without all the backtic gymnastics,
data=system("head -1 " . file . " | awk '{print $6}'")
Wow, much better.

This is a very old question, but here's a nice way to get access to a single value anywhere in your data file and save it as a gnuplot-accessible variable:
set term unknown #This terminal will not attempt to plot anything
plot 'myfile.dat' index 0 every 1:1:0:0:0:0 u (var=$1):1
The index number allows you to address a particular dataset (separated by two carriage returns), while every allows you to specify a particular line.
The colon-separated numbers after every should be of the form 1:1:<line_number>:<block_number>:<line_number>:<block_number>, where the line number is the line with the the block (starting from 0), and the block number is the number of the block (separated by a single carriage return, again starting from 0). The first and second numbers say plot every 1 lines and every one data block, and the third and fourth say start from line <line_number> and block <block_number>. The fifth and sixth say where to stop. This allows you to select a single line anywhere in your data file.
The last part of the plot command assigns the value in a particular column (in this case, column 1) to your variable (var). There needs to be two values to a plot command, so I chose column 1 to plot against my variable assignment statement.

Here is a less 'awk'-ward solution which assigns the value from the first row and 6th column of the file 'Data.txt' to the variable x16.
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
plot 'Data.txt' u 0:($0==0?(x16=$6):$6)
unset table
A more general example for storing several values is given below:
# Load data from file to variable
# Gnuplot can only access the data via the "plot" command
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
# Example: Assign all values according to: xij = Data33[i,j]; i,j = 1,2,3
plot 'Data33.txt' u 0:($0==0?(x11=$1):$1),\
'' u 0:($0==0?(x12=$2):$2),\
'' u 0:($0==0?(x13=$3):$3),\
'' u 0:($0==1?(x21=$1):$1),\
'' u 0:($0==1?(x22=$2):$2),\
'' u 0:($0==1?(x23=$3):$3),\
'' u 0:($0==2?(x31=$1):$1),\
'' u 0:($0==2?(x32=$2):$2),\
'' u 0:($0==2?(x33=$3):$3)
unset table
print x11, x12, x13 # Data from first row
print x21, x22, x23 # Data from second row
print x31, x32, x33 # Data from third row

Related

'grep' or 'awk' for extracting numeric data from a file

I have a CFD output file containing alpha-numeric data. My goal is to extract certain rows having numeric data to be able to plot. I was able to extract data which starts with a numeric value using grep. However, some of the rows of this extracted data start with a number but contains alphabets also which i do not want. here is a sample
3185 interface metric data, zone 1444, binary.
33268 interface metric data, zone 1440, binary.
3d, double precision, pressure-based, SST k-omega solver.
1 1.0000e+00 1.0163e-01 4.9782e-06 1.2250e-05 6.5126e-06 3.8876e+01 4.1845e+03 7.8685e+02 7.9475e+02 7.8234e+02 3.0537e+00 4.4427e+02 106:48:28 4999
2 1.0000e+00 6.5455e-02 1.4961e-04 2.2052e-04 1.3530e-02 6.8334e-01 4.5948e-01 7.9448e+02 8.0249e+02 7.9007e+02 1.3742e+00 5.7040e+02 92:12:06 4998
4587 interface metric data, zone 2541, binary.
6584 interface metric data, zone 1254, binary.
3 1.0000e+00 4.2029e-02 1.5227e-04 2.1588e-04 3.0255e-03 6.4570e-01 1.2661e-01 7.8044e+02 7.9048e+02 7.7804e+02 -2.3999e+05 6.4085e+02 80:35:24 4997
4 9.9121e-01 3.0808e-02 1.1390e-04 1.7132e-04 1.6542e-03 6.0594e-01 3.4626e-02 7.8613e+02 7.9673e+02 7.8422e+02 -1.9033e+05 7.0184e+02 70:56:41 4996
This is the command i used grep -P '^\s*\d+' file. How can i modify grep command to give me last rows with numeric data only ie
1 1.0000e+00 1.0163e-01 4.9782e-06 1.2250e-05 6.5126e-06 3.8876e+01 4.1845e+03 7.8685e+02 7.9475e+02 7.8234e+02 3.0537e+00 4.4427e+02 106:48:28 4999
2 1.0000e+00 6.5455e-02 1.4961e-04 2.2052e-04 1.3530e-02 6.8334e-01 4.5948e-01 7.9448e+02 8.0249e+02 7.9007e+02 1.3742e+00 5.7040e+02 92:12:06 4998
3 1.0000e+00 4.2029e-02 1.5227e-04 2.1588e-04 3.0255e-03 6.4570e-01 1.2661e-01 7.8044e+02 7.9048e+02 7.7804e+02 -2.3999e+05 6.4085e+02 80:35:24 4997
4 9.9121e-01 3.0808e-02 1.1390e-04 1.7132e-04 1.6542e-03 6.0594e-01 3.4626e-02 7.8613e+02 7.9673e+02 7.8422e+02 -1.9033e+05 7.0184e+02 70:56:41 4996
How can i modify grep command to give me last 4 rows only
Pipe the grep output to tail.
grep -P '^\s*\d+' file | tail -n 4
Given that the text in the question is the only thing we have to go on, I see a few patterns we might use to extract the last four lines.
The following matches lines whose first field is a number and contain no commas:
egrep '^[[:space:]]*[0-9][^,]+$'
This one matches lines containing numbers in scientific notation:
grep '[0-9]e[+-][0-9]'
And this one matches lines containing what looks like a time followed by an integer at the end of the line:
egrep '[0-9]+(:[0-9]{2}){2} [0-9]+$'
Or if you want an explicit match for the whole line -- that is, an integer, a number of scientific numbers, a time and then an integer, you can bundle it all together:
egrep '^[[:space:]]*[0-9]([[:space:]]+-?[0-9]+\.[0-9]+e[+-][0-9]+)+[[:space:]]+[0-9]+(:[0-9]{2}){2} [0-9]+$'
Note that I'm using explicit class names and ERE rather than shortcuts and PREG to maintain compatibility with non-Linux environments.
If your desired section of data can be identified by a certain header, e.g., the 3d, before it, you can look for the header and only start printing matching lines afterwards, e.g.,
awk '/^\s*3d,/ { in_data=1; next } in_data && /^\s*[0-9]/' file
Here /^\s*3d,/ is the pattern for the header which indicates the beginning of the "data section" (from the next line, i.e., not including the header itself). And /^\s*[0-9]/ is the pattern for lines to print within the data section.
In case there is no such header, you could try to identify the first line of data itself with a more complicated pattern, e.g., the number of fields combined with a regular expression:
awk 'NF == 15 && /^\s*[0-9]*\s*/ { in_data=1 } in_data && /^\s*[0-9]/' file

Sqldf in R - error with first column names

Whenever I use read.csv.sql I cannot select from the first column with and any output from the code places an unusual character (A(tilde)-..) at the begging of the first column's name.
So suppose I create a df.csv file in in Excel that looks something like this
df = data.frame(
a = 1,
b = 2,
c = 3,
d = 4)
Then if I use sqldf to query the csv which is in my working directory I get the following error:
> read.csv.sql("df.csv", sql = "select * from file where a == 1")
Error in result_create(conn#ptr, statement) : no such column: a
If I query a different column than the first, I get a result but with the output of the unusual characters as seen below
df <- read.csv.sql("df.csv", sql = "select * from file where b == 2")
View(df)
Any idea how to prevent these characters from being added to the first column name?
The problem is presumably that you have a file that is larger than R can handle and so only want to read a subset of rows into R and specifying the condition to filter it by involves referring to the first column whose name is messed up so you can't use it.
Here are two alternative approaches. The first one involves a bit more code but has the advantage that it is 100% R. The second one is only one statement and also uses R but additionally makes use an of an external utility.
1) skip header Read the file in skipping over the header. That will cause the columns to be labelled V1, V2, etc. and use V1 in the condition.
# write out a test file - BOD is a data frame that comes with R
write.csv(BOD, "BOD.csv", row.names = FALSE, quote = FALSE)
# read file skipping over header
DF <- read.csv.sql("BOD.csv", "select * from file where V1 < 3",
skip = 1, header = FALSE)
# read in header, assign it to DF and fix first column
hdr <- read.csv.sql("BOD.csv", "select * from file limit 0")
names(DF) <- names(hdr)
names(DF)[1] <- "TIME" # suppose we want TIME instead of Time
DF
## TIME demand
## 1 1 8.3
## 2 2 10.3
2) filter Another way to proceed is to use the filter= argument. Here we assume we know that the end of the column name is ime but there are other characters prior to that that we don't know. This assumes that sed is available and on your path. If you are on Windows install Rtools to get sed. The quoting might need to be changed depending on your shell.
When trying this on Windows I noticed that sed from Rtools changed the line endings so below we specified eol= to ensure correct processing. You may not need that.
DF <- read.csv.sql("BOD.csv", "select * from file where TIME < 3",
filter = 'sed -e "1s/.*ime,/TIME,/"' , eol = "\n")
DF
## TIME demand
## 1 1 8.3
## 2 2 10.3
So I figured it out by reading through the above comments.
I'm on a Windows 10 machine using Excel for Office 365. The special characters will go away by changing how I saved the file from a "CSV UTF-8 (Comma Delimited)" to just "CSV (Comma delimited)".

Pandas not consistently skipping input number of rows for skiprows argument?

I am using Pandas to organize CSV files to later plot with matplotlib. First I create a Pandas dataframe to find the line containing 'Pt'. This is what I search for to use as my header line. header
Then I save the index of this line and apply it to the skiprow argument when creating the new dataframe which I will use.
Oddly, depending on the file format, even though the correct index is found, the wrong line shows up as the header. For example, note how in Pandas line 54 has 'Pt" right after the tab:
correct index on first file
The dataframe comes out correctly here.
correct dataframe on first file
For another file, line 44 is correctly recognized with having 'Pt'.
correct index on second file
But the dataframe includes line 43 as the header!
incorrect dataframe on second file
I have tried setting header=0, header=none. Am I missing something?
Here is the code
entire_df = pd.read_csv(file_path, header=None)
print(entire_df.head(60))
header_idx = -1
for index, row in entire_df.iterrows(): # find line with desired header
if any(row.str.contains('Pt')):
print("Yes! I have pt!")
print("Header index is: " + str(index))
print("row contains:")
print(entire_df.loc[[index]])
header_idx = index # correct index obtained!
break
df = pd.read_csv(file_path, delimiter='\t', skiprows=header_idx, header=0) # use line index to exclude extra information above
print(df.head())
Here are sections of the two files that give different results. They are saved as .dta files. I cannot share the entire files.
file1 (properly made dataframe)
FRAMEWORKVERSION QUANT 7.07 Framework Version
INSTRUMENTVERSION LABEL 4.32 Instrument Version
CURVE TABLE 16875
Pt T Vf Im Vu Pwr Sig Ach Temp IERange Over
# s V A V W V V deg C # bits
0 0.1 3.49916E+000 -1.40364E-002 0.00000E+000 -4.91157E-002 -4.22328E-001 0.00000E+000 1.41995E+003 11 ...........
1 0.2 3.49439E+000 -1.40305E-002 0.00000E+000 -4.90282E-002 -4.22322E-001 0.00000E+000 1.41995E+003 11 ...........
2 0.3 3.49147E+000 -1.40258E-002 0.00000E+000 -4.89705E-002 -4.22322E-001
file2 (dataframe with wrong header)
FRAMEWORKVERSION QUANT 7.07 Framework Version
INSTRUMENTVERSION LABEL 4.32 Instrument Version
CURVE TABLE 18
Pt T Vf Vm Ach Over Temp
# s V vs. Ref. V V bits deg C
0 2.00833 3.69429E+000 3.69429E+000 0.00000E+000 ........... 1419.95
1 4.01667 3.69428E+000 3.69352E+000 0.00000E+000 ........... 1419.95
2 6.025 3.69419E+000 3.69284E+000 0.00000E+000 ........... 1419.95
3 8.03333 3.69394E+000 3.69211E+000 0.00000E+000 ........... 1419.95
Help would be much appreciated.
You should pay attention to your indentation levels. Your code block in which you want to set the header_idx depending on your if any(row.str.contains('Pt')) condition has the same intendation level as the if statement, which means it is executed at each iteration of the for loop, and not just when the condition is met.
for index, row in entire_df.iterrows():
if any(row.str.contains('Pt')):
[...]
header_idx = index
Adapt the indentation like that to put the assignment under the control of the if statement:
for index, row in entire_df.iterrows():
if any(row.str.contains('Pt')):
[...]
header_idx = index

Assign a value to a gnuplot function out of a data file

In Gnuplot, I try to assign a value to a function f(x) from a two column data file in order to plot that function as a horizontal line.
f(x)=value of $2 at $1==2 from filename.dat
plot 'filename.dat' , ' ' x<=2?f(x):1/0
I tried :
awk '$1==2,print{$2}' filename.dat
but it says :
Warning: encountered a string when expecting a number
Did you try to generate a file name using dummy variable x or y?".
Any suggestions?
P.S.: I know that there should be a < sign after awk, but it would not display it here.
I suggest this approach:
filename = "test.txt"
y = system(sprintf("awk '$1==2{print $2}' %s", filename))
f(x)=real(y)
plot filename u 1:2 w l, f(x) t ''
It seems that gnuplot is interpreting the value of y as a string. Therefore, we have to enforce a conversion to a number, which I do by converting it to a real number. (Another possibility f(x)=y+0)
Here is the ouput for my demo test.txt with the following content:
1 1
2 0.5
3 0

Plotting a function directly from a text file

Is there a way to plot a function based on values from a text file?
I know how to define a function in gnuplot and then plot it but that is not what I need.
I have a table with constants for functions that are updated regularly. When this update happens I want to be able to run a script that draws a figure with this new curve. Since there are quite few figures to draw I want to automate the procedure.
Here is an example table with constants:
location a b c
1 1 3 4
2
There are two ways I see to solve the problem but I do not know if and how they can be implemented.
I can then use awk to produce the string: f(x)=1(x)**2+3(x)+4, write it to a file and somehow make gnuplot read this new file and plot on a certain x range.
or use awk inside gnuplot something like f(x) = awk /1/ {print "f(x)="$2 etc., or use awk directly in the plot command.
I any case, I'm stuck and have not found a solution to this problem online, do you have any suggestions?
Another possibilty to have a somewhat generic version for this, you can do the following:
Assume, the parameters are stored in a file parameters.dat with the first line containing the variable names and all others the parameter sets, like
location a b c
1 1 3 4
The script file looks like this:
file = 'parameters.dat'
par_names = system('head -1 '.file)
par_cnt = words(par_names)
# which parameter set to choose
par_line_num = 2
# select the respective string
par_line = system(sprintf('head -%d ', par_line_num).file.' | tail -1')
par_string = ''
do for [i=1:par_cnt] {
eval(word(par_names, i).' = '.word(par_line, i))
}
f(x) = a*x**2 + b*x + c
plot f(x) title sprintf('location = %d', location)
This question (gnuplot store one number from data file into variable) had some hints for me in the first answer.
In my case I have a file which contains parameters for a parabola. I have saved the parameters in gnuplot variables. Then I plot the function containing the parameter variables for each timestep.
#!/usr/bin/gnuplot
datafile = "parabola.txt"
set terminal pngcairo size 1000,500
set xrange [-100:100]
set yrange [-100:100]
titletext(timepar, apar, cpar) = sprintf("In timestep %d we have parameter a = %f, parameter c = %f", timepar, apar, cpar)
do for [step=1:400] {
set output sprintf("parabola%04d.png", step)
# read parameters from file, where the first line is the header, thus the +1
a=system("awk '{ if (NR == " . step . "+1) printf \"%f\", $1}' " . datafile)
c=system("awk '{ if (NR == " . step . "+1) printf \"%f\", $2}' " . datafile)
# convert parameters to numeric format
a=a+0.
c=c+0.
set title titletext(step, a, c)
plot c+a*x**2
}
This gives a series of png files called parabola0001.png,
parabola0002.png,
parabola0003.png,
…, each showing a parabola with the parameters read from the file called parabola.txt. The title contains the parameters of the given time step.
For understanding the gnuplot system() function you have to know that:
stuff inside double quotes is not parsed by gnuplot
the dot is for concatenating strings in gnuplot
the double quotes for the awk printf command have to be escaped, to hide them from gnuplot parser
To test this gnuplot script, save it into a file with an arbitrary name, e.g. parabolaplot.gplot and make it executable (chmad a+x parabolaplot.gplot). The parabola.txt file can be created with
awk 'BEGIN {for (i=1; i<=1000; i++) printf "%f\t%f\n", i/200, i/100}' > parabola.txt
awk '/1/ {print "plot "$2"*x**2+"$3"*x+"$4}' | gnuplot -persist
Will select the line and plot it
This was/is another question about how to extract specific values into variables with gnuplot (maybe it would be worth to create a Wiki entry about this topic).
There is no need for using awk, you can do this simply with gnuplot only (hence platform-independent), even with gnuplot 4.6.0 (March 2012).
You can do a stats (check help stats) and assign the values to variables.
Data: SO15007620_Parameters.txt
location a b c
1 1 3 4
2 -1 2 3
3 2 1 -1
Script: (works with gnuplot 4.6.0, March 2012)
### read parameters from separate file into variables
reset
FILE = "SO15007620_Parameters.txt"
myLine = 1 # line index 0-based
stats FILE u (a=$2, b=$3, c=$4) every ::myLine::myLine nooutput
f(x) = a*x**2 + b*x + c
plot f(x) w l lc rgb "red" ti sprintf("f(x) = %gx^2 + %gx + %g", a,b,c)
### end of script
Result: