Praat scripting: creating a text file - scripting

I'm just working with Praat at the moment, and I'm trying to write a script to do the following with a collection of 3 Sound (narrative) files. I've managed as far as c); the scripting part is relatively easy. What I don't get is how to write the results to a text file with those columns. Any help would be great!
a) create a program that extracts all intervals on the phone tier of each of Narratives 1–3 which represent vowels whose label is a single letter, keeping times. I need each resulting Sound to have an appropriate label which identifies the vowel concerned
b) creates a Formant (burg) object corresponding to each of those intervals
c) calculates the midpoint of each Formant object
c) gets the values of formants 1, 2 and 3 at each of those midpoints
d) writes a text file with the following heading:
Narrative# Label Midpoint Time F1 F2 F3
and under that, the appropriate information for each vowel

The easy way
The easiest way to do this would be to write your output to a Table object and then use Praat's Save to comma-separated file command to save it to an external file. The examples below use the new (slightly more reasonable) syntax, so make sure to update Praat before trying them out (or try the shorthand versions in this answer's edit history).
Here's an example:
# Create a Table with no rows
table = Create Table with column names:
..."table", 0, "Narrative Label Midpoint Time F1 F2 F3"
for i to number_of_intervals
# Assuming you have your Formant objects in an array named "burg"
selectObject(burg[i])
# Run your analysis here
# For this example, I'm assuming values for the columns are in
# variables called narrative$, label$, midpoint, time, f1, f2 and f3
selectObject(table)
Append row
current_row = Get number of rows
# Insert your values
Set string value: current_row, "Narrative", narrative$
Set string value: current_row, "Label", label$
Set numeric value: current_row, "Midpoint", midpoint
Set numeric value: current_row, "Time", time
Set numeric value: current_row, "F1", f1
Set numeric value: current_row, "F2", f2
Set numeric value: current_row, "F3", f3
endfor
# Save it!
# Remember to select it if the table is not the active selection at
# the end of the loop
Save to comma-separated file: "/path/to/file"
# And then you can get rid of it
removeObject(table)
Or, if you prefer tabs, you could use
Save to tab-separated file: "/path/to/file"
Note that this method won't allow you to have "Narrative#" as a column name.
The 'l33t' way
Alternatively, you could use Praat's file directives to write directly to the file, as explained in the documentation:
sep$ = ","
# sep$ = tab$
# Create / overwrite file and write header
writeFileLine: "/path/to/file",
..."Narrative#" + sep$ +
..."Label" + sep$ +
..."Midpoint" + sep$ +
..."Time" + sep$ +
..."F1" + sep$ +
..."F2" + sep$ +
..."F3"
for i to number_of_intervals
selectObject(burg[i])
# Run your analysis here
appendFileLine: "/path/to/file",
...narrative$ + sep$ +
...label$ + sep$ +
...string$(midpoint) + sep$ +
...string$(time) + sep$ +
...string$(f1) + sep$ +
...string$(f2) + sep$ +
...string$(f3)
endfor

The Praat user group has an answer to a similar question here.
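For comparison, the same "header first, then one line per vowel" layout can be sketched outside Praat. This is only an illustration in Python, not part of the Praat answer: the function name write_formant_table is hypothetical, the single data row is a made-up placeholder, and the seven columns follow the header used in the examples above.

```python
import csv
import io

# Column names taken from the header used in the examples above
HEADER = ["Narrative#", "Label", "Midpoint", "Time", "F1", "F2", "F3"]


def write_formant_table(rows, handle, delimiter="\t"):
    """Write the header once, then one delimited line per vowel."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        writer.writerow(row)


# Placeholder values for illustration only, not real measurements
rows = [(1, "a", 0.5, 0.5, 700.0, 1200.0, 2500.0)]

buffer = io.StringIO()  # stands in for an open text file
write_formant_table(rows, buffer)
print(buffer.getvalue())
```

Swapping the delimiter argument for "," gives the comma-separated variant, which mirrors the sep$ switch in the Praat script.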

Related

Arcpy Script to loop through field and run Union Analysis

I have a polygon file in the form of a fishnet, and another feature class with polygons named Trawl_Buffers. Trawl_Buffers has a YEAR field. I'd like to create a script that runs a selection on YEAR and then performs a union analysis with the fishnet polygon for each YEAR. The desired output would be "Trawl_Buffers_union2003", "Trawl_Buffers_union2004", etc. I have a function that gets the unique years and puts them in a list, which I called vals.
It seems I then need to run a for loop over this list of unique years, create a temporary selection, and use that as input for the union, but I am having trouble implementing the query process.
Here is where I started, but I keep tripping up:
import arcpy
#Set the data environment
arcpy.env.overwriteOutput = True
arcpy.env.workspace = r'C:\Data\working\AK_Fishing_VMS\2021_Delivery\ArcPro_proj\ArcPro_proj.gdb'
trawlBuffs = r'C:\Data\working\AK_Fishing_VMS\2021_Delivery\ArcPro_proj\ArcPro_proj.gdb\buffers\buffers_testing'
fishnet = r'C:\Data\working\AK_Fishing_VMS\2021_Delivery\ArcPro_proj\ArcPro_proj.gdb\fishnets\vms_net1k'
unionOut = r'C:\Data\working\AK_Fishing_VMS\2021_Delivery\ArcPro_proj\ArcPro_proj.gdb\unions\union'
# function to get unique values for the YEAR field found within the trawlBuffs fc
def unique_values(table, field):
    with arcpy.da.SearchCursor(table, [field]) as cursor:
        return sorted({row[0] for row in cursor})
# Get the unique values for the field 'YEAR' found within the 'trawl_buffs' featureclass table
vals = unique_values(trawlBuffs, "YEAR")
# Create a query string for the selected country
yearSelectionClause = '"YEAR" = ' + "'" + vals + "'"
#loop through the years, create selection, union, make permanent
for year in vals:
    year_layer = str(year) + "_union"
    arcpy.MakeFeatureLayer_management(trawlBuffs, year_layer)
    arcpy.SelectLayerByAttribute_management(year_layer, "NEW_SELECTION", "\"YEAR"\" = %d" % (year))
    arcpy.Union_analysis(fishnet, year_layer , unionOut)
    arcpy.CopyFeatures_management(year_layer, "union_" + str(year))
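One way to untangle the query step is to build the where clause and the output name per year inside the loop, rather than concatenating the whole vals list into one string. The sketch below is plain Python with no arcpy calls, so it can be run anywhere. The helper name year_jobs is hypothetical, and it assumes YEAR is a numeric field (a text field would need the value wrapped in single quotes).

```python
def year_jobs(years, field="YEAR", prefix="Trawl_Buffers_union"):
    """Yield one (where_clause, output_name) pair per year."""
    for year in years:
        # Assumption: numeric field, so the value is not quoted
        where_clause = "{} = {}".format(field, int(year))
        output_name = "{}{}".format(prefix, year)
        yield where_clause, output_name


for where_clause, output_name in year_jobs([2003, 2004]):
    print(where_clause, "->", output_name)
```

Each clause could then be passed as the where_clause argument of MakeFeatureLayer, with that layer and the fishnet handed to Union under the per-year output name. Giving every year its own output name matters here, since writing each union to the same unionOut path would presumably overwrite the previous year's result.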

Sqldf in R - error with first column names

Whenever I use read.csv.sql I cannot select from the first column, and any output from the code places an unusual character (A(tilde)-..) at the beginning of the first column's name.
So suppose I create a df.csv file in Excel that looks something like this
df = data.frame(
a = 1,
b = 2,
c = 3,
d = 4)
Then if I use sqldf to query the csv which is in my working directory I get the following error:
> read.csv.sql("df.csv", sql = "select * from file where a == 1")
Error in result_create(conn@ptr, statement) : no such column: a
If I query a different column than the first, I get a result but with the output of the unusual characters as seen below
df <- read.csv.sql("df.csv", sql = "select * from file where b == 2")
View(df)
Any idea how to prevent these characters from being added to the first column name?
The problem is presumably that you have a file that is larger than R can handle and so only want to read a subset of rows into R and specifying the condition to filter it by involves referring to the first column whose name is messed up so you can't use it.
Here are two alternative approaches. The first one involves a bit more code but has the advantage that it is 100% R. The second one is only one statement and also uses R but additionally makes use of an external utility.
1) skip header Read the file in skipping over the header. That will cause the columns to be labelled V1, V2, etc. and use V1 in the condition.
# write out a test file - BOD is a data frame that comes with R
write.csv(BOD, "BOD.csv", row.names = FALSE, quote = FALSE)
# read file skipping over header
DF <- read.csv.sql("BOD.csv", "select * from file where V1 < 3",
skip = 1, header = FALSE)
# read in header, assign it to DF and fix first column
hdr <- read.csv.sql("BOD.csv", "select * from file limit 0")
names(DF) <- names(hdr)
names(DF)[1] <- "TIME" # suppose we want TIME instead of Time
DF
## TIME demand
## 1 1 8.3
## 2 2 10.3
2) filter Another way to proceed is to use the filter= argument. Here we assume we know that the end of the column name is ime but there are other characters prior to that that we don't know. This assumes that sed is available and on your path. If you are on Windows install Rtools to get sed. The quoting might need to be changed depending on your shell.
When trying this on Windows I noticed that sed from Rtools changed the line endings so below we specified eol= to ensure correct processing. You may not need that.
DF <- read.csv.sql("BOD.csv", "select * from file where TIME < 3",
filter = 'sed -e "1s/.*ime,/TIME,/"' , eol = "\n")
DF
## TIME demand
## 1 1 8.3
## 2 2 10.3
So I figured it out by reading through the above comments.
I'm on a Windows 10 machine using Excel for Office 365. The special characters will go away by changing how I saved the file from a "CSV UTF-8 (Comma Delimited)" to just "CSV (Comma delimited)".
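The behaviour described above is consistent with a UTF-8 byte-order mark (BOM) at the start of the file, which the "CSV UTF-8" export most likely writes and which then gets glued onto the first column name. The following Python sketch only illustrates that mechanism under this assumption. The byte string is a hand-built stand-in for such a file, and header_names is a hypothetical helper.

```python
import csv
import io

# Hand-built stand-in for a "CSV UTF-8" export: BOM bytes, then the header
raw = b"\xef\xbb\xbfa,b,c,d\n1,2,3,4\n"


def header_names(data, encoding):
    """Decode the bytes and return the first CSV row (the column names)."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


with_bom = header_names(raw, "utf-8")         # BOM stays on the first name
without_bom = header_names(raw, "utf-8-sig")  # BOM is stripped on decode

print(repr(with_bom[0]))
print(repr(without_bom[0]))
```

With the plain "utf-8" codec the first column name carries an invisible extra character, so a query on column a finds no match. Saving without the BOM, as described above, or reading with a BOM-aware encoding avoids that.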

Create new index using pandas by appending a comma

Currently the index in my data frame has the default values of 0,1,2 .. n-1 where n is the number of rows in the dataframe.
Is there a simple way to change the index values to "0,", "1,", "2,", ... "n-1,", where a comma is appended to each index value? So 0 becomes "0," and 1 becomes "1," and so on.
I'd keep it simple.
d.index = d.index.to_series().astype(str) + ','
I converted the existing index to a series so that I could conveniently add a comma to it. However I had to ensure it was of type str before I did so.
Sure, see below:
d = pd.DataFrame(...)
d.index = [str(i)+',' for i in d.index]
But what are you trying to do with this? It seems odd to modify the index like this. If you're trying to print the data frame in a special format or something else, there is probably a better way.
For custom output, you could do something like
for i, row in d.iterrows():
    # str() and map(str, ...) so that numeric labels and values can be joined
    print(str(i) + ': ' + ', '.join(map(str, row)))

Plotting a function directly from a text file

Is there a way to plot a function based on values from a text file?
I know how to define a function in gnuplot and then plot it but that is not what I need.
I have a table with constants for functions that are updated regularly. When this update happens I want to be able to run a script that draws a figure with this new curve. Since there are quite a few figures to draw I want to automate the procedure.
Here is an example table with constants:
location a b c
1 1 3 4
2
There are two ways I see to solve the problem but I do not know if and how they can be implemented.
I can then use awk to produce the string: f(x)=1(x)**2+3(x)+4, write it to a file and somehow make gnuplot read this new file and plot on a certain x range.
or use awk inside gnuplot something like f(x) = awk /1/ {print "f(x)="$2 etc., or use awk directly in the plot command.
In any case, I'm stuck and have not found a solution to this problem online. Do you have any suggestions?
Another possibility, if you want a somewhat generic version of this, is the following:
Assume, the parameters are stored in a file parameters.dat with the first line containing the variable names and all others the parameter sets, like
location a b c
1 1 3 4
The script file looks like this:
file = 'parameters.dat'
par_names = system('head -1 '.file)
par_cnt = words(par_names)
# which parameter set to choose
par_line_num = 2
# select the respective string
par_line = system(sprintf('head -%d ', par_line_num).file.' | tail -1')
par_string = ''
do for [i=1:par_cnt] {
    eval(word(par_names, i).' = '.word(par_line, i))
}
f(x) = a*x**2 + b*x + c
plot f(x) title sprintf('location = %d', location)
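The idea in this script is to pair each name in the header line with the value in the same position of the chosen parameter line. As a rough illustration of that pairing outside gnuplot, here is a small Python sketch. The names read_parameters and f are hypothetical, and the sample text simply repeats the two-line table shown above.

```python
def read_parameters(text, set_number=1):
    """Map header names to the values of one parameter line.

    set_number counts data lines from 1, the header being line 0.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    names, values = lines[0], lines[set_number]
    return {name: float(value) for name, value in zip(names, values)}


sample = "location a b c\n1 1 3 4\n"
params = read_parameters(sample)


def f(x):
    """Quadratic built from the parameters that were read in."""
    return params["a"] * x**2 + params["b"] * x + params["c"]


print(params)
print(f(2.0))
```

Because the names come from the header, adding a column to the table needs no change to the reading step, which is the same property the eval loop gives the gnuplot version.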
This question (gnuplot store one number from data file into variable) had some hints for me in the first answer.
In my case I have a file which contains parameters for a parabola. I have saved the parameters in gnuplot variables. Then I plot the function containing the parameter variables for each timestep.
#!/usr/bin/gnuplot
datafile = "parabola.txt"
set terminal pngcairo size 1000,500
set xrange [-100:100]
set yrange [-100:100]
titletext(timepar, apar, cpar) = sprintf("In timestep %d we have parameter a = %f, parameter c = %f", timepar, apar, cpar)
do for [step=1:400] {
    set output sprintf("parabola%04d.png", step)
    # read parameters from file, where the first line is the header, thus the +1
    a=system("awk '{ if (NR == " . step . "+1) printf \"%f\", $1}' " . datafile)
    c=system("awk '{ if (NR == " . step . "+1) printf \"%f\", $2}' " . datafile)
    # convert parameters to numeric format
    a=a+0.
    c=c+0.
    set title titletext(step, a, c)
    plot c+a*x**2
}
This gives a series of png files called parabola0001.png,
parabola0002.png,
parabola0003.png,
…, each showing a parabola with the parameters read from the file called parabola.txt. The title contains the parameters of the given time step.
For understanding the gnuplot system() function you have to know that:
stuff inside double quotes is not parsed by gnuplot
the dot is for concatenating strings in gnuplot
the double quotes for the awk printf command have to be escaped, to hide them from gnuplot parser
To test this gnuplot script, save it into a file with an arbitrary name, e.g. parabolaplot.gplot, and make it executable (chmod a+x parabolaplot.gplot). The parabola.txt file can be created with
awk 'BEGIN {for (i=1; i<=1000; i++) printf "%f\t%f\n", i/200, i/100}' > parabola.txt
awk '/1/ {print "plot "$2"*x**2+"$3"*x+"$4}' | gnuplot -persist
Will select the line and plot it
This was/is another question about how to extract specific values into variables with gnuplot (maybe it would be worth creating a Wiki entry about this topic).
There is no need for using awk, you can do this simply with gnuplot only (hence platform-independent), even with gnuplot 4.6.0 (March 2012).
You can do a stats (check help stats) and assign the values to variables.
Data: SO15007620_Parameters.txt
location a b c
1 1 3 4
2 -1 2 3
3 2 1 -1
Script: (works with gnuplot 4.6.0, March 2012)
### read parameters from separate file into variables
reset
FILE = "SO15007620_Parameters.txt"
myLine = 1 # line index 0-based
stats FILE u (a=$2, b=$3, c=$4) every ::myLine::myLine nooutput
f(x) = a*x**2 + b*x + c
plot f(x) w l lc rgb "red" ti sprintf("f(x) = %gx^2 + %gx + %g", a,b,c)
### end of script

gnuplot store one number from data file into variable

OSX v10.6.8 and Gnuplot v4.4
I have a data file with 8 columns. I would like to take the first value from the 6th column and make it the title. Here's what I have so far:
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
K = #read in data here
graph(n) = sprintf("K=%.2e",n)
set term aqua enhanced font "Times-Roman,18"
plot file using 1:3 title graph(K)
And here is what the first few rows of my data file look like:
1.00e-07 1.00e-07 1.00e+00 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
1.11e-06 1.00e-07 9.02e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
2.12e-06 1.00e-07 4.72e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
3.13e-06 1.00e-07 3.20e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12090.00
I don't know how to correctly read in the data or if this is even the right way to go about this.
EDIT #1
Ok, thanks to mgilson I now have
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
set term aqua enhanced font "Times-Roman,18"
K = "`head -1 datafile | awk '{print $6}'`"
print K+0
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
but I get the error: Non-numeric string found where a numeric expression was expected
EDIT #2
file = "testPlot.txt"
K = "`head -1 file | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number #this is line 9
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
This gives the error--> head: file: No such file or directory
"testPlot.gnu", line 9: Non-numeric string found where a numeric expression was expected
You have a few options...
FIRST OPTION:
use columnheader
plot file using 1:3 title columnheader(6)
I haven't tested it, but this may prevent the first row from actually being plotted.
SECOND OPTION:
use an external utility to get the title:
TITLE="`head -1 datafile | awk '{print $6}'`"
plot 'datafile' using 1:3 title TITLE
If the variable is numeric and you want to reformat it, in gnuplot you can cast strings to a numeric type (integer/float) by adding 0 to them, e.g.:
print "36.5"+0
Then you can format it with sprintf or gprintf as you're already doing.
It's weird that there is no float function. (int will work if you want to cast to an integer).
EDIT
The script below worked for me (when I pasted your example data into a file called "datafile"):
K = "`head -1 datafile | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number
graph(n) = sprintf("K=%.2e",n)
plot "datafile" using 1:3 title graph(K)
EDIT 2 (addresses comments below)
To expand a variable in backticks, you'll need macros:
set macro
file="mydatafile.txt"
#THE ORDER OF QUOTES (' and ") IS CRUCIAL HERE.
cmd='"`head -1 ' . file . ' | awk ''{print $6}''`"'
# . is string concatenation. (this string has 3 pieces)
# to get a single quote inside a single quoted string
# you need to double. e.g. 'a''b' yields the string a'b
data=@cmd
To address your question 2, it is a good idea to familiarize yourself with shell utilities -- sed and awk can both do it. I'll show a combination of head/tail:
cmd='"`head -2 ' . file . ' | tail -1 | awk ''{print $6}''`"'
should work.
EDIT 3
I recently learned that in gnuplot, system is a function as well as a command. To do the above without all the backtick gymnastics,
data=system("head -1 " . file . " | awk '{print $6}'")
Wow, much better.
This is a very old question, but here's a nice way to get access to a single value anywhere in your data file and save it as a gnuplot-accessible variable:
set term unknown #This terminal will not attempt to plot anything
plot 'myfile.dat' index 0 every 1:1:0:0:0:0 u (var=$1):1
The index number allows you to address a particular dataset (separated by two carriage returns), while every allows you to specify a particular line.
The colon-separated numbers after every should be of the form 1:1:<line_number>:<block_number>:<line_number>:<block_number>, where the line number is the line within the block (starting from 0), and the block number is the number of the block (separated by a single carriage return, again starting from 0). The first and second numbers say plot every line and every data block, and the third and fourth say start from line <line_number> and block <block_number>. The fifth and sixth say where to stop. This allows you to select a single line anywhere in your data file.
The last part of the plot command assigns the value in a particular column (in this case, column 1) to your variable (var). There needs to be two values to a plot command, so I chose column 1 to plot against my variable assignment statement.
Here is a less 'awk'-ward solution which assigns the value from the first row and 6th column of the file 'Data.txt' to the variable x16.
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
plot 'Data.txt' u 0:($0==0?(x16=$6):$6)
unset table
A more general example for storing several values is given below:
# Load data from file to variable
# Gnuplot can only access the data via the "plot" command
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
# Example: Assign all values according to: xij = Data33[i,j]; i,j = 1,2,3
plot 'Data33.txt' u 0:($0==0?(x11=$1):$1),\
'' u 0:($0==0?(x12=$2):$2),\
'' u 0:($0==0?(x13=$3):$3),\
'' u 0:($0==1?(x21=$1):$1),\
'' u 0:($0==1?(x22=$2):$2),\
'' u 0:($0==1?(x23=$3):$3),\
'' u 0:($0==2?(x31=$1):$1),\
'' u 0:($0==2?(x32=$2):$2),\
'' u 0:($0==2?(x33=$3):$3)
unset table
print x11, x12, x13 # Data from first row
print x21, x22, x23 # Data from second row
print x31, x32, x33 # Data from third row