Use awk to set the title of a data series in gnuplot, based on the filename - awk

I'm trying to use gnuplot to compare several data files. I get those files through a list, using an ls command, and then I plot those files.
List = "`echo $(ls GRID/3D/Ratio_5/DATA.dat.* | sort -V )`"
This gives me a list of files DATA.dat.0001 DATA.dat.0002 ... DATA.dat.0100 representing different times, and then I can plot them using
plot for [i in List] i u 1:2 title system('basename '.i)
In that case, no issues. However, in some cases, I need to compare different Ratio_ directories (no time dependence here), so I create my list
List = "`echo $(ls GRID/3D/Ratio_*/DATA.dat | sort -V )`"
But the same plot command gives all my series the same name: DATA.dat. I tried
plot for [i in List] i u 1:2 title system('dirname '.i)
but it prints the whole path except DATA.dat, and I'm just interested on the Ratio_ number.
I create a function to extract the Ratio from the filename string
Ratio(filename) = "awk -F / '{print $3}' ".filename
plot for [i in List] i u 1:2 title Ratio(i)
but it just print the whole thing awk -F / '{print $3}' GRID/3D/Ratio_1/DATA.dat for every data series. I tried to add a system call within the definition of Ratio(filename) without success
Ratio(filename)=system("awk '{print $3}' " .filename)
plot for [i in List] i u 1:2 title Ratio(i)
but now each data series is named like the whole 3rd column of the corresponding DATA.dat, with its several hundred lines.
I'm out of ideas. This is probably just a simple syntax problem, a misplaced quote, point or comma. I know I can call my gnuplot script in the 3D directory and use dirname option, but that's not what I want. I'm calling this script along with many others placed in the same directory, to create plots for a scientific article, and placing those scripts in different places would be a mess.
Any help is appreciated. Thanks

When you do
Ratio(filename) = "awk -F / '{print $3}' ".filename
you are just creating a string, not executing it. Furthermore, when you do
Ratio(filename)=system("awk '{print $3}' " .filename)
you're asking awk to parse the file's content, rather than its name. Try:
Ratio(filename)=system("echo ".filename." | awk -F / '{print $3}'")

You need to quote List to preserve the linefeeds in it when you pass it to awk. Compare the following:
List=$(ls GRID/3D/Ratio*/DATA.dat)
echo "$List"
GRID/3D/Ratio_1/DATA.dat
GRID/3D/Ratio_2/DATA.dat
echo $List
GRID/3D/Ratio_1/DATA.dat GRID/3D/Ratio_2/DATA.dat
and this:
awk -F\/ '{print $3}' <<< $list
awk -F\/ '{print $3}' <<< "$List"
Ratio_1
Ratio_2

Related

I need to sum all the values in a column across multiple files

I have a directory with multiple csv text files, each with a single line in the format:
field1,field2,field3,560
I need to output the sum of the fourth field across all files in a directory (can be hundreds or thousands of files). So for an example of:
file1.txt
field1,field2,field3,560
file2.txt
field1,field2,field3,415
file3.txt
field1,field2,field3,672
The output would simply be:
1647
I've been trying a few different things, with the most promising being an awk command that I found here in response to another user's question. It doesn't quite do what I need it to do, and I am an awk newb so I'm unsure how to modify it to work for my purpose:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]:' file1.txt file2.txt
This correctly outputs 975.
However if I try pass it a 3rd file, rather than add field 4 from all 3 files, it adds file1 to file2, then file1 to file3:
awk -F"," 'NR==FNR{a[NR]=$4;next}{print $4+a[FNR]:' file1.txt file2.txt file3.txt
975
1232
Can anyone show me how I can modify this awk statement to accept more than two files or, ideally because there are thousands of files to sum up, an * to output the sum of the fourth field of all files in the directory?
Thank you for your time and assistance.
A couple issues with the current code:
NR==FNR is used to indicate special processing for the 1st file; in this case there is no processing that is 'special' for just the 1st file (ie, all files are to be processed the same)
an array (eg, a[NR]) is used to maintain a set of values; in this case you only have one global value to maintain so there is no need for an array
Since you're only looking for one global sum, a bit more simpler code should suffice:
$ awk -F',' '{sum+=$4} END {print sum+0}' file{1..3}.txt
1647
NOTES:
in the (unlikely?) case all files are empty, sum will be undefined so print sum will display a blank link; sum+0 insures we print 0 if sum remains undefined (ie, all files are empty)
for a variable number of files file{1..3}.txt can be replaced with whatever pattern will match on the desired set of files, eg, file*.txt, *.txt, etc
Here we go (no need to test NR==FNR in a concatenation):
$ cat file{1,2,3}.txt | awk -F, '{count+=$4}END{print count}'
1647
Or same-same 🇹🇭 (without wasting some pipe(s)):
$ awk -F, '{count+=$4}END{print count}' file{1,2,3}.txt
1647
$ perl -MList::Util=sum0 -F, -lane'push #a,$F[3];END{print sum0 #a}' file{1..3}.txt
1647
$ perl -F, -lane'push #a,$F[3];END{foreach(#a){ $sum +=$_ };print "$sum"}' file{1..3}.txt
1647
$ cut -d, -f4 file{1..3}.txt | paste -sd+ - | bc
1647

awk command to print columns with colum data

cat file1.txt | awk -F '{print $1 "|~|" $2 "|~|" $3}' > file2.txt
I am using above command to filter first three columns from file1 and put into file.
But only getting the column names and not the column data.
How to do that?
|~| - is the delimiter.
file1.txt has values as :
a|~|b|~|c|~|d|~|e
1|~|2|~|3|~|4|~|5
11|~|22|~|33|~|44|~|55
111|~|222|~|333|~|444|~|555
my expedted output is :
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333
With your shown samples, please try following awk code. You need to set field separator to |~| and remove starting space from lines, then print the lines.
awk -F'\\|~\\|' -v OFS='|~|' '{sub(/^[[:blank:]]+/,"");print $1,$2,$3}' Input_file
In case you want to keep spaces(which was in initial post before edit) then try following:
awk -F'\\|~\\|' -v OFS='|~|' '{print $1,$2,$3}' Input_file
NOTE: Had a chat with user in room and got to know why this code was not working for user because of gunzip -c file was being used wrongly, its output was being saved into a variable on which user was running awk program, so correcting that command generated right file and awk program ran fine on it. Adding this as a reference for future readers.
One approach would be:
awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
The approach simply replaces all "|~|" with a "," setting the output file separator to "|~|". All leading whitespace is trimmed with sub().
Example Use/Output
With your data in file1.txt, you would have:
$ awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333
Let me know if this is what you intended. You can simply redirect, e.g. > file2.txt to write to the second file.
For such cases, my bash+awk script rcut comes in handy:
rcut -Fd'|~|' -f-3 ip.txt
The -F option enables fixed string input delimiter (which is given using the -d option). And by default, the output field separator will also be same as -d when -F is active. -f-3 is similar to cut syntax to specify first three fields.
For better speed, use hck command:
hck -Ld'|~|' -D'|~|' -f-3 ip.txt
Here, -L enables literal field separator and -D specifies output field separator.
Another benefit is that hck supports -z option to automatically handle common compressed formats based on filename extension (adding this since OP had an issue with compressed input).
Another way:
sed 's/|~|/\t/g' file1.txt | awk '{print $1"|~|"$2"|~|"$3}' > file2.txt
First replace the |~| delimiter, and use the default awk separator, then print columns what you need.

Bash code for Selecting few columns from a variable

In a file I have a list of coordinates stored (see figure, to the left).
From there I want to copy the coordinates only (red marked) and put them in another file.
I copy the correct section from the file using COORD=`grep -B${i} '&END COORD' ${cpki_file}. Then I tried to use awk to extract the required numbers from the COORD variable . It does output all the numbers in the file but deletes the spaces between values (figure, to the right).
How to write the red marked section as they are?
N=200
NEndCoord=`grep -B${N} '&END COORD' ${cpki_file}|wc -l`
NCoord=`grep -B${N} '&END COORD' ${cpki_file}| grep -B200 '&COORD' |wc -l`
let i=$NEndCoord-$NCoord
COORD=`grep -B${i} '&END COORD' ${cpki_file}`
echo "$COORD" | awk '{ print $2 $3 $4 }'
echo "$COORD" | awk '{ print $2 $3 $4 }'>tmp.txt
When you start using combinations of grep, sed, awk, cut and alike, you should realize you can do it all in a single awk command. In case of the OP, this would do exactly the same:
awk '/[&]END COORD/{p=0}
p { print $2,$3,$4 }
/[&]COORD/{p=1}' file
This parses the file keeping track of a printing flag p. The flag is set if "&COORD" is found and unset if "&END COORD" is found. Printing is done, only when the flag p is set. Since we don't want to print the line with "&END COORD", we have to reset the flag before we do the check for the printing. The same holds for the line with "&COORD", but there we have to reset it after we do the check for the printing (its a bit a weird reversed logic).
The problem with the above is that it will also process the lines
UNIT angstrom
If you want to have these removed, you might want to do a check on the total columns:
awk '/[&]END COORD/{p=0}
p && (NF==4){ print $2,$3,$4 }
/[&]COORD/{p=1}' file
Of only print the lines which do not contain "UNIT" or are empty:
awk '/[&]END COORD/{p=0}
p && (NF>0) && ($1 != "UNIT"){ print $2,$3,$4 }
/[&]COORD/{p=1}' file
sed one-liner:
sed -n '/^&COORD$/,/^UNIT/{s/.*[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)/\1\t\2\t\3/p}' <infile.txt >outfile.txt
Explanation:
Invocation:
sed: stream editor
-n: do not print unless eplicit
Commands in sed:
/^&COORD$/,/^UNIT/: Selects groups of lines after &COORDS and before UNIT.
{s/.*[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)/\1\t\2\t\3/p}: Process each selected lines.
s/.*[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)[[:space:]]\+\(.*\): Regex capture space delimited groups except the first.
/\1\t\2\t\3/: Replace with tab delimited values of the captured groups.
p: Explicit printout.

How to AWK print only specific item?

I have a log file that looks like this:
RPT_LINKS=1,T1999
RPT_NUMALINKS=1
RPT_ALINKS=1,1999TK,2135,2009,31462,29467,2560
RPT_TXKEYED=1
RPT_ETXKEYED=0
I have used grep to isolate the line I am interested in with the RPT_ALINKS. In that line I want to know how to use AWK to print only the link that ends with a TK.
I am really close running this:
grep -w 'RPT_ALINKS' stats2.log | awk -F 'TK' '{print FS }'
But I am sure those who are smarter than me already know I am getting only the TK back, how do I get the entire field so that I would get a return of 1999TK?
If there is only a single RT in that line and RT is always at the end:
awk '/RPT_ALINKS/{match($0,/[^=,]*TK/); print substr($0,RSTART,RLENGTH)}'
You can also use a double grep
grep -w 'RPT_ALINKS' stats2.log | grep -wo '[^=,]*TK'
The following sed solution also works nicely:
sed '/RPT_ALINKS/s/\(^.*[,=]\)\([^=,]*TK\)\(,.*\)\?/\2/'
It doesn't get any more elegant
awk -F '=' '$1=="RPT_ALINKS" {n=split($2,array,",")
for(i=1; i<=n; i++)
if (array[i] ~ /TK$/)
{print array[i]}}
' stats2.log
n=split($2,array,","): split 1,1999TK,2135,2009,31462,29467,2560 with , to array array. n contains number of array elements, here 7.
Here is a simple solution
awk -F ',|=' '/^RPT_ALINKS/ { for (i=1; i<=NF; i++) if ($i ~ /TK$/) print $i }' stats2.log
It looks only on the record which begins with RPT_ALINKS. And there it check every field. If field ends with TK, then it prints it.
Dang, I was just about to post the double-grep alternative, but got scooped. And all the good awk solutions are taken as well.
Sigh. So here we go in bash, for fun.
$ mapfile a < stats2.log
$ for i in "${a[#]}"; do [[ $i =~ ^RPT_ALINKS=(.+,)*([^,]+TK) ]] && echo "${BASH_REMATCH[2]}"; done
1999TK
This has the disadvantage of running way slower than awk and not using fields. Oh, and it won't handle multiple *TK items on a single line. And like sed, this is processing lines as patterns rather than fields, which saps elegance. And by using mapfile, we limit the size of input you can handle because your whole log is loaded into memory. Of course you don't really need to do that, but if you were going to use a pipe, you'd use a different tool anyway. :-)
Happy Thursday.
With a sed that has -E for EREs, e.g. GNU or OSX/BSD sed:
$ sed -En 's/^RPT_ALINKS=(.*,)?([^,]*TK)(,.*|$)/\2/p' file
1999TK
With GNU awk for the 3rd arg to match():
$ awk 'match($0",",/^RPT_ALINKS=(.*,)?([^,]*TK),.*/,a){print a[2]}' file
1999TK
Instead of looping through it, you can use an other alternative.
This will be fast, loop takes time.
awk -F"TK" '/RPT_ALINKS/ {b=split($1,a,",");print a[b]FS}' stats2.log
1999TK
Here you split the line by setting field separator to TK and search for line that contains RPT_ALINKS
That gives $1=RPT_ALINKS=1,1999 and $2=,2135,2009,31462,29467,2560
$1 will always after last comma have our value.
So split it up using split function by comma. b would then contain number of fields.
Since we know that number would be in last section we do use a[b] and add FS that contains TK

Pipe grep results to awk, get field, pipe field list as pattern to grep in same file

I'm having a hard time figuring out a one-liner to search for a pattern in a file, get a field from the matching lines, and search the same file again with that field as a pattern, to get a different field. You could say the data looks like this:
CONNECT=desired_output CONNECTION=pattern2
SEARCH=pattern1 CONNECTION=pattern2
So far I've tried these unholy messes to no avail:
zgrep -i pattern1 file | awk '{print $8}' | xargs -I % zgrep % file | zgrep pattern2
zgrep -i pattern1 file | awk '{print $8}' | zgrep -f - file
If I leave off the last grep, it gives results (which would need another grep and awk to get the desired output field). If I include the last grep, it never comes back with output.
Similar questions seem to point to awk being able to do this whole concept, but I can't decipher the examples given to adjust to my use case.
Using each line of awk output as grep pattern
How to run grep inside awk?
Sounds like you want something like this, untested since you didn't provide testable input/output:
awk '
NR==FNR {
if ( /constant_regexp1/ ) {
dynamic_regexp2 = $8
}
next
}
$0 ~ dynamic_regexp2
' file file
Never use the word "pattern" btw as it's ambiguous, always use "string" or "regexp", whichever it is you mean.