How to print insert in awk loop - awk

How to print the file name in the loop? I want to print the file name and the average value of column 4 at same line:
for i in `ls *cov`
do
awk '{sum +=$4;n++}END{print sum/n}' $i
done
I mean I want to
awk '{sum +=$4;n++}END{print $i\t sum/n}' $i

You can use bash variables in an awk script using the -v flag:
awk -v file=$i '{sum +=$4;n++}END{print file\t sum/n}' $i
But, there is also the built in awk variable FILENAME:
awk '{sum +=$4;n++}END{print FILENAME\t sum/n}' $i
Which is much cleaner since you aren't passing around variables.

Lose the loop (see why-is-using-a-shell-loop-to-process-text-considered-bad-practice) and just use:
awk -v OFS='\t' '{sum+=$4} ENDFILE{print FILENAME, (FNR>0 ? sum/FNR : 0); sum=0}' *cov
The above uses GNU awk for ENDFILE, there's simple tweaks for other awks but the important things are:
A surrounding shell loop is neither required nor desirable.
The variable n isn't needed since awk has builtin variables.
You have to protect yourself from divide by zero on empty files.

Related

compare on number of awk result is not correct

I use awk to get the number of fields for multiple files, and then use if [[ ]] to judge whether the number of fields equal to an exact number, if so, then return the file name.
The code is as follows:
for file in /root/TB_MOVIL_CDR/incorrect_files/*
do
num=$(awk -F '|' '{print NF}' $file)
if [[ $num -eq 24 ]];then
echo $file
fi
done
But if found the result is not correct, I am confused,
the syntax of if [[ $num -eq 24 ]] is wrong ?
No need for bash commands when you do use awk, try this:
awk -F'|' 'NF==24 {print FILENAME}' /root/TB_MOVIL_CDR/incorrect_files/*
This will test if number of fields are 24 in each line of each file, and if it finds a line with 24 fields, it will print the file name. NB if there are several files with 24 fields it print filename multiple times. This can be avoided if needed.
You can use this gnu awk to print the file name only once for each file with 24 fields (gnu due to the ENDFILE function):
awk -F'|' 'NF==24 {f=1} ENDFILE {if (f) print FILENAME;f=0}' /root/TB_MOVIL_CDR/incorrect_files/*
A shorter gnu awk that print file name once for each file.
awk -F'|' 'NF==24 {print FILENAME;nextfile}' /root/TB_MOVIL_CDR/incorrect_files/*

Proper way to use variables in awk in a script? [duplicate]

I found some ways to pass external shell variables to an awk script, but I'm confused about ' and ".
First, I tried with a shell script:
$ v=123test
$ echo $v
123test
$ echo "$v"
123test
Then tried awk:
$ awk 'BEGIN{print "'$v'"}'
$ 123test
$ awk 'BEGIN{print '"$v"'}'
$ 123
Why is the difference?
Lastly I tried this:
$ awk 'BEGIN{print " '$v' "}'
$ 123test
$ awk 'BEGIN{print ' "$v" '}'
awk: cmd. line:1: BEGIN{print
awk: cmd. line:1: ^ unexpected newline or end of string
I'm confused about this.
#Getting shell variables into awk
may be done in several ways. Some are better than others. This should cover most of them. If you have a comment, please leave below.                                                                                    v1.5
Using -v (The best way, most portable)
Use the -v option: (P.S. use a space after -v or it will be less portable. E.g., awk -v var= not awk -vvar=)
variable="line one\nline two"
awk -v var="$variable" 'BEGIN {print var}'
line one
line two
This should be compatible with most awk, and the variable is available in the BEGIN block as well:
If you have multiple variables:
awk -v a="$var1" -v b="$var2" 'BEGIN {print a,b}'
Warning. As Ed Morton writes, escape sequences will be interpreted so \t becomes a real tab and not \t if that is what you search for. Can be solved by using ENVIRON[] or access it via ARGV[]
PS If you have vertical bar or other regexp meta characters as separator like |?( etc, they must be double escaped. Example 3 vertical bars ||| becomes -F'\\|\\|\\|'. You can also use -F"[|][|][|]".
Example on getting data from a program/function inn to awk (here date is used)
awk -v time="$(date +"%F %H:%M" -d '-1 minute')" 'BEGIN {print time}'
Example of testing the contents of a shell variable as a regexp:
awk -v var="$variable" '$0 ~ var{print "found it"}'
Variable after code block
Here we get the variable after the awk code. This will work fine as long as you do not need the variable in the BEGIN block:
variable="line one\nline two"
echo "input data" | awk '{print var}' var="${variable}"
or
awk '{print var}' var="${variable}" file
Adding multiple variables:
awk '{print a,b,$0}' a="$var1" b="$var2" file
In this way we can also set different Field Separator FS for each file.
awk 'some code' FS=',' file1.txt FS=';' file2.ext
Variable after the code block will not work for the BEGIN block:
echo "input data" | awk 'BEGIN {print var}' var="${variable}"
Here-string
Variable can also be added to awk using a here-string from shells that support them (including Bash):
awk '{print $0}' <<< "$variable"
test
This is the same as:
printf '%s' "$variable" | awk '{print $0}'
P.S. this treats the variable as a file input.
ENVIRON input
As TrueY writes, you can use the ENVIRON to print Environment Variables.
Setting a variable before running AWK, you can print it out like this:
X=MyVar
awk 'BEGIN{print ENVIRON["X"],ENVIRON["SHELL"]}'
MyVar /bin/bash
ARGV input
As Steven Penny writes, you can use ARGV to get the data into awk:
v="my data"
awk 'BEGIN {print ARGV[1]}' "$v"
my data
To get the data into the code itself, not just the BEGIN:
v="my data"
echo "test" | awk 'BEGIN{var=ARGV[1];ARGV[1]=""} {print var, $0}' "$v"
my data test
Variable within the code: USE WITH CAUTION
You can use a variable within the awk code, but it's messy and hard to read, and as Charles Duffy points out, this version may also be a victim of code injection. If someone adds bad stuff to the variable, it will be executed as part of the awk code.
This works by extracting the variable within the code, so it becomes a part of it.
If you want to make an awk that changes dynamically with use of variables, you can do it this way, but DO NOT use it for normal variables.
variable="line one\nline two"
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
Here is an example of code injection:
variable='line one\nline two" ; for (i=1;i<=1000;++i) print i"'
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
1
2
3
.
.
1000
You can add lots of commands to awk this way. Even make it crash with non valid commands.
One valid use of this approach, though, is when you want to pass a symbol to awk to be applied to some input, e.g. a simple calculator:
$ calc() { awk -v x="$1" -v z="$3" 'BEGIN{ print x '"$2"' z }'; }
$ calc 2.7 '+' 3.4
6.1
$ calc 2.7 '*' 3.4
9.18
There is no way to do that using an awk variable populated with the value of a shell variable, you NEED the shell variable to expand to become part of the text of the awk script before awk interprets it. (see comment below by Ed M.)
Extra info:
Use of double quote
It's always good to double quote variable "$variable"
If not, multiple lines will be added as a long single line.
Example:
var="Line one
This is line two"
echo $var
Line one This is line two
echo "$var"
Line one
This is line two
Other errors you can get without double quote:
variable="line one\nline two"
awk -v var=$variable 'BEGIN {print var}'
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ syntax error
And with single quote, it does not expand the value of the variable:
awk -v var='$variable' 'BEGIN {print var}'
$variable
More info about AWK and variables
Read this faq.
It seems that the good-old ENVIRON awk built-in hash is not mentioned at all. An example of its usage:
$ X=Solaris awk 'BEGIN{print ENVIRON["X"], ENVIRON["TERM"]}'
Solaris rxvt
You could pass in the command-line option -v with a variable name (v) and a value (=) of the environment variable ("${v}"):
% awk -vv="${v}" 'BEGIN { print v }'
123test
Or to make it clearer (with far fewer vs):
% environment_variable=123test
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
123test
You can utilize ARGV:
v=123test
awk 'BEGIN {print ARGV[1]}' "$v"
Note that if you are going to continue into the body, you will need to adjust
ARGC:
awk 'BEGIN {ARGC--} {print ARGV[2], $0}' file "$v"
I just changed #Jotne's answer for "for loop".
for i in `seq 11 20`; do host myserver-$i | awk -v i="$i" '{print "myserver-"i" " $4}'; done
I had to insert date at the beginning of the lines of a log file and it's done like below:
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log
It can be redirect to another file to save
Pro Tip
It could come handy to create a function that handles this so you dont have to type everything every time. Using the selected solution we get...
awk_switch_columns() {
cat < /dev/stdin | awk -v a="$1" -v b="$2" " { t = \$a; \$a = \$b; \$b = t; print; } "
}
And use it as...
echo 'a b c d' | awk_switch_columns 2 4
Output:
a d c b

awk print several substring

I would like to be able to print several substrings via awk.
Here an example of what I usually do;
awk' {print substr($0,index($0,string),10)} ' test.txt > result.txt
This allow me to print 10 letters after the discovery of my string.
But the result is the first one substring, instead of several as I expected.
Here an example if I use the string "ATGC" :
test.txt
ATGCATATAAATGCTTTTTTTTT
result.txt
ATGCATATAA
instead of
ATGCATATAA
ATGCTTTTTT
What I have to add ?
I'm sure the answer is easy for you guys !
Thank you for your help.
If you have gawk (gnu awk), you can make use of FPAT:
awk -v FPAT='ATGC.{6}' '{for(i=1;i<=NF;i++)print $i}' file
With your example:
$ awk -v FPAT='ATGC.{6}' '{for(i=1;i<=NF;i++)print $i}' <<<"ATGCATATAAATGCTTTTTTTTT"
ATGCATATAA
ATGCTTTTTT
awk '{print substr($0,1,10),RS substr($0,length -12,10)}' file
ATGCATATAA
ATGCTTTTTT

Combine grep -f and awk

I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in the Pattern.txt
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' Somefile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
If the "patterns" are actually fixed strings, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' Somefile.txt File.txt
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh will support that, others will probably too, didn't tested.
Simpler as the above and supported by all shells would be to use a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
- is used as the argument to -f. - has a special meaning here and stands for stdin. Thanks to Tom Fenech for mentioning that!

awk first line not working removing columns

I'm trying to remove columns beyond number 26 from all lines of a file, using this code:
awk '{ FS = ";" ; for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}'
It is working well in all the lines but for the first one, where it shows 2 more fields (and cuts the last in two).
Is there anything wrong in my code?
Thanks a lot
This is because you set FS on every line, while it should be in a BEGIN{} block (or outside as a parameter, like others answers correctly suggest):
awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file
In fact, to accomplish your goal it is easier to use cut:
cut -d';' -f-26 file
^ ^^^
| all fields up to the 26th
delimiter
Example with 4 cols
sample file:
$ cat a
1col1;col2;col3;col4;col5;col6
2col1;col2;col3;col4;col5;col6
3col1;col2;col3;col4;col5;col6
previous code:
$ awk '{FS=";"; for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
2col1;col2;col3;col4
3col1;col2;col3;col4
new code:
$ awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
with cut:
$ cut -d';' -f-4 a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
You can try this awk,
awk -F';' 'NF>26{NF=26}1' OFS=';' yourfile
#fedorqui is right.
But you can also use this to set Field Separator :
awk -F";" '{for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file