awk first line not working removing columns - awk

I'm trying to remove columns beyond number 26 from all lines of a file, using this code:
awk '{ FS = ";" ; for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}'
It works well on all lines except the first one, where it shows 2 more fields (and cuts the last one in two).
Is there anything wrong with my code?
Thanks a lot

This is because you set FS on every line, after the current record has already been split with the default whitespace FS; it should be set in a BEGIN{} block (or outside as a parameter, as other answers correctly suggest):
awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file
In fact, to accomplish your goal it is easier to use cut:
cut -d';' -f-26 file
Here -d';' sets the delimiter and -f-26 selects all fields up to the 26th.
Example with 4 cols
sample file:
$ cat a
1col1;col2;col3;col4;col5;col6
2col1;col2;col3;col4;col5;col6
3col1;col2;col3;col4;col5;col6
previous code (note the first record prints only an empty line, because it was split with the default whitespace FS):
$ awk '{FS=";"; for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a

2col1;col2;col3;col4
3col1;col2;col3;col4
new code:
$ awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
with cut:
$ cut -d';' -f-4 a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4

You can try this awk,
awk -F';' 'NF>26{NF=26}1' OFS=';' yourfile
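Applied to the 4-column sample file a shown above (limiting to 4 fields instead of 26), this approach would give something like the following; a sketch, relying on the fact that decreasing NF makes gawk (and most modern awks) rebuild $0 with OFS:
$ awk -F';' 'NF>4{NF=4}1' OFS=';' a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4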

@fedorqui is right.
But you can also set the Field Separator this way:
awk -F";" '{for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file

Related

Proper way to use variables in awk in a script? [duplicate]

I found some ways to pass external shell variables to an awk script, but I'm confused about ' and ".
First, I tried with a shell script:
$ v=123test
$ echo $v
123test
$ echo "$v"
123test
Then tried awk:
$ awk 'BEGIN{print "'$v'"}'
123test
$ awk 'BEGIN{print '"$v"'}'
123
Why is there a difference?
Lastly I tried this:
$ awk 'BEGIN{print " '$v' "}'
123test
$ awk 'BEGIN{print ' "$v" '}'
awk: cmd. line:1: BEGIN{print
awk: cmd. line:1: ^ unexpected newline or end of string
I'm confused about this.
Getting shell variables into awk
may be done in several ways. Some are better than others. This should cover most of them. If you have a comment, please leave below.
Using -v (The best way, most portable)
Use the -v option: (P.S. use a space after -v or it will be less portable. E.g., awk -v var= not awk -vvar=)
variable="line one\nline two"
awk -v var="$variable" 'BEGIN {print var}'
line one
line two
This should be compatible with most versions of awk, and the variable is available in the BEGIN block as well.
If you have multiple variables:
awk -v a="$var1" -v b="$var2" 'BEGIN {print a,b}'
Warning. As Ed Morton writes, escape sequences will be interpreted, so \t becomes a real tab and not the literal characters \t, if that is what you are searching for. This can be solved by using ENVIRON[] or by accessing the value via ARGV[].
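A minimal sketch of that difference (the variable name x here is just an illustration):
$ var='a\tb'
$ awk -v x="$var" 'BEGIN{print x}'            # \t is interpreted: prints a, a real tab, b
$ x="$var" awk 'BEGIN{print ENVIRON["x"]}'    # the literal characters a\tb survive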
PS If you have a vertical bar or other regexp metacharacters as separator, like |?( etc., they must be double escaped. Example: 3 vertical bars ||| becomes -F'\\|\\|\\|'. You can also use -F"[|][|][|]".
Example of getting data from a program/function into awk (here date is used):
awk -v time="$(date +"%F %H:%M" -d '-1 minute')" 'BEGIN {print time}'
Example of testing the contents of a shell variable as a regexp:
awk -v var="$variable" '$0 ~ var{print "found it"}'
Variable after code block
Here we get the variable after the awk code. This will work fine as long as you do not need the variable in the BEGIN block:
variable="line one\nline two"
echo "input data" | awk '{print var}' var="${variable}"
or
awk '{print var}' var="${variable}" file
Adding multiple variables:
awk '{print a,b,$0}' a="$var1" b="$var2" file
In this way we can also set different Field Separator FS for each file.
awk 'some code' FS=',' file1.txt FS=';' file2.ext
Variable after the code block will not work for the BEGIN block:
echo "input data" | awk 'BEGIN {print var}' var="${variable}"
Here-string
A variable can also be passed to awk using a here-string, in shells that support them (including Bash):
awk '{print $0}' <<< "$variable"
test
This is the same as:
printf '%s' "$variable" | awk '{print $0}'
P.S. this treats the variable as a file input.
ENVIRON input
As TrueY writes, you can use the ENVIRON to print Environment Variables.
Setting a variable before running awk, you can print it out like this (the variable must be exported, or set on the same command line as awk as shown further below, for it to appear in ENVIRON):
export X=MyVar
awk 'BEGIN{print ENVIRON["X"],ENVIRON["SHELL"]}'
MyVar /bin/bash
ARGV input
As Steven Penny writes, you can use ARGV to get the data into awk:
v="my data"
awk 'BEGIN {print ARGV[1]}' "$v"
my data
To get the data into the code itself, not just the BEGIN:
v="my data"
echo "test" | awk 'BEGIN{var=ARGV[1];ARGV[1]=""} {print var, $0}' "$v"
my data test
Variable within the code: USE WITH CAUTION
You can use a variable within the awk code, but it's messy and hard to read, and as Charles Duffy points out, this version may also be a victim of code injection. If someone adds bad stuff to the variable, it will be executed as part of the awk code.
This works by expanding the shell variable within the awk code, so it becomes a part of it.
If you want to make an awk program that changes dynamically based on the contents of a variable, you can do it this way, but DO NOT use it for normal variables.
variable="line one\nline two"
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
Here is an example of code injection:
variable='line one\nline two" ; for (i=1;i<=1000;++i) print i"'
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
1
2
3
.
.
1000
You can add lots of commands to awk this way. Even make it crash with invalid commands.
One valid use of this approach, though, is when you want to pass a symbol to awk to be applied to some input, e.g. a simple calculator:
$ calc() { awk -v x="$1" -v z="$3" 'BEGIN{ print x '"$2"' z }'; }
$ calc 2.7 '+' 3.4
6.1
$ calc 2.7 '*' 3.4
9.18
There is no way to do that using an awk variable populated with the value of a shell variable; you NEED the shell variable to expand and become part of the text of the awk script before awk interprets it. (See the comment below by Ed M.)
Extra info:
Use of double quote
It's always good to double quote the variable: "$variable"
If not, multiple lines will be collapsed into one long single line.
Example:
var="Line one
This is line two"
echo $var
Line one This is line two
echo "$var"
Line one
This is line two
Other errors you can get without double quotes:
variable="line one\nline two"
awk -v var=$variable 'BEGIN {print var}'
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ syntax error
And with single quotes, the value of the variable is not expanded:
awk -v var='$variable' 'BEGIN {print var}'
$variable
More info about AWK and variables
Read this faq.
It seems that the good-old ENVIRON awk built-in hash is not mentioned at all. An example of its usage:
$ X=Solaris awk 'BEGIN{print ENVIRON["X"], ENVIRON["TERM"]}'
Solaris rxvt
You could pass in the command-line option -v with a variable name (v) and a value (=) of the environment variable ("${v}"):
% awk -vv="${v}" 'BEGIN { print v }'
123test
Or to make it clearer (with far fewer vs):
% environment_variable=123test
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
123test
You can utilize ARGV:
v=123test
awk 'BEGIN {print ARGV[1]}' "$v"
Note that if you are going to continue into the body, you will need to adjust
ARGC:
awk 'BEGIN {ARGC--} {print ARGV[2], $0}' file "$v"
I just changed @Jotne's answer for a "for loop".
for i in `seq 11 20`; do host myserver-$i | awk -v i="$i" '{print "myserver-"i" " $4}'; done
I had to insert date at the beginning of the lines of a log file and it's done like below:
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log
It can be redirected to another file to save the output.
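For example (a sketch; the output file name is just an illustration):
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log > /path_to_log_file/log_file_dated.log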
Pro Tip
It can come in handy to create a function that handles this so you don't have to type everything every time. Using the selected solution we get...
awk_switch_columns() {
    awk -v a="$1" -v b="$2" '{ t = $a; $a = $b; $b = t; print }'
}
And use it as...
echo 'a b c d' | awk_switch_columns 2 4
Output:
a d c b

Exact string match in awk

I have a file test.txt with the following lines
1997 100 500 2010TJ
2010TJXML 16 20 59
I'm using the following awk line to get information only about the string 2010TJ
awk -v var="2010TJ" '$0 ~ var {print $0}' test.txt
But the code prints both lines. I want to know how to get only the line containing the exact string
1997 100 500 2010TJ
The string can appear in any column of the file.
Several options:
Use a gawk word boundary (not POSIX awk...):
$ gawk '/\<2010TJ\>/' file
An actual space or tab or what is separating the columns:
$ awk '/^2010TJ /' file
Or compare the field directly to the string:
$ awk '$1=="2010TJ"' file
You can loop over the fields to test each field if you wish:
$ awk '{for (i=1;i<=NF;i++) if ($i=="2010TJ") {print; next}}' file
Or, given your example of setting a variable, the same approaches using a variable:
$ gawk -v s=2010TJ '$0~"\\<" s "\\>"'
$ awk -v s=2010TJ '$0~"^" s " "'
$ awk -v s=2010TJ '$1==s'
Note the first is a little different from the second and third. The first matches the standalone string 2010TJ anywhere in $0; the second and third only match where the line (or first field) starts with that string.
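On the sample file above you can see that difference (a sketch):
$ gawk '/\<2010TJ\>/' test.txt
1997 100 500 2010TJ
$ awk '$1=="2010TJ"' test.txt
The word-boundary version finds 2010TJ standing alone in the last column of the first line, while the $1 comparison matches nothing here because neither line has 2010TJ as its first field.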
Try this (for testing only column 1) :
awk '$1 == "2010TJ" {print $0}' test.txt
or grep like (all columns) :
gawk '/\<2010TJ\>/ {print $0}' test.txt
Note: \< and \> are word boundaries.
Another awk with a word boundary:
awk '/\y2010TJ\y/' file
Note: \y matches either the beginning or the end of a word (a GNU awk extension).

Using pipe character as a field separator

I'm trying different commands to process a csv file where the separator is the pipe | character.
While these commands work when the comma is the separator, they throw an error when I replace it with the pipe:
awk -F[|] "NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2] [|] $4 [|] $5 }" OFS=[|] file1.csv file2.csv
awk "{print NR "|" $0}" file1.csv
I tried "|", [|], and /| to no avail.
I'm using Gawk on Windows. What am I missing?
You tried "|", [|] and /|. /| does not work because the escape character is \, not /, whereas [] is used to define a set of characters, for example [,-] if you want FS to be either , or -.
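A quick illustration of such a character set (a sketch):
$ echo "a,b-c" | awk -F'[,-]' '{print $2}'
b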
To make it work, "|" is fine; are you sure you used it this way? Alternatively, escape it --> \|:
$ echo "he|llo|how are|you" | awk -F"|" '{print $1}'
he
$ echo "he|llo|how are|you" | awk -F\| '{print $1}'
he
$ echo "he|llo|how are|you" | awk 'BEGIN{FS="|"} {print $1}'
he
But then note that when you say:
print a[$2] [|] $4 [|] $5
you are not using any delimiter at all. Since you already defined OFS, do:
print a[$2], $4, $5
Example:
$ cat a
he|llo|how are|you
$ awk 'BEGIN {FS=OFS="|"} {print $1, $3}' a
he|how are
For anyone finding this years later: ALWAYS QUOTE SHELL METACHARACTERS!
I think gawk (GNU awk) treats | specially, so it should be quoted (for awk). The OP had this right with [|]. However, [|] is also a shell pattern, which in bash at least will only expand if it matches a file in the current working directory:
$ cd /tmp
$ echo -F[|] # Same command
-F[|]
$ touch -- '-F|'
$ echo -F[|] # Different output
-F|
$ echo '-F[|]' # Good quoting
-F[|] # Consistent output
So it should be:
awk '-F[|]'
# or
awk -F '[|]'
awk -F "[|]" would also work, but IMO, only use soft quotes (") when you have something to actually expand (or the string itself contains hard quotes ('), which can't be nested in any way).
Note that the same thing happens if these characters are inside unquoted variables.
If text or a variable contains, or may contain: []?*, quote it, or set -f to turn off pathname expansion (a single, unmatched square bracket is technically OK, I think).
If a variable contains, or may contain an IFS character (space, tab, new line, by default), quote it (unless you want it to be split). Or export IFS= first (bearing the consequences), if quoting is impossible (eg. a crazy eval).
Note: raw text is always split by white space, regardless of IFS.
Try to escape the |
echo "more|data" | awk -F\| '{print $1}'
more
You can escape the | as \|
$ cat test
hello|world
$ awk -F\| '{print $1, $2}' test
hello world

What does this mean in an awk script? `awk -F "|" '{!a[$1]++}{printf RS $1}{print FS $2}' input.txt`

I need the meaning of the below code in unix, to help me go forward.
`awk -F "|" '{!a[$1]++}{printf RS $1}{print FS $2}' input.txt`
My sample input file is like below:
1|Balaji 1|Kumar 3|India 3|China 3|Australia 1|Dinesh
I need output like below:
1|Balaji|Kumar|Dinesh 3|India|China|Australia
I won't explain the awk line in your question, because it doesn't make much sense:
it creates the array a[], but never uses it
it uses RS and FS incorrectly
try this one-liner:
awk -F'[| ]' '{for(i=1;i<=NF;i++)if(i%2)a[$i]=a[$i]?a[$i]"|"$(i+1):$(i+1)}
END{for(x in a) printf x"|"a[x]" ";print ""}' file
with your example:
kent$ echo "1|Balaji 1|Kumar 3|India 3|China 3|Australia 1|Dinesh"|awk -F'[| ]' '{for(i=1;i<=NF;i++)if(i%2)a[$i]=a[$i]?a[$i]"|"$(i+1):$(i+1)}END{for(x in a) printf x"|"a[x]" ";print ""}'
1|Balaji|Kumar|Dinesh 3|India|China|Australia
Note that there will be a trailing space; it could be removed in the END loop, as shown below.
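One way to drop that trailing space in the END block (a sketch of the same one-liner, printing a separator before every group except the first):
awk -F'[| ]' '{for(i=1;i<=NF;i++)if(i%2)a[$i]=a[$i]?a[$i]"|"$(i+1):$(i+1)}
END{sep="";for(x in a){printf "%s%s|%s",sep,x,a[x];sep=" "}print ""}' file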
Surprisingly, it can be simplified. I am not sure why !a[$1]++ is written in there; it is redundant there:
awk -F "|" '{printf RS $1}{print FS $2}' input.txt
It will first print the record separator (which is a newline), then $1 (the first field), then the field separator (which is "|"), then the second field $2, and then a newline (since the statement is print; if printf were used, a newline would not be printed).
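For the record, assuming input.txt holds just the single sample line from the question, that command would output something like the following (a sketch; the leading blank line comes from printing RS first, and $2 is "Balaji 1" because the space is not a field separator here):
$ awk -F "|" '{printf RS $1}{print FS $2}' input.txt

1|Balaji 1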
Based on your comment, below should work:
awk '{
for(i=1;i<=NF;i++){split($i,a,"|");
b[a[1]]?b[a[1]]=b[a[1]]" "a[2]:b[a[1]]=a[2]
}
for(j in b)printf j"|"b[j]" ";
print"";}' your_file
Changing the record separator makes it easy to read this data. It has only a small bug that I do not see how to solve: it prints the result on two lines.
awk -F\| '{a[$1]=a[$1]?a[$1]"|"$2:$2} END{for(i in a) printf i"|"a[i]" "}' RS=" " file
1|Balaji|Kumar|Dinesh
3|India|China|Australia
New version with correct output, thanks to @Birei:
awk -F\| '{sub(/\n/,x, $0); a[$1]=a[$1]?a[$1]"|"$2:$2} END{for(i in a) printf i"|"a[i]" "}' RS=" "
1|Balaji|Kumar|Dinesh 3|India|China|Australia

awk to read specific column from a file

I have a small problem and I would appreciate your help with it.
In summary, I have a file:
1,5,6,7,8,9
2,3,8,5,35,3
2,46,76,98,9
I need to read specific columns from it and print them into another text document. I know I can use (awk '{print "$2" "$3"}') to print the second and third columns beside each other. However, I need to use two statements, (awk '{print "$2"}' >> file.text) then (awk '{print "$3"}' >> file.text), but then the two columns appear under each other and not beside each other.
How can I make them appear beside each other?
If you must extract the columns in separate processes, use paste to stitch them together. I assume your shell is bash/zsh/ksh, and I assume the blank lines in your sample input should not be there.
paste -d, <(awk -F, '{print $2}' file) <(awk -F, '{print $3}' file)
produces
5,6
3,8
46,76
Without the process substitutions:
awk -F, '{print $2}' file > tmp1
awk -F, '{print $3}' file > tmp2
paste -d, tmp1 tmp2 > output
Update based on your answer:
On first appearance, that's a confusing setup. Does this work?
for (( x=1; x<=$number_of_features; x++ )); do
    feature_number=$(sed -n "$x {p;q}" feature.txt)
    if [[ -f out.txt ]]; then
        paste -d, out.txt <(cut -d, -f$feature_number file.txt) > tmp &&
        mv tmp out.txt
    else
        cut -d, -f$feature_number file.txt > out.txt
    fi
done
That has to read the file.txt file a number of times. It would clearly be more efficient to only have to read it once:
awk -F, -v numfeat="$number_of_features" '
# read the feature file into an array
NR==FNR {
colno[++i] = $0
next
}
# now, process the file.txt and emit the desired columns
{
sep = ""
for (i=1; i<=numfeat; i++) {
printf "%s%s", sep, $(colno[i])
sep = FS
}
print ""
}
' feature.txt file.txt > out.txt
Thanks all for contributing answers. I believe that I should have been clearer in my question, sorry for that.
My code is as follow:
for (( x = 1; x <= $number_of_features ; x++ )) # the number extracted from a text file
do
feature_number=$(awk 'FNR == "'$x'" {print}' feature.txt)
awk -F, '{print $"'$feature_number'"}' file.txt >> out.txt
done
Basically, I extract the feature number (which is the same as the column number) from a text document and then print that column. The text document may contain many feature numbers.
The thing is, each time I have different feature numbers (which reflect the column numbers), so applying the above solutions is not sufficient for this problem.
I hope it is clearer now.
Waiting for your comments please.
Thanks
Ahmad
Instead of using awk's file redirection, use shell redirection, e.g.
awk '{print $2,$3}' >> file
The comma is replaced with the value of the output field separator (a space by default).
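For example, to keep a comma between the two columns as in the input file from the question (a sketch):
$ awk -F, -v OFS=, '{print $2, $3}' file
5,6
3,8
46,76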