Adding columns with awk. What is wrong with this awk command? - awk

I want to add two columns to a file with ~10,000 columns. I want to insert as the first column the nr 22 on each row. Then I want the original first column as the second column, then as the third column I want to insert the line nr (NR), and after that I want the rest of the original columns to be printed. I thought I could do that with the following awk line:
awk '{print 22, $1, NR; for(i=2;i<=NF;++i) print $i}' file
It prints the first three columns (22, $1, NR) well, but after that, there is a new line started for each value, so the file is printed like this:
22 $1 NR
$2
$3
$4
etc...
instead of:
22 $1 NR $2 $3 $4 etc...
What did I do wrong?

How about using printf instead since print adds a newline.
awk '{printf("%d, %d, %d, ", 22, $1, NR); for(i=2;i<=NF;++i) printf("%d, ", i)}' file
Or you can play with the ORS and OFS, the Output Record Separator and the Output Field Separator. Normally you add those in a BEGIN statement like this:
awk 'BEGIN { ORS = " " } {print 22, $1, NR; for(i=2;i<=NF;++i) print $i}{print "\n"}' file
Note that an extra printf "\n" is needed, else everything ends up on one line...
Read more in gawk manual output separators

For more precise control over the output format than what is provided by print(which print a newline by default), use printf.

Related

selecting columns in awk discarding corresponding header

How to properly select columns in awk after some processing. My file here:
cat foo
A;B;C
9;6;7
8;5;4
1;2;3
I want to add a first column with line numbers and then extract some columns of the result. For the example let's get the new first (line numbers) and third columns. This way:
awk -F';' 'FNR==1{print "linenumber;"$0;next} {print FNR-1,$1,$3}' foo
gives me this unexpected output:
linenumber;A;B;C
1 9 7
2 8 4
3 1 3
but expected is (note B is now the third column as we added linenumber as first):
linenumber;B
1;6
2;5
3;2
[fixed and revised]
To get your expected output, use:
$ awk 'BEGIN {
FS=OFS=";"
}
{
print (FNR==1?"linenumber":FNR-1),$(FNR==1?3:1)
}' file
Output:
linenumber;C
1;9
2;8
3;1
To add a column with line number and extract first and last columns, use:
$ awk 'BEGIN {
FS=OFS=";"
}
{
print (FNR==1?"linenumber":FNR-1),$1,$NF
}' file
Output this time:
linenumber;A;C
1;9;7
2;8;4
3;1;3
Why do you print $0 (the complete record) in your header? And, if you want only two columns in your output, why to you print 3 (FNR-1, $1 and $3)? Finally, the reason why your output field separators are spaces instead of the expected ; is simply that... you did not specify the output field separator (OFS). You can do this with a command line variable assignment (OFS=\;), as shown in the second and third versions below, but also using the -v option (-v OFS=\;) or in a BEGIN block (BEGIN {OFS=";"}) as you wish (there are differences between these 3 methods but they don't matter here).
[EDIT]: see a generic solution at the end.
If the field you want to keep is the second of the input file (the B column), try:
$ awk -F\; 'FNR==1 {print "linenumber;" $2; next} {print FNR-1 ";" $2}' foo
linenumber;B
1;6
2;5
3;2
or
$ awk -F\; 'FNR==1 {print "linenumber",$2; next} {print FNR-1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2
Note that, as long as you don't want to keep the first field of the input file ($1), you could as well overwrite it with the line number:
$ awk -F\; '{$1=FNR==1?"linenumber":FNR-1; print $1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2
Finally, here is a more generic solution to which you can pass the list of indexes of the columns of the input file you want to print (1 and 3 in this example):
$ awk -F\; -v cols='1;3' '
BEGIN { OFS = ";"; n = split(cols, c); }
{ printf("%s", FNR == 1 ? "linenumber" : FNR - 1);
for(i = 1; i <= n; i++) printf("%s", OFS $(c[i]));
printf("\n");
}' foo
linenumber;A;C
1;9;7
2;8;4
3;1;3

awk / gawk printf when variable format string, changing zero to dash

I have a table of numbers I am printing in awk using printf.
The printf accomplishes some truncation for the numbers.
(cat <<E\OF
Name,Where,Grade
Bob,Sydney,75.12
Sue,Sydney,65.2475
George,Sydney,84.6
Jack,Sydney,35
Amy,Sydney,
EOF
)|gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3=0}
printf("%s,%s,%d \n",$1,$2,$3)}'
This produces:
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,0
Amy,Sydney,0
What I want is to display scores which are less than 50, or missing, as a dash ("-").
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
This requires the 3rd string format in printf change from %d to %s.
So in some rows, the third column should be a value, and in some rows, the third column should be a string. How can I tell this to GAWK? Or should I just pipe through another awk to re-format?
$ gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3="-"} else {$3=sprintf("%d", $3)}
printf("%s,%s,%s \n",$1,$2,$3)}' ip.txt
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
use if-else to assign value to $3 as needed
sprintf allows to assign result of formatting to a variable
for this case, you could use int function as well
now printf will have %s for $3 as well
Assuming you missed the commas for the header and space after third column is not needed, you could do this with a simple one-liner
$ awk -F, -v OFS=, 'NR>1{$3 = $3 < 50 ? "-" : int($3)} 1' ip.txt
Name,Where,Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
?: ternary operator is alternate for if-else
1 is an awk idiom to print contents of $0

Awk editing with field delimiter

Imagine if you have a string like this
Amazon.com Inc.:181,37:184,22
and you do awk -F':' '{print $1 ":" $2 ":" $3}' then it will output the same thing.
But can you declare $2 in this example so it only outputs 181 and not ,37?
Thanks in advance!
You can change the field separator so that it contains either : or ,, using a bracket expression:
awk -F'[:,]' '{ print $2 }' file
If you are worried that , may appear in the first field (which will break this approach), you could use split:
awk -F: '{ split($2, a, /,/); print a[1] }' file
This splits the second field on the comma and then prints the first part. Any other fields containing a comma are unaffected.

awk ternay operator, count fs with ,

How to make this command line:
awk -F "," '{NF>0?$NF:$0}'
to print the last field of a line if NF>0, otherwise print the whole line?
Working data
bogota
dept math, bogota
awk -F, '{ print ( NF ? $NF : $0 ) }' file
Actually, you don't need ternary operator for this, but use :
awk -F, '{print $NF}' file
This will print the last field, i.e, if there are more than 1 field, it will print the last field, if line has only one field, it will print the same.

searching multiple patterns in awk

I've a text file of thousands of lines
:ABC:xyz:1234:200:some text:xxx:yyyy:11818:AAA:BBB
:ABC:xyz:6789:200:some text:xxx:yyyy:203450:AAA:BBB
:EFG:xyz:11818:200:some text:xxx:yyyy:154678:AAA:BBB
:HIJ:xyz:203450:200:some text:xxx:yyyy:154678:AAA:BBB
:KLM:xyz:7777:200:some text:xxx:yyyy:11818:AAA:BBB
.....
....
:DEL:xyz:1234:200:some text:xxx:yyyy:203450:AAA:BBB
I need to find more than one occurrence of the 9th column i.e the o/p should show
:ABC:xyz:1234:200:some text:xxx:yyyy:11818:AAA:BBB
:KLM:xyz:7777:200:some text:xxx:yyyy:11818:AAA:BBB
:ABC:xyz:6789:200:some text:xxx:yyyy:203450:AAA:BBB
:DEL:xyz:1234:200:some text:xxx:yyyy:203450:AAA:BBB
I tried:
awk -F ":" '$9 > 2 {split($0,a,":"); print $0}'
this prints all the records.
awk -F':' 'NR==FNR{cnt[$9]++;next} cnt[$9]>1' file file
or if you don't want to parse the file twice:
awk -F':' 'cnt[$9]++{printf "%s", prev[$9]; delete prev[$9]; print; next} {prev[$9]=$0 ORS}' file
This should do it in pure awk:
awk -F":" '{if( s[$9] ){ print } else if( f[$9] ){ print f[$9]; s[$9]=1; print }; f[$9]=$0 }'
Explanation:
The "f" array stores values of the 9th column that have occurred at least once.
The "s" array stores values of the 9th column that have occurred twice or more.
If the 9th column has occurred before, print the first occurrence, and this line.
If the 9th column has occurred twice or more before, print this line.
Here is another awk
awk -F: '{++a[$9];b[NR]=$0} END {for (i=1;i<=NR;i++) {split(b[i],c,":");if (a[c[9]]>1) print b[i]}}' file