How to replace blank space with zero? - awk

I have a file:
nr   kl1  kl2  kl3  kl4
d1   15   58   63   58
d2        3    3
d3   3         8    0
I want to print:
nr kl1 kl2 kl3 kl4
d1 15 58 63 58
d2 0 3 3 0
d3 3 0 8 0
I tried a gsub solution, but it does not work:
awk '{gsub(/ /, 0, $2); print }' file
Thank you for your help.
EDIT:
Ed Morton's solution works with gawk, but it does not work with mawk.
$ gawk 'BEGIN{ FIELDWIDTHS="5 5 5 5 5"; OFS="" }NR>1 {for (i=2;i<=NF;i++)$i=sprintf("%-5d",$i)}{ sub(/ +$/,""); print }' file
nr kl1 kl2 kl3 kl4
d1 15 58 63 58
d2 0 3 3 0
d3 3 0 8 0
.
$ mawk 'BEGIN{ FIELDWIDTHS="5 5 5 5 5"; OFS="" }NR>1 {for (i=2;i<=NF;i++)$i=sprintf("%-5d",$i)}{ sub(/ +$/,""); print }' file
nr kl1 kl2 kl3 kl4
d115 58 63 58
d23 3
d33 8 0
How can I do the same with mawk?

What you tried didn't work because your fields aren't separated by spaces; they're fixed width. Try this with GNU awk:
BEGIN{ FIELDWIDTHS="5 5 5 5 5"; OFS="" }
NR>1 {
    for (i=2;i<=NF;i++)
        $i = sprintf("%-5d",$i)
}
{ sub(/ +$/,""); print }
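Since mawk doesn't support FIELDWIDTHS, a workaround along the same lines (a sketch only, assuming the same 5-character columns as above) is to carve the fields out with substr() and reformat them:
mawk 'NR==1 { print; next }
{
    # emulate FIELDWIDTHS="5 5 5 5 5": take each 5-character column with substr()
    out = substr($0, 1, 5)                                  # keep the "nr" column as-is
    for (i = 2; i <= 5; i++)
        out = out sprintf("%-5d", substr($0, (i-1)*5+1, 5)) # a blank field becomes 0
    sub(/ +$/, "", out)
    print out
}' file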

Related

How to update one file's column from another file's column in awk

I have two files, the first file:
1 AA
2 BB
3 CC
4 DD
and the second file
15 AA
17 BB
20 CC
25 FF
File 1 should be updated and the expected output should look like this:
15 AA
17 BB
20 CC
4 DD
I have tried this script from another post but it didn't work
awk 'NR==FNR{a[$1]=$2;next}a[$1]{print $2,a[$1]}' file1 file2
Build the lookup from file2 keyed on the name in column 2, then overwrite column 1 of file1 when the name is found:
$ awk 'NR==FNR{a[$2]=$1; next} $2 in a{$1=a[$2]} 1' file2 file1
15 AA
17 BB
20 CC
4 DD
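Spelled out with comments (the same logic, just expanded for readability):
awk '
  NR==FNR { a[$2] = $1; next }   # first pass (file2): map name -> new value
  $2 in a { $1 = a[$2] }         # file1: replace column 1 when the name is known
  1                              # print every file1 line, updated or not
' file2 file1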
Here is another awk, which stores each line of file2 by its name and prints that stored line whenever the name appears in file1:
awk 'FNR==NR{f2[$2]=$0; next}
$2 in f2 {print f2[$2]; next}
1' file2 file1
Prints:
15 AA
17 BB
20 CC
4 DD

Combine two columns into new and print all columns

I want to combine columns 1 and 2 and add them as a new column in my data frame. Then I want to print all the old columns and the newly created column. I can combine the columns using the script below, but I'm not sure how to print all columns, not only the combined one:
awk ' { print $1 $2 "_" $NF } ' input_file
in
c1 c2 c3
12 1 12
4 4 57
out
c1 c2 c3 c4
12 1 12 12_1
4 4 57 4_4
If you simply join fields 1 and 2 with _, the header line would come out as c1 c2 c3 c1_c2 rather than the c1 c2 c3 c4 shown in your expected output.
You can add a column at the end with the value of $1 and $2 and then print the whole line:
awk ' { $(NF+1) = $1"_"$2 }1' input_file
Output
c1 c2 c3 c1_c2
12 1 12 12_1
4 4 57 4_4
Or you can print the whole line followed by $1 and $2 joined with an underscore:
awk '{print $0, $1"_"$2}' input_file
Output
c1 c2 c3 c1_c2
12 1 12 12_1
4 4 57 4_4
Here is a generic solution in awk. Just list the field numbers in the awk variable named fields, e.g. 1,2,3,4,7,8, and it will join those fields' values with _ into a new last column. Written and tested with GNU awk; it should work in any awk.
awk -v fields="1,2" '
BEGIN{
    num=split(fields,arr,",")
    for(i=1;i<=num;i++){
        field[arr[i]]
    }
}
FNR==1{
    print
    next
}
{
    val=""
    for(i=1;i<=NF;i++){
        if(i in field){
            val=(val?val "_":"")$i
        }
    }
    print $0,val
}
' Input_file
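With fields="1,2" and the sample input above, this should print the following (note that the FNR==1 branch leaves the header line untouched, so no c4 header is added):
c1 c2 c3
12 1 12 12_1
4 4 57 4_4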
$ awk '{print $0, (NR>1 ? $1"_"$2 : "c4")}' file
c1 c2 c3 c4
12 1 12 12_1
4 4 57 4_4
or to get tab-separated output if your input is tab-separated:
$ awk 'BEGIN{FS=OFS="\t"} {print $0, (NR>1 ? $1"_"$2 : "c4")}' file
c1 c2 c3 c4
12 1 12 12_1
4 4 57 4_4
or if it isn't:
$ awk -v OFS='\t' '{$(NF+1)=(NR>1 ? $1"_"$2 : "c4")} 1' file
c1 c2 c3 c4
12 1 12 12_1
4 4 57 4_4
Another awk which at FNR==1 uses the field name in $NF to create the field name for the next field (c3 -> c4, c -> c1, etc):
$ awk '{
    printf "%s%s%s\n",
        $0,
        OFS,
        (FNR>1 ? $1 "_" $2 : (match($3,/[0-9]+$/) ? substr($3,1,RSTART-1) substr($3,RSTART)+1 : $3 1))
}' file
Output:
c1 c2 c3 c4
12 1 12 12_1
4 4 57 4_4
golfed version
$ awk '$++NF=NR>1?$1"_"$2:"c4"' file
c1 c2 c3 c4
12 1 12 12_1
4 4 57 4_4
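The golfed version reads as follows when expanded with comments (a sketch of the same logic):
awk '
  # $++NF = ...  increments NF, adding one more field, and assigns the value to it;
  # the assignment yields a non-empty string, so the pattern is true and the
  # default action { print } runs for every line
  $++NF = (NR > 1 ? $1 "_" $2 : "c4")
' file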

How to print the next or previous line using awk?

I have a file with 8 columns
1743 abc 04 10 29 31 34 35
1742 def 11 19 21 23 27 52
1741 ghi 15 18 20 32 48 49
and I also have an awk command that prints the complete line containing some specific numbers. The code is:
awk -v col=1 '{ delete c; for (i=col; i<=NF; ++i) ++c[$i];
if (c['"$1"']>0 && c['"$2"']>0 && c['"$3"']>0 && c['"$4"']>0) print }'
< input_file
(the variables $1, $2, $3 and $4 are there because I'm calling it from bash).
In the example above, when I pass the numbers 11 21 27 and 52, I get the line 1742.
How can I print the next or the previous line? Using the same example, if I pass the numbers 11 21 27 and 52, how do I get line 1743 or line 1741?
$ cat a.sh
echo "BEFORE"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v col=1 -f before.awk file
echo "AFTER"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v col=1 -f after.awk file
Quoting @triplee: "To print the previous line, remember the previous line in a variable."
$ cat before.awk
prev { delete c;
for (i=col; i<=NF; ++i) ++c[$i]
if (c[p1]>0 && c[p2]>0 && c[p3]>0 && c[p4]>0) print prev
}
{ prev = $0 }
Again, @triplee: "To print the next line, remember that you want to, and print and reset this variable on the next iteration."
$ cat after.awk
f { print; f = 0 }
{
delete c;
for (i=col; i<=NF; ++i) ++c[$i]
if (c[p1]>0 && c[p2]>0 && c[p3]>0 && c[p4]>0) f = 1
}
$ ./a.sh 11 21 27 52
BEFORE
1743 abc 04 10 29 31 34 35
AFTER
1741 ghi 15 18 20 32 48 49
A different approach, with double scanning:
$ awk -v search="11 21 27 52" -v offset=-1 '
NR==FNR {n=split(search,s);
for(i=1;i<=n;i++) if(FS $0 FS !~ FS s[i] FS) next;
line=NR; next}
FNR==line+offset' file{,}
1743 abc 04 10 29 31 34 35
You can set offset to any value (not just -1, 0, 1).
N.B. It only finds one match; if there are multiple matches, only the last one will be reported. This can be handled by keeping the matched line numbers in an array instead of a scalar (the line variable here), as sketched below.
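A sketch of that idea (untested), keeping every matching line number in an array so that all matches are reported:
$ awk -v search="11 21 27 52" -v offset=-1 '
    NR==FNR {n=split(search,s);
             for(i=1;i<=n;i++) if(FS $0 FS !~ FS s[i] FS) next;
             lines[NR]=1; next}         # remember every matching line number
    (FNR-offset) in lines' file{,}      # 2nd pass: print the line at match+offset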

How to sum a selection of columns?

I'd like to sum multiple columns in a text file similar to this:
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
I'd like to generate the sum at the bottom of columns 3-5 so it will look like:
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
93 23 26
I can use this for a single column but don't know how to specify a range of columns:
awk -F'\t' '{sum+=$3} END {print sum}' input_file > out
The easiest way is to just repeat the summing for each column, i.e.:
awk -F '\t' '{
    s3 += $3
    s4 += $4
    s5 += $5
}
END {
    print s3, s4, s5
}' input_file > out
In awk:
$ awk '
{
    for(i=3;i<=NF;i++)                        # loop wanted fields
        s[i]+=$i }                            # sum to hash, index on field #
END {
    for(i=3;i<=NF;i++)                        # same old loop
        printf "%s%s",s[i],(i==NF?ORS:OFS) }  # output
' file
93 23 26 118
Currently the for loop goes through every numeric field; change the parameters if needed.
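For example, the start and end columns could be passed in as variables rather than hard-coded (from and to are just names chosen for this sketch):
$ awk -v from=3 -v to=5 '
    { for(i=from;i<=to;i++) s[i]+=$i }                            # sum the chosen range
    END { for(i=from;i<=to;i++) printf "%s%s",s[i],(i==to?ORS:OFS) }
' file
93 23 26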
$ awk -v OFS='\t' '{s3+=$3; s4+=$4; s5+=$5; $1=$1} 1;
END {print "","",s3,s4,s5}' file
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
93 23 26
Try this. Note that NF is the number of fields, and awk field numbering starts at 1, so the example here covers the range from column 3 to the last column.
awk '{ for(i=3;i<=NF;i++) sum[i] += $i } END { for(i=3;i<=NF;i++) printf( "%d ", sum[i] ); print "" }' input_file
If you want fewer columns, say 3 and 4, then I'd suggest:
awk '{ for(i=3;i<=4 && i<=NF;i++) sum[i] += $i } END { for(i=3;i<=4 && i<=NF;i++) printf( "%d ", sum[i] ); print "" }' input_file

Compare a text file with other files

I have a file named file.txt as shown below
12 2
15 7
134 8
154 12
155 16
167 6
175 45
45 65
812 54
I have another five files named A.txt, B.txt, C.txt, D.txt, E.txt. The contents of these files are shown below.
A.txt
45
134
B.txt
15
812
155
C.txt
12
154
D.txt
175
E.txt
167
I need to check which file contains each value from the first column of file.txt and print that file's name as a third column.
Output:
12 2 C
15 7 B
134 8 A
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B
This should work:
One-liner:
awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) }1' {A..E}.txt file.txt
Formatted with comments:
awk '
# Check if the filename is not that of the main file
FILENAME != "file.txt" {
    # Create a hash: store column 1 values of the lookup files as keys, with the filename as the value
    a[$1]=FILENAME
    # Skip the rest of the actions
    next
}
# Check if the first column of the main file is a key in the hash
$1 in a {
    # If the key exists, assign its value (the filename) as column 3 of the main file
    $3=a[$1]
    # Using the sub function, strip the extension of the file name as desired in your output
    sub(/\..*/,"",$3)
# 1 is a non-zero value forcing awk to print. {A..E}.txt is brace expansion of your files.
}1' {A..E}.txt file.txt
Note: The main file needs to be passed at the end.
Test:
[jaypal:~/Temp] awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) ; printf "%-5s%-5s%-5s\n",$1,$2,$3}' {A..E}.txt file.txt
12 2 C
15 7 B
134 8 A
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B
#! /usr/bin/awk -f

# Remember every line of the main file, in order
FILENAME == "file.txt" {
    a[FNR] = $0;
    c = FNR;
}

# For the lookup files, map each value to the file's basename (A, B, ...)
FILENAME != "file.txt" {
    split(FILENAME, name, ".");
    k[$1] = name[1];
}

# Replay the main file with the matching file name appended
END {
    for (line = 1; line <= c; line++) {
        split(a[line], seg, FS);
        print a[line], k[seg[1]];
    }
}

# Run with: awk -f script.awk *.txt
This solution does not preserve the order of file.txt (a sketch of a workaround follows the output below):
join <(sort file.txt) \
     <(awk '
         FNR==1 {filename = substr(FILENAME, 1, length(FILENAME)-4)}
         {print $1, filename}
       ' [ABCDE].txt |
       sort) |
column -t
12 2 C
134 8 A
15 7 B
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B
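If the original order of file.txt matters, one workaround (an untested sketch along the same lines) is to number the lines with cat -n before the join and sort them back afterwards:
join -1 2 -2 1 \
     <(cat -n file.txt | sort -k2,2) \
     <(awk 'FNR==1 {filename = substr(FILENAME, 1, length(FILENAME)-4)}
            {print $1, filename}' [ABCDE].txt | sort -k1,1) |
  sort -k2,2n |                  # restore the original line numbers
  awk '{print $1, $3, $4}' |     # drop the helper line-number column
  column -t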