Accessing two fields of a line before a matched line - awk

Given the following in a file, file.txt:
Line with asdfgh output1 45 output2 80
Special Header
Line with output1 38 output2 99
Special Header
I would like to print to a file:
45 80
38 99
i.e., from the line immediately preceding each line whose first column is Special, extract the numbers (which can be floats with decimals) after output1 and output2.
I tried:
awk '($1 == "Special" ) {printf f; printf ("\n");} {f=$0}' file.txt > output.txt
This captures the entirety of the previous line, and I get an output.txt that looks like this:
Line with asdfgh output1 45 output2 80
Line with output1 38 output2 99
Now, within the captured variable f, how can I access the specific values after output1 and output2?

Like this:
$ awk '{for (i=1; i<NF; i++)
if ($i == "output1") {arr[NR]=$(i+1)}
else if ($i == "output2") {arr[NR]=arr[NR]" "$(i+1)}}
($1 == "Special") {print arr[NR-1]}
' file
Output
45 80
38 99
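
A close variant that keeps the saved-line approach from the question and split()s f instead of indexing the current line's fields (a sketch; it assumes output1 and output2 each appear at most once per line):
awk '$1 == "Special" {
         v1 = v2 = ""                            # reset before scanning the saved line
         n = split(f, a)                         # break the saved line into words
         for (i = 1; i < n; i++)
             if (a[i] == "output1") v1 = a[i+1]  # the value follows the marker
             else if (a[i] == "output2") v2 = a[i+1]
         print v1, v2
     }
     { f = $0 }' file.txt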

With GNU awk:
$ awk '$1=="Special"{print m[1], m[2]}
{match($0, /output1\s+(\S+).*output2\s+(\S+)/, m)}' ip.txt
45 80
38 99
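If GNU awk is not available, the same idea can be approximated with POSIX match(), RSTART and RLENGTH (a sketch, run against the same ip.txt):
awk '$1=="Special" {print v1, v2}
     { v1 = v2 = ""
       if (match($0, /output1 +[^ ]+/)) { v1 = substr($0, RSTART, RLENGTH); sub(/output1 +/, "", v1) }
       if (match($0, /output2 +[^ ]+/)) { v2 = substr($0, RSTART, RLENGTH); sub(/output2 +/, "", v2) }
     }' ip.txt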
With perl:
$ perl -ne 'print "#d\n" if /^Special/; #d = /output[12]\s+(\S+)/g' ip.txt
45 80
38 99

$ awk '$1=="Special"{print x, y} {x=$(NF-2); y=$NF}' file
45 80
38 99

Related

How to print the next or previous line using awk?

I have a file with 8 columns
1743 abc 04 10 29 31 34 35
1742 def 11 19 21 23 27 52
1741 ghi 15 18 20 32 48 49
and I also have an awk line that prints the complete line containing some specific numbers. The code is
awk -v col=1 '{ delete c; for (i=col; i<=NF; ++i) ++c[$i];
if (c['"$1"']>0 && c['"$2"']>0 && c['"$3"']>0 && c['"$4"']>0) print }'
< input_file
(the variables $1, $2, $3 and $4 are there because I'm calling it from bash).
In the previous example, when I pass the numbers 11, 21, 27 and 52, I get the line 1742.
How can I print the next or the previous line? Using the same numbers 11, 21, 27 and 52, how do I get the line 1743 or the line 1741?
$ cat a.sh
echo "BEFORE"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v col=1 -f before.awk file
echo "AFTER"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v col=1 -f after.awk file
Quoting @triplee: "To print the previous line, remember the previous line in a variable."
$ cat before.awk
prev { delete c;
for (i=col; i<=NF; ++i) ++c[$i]
if (c[p1]>0 && c[p2]>0 && c[p3]>0 && c[p4]>0) print prev
}
{ prev = $0 }
Again, @triplee: "To print the next line, remember that you want to, and print and reset this variable on the next iteration."
$ cat after.awk
f { print; f = 0 }
{
delete c;
for (i=col; i<=NF; ++i) ++c[$i]
if (c[p1]>0 && c[p2]>0 && c[p3]>0 && c[p4]>0) f = 1
}
$ ./a.sh 11 21 27 52
BEFORE
1743 abc 04 10 29 31 34 35
AFTER
1741 ghi 15 18 20 32 48 49
A different approach, with double scanning:
$ awk -v search="11 21 27 52" -v offset=-1 '
NR==FNR {n=split(search,s);
for(i=1;i<=n;i++) if(FS $0 FS !~ FS s[i] FS) next;
line=NR; next}
FNR==line+offset' file{,}
1743 abc 04 10 29 31 34 35
You can set offset to any value (not just -1, 0, 1).
N.B. It only finds one match, though; if there are multiple matches, only the last one will be reported. This can be handled by keeping the matched line numbers in an array instead of a scalar value (the line variable here), as in the sketch below.
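A sketch of that fix, keeping each matched line number as an array key (same search and offset variables as above):
awk -v search="11 21 27 52" -v offset=-1 '
NR==FNR {n=split(search,s)
         for(i=1;i<=n;i++) if (FS $0 FS !~ FS s[i] FS) next
         hits[NR]; next}           # remember every matching line number
(FNR-offset) in hits' file{,}     # second pass: print each line at hit+offset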

How to sum a selection of columns?

I'd like to sum multiple columns in a text file similar to this:
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
I'd like to generate the sum at the bottom of columns 3-5 so it will look like:
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
93 23 26
I can use this for a single column but don't know how to specify a range of columns:
awk -F'\t' '{sum+=$3} END {print sum}' input_file > out
The easiest way is to just repeat the summing for each column, i.e.:
awk -F '\t' '{
s3 += $3
s4 += $4
s5 += $5
}
END {
print s3, s4, s5
}' input_file > out
In awk:
$ awk '
{
for(i=3;i<=NF;i++) # loop wanted fields
s[i]+=$i } # sum to hash, index on field #
END {
for(i=3;i<=NF;i++) # same old loop
printf "%s%s",s[i],(i==NF?ORS:OFS) } # output
' file
93 23 26 118
Currently the for loop goes through every numeric field. Change the loop bounds if needed, as in the sketch below.
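For example, to sum only columns 3 through 5 with the same structure (a minimal variation):
awk '
{
    for(i=3;i<=5;i++)                        # only fields 3-5
        s[i]+=$i }
END {
    for(i=3;i<=5;i++)
        printf "%s%s",s[i],(i==5?ORS:OFS) }  # prints: 93 23 26
' file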
$ awk -v OFS='\t' '{s3+=$3; s4+=$4; s5+=$5; $1=$1} 1;
END {print "","",s3,s4,s5}' file
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
93 23 26
Try this. Note that NF is the number of fields, and awk field numbering starts at 1, so this example covers the range from column 3 to the last column.
awk '{ for(i=3;i<=NF;i++) sum[i] += $i } END { for(i=3;i<=NF;i++) printf( "%d ", sum[i] ); print "" }' input_file
If you want fewer columns, say 3 and 4, then I'd suggest:
awk '{ for(i=3;i<=4 && i<=NF;i++) sum[i] += $i } END { for(i=3;i<=4 && i<=NF;i++) printf( "%d ", sum[i] ); print "" }' input_file

awk and sprintf to zero fill

Using awk and sprintf, how can I zero-fill both before and after the decimal point?
input
11
12.2
9.6
output
110
122
096
I can get either using these, but not both
sprintf("%.1f", $1)
output
110
122
96
sprintf("%03d", $1)
output
011
012
096
x = sprintf("%06.3f", 1.23)
Output:
$ awk 'BEGIN{x = sprintf("%06.3f", 1.23); print x}'
01.230
$
I really can't tell from your question but maybe one of these does whatever it is you want:
$ cat file
11
12.2
9.6
$ awk '{ x=sprintf("%03d",$0*10); print x }' file
110
122
096
$ awk '{ x=sprintf("%04.1f",$0); print x }' file
11.0
12.2
09.6
Obviously you could just use printf with no intermediate variable but you asked for sprintf().
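For the record, the direct printf form of the %03d variant above would be (a sketch, assuming the same scale-by-10 trick fits your data):
awk '{printf "%03d\n", $0*10}' file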

awk - join last column with next line first column

I want to join the last column of each line with the first column of the next line. For example:
cat FILE
12 15
22 25
32 35
42 45
which I'd like to join like this:
15 22
25 32
35 42
That is, 15 (the last column) is joined with 22 (the first column of the next line).
My solution is: tr '\n' '#' < FILE | tr '\t' '\n' | grep '#' | grep -v '#$' | tr '#' '\t'
But there might be a simpler awk command to do this.
awk '{
  for (i = 2; i < NF; i += 2)   # walk the even-numbered fields of the single record
    print $i, $(i + 1)          # pair each with the field that follows it
}' RS= OFS=\\t infile
With bash:
a=($(<infile));printf '%s\t%s\n' ${a[@]:1:${#a[@]}-2}
With zsh:
printf '%s\t%s\n' ${$(<infile):1:-1}
Got it!
$ awk 'BEGIN{OFS="\t"}{if (NR==1) {a=$2} else {print a,$1;a=$2}}' file
15 22
25 32
35 42
'BEGIN{OFS="\t"} set file separator as tab.
{if (NR==1) {a=$2} for first line just store 2nd field.
else {print a,$1;a=$2}} in the rest of cases print 2nd field of previous row and 1st field of current. This way we do not print last record.
Dimitre Radoulov has the solution but if we're golfing:
awk '$1=$NF=X;1' RS= file|xargs -n2
15 22
25 32
35 42
awk 'NR!=1{print p, $1} {p=$2}'

awk + Need to print everything (all remaining fields) except $1 and $2

I have the following file and I need to print everything except $1 and $2 using awk.
File:
INFORMATION DATA 12 33 55 33 66 43
INFORMATION DATA 45 76 44 66 77 33
INFORMATION DATA 77 83 56 77 88 22
...
the desirable output:
12 33 55 33 66 43
45 76 44 66 77 33
77 83 56 77 88 22
...
Well, given your data, cut should be sufficient:
cut -d\ -f3- infile
Although it adds extra whitespace at the beginning of each line compared to yael's expected output, here is a shorter and simpler awk-based solution than the previously suggested ones:
awk '{$1=$2=""; print}'
or even:
awk '{$1=$2=""}1'
$ cat t
INFORMATION DATA 12 33 55 33 66 43
INFORMATION DATA 45 76 44 66 77 33
INFORMATION DATA 77 83 56 77 88 22
$ awk '{for (i = 3; i <= NF; i++) printf $i " "; print ""}' t
12 33 55 33 66 43
45 76 44 66 77 33
77 83 56 77 88 22
danben's answer leaves a trailing space at the end of each line, so the correct way to do it would be:
awk '{for (i=3; i<NF; i++) printf $i " "; print $NF}' filename
If the first two words don't change, probably the simplest thing would be:
awk -F 'INFORMATION DATA ' '{print $2}' t
Here's another awk solution that's more flexible than the cut one and shorter than the other awk ones. It assumes your separators are single spaces (modify the regex as necessary if they are not):
awk --posix '{sub(/([^ ]* ){2}/, ""); print}'
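For instance, if the separators may be runs of spaces or tabs, a variant along these lines should work (a sketch):
awk --posix '{sub(/^[ \t]*([^ \t]+[ \t]+){2}/, ""); print}'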
If Perl is an option:
perl -lane 'splice @F,0,2; print join " ",@F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print it
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode: split input lines into the @F array; defaults to splitting on whitespace
-e execute the perl code
splice @F,0,2 cleanly removes columns 0 and 1 from the @F array
join " ",@F joins the elements of the @F array, using a space in between each element
Variation for csv input files:
perl -F, -lane 'splice @F,0,2; print join " ",@F' file
This uses the -F field-separator option with a comma.