How to print the next or previous line using awk?

I have a file with 8 columns
1743 abc 04 10 29 31 34 35
1742 def 11 19 21 23 27 52
1741 ghi 15 18 20 32 48 49
and I also have an awk line that prints the complete line containing some specific numbers. The code is
awk -v col=1 '{ delete c; for (i=col; i<=NF; ++i) ++c[$i];
if (c['"$1"']>0 && c['"$2"']>0 && c['"$3"']>0 && c['"$4"']>0) print }'
< input_file
(the variables $1, $2, $3 and $4 are shell positional parameters, since I'm running this from bash).
In the previous example, when I put in the numbers 11, 21, 27 and 52, I get the line 1742.
How can I print the next or the previous line? As in the previous example, if I use the numbers 11, 21, 27 and 52, how do I get the line 1743 or the line 1741?

$ cat a.sh
echo "BEFORE"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v col=1 -f before.awk file
echo "AFTER"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v col=1 -f after.awk file
Quoting @triplee: "To print the previous line, remember the previous line in a variable."
$ cat before.awk
prev { delete c
       for (i=col; i<=NF; ++i) ++c[$i]
       if (c[p1]>0 && c[p2]>0 && c[p3]>0 && c[p4]>0) print prev
     }
{ prev = $0 }
Again, @triplee: "To print the next line, remember that you want to, and print and reset this variable on the next iteration."
$ cat after.awk
f { print; f = 0 }
{
  delete c
  for (i=col; i<=NF; ++i) ++c[$i]
  if (c[p1]>0 && c[p2]>0 && c[p3]>0 && c[p4]>0) f = 1
}
$ ./a.sh 11 21 27 52
BEFORE
1743 abc 04 10 29 31 34 35
AFTER
1741 ghi 15 18 20 32 48 49

A different approach, with double scanning:
$ awk -v search="11 21 27 52" -v offset=-1 '
  NR==FNR { n=split(search,s);
            for(i=1;i<=n;i++) if(FS $0 FS !~ FS s[i] FS) next;
            line=NR; next }
  FNR==line+offset' file{,}
1743 abc 04 10 29 31 34 35
You can set offset to any value (not just -1, 0, 1). The trailing file{,} is bash brace expansion for file file, which makes awk read the same file twice.
N.B. it only finds one match; if there are multiple matches, only the last one will be reported. This can be handled by keeping the matched line numbers in an array instead of a scalar (the line variable here).
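A sketch of that array-based variant (hits is a name I introduced); it prints the offset line for every match instead of only the last one:
$ awk -v search="11 21 27 52" -v offset=-1 '
  NR==FNR { n=split(search,s);
            for(i=1;i<=n;i++) if(FS $0 FS !~ FS s[i] FS) next;
            hits[NR+offset]=1     # remember every matching line number, shifted
            next }
  (FNR in hits)' file{,}
1743 abc 04 10 29 31 34 35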

Inplace remove last n lines of files without opening them more than once in gawk?

https://www.baeldung.com/linux/remove-last-n-lines-of-file
awk -v n=3 'NR==FNR{total=NR;next} FNR==total-n+1{exit} 1' input.txt input.txt
01 is my line number. Keep me please!
02 is my line number. Keep me please!
03 is my line number. Keep me please!
04 is my line number. Keep me please!
05 is my line number. Keep me please!
06 is my line number. Keep me please!
07 is my line number. Keep me please!
Here is a way to remove the last n lines. But it is not done in place, the file is read twice, and it only handles one file at a time.
How can I remove the last n lines of many files in place, without opening them more than once, with one gawk command and without using any other external commands?
With your shown samples, please try the following awk code. It uses no external utilities, as the question requests, and makes use of awk's END block.
awk -v n="3" '
{
  total=FNR
  lines[FNR]=$0
}
END{
  till=total-n
  for(i=1;i<=till;i++){
    print lines[i]
  }
}
' Input_file
A single-pass awk solution that requires neither arrays nor gawk (unless your file is over 500 MB, in which case it might be slightly slower):
rm -f file.txt
jot -c 30 51 > file.txt
gcat -n file.txt | rs -t -c$'\n' -C'#' 0 5 | column -s'#' -t
1 3 7 9 13 ? 19 E 25 K
2 4 8 : 14 # 20 F 26 L
3 5 9 ; 15 A 21 G 27 M
4 6 10 < 16 B 22 H 28 N
5 7 11 = 17 C 23 I 29 O
6 8 12 > 18 D 24 J 30 P
mawk -v __='file.txt' -v N='13' 'BEGIN {
  OFS = FS = RS              # split on newlines: FS/OFS become "\n" (the default RS)
  RS = "^$"                  # then slurp the whole file as a single record
  getline <(__); close(__)   # read the file named in __
  # bump N if the record ends with an empty field (the file has a trailing newline),
  # shrink NF by min(NF, N), then write what remains back over the file
  print $!(NF -= NF < (N+=_==$NF) ? NF : N) >(__) }'
gcat -n file.txt | rs -t -c$'\n' -C'#' 6 | column -s'#' -t ;
1 3 7 9 13 ?
2 4 8 : 14 #
3 5 9 ; 15 A
4 6 10 < 16 B
5 7 11 = 17 C
6 8 12 >
Speed is hardly a concern :
115K rows 198 MB file took 0.254 secs
rows = 115567. | UTF8 chars = 133793410. | bytes = 207390680.
( mawk2 -v __="${fn1}" -v N='13' ; )
0.04s user 0.20s system 94% cpu 0.254 total
rows = 115554. | UTF8 chars = 133779254. | bytes = 207370006.
5.98 million rows 988 MB file took 1.44 secs
rows = 5983333. | UTF8 chars = 969069988. | bytes = 1036334374.
( mawk2 -v __="${fn1}" -v N='13' ; )
0.33s user 1.07s system 97% cpu 1.435 total
rows = 5983320. | UTF8 chars = 969068062. | bytes = 1036332426.
Another way to do it, using GNU awk's BEGINFILE and ENDFILE special patterns:
{ lines[++numLines] = $0 }
BEGINFILE { fname = FILENAME }
ENDFILE { prt() }

function prt(   lineNr, maxLines) {
    close(fname)
    printf "" > fname                 # truncate the finished file
    maxLines = numLines - n
    for ( lineNr=1; lineNr<=maxLines; lineNr++ ) {
        print lines[lineNr] > fname
    }
    close(fname)
    numLines = 0
}
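A possible invocation, assuming the script above is saved as trim_last_n.awk (the script and file names here are illustrative):
$ gawk -v n=3 -f trim_last_n.awk file1.txt file2.txt file3.txt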
I find that this is the most succinct solution to the problem.
$ gawk -i inplace -v n=3 -v ORS= -e '{ lines[FNR]=$0 RT }
ENDFILE {
for(i=1;i<=FNR-n;++i) {
print lines[i]
}
}' -- file{1..3}.txt

Accessing two fields of a line before a matched line

Given the following in a file, file.txt:
Line with asdfgh output1 45 output2 80
Special Header
Line with output1 38 output2 99
Special Header
I would like to print to a file:
45 80
38 99
i.e., from the line immediately preceding each line whose first column is Special, extract the numbers (which can be floats with decimals) that follow output1 and output2.
I tried:
awk '($1 == "Special" ) {printf f; printf ("\n");} {f=$0}' file.txt > output.txt
This captures the entirety of the previous line and I get output.txt which looks like this:
Line with asdfgh output1 45 output2 80
Line with output1 38 output2 99
Now, within the captured variable f, how can I access the specific values after output1 and output2?
Like this:
$ awk '{for (i=1; i<NF; i++)
if ($i == "output1") {arr[NR]=$(i+1)}
else if ($i == "output2") {arr[NR]=arr[NR]" "$(i+1)}}
($1 == "Special") {print arr[NR-1]}
' file
Output
45 80
38 99
With GNU awk (the three-argument match() and the \s shorthand are gawk extensions):
$ awk '$1=="Special"{print m[1], m[2]}
{match($0, /output1\s+(\S+).*output2\s+(\S+)/, m)}' ip.txt
45 80
38 99
With perl:
$ perl -ne 'print "@d\n" if /^Special/; @d = /output[12]\s+(\S+)/g' ip.txt
45 80
38 99
$ awk '$1=="Special"{print x, y} {x=$(NF-2); y=$NF}' file
45 80
38 99
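To answer the question as literally asked, here is a POSIX-awk sketch that keeps the previous line in f and splits it on demand (w, v1 and v2 are names I chose):
$ awk '$1 == "Special" {
         n = split(f, w)                  # fields of the remembered previous line
         for (i = 1; i < n; i++)
             if (w[i] == "output1") v1 = w[i+1]
             else if (w[i] == "output2") v2 = w[i+1]
         print v1, v2
       }
       { f = $0 }' file.txt
45 80
38 99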

How to sum a selection of columns?

I'd like to sum multiple columns in a text file similar to this:
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
I'd like to generate the sum at the bottom of columns 3-5 so it will look like:
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
93 23 26
I can use this for a single column but don't know how to specify a range of columns:
awk -F'\t' '{sum+=$3} END {print sum}' input_file > out
The easiest way is to just repeat the summing for each column, i.e.:
awk -F '\t' '{
s3 += $3
s4 += $4
s5 += $5
}
END {
print s3, s4, s5
}' input_file > out
In awk:
$ awk '
{
for(i=3;i<=NF;i++) # loop wanted fields
s[i]+=$i } # sum to hash, index on field #
END {
for(i=3;i<=NF;i++) # same old loop
printf "%s%s",s[i],(i==NF?ORS:OFS) } # output
' file
93 23 26 118
Currently the for loop goes through every field from the 3rd to the last. Change the loop bounds if needed.
$ awk -v OFS='\t' '{s3+=$3; s4+=$4; s5+=$5; $1=$1} 1;
END {print "","",s3,s4,s5}' file
GeneA Sample 34 7 8 16
GeneA Sample 17 7 10 91
GeneA Sample 42 9 8 11
93 23 26
Try this. Note that NF is the number of fields, and awk field indexing starts at 1, so the example here covers the range from column 3 to the last column.
awk '{ for(i=3;i<=NF;i++) sum[i] += $i } END { for(i=3;i<=NF;i++) printf( "%d ", sum[i] ); print "" }' input_file
If you want fewer columns, say 3 and 4, then I'd suggest:
awk '{ for(i=3;i<=4 && i<=NF;i++) sum[i] += $i } END { for(i=3;i<=4 && i<=NF;i++) printf( "%d ", sum[i] ); print "" }' input_file
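If the column range itself should be configurable from the command line, a hedged generalization of the answers above (from and to are variable names I made up):
awk -v from=3 -v to=5 '
    { for (i=from; i<=to && i<=NF; i++) sum[i] += $i }
END { for (i=from; i<=to; i++) printf "%s%s", sum[i], (i==to ? ORS : OFS) }
' input_file
With the sample file this prints 93 23 26, like the hard-coded variants.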

How to use awk and grep combination

I have a file with 10 columns and lots of lines. I want to add a fixed correction to the 10th column of the lines that contain the 'G01' pattern.
For example, in the file below
AS G17 2014 3 31 0 2 0.000000 1 -0.809159910000E-04
AS G12 2014 3 31 0 2 0.000000 1 0.195515363000E-03
AS G15 2014 3 31 0 2 0.000000 1 -0.171167837000E-03
AS G29 2014 3 31 0 2 0.000000 1 0.521982134000E-03
AS G07 2014 3 31 0 2 0.000000 1 0.329889640000E-03
AS G05 2014 3 31 0 2 0.000000 1 -0.381588767000E-03
AS G25 2014 3 31 0 2 0.000000 1 0.203352860000E-04
AS G01 2014 3 31 0 2 0.000000 1 0.650180300000E-05
AS G24 2014 3 31 0 2 0.000000 1 -0.258444780000E-04
AS G27 2014 3 31 0 2 0.000000 1 -0.203691700000E-04
the 10th column of the line with G01 should be corrected.
I've used awk with a while loop to do that, but it takes a very long time for massive files.
It would be appreciated if anybody could help with a more efficient approach.
You can use the following:
awk '$2 == "G01" {$10="value"}1' file.txt
To preserve whitespace, you can use the solution from this post (the four-argument split() is a gawk extension):
awk '$2 == "G01" {
       data=1
       n=split($0,a," ",b)
       a[10]="value"
       line=b[0]
       for (i=1;i<=n;i++){
         line=(line a[i] b[i])
       }
       print line
     }
     {
       if (data!=1){
         print
       }
       else {
         data=0
       }
     }' file.txt
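Note that the question asks to add a fixed correction to the 10th column rather than to replace it; a minimal sketch of that (the corr value and the %.12E output format are assumptions, not from the question):
awk -v corr=1.0e-6 '$2 == "G01" { $10 = sprintf("%.12E", $10 + corr) } 1' file.txt
Like the simple replacement above, this rebuilds the record with single-space separators, so combine it with the whitespace-preserving split() approach if alignment matters.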
the 10th column of the line with G01 should be corrected
The syntax is as follows; it searches for the regex given inside /../ anywhere in the current record/line/row, regardless of which field contains it.
Either
$ awk '/regex/{ $10 = "somevalue"; print }' infile
OR
The 1 at the end performs the default action, print $0, that is, it prints the current record/line/row:
$ awk '/regex/{ $10 = "somevalue" }1' infile
OR
$0 means current record/line/row
$ awk '$0 ~ /regex/{ $10 = "somevalue"}1' infile
So, in the current context, it can be any of the following:
$ awk '/G01/{$10 = "somevalue" ; print }' infile
$ awk '/G01/{$10 = "somevalue" }1' infile
$ awk '$0 ~ /G01/{$10 = "somevalue"; print }' infile
$ awk '$0 ~ /G01/{$10 = "somevalue" }1' infile
If you would like to restrict your search to a specific field/column of the record/line/row ($10 means the 10th field/column), then:
$ awk '$2 == "G01" {$10 = "somevalue"; print }' infile
$ awk '$2 == "G01" {$10 = "somevalue" }1' infile
If you would like to pass a word into awk from a shell variable, or just a literal word, then:
$ awk -v search="G01" -v replace="foo" '$2 == search {$10 = replace }1' infile
and the same thing with shell variables:
$ search_value="G01"
$ new_value="foo"
$ awk -v search="$search_value" -v replace="$new_value" '$2 == search {$10 = replace }1' infile
From the man page:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of
the program begins. Such variable values are available to the
BEGIN block of an AWK program.
For additional syntax instructions:
"sed & awk" by Dale Dougherty and Arnold Robbins (O'Reilly)
"UNIX Text Processing" by Dale Dougherty and Tim O'Reilly (Hayden Books)
"GAWK: Effective awk Programming" by Arnold D. Robbins (O'Reilly)
http://www.gnu.org/software/gawk/manual/

awk - join last column with next line first column

I want to join the last column of each line with the first column of the next line. For example:
cat FILE
12 15
22 25
32 35
42 45
and I want to join it like this:
15 22
25 32
35 42
15 (the last column) joined with 22 (the first column of the next line).
My solution is: tr '\n' '#' < FILE | tr '\t' '\n' | grep '#' | grep -v '#$' | tr '#' '\t'
But there might be a simpler awk command to do this.
awk '{
for (i = 2; i < NF; i += 2)
print $i, $(i + 1)
}' RS= OFS=\\t infile
With bash:
a=($(<infile));printf '%s\t%s\n' ${a[@]:1:${#a[@]}-2}
With zsh:
printf '%s\t%s\n' ${$(<infile):1:-1}
Got it!
$ awk 'BEGIN{OFS="\t"}{if (NR==1) {a=$2} else {print a,$1;a=$2}}' file
15 22
25 32
35 42
BEGIN{OFS="\t"} sets the output field separator to a tab.
{if (NR==1) {a=$2} for the first line, just store the 2nd field.
else {print a,$1;a=$2} for the remaining lines, print the 2nd field of the previous row and the 1st field of the current one. This way the last record's 2nd field is never printed.
Dimitre Radoulov has the solution but if we're golfing:
awk '$1=$NF=X;1' RS= file|xargs -n2
15 22
25 32
35 42
awk 'NR!=1{print p, $1} {p=$2}' FILE
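For completeness, a run of that last one-liner against the sample, with a tab as the output separator:
$ awk -v OFS='\t' 'NR!=1{print p, $1} {p=$2}' FILE
15      22
25      32
35      42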