Line number of last occurrence of a pattern with awk - awk

I am new to awk commands. I am trying a way to print the last line number for my pattern match.
I need to integrate that awk command in tcl script..
If someone has answer to it, please let me know.
exec awk -v search=$var {$0~search{print NR; exit}} file_name
I am using this to print the line number of first occurrence.

I would harness GNU AWK for this task following, let file.txt content be
12
15
120
150
1200
1500
then
awk '$0~"12"{n=NR}END{print n}' file.txt
output
5
Explanation: I am looking for last line containing 12 somewhere, when such line is encountered I set value of variable n to number of row (NR), when all lines of lines are processed I print said value.
(tested in gawk 4.2.1)

Or, without awk:
set fh [open file_name]
set lines [split [read $fh] \n]
close $fh
set line_nums [lmap idx [lsearch -all -regexp $lines with] {expr {$idx + 1}}]
set last_line_num [lindex $line_nums end]

With your shown samples and efforts please try following tac + awk code.
tac Input_file |
awk -v lines=$(wc -l < Input_file) '/12/{print lines-FNR+1;exit}'
Explanation:
Using tac command to print Input_file's output in reverse order from bottom to top(basically to get very last match very quickly at first and exit from awk program, explained later on).
Sending tac Input_file output to awk program as an input.
In awk program creating variable named lines which has total number of lines of Input_file and in main program checking if line contains 12 then printing lines-FNR+1 value and using exit exiting from program then, by which we need NOT to read whole Input_file.

Related

Print filenames & line number with number of fields greater than 'x'

I am running Ubuntu Linux. I am in need to print filenames & line numbers containing more than 7 columns. There are several hundred thousand files.
I am able to print the number of columns per file using awk. However the output I am after is something like
file1.csv-463 which is to suggest file1.csv has more than 7 records on line 463. I am using awk command awk -F"," '{print NF}' * to print the number of fields across all files.
Please could I request help?
If you have GNU awk with you, try following code then. This will simply check condition if NF is greater than 7 then it will print that particular file's file name along with line number and nextfile will take program to next Input_file which will save our time because we need not to read whole Input_file then.
awk -F',' 'NF>7{print FILENAME,FNR;nextfile}' *.csv
Above will print only very first match of condition to get/print all matched lines try following then:
awk -F',' 'NF>7{print FILENAME,FNR}' *.csv
This might work for you (GNU sed):
sed -Ens 's/\S+/&/8;T;F;=;p' *.csv | paste - - -
If there is no eighth column, break.
Output the file name F, the line number = and print the current line p.
Feed the output into a paste command which prints three lines as one.
N.B. The -s option resets the line numbers for each file, without it, it will number each line for the entire input.

Replace a letter with another from the last word from the last two lines of a text file

How could I possibly replace a character with another, selecting the last word from the last two lines of a text file in shell, using only a single command? In my case, replacing every occurrence of a with E from the last word only.
Like, from a text file containing this:
tree;apple;another
mango.banana.half
monkey.shelf.karma
to this:
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
I tried using sed -n 'tail -2 'mytext.txt' -r 's/[a]+/E/*$//' but it doesn't work (my error: sed expression #1, char 10: unknown option to 's).
Could you please try following, tac + awk solution. Completely based on OP's samples only.
tac Input_file |
awk 'FNR<=2{if(/;/){FS=OFS=";"};if(/\./){FS=OFS="."};gsub(/a/,"E",$NF)} 1' |
tac
Output with shown samples is:
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
NOTE: Change gsub to sub in case you want to substitute only very first occurrence of character a in last field.
This might work for you (GNU sed):
sed -E 'N;${:a;s/a([^a.]*)$/E\1/mg;ta};P;D' file
Open a two line window throughout the length of the file by using the N to append the next line to the previous and the P and D commands to print then delete the first of these. Thus at the end of the file, signified by the $ address the last two lines will be present in the pattern space.
Using the m multiline flag on the substitution command, as well as the g global flag and a loop between :a and ta, replace any a in the last word (delimited by .) by an E.
Thus the first pass of the substitution command will replace the a in half and the last a in karma. The next pass will match nothing in the penultimate line and replace the a in karmE. The third pass will match nothing and thus the ta command will fail and the last two lines will printed with the required changes.
If you want to use Sed, here's a solution:
tac input_file | sed -E '1,2{h;s/.*[^a-zA-Z]([a-zA-Z]+)/\1/;s/a/E/;x;s/(.*[^a-zA-Z]).*/\1/;G;s/\n//}' | tac
One tiny detail. In your question you say you want to replace a letter, but then you transform karma in kErme, so what is this? If you meant to write kErma, then the command above will work; if you meant to write kErmE, then you have to change it just a bit: the s/a/E/ should become s/a/E/g.
With tac+perl
$ tac ip.txt | perl -pe 's/\w+\W*$/$&=~tr|a|E|r/e if $.<=2' | tac
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
\w+\W*$ match last word in the line, \W* allows any possible trailing non-word characters to be matched as well. Change \w and \W accordingly if numbers and underscores shouldn't be considered as word characters - for ex: [a-zA-Z]+[^a-zA-Z]*$
$&=~tr|a|E|r change all a to E only for the matched portion
e flag to enable use of Perl code in replacement section instead of string
To do it in one command, you can slurp the entire input as single string (assuming this'll fit available memory):
perl -0777 -pe 's/\w+\W*$(?=(\n.*)?\n\z)/$&=~tr|a|E|r/gme'
Using GNU awk forsplit() 4th arg since in the comments of another solution the field delimiter is every sequence of alphanumeric and numeric characters:
$ gawk '
BEGIN {
pc=2 # previous counter, ie how many are affected
}
{
for(i=pc;i>=1;i--) # buffer to p hash, a FIFO
if(i==pc && (i in p)) # when full, output
print p[i]
else if(i in p) # and keep filling
p[i+1]=p[i] # above could be done using mod also
p[1]=$0
}
END {
for(i=pc;i>=1;i--) {
n=split(p[i],t,/[^a-zA-Z0-9\r]+/,seps) # split on non alnum
gsub(/a/,"E",t[n]) # replace
for(j=1;j<=n;j++) {
p[i]=(j==1?"":p[i] seps[j-1]) t[j] # pack it up
}
print p[i] # output
}
}' file
Output:
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
Would this help you ? on GNU awk
$ cat file
tree;apple;another
mango.banana.half
monkey.shelf.karma
$ tac file | awk 'NR<=2{s=gensub(/(.*)([.;])(.*)$/,"\\3",1);gsub(/a/,"E",s); print gensub(/(.*)([.;])(.*)$/,"\\1\\2",1) s;next}1' | tac
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
Better Readable version :
$ tac file | awk 'NR<=2{
s=gensub(/(.*)([.;])(.*)$/,"\\3",1);
gsub(/a/,"E",s);
print gensub(/(.*)([.;])(.*)$/,"\\1\\2",1) s;
next
}1' | tac
With GNU awk you can set FS with the two separators, then gsub for the replacement in $3, the third field, if NR>1
awk -v FS=";|[.]" 'NR>1 {gsub("a", "E",$3)}1' OFS="." file
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
With GNU awk for the 3rd arg to match() and gensub():
$ awk -v n=2 '
NR>n { print p[NR%n] }
{ p[NR%n] = $0 }
END {
for (i=0; i<n; i++) {
match(p[i],/(.*[^[:alnum:]])(.*)/,a)
print a[1] gensub(/a/,"E","g",a[2])
}
}
' file
tree;apple;another
mango.banana.hElf
monkey.shelf.kErmE
or with any awk:
awk -v n=2 '
NR>n { print p[NR%n] }
{ p[NR%n] = $0 }
END {
for (i=0; i<n; i++) {
match(p[i],/.*[^[:alnum:]]/)
lastWord = substr(p[i],1+RLENGTH)
gsub(/a/,"E",lastWord )
print substr(p[i],1,RLENGTH) lastWord
}
}
' file
If you want to do it for the last 50 lines of a file instead of the last 2 lines just change -v n=2 to -v n=50.
The above assumes there are at least n lines in your input.
You can let sed repeat changing an a into E only for the last word with a label.
tac mytext.txt| sed -r ':a; 1,2s/a(\w*)$/E\1/; ta' | tac

Delete third-to-last line of file using sed or awk

I have several text files with different row numbers and I have to delete in all of them the third-to-last line . Here is a sample file:
bear
horse
window
potato
berry
cup
Expected result for this file:
bear
horse
window
berry
cup
Can we delete the third-to-last line of a file:
a. not based on any string/pattern.
b. based only on a condition that it has to be the third-to-last line
I have problem on how to index my files beginning from the last line. I have tried this from another SO question for the second-to-last line:
> sed -i 'N;$!P;D' output1.txt
With tac + awk solution, could you please try following. Just set line variable of awk to line(from bottom) whichever you want to skip.
tac Input_file | awk -v line="3" 'line==FNR{next} 1' | tac
Explanation: Using tac will read the Input_file reverse(from bottom line to first line), passing its output to awk command and then checking condition if line is equal to line(which we want to skip) then don't print that line, 1 will print other lines.
2nd solution: With awk + wc solution, kindly try following.
awk -v lines="$(wc -l < Input_file)" -v skipLine="3" 'FNR!=(lines-skipLine+1)' Input_file
Explanation: Starting awk program here and creating a variable lines which has total number of lines present in Input_file in it. variable skipLine has that line number which we want to skip from bottom of Input_file. Then in main program checking condition if current line is NOT equal to lines-skipLine+1 then printing the lines.
3rd solution: Adding solution as per Ed sir's comment here.
awk -v line=3 '{a[NR]=$0} END{for (i=1;i<=NR;i++) if (i != (NR-line)) print a[i]}' Input_file
Explanation: Adding detailed explanation for 3rd solution.
awk -v line=3 ' ##Starting awk program from here, setting awk variable line to 3(line which OP wants to skip from bottom)
{
a[NR]=$0 ##Creating array a with index of NR and value is current line.
}
END{ ##Starting END block of this program from here.
for(i=1;i<=NR;i++){ ##Starting for loop till value of NR here.
if(i != (NR-line)){ ##Checking condition if i is NOT equal to NR-line then do following.
print a[i] ##Printing a with index i here.
}
}
}
' Input_file ##Mentioning Input_file name here.
With ed
ed -s ip.txt <<< $'$-2d\nw'
# thanks Shawn for a more portable solution
printf '%s\n' '$-2d' w | ed -s ip.txt
This will do in-place editing. $ refers to last line and you can specify a negative relative value. So, $-2 will refer to last but second line. w command will then write the changes.
See ed: Line addressing for more details.
This might work for you (GNU sed):
sed '1N;N;$!P;D' file
Open a window of 3 lines in the file then print/delete the first line of the window until the end of the file.
At the end of the file, do not print the first line in the window i.e. the 3rd line from the end of the file. Instead, delete it, and repeat the sed cycle. This will try to append a line after the end of file, which will cause sed to bail out, printing the remaining lines in the window.
A generic solution for n lines back (where n is 2 or more lines from the end of the file), is:
sed ':a;N:s/[^\n]*/&/3;Ta;$!P;D' file
Of course you could use:
tac file | sed 3d | tac
But then you would be reading the file 3 times.
To delete the 3rd-to-last line of a file, you can use head and tail:
{ head -n -3 file; tail -2 file; }
In case of a large input file, when perfomance matters, this is very fast, because it doesn't read and write line by line. Also, do not modify the semicolons and the spaces next to the brackets, see about commands grouping.
Or use sed with tac:
tac file | sed '3d' | tac
Or use awk with tac:
tac file | awk 'NR!=3' | tac

Saving last value of awk as variable

I am using awk to parse a file and create new files (1...N) in the following way.
awk -F ';' '{gsub(/[[:blank:]]/,"");for(i=1;i<=NF;i++)print $i>NR}' file
This does what I need to do, but how do I save the last value of the for loop as a variable, consistent with the above? So for example, if i loops to 6, I want to set variable=6.
You want to save last value of for loop which I believe is number of fields if yes then please try following.
var=$(awk -F ';' '{gsub(/[[:blank:]]/,"");for(i=1;i<=NF;i++)print $i>NR;value=NF} END{print value}' file)
In case you want to save value for last line number(total lines in Input_file) then try following.
var=$(awk -F ';' '{gsub(/[[:blank:]]/,"");for(i=1;i<=NF;i++)print $i>NR;value=count++} END{print count}' Input_file)
OR in case your awk supports FNR in END block then simply do:
var=$(awk -F ';' '{gsub(/[[:blank:]]/,"");for(i=1;i<=NF;i++)print $i>NR} END{print FNR}' Input_file)
Note: OP haven't mentioned about it but thought to put it here, in case there are too many files getting created then it will be wise to use close command in awk to close them in background too by using close(NR) just an example here.

awk printing the second to last record of a file

I have a file set up like
Words on
many line
%
More Words
on many lines
%
Even More Words
on many lines
%
and I would like to output the second to last record of this file where the record is delimited by % after each block of text.
I have used:
awk -v RS=\% ' END{ print NR }' $f
to find the number of records (1136). Then I did
awk -v RS=\% ' { print $(NR-1) }' $f
and
awk -v RS=\% ' { print $(NR=1135) }' $f
.
Neither of these worked, and, instead, displayed a record towards the beginning of the file and a many blank lines.
OUTPUT:
"You know, of course, that the Tasmanians, who never committed adultery, are
now extinct."
-- M. Somerset Maugham
"The
is
what
that
This output had many, many more blank lines and contains a record near the middle of the file.
awk -v RS=\% 'END{ print $(NR-1) }' $f
returns a blank line. The same command with different $(NR-x) values also returns a blank line.
Can someone help me to print the second to last record in this case?
Thanks
You can do:
awk '{this=last;last=$0} END{print this}' file
Or, if you don't mind having the entire file in memory:
awk '{a[NR]=$0} END{print a[NR-1]}' file
Or, if it is just line count (or record count) based, you can keep a rolling deletion going so you are not too piggish on memory:
$ seq 999999 | tail -2
999998
999999
$ seq 999999 | awk '{a[NR]=$0; delete a[NR-3]} END{print a[NR-1]}'
999998
If they are blocks of text the same method works if you can separate the blocks into delimited records.
Given:
$ echo "$txt"
Words on
many line
%
More Words
on many lines
%
Even More Words
on many lines
%
You can do:
$ echo "$txt" | awk -v RS=\% '{a[NR]=$0} END{print a[NR-1]}'
Even More Words
on many lines
$ echo "$txt" | awk -v RS=\% '{a[NR]=$0} END{print a[NR-2]}'
More Words
on many lines
If you want to not print the leading and trailing \n you can do:
$ echo "$txt" | awk 'BEGIN{RS="%\n"} {a[NR]=$0} END{printf a[NR-2]}'
Words on
many line
Finally, if you know the specific record you want to print, do it this way in awk:
$ seq 999999 | awk -v mrk=1135 'NR==mrk{print; exit}'
1135
If you want a random record, you can do:
$ awk -v min=1 -v max=1135 'BEGIN{srand()
RS="%\n"
tgt=int(min+rand()*(max-min+1))
}
NR==tgt{print; exit}' file
Does the solution have to be with awk? Just using head and tail would be simpler.
tail -2 file.txt | head 1 > justthatline.txt
The best way for this would be to use the BEGIN construct.
awk 'BEGIN{RS="%\n"; ORS="%\n"}(NR>=2){print}' file
RS and ORS set the input file and output record separators respectively.