Get the line number of the first line matching second pattern - awk

Is it possible using awk or sed to get the line number of a line such that it is the first line matching a regex after another line matching another regex?
In other words:
Find line l1 matching regex r1. l1 is the first line matching r1.
Find line l2 below l1. l2 matches regex r2. l2 is the first line matching r2, ignoring lines l1 and above.
Clarification: By match I mean partial match, for the most general solution.
A partial match can of course be turned into a full-word match with \<...\> or a full-line match with ^...$.
Example input:
- - '787928'
  - stuff
- - '810790'
  - more stuff
- - '787927'
  - yet more stuff
- - '828055'
  - some more stuff
- - '828472'
  - some other stuff
If r1 is ^-.*787927.* and r2 is ^- I'd expect the output to be 7, i.e. the number of the line that says - - '828055'.

Input example :
world
zekfzlefkzl
fezekzevnkzjnz
hello
zeniznejkglz
world
eznkflznfkel
hello
zenilzligeegz
world
Command :
pat1="hello"; pat2="world";
awk -v pat1="$pat1" -v pat2="$pat2" 'pat1_match && $0 ~ pat2 {print NR; exit} $0 ~ pat1 {pat1_match = 1}' <input>
Output :
6
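A sketch of the flag approach wrapped in a helper and run on the sample from the question. The function name first_after and the temporary file are assumptions made for illustration. The second pattern is tested before the flag is set, so the line that matches the first pattern can never be reported itself. Note that awk -v interprets backslash escapes, so patterns containing backslashes would need them doubled.

```shell
# first_after R1 R2 FILE (hypothetical helper): print the number of the first
# line matching R2 that lies strictly below the first line matching R1.
first_after() {
    awk -v r1="$1" -v r2="$2" '
        found && $0 ~ r2 { print NR; exit }   # test r2 first: skips the r1 line itself
        !found && $0 ~ r1 { found = 1 }
    ' "$3"
}

# Sample from the question, with the nested items indented.
sample=$(mktemp)
printf '%s\n' "- - '787928'" "  - stuff" "- - '810790'" "  - more stuff" \
    "- - '787927'" "  - yet more stuff" "- - '828055'" "  - some more stuff" > "$sample"

first_after '^-.*787927' '^-' "$sample"   # prints 7
```

Passing the patterns as arguments keeps the awk program itself fixed, which avoids quoting problems when the patterns come from shell variables.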

For an input file that looks like this:
1 pat2
2 x
3 pat1
4 x
5 pat2
6 x
7 pat1
8 x
9 pat2
you could use sed as follows:
$ sed -n '/pat1/,${/pat2/{=;q;};}' infile
5
which works like this:
sed -n ' # suppress output with -n
/pat1/,$ { # for all lines from the first occurrence of "pat1" on...
/pat2/ { # if the line matches "pat2"
= # print line number
q # quit
}
}' infile
The above fails if the first occurrence of pat1 is on the same line as pat2:
1 pat2
2 x
3 pat1 pat2
4 x
5 pat2
6 x
7 pat1
8 x
9 pat2
would print 3. With GNU sed, we can use this instead:
$ sed -n '0,/pat1/!{/pat2/{=;q;};}' infile
5
sed -n ' # suppress output
0,/pat1/! { # for all lines after the first occurrence of "pat1"
/pat2/ { # if the line matches "pat2"
= # print line number
q # quit
}
}' infile
The 0 address is a GNU extension; using 1 instead would break if pat1 was on the first line.
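To see the difference between the two start addresses, here is a small sketch on made-up input where pat1 sits on the very first line. The 0,/pat1/ form assumes GNU sed; the variable names are only for illustration.

```shell
# Made-up sample: pat1 on line 1, pat2 on lines 2 and 6.
input=$(printf '%s\n' pat1 pat2 x pat1 x pat2)

# 1,/pat1/ : the end regex is only tested from line 2 on, so the range runs to line 4.
with_one=$(printf '%s\n' "$input" | sed -n '1,/pat1/!{/pat2/{=;q;};}')

# 0,/pat1/ (GNU sed only): the end regex may match on line 1, so the range ends there.
with_zero=$(printf '%s\n' "$input" | sed -n '0,/pat1/!{/pat2/{=;q;};}')

echo "1,/pat1/ -> $with_one"    # 6
echo "0,/pat1/ -> $with_zero"   # 2
```

With 1 as the start address the pat2 on line 2 is skipped, because the range does not close until the second pat1 on line 4.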

This might work for you (GNU sed):
sed -n '/^-.*787927.*/{:a;n;/^-/!ba;=;q}' file
On encountering a line that matches ^-.*787927.*, start a loop that replaces the current line with the next one until a line beginning with - is found, whereupon print the line number and quit.
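The same loop can be spread over several lines with a comment per step, which may be easier to follow. This is a sketch on a shortened copy of the sample (the temporary file and variable names are assumptions); comments are kept on their own lines so that they cannot be mistaken for part of a label.

```shell
sample=$(mktemp)
printf '%s\n' "- - '787928'" "  - stuff" "- - '787927'" "  - yet more stuff" \
    "- - '828055'" "  - some more stuff" > "$sample"

result=$(sed -n '
/^-.*787927/ {
    # loop start
    :a
    # overwrite the pattern space with the next input line
    n
    # not a match for the second pattern yet: go round again
    /^-/!ba
    # matched: print the current line number and stop
    =
    q
}' "$sample")

echo "$result"   # 5
```

Because n always moves to the next line before the second pattern is tested, the line that matched the first pattern is never reported itself.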

Related

How do I print every nth entry of the mth column, starting from a particular line of a file?

Consider the following data in a file file.txt:
$
$
$
FORCE 10 30 40
* 1 5 4
FORCE 11 20 22
* 2 3 0
FORCE 19 25 10
* 16 12 8
.
.
.
I want to print every 2nd element of the third column, starting from line 4, resulting in:
30
20
25
I have tried:
cat file.txt | sed 's/\|/ /' | awk 'NR%2==4 {print $3}'
However, this is not resulting in anything being printed and no errors generated either.
You might use awk checking that the row number > 3 and then check for an even row number with NR%2==0.
Note that you don't have to use cat
awk 'NR > 3 && NR%2==0 {
print $3
}' file.txt
Output
30
20
25
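If the start line, step and column are likely to change, the condition can be parameterised. A minimal sketch, where every_nth and its arguments are hypothetical names and the sample contains only the nine data lines shown in the question:

```shell
# every_nth START STEP COL FILE (hypothetical helper):
# print column COL of line START and of every STEP-th line after it.
every_nth() {
    awk -v start="$1" -v step="$2" -v col="$3" \
        'NR >= start && (NR - start) % step == 0 { print $col }' "$4"
}

data=$(mktemp)
printf '%s\n' '$' '$' '$' 'FORCE 10 30 40' '* 1 5 4' 'FORCE 11 20 22' \
    '* 2 3 0' 'FORCE 19 25 10' '* 16 12 8' > "$data"

every_nth 4 2 3 "$data"   # prints 30, 20 and 25 on separate lines
```

Measuring the offset from the start line, rather than testing NR%2 directly, also works when the start line is odd.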
Using sed
$ sed -En '4~2s/([^ \t]*[ \t]+){2}([^ \t]*).*/\2/p' input_file
30
20
25
I have tried:
cat file.txt | sed 's/\|/ /' | awk 'NR%2==4 {print $3}'
However, this is not resulting in anything being printed and no errors
generated either.
You do not need cat here, as sed can read the file on its own; in this case it would be sed 's/\|/ /' file.txt.
You should consider if you need that part at all, your sample input does not have pipe character at all, so it would do nothing to it. You might also drop that part if lines holding values you want to print do not have that character.
The output is empty because NR%2==4 never holds: the remainder of division by x is always smaller than x (in the particular case of %2 only two values are possible, 0 and 1).
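A quick way to convince yourself is to count how often the condition is true over a range of line numbers. This is a throwaway sketch; the range 1 to 1000 is arbitrary.

```shell
# Count the values of n in 1..1000 for which n % 2 == 4 holds.
hits=$(awk 'BEGIN { for (n = 1; n <= 1000; n++) if (n % 2 == 4) c++; print c + 0 }')
echo "$hits"   # 0
```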
This might work for you (GNU sed):
sed -nE '4~2s/^((\S+)\s*){3}.*/\2/p' file
Turn off implicit printing by setting the -n option and reduce back slashes in regexps by turning on -E.
From the fourth line and then every second line thereafter, capture the third column and print it.
N.B. When a group is repeated, its back-reference holds the text matched by the last repetition, so \2 in conjunction with {3} yields the third column.
Alternative:
sed -n '4,${s/^\(\(\S\+\)\s*\)\{3\}.*/\2/p;n}' file
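The back-reference behaviour that both commands rely on can be checked on a single line. The sketch below uses [^ ]+ instead of \S+ so that it does not depend on GNU-specific escapes; the variable name is only for illustration.

```shell
# The inner group is matched three times; \2 keeps what the third repetition matched.
third=$(printf 'FORCE 10 30 40\n' | sed -E 's/^(([^ ]+) *){3}.*/\2/')
echo "$third"   # 30
```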

How to offset the line numbering from within `less` (i.e., specify starting number)?

Given the following ...
$ less -N file.txt
1 first line
2 second line
3 third line
4 fourth line
file.txt (END)
... I'd like to do something like this:
$ less -N --STARTING-NUMBER=0 file.txt
0 first line
1 second line
2 third line
3 fourth line
file.txt (END)
In other words, I'd like to be able to specify which value the line numbering starts from.
Note that this is possible using nl:
$ nl -v 0 file.txt
0 first line
1 second line
2 third line
3 fourth line
But -N in less can be toggled on and off without leaving less, whereas if I pipe the above into less, the line numbers could not be toggled off.
If less has something like nl's -v option - or there were any other way to achieve the same - that would be awesome. But I don't see it in the less(1) man pages.
Here comes a quick patch
--- line.c.bak 2022-12-07 22:30:28
+++ line.c 2022-12-28 01:40:08
@@ -367,7 +367,7 @@
len = 0;
else
{
- linenumtoa(linenum, buf);
+ linenumtoa(linenum - 1, buf);
len = (int) strlen(buf);
}
for (i = 0; i < linenum_width - len; i++)
Or you could file a feature request at https://github.com/gwsw/less/issues.

awk: Compare two sets of numbers (generated by random and strict rules)

I have many files containing some fixed words and numbers:
The FIRST SET of numbers has a fixed length of 7 digits: the first 3 of them are a random-looking prefix (100, 200, 300 in the example, but it can be others) that we do not need; we are interested in the remaining 4 digits.
The SECOND SET of number/s is generated number based on the last 4 digits from the FIRST SET (xxx7777 = 7777; xxx0066 = 66). You can see that the SECOND SET can NOT have leading zeros, they are cut out already and this is a rule.
Input
first second third 1007777 fourth 7777
...
first second third 2008341 fourth 8341
...
first second third 3000005 fourth 5
...
...
first second third 2008341 fourth 8
...
first second third 2008341 fourth 341
I found other examples here showing how to find the lines of interest using grep, but I didn't find an awk example doing what I want; maybe the rule about leading zeros is what is giving me problems.
My attempt to find the wrong generations:
grep -Pr 'first second third' docs/test/*.txt | awk '{ if($4=$6) print $4 " " $6}'
7777 7777
8341 8341
5 5
8 8
341 341
The correct Output should look like this:
2008341 8
2008341 341
i.e. only the problem lines (those not generated correctly), along with the filename.
Thanks ! :)
$ awk '/first second third/ && (substr($4,4)+0 != $NF) {print FILENAME, $4, $NF}' file
file 2008341 8
file 2008341 341
Call it as:
awk '...' docs/test/*.txt
or:
find docs -name '*.txt' -exec awk '...' {} \;
or similar as you see fit.
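To see why the comparison works: substr($4,4) keeps characters 4 to 7 of the 7-digit number, and adding 0 forces a numeric value, which drops any leading zeros. A sketch on a cut-down copy of the sample input (the temporary file is an assumption, and FILENAME is left out so that the output does not depend on the temporary name):

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
first second third 1007777 fourth 7777
...
first second third 3000005 fourth 5
first second third 2008341 fourth 8
first second third 2008341 fourth 341
EOF

# "0005" + 0 is the number 5, so the third data line is accepted as correct.
mismatches=$(awk '/first second third/ && (substr($4, 4) + 0 != $NF) { print $4, $NF }' "$tmp")
printf '%s\n' "$mismatches"
```

This prints the two badly generated lines, 2008341 8 and 2008341 341.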
Use this GNU awk way, intended to be human-readable and maintainable:
$ awk '
/first second third/ {match($4, /[0-9]{4}$/, a); #1
a[0]=gensub(/^0+/, "", "g", a[0]); #2
if ($NF != a[0]) print} #3
' file
Output :
first second third 2008341 fourth 8
first second third 2008341 fourth 341
Explanations :
#1 take the last 4 digits of column 4 and store them in array a using match
#2 remove all leading zeros
#3 if the trimmed part differs from the last column, print the line

print whole variable contents if the number of lines are greater than N

How to print all lines if certain condition matches.
Example:
echo "$ip"
this is a sample line
another line
one more
last one
If this file has more than 3 lines then print the whole variable.
I have tried:
echo $ip| awk 'NR==4'
last one
echo $ip|awk 'NR>3{print}'
last one
echo $ip|awk 'NR==12{} {print}'
this is a sample line
another line
one more
last one
echo $ip| awk 'END{x=NR} x>4{print}'
Need to achieve this:
If this file has more than 3 lines then print the whole file. I can do this using wc and bash but need a one liner.
The right way to do this (no echo, no pipe, no loops, etc.):
$ awk -v ip="$ip" 'BEGIN{if (gsub(RS,"&",ip)>2) print ip}'
this is a sample line
another line
one more
last one
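This works because gsub returns the number of replacements it made, and replacing each newline with "&" (the matched text itself) leaves the string unchanged while counting the newlines. Below is a variant sketch with a configurable threshold. The helper name print_if_longer is hypothetical, and the text is passed through ENVIRON instead of -v on the assumption that this avoids escape processing and any awk that dislikes literal newlines in -v values.

```shell
# print_if_longer TEXT MIN_LINES (hypothetical helper):
# print TEXT only if it has more than MIN_LINES lines.
print_if_longer() {
    text="$1" awk -v min="$2" 'BEGIN {
        s = ENVIRON["text"]
        # gsub returns the number of replacements; "&" puts each newline back unchanged
        if (gsub(/\n/, "&", s) + 1 > min) print s
    }'
}

ip=$(printf 'this is a sample line\nanother line\none more\nlast one')
print_if_longer "$ip" 3   # prints all four lines
print_if_longer "$ip" 4   # prints nothing
```

The line count is taken as newlines plus one, which assumes the text is non-empty and has no trailing newline, as is the case for a value captured with $(...).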
You can use Awk as follows,
echo "$ip" | awk '{a[$0]; next}END{ if (NR>3) { for(i in a) print i }}'
one more
another line
this is a sample line
last one
you can also make the value 3 configurable from an awk variable,
echo "$ip" | awk -v count=3 '{a[$0]; next}END{ if (NR>count) { for(i in a) print i }}'
The idea is to store each line in the array with {a[$0]; next} as it is processed; by the time the END clause is reached, NR holds the number of lines in the string/file. The lines are printed if the condition holds, i.e. the number of lines is greater than 3 or whatever configurable value is used. Note that for (i in a) visits the keys in an unspecified order and keeps only one copy of duplicate lines.
And always remember to double-quote variables in bash so that the shell does not word-split them.
Using James Brown's useful comment below to preserve the order of lines, do
echo "$ip" | awk -v count=3 '{a[NR]=$0; next}END{if(NR>count)for(i=1;i<=NR;i++)print a[i]}'
this is a sample line
another line
one more
last one
Another in awk. First test files:
$ cat 3
1
2
3
$ cat 4
1
2
3
4
Code:
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 3 # look ma, no lines
[this line left intentionally blank. no wait!]
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 4
1
2
3
4
Explained:
NR<4 { # for the first 3 records
b=b (NR==1?"":ORS) $0 # buffer them to b with ORS delimiter
next # proceed to next record
}
b { # if buffer has records, ie. NR>=4
print b # output buffer
b="" # and reset it
}1 # print all records after that

Print every second consecutive field in two columns - awk

Assume the following file
#zvview.exe
#begin Present/3
77191.0000 189.320100 0 0 3 0111110 16 1
-8.072430+6-8.072430+6 77190 0 1 37111110 16 2
37 2 111110 16 3
8.115068+6 0.000000+0 8.500000+6 6.390560-2 9.000000+6 6.803440-1111110 16 4
9.500000+6 1.685009+0 1.000000+7 2.582780+0 1.050000+7 3.260540+0111110 16 5
37 2 111110 16 18
What I would like to do, is print in two columns, the fields after line 6. This can be done using NR. The tricky part is the following : Every second field, should go in one column as well as adding an E before the sign, so that the output file will look like this
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
From the output file you see that I want to keep in $6 only length($6)=10 characters.
How is it possible to do it in awk?
You can do it all in awk, but it is perhaps easier with the Unix toolset:
$ sed -n '6,7p' file | cut -c2-66 | tr ' ' '\n' | pr -2ats' '
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
Here is an awk-only solution for comparison:
$ awk 'NR>=6 && NR<=7{$6=substr($6,1,10);
for(i=1;i<=6;i+=2) {f[++c]=$i;s[c]=$(i+1)}}
END{for(i=1;i<=c;i++) print f[i],s[i]}' file
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
Perhaps a shorter version:
$ awk 'NR>=6 && NR<=7{$6=substr($6,1,10);
for(i=1;i<=6;i+=2) print $i FS $(i+1)}' file
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
to convert format to standard scientific notation, you can pipe the result to
sed or embed something similar in awk script (using gsub).
... | sed 's/[+-]/E&/g'
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
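The gsub variant mentioned above could look roughly like this. It is a sketch run on one data line from the question (with its leading blanks already stripped); the variable names pair and converted are assumptions.

```shell
converted=$(printf '%s\n' \
    '8.115068+6 0.000000+0 8.500000+6 6.390560-2 9.000000+6 6.803440-1111110 16 4' |
awk '{
    $6 = substr($6, 1, 10)                # drop the trailing record id glued to field 6
    for (i = 1; i <= 6; i += 2) {
        pair = $i FS $(i + 1)
        gsub(/[+-]/, "E&", pair)          # put an E in front of every sign
        print pair
    }
}')
printf '%s\n' "$converted"
```

Like the sed version, this puts an E in front of every sign, so it is only suitable when the mantissas themselves carry no leading minus sign.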
With GNU awk for FIELDWIDTHS:
$ cat tst.awk
BEGIN { FIELDWIDTHS="9 2 9 2 9 2 9 2 9 2 9 2" }
NR>5 && NR<8 {
for (i=1;i<NF;i+=4) {
print $i "E" $(i+1), $(i+2) "E" $(i+3)
}
}
$ awk -f tst.awk file
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
If you really want to get rid of the leading blanks then there's various ways to do it (simplest being gsub(/ /,"",$<field number>) on the relevant fields) but I left them in because the above allows your output to line up properly if/when your numbers start with a -, like they do on line 4 of your sample input.
If you don't have GNU awk, get it as you're missing a LOT of extremely useful functionality.
I tried to combine @karafka's answer with substr, so the following does the trick!
awk 'NR>=6 && NR<=7{$6=substr($6,1,10);for(i=1;i<=6;i+=2) print substr($i,1,8) "E" substr($i,9) FS substr($(i+1),1,8) "E" substr($(i+1),9)}' file
and the output is
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0