AWK, rename string only on n-th line after pattern match

Hi, I have a problem: I cannot rename only the first occurrence of a string, which is located on the 9th line after the matched pattern.
This is the input (the file contains 30k lines):
This is pattern
patternvalue=dom.value.5.row.2
design=12
face=x1-m
omit=11
mode=OFF
option=955
display=x1-11-OFF
type=2
name=8a9s7fa645sdf
resolution=0
prio=OK
number of pattern values:
pattern values
id=hex00.EA
name=4fda6sd4f
number of pattern values:
id=hex00.EF
name=as7e8w87e
This is pattern
patternvalue=dom.value.5.row.8
design=1
face=x1-n
omit=12
mode=OFF
option=95
display=x1-22-ON
type=2
name=8a9sad8f
resolution=0
prio=OK
number of pattern values:
pattern values
id=hex00.0A
name=dsf79
number of pattern values:
id=hex00.AA
name=777777s
number of pattern values:
id=hex00.BB
name=777777l
number of pattern values:
id=hex00.CC
name=777777m
I tried this, but it renames every occurrence of "name":
awk '/This is pattern/ && NR==10 ; sub(/name/,"patternname")1' num
https://stackoverflow.com/questions/51678717/print-mth-column-of-nth-line-after-a-match-if-found-in-a-file-using-awk
This is expected output:
This is pattern
patternvalue=dom.value.5.row.2
design=12
face=x1-m
omit=11
mode=OFF
option=955
display=x1-11-OFF
type=2
patternname=8a9s7fa645sdf
resolution=0
prio=OK
number of pattern values:
pattern values
id=hex00.EA
name=4fda6sd4f
number of pattern values:
id=hex00.EF
name=as7e8w87e
This is pattern
patternvalue=dom.value.5.row.8
design=1
face=x1-n
omit=12
mode=OFF
option=95
display=x1-22-ON
type=2
patternname=8a9sad8f
resolution=0
prio=OK
number of pattern values:
pattern values
id=hex00.0A
name=dsf79
number of pattern values:
id=hex00.AA
name=777777s
number of pattern values:
id=hex00.BB
name=777777l
number of pattern values:
id=hex00.CC
name=777777m
Thank you for any hints

Something like this should be ok:
awk '/This is pattern/{n=NR};NR==n+9{sub(/name/,"patternname")};1'
Some comments about your try.
You wrote
awk '/This is pattern/ && NR==10 ; sub(/name/,"patternname")1' num
awk commands follow the pattern
condition1{action1};condition2{action2};....
If the {action} part is missing, the default action is {print}.
If the condition part is missing, the default condition is 1 (=always true)
As a result, your awk script is equivalent to this:
awk '/This is pattern/ && NR==10{print};(sub(/name/,"patternname") 1){print}'
Moreover, the awk internal variable NR holds the number of the input line currently being processed.
The first part of your script, /This is pattern/ && NR==10{print}, prints the line only when This is pattern is found and NR (the line number) is 10, which never happens in your case.
The second part needs a closer look. There is no ; between sub(/name/,"patternname") and 1, so awk does not see two separate rules: it concatenates the return value of sub (0 or 1) with 1 and uses the resulting string (01 or 11) as the condition.
That condition is always true, so every line is printed once. And because sub runs on every line while the condition is evaluated, name is replaced with patternname on every line that contains it.
About my solution:
The first part, /This is pattern/{n=NR}, stores in a temporary variable n the line number NR at which This is pattern was found.
The second part, NR==n+9{sub(/name/,"patternname")}, compares NR (the line number being processed by awk) to n+9 (the line number of This is pattern plus 9 more lines). When this condition becomes true, name is replaced by patternname using sub, which is enclosed in {...} to make it the action part of the NR==n+9 condition.
The third part, 1, just prints every line (the condition 1 is always true, the action is missing, so the default action {print} is performed).
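The one-liner can be tried on a shortened sample. This is only a sketch, not part of the original answer: the function name rename_after_match, the offset of 2 lines and the sample data are made up for illustration. It also adds a guard (n &&), on the assumption that nothing should be renamed before the first match; without it, n is still empty on early lines, so NR==n+9 would already be true on line 9 of the file.

```shell
# Hypothetical helper: rename "name" on the line that comes `offset` lines
# after each "This is pattern" line. Reads stdin, writes stdout.
rename_after_match() {
  awk -v offset="$1" '
    /This is pattern/ { n = NR }          # remember where the match was
    n && NR == n + offset {               # only after a match has been seen
      sub(/name/, "patternname")
    }
    1                                     # print every line
  '
}

# Shortened sample: the first "name" line precedes any match and stays as is.
printf '%s\n' 'name=early' 'This is pattern' 'type=2' 'name=first' 'name=second' |
  rename_after_match 2
```

For the data in the question the call would be rename_after_match 9 < num.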

Related

Awk pattern matching on rows that have a value at specific column. No delimiter

I would like to search a file, using awk, to output rows that have a value commencing at a specific column number. e.g.
I am looking for 979719 starting at column number 10:
moobaaraa**979719**
moobaaraa123456
moo**979719**123456
moobaaraa**979719**
moobaaraa123456
As you can see, there are no delimiters. It is a raw data text file. I would like to output rows 1 and 4. Not row 3 which does contain the pattern but not at the desired column number.
awk '/979719$/' file
moobaaraa979719
moobaaraa979719
A simple sed approach.
$ cat file
moobaaraa979719
moobaaraa123456
moo979719123456
moobaaraa979719
moobaaraa123456
Just search for a pattern that ends with 979719 and print the line:
$ sed -n '/^.*979719$/p' file
moobaaraa979719
moobaaraa979719
This code works:
awk 'length($1) == 9' FS="979719" raw-text-file
This code sets 979719 as the field separator, and checks whether the first field has a length of 9 characters. Then prints the line (as default action).
awk 'substr($0,10,6) == 979719' file
You can drop the ,6 if you want to search from the 10th char to the end of each line.
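The substr test can be checked against the sample rows. A minimal sketch, assuming the column and the search string are passed in as awk variables; match_at_column, col and pat are names chosen here, not taken from the answers. Unlike the end-of-line regexes, it tests the position itself, so it would also match when more characters follow the string.

```shell
# Hypothetical helper: print lines where `pat` starts exactly at column `col`.
match_at_column() {
  awk -v col="$1" -v pat="$2" '
    # Appending "" forces a string comparison even when pat looks numeric.
    substr($0, col, length(pat)) == (pat "")
  '
}

# Rows 1 and 4 are printed; row 3 has the string at column 4 and is skipped.
printf '%s\n' moobaaraa979719 moobaaraa123456 moo979719123456 \
              moobaaraa979719 moobaaraa123456 |
  match_at_column 10 979719
```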

gawk to create first column based on part of second column

I have a two-column TSV into which I need to insert a new first column, using part of the value in column 2.
What I have:
fastq/D0110.L001_R1_001.fastq fastq/D0110.L001_R2_001.fastq
fastq/D0206.L001_R1_001.fastq fastq/D0206.L001_R2_001.fastq
fastq/D0208.L001_R1_001.fastq fastq/D0208.L001_R2_001.fastq
What I want:
D0110 fastq/D0110.L001_R1_001.fastq fastq/D0110.L001_R2_001.fastq
D0206 fastq/D0206.L001_R1_001.fastq fastq/D0206.L001_R2_001.fastq
D0208 fastq/D0208.L001_R1_001.fastq fastq/D0208.L001_R2_001.fastq
I want to pull everything between "fastq/" and the first period and print that as the new first column.
$ awk -F'[/.]' '{printf "%s\t%s\n",$2,$0}' file
D0110 fastq/D0110.L001_R1_001.fastq fastq/D0110.L001_R2_001.fastq
D0206 fastq/D0206.L001_R1_001.fastq fastq/D0206.L001_R2_001.fastq
D0208 fastq/D0208.L001_R1_001.fastq fastq/D0208.L001_R2_001.fastq
How it works
awk implicitly loops over all input lines.
-F'[/.]'
This tells awk to use any occurrence of / or . as a field separator. This means that, for your input, the string you are looking for will be the second field.
printf "%s\t%s\n",$2,$0
This tells awk to print the second field ($2), followed by a tab (\t), followed by the input line ($0), followed by a newline character (\n)
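To see why the wanted string ends up in $2, it can help to print the first few fields awk produces with this separator. This is just an illustrative sketch; show_fields is a made-up helper name, and the input is the first line of the sample file.

```shell
# Hypothetical helper: show the first three fields when / and . both split.
show_fields() {
  awk -F'[/.]' '{ for (i = 1; i <= 3; i++) printf "$%d = %s\n", i, $i }'
}

printf 'fastq/D0110.L001_R1_001.fastq\tfastq/D0110.L001_R2_001.fastq\n' |
  show_fields
```

The second line of output is $2 = D0110, the text between "fastq/" and the first period.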

Print lines containing the same second field for more than 3 times in a text file

Here is what I am doing.
The text file is comma-separated and has three fields,
and I want to extract all the lines whose second field occurs
more than three times in the file.
Text file (filename is "text"):
11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
6,keyword2,content5
6,keyword2,content5
7,keyword1,content4
8,keyword1,content2
1,keyword1,content2
My command is below. For each line, it runs cat on the whole text file inside awk, greps for the second field of that line, and counts the matching lines.
If the count is greater than 2, it prints the whole line.
The command:
awk -F "," '{ "cat text | grep "$2 " | wc -l" | getline var; if ( 2 < var ) print $0}' text
However, the output contains only the first three consecutive lines;
it does not also print the last three lines containing "keyword1", which occurs six times in the text.
Result:
11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
My expected result:
11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
7,keyword1,content4
8,keyword1,content2
1,keyword1,content2
Can somebody tell me what I am doing wrong?
It is relatively straightforward to make just two passes over the file. In the first pass, you count the number of occurrences of each value in column 2. In the second pass, you print out the rows where the value in column 2 occurs more than your threshold value of 3 times.
awk -F, 'FNR == NR { count[$2]++ }
FNR != NR { if (count[$2] > 3) print }' text text
The first line of code handles the first pass; it counts the occurrences of each different value of the second column.
The second line of code handles the second pass; if the value in column 2 was counted more than 3 times, print the whole line.
This doesn't work if the input is only available on a pipe rather than as a file (so you can't make two passes over the data). Then you have to work much harder.
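For piped input, one possible approach (a sketch, not from the answer above) is to buffer every line in memory while counting, then replay the buffer in the END block. The helper name filter_frequent and its threshold argument are made up here, and the approach assumes the whole input fits in memory.

```shell
# Hypothetical helper: print lines whose 2nd comma-separated field occurs
# more than `min` times. Single pass over stdin, so it works on a pipe.
filter_frequent() {
  awk -F, -v min="$1" '
    { line[NR] = $0; key[NR] = $2; count[$2]++ }   # buffer the line, count its key
    END {
      for (i = 1; i <= NR; i++)                    # replay in input order
        if (count[key[i]] > min) print line[i]
    }
  '
}

printf '%s\n' 11,keyword1,content1 4,keyword1,content3 5,keyword1,content2 \
              6,keyword2,content5 6,keyword2,content5 7,keyword1,content4 \
              8,keyword1,content2 1,keyword1,content2 |
  filter_frequent 3
```

With the sample data this prints the six keyword1 lines and drops the two keyword2 lines.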

Print rows whose last field is negative

The last column of my file contains both negative and positive numbers:
a, b, -1
c, d, 2
e, f, -3
I need to extract the lines whose last field contains a negative number. Currently, I am using the following:
awk '/-/{print}' in.csv>out.csv
The above fails if '-' appears in other columns. I wonder if there is a way to test the last field in each row to see if they are negative and then extract the line.
Just tell awk to do...
awk -F, '$NF < 0' file
This sets the field separator to the comma (it looks like this is what you need) and then checks if $NF is lower than 0. And what is $NF? The last field, since NF contains the number of fields and $i points to the field number i.
The line is then printed, because a True condition triggers the default awk action, consisting in printing the current record.
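A quick check with a '-' in another column shows that only the last field decides. This sketch uses a slightly more defensive variant than the answer: adding 0 to $NF forces a numeric comparison, on the assumption that the file might contain a header or other non-numeric last field, which could otherwise be compared as a string and slip through. The function name and the sample rows are made up for illustration.

```shell
# Hypothetical helper: print rows whose last comma-separated field is negative.
negative_last() {
  awk -F, '($NF + 0) < 0'    # + 0 turns non-numeric text into 0
}

# The header and 'a-b, b, 1' are skipped even though the latter contains '-'.
printf '%s\n' 'x, y, value' 'a-b, b, 1' 'a, b, -1' 'c, d, 2' 'e-f, f, -3' |
  negative_last
```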

Please explain this awk script

echo "45" | awk 'BEGIN{FS=""}{for (i=1;i<=NF;i++)x+=$i}END{print x}'
I want to know how this works. What specifically do awk's FS and NF do here?
FS is the field separator. Setting it to "" (the empty string) means that every single character will be a separate field. So in your case you've got two fields: 4, and 5.
NF is the number of fields in a given record. In your case, that's 2. So i ranges from 1 to 2, which means that $i takes the values 4 and 5.
So this AWK script iterates over the characters and prints their sum — in this case 9.
These are built-in variables. FS is the field separator; leaving it blank splits out each character. NF is the number of fields produced by splitting on FS, so in this case the number of characters, 2. So the script splits the input into characters ("4", "5"), iterates over them (2 of them) while adding up their values, and prints the result.
http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/
FS is the field separator. Normally fields are separated by whitespace, but when you set FS to the null string, each character of the input line is a separate field.
NF is the number of fields in the current input line. Since each character is a field, in this case it's the number of characters.
The for loop then iterates over each character on the line, adding it to x. So this is adding the value of each digit in input; for 45 it adds 4+5 and prints 9.
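Splitting on an empty FS is what gawk, mawk and some other implementations do, but POSIX leaves the behavior of an empty FS unspecified. A sketch of the same digit sum that does not rely on it walks the line with substr instead; digit_sum is a name made up for this example.

```shell
# Hypothetical helper: sum the digits of all input lines without FS="".
digit_sum() {
  awk '
    { for (i = 1; i <= length($0); i++) x += substr($0, i, 1) }  # one char at a time
    END { print x }
  '
}

echo "45" | digit_sum    # prints 9
```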