Please explain this awk script


echo "45" | awk 'BEGIN{FS=""}{for (i=1;i<=NF;i++)x+=$i}END{print x}'
I want to know how this works. What specifically do awk's FS and NF do here?

FS is the field separator. Setting it to "" (the empty string) means that every single character will be a separate field. So in your case you've got two fields: 4, and 5.
NF is the number of fields in a given record. In your case, that's 2. So i ranges from 1 to 2, which means that $i takes the values 4 and 5.
So this AWK script iterates over the characters and prints their sum — in this case 9.

These are built-in variables. FS is the field separator; setting it to the empty string splits the input into individual characters. NF is the number of fields produced by splitting on FS, so in this case it is the number of characters, 2. The script therefore splits the input into characters ("4", "5"), iterates over them while adding up their values, and prints the result.
http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/

FS is the field separator. Normally fields are separated by whitespace, but when you set FS to the null string, each character of the input line is a separate field.
NF is the number of fields in the current input line. Since each character is a field, in this case it's the number of characters.
The for loop then iterates over each character on the line, adding it to x. So this is adding the value of each digit in input; for 45 it adds 4+5 and prints 9.
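POSIX leaves the behaviour of an empty FS unspecified, so splitting into characters depends on the awk implementation (gawk and mawk do it). As a minimal sketch, the same digit sum can be computed with substr instead; digit_sum is a hypothetical helper name, not something from the question.

```shell
# digit_sum (hypothetical name): add up the digits found on the input lines
digit_sum() {
  awk '{
    for (i = 1; i <= length($0); i++)   # walk the line one character at a time
      x += substr($0, i, 1)             # a digit character converts to its numeric value
  }
  END { print x + 0 }'                  # + 0 prints 0 instead of an empty line when there is no input
}

echo "45" | digit_sum     # prints 9
echo "1234" | digit_sum   # prints 10
```

Like the original one-liner, this accumulates x across all input lines and prints a single total at the end.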

Related

awk/sed replace multiple newlines in the record except end of record

I have file where:
field delimiter is \x01
the record delimiter is \n
Some records contain multiple embedded newlines that I need to remove; however, I don't want to remove the legitimate newline at the end of each record. I have tried this with awk:
awk -F '\x01' 'NF < 87 {getline s; $0 = $0 s} 1' infile > outfile
But this only works when the record contains one embedded newline (in addition to the end-of-record newline). It does not work for multiple newlines.
Note: the record contains 87 fields.
What am I doing wrong here?
Example of file:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000
test^A00000000
Test^A^A^A^A
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000
test^A00000000
Test^A^A^A^A
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000
test^A12102120^A00000^A00^A^A
NOTE: Each record in this example contains 11 field separators (12 fields); field separator \x01; record separator \n
Expected result:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000test^A12102120^A00000^A00^A^A
Note: I need to preserve the field delimiter (\x01) and record delimiter (\n)
Thank you very much in advance for looking into this.
The file always contains 87 fields;
The field delimiter is '\x01', but when viewed in Linux it is displayed as '^A'
Some lines contain newlines - I need to remove them, but I don't want to remove the legitimate newlines at the end of each line.
The newline appears twice in the first and second records and once in the third record - these are the newlines I want to remove.
In the examples/expected results there are 11 delimiters "x01" represented as "^A",
I expect to have 3 records and not 6, i.e.:
First record:
test^A00000000 should be joined to the previous line
Test^A^A^A^A should be joined to the first line as well
forming one record:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
Second record
test^A00000000 should be joined to the previous line
Test^A^A^A^A should be joined to that previous line as well
forming one record:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
Third record:
test^A12102120^A00000^A00^A^A should be joined to the previous line
forming one record:
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000test^A12102120^A00000^A00^A^A
Note:
The example of awk - provided works when there is one unwanted newline in the record but not when there are multiple newlines
Thank you so very much. It works perfectly. Thank you for explaining it so well to me too.

This might work for you (GNU sed):
sed ':a;N;s/\x01/&/87;Ta;s/\n//g' file
Gather up lines until there are 87 separators, remove any newlines and print the result.

What's wrong with your attempt is that you concatenate two lines, print the result and move on to the next line. NF is then reset to the field count of that next line. As all your lines have fewer than 87 fields, the NF < 87 condition is useless; your script would work the same without it.
Try this awk script:
$ awk -F'\x01' -vn=87 -vi=0 '
{printf("%s", $0); i+=NF; if(i==n) {i=0; print "";} else i-=1;}' file
Here, we use the real \x01 field separator and the NF fields count. Variable i counts the number of already printed fields. We first print the current line without the trailing newline (printf("%s", $0)). Then we update our i fields counter. If it is equal to n we reset it and print a newline. Else we decrement it such that we do not count the last field of this line and the first of the next as 2 separate fields.
Demo with n=12 instead of 87 and your own input file (with \x01 field separators):
$ awk -F'\x01' -vn=12 -vi=0 '
{printf("%s", $0); i+=NF; if(i==n) {i=0; print "";} else i-=1;}' file |
sed 's/\x01/|/g'
PL|Nov-21|29-11-2021|0|00|00|0000000test|00000000 Test||||
PL|Nov-21|29-11-2021|0|00|00|0000000test|00000000 Test||||
SL|Nov-21|30-11-2021|B|0000|1234567|00000test|12102120|00000|00||
The sed command shows the result with the \x01 replaced by | for easier viewing.
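The same idea can also be sketched by counting separators instead of fields, which avoids the decrement step. This is only an illustration under assumptions: join_records is a hypothetical helper name, its first argument is the number of separators in a complete record (86 for 87 fields, 11 for the sample above), and the separator is passed in as a literal \001 byte rather than written as \x01 inside awk.

```shell
# join_records (hypothetical name): glue physical lines together until the
# pending record holds the requested number of \001 separators
join_records() {
  awk -v want="$1" -v sep="$(printf '\001')" '
  {
    buf = buf $0                          # append this physical line to the pending record
    n = split($0, parts, sep)             # n fields means n - 1 separators on this line
    if (n > 0) seen += n - 1              # an empty line contributes no separators
    if (seen >= want) {                   # record complete: print it and start over
      print buf
      buf = ""; seen = 0
    }
  }
  END { if (buf != "") print buf }'       # flush an incomplete trailing record, if any
}

# Two records with 3 separators each, the first one broken across two lines
printf 'a\001b\nc\001d\001e\nf\001g\001h\001i\n' | join_records 3 | tr '\001' '|'
```

With that input the two lines printed are a|bc|d|e and f|g|h|i. Like the answers above, it joins the pieces without inserting anything between them.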

AWK, rename string only on n-th line after pattern match

Hi, I have a problem: I want to rename only the first occurrence of a string, which sits on the 9th line after a matched pattern, but I cannot get it to work.
this is input (file containing 30k lines):
This is pattern
patternvalue=dom.value.5.row.2
design=12
face=x1-m
omit=11
mode=OFF
option=955
display=x1-11-OFF
type=2
name=8a9s7fa645sdf
resolution=0
prio=OK
number of pattern values:
pattern values
id=hex00.EA
name=4fda6sd4f
number of pattern values:
id=hex00.EF
name=as7e8w87e
This is pattern
patternvalue=dom.value.5.row.8
design=1
face=x1-n
omit=12
mode=OFF
option=95
display=x1-22-ON
type=2
name=8a9sad8f
resolution=0
prio=OK
number of pattern values:
pattern values
id=hex00.0A
name=dsf79
number of pattern values:
id=hex00.AA
name=777777s
number of pattern values:
id=hex00.BB
name=777777l
number of pattern values:
id=hex00.CC
name=777777m
I tried this, but it renames every occurrence of the string "name":
awk '/This is pattern/ && NR==10 ; sub(/name/,"patternname")1' num
https://stackoverflow.com/questions/51678717/print-mth-column-of-nth-line-after-a-match-if-found-in-a-file-using-awk
This is expected output:
This is pattern
patternvalue=dom.value.5.row.2
design=12
face=x1-m
omit=11
mode=OFF
option=955
display=x1-11-OFF
type=2
patternname=8a9s7fa645sdf
resolution=0
prio=OK
number of pattern values:
pattern values
id=hex00.EA
name=4fda6sd4f
number of pattern values:
id=hex00.EF
name=as7e8w87e
This is pattern
patternvalue=dom.value.5.row.8
design=1
face=x1-n
omit=12
mode=OFF
option=95
display=x1-22-ON
type=2
patternname=8a9sad8f
resolution=0
prio=OK
number of pattern values:
pattern values
id=hex00.0A
name=dsf79
number of pattern values:
id=hex00.AA
name=777777s
number of pattern values:
id=hex00.BB
name=777777l
number of pattern values:
id=hex00.CC
name=777777m
Thank you for any hints

Something like this should be ok:
awk '/This is pattern/{n=NR};NR==n+9{sub(/name/,"patternname")};1'
Some comments about your try.
You wrote
awk '/This is pattern/ && NR==10 ; sub(/name/,"patternname")1' num
awk commands follow the pattern
condition1{action1};condition2{action2};....
If the {action} part is missing, the default action is {print}.
If the condition part is missing, the default condition is 1 (=always true)
As a result, your awk script is equivalent to this:
awk '/This is pattern/ && NR==10{print};sub(/name/,"patternname"){print};1{print}'
Moreover, the awk built-in variable NR holds the number of the input line currently being processed.
As a result, the first part of your script, /This is pattern/ && NR==10{print}, prints the line only when This is pattern is found and NR (the line number) is 10, which never happens in your case.
The second part of your script, sub(/name/,"patternname"){print}, uses the function sub as the condition for printing the line.
So for every line being processed, sub tries to replace name with patternname. If the replacement succeeds, the line is printed.
The third part of your script, 1{print}, prints every line, since the condition is 1 (always true). Lines on which sub succeeded are therefore printed twice.
About my solution:
First part, /This is pattern/{n=NR}, stores in a temporary variable n the line number NR at which This is pattern was found.
Second part, NR==n+9{sub(/name/,"patternname")}, compares NR (the line number being processed by awk) to n+9 (the line number of This is pattern plus 9 more lines). When this condition becomes true, name is replaced by patternname using sub, which is enclosed in {...} to mark it as the action part of the NR==n+9 condition.
Third part, 1, prints every line (the condition 1 is always true and the action is missing, so the default action {print} is performed).
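One edge case worth knowing: before the first match n is unset and evaluates to 0, so NR==n+9 is also true on line 9 of the file even if the pattern has not been seen yet. A minimal sketch that sidesteps this stores the target line number instead of the match line. The names rename_after and target are hypothetical, and anchoring the substitution to ^name= is a small variation on the answer, not part of it.

```shell
# rename_after (hypothetical name): rename "name=" on the line that lies
# a given number of lines after each "This is pattern" line
rename_after() {
  awk -v offset="$1" '
  /This is pattern/ { target = NR + offset }          # remember which line to edit
  NR == target      { sub(/^name=/, "patternname=") } # unset target is 0, which NR never equals
  { print }'                                          # print every line, edited or not
}

printf '%s\n' 'name=a' 'This is pattern' 'x' 'name=b' 'name=c' | rename_after 2
```

In this small run only name=b, two lines after the match, becomes patternname=b. For the input in the question the offset would be 9.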

awk greater than why show string value?

I am using this command
awk '$1 > 3 {print $1}' file;
file :
String
2
4
5
6
7
String
The output is:
String
4
5
6
7
String
Why does the result not contain only the numbers, as below?
4
5
6
7

This happens because one side of the comparison is a string, so awk does a string comparison, and the numeric value of the character 'S' is greater than that of the character '3'.
$ printf "3: %d S: %d\n" \'3 \'S
3: 51 S: 83
Note: the ' before the arguments passed to printf are important, as they trigger the conversion to the numeric value in the underlying codeset:
If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote.
We write \' so that the ' is passed to printf, rather than being interpreted as syntax by the shell (a plain ' would open/close a string literal).
Returning to the question, to get the desired behaviour, you need to convert the first field to a number:
awk '+$1 > 3 { print $1 }' file
I am using the unary plus operator to convert the field to a number. Alternatively, some people prefer to simply add 0.

Taken from the awk user guide...
ftp://ftp.gnu.org/old-gnu/Manuals/gawk-3.0.3/html_chapter/gawk_8.html
When comparing operands of mixed types, numeric operands are converted
to strings using the value of CONVFMT. ... CONVFMT's default value is
"%.6g", which prints a value with at least six significant digits.
So, basically, a field that does not look like a number is compared as a string, and "String" happens to be greater than "3".
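The effect can be checked on the sample input. This is a small sketch: sample is a hypothetical helper that reproduces the file from the question, and the regular-expression test is an additional variant not mentioned in the answers. Note that +$1 also turns a field such as 5abc into 5, whereas the regular-expression variant only accepts fields made up entirely of digits.

```shell
# sample (hypothetical name): reproduce the input file from the question
sample() { printf '%s\n' String 2 4 5 6 7 String; }

# Unary plus forces a numeric comparison; "String" becomes 0 and is dropped
numeric_only=$(sample | awk '+$1 > 3 { print $1 }')

# Stricter variant: compare only fields that consist entirely of digits
digits_only=$(sample | awk '$1 ~ /^[0-9]+$/ && $1 > 3 { print $1 }')

echo "$numeric_only"
echo "$digits_only"
```

Both commands print 4, 5, 6 and 7, one per line.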

gawk to create first column based on part of second column

I have a 2-column TSV into which I need to insert a new first column, using part of the value in column 2.
What I have:
fastq/D0110.L001_R1_001.fastq fastq/D0110.L001_R2_001.fastq
fastq/D0206.L001_R1_001.fastq fastq/D0206.L001_R2_001.fastq
fastq/D0208.L001_R1_001.fastq fastq/D0208.L001_R2_001.fastq
What I want:
D0110 fastq/D0110.L001_R1_001.fastq fastq/D0110.L001_R2_001.fastq
D0206 fastq/D0206.L001_R1_001.fastq fastq/D0206.L001_R2_001.fastq
D0208 fastq/D0208.L001_R1_001.fastq fastq/D0208.L001_R2_001.fastq
I want to pull everything between "fastq/" and the first period and print that as the new first column.

$ awk -F'[/.]' '{printf "%s\t%s\n",$2,$0}' file
D0110 fastq/D0110.L001_R1_001.fastq fastq/D0110.L001_R2_001.fastq
D0206 fastq/D0206.L001_R1_001.fastq fastq/D0206.L001_R2_001.fastq
D0208 fastq/D0208.L001_R1_001.fastq fastq/D0208.L001_R2_001.fastq
How it works
awk implicitly loops over all input lines.
-F'[/.]'
This tells awk to use any occurrence of / or . as a field separator. This means that, for your input, the string you are looking for will be the second field.
printf "%s\t%s\n",$2,$0
This tells awk to print the second field ($2), followed by a tab (\t), followed by the input line ($0), followed by a newline character (\n)
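If the rest of the script still needs the tab-separated columns as $1 and $2, one possible variation keeps tab as the field separator and uses split on the first column instead. This is a sketch under that assumption; add_sample_column and parts are hypothetical names.

```shell
# add_sample_column (hypothetical name): prepend the text between "fastq/"
# and the first period of column 1 as a new first column
add_sample_column() {
  awk 'BEGIN { FS = OFS = "\t" }   # keep the TSV columns available as $1 and $2
  {
    split($1, parts, "[/.]")       # parts[1] is "fastq", parts[2] is the sample ID
    print parts[2], $0             # $0 is untouched, so its original tabs are preserved
  }'
}

printf 'fastq/D0110.L001_R1_001.fastq\tfastq/D0110.L001_R2_001.fastq\n' | add_sample_column
```

For that line the new first column is D0110, followed by the two original columns.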

How to get a field by counting the column ( number of character)

I have a logfile.txt and I want to refer to field $4, but by character position rather than by field number, because the fields are separated by spaces and field 2 ($2) may itself contain values separated by a space. I want to count lines, but I don't know how to specify $4 without running into problems when field 2 contains a space character.
here is my file:
KJKJJ1KLJKJKJ928482711 PIEJHHKIA 87166188177633 AJHHHH77760 00666667 876876800874 2014100898798789979879877770
KJKJJ1KLJKJKJ928482711 HKHG 81882776553868 HGHALJLKA700 00876763 216897879879 2014100898798789979879877770
KJKJJ1KLJKJKJ928482711 UUT UGGT 81762665356426 HGJHGHJG661557008 00778787 268767860704 2014100898798789979879877770
KJKJJ1KLJKJKJ9284827kj ARTH HGG 08276255534867 HGJHGHJG661557008 00876767 212668767684 2014100898798789979879877770
Here is the code:
awk 'END { OFS="\t"; for (k in c) print c[k],"\t"k,"\t"f[k] } { k = $4; c[k]++; f[k]=substr($0,137,8) }' logfile.txt
I want to count based on field $4, but the field has to be specified in the code by character position (substr($0, .., ..)).
The output should be:
1 20141008 AJHHHH77760
1 20141008 HGHALJLKA700
2 20141008 HGJHGHJG661557008

If your records are composed of fixed width fields you can use cut(1)
% cut -c1-22,23-42,43-62,... --output-delimiter=, file | sed 's/, */,/g' > file.csv
% awk -F, '{your_code}' file.csv
Write a range for each of your fixed-width fields in place of the ... ellipsis.
I have written ranges only for the first three.
If you don't want to bother with an intermediate file, just use a | pipe.
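The counting can also be done in awk alone by taking the key with substr, as the question suggests. The sketch below uses a made-up layout (ID in columns 1-6, a name that may contain spaces in columns 7-16, the key in columns 17-24), because the real column positions of logfile.txt are not recoverable from the sample shown. count_by_key is a hypothetical helper name.

```shell
# count_by_key (hypothetical name): count lines per key, where the key is
# taken by character position (assumed here: columns 17-24)
count_by_key() {
  awk '{
    k = substr($0, 17, 8)   # cut the key out by position, not by field number
    sub(/ +$/, "", k)       # drop the padding spaces of a short key
    c[k]++
  }
  END { for (k in c) print c[k] "\t" k }' "$@" | sort
}

printf '%s\n' \
  'AAA111UUT UGGT  HGJ00001' \
  'BBB222ARTH HGG  HGJ00001' \
  'CCC333HKHG      XYZ00002' | count_by_key
```

Spaces inside the name column do not shift the key here, since nothing is split on whitespace. The sort at the end only makes the output order predictable, because for (k in c) visits keys in an unspecified order.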
