Explain this awk command - awk

Please explain what exactly this awk command does:
awk '$0!~/^$/{print $0}'

It removes blank lines. The condition is $0 (the whole line) does not match !~ the regexp /^$/ (the beginning of the line immediately followed by the end of the line).
Similar to grep -v '^$'

It prints non-empty input lines. Note: "Empty" does not mean "blank", in this case.
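A small sketch of that empty-versus-blank distinction, assuming any POSIX awk. The helper names drop_empty and drop_blank are made up for this illustration and are not part of the original command.

```shell
# Hypothetical helpers, named here only for illustration.
drop_empty() { awk '$0!~/^$/{print $0}'; }   # the command from the question
drop_blank() { awk 'NF'; }                    # also drops whitespace-only lines

# Input: "a", an empty line, a line holding a single space, "b".
printf 'a\n\n \nb\n' | drop_empty   # the space-only line survives
printf 'a\n\n \nb\n' | drop_blank   # only a and b survive
```

The second helper relies on the default field splitting: a line made only of whitespace has zero fields, so NF is false for it.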

Your example could be rewritten as simply:
awk '!/^$/'
or
sed '/^$/d'

Like Ben Jackson and the others said, it removes completely empty lines: not the ones containing one or more whitespace characters, only the zero-character-long ones. We will never know if this was the intended behaviour.
I'd like to remark that the code is at least redundant, if not triply redundant, depending on what it's used for.
What it does is print the input line to the output if the input line is not the empty line.
Since awk's default behaviour is to print the input line when a condition with no following action block is met, this would suffice:
awk '$0!~/^$/'
or, even shorter:
awk '$0!=""'
If you could be sure that no line evaluates to zero (for example, a line containing just 0), even
awk '$0'
could do the trick.
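To make the caveat about the shortest form concrete, here is a minimal comparison on made-up input, assuming a POSIX awk:

```shell
# Input lines: "x", "0", an empty line, "y".
printf 'x\n0\n\ny\n' | awk '$0'        # the line "0" counts as false and is lost
printf 'x\n0\n\ny\n' | awk '$0!=""'    # only the empty line is removed
```

The first command silently drops the line that looks like the number zero, which is why the explicit string comparison is the safer choice.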

Make it readable first...
echo '$0!~/^$/{print $0}' | a2p
==>
$, = ' ';
$\ = "\n";
while (<>) {
chomp;
if ($_ !~ /^$/) {
print $_;
}
}
And then interpret it. In this case: don't print empty lines.

Related

awk command works, but not in openwrt's awk

Works here: awk.js.org/
but not in openwrt's awk, which returns the error message:
awk: bad regex '^(server=|address=)[': Missing ']'
Hello everyone!
I'm trying to use an awk command I wrote which is:
'!/^(server=|address=)[/][[:alnum:]][[:alnum:]-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}'
Which counts invalid lines in a dns blocklist (oisd in this case):
Input would be eg:
server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
server=/0-29.com/
server=/0-day.us/
server=/0.0.0remote.cryptopool.eu/
server=/0.0mail6.xmrminingpro.com/
server=/0.0xun.cryptopool.space/
Output for this should be "2" since there are two lines that don't match the criteria (correctly formed address, comments, or blank lines).
I've tried formatting the command every which way with [], but can't find anything that works. Does anyone have an idea what format/syntax/option needs adjusting?
Thanks!
To portably include - in a bracket expression it has to be the first or last character, otherwise it means a range, and \s is shorthand for [[:space:]] in only some awks. This will work in any POSIX awk:
$ awk '!/^(server=|address=)[/][[:alnum:]][[:alnum:].-]+([/]|[/]#)$|^#|^[[:space:]]*$/ {count++}; END {print count+0}' file
2
Per @tripleee's comment below, if your awk is broken such that a / inside a bracket expression isn't treated as literal, then you may need this instead:
$ awk '!/^(server=|address=)\/[[:alnum:]][[:alnum:].-]+(\/|\/#)$|^#|^[[:space:]]*$/ {count++}; END {print count+0}' file
2
but get a new awk, e.g. GNU awk, as who knows what other surprises the one you're using may have in store for you!
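A reduced test case for the hyphen rule, assuming a POSIX awk. The helper name match_sep and the sample strings are invented for the illustration.

```shell
# Hypothetical helper: keep lines of the form a<sep>b where <sep> is "." or "-".
# The "-" is placed last in the bracket expression so it is taken literally.
match_sep() { awk '/^a[.-]b$/'; }

printf 'a-b\na.b\naxb\n' | match_sep   # axb is rejected
```

Running a tiny pattern like this against the target awk first can show whether the failure comes from the bracket expression or from another part of the regex.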
'!/^(server=|address=)[/][[:alnum:]][[:alnum:]-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}'
- has a special meaning inside [ and ]: it denotes a range, e.g. [A-Z] means an uppercase ASCII letter. Use a \ escape sequence to make it a literal dash. Let the content of file.txt be
server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
server=/0-29.com/
server=/0-day.us/
server=/0.0.0remote.cryptopool.eu/
server=/0.0mail6.xmrminingpro.com/
server=/0.0xun.cryptopool.space/
then
awk '!/^(server=|address=)[/][[:alnum:]][[:alnum:]\-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}' file.txt
gives output
2
You might also consider replacing \s with [[:space:]] in order to maintain consistency.
(tested in GNU Awk 5.0.1)

How to extract (First match)text between two words

I have a file having the following structure
destination list
move from station d-435-435 to point place1
move from station d-435-435 to point place2
move from mainpoint
I want to extract the word "d-435-435" (only the first match; the value need not always be the same) between the words "from station" and "to point".
How can I achieve this?
What I have tried so far?
id=$(sed 's/.*from station \(.*\) to.*/\1/' input.txt)
But this returns the following value: destination list d-435-435 move from mainpoint
1st solution: With your shown samples, please try the following GNU awk code. It uses awk's match function with the regex from station\s+\S+\s+to point to find the requested value, then removes from station\s+ and \s+to point from the matched text and prints what remains.
awk '
match($0,/from station\s+\S+\s+to point/){
val=substr($0,RSTART,RLENGTH)
gsub(/from station\s+|\s+to point/,"",val)
print val
exit
}
' Input_file
2nd solution: Using GNU grep, please try the following. The -o option prints only the matched portion, -P enables PCRE regexes, and -m1 stops after the first matching line. The pattern matches the string from station followed by space(s); \K then makes grep forget everything matched before it (since we don't need that in the output). Next it matches \S+ (non-space characters), followed by a positive lookahead for space(s) and to point, which checks that the text is present without printing it.
grep -oP -m1 'from station\s+\K\S+(?=\s+to point)' Input_file
If GNU sed is available, how about:
id=$(sed -nE '0,/from station.*to/ s/.*from station (.*) to.*/\1/p' input.txt)
The -n option suppresses printing unless the substitution succeeds (the p flag then prints the result).
The address 0,/pattern/ selects the range of lines that ends at the first line matching the pattern, so the substitution is not applied to any later line.
The 0 address is a GNU sed extension which allows the pattern to match on the 1st line as well.
With awk you can test the fields before and after field $4, where d-435-435 is, then print that field for the first match only and stop with exit after the print statement:
awk '$2=="from" && $3=="station" && $5=="to" && $6=="point" {print $4; exit}' file
d-435-435
or using GNU awk for the 3rd arg to match():
awk 'match($0,/from station\s+(.*)\s+to point/,a){print a[1];exit}' file
d-435-435
The regexp contains a parenthesized group, so the array element a[1] contains the portion of the string between from station followed by space(s) (\s+) and space(s) (\s+) followed by to point.
This might work for you (GNU sed):
sed -nE '/.*station (\S+) to point.*/{s//\1/;H;x;/\n(\S+)\n.*\1/{s/\n\S+$//;x;d};x;p}' file
Turn off implicit printing and turn on extended regexps with the command line options -nE.
If a line matches the required criteria, extract the required string, append a copy to the hold space, check if the match has already been seen and if not print it. If the match has been seen, remove it from the hold space.
Otherwise, do not print anything.
This should work in any sed:
sed -e '/.*from station \([^ ]*\) to .*/!d' -e 's//\1/' -e q file
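For awks that lack \s, \S and the three-argument match(), the same extraction can be sketched with plain string functions. This is only one possible approach, assuming a POSIX awk. The function name first_station is hypothetical, and the markers are matched as fixed strings with single spaces.

```shell
# Hypothetical helper: print the text between "from station " and " to point"
# on the first line that contains both, then stop reading.
first_station() {
  awk '{
    s = index($0, "from station ")            # position of the opening marker, 0 if absent
    if (!s) next
    rest = substr($0, s + length("from station "))
    e = index(rest, " to point")              # position of the closing marker in the remainder
    if (!e) next
    print substr(rest, 1, e - 1)
    exit
  }' "$@"
}

printf '%s\n' 'destination list' \
  'move from station d-435-435 to point place1' \
  'move from station d-435-435 to point place2' \
  'move from mainpoint' | first_station
```

Because index() does no regex matching, characters such as . or - in the station name need no escaping.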

Separate single line data into multiple lines via regex with awk?

I have a 40 MB single-line file with no fixed width and no delimiting character. But each record starts with a P followed by either a P or an S and then a number. So it might be like:
PP5
-or-
PS5
Or PP0 , etc.
What's the best way to separate this out?
$ echo PP5xxxPS5yyyyPP0zzz | awk -F'P[PS][0-9]' -v OFS='\n' '{$1=$1}1'
xxx
yyyy
zzz
Since the input starts with the delimiter there is a blank first line, which can be eliminated if important.
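One way to eliminate that blank first line, sketched under the assumption that the input always begins with a record marker, is to print the fields from the second one onward. The helper name split_fields is invented for the example.

```shell
# Hypothetical helper: split on the record marker and skip the empty first field.
# Assumes the input starts with a marker; otherwise the leading text would be lost.
split_fields() {
  awk -F'P[PS][0-9]' '{ for (i = 2; i <= NF; i++) print $i }'
}

echo PP5xxxPS5yyyyPP0zzz | split_fields
```

A regular expression as the field separator is standard awk, so this variant should not need GNU extensions.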
If you want to preserve the delimiters, it is perhaps easier with sed:
$ echo PP5xxxPS5yyyyPP0zzz | sed 's/P[PS][0-9]/\n&/g'
PP5xxx
PS5yyyy
PP0zzz
Borrowing @karakfa's sample input, this might be what you want (using GNU awk for multi-char RS and RT):
$ echo 'PP5xxxPS5yyyyPP0zzz' | awk -v RS='P[PS][0-9]|\n' 'NR>1{print pRT $0} {pRT=RT}'
PP5xxx
PS5yyyy
PP0zzz
The differences between that gawk solution and the sed solution @karakfa suggested are:
The sed solution will print a blank line at the start of the output while the above won't, and
The sed solution will read the whole input line into memory at once while the above will only read one RS-separated block into memory at a time. That would only matter if your input was too huge to fit in memory all at once.
The sed script is portable to any version of sed that allows \n in the replacement text to mean "newline" and is easily modified to use an escaped literal newline in others while the above requires GNU awk.
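The escaped-literal-newline variant mentioned in the last point could look like the sketch below. The helper name split_keep is made up; the backslash must be the last character on its line.

```shell
# Hypothetical helper: put each record marker at the start of its own line.
# A backslash followed by a real newline in the replacement inserts a newline.
split_keep() {
  sed 's/P[PS][0-9]/\
&/g'
}

echo PP5xxxPS5yyyyPP0zzz | split_keep
```

As with the \n form, the output starts with a blank line because the input begins with a marker.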
The line begins with P, then P/S, then a number (#). Where one record begins is where the previous one ends, so why not use a fixed RS instead of a regex one? Maybe:
{mawk/mawk2/gawk} 'BEGIN { FS = "^$" ; RS = "\nP" ;
} FNR==1 { sub(/^P/, "") } { print "P" $0 } '
Let RS take the P off, and pad it back in print. Either print+next or single sub() for 1st row case. I prefer a condition that only runs once for FNR==1 than the opposite requiring FNR > 1.
Yes last line technically won't get split by RS. And that's one of awk's known weaknesses - final line will print with ORS the same, with or without a RS at EOF.
I wrote it this way to allow for variants that don't have RT (basically everyone else). RT makes life easy.

Awk - Grep - Match the exact string in a file

I have a file that looks like this
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free
I need to match the values in the fourth column exactly as they are written. So if I am searching for "Super" I need it to return the line with "Super" only.
ON,111111,TEN000812,Super,7483747483,767,Free
Likewise, if I'm looking for "Super Man" I need that exact line returned.
ON,454644,FRED84848,Super Man,65757,555,Free
I have tried using grep, but grep will match all instances that contain Super. So if I do this:
grep -i "Super" file.txt
It returns all lines, because they all contain "Super"
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free
I have also tried with awk, and I believe I'm close, but when I do:
awk '$4==Super' file.txt
I still get output like this:
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
I have been at this for hours, and any help would be greatly appreciated at this point.
You were very close: just set the field delimiter to a comma in your solution and you are all set.
awk 'BEGIN{FS=","} $4=="Super"' Input_file
One more thing about OP's attempt: when comparing the 4th field with a string value, the string should be wrapped in double quotes (").
Or, in case you want to pass the value to compare as an awk variable, try the following.
awk -v value="Super" 'BEGIN{FS=","} $4==value' Input_file
You are quite close actually, you can try :
awk -F, '$4=="Super" {print}' file.txt
I find this form easier to grasp. Slightly longer than @RavinderSingh13's, though.
-F is the field separator, in this case comma
Next you have a condition followed by action
The condition checks whether the fourth field is exactly the string Super
If it is, the whole line is printed
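A quick check of the exact-match behaviour against the sample data from the question, including the value that contains a space. The wrapper name find_exact is hypothetical.

```shell
# Hypothetical helper: print lines whose 4th comma-separated field equals $1 exactly.
find_exact() {
  value=$1; shift
  awk -F, -v value="$value" '$4 == value' "$@"
}

data='ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free'

printf '%s\n' "$data" | find_exact 'Super'       # only the first record
printf '%s\n' "$data" | find_exact 'Super Man'   # only the third record
```

Passing the value with -v keeps it out of the awk program text, so the shell quoting stays simple even when the value contains spaces.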

Using SED/AWK to replace letters after a certain position

I have a file with words (1 word per line). I need to censor all letters in the word, except the first five, with a *.
Ex.
Authority -> Autho****
I'm not very sure how to do this.
If you are lucky, all you need is
sed 's/./*/6g' file
When I originally posted this, I believed this to be reasonably portable; but as per @ghoti's comment, it is not.
Perl to the rescue:
perl -pe 'substr($_, 5) =~ s/./*/g' -- file
-p reads the input line by line and prints each line after processing
substr returns a substring of the given string starting at the given position.
s/./*/g replaces any character with an asterisk. The g means the substitution will happen as many times as possible, not just once, so all the characters will be replaced.
In some versions of sed, you can specify which substitution should happen by appending a number to the operation:
sed -e 's/./*/g6'
This will replace all (again, because of g) characters, starting from the 6th position.
Here's a portable solution for sed:
$ echo abcdefghi | sed -e 's/\(.\{5\}\)./\1*/;:x' -e 's/\*[a-z]/**/;t x'
abcde****
Here's how it works:
's/\(.\{5\}\)./\1*/' - preserve the first five characters, replacing the 6th with an asterisk.
':x' - set a "label", which we can branch back to later.
's/\*[a-z]/**/' - substitute the letter following an asterisk with an asterisk.
't x' - if the last substitution succeeded, jump back to the label "x".
This works equally well in GNU and BSD sed.
Of course, adjust the regexes to suit.
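As one example of such an adjustment, the [a-z] class can be swapped for [^*] so that uppercase letters and digits are censored too. This is a sketch, not the answer's own command. The function name censor is invented, and the input is assumed to contain no literal asterisks.

```shell
# Hypothetical helper: keep the first five characters, mask everything after them.
# Assumes the words themselves contain no "*" characters.
censor() {
  sed -e 's/\(.\{5\}\)./\1*/' -e ':x' -e 's/\*[^*]/**/' -e 't x'
}

echo Authority | censor
echo Author1TY | censor
```

Words of five characters or fewer pass through unchanged, because the first substitution needs a sixth character to match.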
The following awk solutions may help you as well.
Solution 1st: awk solution with substr and gensub.
awk '{print substr($0,1,5) gensub(/./,"*","g",substr($0,6))}' Input_file
Solution 2nd:
awk 'NF{len=length($0);if(len>5){i=6;while(i<=len){val=val?val "*":"*";i++};print substr($0,1,5) val};val=i=""}' Input_file
Autho****
EDIT: Adding a non-one liner form of solution too now. Adding explanation with it too now.
awk '
NF{ ##Checking if a line is NON-empty.
len=length($0); ##Taking length of the current line into a variable called len here.
if(len>5){ ##Checking if length of current line is greater than 5 as per OP request. If yes then do following.
i=6; ##creating variable named i whose value is 6 here.
while(i<=len){ ##starting a while loop here which runs from the value of variable i up to the length of the current line.
val=val?val "*":"*"; ##creating variable named val here whose value will be concatenated to its own value, it will add * to its value each time.
i++ ##incrementing variable named i value with 1 each time.
};
print substr($0,1,5) val ##printing value of substring from 1st letter to 5th letter and then printing value of variable val here too.
};
val=i="" ##Nullifying values of variable val and i here too.
}
' Input_file ##Mentioning Input_file name here.
Personally I'd just use sed for this (see @tripleee's answer) but if you want to do it in awk it'd be:
$ awk '{t=substr($0,1,5); gsub(/./,"*"); print t substr($0,6)}' file
Autho****
or with GNU awk for gensub():
$ awk '{print substr($0,1,5) gensub(/./,"*","g",substr($0,6))}' file
Autho****
It is also possible and quite straightforward with sed:
sed 's/./\*/6;:loop;s/\*[^\*]/\**/;/\*[^\*]/b loop' file_to_censor.txt
output:
explanation:
s/./\*/6 #replace the 6th character of the chain by *
:loop #define a label for the goto
s/\*[^\*]/\**/ #replace * followed by non * char by **
/\*[^\*]/b loop #then loop until it does not exist a * followed by a non * char
Here is a pretty straightforward sed solution (that does not require GNU sed):
sed -e :a -e 's/^\(.....\**\)[^*]/\1*/;ta' filename