Regex match last three integers and one character before that integers - awk

I have been trying this for a long time, but my search is failing. I have the test data below:
mhie0104:x:104:600:Martinescu Horia:/home/scs/gr911/mhie0104:/bin/bash
mlie0105:x:105:600:Martinescu Laurentiu:/home/scs/gr911/mlie0105:/bin/bash
mmie0106:x:106:600:Martinescu Marius:/home/scs/gr911/mmie0106:/bin/bash
mnie0107:x:107:600:Martinescu Nicolae:/home/scs/gr911/mnie0107:/bin/bash
mpiel110:x:110:600:Malinescu Paul:/home/scs/gr911/mpie110:/bin/bash
I am trying to find the users whose names end in exactly three digits. This is what I did:
awk -F: '$1 ~ /*[a-z]//d{3}/'
My understanding of the above regex is:
"*" at the beginning should match any characters
[a-z] should match any letter just before the digits
Finally, three digits
I also tried the variation below:
awk -F: '$1 ~ /*?//d{3}/'
So what I need from the above test data is:
mpiel110:x:110:600:Malinescu Paul:/home/scs/gr911/mpie110:/bin/bash

1st solution: If you only care about the last 4 characters of the 1st field, where the 4th character from the end is NOT a digit and the last 3 are digits, then you can try the following code.
awk -F':' '$1 ~ /[^0-9][0-9]{3}$/' Input_file
Explanation:
Set the field separator to : for every line of Input_file.
Then test the 1st field against /[^0-9][0-9]{3}$/: if the 4th character from the end is anything other than a digit and the last 3 characters are digits, print that line.
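As a quick check of the first solution, here is a minimal sketch that feeds two of the question's lines through the same test. The three digits are spelled out as [0-9][0-9][0-9] on the assumption that your awk might not support interval expressions such as {3}; with an awk that does, the original pattern should behave the same way.

```shell
# Two sample lines taken from the question.
data='mhie0104:x:104:600:Martinescu Horia:/home/scs/gr911/mhie0104:/bin/bash
mpiel110:x:110:600:Malinescu Paul:/home/scs/gr911/mpie110:/bin/bash'

# Print the user name when field 1 ends in a non-digit followed by 3 digits.
matched=$(printf '%s\n' "$data" |
    awk -F: '$1 ~ /[^0-9][0-9][0-9][0-9]$/ {print $1}')

printf '%s\n' "$matched"
```

mhie0104 is rejected because the character before its last three digits is itself a digit.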
2nd solution: In case you want to check that none of the characters of the 1st field (everything except its last 3 characters) is a digit, and that the last 3 characters are all digits, then try the following code.
awk -F':' '
substr($1,1,length($1)-3)!~/[0-9]/ && substr($1,length($1)-2)~/^[0-9]{3}$/
' Input_file
Explanation:
First, set the field separator to : for this awk program.
The condition substr($1,1,length($1)-3)!~/[0-9]/ uses awk's substr function to check that the 1st field, apart from its last 3 characters, contains no digit.
The condition substr($1,length($1)-2)~/^[0-9]{3}$/ checks that the last 3 characters are exactly 3 digits. The substring is compared as a string, so a suffix with leading zeros such as 007 still counts as 3 digits.
If both conditions are TRUE then the line is printed.
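To see how this stricter check behaves on edge cases, here is a small sketch. The names abc007 and ab1c234 are made up for illustration and are not in the question's data, and the variables head and tail are my own naming. The last three characters are kept as a string so that leading zeros survive, and [0-9][0-9][0-9] is spelled out in case the awk in use lacks {3}.

```shell
# Hypothetical user names (only the last two come from the question).
names='abc007
ab1c234
mpiel110
mhie0104'

strict=$(printf '%s\n' "$names" | awk '
{
    head = substr($0, 1, length($0) - 3)   # everything except the last 3 characters
    tail = substr($0, length($0) - 2)      # the last 3 characters, kept as a string
    # Keep the name only if head has no digit and tail is exactly 3 digits.
    if (head !~ /[0-9]/ && tail ~ /^[0-9][0-9][0-9]$/) print
}')

printf '%s\n' "$strict"
```

Only abc007 and mpiel110 pass: the other two names contain a digit before their last three characters.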

You can't use \d here: it is Perl-style regex notation, which awk does not support. Use [0-9] instead.
Solution:
$ awk -F: '$1 ~ /[a-zA-Z][0-9]{3}$/' file
mpiel110:x:110:600:Malinescu Paul:/home/scs/gr911/mpie110:/bin/bash

You can use negative lookbehind in perl
$ perl -F: -ne ' print if $F[0]=~/(?<!\d)\d{3}$/ ' gameiswar.txt
mpiel110:x:110:600:Malinescu Paul:/home/scs/gr911/mpie110:/bin/bash
$

For this particular task, sed might be used as well:
sed '/^[^0-9]*[0-9]\{3\}:/!d' file

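I am not sure whether a username can consist only of digits, but since the username is the first field and : is the delimiter, you can match from the start of the line up to the first : without setting a field separator.
Here, ([^:]*[^0-9])? optionally matches any run of characters other than : followed by a single character other than 0-9: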
awk '/^([^:]*[^0-9])?[0-9]{3}:/' file
If there has to be a leading char a-z
awk '/^[^:]*[a-z][0-9]{3}:/' file
Output
mpiel110:x:110:600:Malinescu Paul:/home/scs/gr911/mpie110:/bin/bash
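The difference between the two patterns above only shows up with an all-digit user name. The following sketch uses three made-up passwd-style lines (not from the question) and spells {3} out as [0-9][0-9][0-9], assuming an awk without interval expressions.

```shell
# Hypothetical lines: an all-digit name, a 4-digit suffix, a 3-digit suffix.
data='123:x:1:1::/home/a:/bin/sh
ab1234:x:2:2::/home/b:/bin/sh
abc567:x:3:3::/home/c:/bin/sh'

# Optional prefix: a name made of exactly three digits is accepted too.
optional=$(printf '%s\n' "$data" |
    awk -F: '/^([^:]*[^0-9])?[0-9][0-9][0-9]:/ {print $1}')

# Required letter: a lowercase letter must precede the three digits.
required=$(printf '%s\n' "$data" |
    awk -F: '/^[^:]*[a-z][0-9][0-9][0-9]:/ {print $1}')

printf 'optional prefix: %s\n' $optional
printf 'required letter: %s\n' $required
```

Here -F: is only used to print the name; the matching itself is done against the whole line.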

mawk '!_<NF' FS='^[^:]*[a-z][0-9][0-9][0-9]:'
— or —
gawk '!_<NF' FS='^[^:]*[a-z][0-9]{3}:'
mpiel110:x:110:600:Malinescu Paul:/home/scs/gr911/mpie110:/bin/bash

Related

awk command works, but not in openwrt's awk

Works here: awk.js.org
but not in openwrt's awk, which returns the error message:
awk: bad regex '^(server=|address=)[': Missing ']'
Hello everyone!
I'm trying to use an awk command I wrote which is:
'!/^(server=|address=)[/][[:alnum:]][[:alnum:]-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}'
Which counts invalid lines in a dns blocklist (oisd in this case):
Input would be eg:
server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
server=/0-29.com/
server=/0-day.us/
server=/0.0.0remote.cryptopool.eu/
server=/0.0mail6.xmrminingpro.com/
server=/0.0xun.cryptopool.space/
Output for this should be "2" since there are two lines that don't match the criteria (correctly formed address, comments, or blank lines).
I've tried formatting the command every which way with [], but can't find anything that works. Does anyone have an idea what format/syntax/option needs adjusting?
Thanks!
To portably include - in a bracket expression it has to be the first or last character, otherwise it means a range, and \s is shorthand for [[:space:]] in only some awks. This will work in any POSIX awk:
$ awk '!/^(server=|address=)[/][[:alnum:]][[:alnum:].-]+([/]|[/]#)$|^#|^[[:space:]]*$/ {count++}; END {print count+0}' file
2
Per @tripleee's comment below, if your awk is broken such that a / inside a bracket expression isn't treated as literal then you may need this instead:
$ awk '!/^(server=|address=)\/[[:alnum:]][[:alnum:].-]+(\/|\/#)$|^#|^[[:space:]]*$/ {count++}; END {print count+0}' file
2
but get a new awk, e.g. GNU awk, as who knows what other surprises the one you're using may have in store for you!
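For anyone who wants to try this without a blocklist file, here is a minimal self-contained sketch of the same count. The comment line and the blank line in the sample are my additions to exercise the ^# and blank-line alternatives; the rest comes from the question. It uses the \/ form, on the assumption that this is the safest spelling across awks.

```shell
# Sample blocklist: 2 malformed lines, 2 valid lines, a comment, a blank line.
blocklist='server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
# a comment

server=/0-day.us/'

# Count lines that are neither well-formed entries, comments, nor blank.
# The - sits last in the bracket expression, so it is a literal dash.
invalid=$(printf '%s\n' "$blocklist" |
    awk '!/^(server=|address=)\/[[:alnum:]][[:alnum:].-]+(\/|\/#)$|^#|^[[:space:]]*$/ {count++}
         END {print count+0}')

printf '%s\n' "$invalid"
```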
'!/^(server=|address=)[/][[:alnum:]][[:alnum:]-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}'
- has a special meaning inside [ and ]: it denotes a range, e.g. [A-Z] means any uppercase ASCII letter. Use the \ escape sequence to make it a literal dash. Let the content of file.txt be
server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
server=/0-29.com/
server=/0-day.us/
server=/0.0.0remote.cryptopool.eu/
server=/0.0mail6.xmrminingpro.com/
server=/0.0xun.cryptopool.space/
then
awk '!/^(server=|address=)[/][[:alnum:]][[:alnum:]\-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}' file.txt
gives output
2
You might also consider replacing \s with [[:space:]] in order to maintain consistency.
(tested in GNU Awk 5.0.1)

Issue with field separator in AWK script

I have a very large file; a few sample lines are shown below. Each line has two fields, name and revision, separated by a colon. I need to print only the second column.
sam:7.[0:6]
Ram:8.[6:6]_rev[2:4] h_ack[2:6]
vincent:58
I tried this code:
#!/bin/bash
awk -F: '{print $2}'
which printed:
7.[0
8.[6
58
Output should be:
7.[0:6]
8.[6:6]_rev[2:4] h_ack[2:6]
58
What went wrong in my code?
The problem in your awk expression is that you are splitting on all :.
Instead, you want to split only on the first : from the start.
$ awk -F'^[^:]+:' '{print $2}' file
The regex matches the start of the line (^), one or more characters other than : ([^:]+), and finally the first :, so only the name and its colon act as the separator.
If you specify the field separator as :, it is normal awk behavior to output, for example, 7.[0, because the rest of the revision ends up in the fields after $2.
cut suits the requirement better here:
cut -d: -f2- file
Could you please try the following:
awk '
match($0,/:.*/){
print substr($0,RSTART+1,RLENGTH-1)
}
' Input_file
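As a cross-check of "everything after the first colon", here is a small sketch that compares cut with an awk variant built on index() and substr(). The index() form is my own variation on the match() answer above, not something from the thread; it avoids putting ^ inside FS, on the assumption that the anchor may not be handled the same way by every awk.

```shell
# The sample lines from the question.
data='sam:7.[0:6]
Ram:8.[6:6]_rev[2:4] h_ack[2:6]
vincent:58'

# cut: print field 2 and everything after it.
with_cut=$(printf '%s\n' "$data" | cut -d: -f2-)

# awk: print the text that follows the first colon on each line.
with_awk=$(printf '%s\n' "$data" |
    awk '{print substr($0, index($0, ":") + 1)}')

printf '%s\n' "$with_awk"
```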

Awk multi character field separator containing caret not working as expected

I have tried multiple google searches, but none of the proposed answers are working for my example below. NF should be 3, but I keep getting 1.
# cat a
1^%2^%3
# awk -F^% '{print NF}' a
1
# awk -F'^%' '{print NF}' a
1
# awk -F "^%" '{print NF}' a
1
The -F option in awk takes a regular expression as its value. The ^ is therefore interpreted as the regex start anchor and needs to be stripped of its special meaning, so you escape it with a backslash \ character:
awk -F'\\^%' '{ print NF }'
From the GNU Awk manual on escape sequences:
The backslash character itself is another character that cannot be included normally; you must write \\ to put one backslash in the string or regexp. Thus, the string whose contents are the two characters " and \ must be written \"\\.
You should escape ^ to remove its special meaning, since the field separator is used as a regex. Once you escape ^ by writing \\^ it is treated as a literal character, so ^% is matched as a plain string and you get 3 as the answer.
awk -F'\\^%' '{print NF}' Input_file
Here is a nice SO link you could use as an example for better understanding. It doesn't talk specifically about the ^ character, but it shows how to use escape sequences in an awk field separator.
https://stackoverflow.com/a/44072825/5866580
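The escaping can be tried without creating a file. This is a minimal sketch using the question's single record; the variable names line and fields are mine.

```shell
# One record with ^% between the values, as in the question.
line='1^%2^%3'

# Unescaped: ^ acts as an anchor, so the record is not split on ^%
# (the question reports NF as 1 for this form).
printf '%s\n' "$line" | awk -F'^%' '{print NF}'

# Escaped: \\ in the -F value reaches the regex as \, making ^ literal.
fields=$(printf '%s\n' "$line" | awk -F'\\^%' '{print NF}')

printf '%s\n' "$fields"
```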

Using SED/AWK to replace letters after a certain position

I have a file with words (1 word per line). I need to censor all letters in the word, except the first five, with a *.
Ex.
Authority -> Autho****
I'm not very sure how to do this.
If you are lucky, all you need is
sed 's/./*/6g' file
When I originally posted this, I believed this to be reasonably portable; but as per @ghoti's comment, it is not.
Perl to the rescue:
perl -pe 'substr($_, 5) =~ s/./*/g' -- file
-p reads the input line by line and prints each line after processing
substr returns a substring of the given string starting at the given position.
s/./*/g replaces any character with an asterisk. The g means the substitution will happen as many times as possible, not just once, so all the characters will be replaced.
In some versions of sed, you can specify which substitution should happen by appending a number to the operation:
sed -e 's/./*/g6'
This will replace all (again, because of g) characters, starting from the 6th position.
Here's a portable solution for sed:
$ echo abcdefghi | sed -e 's/\(.\{5\}\)./\1*/;:x' -e 's/\*[a-z]/**/;t x'
abcde****
Here's how it works:
's/\(.\{5\}\)./\1*/' - preserve the first five characters, replacing the 6th with an asterisk.
':x' - set a "label", which we can branch back to later.
's/\*[a-z]/**/' - substitute the letter following an asterisk with an asterisk.
't x' - if the last substitution succeeded, jump back to the label "x".
This works equally well in GNU and BSD sed.
Of course, adjust the regexes to suit.
The following awk solutions may help you as well.
Solution 1st: awk solution with substr and gensub (gensub requires GNU awk).
awk '{print substr($0,1,5) gensub(/./,"*","g",substr($0,6))}' Input_file
Solution 2nd:
awk 'NF{len=length($0);if(len>5){i=6;while(i<=len){val=val?val "*":"*";i++};print substr($0,1,5) val};val=i=""}' Input_file
Autho****
EDIT: Adding a non-one liner form of solution too now. Adding explanation with it too now.
awk '
NF{                             ##Check that the line is NON-empty.
  len=length($0);               ##Store the length of the current line in a variable called len.
  if(len>5){                    ##Check whether the line is longer than 5 characters, as per OP request. If yes then do the following.
    i=6;                        ##Create a variable named i with the value 6.
    while(i<=len){              ##Start a while loop that runs from i up to the length of the current line.
      val=val?val "*":"*";      ##Append one * to the variable val on each iteration.
      i++                       ##Increment i by 1 each time.
    };
    print substr($0,1,5) val    ##Print the substring from the 1st to the 5th character, followed by val.
  };
  val=i=""                      ##Reset the variables val and i.
}
' Input_file                    ##Mentioning Input_file name here.
Personally I'd just use sed for this (see @tripleee's answer) but if you want to do it in awk it'd be:
$ awk '{t=substr($0,1,5); gsub(/./,"*"); print t substr($0,6)}' file
Autho****
or with GNU awk for gensub():
$ awk '{print substr($0,1,5) gensub(/./,"*","g",substr($0,6))}' file
Autho****
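To check how the portable gsub() version treats short words, here is a small sketch. The words cat, abcde and censorship are made up for illustration, and keep is my own variable name.

```shell
# Sample words: one from the question plus hypothetical shorter and longer ones.
words='Authority
cat
abcde
censorship'

censored=$(printf '%s\n' "$words" | awk '{
    keep = substr($0, 1, 5)      # first five characters, saved before masking
    gsub(/./, "*")               # turn every character of $0 into *
    print keep substr($0, 6)     # saved prefix plus the masked remainder
}')

printf '%s\n' "$censored"
```

Words of five characters or fewer come out unchanged, because substr($0, 6) is empty for them.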
It is also possible and quite straightforward with sed:
sed 's/./\*/6;:loop;s/\*[^\*]/\**/;/\*[^\*]/b loop' file_to_censor.txt
output:
explanation:
s/./\*/6 #replace the 6th character of the line with *
:loop #define a label for the goto
s/\*[^\*]/\**/ #replace a * followed by a non-* char with **
/\*[^\*]/b loop #loop while a * followed by a non-* char still exists
Here is a pretty straightforward sed solution (that does not require GNU sed):
sed -e :a -e 's/^\(.....\**\)[^*]/\1*/;ta' filename
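Here is a sketch of that loop run on a few inputs, so the behaviour is visible without a file. The inputs cat and "ab cdefg" are made up for illustration; the second one shows that the loop masks any character after position five, not only letters.

```shell
# Each pass keeps the first five characters plus any * already written,
# replaces the next non-* character with *, and branches back to :a
# for as long as a substitution was made.
censored=$(printf '%s\n' Authority cat 'ab cdefg' |
    sed -e :a -e 's/^\(.....\**\)[^*]/\1*/;ta')

printf '%s\n' "$censored"
```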

How do I tell awk to use = as a separator (with white spaces removed too)

Suppose I have the following file.
John=good
Tom = ok
Tim = excellent
I know the following lets me use = as a separator.
awk -F= '{print $1,$2}' file
This gives me the following results.
John good
Tom   ok
Tim   excellent
I would like the white spaces to be ignored, so that only the names and their performances are printed out.
One way to get around this is run another awk on the results.
awk -F= '{print$1,$2}' file | awk '{print $1,$2}'
But I wanted to know if I could do this in one awk?
Include them in the separator definition; it's a regexp.
$ awk -F' *= *' '{print $1, $2}' foo.ds
John good
Tom ok
Tim excellent
The FS variable can be set to a regular expression.
From the AWK manual:
The following table summarizes how fields are split, based on the value of FS.
FS == " "
Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.
FS == any single character
Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading and trailing occurrences.
FS == regexp
Fields are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty fields.
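The three cases in the table can be checked with one-line inputs. This is a minimal sketch; the sample strings and variable names are my own, chosen so that each rule produces a visibly different result.

```shell
# FS == " " (default): runs of whitespace separate fields,
# and leading/trailing whitespace is ignored.
default_nf=$(printf '  a   b  \n' | awk '{print NF}')

# FS == single character: every comma delimits a field,
# so leading, trailing and doubled commas yield empty fields.
single_nf=$(printf ',a,,b,\n' | awk -F, '{print NF}')

# FS == regexp: the spaces around = are consumed by the separator.
regex_out=$(printf 'Tom = ok\n' | awk -F' *= *' '{print $1 "|" $2}')

printf '%s %s %s\n' "$default_nf" "$single_nf" "$regex_out"
```

The | in the last command is only there to make any leftover whitespace around the fields visible.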