Print if the field starts with a specific string using awk

I have a list like this in the chrs.txt file:
chr22
chr21
chrUn_gl000225
chrUn_gl000222
chrM
The result I want:
chr22
chr21
I want to print only the lines that are not chrM and do not start with chrUn; that is, I want to filter out the string chrM and all strings beginning with chrUn. I tried the commands below, but they only filter out chrM:
awk '($1 != "chrM" && $1 != "chrUn")' chrs.txt
awk '($1 != "chrM" && $1 != "/^chrUn/")' chrs.txt
awk '($1 != "chrM" && $1 != "chrUn_*")' chrs.txt
Any help would be appreciated.

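All you need is a regex match. In your attempts, != compares $1 with the literal text "/^chrUn/" or "chrUn_*", so those conditions are always true; a bare !/regex/ (or $1 !~ /regex/) matches against a regular expression instead: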
awk '!/^chr(Un|M)/' file
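A quick way to see the difference is to recreate the sample chrs.txt from the question and run one of the original attempts next to the regex one-liner. This is a minimal sketch that writes chrs.txt in the current directory.

```shell
#!/bin/sh
# Recreate the sample input from the question (writes ./chrs.txt).
printf '%s\n' chr22 chr21 chrUn_gl000225 chrUn_gl000222 chrM > chrs.txt

# One of the original attempts: "/^chrUn/" is a plain string, so only chrM is dropped.
awk '($1 != "chrM" && $1 != "/^chrUn/")' chrs.txt

echo ---

# Regex version: drop every line that starts with chrUn or chrM.
awk '!/^chr(Un|M)/' chrs.txt
```

The first command prints four lines (both chrUn entries survive); the second prints only chr22 and chr21.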

grep can do that too:
grep -Ev '^chr(Un|M)' file
Or with GNU ed(1):
printf '%s\n' 'v/^chr\(Un\|M\)/p' | ed -s file

In bash, you can do the comparison like this (assuming $line holds a single line of the file; the loop over the file's lines is omitted here):
if [[ $line != chrUn* ]] && [[ $line != chrM* ]]; then
echo "$line"
fi
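For completeness, here is a sketch of that comparison wrapped in the omitted loop. It assumes bash (for [[ ]]) and the sample chrs.txt from the question; the function name filter_chrs is hypothetical. IFS= and read -r keep each line exactly as written, and quoting "$line" avoids word splitting and globbing on output.

```shell
#!/usr/bin/env bash
# Sample input from the question.
printf '%s\n' chr22 chr21 chrUn_gl000225 chrUn_gl000222 chrM > chrs.txt

# Hypothetical helper: print lines of file $1 that start with neither chrUn nor chrM.
filter_chrs() {
  while IFS= read -r line; do
    # Unquoted right-hand sides of != inside [[ ]] are glob patterns.
    if [[ $line != chrUn* ]] && [[ $line != chrM* ]]; then
      printf '%s\n' "$line"
    fi
  done < "$1"
}

filter_chrs chrs.txt
```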

awk:
awk '$0!~/^chr(Un|M)/'
sed:
sed -rne '/^chr(Un|M)/!p'
grep:
grep -Ev "^chr(Un|M)"
bash:
while IFS= read -r line; do [[ $line =~ ^chr(Un|M) ]] || printf '%s\n' "$line"; done < your_file.txt
You should use one of the first three options, since they are more portable across different shells.

Related

Regexp in gawk matches multiple ways

I have some text I need to split up to extract the relevant argument, and my [g]awk match command does not behave as I expect. I just want to understand why! (I have since written a less elegant workaround.)
So the string is blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header
I want to output just the contents of msgcontent1=, so I did:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | gawk '{ if (match($0,/msgcontent1=(.*)[|]/,a)) { print a[1]; } }'
The trouble is that instead of getting
HeaderUUIiewConsenFlagPSMessage
I get everything from there up to the last pipe in the string:
HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002
I accept this is because the regexp /msgcontent1=(.*)[|]/ can match in multiple ways, but how do I make it match the way I want it to?
With your shown samples, please try the following. Written and tested in GNU awk, this prints only the contents from msgcontent1= up to the first |.
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
Or, with echo + awk:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" |
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}'
With the FPAT option in GNU awk:
awk -v FPAT='msgcontent1=[^|]*' '{sub(/.*=/,"",$1);print $1}' Input_file
This is your input:
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'
You may use GNU awk like this to extract the value after msgcontent1=:
awk -F= -v RS='|' '$1 == "msgcontent1" {print $2}' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this sed:
sed -E 's/^(.*\|)?msgcontent1=([^|]+).*/\2/' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this GNU grep:
grep -oP '(^|\|)msgcontent1=\K[^|]+' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | awk '{ if (match($0,/msgcontent1=([^|]*)/,a)) print a[1] }'
This prints HeaderUUIiewConsenFlagPSMessage (the three-argument form of match() requires GNU awk).
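The reason your regex matches msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002 is that matching is greedy: awk picks the longest possible match, so .* runs on to the last | it can reach. The POSIX regexes used by awk have no non-greedy .*? operator, which is why [^|]* is used to stop at the first |.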
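To see the difference side by side, here is a small sketch that prints the raw match for the greedy pattern from the question and for the negated-class pattern. It uses only the two-argument match() with RSTART and RLENGTH, so it should also run outside GNU awk.

```shell
#!/bin/sh
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'

# Greedy: the longest match starting at msgcontent1= ends at the last | in the line.
printf '%s\n' "$s" | awk 'match($0, /msgcontent1=.*[|]/) { print substr($0, RSTART, RLENGTH) }'

# Negated class: [^|]* cannot cross a |, so the match stops before the first one.
printf '%s\n' "$s" | awk 'match($0, /msgcontent1=[^|]*/) { print substr($0, RSTART, RLENGTH) }'
```

The first line of output is msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002| and the second is msgcontent1=HeaderUUIiewConsenFlagPSMessage.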
Also with awk:
echo 'blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header' | awk -v FS='[=|]' '$2 == "msgcontent1" {print $3}'
HeaderUUIiewConsenFlagPSMessage

Comparison on the number of fields from awk gives an incorrect result

I use awk to get the number of fields for multiple files, and then use if [[ ]] to check whether the number of fields equals an exact number; if so, I print the file name.
The code is as follows:
for file in /root/TB_MOVIL_CDR/incorrect_files/*
do
num=$(awk -F '|' '{print NF}' $file)
if [[ $num -eq 24 ]];then
echo $file
fi
done
But I found the result is not correct, and I am confused. Is the syntax of if [[ $num -eq 24 ]] wrong?
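The [[ ]] syntax is fine; the problem is that $num holds the field count of every line of the file, one per line, so the comparison only behaves as intended when the file has exactly one line. There is no need for a bash loop when you use awk anyway; try this: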
awk -F'|' 'NF==24 {print FILENAME}' /root/TB_MOVIL_CDR/incorrect_files/*
This tests whether each line of each file has 24 fields, and whenever it finds such a line, it prints the file name. NB: if a file has several lines with 24 fields, its name is printed multiple times. This can be avoided if needed.
You can use this GNU awk to print the file name only once for each file with 24 fields (GNU because of the ENDFILE block):
awk -F'|' 'NF==24 {f=1} ENDFILE {if (f) print FILENAME;f=0}' /root/TB_MOVIL_CDR/incorrect_files/*
A shorter GNU awk version that prints the file name once for each file:
awk -F'|' 'NF==24 {print FILENAME;nextfile}' /root/TB_MOVIL_CDR/incorrect_files/*
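If you need the print-once behavior without ENDFILE or nextfile, a sketch that should work in a plain POSIX awk resets a flag at the first record of each file (FNR == 1). The demo_files directory and its two files are hypothetical stand-ins for /root/TB_MOVIL_CDR/incorrect_files.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for the real input directory.
mkdir -p demo_files
# a.txt: two lines, each with 24 |-separated fields (23 separators per line).
awk 'BEGIN { for (n = 1; n <= 2; n++) { for (i = 1; i < 24; i++) printf "x|"; print "x" } }' > demo_files/a.txt
# b.txt: one line with only 3 fields.
printf 'x|x|x\n' > demo_files/b.txt

# FNR == 1 marks the start of each new file, so the "already printed" flag
# is reset per file without ENDFILE or nextfile.
awk -F'|' 'FNR == 1 { seen = 0 }
           !seen && NF == 24 { print FILENAME; seen = 1 }' demo_files/*
```

Only demo_files/a.txt is printed, and only once, even though it has two 24-field lines.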

Printing only part of next line after matching a pattern

I want to print the next line after a match.
My file content looks like this:
SSID:CoreFragment
Passphrase:WiFi1234
SSID:CoreFragment_5G
Passphrase:WiFi1234
SSID:Aleph_inCar
Passphrase:1234567890
For example, if my search finds WIFI-3 (the SSID), then I want to print 1234ABCD. I used this command to search for SSIDs:
grep -oP '^SSID:\K.+' file_name
After this search, I want to print the passphrase of that particular match.
I'm working on Ubuntu 18.04
ssid=$(grep -oP '^SSID:\K.+' list_wifi.txt)
for ssid in $(sudo iwlist wlp2s0 scan | grep ESSID | cut -d '"' -f2)
do
if [ "$ssid" = "$ssid_name" ]; then
echo "SSID found...";
fi
done
I want to print the next line after the match.
Another awk:
$ awk -F: -v s="$ssid" '$0=="SSID:"s{c=NR+1} c==NR{print $2; exit}' file
1234ABCD
This only prints the value if it is on the line immediately after the SSID line.
awk -F: '/WIFI-3/{getline;print $2; exit}' file
1234ABCD
Robustly (it won't fail due to partial matches, etc.) and idiomatically:
$ awk -F':' 'f{print $2; exit} ($1=="SSID") && ($2=="WIFI-3"){f=1}' file
1234ABCD
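A sketch that wraps the robust awk answer above in a shell function, passing the SSID in with -v instead of hard-coding WIFI-3. The function name get_passphrase is hypothetical, and the sample data is the yash.txt listing from the Perl answer. Because both fields are compared with ==, an SSID such as WIFI-30 would not be mistaken for WIFI-3.

```shell
#!/bin/sh
# Sample data: the yash.txt listing shown in the Perl answer.
cat > yash.txt <<'EOF'
SSID:CoreFragment
Passphrase:WiFi1234
SSID:CoreFragment_5G
Passphrase:WiFi1234
SSID:Aleph_inCar
Passphrase:1234567890
SSID:WIFI-1
Passphrase:1234ABCD
SSID:WIFI-2
Passphrase:123456789
SSID:WIFI-3
Passphrase:1234ABCD
EOF

# Hypothetical helper: get_passphrase SSID FILE
# Sets f on the exact SSID line, then prints field 2 of the following line.
get_passphrase() {
  awk -F':' -v ssid="$1" '
    f { print $2; exit }
    ($1 == "SSID") && ($2 == ssid) { f = 1 }
  ' "$2"
}

get_passphrase WIFI-3 yash.txt
get_passphrase Aleph_inCar yash.txt
```

Note that -v processes backslash escapes in its value, so an SSID containing a backslash would need another way in, for example through ENVIRON.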
Please try the following:
ssid="WIFI-3"
passphrase=$(grep -A 1 -Fx "SSID:$ssid" file_name | tail -n 1 | cut -d: -f2-)
echo "$passphrase"
which yields:
1234ABCD
Since code tags changed how the samples look, I am adding this now:
var=$(awk '/SSID:[a-zA-Z]+-[0-9]+/{flag=1;next} flag{sub(/.*:/,"");value=$0;flag=""} END{print value}' Input_file)
echo "$var"
Could you please try the following:
awk '/Passphrase/ && match($0,/WIFI-3 Passphrase:[0-9a-zA-Z]+/){val=substr($0,RSTART,RLENGTH);sub(/.*:/,"",val);print val;val=""}' Input_file
Using Perl:
$ export ssid="WIFI-3"
$ perl -0777 -lne ' /SSID:$ENV{ssid}\s*Passphrase:(\S+)/ and print $1 ' yash.txt
1234ABCD
$ export ssid="Aleph_inCar"
$ perl -0777 -lne ' /SSID:$ENV{ssid}\s*Passphrase:(\S+)/ and print $1 ' yash.txt
1234567890
$
$ cat yash.txt
SSID:CoreFragment
Passphrase:WiFi1234
SSID:CoreFragment_5G
Passphrase:WiFi1234
SSID:Aleph_inCar
Passphrase:1234567890
SSID:WIFI-1
Passphrase:1234ABCD
SSID:WIFI-2
Passphrase:123456789
SSID:WIFI-3
Passphrase:1234ABCD
You can capture it in variables as
$ passphrase=$(perl -0777 -lne ' /SSID:$ENV{ssid}\s*Passphrase:(\S+)/ and print $1 ' yash.txt)
$ echo "$passphrase"
1234567890
$

awk case insensitive and boolean operator

I would like to grep lines that contain both patterns, in any order, and I'm using:
awk '/pattern1/ && /pattern2/' file.txt
But if I want a case-insensitive search, adding /i only works if I add it to pattern2:
awk '/pattern1/ && /pattern2/i' file.txt ...works
awk '/pattern1/i && /pattern2/i' file.txt ...doesn't work; it outputs the whole file
Does anyone know how to solve this?
Try:
awk '{s=tolower($0)} s~/lowercase_pattern1/ && s~/lowercase_pattern2/' file
There is also the IGNORECASE variable in GNU awk.
Most of the time, you could also do something like this:
grep -Ei 'pattern1.*pattern2|pattern2.*pattern1' file
You could also use grep:
grep -i "pattern1" file.txt | grep -i "pattern2"
Though it won't be as efficient, as it uses two passes to find the lines.
You could use sed, which does it in one pass (the I address flag is a GNU sed extension):
sed '/pattern1/I!d;/pattern2/I!d' file.txt
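awk has no /i flag. In awk '/pattern1/ && /pattern2/i' file.txt, /pattern2/i is parsed as the match result ("0" or "1") concatenated with the empty variable i. That non-empty string is always true, so /pattern2/ is effectively ignored, and /pattern1/i && /pattern2/i matches every line.
If you want to ignore case entirely in GNU awk, set IGNORECASE = 1, i.e.: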
awk 'BEGIN {IGNORECASE = 1} /pattern1/ && /pattern2/' file.txt
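A small sketch on a hypothetical three-line demo.txt that reproduces the "whole file" symptom and then applies the portable tolower() approach from the answers in this thread (it does not rely on GNU awk's IGNORECASE).

```shell
#!/bin/sh
# Hypothetical sample file with mixed case.
printf '%s\n' 'Foo BAR' 'foo only' 'nothing here' > demo.txt

# Prints every line: /zzz/i is ($0 ~ /zzz/) concatenated with the unset
# variable i, a non-empty string that is always true.
awk '/zzz/i' demo.txt

echo ---

# Portable fix: lowercase the record once, then test lowercase patterns.
awk '{ s = tolower($0) } s ~ /foo/ && s ~ /bar/' demo.txt
```

The first command prints all three lines; the second prints only Foo BAR.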
Just change your command to:
awk 'tolower($0)~/pattern1/ && tolower($0)~/pattern2/' your_file
Alternatively, you can use Perl:
perl -lne 'print if(/pattern1/i and /pattern2/i)' your_file

awk capability cut capability

I am using the following ssh command to get a list of IDs. Now I want to get only the IDs greater than a given number, let's say 231219 in this case. How can I incorporate that?
I also have a local file, ids_ignore.txt; any ID we put in this list should be ignored by the command.
Can awk or cut do the above?
ssh -p 29418 company.com gerrit query --commit-message --files --current-patch-set \
status:open project:platform/code branch:master |
grep refs | cut -f4 -d'/'
Output:
231222
231221
231220
231219
230084
229092
228673
228635
227877
227759
226138
226118
225817
225815
225246
223554
223527
223452
223447
226137
... | awk '$1 > max' max=8888 | grep -v -x -F -f ids_ignore.txt
Or, if you want to do it all with awk:
... | awk 'NR==FNR{ no[$1]++ }
NR!=FNR && $1 > max && ! no[$1]' max=NNN ids_ignore.txt -
cut cannot do numeric comparisons on the input fields; it's just a simple field-extraction tool. awk can do the work of both grep and cut:
ssh -p 29418 company.com gerrit ... |
awk -F/ -v min=231219 '
NR == FNR {ignore[$1]; next}
/refs/ && $4>min && !($4 in ignore) {print $4}
' ids_ignore.txt -
The trailing - is important at the end of the awk command: it tells awk to read from stdin after it reads the ids_ignore file.
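Since the gerrit server is not available here, a sketch that feeds a few of the IDs from the question on stdin shows the ignore-list and numeric-comparison logic. It reads plain IDs, like the all-awk variant above, so there is no -F/ or /refs/ filter; the function name filter_ids and the one-entry ids_ignore.txt are hypothetical. One caveat with the NR == FNR idiom: if ids_ignore.txt is empty, NR == FNR also holds for the stdin lines, so every ID would be stored as ignored and nothing printed.

```shell
#!/bin/sh
# Hypothetical ignore list (the question's local ids_ignore.txt).
printf '%s\n' 231221 > ids_ignore.txt

# filter_ids MIN IGNORE_FILE: hypothetical wrapper that reads IDs on stdin.
filter_ids() {
  awk -v min="$1" '
    NR == FNR { ignore[$1]; next }   # first file: remember the IDs to skip
    $1 > min && !($1 in ignore)      # stdin: keep larger IDs not in the list
  ' "$2" -
}

# Simulated IDs standing in for the output of the ssh | grep refs | cut pipeline.
printf '%s\n' 231222 231221 231220 231219 230084 | filter_ids 231219 ids_ignore.txt
```

This prints 231222 and 231220: 231221 is in the ignore list, and 231219 and 230084 are not greater than the minimum.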