Deleting lines that satisfy a certain condition using awk - awk

I am trying to delete specific lines in a .gro file: the lines that satisfy the if condition below. The awk code that I am currently using is this:
cat doubled_system.gro | awk '{if ($2 ~ /^NA/ && $5 > 23) print $0}' > new.gro
So far I have managed to collect the lines that I don't want into the new.gro file.
I was wondering how I can combine sed or grep with this if condition so that I can delete the lines that I don't want from the doubled_system.gro file?

"manage the line that I don't want (...) delete the line that I don't want"
If you have the negative of the desired output, you simply need to negate the condition. Since you combined the tests with logical AND, this is a valid target for De Morgan's laws; after negating
$2 ~ /^NA/ && $5 > 23
we do get
$2 !~ /^NA/ || $5 <= 23
Your code also needlessly uses if and cat, as GNU AWK can read the file by itself. After fixing that, your code might look as follows:
awk '($2 !~ /^NA/ || $5 <= 23){print $0}' doubled_system.gro > new.gro
which will save the output, without the undesired line(s), into new.gro. After you have checked that it gives the correct result for all allowed input values, you might use GNU awk's -i inplace to edit the file in place, like so:
awk -i inplace '($2 !~ /^NA/ || $5 <= 23){print $0}' doubled_system.gro
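As a quick sanity check, here is that command run against a hypothetical two-line excerpt (the field values are made up, since the actual .gro contents are not shown); only the NA line with $5 > 23 should disappear:
$ printf 'SOL OW 1 2.0 10.0\nNA NA 2 3.0 30.0\n' | awk '($2 !~ /^NA/ || $5 <= 23){print $0}'
SOL OW 1 2.0 10.0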

Related

Chain awk regex matches like grep

I am trying to use awk to select/remove data based on cell entries in a CSV file.
How do I chain Awk commands to build up complex searches like I have done with grep? I plan to use Awk to select rows based on matching criteria in cells in multiple columns, not just the first column as in this example.
Test data
123,line1
123a,line2
abc,line3
G-123,line4
G-123a,line5
Separate Awk statements with intermediate files
awk '$1 !~ /^[[:digit:]]/ {print $0}' file.txt > output1.txt
awk '$1 !~ /^G-[[:digit:]]/ {print $0}' output1.txt > output2.txt
mv output2.txt output.txt
cat output.txt
Chained or multi-line grep version (I think limited to first column only)
grep -v \
-e "^[[:digit:]]" \
-e "^G-[[:digit:]]" \
file.txt > output.txt
cat output.txt
How can I rewrite the Awk command to avoid the intermediate files?
Generally, awk has boolean operators available (it's better than grep! :) )
awk '/match1/ || /match2/' file
awk '(/match1/ || /match2/ ) && /match3/' file
and so on ...
In your example you could use something like:
awk -F, '$1 ~ /^[[:digit:]]/ || $1 ~ /G-[[:digit:]]/' input >> output
Note: This is just an example of how to use boolean operators; it selects the matching lines (negate each test with !~ to drop them instead, as your grep -v does). The alternation could also have been expressed within the regular expression itself:
awk -F, '$1 ~ /^(G-)?[[:digit:]]/' input >> output
In your awk commands and example, awk regards file.txt as having only one field because you have not defined FS, so the default whitespace field separator is used.
With that said, you can easily AND your two pattern matches together like this:
awk '($1 !~ /^[[:digit:]]/) && ($1 !~ /^G-[[:digit:]]/) {print $0}' file.txt
To make awk use comma as a field separator, you can define it in a BEGIN block. In this example, the output should be just line3
awk 'BEGIN {FS=","} ($1 !~ /^[[:digit:]]/) && ($1 !~ /^G-[[:digit:]]/) {print $2}' file.txt
I would suggest the literal translation of that grep command in awk is
awk '
/^[[:digit:]]/ {next}
/^G-[[:digit:]]/ {next}
{print}
' file.txt
But you have several examples of how to write it more concisely.
You can use
awk '$1 !~ /^(G-)?[[:digit:]]/' file.txt > output.txt
Here, awk tries to find the following in field 1:
^ - start of string
(G-)? - an optional G- char sequence (note the regex flavor in awk is POSIX ERE, so (...) denotes a capturing group and ? denotes a one or zero times quantifier)
[[:digit:]] - a digit.
If the match is found, the record (=line) is not printed. Else, the line is printed.
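Against the test data from the question, this keeps only the line whose first column matches neither pattern:
$ awk '$1 !~ /^(G-)?[[:digit:]]/' file.txt
abc,line3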
To stick to your question, I would use:
awk '$1 !~ /^[[:digit:]]/ && $1 !~ /^G-[[:digit:]]/' file.txt > output.txt
But I like @Wiktor Stribiżew's regex approach!
With your shown samples, this could also be done in grep with a single regexp; we need not chain the different regexes. Adding this solution in case you or anyone else needs it.
grep -v -E '^(G-)?[[:digit:]]' Input_file
Explanation: grep's -v option omits lines that match the given pattern, and its -E option enables EREs (extended regular expressions). The regex ^(G-)?[[:digit:]] matches lines that start with an optional G- followed by a digit, so those lines are not printed.
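On the question's file.txt this produces the single surviving line:
$ grep -v -E '^(G-)?[[:digit:]]' file.txt
abc,line3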

Awk join to handle empty file

I have 2 files of below format
a.txt
-----
1~a
2~b
b.txt
-----
1~one~A
2~two~B
3~three~C
If I run the awk command below, I get the correct output:
awk -F'~' -v OFS="~" 'NR==FNR{a[$1] = $1; b[$1] = $2; next}{if(a[$1] == $1){$3="Matched"} else {$3="No Match"}}1' a.txt b.txt
output
1~one~Matched
2~two~Matched
3~three~No Match
The problem is when my a.txt is empty: the above command doesn't output anything in that case. How do I update the awk command so that I get the output below?
1~one~No Match
2~two~No Match
3~three~No Match
Check whether ARGIND is supported in your GNU awk 3.1.7:
$ awk 'BEGIN{FS=OFS="~"} ARGIND==1{a[$1]; next}
{$3 = ($1 in a) ? "Matched" : "No Match"} 1' f1 f2
1~one~Matched
2~two~Matched
3~three~No Match
$ awk 'BEGIN{FS=OFS="~"} ARGIND==1{a[$1]; next}
{$3 = ($1 in a) ? "Matched" : "No Match"} 1' /dev/null f2
1~one~No Match
2~two~No Match
3~three~No Match
Since you need to compare only the first field, a single array with the field content as the key is enough. There is no need to save a value; you can check for the presence of a key with in.
Alternate solution that should work with any awk, based on this answer:
awk 'BEGIN{FS=OFS="~"} !second_file{a[$1]; next}
{$3 = ($1 in a) ? "Matched" : "No Match"} 1' f1 second_file=1 f2
Here, second_file is a flag that is set between the two files.
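Because the assignment happens on the command line between the two file names, this keeps working even when the first file is empty, e.g. with /dev/null standing in for an empty f1:
$ awk 'BEGIN{FS=OFS="~"} !second_file{a[$1]; next}
{$3 = ($1 in a) ? "Matched" : "No Match"} 1' /dev/null second_file=1 f2
1~one~No Match
2~two~No Match
3~three~No Match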
With your shown samples, please try the following, written and tested in GNU awk 5.0.1 with both an empty and a non-empty a.txt. It simply changes the FNR==NR condition (which is meant to be true only while the first Input_file is read) to ARGIND==1, so that even when the first file is empty, the if/else statements still run for every line of the second file and print the No Match values.
awk -F'~' -v OFS="~" 'ARGIND==1{a[$1] = $1; b[$1] = $2; next} {if(a[$1] == $1){$3="Matched"} else {$3="No Match"}}1' a.txt b.txt
Detailed explanation of the code's working:
man awk describes ARGIND as "The index in ARGV of the current file being processed." It gives the index number of the passed file arguments, so it is 1 for the first file whether or not that file contains any lines. FNR==NR, by contrast, fails here: when a.txt is empty, no records are read from it, so FNR and NR stay in lockstep while b.txt is read. FNR==NR is then true for every line of b.txt, the next statement fires each time, and nothing is ever printed, hence no results with your command.
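You can watch the lockstep yourself (using /dev/null to stand in for the empty a.txt):
$ awk '{print FILENAME, NR, FNR}' /dev/null b.txt
b.txt 1 1
b.txt 2 2
b.txt 3 3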

How do I obtain a specific row with the cut command?

Background
I have a file, named yeet.d, that looks like this
JET_FUEL = /steel/beams
ABC_DEF = /michael/jackson
....50 rows later....
SHIA_LEBEOUF = /just/do/it
....73 rows later....
GIVE_FOOD = /very/hungry
NEVER_GONNA = /give/you/up
I am familiar with the -f and -d options of the cut command. The -f option allows you to specify which field(s) to extract, while the -d option allows you to specify what the delimiter is.
Problem
I want this output returned using the cut command.
/just/do/it
From what I know, this is part of the command I want to enter:
cut -f2 -d= yeet.d
given that I want the values to the right of the equals sign, with the equals sign as the delimiter. However, this would return:
/steel/beams
/michael/jackson
....50 rows later....
/just/do/it
....73 rows later....
/very/hungry
/give/you/up
Which is more than what I want.
Question
How do I use the cut command to return only /just/do/it and nothing else from the situation above? This is different from How to get second last field from a cut command because I want to select a row within a large file, not just one near the end or the beginning.
This looks like it would be easier to express with awk...
# awk -v _s="${_string}" '$3 == _s {print $3}' "${_path}"
## The above is a more scriptable form of the example below
awk -v _search="/just/do/it" '$3 == _search {print $3}' <<'EOF'
JET_FUEL = /steel/beams
SHIA_LEBEOUF = /just/do/it
NEVER_GONNA = /give/you/up
EOF
## Either way, output should be similar to
## /just/do/it
The -v _something="Some Thing" bit allows for passing Bash variables to awk
The $3 == _search bit tells awk to match only when column 3 is equal to the search string
To search for a sub-string within a line one can use $0 ~ _search instead
The {print $3} bit tells awk to print column 3 for any matches
And the <<'EOF' bit tells Bash not to expand anything within the opening and closing EOF tags
... however, the above will still output duplicate matches; e.g., if yeet.d somehow contained...
JET_FUEL = /steel/beams
SHIA_LEBEOUF = /just/do/it
NEVER_GONNA = /give/you/up
AGAIN = /just/do/it
... there'd be two /just/do/it lines output by awk.
The quickest way around that would be to pipe to head -1, but the better way is to tell awk to exit once it's been told to print...
_string='/just/do/it'
_path='yeet.d'
awk -v _s="${_string}" '$3 == _s {print $3; exit}' "${_path}"
... though that now assumes that only the first match is wanted; obtaining the nth match is possible, though outside the scope of the question as currently written.
Updates
Keying awk on the first column while printing the third column and exiting after the first match may look like...
_string='SHIA_LEBEOUF'
_path='yeet.d'
awk -v _s="${_string}" '$1 == _s {print $3; exit}' "${_path}"
... and generalize even further...
_string='^SHIA_LEBEOUF '
_path='yeet.d'
awk -v _s="${_string}" '$0 ~ _s {print $3; exit}' "${_path}"
... because awk totally gets regular expressions, mostly.
It depends on how you want to identify the desired line.
You could identify it by the line number. In this case you can use sed
cut -f2 -d= yeet.d | sed '53q;d'
This extracts the 53rd line: sed deletes (d) every line it reads, except that on line 53 the q command prints the pattern space and quits before d can run.
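A quick check of the idiom on a three-line stream:
$ printf 'a\nb\nc\n' | sed '2q;d'
b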
Or you could identify it by a keyword. In this case use grep
cut -f2 -d= yeet.d | grep just
This extracts all lines containing the word just.
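For the sample file above, that keyword pipeline prints (note that field 2 keeps the space that follows the = sign, since cut splits on the = alone):
$ cut -f2 -d= yeet.d | grep just
 /just/do/it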

Awk concise way to check if 2 fields match the same string

I want to check if 2 fields match a specific pattern. This is what I have so far:
openstack floating ip list -f value | awk '$3 ~ /None/ && $4 ~ /None/{print $2}'
It prints the second field if the 3rd and 4th fields equal "None", which is what I want, but it seems inefficient.
Is there a more concise way to do this with awk?
If you want to check for equality you should use the == (or !=) operator which is more efficient because it doesn't need to perform a regular expression match:
awk '$3 == "None" && $4 == "None" {print $2}'
The rest looks good.
You could do:
awk '$3$4~/None{2}/{print $2}'
but that would produce a false match if $3 contained "NoneNone" and $4 was empty, etc., so YMMV depending on what your data actually contains. You could tweak it to:
awk '$3" "$4" " ~ /(None ){2}/{print $2}'
but it's getting kinda obscure now. The only sensible way to improve your script is to just not use the same hard-coded value multiple times:
awk -v n="None" '$3==n && $4==n{print $2}'
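As a quick check of that last form on a couple of made-up whitespace-separated records (hypothetical IDs, since the real openstack output isn't shown here):
$ printf 'ip1 id1 None None\nip2 id2 None used\n' | awk -v n="None" '$3==n && $4==n{print $2}'
id1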

Removing content of a column based on number of occurences

I have a file (;-separated) with data like this:
111111121;000-000.1;000-000.2
111111211;000-000.1;000-000.2
111112111;000-000.1;000-000.2
111121111;000-000.1;000-000.2
111211111;000-000.1;000-000.2
112111111;000-000.1;000-000.2
121111112;000-000.2;020-000.8
121111121;000-000.2;020-000.8
121111211;000-000.2;020-000.8
121113111;000-000.3;000-200.2
211111121;000-000.1;000-000.2
I would like to remove any $3 that has fewer than 3 occurrences, so the outcome would be like:
111111121;000-000.1;000-000.2
111111211;000-000.1;000-000.2
111112111;000-000.1;000-000.2
111121111;000-000.1;000-000.2
111211111;000-000.1;000-000.2
112111111;000-000.1;000-000.2
121111112;000-000.2;020-000.8
121111121;000-000.2;020-000.8
121111211;000-000.2;020-000.8
121113111;000-000.3
211111121;000-000.1;000-000.2
That is, only that $3 got deleted, as it had only a single occurrence.
Sadly I am not really sure whether (and thus how) this could be done relatively easily (doing the COUNTIF matching and manual deletion in Excel feels quite embarrassing).
$ awk -F';' 'NR==FNR{cnt[$3]++;next} cnt[$3]<3{sub(/;[^;]+$/,"")} 1' file file
111111121;000-000.1;000-000.2
111111211;000-000.1;000-000.2
111112111;000-000.1;000-000.2
111121111;000-000.1;000-000.2
111211111;000-000.1;000-000.2
112111111;000-000.1;000-000.2
121111112;000-000.2;020-000.8
121111121;000-000.2;020-000.8
121111211;000-000.2;020-000.8
121113111;000-000.3
211111121;000-000.1;000-000.2
or if you prefer:
$ awk -F';' 'NR==FNR{cnt[$3]++;next} {print (cnt[$3]<3 ? $1 FS $2 : $0)}' file file
This awk one-liner can help; it processes the file twice:
awk -F';' 'NR==FNR{a[$3]++;next}a[$3]<3{NF--}7' file file
On the second pass, NF-- drops the last field (decrementing NF makes GNU awk rebuild $0 without it, though this is not guaranteed by POSIX), and the trailing 7 is simply an always-true pattern that triggers the default print action.
Though the awk solutions are the best in terms of performance, your goal could also be achieved with something like this:
while IFS=" " read a b;do
  if [[ "$a" -lt "3" ]];then
    sed -i "s/;$b$//" b.txt
  fi
done <<<"$(cut -d";" -f3 b.txt |sort |uniq -c)"
(Note the anchored ;$b$ pattern: it removes the leading semicolon together with the value, and only at the end of the line.)
The loop is driven by the output of cut | sort | uniq -c, which counts the occurrences of each $3 value:
$ cut -d";" -f3 b.txt |sort |uniq -c
7 000-000.2
1 000-200.2
3 020-000.8
The above edits the source file in place, so keep a backup for testing.
You can feed the file to awk twice. On the first pass you gather statistics that you use in the second pass:
script.awk
FNR == NR { stats[ $3 ]++
next
}
{ if( stats[$3] < 3 ) print $1 FS $2
  else print
}
Run it like this:
awk -F\; -f script.awk yourfile yourfile
The condition FNR == NR is true during processing of the first filename given to awk. The next statement skips the second block.
Thus the second block is only used for processing the second filename given to awk (which is here the same as the first filename).