Delete a line that contains an occurrence in the first or second column - awk

I would like to delete every line that contains a given string in the first or second column (separator: \t). For example:
line 1 uni:1 uni:2 blabla blabla
line 2 uni:3 EBI:1 blbla blabla
I want to delete line 2. The "blabla" text can also contain the string (EBI), but I don't want to match on the rest of the text, only on the first two columns.
I tried: awk -F "\t" '{print $1 $2}' file1 | grep -v EBI > file2
but this stores only the first and second columns, not the entire line.
I also tried: awk -F "\t" '{print $1 $2}' file1 | grep -n EBI
and sed "numberOfLined" file1 > file2
But there are many occurrences, so I don't want to type all the line numbers by hand.

You can use an if statement and regex matching via ~:
awk -F '\t' '{if (! (($1 ~ ".*EBI.*") || ($2 ~ ".*EBI.*"))) {print $0} }' file1 > file2
Thanks to the comments, this can be written more concisely (keep -F '\t', otherwise awk splits on any whitespace):
awk -F '\t' '!($1~/EBI/ || $2~/EBI/)' file1 > file2
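To check the behaviour on a small sample, here is a minimal sketch. The file name sample.tsv and its three rows are made up for illustration; the third row has EBI only in a later column, so it should be kept.

```shell
# Hypothetical tab-separated sample data, not taken from the question.
tmp=$(mktemp -d)
printf 'uni:1\tuni:2\tblabla\n'     >  "$tmp/sample.tsv"
printf 'uni:3\tEBI:1\tblabla\n'     >> "$tmp/sample.tsv"
printf 'uni:4\tuni:5\tEBI blabla\n' >> "$tmp/sample.tsv"

# Keep a line only if neither of the first two columns contains EBI.
kept=$(awk -F '\t' '!($1 ~ /EBI/ || $2 ~ /EBI/)' "$tmp/sample.tsv")
printf '%s\n' "$kept"
rm -rf "$tmp"
```

Only the row with EBI:1 in its second column is dropped; the row that mentions EBI in its third column survives because that field is never tested.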

Related

awk conditional statement based on a value between colon

I was just introduced to awk and I'm trying to retrieve rows from my file based on the value in column 10.
I need to filter the data on the third value of column 10 (the last column) when that column is split on ":".
Here is an example value from column 10: 0/1:1,9:10:15:337,0,15
I was able to extract the third value using this command: awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
This returns the value 10, but how can I return the whole row (not just the value in column 10) if this third value is less than or greater than a specific number?
I tried this: awk '{if($10 -F ":" "/1/ ($3<10))" print $0;}' file.txt but it returns a syntax error.
Thanks!
Your code:
awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
should be just 1 awk script:
awk '$10 ~ /1/ { split($10,f,/:/); print f[3] }' file.txt
but I'm not sure that code is doing what you think it does. If you want to print the 3rd value of all $10s that contain :s, as it sounds like from your text, that'd be:
awk 'split($10,f,/:/) > 1 { print f[3] }' file.txt
and to print the rows where that value is less than 7 would be:
awk '(split($10,f,/:/) > 1) && (f[3] < 7)' file.txt
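If the threshold should not be hard-coded, it can be passed in with -v. This is only a sketch: the two sample rows and the variable name max are invented for illustration, and the column-10 values mimic the format shown in the question.

```shell
# Hypothetical sample with 10 whitespace-separated columns per row.
tmp=$(mktemp -d)
printf 'c1 c2 c3 c4 c5 c6 c7 c8 c9 0/1:1,9:10:15:337,0,15\n' >  "$tmp/file.txt"
printf 'c1 c2 c3 c4 c5 c6 c7 c8 c9 0/1:4,2:6:20:100,0,20\n'  >> "$tmp/file.txt"

# max is an illustrative variable name; +0 forces a numeric comparison.
low=$(awk -v max=7 'split($10, f, /:/) > 1 && (f[3] + 0) < max' "$tmp/file.txt")
printf '%s\n' "$low"
rm -rf "$tmp"
```

Here only the second row is printed, since its third ":"-separated value (6) is below 7 while the first row's value (10) is not. Swapping < for > selects the rows above the threshold instead.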

How to exclude lines matching a regex pattern in a column by awk?

I want to exclude lines containing a specific string.
header
1:test
2:test
3:none
4:test
Why don't these commands work?
awk -F: 'FNR>1 {$0 !~ /none/} {print $1}' 1.txt
awk -F: 'FNR>1 {$2 !~ /none/} {print $1}' 1.txt
but this works:
awk '$0 !~ /none/ {print $0}' 1.txt
I intend to get
1
2
4
You need to provide the regex test as a condition, not as an action. You may use either of:
awk -F: 'FNR>1 && !/none/{print $1}' file
awk -F: 'FNR>1 && $2 !~ /none/{print $1}' file
Details
-F: - sets the field separator to a colon
FNR>1 && !/none/ - true if the record number within the current file is greater than 1 and the line does not contain none (with $2 !~ /none/ instead, true if Field 2 does not match the none pattern)
{print $1} - print Field 1 value.
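To see why the original attempt prints every line, the two forms can be run side by side. This is a small sketch using the sample data from the question; the variable names wrong and right are just for illustration.

```shell
tmp=$(mktemp -d)
printf 'header\n1:test\n2:test\n3:none\n4:test\n' > "$tmp/1.txt"

# The comparison inside { } is evaluated and its result discarded; the
# second block has no condition, so it prints field 1 of every line.
wrong=$(awk -F: 'FNR>1 {$0 !~ /none/} {print $1}' "$tmp/1.txt")

# With the test as the pattern, only matching lines reach the action.
right=$(awk -F: 'FNR>1 && $2 !~ /none/ {print $1}' "$tmp/1.txt")

printf 'wrong:\n%s\nright:\n%s\n' "$wrong" "$right"
rm -rf "$tmp"
```

The first command prints header, 1, 2, 3 and 4, while the second prints only 1, 2 and 4.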

awk to extract data from a file, file name is dump below

I am using awk to extract an IP address, but I get a lot of whitespace. How can I get rid of it?
I am using the command below:
awk -F'f5public_ip =' '{print $2}' examples/aws/dump >> some.txt
shitole$ cat some.txt
54.83.174.153
Test that $2 is not empty:
awk -F'f5public_ip =' '$2 != "" {print $2}' examples/aws/dump
$2 might contain only blanks; in that case, test for non-blank characters:
awk -F'f5public_ip =' '$2 ~ /[^[:blank:]]/ {print $2}' examples/aws/dump
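If the value itself should also come out without surrounding blanks, it can be trimmed before printing. The sketch below makes assumptions: the contents of dump are invented (the real file format is not shown in the question), and the blank class is written as [ \t], which matches the same characters as [[:blank:]] in the POSIX locale.

```shell
# Hypothetical dump contents; the real file format is an assumption.
tmp=$(mktemp -d)
printf 'instance_id = i-0abc\nf5public_ip = 54.83.174.153\nregion = us-east-1\n' > "$tmp/dump"

# Print $2 only when it has a non-blank character, after stripping the
# leading and trailing blanks around the value.
ip=$(awk -F'f5public_ip =' '$2 ~ /[^ \t]/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)
    print $2
}' "$tmp/dump")
printf '%s\n' "$ip"
rm -rf "$tmp"
```

Lines that do not contain the separator have an empty $2, which is where the blank output lines in the original command come from; the condition skips them.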

how to get the common rows according to the first column in awk

I have two ',' separated files as follow:
file1:
A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613
file2:
A,inf
B,inf
C,0.313559
D,189.5
E,38.6735
I want to compare the 2 files and get the common rows based on the 1st column. For the files above, the output would look like this:
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735
I am trying to do that in awk and tried this:
awk ' NR == FNR {val[$1]=$2; next} $1 in val {print $1, val[$1], $2}' file1 file2
This code returns these results:
A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613
which is not what I want. Do you know how I can improve it?
$ awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=$0;next}$1 in a{print a[$1],$2}' file1 file2
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735
Explained:
$ awk '
BEGIN { FS=OFS="," }    # set separators
NR==FNR {               # first file
    a[$1]=$0            # hash to a, $1 as index
    next                # next record
}
$1 in a {               # second file, if $1 in a
    print a[$1],$2      # print indexed record from a with $2
}' file1 file2
Your awk code basically works; you just need to tell awk to use , as the field delimiter. You can do that by adding BEGIN{FS=OFS=","} to the beginning of the script.
Since the files are sorted, as in the examples in your question, you can also simply use the join command:
join -t, file1 file2
This will join the files based on the first column. -t, tells join that columns are separated by commas.
If the files are not sorted, you can sort them on the join field on the fly (process substitution needs bash or a similar shell):
join -t, <(sort -t, -k1,1 file1) <(sort -t, -k1,1 file2)
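One difference between the two approaches is that the awk version does not need sorted input. A minimal sketch, using a shortened and deliberately reordered copy of the sample data (the Z row is invented to show a key that exists in only one file):

```shell
tmp=$(mktemp -d)
printf 'A,inf\nB,inf\nC,0.135802\n' > "$tmp/file1"
# file2 is deliberately unsorted and has a key (Z) missing from file1.
printf 'C,0.313559\nZ,1\nA,inf\n' > "$tmp/file2"

joined=$(awk 'BEGIN { FS = OFS = "," }
    NR == FNR { a[$1] = $0; next }   # first file: remember each row by key
    $1 in a   { print a[$1], $2 }    # second file: print rows with a known key
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

The output follows the order of the second file, so C comes before A here, and both B (missing from file2) and Z (missing from file1) are left out.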

output the record by comparing string in other file using awk script

I want to output the records that have a matching string in another file, using an awk script.
file1:
849002|48|1208004|1
849007|28|1208004|1
855003|48|1208004|1
855004|28|1208004|1
855006|28|1208004|1
file2:
00990029000000004804470425|ST1400029|0.550|Recurring|1248073|ST1400029
00990029000000008410517183|IM1450029|1.000|Recurring|855003|ST1400029
009900290000000000007800612988|IM3350029|1.000|Recurring|1248063|ST1400029
Notice that 855003 occurs in the middle row of each sample? That's the match I'm looking for, and the output should be:
00990029000000008410517183|IM1450029|1.000|Recurring|855003|ST1400029
I want to search for $1 of file1 in $5 of file2 and, if a match is found, output the line.
I tried this, but it returns zero records:
awk 'NR==FNR{a[$1]=$1;next}a[$5]{print $0}' file2 file1 > outfile
Your help will resolve my issue; I have to search a long list of data.
Don't forget to set the delimiter using the -F flag:
awk -F "|" 'FNR==NR { a[$1]; next } $5 in a' file1 file2
Results:
00990029000000008410517183|IM1450029|1.000|Recurring|855003|ST1400029
Try this (untested); note that the | delimiter has to be set with -F:
awk -F'|' 'NR==FNR{a[$1];next}$5 in a' file1 file2
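The attempt in the question most likely returns nothing for two reasons: without -F the lines are not split on |, so $5 is empty, and the files are given in the opposite order, so the keys are never loaded from file1. A small sketch of the working form, with shortened, made-up record IDs (0099) and an illustrative array name seen:

```shell
tmp=$(mktemp -d)
printf '849002|48|1208004|1\n855003|48|1208004|1\n' > "$tmp/file1"
# Hypothetical, shortened records; only field 5 matters for the lookup.
printf '0099|ST1400029|0.550|Recurring|1248073|ST1400029\n' >  "$tmp/file2"
printf '0099|IM1450029|1.000|Recurring|855003|ST1400029\n'  >> "$tmp/file2"

# Load the keys from file1 first, then filter file2 on its 5th field.
matches=$(awk -F '|' 'FNR == NR { seen[$1]; next } $5 in seen' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$matches"
rm -rf "$tmp"
```

Only the record whose fifth field is 855003 is printed. Referencing seen[$1] without assigning a value is enough to create the key, which keeps memory use low for a long list.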