I am trying to implement this logic with Awk:
If $3 is in $1 then replace the “$3 part of $1 and 1 blank space” with “” (blanks).
Print this new line and all other lines.
e.g. In my output (below) “Paris” in field $3 is found in field $1. So “Paris “ is replaced by “” in field $1.
INPUT FILE
field1|field2|field3
abc Paris Match|xxxx|Paris
aaaaa|yyyyy|London
OUTPUT NEEDED
field1|field2|field3
abc Match|xxxx|Paris
aaaaa|yyyyy|London
CODE TRIED (Not working)
awk ' BEGIN{ FS=OFS="|"}
{
{ if ( $1 ~ /$3/ ) { print $0 } }
} '
$ awk 'BEGIN{FS=OFS="|"} s=index($1,$3){$1=substr($1,1,s-1) substr($1,s+length($3)); gsub(/ +/," ",$1)} 1' file
field1|field2|field3
abc Match|xxxx|Paris
aaaaa|yyyyy|London
The above assumes you don't care if any existing sequences of multiple blanks in $1 are also compressed to single blanks.
here is one approach
$ awk 'BEGIN {FS=OFS="|"}
{sub($3,"",$1)}1' file
field1|field2|field3
abc Match|xxxx|Paris
aaaaa|yyyyy|London
note that there are two spaces between remaining words.
Related
I try to change in a file some word by others using sed or awk.
My initial fileA as this format:
>Genus_species|NP_001006347.1|transcript-60_2900.p1:1-843
I have a second fileB with the correspondences like this:
NP_001006347.1 GeneA
XP_003643123.1 GeneB
I am trying to substitute in FileA the name to get this ouput:
>Genus_species|GeneA|transcript-60_2900.p1:1-843
I was thinking to use awk or sed, to do something like
's/$patternA/$patternB/' with a while read l but how to indicate which pattern 1 and 2 are in the fileB? I tried also this but not working.
sed "$(sed 's/^\([^ ]*\) \(.*\)$/s#\1#\2#g/' fileB)" fileA
Awk may be able to do the job more easily?
Thanks
It is easier to this in awk:
awk -v OFS='|' 'NR == FNR {
map[$1] = $2
next
}
{
for (i=1; i<=NF; ++i)
$i in map && $i = map[$i]
} 1' file2 FS='|' file1
>Genus_species|GeneA|transcript-60_2900.p1:1-843
Written and tested with your shown samples, considering that you have only one entry for NP_digits.digits in your Input_fileA then you could try following too.
awk '
FNR==NR{
arr[$1]=$2
next
}
match($0,/NP_[0-9]+\.[0-9]+/) && ((val=substr($0,RSTART,RLENGTH)) in arr){
$0=substr($0,1,RSTART-1) arr[val] substr($0,RSTART+RLENGTH)
}
1
' Input_fileB Input_fileA
Using awk
awk -F [\|" "] 'NR==FNR { arr[$1]=$2;next } NR!=FNR { OFS="|";$2=arr[$2] }1' fileB fileA
Set the field delimiter to space or |. Process fileB first (NR==FNR) Create an array called arr with the first space delimited field as the index and the second the value. Then for the second file (NR != FNR), check for an entry for the second field in the arr array and if there is an entry, change the second field for the value in the array and print the lines with short hand 1
You are looking for the join command which can be used like this:
join -11 -22 -t'|' <(tr ' ' '|' < fileB | sort -t'|' -k1) <(sort -t'|' -k2 fileA)
This performs a join on column 1 of fileB with column 2 of fileA. The tr was used such that fileB also uses | as delimiter because join requires it to be equal on both files.
Note that the output columns are not in the order you specified. You can swap by piping the output into awk.
I have a dataset with 1000 rows and 10 columns. Here is the sample dataset
A,B,C,D,E,F,
a,b,c,d,e,f,
g,h,i,j,k,l,
m,n,o,p,q,r,
s,t,u,v,w,x,
From this dataset I want to copy the rows whose has value of column A as 'a' or 'm' to a new csv file. Also I want the header to get copied.
I have tried using awk. It copied all the rows but not the header.
awk '{$1~/a//m/ print}' inputfile.csv > outputfile.csv
How can I copy the header also into the new outputfile.csv?
Thanks in advance.
Considering that your header will be on 1st row, could you please try following.
awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^a$|^m$/' Input_file > outputfile.csv
OR as per Cyrus sir's comment adding following:
awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^(a|m)$/' Input_file > outputfile.csv
OR as per Ed sir's comment try following:
awk -F, 'NR==1 || $1~/^[am]$/' Input_file > outputfile.csv
Added corrections in OP's attempt:
Added FS and OFS as , here for all lines since lines are comma delimited.
Added FNR==1 condition which means it is checking 1st line here and printing it simply, since we want to print headers in out file. It will print very first line and then next will skip all further statements from here.
Used a better regex for checking 1st field's condition $1 ~ /^a$|^m$/
This might work for you (GNU sed):
sed '1b;/^[am],/!d' oldFile >newFile
Always print the first line and delete any other line that does not beging a, or m,.
Alternative:
awk 'NR==1 || /^[am],/' oldFile >newFile
With awk. Set field separator (FS) to , and output current row if it's first row or if its first column contains a or m.
awk 'NR==1 || $1=="a" || $1=="m"' FS=',' in.csv >out.csv
Output to out.csv:
A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,
$ awk -F, 'BEGIN{split("a,m",tmp); for (i in tmp) tgts[tmp[i]]} NR==1 || $1 in tgts' file
A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,
It appears that awk's default delimiter is whitespace. Link
Changing the delimiter can be denoted by using the FS variable:
awk 'BEGIN { FS = "," } ; { print $2 }'
Imagine if you have a string like this
Amazon.com Inc.:181,37:184,22
and you do awk -F':' '{print $1 ":" $2 ":" $3}' then it will output the same thing.
But can you declare $2 in this example so it only outputs 181 and not ,37?
Thanks in advance!
You can change the field separator so that it contains either : or ,, using a bracket expression:
awk -F'[:,]' '{ print $2 }' file
If you are worried that , may appear in the first field (which will break this approach), you could use split:
awk -F: '{ split($2, a, /,/); print a[1] }' file
This splits the second field on the comma and then prints the first part. Any other fields containing a comma are unaffected.
I have a pipe delimited file where I want to remove all text before a comma in field 9.
Example line:
www.upstate.edu|upadhyap|Prashant K Upadhyaya, MD||General Surgery|http://www.upstate.edu/hospital/providers/doctors/?docID=upadhyap|Patricia J. Numann Center for Breast, Endocrine & Plastic Surgery|Upstate Specialty Services at Harrison Center|Suite D, 550 Harrison Street||Syracuse|NY|13202|
so the targeted field is: |Suite D, 550 Harrison Street|
and I want it to look like: |550 Harrison Street|
So far what I have tried has either deleted information from other fields (usually the name in field 3) or has had no effect.
The .awk script I have been trying to write looks like this:
mv $1 $1.bak4
cat $1.bak4 | awk -F "|" '{
gsub(/*,/,"", $9);
print $0
}' > $1
The pattern argument to gsub is a regex not a glob. Your * isn't matching what you expect it to. You want /.*,/ there. You are also going to need to OFS to | to keep that delimiter.
mv $1 $1.bak4
awk 'BEGIN{ FS = OFS = "|" }{ gsub(/.*,/,"",$9) } 1' $1.bak4 > $1
I also replaced the verbose print line you had with a true pattern (1) that uses the fact that the default action is print.
eg, each row of the file is like :
1, 2, 3, 4,..., 1000
How can print out
1 2 3 4 ... 1000
?
If you just want to delete the commas, you can use tr:
$ tr -d ',' <file
1 2 3 4 1000
If it is something more general, you can set FS and OFS (read about FS and OFS) in your begin block:
awk 'BEGIN{FS=","; OFS=""} ...' file
You need to set OFS (the output field separator). Unfortunately, this has no effect unless you also modify the string, leading the rather cryptic:
awk '{$1=$1}1' FS=, OFS=
Although, if you are happy with some additional space being added, you can leave OFS at its default value (a single space), and do:
awk -F, '{$1=$1}1'
and if you don't mind omitting blank lines in the output, you can simplify further to:
awk -F, '$1=$1'
You could also remove the field separators:
awk -F, '{gsub(FS,"")} 1'
Set FS to the input field separators. Assigning to $1 will then reformat the field using the output field separator, which defaults to space:
awk -F',\s*' '{$1 = $1; print}'
See the GNU Awk Manual for an explanation of $1 = $1