Check if all multiple strings exist in one line - awk

I have a file that contains this info:
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_NETAPP_7890_2D5_1D8
IRE_DRO_Fabric_A drogesx0112_SAN_A
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_ISIL03_081_873
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_NETAPP_7890_9D3_2D8
IRE_DRO_Fabric_B drogesx0112_SAN_B
and I want to check whether multiple strings are found on the same line. I tried the commands below but they are not working. Not sure if it's possible for this kind of text?
grep 'drogesx0112.*ISIL03_091_871\|ISIL03_091_871.*drogesx0112' file << tried this but not working
grep 'drogesx0112' file | grep 'ISIL03_091_871' << tried this but not working
I'm looking for this output (the two strings I'm actually searching for are string1 (drogesx0112) and string2 (ISIL03_091_871)):
>grep 'drogesx0112.*ISIL03_091_871\|ISIL03_091_871.*drogesx0112' file # command
>IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871 < output
So I want to check whether drogesx0112 and ISIL03_091_871 are both present in a single line of the file.

Simple awk
$ awk ' /drogesx0112/ && /ISIL03_091_871/ ' gafm.txt
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
$
Simple Perl
$ perl -ne ' print if /drogesx0112/ and /ISIL03_091_871/ ' gafm.txt
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
$

If the order does not matter and you simply want to check whether both strings are present in a single line, try the following.
awk '/drogesx0112/ && /ISIL03_091_871/' Input_file
In case the strings must appear in a particular order within the line:
If your line has drogesx0112 first and then ISIL03_091_871, try the following.
awk '/drogesx0112.*ISIL03_091_871/' Input_file
If your line has ISIL03_091_871 first and then drogesx0112, try the following.
awk '/ISIL03_091_871.*drogesx0112/' Input_file
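If the search strings might contain regex metacharacters (a dot, a bracket and so on), a literal substring test may be safer than /.../ matching. The following is only a sketch under that assumption: the variable names s1 and s2 and the temporary sample file are made up for illustration, and the sample lines are copied from the question.

```shell
# Sample data copied from the question, written to a throwaway file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
IRE_DRO_Fabric_A drogesx0112_SAN_A
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_ISIL03_081_873
EOF

# index() does a literal substring search and returns 0 when nothing is found,
# so a line is printed only when both strings occur in it, in any order.
# s1 and s2 are hypothetical variable names passed in with -v.
matches=$(awk -v s1='drogesx0112' -v s2='ISIL03_091_871' \
    'index($0, s1) && index($0, s2)' "$tmp")
printf '%s\n' "$matches"

rm -f "$tmp"
```

Only the first sample line contains both strings, so that is the only line printed.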

This might work for you (GNU sed):
sed '/drogesx0112/!d;/ISIL03_091_871/!d' file
Delete the current line if it does not contain drogesx0112, and also delete it if it does not contain ISIL03_091_871.
Another way:
sed -n '/drogesx0112/{/ISIL03_091_871/p}' file
A third:
sed '/drogesx0112/{/ISIL03_091_871/p};d' file

Related

How to print only the lines which have blank lines before and after

How can I print only the lines which have blank line(s) before and after them?
I'm trying various awk and grep combinations but somehow I'm unable to get it.
tuv0657
tuv2330
tuv2133 Unable to get the ssh connection
tuv1988 Unable to get the ssh connection
tuv1049
tuv1683 Unable to get the ssh connection
tuv2101
Desired:
tuv0657
tuv1049
tuv2330
tuv2101
What I tried:
I tried the commands below but did not get the results.
$ awk '{if ($2=="") print $0}' file
$ grep -E --line-number --with-filename '^$'
With your shown samples, please try the following.
awk -v RS="" 'NF==1' Input_file
tuv0657
tuv2330
tuv1049
tuv2101
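The one-liner above relies on awk's paragraph mode: with RS set to the empty string, records are separated by blank lines, and a newline inside a record also separates fields. A small sketch of that behaviour follows. The position of the blank lines in the sample is an assumption, since the blank lines are not visible in the question as shown.

```shell
# Assumed layout of the question's data: blank lines isolate some hosts.
tmp=$(mktemp)
printf '%s\n' \
    'tuv0657' '' \
    'tuv2330' '' \
    'tuv2133 Unable to get the ssh connection' \
    'tuv1988 Unable to get the ssh connection' '' \
    'tuv1049' '' \
    'tuv1683 Unable to get the ssh connection' '' \
    'tuv2101' > "$tmp"

# RS='' switches awk to paragraph mode: each blank-line-separated block is one
# record. NF == 1 keeps only blocks made of a single word, i.e. a lone host name.
isolated=$(awk -v RS='' 'NF == 1' "$tmp")
printf '%s\n' "$isolated"

rm -f "$tmp"
```

Blocks that hold an error message, or more than one line, have more than one field and are skipped.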
Based on your example data, try the following awk solution:
awk '$1 !="" && $2 == "" { print }' file
Where the first space separated field is not blank and the second field is blank, print the line.
This might work for you (GNU sed):
sed -nE '/^\S+$/p' file
Print the current line if it only contains the first field.

With sed or awk, move line matching pattern to bottom of file

I have a similar problem. I need to move a line in /etc/sudoers to the end of the file.
The line I am wanting to move:
#includedir /etc/sudoers.d
I have tried with a variable
#creates variable value
templine=$(cat /etc/sudoers | grep "#includedir /etc/sudoers.d")
#delete value
sed '/"${templine}"/d' /etc/sudoers
#write value to the bottom of the file
cat ${templine} >> /etc/sudoers
Not getting any errors nor the result I am looking for.
Any suggestions?
With awk:
awk '$0=="#includedir /etc/sudoers.d"{lastline=$0;next}{print $0}END{print lastline}' /etc/sudoers
That says:
If the line $0 is "#includedir /etc/sudoers.d", then set the variable lastline to this line's value $0 and skip to the next line with next.
If you are still here, print the line with {print $0}.
Once every line in the file is processed, print whatever is in the lastline variable.
Example:
$ cat test.txt
hi
this
is
#includedir /etc/sudoers.d
a
test
$ awk '$0=="#includedir /etc/sudoers.d"{lastline=$0;next}{print $0}END{print lastline}' test.txt
hi
this
is
a
test
#includedir /etc/sudoers.d
You could do the whole thing with sed:
sed -e '/#includedir .etc.sudoers.d/ { h; $p; d; }' -e '$G' /etc/sudoers
This might work for you (GNU sed):
sed -n '/regexp/H;//!p;$x;$s/.//p' file
This removes line(s) containing a specified regexp and appends them to the end of the file.
To only move the first line that matches the regexp, use:
sed -n '/regexp/{h;$p;$b;:a;n;p;$!ba;x};p' file
This uses a loop to read/print the remainder of the file and then append the matched line.
If you have multiple entries which you want to move to the end of the file, you can do the following:
awk '/regex/{a[++c]=$0;next}1;END{for(i=1;i<=c;++i) print a[i]}' file
or
sed -n '/regex/!{p;ba};H;:a;${x;s/.//;p}' file
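None of the commands above changes the file itself; they only write the reordered text to standard output. One way to apply the result is to write to a temporary file and rename it over the original, roughly as sketched here. The sketch works on a throwaway copy with made-up content, and the file names sudoers.copy and sudoers.new are hypothetical. For the real /etc/sudoers, checking the new file with visudo -c -f before replacing anything would be a sensible precaution.

```shell
# Work on a scratch copy, never on the live /etc/sudoers.
work=$(mktemp -d)
printf '%s\n' 'hi' '#includedir /etc/sudoers.d' 'a' 'test' > "$work/sudoers.copy"

# Hold back every exact match, print everything else, then emit the held
# lines at the end. The rename only happens if awk succeeded.
awk '$0 == "#includedir /etc/sudoers.d" { held[++n] = $0; next }
     { print }
     END { for (i = 1; i <= n; i++) print held[i] }' \
    "$work/sudoers.copy" > "$work/sudoers.new" &&
    mv "$work/sudoers.new" "$work/sudoers.copy"

result=$(cat "$work/sudoers.copy")
printf '%s\n' "$result"

rm -rf "$work"
```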

Append prefix to first column of a file with awk

I have a couple of hundreds of files which I want to process with xargs. They all need a fix of their first column.
Therefore I need an awk command to append the prefix "ID_" to the first column of a file (except for the first header line). Can anyone help me with this?
Something along the line:
gawk -f ';' "{$1='ID_' $1; print $0}" file.csv > file_processed.csv
I am not expert for the command, though. And I would rather like to have some inplace processing instead of making a copy of each file. Beforehand, I made it in VIM, but then I only had one file.
:%s/^-/ID_/
I hope someone can help me here.
gawk 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv > file_processed.csv
FS and OFS set the input and output field separators, respectively.
NR>1 checks whether current line number is larger than 1, so we don't modify the header line.
You can also modify the file in place with -i inplace option:
gawk -i inplace 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv
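If gawk's inplace extension is not available, roughly the same effect can be had with a temporary file per input, which also fits the "several hundred files" case. This is a sketch only: the directory, the sample CSV content and the .tmp suffix are invented for illustration.

```shell
# Scratch directory with one hypothetical semicolon-separated file.
work=$(mktemp -d)
printf '%s\n' 'name;value' 'a1;10' 'b2;20' > "$work/file.csv"

# For each CSV: prefix field 1 on every line except the header (NR > 1),
# write to a temporary file, and replace the original only on success.
for f in "$work"/*.csv; do
    awk 'BEGIN { FS = OFS = ";" } NR > 1 { $1 = "ID_" $1 } { print }' "$f" > "$f.tmp" &&
        mv "$f.tmp" "$f"
done

prefixed=$(cat "$work/file.csv")
printf '%s\n' "$prefixed"

rm -rf "$work"
```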
Edit
After elaborating the original question, here's the final version:
gawk -i inplace 'BEGIN{FS=OFS=";"} NR>1{sub(/^-/,"ID_",$2)} 1' file.csv
which substitutes a - at the beginning of the second column with ID_.
The NR>1 action applies to all but the first (header) line. 1 invokes the default print action.
If you just want to do something to the first field, in particular add a prefix, that is no different from adding the prefix to the whole line.
So awk '$0 = "ID_" $0' file.csv should do the work. If you want to change the file in place, you can:
awk '$0="ID_"$0' file.csv >/tmp/foo && mv /tmp/foo file.csv
You can also make use of sed:
sed -i 's/^/ID_/' file
The -i option does in-place modification.
You mentioned vim and gave the s/^-/ID_/ command. It doesn't add the prefix (ID_); it replaces the leading - with ID_, which is a different thing.

Inserting characters into specified fields in large files

Here is my question. I have several hundred files that are too large to edit utilizing vi editor. I'm looking for a possible awk or sed command to manipulate my files. Bit of a rookie. I have a simplified file:
001|1|3|053412|16|1234|||
001|21|4|123618|15|88|||
The files were created, with the fourth field being in the wrong format.
The fourth field should be 05:34:12 reflecting HH:MM:SS. The time values are correct, I just need to insert the : in the appropriate places.
How do I insert the colons after the second character and the fourth characters in the fourth field? I cannot do it by character count since the values before and after the fourth field are variable.
gawk to the rescue!
$ awk -F\| -v OFS=\| '{$4=gensub(/(..)(..)(..)/,"\\1:\\2:\\3","g",$4)}1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
otherwise you can do the same with substr($4,1,2)":"...
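Spelled out, the substr() variant hinted at above could look something like this. It avoids gensub(), so it should also run on non-GNU awks. The length check is my own addition, on the assumption that only six-character values should be touched.

```shell
# The two sample records from the question.
tmp=$(mktemp)
printf '%s\n' '001|1|3|053412|16|1234|||' '001|21|4|123618|15|88|||' > "$tmp"

# Rebuild field 4 as HH:MM:SS from three two-character slices.
# length($4) == 6 is an assumed guard so shorter or longer values stay as-is.
fixed=$(awk 'BEGIN { FS = OFS = "|" }
    length($4) == 6 { $4 = substr($4, 1, 2) ":" substr($4, 3, 2) ":" substr($4, 5, 2) }
    { print }' "$tmp")
printf '%s\n' "$fixed"

rm -f "$tmp"
```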
With GNU awk for gensub() and inplace editing
awk -i inplace 'BEGIN{FS=OFS="|"} {$4=gensub(/(..)(..)/,"\\1:\\2:",1,$4)} 1' *
Similarly with GNU sed for EREs and inplace editing:
sed -i -E 's/(([^|]*\|){3}..)(..)/\1:\3:/' *
e.g.:
$ awk 'BEGIN{FS=OFS="|"} {$4=gensub(/(..)(..)/,"\\1:\\2:",1,$4)} 1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
$ sed -E 's/(([^|]*\|){3}..)(..)/\1:\3:/' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
With sed:
$ sed -r 's/([^|]*\|[^|]*\|[^|]*\|)([0-9]{2})([0-9]{2})([0-9]{2})/\1\2:\3:\4/' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
This might work for you (GNU sed):
sed -r 's/^(([^|]*\|){3})(..)(..)/\1\3:\4:/' file
Use back references to group the first three fields and the following two pairs. Then format the 4th field as required.
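A slightly stricter take on the same back-reference idea, as a sketch: it only rewrites the fourth field when that field is exactly six digits. The third sample line, with a four-digit value, is invented here to show that such a line is left alone.

```shell
# Two records from the question plus one hypothetical malformed record.
tmp=$(mktemp)
printf '%s\n' \
    '001|1|3|053412|16|1234|||' \
    '001|21|4|123618|15|88|||' \
    '002|5|6|0534|1|2|||' > "$tmp"

# \1 is the first three fields with their separators; \3, \4 and \5 are the
# digit pairs. The trailing \| requires the field to end after six digits.
strict=$(sed -E 's/^(([^|]*\|){3})([0-9]{2})([0-9]{2})([0-9]{2})\|/\1\3:\4:\5|/' "$tmp")
printf '%s\n' "$strict"

rm -f "$tmp"
```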
Try this awk!
awk -F"|" -v OFS="|" '{r=split($4,T,"");for(i=2;i<=r;i+=2){if(i!=r)T[i]=T[i]":"}tmp="";for(i=1;i<=r;i++){tmp=tmp T[i]}$4=tmp;}1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
Longer, with explanation:
BEGIN {
    FS=OFS="|";                             # Field separator and output field separator
}
{
    tmp="";
    r=split($4,time_field,"");              # Split the field into single characters
    for(i=2;i<=r;i+=2)                      # Step through it two by two
    {
        if(i!=r)
        {
            time_field[i]=time_field[i]":"; # Add ":" except after the last pair
        }
    }
    for(i=1;i<=r;i++)                       # Loop over again to rebuild
    {
        tmp=tmp time_field[i];
    }
    $4=tmp;                                 # Rebuild the field
    print
}
How you could use it in bash: save the script above as whatever.awk, then loop over the list of files, writing each result to its own output file:
while IFS='' read -r file
do
    awk -f whatever.awk "$file" > "$file.out"
done < list_of_files_to_edit.txt
If you want to edit the files in place, you can add the -i option to Kenavoz's sed command: sed -ri ...

Concatenating lines using awk

I have a fasta file that contains two gene sequences. What I want to do is remove the fasta headers (lines starting with ">"), concatenate the rest of the lines and output that sequence.
Here is my fasta sequence (genome.fa):
>Potrs164783
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGAT
TGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAA
CTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAA
>Potrs164784
TTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCC
GGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
Desired output
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGAT
TGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAA
CTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAA
TTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCC
GGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
I am using awk to do this but I am getting this error
awk 'BEGIN{filename="file1"}{if($1 ~ />/){filename=$1; sub(/>/,"",filename); print filename;} print $0 >filename.fa;}' ../genome.fa
awk: syntax error at source line 1
context is
BEGIN{filename="file1"}{if($1 ~ />/){filename=$1; sub(/>/,"",filename); print filename;} print $0 >>> >filename. <<< fa;}
awk: illegal statement at source line 1
I am basically a Python person and I was given this script by someone. What am I doing wrong here?
I realized that I was not clear, so I am pasting the whole code that I got from someone. The input file and desired output remain the same.
mkdir split_genome;
cd split_genome;
awk 'BEGIN{filename="file1"}{if($1 ~ />/){filename=$1; sub(/>/,"",filename); print filename;} print $0 >filename.fa;}' ../genome.fa;
ls -1 `pwd`/* > ../scaffold_list.txt;
cd ..;
If all you want to do is produce the desired output shown in your question, other solutions will work.
However, the script you have is trying to print each sequence to a file that is named using its header, and the extension .fa.
The syntax error you're getting is because filename.fa is neither a variable nor a fixed string. No Awk will allow you to print to filename.fa, because it is neither in quotes nor a variable (variable names can't have a . in them). In addition, BSD Awk does not allow string concatenation in an unparenthesized redirection target, whereas GNU Awk does.
So the solution:
print $0 > filename".fa"
would produce the same error in BSD Awk, but would work in GNU Awk.
To fix this, you can append the extension ".fa" to filename at assignment.
This will do the job:
$ awk '{if($0 ~ /^>/) filename=substr($0, 2)".fa"; else print $0 > filename}' file
$ cat Potrs164783.fa
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGAT
TGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAA
CTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAA
$ cat Potrs164784.fa
TTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCC
GGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
You'll notice I left out the BEGIN{filename="file1"} declaration statement as it is unnecessary. Also, I replaced the need for sub(...) by using the string function substr as it is more clear and requires fewer actions.
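Another way around the BSD/GNU difference, as far as I know, is to put the concatenation in parentheses, so the redirection target is parsed as a single expression. A minimal sketch follows, using invented headers seqA and seqB rather than the real data, and assuming each header is a plain name that is safe to use as a file name.

```shell
# Tiny hypothetical FASTA file in a scratch directory.
work=$(mktemp -d)
printf '%s\n' '>seqA' 'AAAA' 'CCCC' '>seqB' 'GGGG' > "$work/genome.fa"

(
    cd "$work" &&
    # On a header line: close the previous output (keeps the number of open
    # files low), remember the new name, and skip printing the header itself.
    # On any other line: print to name.fa, with the target in parentheses.
    awk '/^>/ { if (name != "") close(name ".fa"); name = substr($0, 2); next }
         name != "" { print > (name ".fa") }' genome.fa
)

seq_a=$(cat "$work/seqA.fa")
seq_b=$(cat "$work/seqB.fa")
printf '%s\n---\n%s\n' "$seq_a" "$seq_b"

rm -rf "$work"
```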
The awk code that you show attempts to do something different than produce the output that you want. Fortunately, there are much simpler ways to obtain your desired output. For example:
$ grep -v '>' ../genome.fa
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGAT
TGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAA
CTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAA
TTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCC
GGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
Alternatively, if you had intended to have all non-header lines concatenated into one line:
$ sed -n '/^>/!H; $!d; x; s/\n//gp' ../genome.fa
AGGAAGTGTGAGATTGAAAAAACATTACTATTGAGGAATTTTTGACCAGATCAGAATTGAACCAACATGATGAAGGGGATTGTTTGCCATCAGAATATGGCATGAAATTTCTCCCCTAGATCGGTTCAAGCTCCTGTAGGTTTGGAGTCCTTAGTGAGAACTTTCTTAAGAGAATCTAATCTGGTCTGTTCCTCGTCATAAGTTAAAGAAAAACTTGAAACAAATAACAAGCATGCATAATTACCCTCTACCAGCACCAATGCCTATGATCTTACAAAAATCCTTAATAAAAAGAAATCCAAAACCATTGTTACCATTCCGGAATTACATTCTGAGATAAAAACCCTCAAATCTGAATTACAATCCCTTAAACAAGCCCAACAAAAAGACTCTGCCATAC
Try this to print the lines that do not start with >, all on one line:
awk '!/^>/{printf "%s", $0}' genome.fa > filename.fa
With the line breaks kept:
awk '!/^>/' genome.fa > filename.fa
To create single files named by the headers:
awk 'split($0,a,"^>")>1{file=a[2];next}{print >file}' genome.fa
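One detail about the single-line variant: printf without "\n" never ends the output with a newline, which some tools expect. A sketch that adds one in an END block, on invented sample data:

```shell
# Tiny hypothetical FASTA file.
tmp=$(mktemp)
printf '%s\n' '>seqA' 'AAAA' 'CCCC' '>seqB' 'GGGG' > "$tmp"

# Join all non-header lines; "%s" keeps any % in the data from being read as
# a format directive. END prints the single closing newline.
joined=$(awk '!/^>/ { printf "%s", $0 } END { print "" }' "$tmp")
printf '%s\n' "$joined"

rm -f "$tmp"
```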