I've tried ls *.fasta | parallel --gnu "awk '{print $1}' > {/.}.outputfile.txt"
and it's not producing the result I need. I have 48 files from which I need to extract this field, writing the results to 48 independent output files.
I can run this just fine, but I have to do it for each file, one by one: awk '{print $1}' BLAST_output_file.txt > ID_BLAST_output_file.txt
Can someone help me out here? Thanks
Could you please try the following.
awk '{if(FILENAME!=prev){close(prev)};print $1;prev=FILENAME}' *.fasta > output_all_file
In case you need a different output file per input (as seen from your attempt):
awk '{if(FILENAME!=prev){close(prev)};print $1 > (FILENAME".id.blast.out.txt");prev=FILENAME}' *.fasta
Add {} so that parallel passes each filename to awk:
ls *.fasta | parallel --gnu "awk '{print $1}' {} > {/.}.outputfile.txt"
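One caveat worth noting (an observation, not part of the original answer): because the awk program sits inside the shell's double quotes, the calling shell expands $1 before parallel ever sees it. A minimal sketch of the pitfall, no parallel required:

```shell
# Inside double quotes the calling shell expands $1 (usually empty
# outside a script), so awk would receive '{print }' and echo whole
# lines; escaping it as \$1 delivers a literal $1 to awk, which then
# prints only the first field.
printf '%s\n' 'id1 seqA' 'id2 seqB' > demo.fasta

sh -c "awk '{print $1}' demo.fasta"    # whole lines (shell ate $1)
sh -c "awk '{print \$1}' demo.fasta"   # first field only
```

So in the parallel invocation it is safer to write `\$1` inside the double-quoted command string.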
I just wrote a simple bash script:
for i in *.txt; do
awk '{print $1}' "$i" > "$i.id.blast.out.txt"
done
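A small hedged variant of the loop above: quoting "$i" protects filenames containing spaces, and ${i%.txt} strips the extension so the output isn't named sample.txt.id.blast.out.txt (the sample file here is hypothetical):

```shell
# Hypothetical sample input for the sketch
printf 'ID1 rest of line\n' > sample.txt

# Quote "$i" for safety; ${i%.txt} drops the .txt suffix before
# appending the new extension.
for i in *.txt; do
    awk '{print $1}' "$i" > "${i%.txt}.id.blast.out.txt"
done
```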
My input file looks like this:
1,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,,kapeta,,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,C
2,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,20181006T11:57:13+0100,,vsuser,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,H
1,,F1,100000,RAWBANK,UNI,20180416T15:25:00+0100,20190416T23:59:59+0100,,enrruac,,7522609506635,3101315044,998445487,1290161608965816,20181102_20001,,,,,,,,,,,,,,,C
4,,F1,100000,RAWBANK,UNI,20180416T15:25:00+0100,20190416T23:59:59+0100,20181007T22:25:13+0100,,vsuser,7522609506635,3101315044,998445487,1290161608965816,20181102_20001,,,,,,,,,,,,,,,H
I want to print only the lines that start with '1' and end with 'C', so I am trying the command below:
awk -F, '$1=='1' && $31=='C'{print $0}' input_file.txt
but I am not getting any output.
Use double quotes. In your version the inner single quotes close and reopen the quoted string, so awk actually receives $1==1 && $31==C; there C is an uninitialized variable (empty string), so the second comparison never matches:
awk -F, '$1=="1" && $31=="C"{print $0}' file
or
awk -F, '$1=="1" && $31=="C"' file
As other users suggested, this can also be done with a simple regex, so sed works as well as awk:
sed '/^1,.*,C$/!d' file
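For a quick side-by-side check, here is a sketch on a simplified three-column sample (the real file has 31 columns, hence $31 above; the grep variant is an aside, not from the answers):

```shell
# Simplified stand-in for the real 31-column file
printf '1,,C\n2,,H\n1,,H\n' > sample.csv

awk -F, '$1=="1" && $3=="C"' sample.csv   # field comparison
sed '/^1,.*,C$/!d' sample.csv             # regex, delete non-matches
grep '^1,.*,C$' sample.csv                # same regex with grep
```

All three print only the line `1,,C`.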
I've got a file with following records:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt;2;CLI001
depots/import/HDN1YYAA_20102018.txt;32;CLI001
depots/import/HDN1YYAA_25102018.txt;1;CAB001
depots/import/HDN1YYAA_50102018.txt;1;CAB001
depots/import/HDN1YYAA_65102018.txt;1;CAB001
depots/import/HDN1YYAA_80102018.txt;2;CLI001
depots/import/HDN1YYAA_93102018.txt;2;CLI001
When I execute the following awk one-liner:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR==1){print $1}}END {}'
the output is not the expected:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
while I was expecting to get only the first column.
If I run it over all the records:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR>0){print $1}}END {}'
then it starts splitting only from the second line, and I get the following output:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_25102018.txt
depots/import/HDN1YYAA_50102018.txt
depots/import/HDN1YYAA_65102018.txt
depots/import/HDN1YYAA_80102018.txt
depots/import/HDN1YYAA_93102018.txt
Does anybody know why awk is skipping the first line only?
I tried deleting the first record, but the behaviour is the same: it still skips the first line.
First, it should be as follows. In your command, FS=";" is a pattern that is evaluated for every record, but field splitting happens when each record is read, so the first line has already been split on the default whitespace by the time FS changes:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}END {}' filename
You can omit the END block if it is empty:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}' filename
You can use the -F command line argument to set the field delimiter:
awk -F';' '{if(NR==1){print $1}}' filename
Furthermore, awk programs consist of a sequence of CONDITION [{ACTIONS}] elements, so you can omit the if:
awk -F';' 'NR==1 {print $1}' filename
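To make the timing issue concrete, here is a minimal sketch (sample file name is made up): the assignment in the main rule only takes effect for records read after it runs, so line 1 is still split on whitespace.

```shell
printf 'a;b\nc;d\n' > fs_demo.txt

# FS=";" as a pattern: line 1 was already split with the default FS,
# so $1 is the whole line; line 2 onward is split on ";".
awk 'FS=";" {print $1}' fs_demo.txt   # a;b then c

# -F';' sets FS before any record is read, so every line splits right.
awk -F';' '{print $1}' fs_demo.txt    # a then c
```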
You need to specify the delimiter in either a BEGIN block or as a command-line option:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}'
awk -F ';' '{ if(NR==1){print $1}}'
cut might be better suited here. For all lines:
$ cut -d';' -f1 file
to skip the first line
$ sed 1d file | cut -d';' -f1
to get the first line only
$ sed 1q file | cut -d';' -f1
However, at this point it's better to switch to awk: if you have a large file and are only interested in the first line, it's better to exit early.
$ awk -F';' '{print $1; exit}' file
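A runnable sketch of the three variants on a two-record stand-in for the file (file name and contents here are hypothetical):

```shell
printf 'one;1;AAA\ntwo;2;BBB\n' > records.txt

cut -d';' -f1 records.txt                  # all lines: one, two
sed 1d records.txt | cut -d';' -f1         # skip first: two
awk -F';' '{print $1; exit}' records.txt   # first only, exits early: one
```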
I would like to grab the part after the "-" and combine it with the following letter string into tab-separated output. I tried something like cut -d "*-" -f 2 <<< "$your_str", but I am not sure how to do the whole shuffling.
Input:
>1-395652
TATTGCACTTGTCCCGGCCTGT
>2-369990
TATTGCACTCGTCCCGGCCTCC
>3-132234
TATTGCACTCGTCCCGGCCTC
>4-122014
TATTGCACTTGTCCCGGCCTGTAA
>5-118616
Output:
TATTGCACTTGTCCCGGCCTGT 395652
TATTGCACTCGTCCCGGCCTCC 369990
awk to the rescue!
awk -F- '/^>/{k=$2; next} {print $0, k}' file
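The same one-liner run on a two-record sample (a sketch; the real input also ends with a lone >5-118616 header that has no sequence line and therefore produces no output row):

```shell
printf '>1-395652\nTATTGCACTTGTCCCGGCCTGT\n>2-369990\nTATTGCACTCGTCCCGGCCTCC\n' > reads.fa

# With -F- the header ">1-395652" splits into ">1" and "395652", so on
# header lines we remember $2; on sequence lines we print the sequence
# followed by the remembered number.
awk -F- '/^>/{k=$2; next} {print $0, k}' reads.fa
```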
With GNU sed:
sed -nE 'N;s/.*-([0-9]+)\n(.*)/\2\t\1/p' file
Output:
TATTGCACTTGTCCCGGCCTGT 395652
TATTGCACTCGTCCCGGCCTCC 369990
TATTGCACTCGTCCCGGCCTC 132234
TATTGCACTTGTCCCGGCCTGTAA 122014
Portable sed:
sed -n 's/.*-//;x;n;G;s/\n/ /p' inputfile
Output:
TATTGCACTTGTCCCGGCCTGT 395652
TATTGCACTCGTCCCGGCCTCC 369990
TATTGCACTCGTCCCGGCCTC 132234
TATTGCACTTGTCCCGGCCTGTAA 122014
I have a text file as shown below. I need only the PDB IDs after the > symbol. How can I do this with awk?
>results for sequence "files/1H8U.pdb" starting "ASPILEGLUGLY"
DIEGREKQQPSRVS
>results for sequence "files/1P6K.pdb" starting "ILEALALYSASP"
IAKDVAKEGSDGATKQRTHPQDSASI
Desired output
>1H8U
DIEGREKQQPSRVS
>1P6K
IAKDVAKEGSDGATKQRTHPQDSASI
I would probably use sed for this, but here's the awk:
awk '/^>/ { sub (/[^\/]+\//,">", $0); sub (/\..+/, "", $0) }1' file.txt
Here's the sed:
sed -r '/^>/s%[^/]+/%>%;s%\..+%%' file.txt
This might work for you:
awk -F[/.] '/^>/{$1=">"$2;NF=1};1' file
or:
sed '/^>.*\/\([^.]*\)\..*/s//>\1/' file
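A quick check of the field-splitting idea on a reconstructed one-record sample (a sketch; here the header is rebuilt with $0=">"$2 rather than relying on NF=1, whose record-rebuilding behaviour varies between awk implementations):

```shell
printf '>results for sequence "files/1H8U.pdb" starting "ASPILEGLUGLY"\nDIEGREKQQPSRVS\n' > seqs.txt

# Split on "/" or "."; on header lines the PDB ID lands in $2, and the
# trailing 1 prints every (possibly rewritten) line.
awk -F'[/.]' '/^>/{$0=">"$2} 1' seqs.txt
```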