How to print multiple files in awk?

What is wrong with this command, please? I would like to print all lines from file01, file02, file03 ... file11.
awk '{print}' file[01-11].txt > file

Assuming you are running this in Bash, the [01-11] part does not do what you expect: it is a bracket expression that matches a single character (0 or 1), not the numbers 01 through 11. Instead, consider brace expansion:
awk '{print}' file{01..11}.txt > file
This is, again, assuming a specific shell. If your shell does not support the {##..##} syntax, first check how file[01-11].txt expands (for example with echo). I imagine it is not expanding to the files you think.
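To see the difference for yourself, a minimal sketch like the following can help. It assumes bash 4 or later (older versions, such as the bash 3.2 shipped with macOS, drop the leading zeros in {01..03}); the scratch directory and the file names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file0.txt file1.txt file01.txt file02.txt file11.txt

# Bracket expression: matches exactly ONE character, either 0 or 1.
glob_matches=$(echo file[01-11].txt)

# Brace expansion: generates the names whether or not the files exist.
brace_names=$(echo file{01..03}.txt)

echo "glob:  $glob_matches"   # glob:  file0.txt file1.txt
echo "brace: $brace_names"
```

The glob only picks up file0.txt and file1.txt, which is why the original command did not read file01.txt through file11.txt.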

How about simply using cat, since you are only printing and not doing any other operation:
cat Input_file{01..11}.txt > file
If you really want to do it in awk, then try:
awk '1' Input_file{01..11}.txt > file

Related

File size grows greatly after using awk

I want to add a row number to each line of a file, so I did this:
awk '{print $0 "\x03" NR > "/opt/data2/gds_test/test_partly.txt"}' /opt/data2/gds_test/test_partly.txt
I put this command in a shell script and ran it. After some time it still had not finished, so I killed it, and then found that the source file had grown from 1.7G to 242G.
What happened? I am a little confused.
I had tested this awk command on a small file on the command line, and it seemed fine.
You're reading from the front of a file at the same time as you're writing onto the end of it. Try this instead:
tmp=$(mktemp)
awk '{print $0 "\x03" NR}' '/opt/data2/gds_test/test_partly.txt' > "$tmp" &&
mv "$tmp" '/opt/data2/gds_test/test_partly.txt'
Yes, I changed it to redirect the result to a tmp file and then replace the original file with the tmp file, and it works.
I also just learned that gawk -i inplace can be used.
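For anyone who wants to try the temp-file pattern safely, here is a small sketch on throwaway data. The file contents and the | separator (used instead of \x03 so the result is visible) are assumptions for the demo. A plausible reason the small test looked fine is that awk had already read the whole input into its buffer before the output redirection truncated the file; with a large file, awk ends up reading what it has just written.

```shell
# Throwaway input file standing in for the real data.
src=$(mktemp)
printf 'a\nb\nc\n' > "$src"

# Write to a separate temp file, and only replace the source if awk succeeded.
tmp=$(mktemp)
awk '{print $0 "|" NR}' "$src" > "$tmp" && mv "$tmp" "$src"

result=$(cat "$src")
echo "$result"
```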

Check if all multiple strings exist in one line

I have a file that has this info:
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_NETAPP_7890_2D5_1D8
IRE_DRO_Fabric_A drogesx0112_SAN_A
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_ISIL03_081_873
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_NETAPP_7890_9D3_2D8
IRE_DRO_Fabric_B drogesx0112_SAN_B
and I want to check whether multiple strings are found on the same line. I tried these commands, but they are not working. Not sure if it is possible for this kind of text?
grep 'drogesx0112.*ISIL03_091_871\|ISIL03_091_871.*drogesx0112' file << tried this but not working
grep 'drogesx0112' file | grep 'ISIL03_091_871' << tried this but not working
I am looking for this output, matching both string1 (drogesx0112) and string2 (ISIL03_091_871):
>grep 'drogesx0112.*ISIL03_091_871\|ISIL03_091_871.*drogesx0112' file # command
>IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871 < output
So I want to check whether drogesx0112 and ISIL03_091_871 are both present on a single line of the file.
Simple awk
$ awk ' /drogesx0112/ && /ISIL03_091_871/ ' gafm.txt
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
$
Simple Perl
$ perl -ne ' print if /drogesx0112/ and /ISIL03_091_871/ ' gafm.txt
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
$
If you are not looking for any order and simply want to check if both strings are present in a single line or not then try following.
awk '/drogesx0112/ && /ISIL03_091_871/' Input_file
In case you are looking for sequence of strings in line:
If your line has drogesx0112 first and then ISIL03_091_871 then try following.
awk '/drogesx0112.*ISIL03_091_871/' Input_file
If your line has ISIL03_091_871 first and then drogesx0112 then try following.
awk '/ISIL03_091_871.*drogesx0112/' Input_file
This might work for you (GNU sed):
sed '/drogesx0112/!d;/ISIL03_091_871/!d' file
Delete the current line if it does not contain drogesx0112, and also delete it if it does not contain ISIL03_091_871.
Another way:
sed -n '/drogesx0112/{/ISIL03_091_871/p}' file
A third:
sed '/drogesx0112/{/ISIL03_091_871/p};d' file
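Since the strings here are fixed text rather than regular expressions, a literal match with awk's index() is another option. This is only a sketch: the variable names a and b and the temporary sample file are assumptions, and the data is a subset of the lines from the question.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
IRE_DRO_Fabric_A drogesx0112_SAN_A
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_ISIL03_081_873
EOF

# index(s, t) returns 0 when t does not occur in s,
# so a line is printed only when both strings are present.
matches=$(awk -v a='drogesx0112' -v b='ISIL03_091_871' \
    'index($0, a) && index($0, b)' "$sample")
echo "$matches"
```

Passing the strings with -v keeps them out of the awk program text, which makes it easier to reuse the command with other search terms.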

Convert sequence list to fasta for multiple files

I have thousands of files, which are a list of sequence names followed by their sequence, one individual per line, something like this:
L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT
L.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT
L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT
And I want to change them to fasta format, so looking something like:
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
I work on a Mac.
Thanks!
Using Perl
perl -pe 's/^/>/;s/(\S+)\s+(\S+)/$1\n$2CAGAAAAGATATTTAATTATAT/g ' file
with your inputs
$ cat damien.txt
L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT
L.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT
L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT
$ perl -pe 's/^/>/;s/(\S+)\s+(\S+)/$1\n$2CAGAAAAGATATTTAATTATAT/g ' damien.txt
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
$
I believe you simplified your sample input, which is why it differs from your expected output.
If that is not the case and my solutions do not work, please comment under my answer to let me know.
So with awk, you can do it like this:
awk -v OFS="\n" '$1=">" $1' file
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTT
If you want to change the file in place, install GNU awk (gawk) and use gawk -i inplace ....
And if you want the line endings to be carriage returns, add/change to -v ORS="\r" -v OFS="\r"
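However, you can also do it with sed, which may be the better choice (note that \n in the replacement is understood by GNU sed; the BSD sed on a Mac may need a literal newline there instead):
sed -e 's/\([^[:space:]]*\)[[:space:]]*\([^[:space:]]*\)/>\1\n\2/' file
To change the file in place, add -i with GNU sed (sed -i -e ...) or -i '' with BSD/macOS sed (sed -i '' -e ...).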
Could you please try the following (written and tested against your samples; I don't have a Mac, so I could not test it there).
awk '/^L\./{print ">"$1 ORS $2 "CAGAAAAGATATTTAATTATAT"}' Input_file
Output will be as follows. If needed, you can save it to a file by appending > output_file to the above command.
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
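None of the commands above loops over many files, so here is a hedged sketch of that part. It assumes the inputs end in .txt and that writing a sibling .fasta file next to each input is acceptable; both are guesses, as are the demo directory and file names. It writes the name and sequence exactly as they appear in the input, without appending anything.

```shell
# Demo directory with two small input files (hypothetical names).
dir=$(mktemp -d)
printf 'L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT\n' > "$dir/one.txt"
printf 'L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT\n' > "$dir/two.txt"

for f in "$dir"/*.txt; do
    # NF == 2 skips blank or malformed lines;
    # ${f%.txt} strips the extension so the output becomes name.fasta.
    awk 'NF == 2 { print ">" $1; print $2 }' "$f" > "${f%.txt}.fasta"
done

fasta_one=$(cat "$dir/one.fasta")
fasta_two=$(cat "$dir/two.fasta")
echo "$fasta_one"
```

Writing to a new file per input leaves the originals untouched, which is safer than editing thousands of files in place.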

parse a url with the command line only

I have a csv file looking like this:
id,author,url
1,bob,http://mywebsite.com/path/to/content
2,john,https://anotherwebsite.com/path/to/some/other/content
3,alice,http://www.somewebsite.com/path/to/content
And I'd like to turn it into:
id,author,url
1,bob,mywebsite.com
2,john,anotherwebsite.com
3,alice,somewebsite.com
I know this could be done easily with javascript or python but I am trying to understand how awk and sed work. Is there a way to do this easily with command line tools only?
Many thanks
This should do:
awk -F, 'NR>1{split($3,a,"/");$0=$1","$2","a[3]}1' file
id,author,url
1,bob,mywebsite.com
2,john,anotherwebsite.com
3,alice,www.somewebsite.com
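-F, splits each line into fields on the comma.
For every line except the first (NR>1), split field $3 on / and rebuild the line from $1, $2 and the host name a[3].
The trailing 1 prints every line.
To also remove the leading www.:
awk -F, 'NR>1{split($3,a,"/");sub(/^www\./,"",a[3]);$0=$1","$2","a[3]}1' file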
id,author,url
1,bob,mywebsite.com
2,john,anotherwebsite.com
3,alice,somewebsite.com
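Since the question also asks about sed, here is one possible equivalent, offered as a sketch rather than the canonical answer. It assumes a sed that accepts -E for extended regular expressions (current GNU and BSD versions both do) and that the URL is the last field on the line; the temporary file name is made up for the demo.

```shell
csv=$(mktemp)
cat > "$csv" <<'EOF'
id,author,url
1,bob,http://mywebsite.com/path/to/content
2,john,https://anotherwebsite.com/path/to/some/other/content
3,alice,http://www.somewebsite.com/path/to/content
EOF

# 1!          : apply the substitution to every line except the header
# https?://   : the scheme, http or https
# (www\.)?    : an optional leading "www." that is dropped
# ([^/,]+)    : the host name, kept as \2
# .*          : the rest of the URL, discarded
hosts=$(sed -E '1!s#https?://(www\.)?([^/,]+).*#\2#' "$csv")
echo "$hosts"
```

Using # as the delimiter avoids having to escape every / in the URL pattern.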

Processing of awk with multiple variable from previous processing?

I have a question about awk processing. I have the file below:
cat test.txt
/home/shhh/
abc.c
/home/shhh/2/
def.c
gthjrjrdj.c
/kernel/sssh
sarawtera.c
wrawrt.h
wearwaerw.h
My goal is to build full paths such as /home/shhh/abc.c by joining each directory line with the file names that follow it.
This is the command I got from someone:
cat test.txt | awk '/^\/.*/{path=$0}/^[a-zA-Z]/{printf("%s/%s\n",path,$0);}'
It works, but I do not understand how to interpret it step by step.
Could you explain how it works?
Result :
/home/shhh//abc.c
/home/shhh/2//def.c
/home/shhh/2//gthjrjrdj.c
/kernel/sssh/sarawtera.c
/kernel/sssh/wrawrt.h
/kernel/sssh/wearwaerw.h
What you probably want is the following:
$ awk '/^\//{path=$0}/^[a-zA-Z]/ {printf("%s/%s\n",path,$0)}' file
/home/shhh//abc.c
/home/shhh/2//def.c
/home/shhh/2//gthjrjrdj.c
/kernel/sssh/sarawtera.c
/kernel/sssh/wrawrt.h
/kernel/sssh/wearwaerw.h
Explanation
/^\//{path=$0} on lines starting with a /, store it in the path variable.
/^[a-zA-Z]/ {printf("%s/%s\n",path,$0)} on lines starting with a letter, print the stored path together with the current line.
Note you can also say
awk '/^\//{path=$0; next} {printf("%s/%s\n",path,$0)}' file
Some comments
cat file | awk '...' is better written as awk '...' file.
You don't need the ; at the end of a block {} if you are executing just one command. It is implicit.
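The results above contain // wherever the directory line already ends in a slash. If that matters, one possible variation (an assumption about what is wanted, not part of the original answer) strips a trailing slash before storing the path. The input below is a shortened copy of the file from the question.

```shell
input=$(mktemp)
cat > "$input" <<'EOF'
/home/shhh/
abc.c
/home/shhh/2/
def.c
/kernel/sssh
wrawrt.h
EOF

paths=$(awk '
    # Directory line: remember it without a trailing slash, then skip to the next line.
    /^\// { path = $0; sub(/\/$/, "", path); next }
    # Any other line is a file name: prefix it with the stored directory.
    { print path "/" $0 }
' "$input")
echo "$paths"
```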