I would like to run AWK against an online file instead of a local one.
For instance:
awk 'END{print NR}' somelocalfile.txt
returns the number of lines in the file.
Now my question is: how can I count the lines in an online txt file like this one? I would prefer a one-liner.
I could wget the file and then apply the awk command locally, but I think there may be a more efficient approach.
I would suggest using wget:
wget -qO - http://path.com/tofile.txt | awk 'END{print NR}'
-q means quiet, so you won't get any terminal output from wget itself. -O sets the output file, and -O - sends the output to stdout.
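If wget isn't available, curl can do the same job. A sketch, reusing the placeholder URL from the question; the second pipeline runs the counting stage on generated input so you can see what awk contributes:

```shell
# curl -s is the rough equivalent of wget -qO -: silent, body to stdout
#   curl -s http://path.com/tofile.txt | awk 'END{print NR}'

# The counting stage in isolation, on three generated lines:
printf 'a\nb\nc\n' | awk 'END{print NR}'   # prints 3
```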
Related
I have a directory full of output files, with files with names:
file1.out,file2.out,..., fileN.out.
There is a certain key string in each of these files, lets call it keystring. I want to replace every instance of keystring with newstring in all files.
If there was only one file, I know I can do:
awk '{gsub("keystring","newstring",$0); print $0}' file1.out > tmp && mv tmp file1.out
(Redirecting straight back to file1.out would truncate the file before awk reads it, hence the temporary file.)
Is there a way to loop through all N files in awk?
You could use the find command for this. Please run it on a test file first; only once it works fine should you run it on your actual path (all your actual files), to be safe. This also needs a newer version of gawk that has the inplace option for saving the output back into the files themselves.
find your_path -type f -name "*.out" -exec awk -i inplace -f myawkProgram.awk {} +
Where your awk program, as per your shown samples, is as follows (cat myawkProgram.awk is only used to show the contents of the awk program here):
cat myawkProgram.awk
{
gsub("keystring","newstring",$0); print $0
}
A second option would be to pass all the .out files to your gawk program itself with -i inplace, doing something like this (but again, make sure you run it on a single test file first and only run the actual command once you are convinced it is correct):
awk -i inplace '{gsub("keystring","newstring",$0); print $0}' *.out
sed is arguably the better fit here, so integrating it with find:
find /directory/path -type f -name "*.out" -exec sed -i 's/keystring/newstring/g' {} +
Find files with the extension .out, then execute the sed command on as large a group of the found files as possible (using + with -exec). Note that -i without a suffix argument is GNU sed syntax; BSD/macOS sed needs -i '' instead.
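If you'd rather avoid sed -i entirely (its syntax differs between GNU and BSD/macOS sed), a plain loop with a temporary file is a portable sketch of the same replacement; the demo files here are made up:

```shell
# Demo on throwaway files, using keystring/newstring from the question:
dir=$(mktemp -d)
printf 'a keystring b\n' > "$dir/file1.out"

# Write each substitution to a temp file, then move it back
# over the original only if sed succeeded:
for f in "$dir"/*.out; do
    sed 's/keystring/newstring/g' "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
done
cat "$dir/file1.out"   # prints: a newstring b
```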
I have thousands of files, which are a list of sequence names followed by their sequence, one individual per line, something like this:
L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT
L.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT
L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT
And I want to change them to fasta format, so looking something like:
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
I work on a Mac.
Thanks!
Using Perl
perl -pe 's/^/>/;s/(\S+)\s+(\S+)/$1\n$2CAGAAAAGATATTTAATTATAT/g ' file
with your inputs
$ cat damien.txt
L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT
L.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT
L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT
$ perl -pe 's/^/>/;s/(\S+)\s+(\S+)/$1\n$2CAGAAAAGATATTTAATTATAT/g ' damien.txt
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
$
I believe you simplified your sample input, which is why it differs from your expected output.
If not, and my solution does not work, please comment under my answer to let me know.
So with awk, you can do it like this:
awk -v OFS="\n" '$1=">" $1' file
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTT
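To see why that one-liner works: assigning to $1 makes awk rebuild $0 with OFS between the fields, and the non-empty result of the assignment makes the default print fire. A small demo on made-up two-field records:

```shell
# Rebuilding $0 with OFS="\n" puts the name and the sequence on separate lines
printf 'name1 SEQ1\nname2 SEQ2\n' | awk -v OFS="\n" '$1=">" $1'
# prints:
# >name1
# SEQ1
# >name2
# SEQ2
```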
If you want to change the file in place, please install GNU gawk and use gawk -i inplace ....
And if you want the line endings to be carriage returns, add/change to -v ORS="\r" -v OFS="\r"
However, you can also do it with sed, and maybe that's even better:
sed -e 's/\([^[:space:]]*\)[[:space:]]*\([^[:space:]]*\)/>\1\n\2/' file
Add -i '' like this: sed -i '' -e ... to change the file in place. (A note for the Mac: BSD sed does not interpret \n in the replacement as a newline the way GNU sed does, so you may need a backslash followed by a literal newline there instead.)
Could you please try the following (created and tested based on your samples; since I don't have a Mac, I couldn't test it there).
awk '/^L\./{print ">"$1 ORS $2 "CAGAAAAGATATTTAATTATAT"}' Input_file
Output will be as follows. If needed, you could capture it in a file by appending > output_file to the above command.
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
I have seen questions that are close to this, but I have not seen the exact answer I need, and I can't seem to get my head around the regex, awk, sed, grep, or rename invocation that would make it happen.
I have files in one directory, sequentially named, that were copied from multiple subdirectories of another directory using find piped to xargs.
Command I used:
find <dir1> -name "*.png" | xargs cp -t <dir2>
This resulted in the second directory containing duplicate filenames sequentially named as follows:
<name>.png
<name>.png.~1~
<name>.png.~2~
...
<name>.png.~n~
What I would like to do is take all files ending in ~*~ and rename it as follows:
<name>.#.png where the '#' is the number between the '~'s at the end of the file name
Any help would be appreciated.
With Perl's rename (stand alone command):
rename -nv 's/^([^.]+)\.(.+)\.~([0-9]+)~/$1.$3.$2/' *
If everything looks fine remove option -n.
There might be an easier way to do this, but here is a small shell script using a glob and awk to achieve what you want (the original ls | grep version breaks on unusual file names, so quote the variables and let the shell do the matching):
for i in *.png.~*~; do
    name=$(echo "$i" | awk -F'png' '{print $1}')
    n=$(echo "$i" | awk -F'~' '{print $2}')
    mv -- "$i" "$name$n.png"
done
I need to run a command on hundreds of files and I need help to get a loop to do this:
have a list of input files /path/dir/file1.csv, file2.csv, ..., fileN.csv
need to run a script on all those input files
script is something like: command input=/path/dir/file1.csv output=output1
I have tried things like:
for f in /path/dir/file*.csv; do command ..., but how do I get it to read each input file and write a new output file every time?
Thank you....
Try this, after changing /path/to/data to the correct path (and likewise /path/to/awkscript and the other placeholders) so they point at your test data.
#!/bin/bash
cd /path/to/data
for f in *.csv ; do
    echo "awk -f /path/to/awkscript \"$f\" > ${f%.csv}.rpt"
    #remove_me awk -f /path/to/awkscript "$f" > ${f%.csv}.rpt
done
make the script "executable" with
chmod 755 myScript.sh
The echo version will help you ensure the script is going to do what you expect. You still have to examine that output carefully, or work on a copy of your data so you don't wreck your baseline data.
You could take the echo output from the last iteration,
awk -f /path/to/awkscript myFileLast.csv > myFileLast.rpt
and copy/paste it to the command line to confirm it works.
When you are comfortable that the awk script works as you need, comment out the echo awk .. line, delete the #remove_me marker, and save your bash script.
for f in /path/to/files/*.csv ; do
    bname=$(basename "$f")
    pref=${bname%%.csv}
    awk -f /path/to/awkscript "$f" > /path/to/store/output/"${pref}"_new.txt
done
Hopefully this helps, I am on my blackberry so there may be typos
I wrote a small script using awk's split function to get the current directory name.
echo $PWD
I need to replace '8' with the number of tokens as a result of the split operation.
# If PWD = /home/username/bin, I am trying to get "bin" into package.
package="`echo $PWD | awk '{split($0,a,"/"); print a[8] }'`"
echo $package
Can you please tell me what to substitute in place of 'print a[8]' to get the script working for any directory path?
-Sachin
You don't need awk for that. If you always want the last dir in a path just do:
#!/bin/sh
cur_dir="${PWD##*/}"
echo "$cur_dir"
The above has the added benefit of not creating any subshells and/or forks to external binaries. It's all native POSIX shell syntax.
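A quick demonstration of that expansion on a sample (made-up) path:

```shell
p=/home/username/bin
# ##*/ deletes the longest prefix ending in "/", leaving the last component
echo "${p##*/}"   # prints: bin
```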
You could use print a[length(a)], but it's better to avoid splitting altogether and use a custom field separator with $NF:
echo "$PWD" | awk -F/ '{print $NF}'
But in that specific case you should rather use basename:
basename "$PWD"
The other answers are better replacements to perform the function you're trying to accomplish. However, here is the specific answer to your question:
package=$(echo "$PWD" | awk '{n = split($0,a,"/"); print a[n] }')
echo "$package"
split() returns the number of resulting elements.
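A short demonstration of that return value; note the leading "/" yields an empty first element, which still counts:

```shell
# split() returns the element count n, so a[n] is the last path component
printf '%s\n' /home/username/bin | awk '{n = split($0, a, "/"); print n, a[n]}'
# prints: 4 bin
```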