How to run awk script on multiple files - awk

I need to run a command on hundreds of files and I need help writing a loop to do this:
I have a list of input files /path/dir/file1.csv, file2.csv, ..., fileN.csv
I need to run a script on all of those input files
The script is invoked like: command input=/path/dir/file1.csv output=output1
I have tried things like:
for f in /path/dir/file*.csv; do command ... but how do I read the next input file and write a new output file on each iteration?
Thank you....

Try this (after changing /path/to/data to the correct path; do the same with /path/to/awkscript and the other placeholders, pointing at your test data):
#!/bin/bash
cd /path/to/data || exit 1
for f in *.csv ; do
    # dry run: just print the command that would be executed
    echo "awk -f /path/to/awkscript \"$f\" > ${f%.csv}.rpt"
    #remove_me awk -f /path/to/awkscript "$f" > "${f%.csv}.rpt"
done
Make the script executable with
chmod 755 myScript.sh
The echo version will help you ensure the script is going to work as expected. You still have to carefully examine that output, OR work on a copy of your data so you don't wreck your baseline data.
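If you go the copy route, something as simple as this keeps the original data safe (the copy path here is just an assumption):
cp -a /path/to/data /path/to/data_copy   # work on the scratch copy, not the originals
cd /path/to/data_copy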
You could take the output of the last iteration
awk -f /path/to/awkscript myFileLast.csv > myFileLast.rpt
and copy/paste it to the command line to confirm it works.
When you are comfortable that the awk script works as you need, comment out the echo awk ... line and remove the #remove_me marker from the line below it (and save your bash script).
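After that edit, the loop would end up looking something like this (same assumed paths as above; the echo line is simply dropped here for brevity):
#!/bin/bash
cd /path/to/data || exit 1
for f in *.csv ; do
    awk -f /path/to/awkscript "$f" > "${f%.csv}.rpt"
done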

for f in /path/to/files/*.csv ; do
    bname=$(basename "$f")
    pref=${bname%%.csv}
    awk -f /path/to/awkscript "$f" > "/path/to/store/output/${pref}_new.txt"
done
Hopefully this helps; I am on my BlackBerry so there may be typos.

Related

Using awk to find and replace strings in every file in directory

I have a directory full of output files, with names like:
file1.out, file2.out, ..., fileN.out.
There is a certain key string in each of these files; let's call it keystring. I want to replace every instance of keystring with newstring in all files.
If there was only one file, I know I can do:
awk '{gsub("keystring","newstring",$0); print $0}' file1.out > file1.out
Is there a way to loop through all N files in awk?
You could use the find command for this. Please run it on a test file first; once it works fine, run it on your actual path (all your actual files), to be on the safe side. This also needs a newer version of gawk that has the inplace option, which saves the output back into the files themselves.
find your_path -type f -name "*.out" -exec awk -i inplace -f myawkProgram.awk {} +
Where your awk program is as follows, as per your shown samples (cat myawkProgram.awk is used here only to show the contents of the awk program):
cat myawkProgram.awk
{
gsub("keystring","newstring",$0); print $0
}
A second option would be to pass all the .out files to gawk itself with -i inplace, doing something like the following (but again, to be on the safe side, run it on a single test file first and only run the actual command once you are convinced it works):
awk -i inplace '{gsub("keystring","newstring",$0); print $0}' *.out
sed is arguably the ideal tool for this, so integrating it with find:
find /directory/path -type f -name "*.out" -exec sed -i 's/keystring/newstring/g' {} +
This finds files with the extension .out and then executes the sed command on as many of the found files at a time as possible (using + with -exec).
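If you want to see which files actually contain the string before editing them in place, a quick check along these lines (same assumed path and key string) can be run first; grep -l just lists the matching files:
find /directory/path -type f -name "*.out" -exec grep -l 'keystring' {} +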

File size grows greatly after using awk

I want to add a row number to each line of a file, so I did this:
awk '{print $0 "\x03" NR > "/opt/data2/gds_test/test_partly.txt"}' /opt/data2/gds_test/test_partly.txt
I put this command in a shell script file and ran it. After quite a while it still had not finished, so I killed it, and then I found the source file's size had grown from 1.7G to 242G.
What happened? I am a little confused.
I had tested with a small file on the command line, and the awk command seemed OK.
You're reading from the front of a file at the same time as you're writing onto the end of it. Try this instead:
tmp=$(mktemp)
awk '{print $0 "\x03" NR}' '/opt/data2/gds_test/test_partly.txt' > "$tmp" &&
mv "$tmp" '/opt/data2/gds_test/test_partly.txt'
Yes, I changed it to redirect the result to a tmp file, then delete the original file and rename the tmp file, and it is OK now.
I also just learned that gawk -i inplace can be used.
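For reference, the in-place variant mentioned there would be something like this one-liner (assuming a gawk new enough to ship the inplace extension):
gawk -i inplace '{print $0 "\x03" NR}' /opt/data2/gds_test/test_partly.txt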

bulk renaming files rearranging file names based on delimiter

I have seen questions that are close to this, but I have not seen the exact answer I need and can't seem to wrap my head around the regex, awk, sed, grep, or rename invocation I would need to make it happen.
I have files in one directory, copied there from multiple subdirectories of a different directory using find piped to xargs.
Command I used:
find `<dir1>` -name "*.png" | xargs cp -t `<dir2>`
This resulted in the second directory containing duplicate filenames with sequential backup suffixes, as follows:
<name>.png
<name>.png.~1~
<name>.png.~2~
...
<name>.png.~n~
What I would like to do is take all files ending in ~*~ and rename them as follows:
<name>.#.png, where '#' is the number between the '~'s at the end of the file name.
Any help would be appreciated.
With Perl's rename (standalone command):
rename -nv 's/^([^.]+)\.(.+)\.~([0-9]+)~/$1.$3.$2/' *
If everything looks fine, remove the -n option.
There might be an easier way to do this, but here is a small shell script using grep and awk to achieve what you wanted:
for i in $(ls | grep ".png."); do
    name=$(echo "$i" | awk -F'png' '{print $1}')   # everything up to "png" (keeps the trailing dot)
    n=$(echo "$i" | awk -F'~' '{print $2}')        # the number between the tildes
    mv "$i" "${name}${n}.png"
done
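As an alternative sketch that avoids parsing ls output, plain bash parameter expansion can do the same split (this assumes the names really look like <name>.png.~N~ and contain no other tildes):
for f in *.png.~*~; do
    base=${f%%.png.*}        # everything before ".png."
    n=${f##*.~}              # strip through the last ".~", leaving "N~"
    n=${n%~}                 # drop the trailing "~"
    mv -- "$f" "$base.$n.png"
done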

How to access an online txt file with AWK?

I would like to use an online database instead of a local file in AWK.
For instance:
awk 'END{print NR}' somelocalfile.txt
returns the number of lines in the file.
Now my question is: how can I count the number of lines in an online txt file like this one? I would prefer a one-liner.
I could wget the file and then apply the awk command locally, but I think there may be a more efficient approach.
I would suggest using wget:
wget -qO - http://path.com/tofile.txt | awk 'END{print NR}'
-q means quiet, so you won't get any terminal output from wget. -O sets the output file, and '-O -' sends it to stdout.
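If curl happens to be installed instead of wget, an equivalent one-liner would be something like this; -s just silences curl's progress meter, and the body goes to stdout by default:
curl -s http://path.com/tofile.txt | awk 'END{print NR}'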

Is there a way to log Awk results?

Let me start off by saying I am not a seasoned programmer by any stretch of the imagination, so please bear with me. :-)
We use the GNUWIN32 awk command in a batch file, like so:
awk -F, -f awk1.txt TDIC-LA-CLM.apc > TDIC-LA-CLM.out
Is there a way to log the results of this command when used like the above example? I tried adding ">> logfile" to the end of the command above but then the command fails.
EDIT: What I would like is for the result code and/or any errors to be logged. I do not want the output of AWK to go to multiple files, which, from what I gather, is what the tee command does. For example, if you add >> logfile to the end of a DOS move command, the result of that move command is logged in logfile, e.g. "1 file(s) moved".
Thanks!
Just so folks can see the answer as an answer rather than buried in the comments, you can redirect the stderr stream of awk to a file like this:
awk ... > TDIC-LA-CLM.out 2> errors.txt
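If you also want the exit status of awk recorded (closer to the "1 file(s) moved" style result you mention), a sketch for the batch file using cmd.exe's ERRORLEVEL could look like this; the logfile name is just an assumption:
awk -F, -f awk1.txt TDIC-LA-CLM.apc > TDIC-LA-CLM.out 2>> logfile
echo awk exit code: %ERRORLEVEL% >> logfile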
See the redirection documentation on the ss64 website for further info.