Redirecting files from a directory using awk

I am running an awk command for every text file in a directory. As of now it displays to stdout. I would like it to save those changes to the actual files themselves. My command is
awk{ORS=(/^\- **\ **/?"":RS)}1 *.txt >> *.txt
Every time I redirect the command it saves everything into one file. Is there any way I can save the changes back to the files themselves?

Hyphens and blanks aren't regexp metacharacters outside of bracket expressions, so there's no need to escape them in your regexp. You do need to enclose your script in single-quote delimiters, though. This will do what your script is apparently trying to do:
awk '{ORS=(/^- ** **/?"":RS)}1'
You cannot write to the same file you are reading. If you try to do that with any command (awk, sed, grep, whatever):
command file > file
then the shell can do whatever it likes, including executing > file before command file and so emptying the file before your command opens it.
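You can see this for yourself with a throwaway file (a quick demonstration; awk '1' just prints every line, and any command would do in its place):
$ printf 'one\ntwo\n' > file
$ awk '1' file > file
$ wc -c file
0 file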
To overwrite the input files with GNU awk 4.1 or later would be:
awk -i inplace '{ORS=(/^- ** **/?"":RS)}1' *.txt
and with other awks you'd need something like:
for file in *.txt; do
awk '{ORS=(/^- ** **/?"":RS)}1' "$file" > tmp && mv tmp "$file"
done

Related

Introducing command line option arguments in awk

I have a file with some bash functions that I can call with user-defined options like this
transfer -R --src /opstk/ --dst /media/hagbard/hc1/
I also have a number of awk functions in an awk file. Would one be able to do the same with awk scripts, calling the awk file with options so it does one thing rather than another?
awk -f replace.awk -R --src /opstk/ --dst /media/hagbard/hc1/
I understand that it is possible to include command line options for awk commands; in what context would they be useful?
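For what it's worth, awk has no long-option parsing of its own; the conventional way to feed settings into a script is with -v assignments, which are visible from the BEGIN block onward. A minimal sketch (the contents of replace.awk and the src/dst variable names are made up for illustration):
# replace.awk -- prints the settings it was given
BEGIN { print "src=" src " dst=" dst }
$ awk -v src=/opstk/ -v dst=/media/hagbard/hc1/ -f replace.awk
src=/opstk/ dst=/media/hagbard/hc1/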

AWK to process compressed files and printing original (compressed) file names

I would like to process multiple .gz files with gawk.
I was thinking of decompressing and passing it to gawk on the fly
but I have an additional requirement to also store/print the original file name in the output.
The thing is, there are hundreds of rather large .gz files to process.
I'm looking for anomalies (~0.001% of rows) and want to print out the list of found inconsistencies ALONG with the file name and row number that contained them.
If I could have all the files decompressed I would simply use FILENAME variable to get this.
Because of large quantity and size of those files I can't decompress them upfront.
Any ideas how to pass filename (in addition to the gzip stdout) to gawk to produce required output?
Assuming you are looping over all the files and piping their decompression directly into awk, something like the following will work:
for file in *.gz; do
gunzip -c "$file" | awk -v origname="$file" '.... {print origname " whatever"}'
done
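If you also need the row number, note that FNR still counts records read from the pipe even though FILENAME is empty there; a hedged variation of the loop above (the /anomaly/ test is a placeholder for whatever check you actually run):
for file in *.gz; do
    gunzip -c "$file" | awk -v origname="$file" '/anomaly/ {print origname, FNR, $0}'
done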
Edit: To use a list of filenames from some source other than a direct glob, something like the following can be used:
$ ls *.awk
a.awk e.awk
$ while IFS= read -r -d '' filename; do
echo "$filename";
done < <(find . -name \*.awk -printf '%P\0')
e.awk
a.awk
Using xargs instead of the above loop will, I believe, require the body of the command to be in a pre-written script file, which can then be called with xargs and the filename.
This uses a combination of xargs and sh (so that two commands, gzip and awk, can be connected with a pipe):
find *.gz -print0 | xargs -0 -I fname sh -c 'gzip -dc fname | gawk -v origfile="fname" -f printbadrowsonly.awk >> baddata.txt'
I'm wondering if there's any bad practice with the above approach…
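One hazard in that command is interpolating fname directly into the sh -c string, which breaks on filenames containing quotes or spaces. A safer sketch passes the filename as a positional parameter instead, reusing printbadrowsonly.awk from above:
find . -name '*.gz' -print0 |
    xargs -0 -n 1 sh -c 'gzip -dc "$1" | gawk -v origfile="$1" -f printbadrowsonly.awk' _ >> baddata.txt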

How to run an .awk file without 'awk -f' command?

I am new to awk scripting. I am trying to figure out how to run an awk file without the awk -f command. I see people keep saying to add "#!bin/awk -f" as the first line of an awk file, but this didn't work for my awk script. It still gives me a "no file or directory" error.
My question is: what does "#!bin/awk -f" really mean, and what does it do?
It's #!/bin/awk -f, not #!bin/awk -f. That will probably work, but there's no guarantee: if someone who has awk installed in a different location runs your script, it won't work. What you want is this: #!/usr/bin/env awk -f.
#! is what tells the system which interpreter to use to run your script. It should go at the very top of your file. It's called a shebang. Right after it, you put the path to the interpreter.
/usr/bin/env finds where awk is located and uses that as the interpreter. So if awk is installed somewhere else, like /usr/local/bin, it'll still be found. This probably won't matter for you, but it's a good habit to get into: it's more portable and easier to share.
The -f flag tells awk to read its program from a file. You could run awk -f yourfilename.awk in bash; in a shebang, -f makes awk read the rest of the script file as the program.
I hope this helped. Feel free to ask me any questions if it doesn't work, or isn't clear enough.
UPDATE
If you get the error message:
/usr/bin/env: ‘awk -f’: No such file or directory
/usr/bin/env: use -[v]S to pass options in shebang lines
then change the first line of your script to #!/usr/bin/env -S awk -f (tested with GNU bash, version 4.4.23)
You probably want
#!/bin/awk -f
(The first slash after the #! is important).
This tells Unix which program it should use to run the script.
It is usually called the 'shebang', which comes from hash + bang.
If you want to run your script like this you need to make sure it is executable (chmod +x <script>).
Otherwise you can just run your script by typing the command /bin/awk -f <script>
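As a concrete illustration, a minimal executable awk script session might look like this (assuming awk really is at /bin/awk on your system):
$ cat hello.awk
#!/bin/awk -f
BEGIN { print "hello from awk" }
$ chmod +x hello.awk
$ ./hello.awk
hello from awk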
The Shebang for Awk Explained
#! is the start of a shebang line, which tells the shell which interpreter to use for the script.
/bin/awk is the path to your awk executable. You may need to change this if your awk is installed elsewhere, or if you want to use a different version of awk.
-f is a flag telling awk to interpret the flag's argument as an awk script. In a shebang, it tells some awks to read the remainder of the script itself as the program, rather than a separate file.
Your Shebang is (Probably) Broken
You are using #!bin/awk -f, which is unlikely to work unless you have awk installed as $PWD/bin/awk. You probably meant to use #!/bin/awk -f instead.
In some instances, passing a flag on the shebang line may not work with your shell or your awk. If you have the rest of the shebang line correct, you might try removing the -f flag and see if that works for you.

Execute a command in the reverse order of ids present in a file

I am running the following command using awk on file.txt. Currently it runs the command on the ids present in file.txt from top to bottom; I want the command to run in the reverse order of the ids in file.txt. Any inputs on how we can do this?
git command $(awk '{print $1}' file.txt)
file.txt contains:
97a65fd1d1b3b8055edef75e060738fed8b31d3
fb8df67ceff40b4fc078ced31110d7a42e407f16
a0631ce8a9a10391ac4dc377cd79d1adf1f3f3e2
.....
If you aren't bound to using awk, then tail with the -r (for reverse) flag will do the trick (note that -r is a BSD/macOS extension; on GNU systems tac does the same job)...
myFile.txt
97a65fd1d1b3b8055edef75e060738fed8b31d3
fb8df67ceff40b4fc078ced31110d7a42e407f16
a0631ce8a9a10391ac4dc377cd79d1adf1f3f3e2
Now to print it in reverse...
$ tail -r myFile.txt
a0631ce8a9a10391ac4dc377cd79d1adf1f3f3e2
fb8df67ceff40b4fc078ced31110d7a42e407f16
97a65fd1d1b3b8055edef75e060738fed8b31d3
EDIT:
To output this to a file simply redirect it out...
$ tail -r myFile.txt > newFile.txt
EDIT:
Want to write to the same file? No problem!
tail -r myFile.txt > temp.txt; cat temp.txt > myFile.txt; rm temp.txt;
When I redirected tail -r to the same file it came back blank, because the shell truncates the output file before tail gets to read it; this workaround avoids that issue by writing to a temporary "buffer" file.
To reverse the lines in a file using awk, use
awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file
use $1 instead of $0 above to operate on the first field only instead of the whole line.
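To tie this back to the original command, the reversed first fields can be fed straight to git (git command here stands in for whatever subcommand you are actually running):
git command $(awk '{a[i++]=$1} END {for (j=i-1; j>=0;) print a[j--]}' file.txt)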

How to determine the line ending of a file

I have a bunch (hundreds) of files that are supposed to have Unix line endings. I strongly suspect that some of them have Windows line endings, and I want to programmatically figure out which ones do.
I know I can just run flip -u or something similar in a script to convert everything, but I want to be able to identify those files that need changing first.
You can use the file tool, which will tell you the type of line ending. Or you could just run dos2unix, which will convert everything to Unix line endings regardless of what it started with.
You could use grep
egrep -l $'\r'\$ *
Something along the lines of:
perl -p -e 's[\r\n][WIN\n]; s[(?<!WIN)\n][UNIX\n]; s[\r][MAC\n];' FILENAME
though some of that regexp may need refining and tidying up.
That'll output your file with WIN, MAC, or UNIX at the end of each line. Good if your file is somehow a dreadful mess (or a diff) and has mixed endings.
Here's the most failsafe answer; Stimms' answer doesn't account for subdirectories and binary files:
find . -type f -exec file {} \; | grep "CRLF" | awk -F ':' '{ print $1 }'
Use file to determine each file's type; those with CRLF have Windows return characters. The output of file is delimited by a colon, and the first field is the path of the file.
Unix uses one byte, 0x0A (LineFeed), while windows uses two bytes, 0x0D 0x0A (Carriage Return, Line feed).
If you never see a 0x0D, then it's very likely Unix. If you see 0x0D 0x0A pairs then it's very likely MSDOS.
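Since this thread is about awk, the same byte-level test can be written as a short sketch that names each file containing at least one CRLF ending (nextfile is a gawk extension, also in some other awks; adjust the *.txt glob to taste):
awk '/\r$/ {print FILENAME; nextfile}' *.txt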
Windows uses two characters for a line ending, 13 and 10 (CR and LF); Unix uses only one of them, 10 (LF). So you can convert a Windows file by replacing each 13-and-10 pair with just the character Unix uses.
When you know which files have Windows line endings (0x0D 0x0A, or \r\n), what will you do with those files? I suppose you will convert them to Unix line endings (0x0A, or \n). You can convert a file with Windows line endings to Unix line endings with the sed utility, using this command:
$ sed -i 's/\r//' my_file_with_win_line_endings.txt
You can put it into a script like this:
#!/bin/bash
# Recursively strip carriage returns from every regular file
# under the current directory.
function travers()
{
    for file in *; do
        if [ -f "${file}" ]; then
            sed -i 's/\r//' "${file}"
        elif [ -d "${file}" ]; then
            cd "${file}"
            travers
            cd ..
        fi
    done
}
travers
If you run it from the root directory of your files, at the end you can be sure all files have Unix line endings.