I am going through shell scripting online lessons as my work requires me to learn shell scripting.
I came across the "awk" and "nawk" commands, and my lessons haven't covered them yet.
In a nutshell, I know that awk/nawk search for a particular pattern and perform an action in case a match has been found.
Despite that, I couldn't understand what the following line is meant for:
eval $( cat ${MMORPHYDIR}/${PWFILE} | nawk ' /^#BEGIN '${ENV_NAME}'/,/^#END '${ENV_NAME}'/ { print }' | egrep "${USEFULL_PARAM}" )
Please help me to understand what this line does or is intended to do.
... awk '/start/,/end/'
prints the records between start and end patterns. {print} can be omitted since it's implied. The ^ in your script indicates beginning of a line.
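To make the intent concrete, here is a sketch with invented contents (the section name PROD and the variable names are hypothetical). Suppose ${MMORPHYDIR}/${PWFILE} contains:
#BEGIN PROD
DB_USER=app
DB_PASS=secret
#END PROD
With ENV_NAME=PROD and USEFULL_PARAM=DB_, the nawk range pattern extracts everything from the #BEGIN PROD line through the #END PROD line, egrep keeps only the lines matching DB_, and eval executes the surviving assignments, so DB_USER and DB_PASS become variables in the current shell.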
Note that cat is unnecessary (the eval is still needed if the goal is to execute the extracted assignments); the extraction itself can be written as
$ awk '...' "${MMORPHYDIR}/${PWFILE}"
Also, the grep can be folded into the awk script, as shown below.
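For example, here is a sketch of the whole extraction folded into one awk call, passing the shell variables in with -v instead of splicing them into the script:
awk -v env="$ENV_NAME" -v pat="$USEFULL_PARAM" '
    $0 ~ ("^#BEGIN " env), $0 ~ ("^#END " env) { if ($0 ~ pat) print }
' "${MMORPHYDIR}/${PWFILE}"
You would still need the surrounding eval $( … ) if the goal is to actually set the extracted variables in the shell.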
awk is THE standard, general-purpose tool for manipulating text on all UNIX-like systems. There are various flavors of awk, all with the same core functionality plus some differences. nawk is the very unfortunately named "new awk": it got that name about 30 years ago as the successor to 1977's old awk (e.g. /bin/awk on Solaris, a.k.a. old, broken awk, which must never be used). Today nawk itself is best avoided, as it doesn't even support the minimal awk functionality required by POSIX (e.g. character classes). Important lesson there: never use the word "new" in the name of any software entity!
The best awk to use these days is GNU awk, gawk, as it supports all POSIX functionality plus a ton of useful extensions, is generally available, is extremely well documented, and has a massive user base.
wrt:
eval $( cat ${MMORPHYDIR}/${PWFILE} | nawk ' /^#BEGIN '${ENV_NAME}'/,/^#END '${ENV_NAME}'/ { print }' | egrep "${USEFULL_PARAM}" )
That is a complete mess, doing literally about a dozen things that should never be done in shell or in awk. Trying to explain it would be like trying to explain someone mixing concrete with a screwdriver. Forget you ever saw it and move on.
To learn awk, read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
I have a lot of files in which I would like to edit only those lines that start with private.
In principle I want to run
gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_"); print}' filename
but this only prints out the modified lines, and not the whole file.
Question
Does gawk have a way similar to sed -i inplace?
Or is there a much simpler way to do the above with either sed or gawk?
Just move the final print outside of the filtered pattern, e.g.:
gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")} {print}'
Usually, that is simplified to the following, where the trailing 1 is an always-true pattern whose default action is to print the line:
gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")}1'
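For instance, with two made-up input lines:
$ printf 'private a/b-c\npublic d/e\n' | gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")}1'
private a_b_c
public d/e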
You really, really, really (emphasis on "really") do not want to use something like sed -i to edit the files "in-place". (I put "in-place" in quotes, because GNU sed does not edit the files in place, but creates new files with the same name.) Doing so is a recipe for data corruption, and if you have a lot of files you don't want to take that risk. Just write the files into a new directory tree. It will make recovery much simpler.
e.g.:
d=backup/$(dirname "$filename")
mkdir -p "$d"
awk '...' "$filename" > "$d/$filename"
Consider if you used something like -i which puts backup files in the same directory structure. If you're modifying files in bulk and the process is stopped half-way through, how do you recover? If you are putting output into a separate tree, recovery is trivial. Your original files are untouched and pristine, and there are no concerns if your filtering process is terminated prematurely or inadvertently run multiple times. sed -i is a plague on humanity and should never be used. Don't spread the plague.
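Putting it together, a bulk pass over many files might look like this sketch (the src/ and backup/ directory names are illustrative, and filenames are assumed not to contain newlines):
find src -type f -name '*.txt' | while IFS= read -r filename; do
    d=backup/$(dirname "$filename")
    mkdir -p "$d"    # recreate the directory structure under backup/
    gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")}1' "$filename" > "backup/$filename"
done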
GNU awk 4.1.0 and later has an in-place editing ability.
And you should put the print outside the pattern block.
Try this:
gawk '/^private/{gsub(/[/-]/, "_");} 1' filename
or, once you have made sure the file is backed up:
gawk -i inplace '/^private/{gsub(/[/-]/, "_");} 1' filename
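The inplace extension can also keep that backup for you via a suffix variable (INPLACE_SUFFIX in older gawk releases, inplace::suffix since gawk 5.0), e.g.:
gawk -i inplace -v inplace::suffix=.bak '/^private/{gsub(/[/-]/, "_");} 1' filename
This writes the edited result to filename and leaves the original behind as filename.bak.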
You forgot the ^ anchor to denote the start of the line; you need it to change only lines starting with private, otherwise all lines containing private will be modified.
And yes, you can combine the two gsubs into a single one.
The sed command to do the same would be:
sed '/^private/{s/[/-]/_/g;}' filename
Add the -i option when you are done testing it.
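If you want a safety net there too, both GNU sed and BSD sed accept a backup suffix attached to -i:
sed -i.bak '/^private/{s/[/-]/_/g;}' filename
which keeps the original as filename.bak.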
I've tried to assign the output of an awk command to a variable, but I receive an error. I would like to assign the result to a variable and then echo it.
count = `awk '$0 ~ /Reason code "68"/' ladb.log | wc -l`
I've enclosed the statement in backticks and receive the error below:
/lsf9/db/dict/=: Unable to open dictionary: No such file or directory
DataArea = does not exist
Your main problem is the spaces: you can't have spaces around the = in a shell assignment.
Backticks can be harmful to your code, but I haven't used IBM AIX in a very long time, so they may still be essential on your POSIX shell (the usual portability notes on $(…) vs `…` don't suggest a problem here, though). One thing you can try is running the code in ksh or bash instead.
The following code assumes a standards-compliant POSIX shell. If it doesn't work, try replacing the $(…) notation with `…` notation. Since it's just a number being returned, you technically don't need the surrounding double quotes, but they're good practice.
count="$(awk '$0 ~ /Reason code "68"/' ladb.log | wc -l)"
The above should work, but it could be written more cleanly as
count="$(awk '/Reason code "68"/ {L++} END { print L }' ladb.log)"
As noted in the comments to the question, grep -c may be faster than awk, but if you know the location of that text, awk can be faster still. Let's say it begins a line:
count="$(awk '$1$2$3 == "Reasoncode\"68\"" {L++} END { print L }' ladb.log)"
Yes, a POSIX shell understands that double-quotes inside $(…) are unrelated to the outer double-quotes, so only the double-quotes within that awk string need to be escaped.
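For reference, the grep -c variant mentioned in the comments counts the matching lines directly:
count="$(grep -c 'Reason code "68"' ladb.log)"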
I have the following code:
#!/bin/sh
while read line; do
printf "%s\n" $line
done < input.txt
The file input.txt has the following lines:
one\two
eight\nine
The output is as follows:
onetwo
eightnine
The "standard" solutions to retain the slashes would be to use read -r.
However, I have the following limitations:
must run under #!/bin/sh for reasons of portability/POSIX compliance.
not all systems will support the -r switch to read under /bin/sh
The input file format cannot be changed
Therefore, I am looking for another way to retain the backslash after reading in the line. I have come up with one working solution, which is to use sed to replace the \ with some other value (e.g. ||) in a temporary file (thus bypassing my last requirement above), then, after reading the lines in, use sed again to transform the value back. Like so:
#!/bin/sh
sed -e 's/[\/&]/||/g' input.txt > tempfile.txt
while read line; do
printf "%s\n" $line | sed -e 's/||/\\/g'
done < tempfile.txt
I'm thinking there has to be a more "graceful" way of doing this.
Some ideas:
1) Use command substitution to store this in a variable instead of a file. Problem: I'm not sure command substitution will be portable here either, and my attempts at using a variable instead of a file were unsuccessful. Regardless, file or variable, the base solution is really the same (two substitutions).
2) Use IFS somehow? I've investigated a little, but not sure that can help in this issue.
3) ???
What are some better ways to handle this given my constraints?
Thanks
Your constraints seem a little strict. Here's a piece of code I jotted down (I'm not sure how valuable your while loop is for the other stuff you want to do, so I removed it just for ease). I don't guarantee this code to be robust, but the logic should give you hints about the direction you may wish to proceed in. (temp.dat is the input file.)
#!/bin/sh
var1="$(cut -d\\ -f1 temp.dat)"
var2="$(cut -d\\ -f2 temp.dat)"
set -- $var2
for x in $var1; do
    # Reattach the two halves around a literal backslash.
    # (Assumes the halves themselves contain no whitespace.)
    printf '%s\\%s\n' "$x" "$1"
    shift
done
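With the two sample lines from the question in temp.dat, and the snippet saved as, say, join.sh (the name is just for the demo), the backslashes come back intact:
$ sh join.sh
one\two
eight\nine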
As Larry Wall once said, it is easier to port a shell than a shell script.
perl -lne 'print $_' input.txt
The simplest possible Perl script is simpler still, but I imagine you'll want to do something with $_ before printing it.
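For instance, the backslashes from the question's input survive untouched:
$ printf 'one\\two\neight\\nine\n' | perl -lne 'print $_'
one\two
eight\nine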
I like using Python because of its easy-to-learn syntax; however, I recently learned that it has no support for UTF-8 in its CSV module. As I often use CSVs, this seems like a serious problem for me. Is there another scripting language with a simple syntax that I can learn, for when I need to manage really large UTF-8 CSV files?
If you're working on the command line and can install another command-line tool, I'd strongly recommend csvfix.
Once installed, you can robustly query any CSV file, e.g.
csvfix order -f 1,3 file.csv
will extract the 1st and 3rd columns of a csv.
There is a full list of commands in the csvfix documentation.
I'd recommend using gawk. E.g.:
awk -F ";" '{print $1 ";" $2}' FILE.csv
would print FILE.csv's first two (;-separated) columns. To work properly with UTF-8, you should use it like:
LC_ALL=C awk 'BEGIN {print length("árvíztűrőtükörkúrópék")}'
=> 30
LC_ALL=en_US.utf8 awk 'BEGIN {print length("árvíztűrőtükörkúrópék")}'
=> 21
(Or you can set LC_ALL globally if you're using UTF-8 all the time, and you're on *nix, e.g. in .bashrc, export LC_ALL=en_US.utf8.)
awk is an old but really powerful and fast tool.
HTH
I think awk will be the solution to my problem. My tools are limited because I'm using BusyBox on ESXi 4.0u1. I have a log file from a VM backup program (ghettoVCB). I need to scan this file for the expression
"Failed to clone disk : There is not enough space on the file system for the selected operation"
In my file, this is around line 43. The previous field (in awk vocabulary) holds the VM name that I want to print to an output text file. In my example the VM name is TEST12-RH4-AtlassianTest.
awk 'RS=""
/There is not enough space/ {
print $17
} '
print $17 is hard-coded, and I don't want this. I want the field that comes just before the first field of the line matched by the regex above (i.e. the last field of the previous line). Any suggestions are appreciated.
[Awk Input File]
Update (Optimized version)
awk 'NR==1{print $NF}' RS="Failed to clone" input-awk.txt
Proof of Concept
$ awk 'NR==1{print $NF}' RS="Failed to clone" input-awk.txt
TEST12-RH4-AtlassianTest
Update 2 (Uber optimized version)
Technically, the following would be the uber optimized version but it leaves too much chance for false hits on the record separator, although it works for your sample input.
awk 'NR<2{print $NF}' RS="Fa" input-awk.txt
Update 3 (Ultimate mega-kill optimized version)
I wouldn't use this in production code, but it just goes to show you there is always a way to make it simpler. If somebody can beat this for code golf purposes, I'd certainly like to see it!
awk '!a++,$0=$NF' RS="Fa" input-awk.txt
Original
Assuming your VM name is always the last field in the record you want to print, this works:
awk '/not enough space/{split(pre,a);print a[pNF]}{pre=$0;pNF=NF}' input-awk.txt
So couldn't you use something like
awk '/not enough space/ { print foo } { foo = $17 }' input-awk.txt
i.e. save the candidate field on every line and print the saved value when the next line matches?