Running an awk command with $SHELL -c returns different results - awk

I am trying to use awk to print the unique lines returned by a command. For simplicity, assume the command is ls -alh.
If I run the following command in my Z shell, awk shows all lines printed by ls -alh
ls -alh | awk '!seen[$0]++'
However, if I run the same command with $SHELL -c while escaping the ! with backslash, I only see the first line of the output printed.
$SHELL -c "ls -alh | awk '\!seen[$0]++'"
How can I ensure the latter command prints the exact same outputs as the former?
EDIT 1:
I initially thought the ! could be the issue. But changing the expression '!seen[$0]++' to 'seen[$0]++==0' has the same problem.
EDIT 2:
It looks like I should have escaped $ too. Since I do not know the reason behind it, I will not post an answer.

In the second form, $0 is being treated as a shell variable in the double-quoted string. The substitution creates an interestingly mangled awk command:
> print $SHELL -c "ls -alh | awk '\!seen[$0]++'"
/bin/zsh -c ls -alh | awk '!seen[-zsh]++'
The variable is not substituted in the first form since it is inside single quotes.
This answer discusses how single- and double-quoted strings are treated in bash and zsh:
Difference between single and double quotes in Bash
Escaping the $ so that $0 is passed to awk should work, but note that quoting in commands that are parsed multiple times can get really tricky.
> print $SHELL -c "ls -alh | awk '\!seen[\$0]++'"
/bin/zsh -c ls -alh | awk '!seen[$0]++'
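The quoting layers above can be checked with a small script. This is only a sketch under stated assumptions: it uses sh in place of $SHELL, a few hard-coded lines in place of ls -alh, and it is meant to be saved as a script (at an interactive prompt the ! may still need escaping, as in the question). The third variant, which passes the awk program as a positional parameter, is an alternative that is not taken from the answer.

```shell
#!/bin/sh
# Sample input with duplicate lines (hypothetical stand-in for `ls -alh`).
input='a
b
a
c
b'

# Direct pipeline: single quotes keep $0 away from the shell.
direct=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# Nested shell, double-quoted script: \$ keeps $0 literal for the inner awk.
escaped=$(printf '%s\n' "$input" | sh -c "awk '!seen[\$0]++'")

# Nested shell, awk program passed as a positional parameter:
# the inner script is single-quoted and "$1" is expanded by the inner shell.
as_arg=$(printf '%s\n' "$input" | sh -c 'awk "$1"' sh '!seen[$0]++')

printf '%s\n' "$direct"
```

Passing the program as an argument avoids a second round of escaping, which may be easier to maintain once the awk code contains several $ signs.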

Related

How to insert argument in awk script?

I'm writing a shell script that shuts down some services, and I'm trying to get a service's pid with the following awk script.
However, this awk script doesn't return the pid. What's wrong with it?
ps -ef | awk -v port_no=10080 '/[m]ilk.*port=port_no/{print $2}'
The result of ps -ef is like this:
username 13155 27705 0 16:06 pts/2 00:00:00 /home/username/.rbenv/versions/2.3.6/bin/ruby /home/username/.rbenv/versions/2.3.6/bin/milk web --no-browser --host=example.com --port=10080
The same program also runs with other port arguments, so I want to kill only the process using port=10080.
The awk script below works fine, but when I specify the port number with awk -v as above, it doesn't work.
ps -ef | awk '/[m]ilk.*port=10080/{print $2}'
awk version: GNU Awk 4.0.2
The syntax for pattern matching with /../ does not work with variables in the regular expression. You need to use the ~ syntax for it.
awk -v port_no=10080 '$0 ~ "[m]ilk.*port="port_no{print $2}'
Look closely at the regex: the string on the right-hand side of ~ is in double quotes, except for the variable holding the port number. The variable must stay outside the quotes so that awk substitutes its value and concatenates it with the string.
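A minimal sketch of the dynamic-regex form, run against two made-up lines instead of real ps -ef output (the sample data is hypothetical). The trailing "$" anchor is an addition not present in the answer; it assumes the port is the last argument on the line and keeps 10080 from also matching a port such as 100801.

```shell
#!/bin/sh
# Two sample lines mimicking `ps -ef` output (hypothetical data).
sample='user 13155 27705 0 16:06 pts/2 00:00:00 ruby milk web --port=10080
user 13156 27705 0 16:07 pts/2 00:00:00 ruby milk web --port=10081'

# The regex is built by string concatenation: literal text, the awk
# variable port_no, then an end-of-line anchor.
pid=$(printf '%s\n' "$sample" |
  awk -v port_no=10080 '$0 ~ ("[m]ilk.*port=" port_no "$") { print $2 }')

echo "$pid"
```

The parentheses are not strictly required, since concatenation binds tighter than ~, but they make the grouping explicit.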
This task is easily accomplished using pgrep:
$ pgrep -f '[m]ilk.*port=10080'
Have a look at man pgrep for details.

Make work find pipe awk command in Makefile

I have this find/awk pipeline to count the lines of my Python code::
$ find ./ -name '*.py' -exec wc -l {} \; | sort -n| awk '{print $0}{s+=$0}END{print s}'
12 ./gb/__init__.py
23 ./gb/value_type.py
40 ./setup.py
120 ./gb/libcsv.py
200
$
I tried to put it in a Makefile::
$ cat Makefile
python_count_lines: clean
#find ./ -name '*.py' -exec wc -l {} \; | sort -n| awk '{print \$0}{s+=\$0}END{print s}'
But this did not work::
$ make python_count_lines
awk: line 1: syntax error at or near }
Makefile:12: recipe for target 'python_count_lines' failed
make: *** [python_count_lines] Error 2
$
Bertrand Martel is correct that you need to escape dollar signs from make by doubling them, not prefixing them with backslashes (see info here).
However, the rest of that suggestion won't work. First, you should almost never use the shell function in a recipe. Second, the info function cannot work here: the first line sets a shell variable RES, but the second line tries to print the make variable RES. On top of that, each recipe line runs in a separate shell, and all make variable and function references are expanded up front, before any part of the recipe is passed to the shell.
You just need to do this:
python_count_lines: clean
#find ./ -name '*.py' -exec wc -l {} \; | sort -n| awk '{print $$0}{s+=$$0}END{print s}'

How to put this command in a Makefile?

I have the following command I want to execute in a Makefile but I'm not sure how.
The command is docker rmi -f $(docker images | grep "<none>" | awk "{print \$3}")
The command inside $(..) should produce output that is fed to docker rmi, but this does not work from within the Makefile. I think that's because $ is special in Makefiles, but I'm not sure how to modify the command to fit.
Any ideas?
$ in Makefiles needs to be doubled to prevent substitution by make:
docker rmi -f $$(docker images | grep "<none>" | awk "{print \$$3}")
Also, it'd be simpler to use a single-quoted string in the awk command to prevent expansion of $3 by the shell:
docker rmi -f $$(docker images | grep "<none>" | awk '{print $$3}')
I really recommend the latter. It's usually better to have awk code in single quotes because it tends to contain a lot of $s, and all the backslashes hurt readability.

output awk command result to variable

I ran the following command from the console and it output the correct result, 0:
sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'"
However, if I put it in a shell script and assign it to a variable, awk no longer works; it just outputs the whole result from grep:
replications=`sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'"`
echo "Replications: $replications"
result: Replications: Under replicated blocks: 0
How can I make awk output only the 4th column, which is 0, instead of the whole string?
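In backtick command substitution, \ followed by $ means just $. So \$4 has already become $4 by the time the double-quoted string is evaluated, and the outer shell expands it (normally to an empty string) before awk ever runs; awk then sees {print }, which prints the whole line. From the POSIX standard: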
Within the backquoted style of command substitution, backslash shall retain its literal meaning, except when followed by: '$', '`', or '\' (dollar sign, backquote, backslash). (...)
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
And yet more explicitly from the bash manpage:
When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by $, `, or \. The first backquote not preceded by a backslash terminates the command substitution. When using the $(command) form, all characters between the parentheses make up the command; none are treated specially.
So the easiest way is to use
replications=$(sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'")
but
replications=`sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \\\$4}'"`
also works.
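The three forms can be compared side by side. In this sketch, sh -c stands in for sudo ... bash -c and a fixed string stands in for the hadoop report line; both are assumptions made so the script runs anywhere. The set -- line clears the positional parameters so that $4 is reliably empty in the outer shell.

```shell
#!/bin/sh
# Clear positional parameters so an unescaped $4 expands to nothing.
set --

# Hypothetical stand-in for the line that grep would emit.
line='Under replicated blocks: 0'

# $(...) form: \$4 inside double quotes reaches the inner shell as $4.
new_style=$(sh -c "echo '$line' | awk '{print \$4}'")

# Backquote form: \$ is reduced to $ first, so the double-quoted string
# expands $4 (empty here) and awk runs {print }, printing the whole line.
old_style=`sh -c "echo '$line' | awk '{print \$4}'"`

# Backquote form with one extra level of escaping.
old_fixed=`sh -c "echo '$line' | awk '{print \\\$4}'"`

printf '%s\n' "$new_style" "$old_style" "$old_fixed"
```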

Awk processing of filenames containing backslash madness

I spent a whole day trying to process some files with backslashes and spaces inside their names. No matter what I do awk (gawk) refuses to print backslashes:
echo "this/pathname/contains/spa ces/and/back\\slashes" | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf {}}"'
this/pathname/contains/spa ces/and/back\slashes
gawk: {printf this/pathname/contains/spa ces/and/back\slashes}
gawk: ^ syntax error
gawk: {printf this/pathname/contains/spa ces/and/back\slashes}
gawk: ^ backslash not last character on line
This didn't work since the backslash ends up directly in the awk code.
echo "this/pathname/contains/spa ces/and/back\\slashes" | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf \"{}\"}"'
this/pathname/contains/spa ces/and/back\slashes
gawk: warning: escape sequence `\s' treated as plain `s'
this/pathname/contains/spa ces/and/backslashes
This worked, but awk eats the backslash. As you can see above, echo prints it but awk doesn't.
echo "this/pathname/contains/spa ces/and/back\\slashes" | ./escape.sh | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf \"{}\"}"'
this/pathname/contains/spa\ ces/and/back\slashes
gawk: warning: escape sequence `\ ' treated as plain ` '
gawk: warning: escape sequence `\s' treated as plain `s'
Next I tried escaping the filenames using escape.sh
#!/bin/bash
xargs -d'\n' -n1 -I{} bash -c 'echo $(printf "%q" "{}")'
Now there's a double backslash in there but awk still complains.
echo "this/pathname/contains/spa ces/and/back\\slashes" | ./escape.sh | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk -v VAR=$(printf "%q" "{}") "{printf VAR}"'
this/pathname/contains/spa\ ces/and/back\slashes
gawk: ces/and/back\\slashes
gawk: ^ syntax error
gawk: ces/and/back\\slashes
gawk: ^ unterminated regexp
Now awk said some nonsense about some unterminated regexp.
Any ideas? Thanks!
You are solving the wrong problem: Regardless of the tool, backslashes and spaces in filenames on UNIX-Systems will always mean extra work. In my opinion you should sanitize the filenames, then process them.
Try:
sed "s/ /_/g;s/\\\\/-/g"
HTH Chris
The fix is just to double every backslash that is fed into mawk, either in the input or via variables.
Like this:
# awk needs escaped backslashes
VAR=$(echo "$1" | sed -r 's:\\:\\\\:g')
mawk -v VAR="$VAR" -f "script.awk"
Therefore, if a filename containing backslashes is passed inside $1, this is how you obtain the expected result.
I don't understand why you're piping into xargs. Is that a requirement of your process? Can you do something like this:
filename='this/pathname/contains/spa ces/and/back\slashes'
awk -v "fname=$filename" 'BEGIN {print fname}'
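One caveat about -v: awk applies escape-sequence processing to the assigned value, so a lone backslash may be altered (gawk warned about \s earlier in this thread). A minimal sketch of two ways around that, assuming the filename is already in a shell variable: read it through ENVIRON, which awk does not escape-process, or double the backslashes before using -v, as in the sed-based fix. The variable names fname, via_env, and via_v are illustrative only.

```shell
#!/bin/sh
filename='this/pathname/contains/spa ces/and/back\slashes'

# ENVIRON values are taken verbatim, with no escape-sequence processing.
via_env=$(fname="$filename" awk 'BEGIN { print ENVIRON["fname"] }')

# -v assignment, with every backslash doubled first so that awk's
# escape processing turns each pair back into a single backslash.
escaped=$(printf '%s\n' "$filename" | sed 's/\\/\\\\/g')
via_v=$(awk -v fname="$escaped" 'BEGIN { print fname }')

printf '%s\n' "$via_env" "$via_v"
```

The ENVIRON route needs no preprocessing, which may make it the simpler choice when the value is arbitrary user data such as a filename.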