output awk command result to variable - awk

I ran the following command from the console and it output the correct result, 0:
sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'"
However, if I put it in a shell script and assign it to a variable, awk doesn't work anymore; it just outputs the whole result from grep:
replications=`sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'"`
echo "Replications: $replications"
result: Replications: Under replicated blocks: 0
How can I make awk work again so that it outputs only the 4th column, which is 0, instead of the whole string?

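In backtick command substitution, \ followed by $ means just $. The backslash is therefore removed before the double-quoted string is parsed, so the outer shell expands $4 itself (to an empty string in a typical script) and awk ends up running {print }, which prints the whole line. From the POSIX standard: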
Within the backquoted style of command substitution, backslash shall retain its literal meaning, except when followed by: '$', '`', or '\' (dollar sign, backquote, backslash). (...)
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
And yet more explicitly from the bash manpage:
When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by $, `, or \. The first backquote not preceded by a backslash terminates the command substitution. When using the $(command) form, all characters between the parentheses make up the command; none are treated specially.
So the easiest way is to use
replications=$(sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'")
but
replications=`sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \\\$4}'"`
also works.
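The difference can be reproduced without Hadoop. The following is a minimal sketch, assuming a POSIX sh and awk are available; the variable names and the echo that stands in for the dfsadmin pipeline are made up for illustration, and sh -c replaces the sudo ... bash -c wrapper:

```shell
# Clear the positional parameters so that $4 is empty, as in a typical script.
set --

# Stand-in for the line that grep selects from the dfsadmin report.
line='Under replicated blocks: 0'

# Backticks: \$ collapses to $ first, so the outer shell expands $4 (empty)
# and the inner shell runs awk '{print }', which prints the whole line.
old=`sh -c "echo '$line' | awk '{print \$4}'"`

# $(...): the text is parsed as-is, so \$4 reaches the inner shell as $4.
new=$(sh -c "echo '$line' | awk '{print \$4}'")

# Backticks with the extra escaping: \\\$ becomes \$ after the first pass.
fixed=`sh -c "echo '$line' | awk '{print \\\$4}'"`

echo "backticks:         $old"
echo "dollar-paren:      $new"
echo "escaped backticks: $fixed"
```

Only the first form prints the full line; the other two print the fourth field.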

Related

Running an awk command with $SHELL -c returns different results

I am trying to use awk to print the unique lines returned by a command. For simplicity, assume the command is ls -alh.
If I run the following command in my Z shell, awk shows all lines printed by ls -alh
ls -alh | awk '!seen[$0]++'
However, if I run the same command with $SHELL -c while escaping the ! with backslash, I only see the first line of the output printed.
$SHELL -c "ls -alh | awk '\!seen[$0]++'"
How can I ensure the latter command prints the exact same outputs as the former?
EDIT 1:
I initially thought the ! could be the issue. But changing the expression '!seen[$0]++' to 'seen[$0]++==0' has the same problem.
EDIT 2:
It looks like I should have escaped $ too. Since I do not know the reason behind it, I will not post an answer.
In the second form, $0 is being treated as a shell variable in the double-quoted string. The substitution creates an interestingly mangled awk command:
> print $SHELL -c "ls -alh | awk '\!seen[$0]++'"
/bin/zsh -c ls -alh | awk '!seen[-zsh]++'
The variable is not substituted in the first form since it is inside single quotes.
This answer discusses how single- and double-quoted strings are treated in bash and zsh:
Difference between single and double quotes in Bash
Escaping the $ so that $0 is passed to awk should work, but note that quoting in commands that are parsed multiple times can get really tricky.
> print $SHELL -c "ls -alh | awk '\!seen[\$0]++'"
/bin/zsh -c ls -alh | awk '!seen[$0]++'
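As a minimal sketch of two ways to keep $0 away from the outer shell, assuming a POSIX sh and awk: the printf input below is made-up sample data standing in for ls -alh, and sh stands in for $SHELL. The second variant avoids nested quoting by handing the awk program to the inner shell as a positional argument.

```shell
# Variant 1: escape the $ so the outer shell passes a literal $0 through.
escaped=$(printf 'a\nb\na\nc\nb\n' | sh -c "awk '!seen[\$0]++'")

# Variant 2: single-quote the -c script and pass the awk program as "$1".
# The word after the script ("sh") becomes $0 of the inner shell.
as_arg=$(printf 'a\nb\na\nc\nb\n' | sh -c 'awk "$1"' sh '!seen[$0]++')

printf '%s\n' "$escaped"
printf '%s\n' "$as_arg"
```

Both variants should print each distinct input line once, in order of first appearance.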

How to filter output of a URL

I have a URL and when I send a request by curl, I get a big output.
curl https://www.aparat.com/video/video/embed/videohash/lXhkG/vt/frame -H "Accept: application/json" -s
I get: https://pastebin.mozilla.org/QM6FN8MZ#L
But I just want to get the URL of 720p, I mean just:
https:\/\/caspian1.cdn.asset.aparat.com\/aparat-video\/de54245e862b62249b6b7958c734276547445778-720p.apt?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6IjQ2NDJhYmQ4NGFiN2UzNDJkNGMxZWI3ZTNkMzlmZmQ5IiwiZXhwIjoxNjY5ODA5NzI1LCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.havkkhJyXjBt_jHPVv4poEVb65_7tRsLIxO5pCO7tGE
Any idea how to do it?
I'm trying to use grep, but I don't know how to strip everything other than the 720p URL.
curl https://www.aparat.com/video/video/embed/videohash/lXhkG/vt/frame -H "Accept: application/json" -s | grep -e "720p"
You could go the html-parsing/json-parsing route, e.g.:
curl -s https://www.aparat.com/video/video/embed/videohash/lXhkG/vt/frame |
# Normalize html
xmlstarlet fo -o -H -R 2> /dev/null |
# Extract relevant js bit
xmlstarlet sel -t -v '_:html/_:body/_:div/_:script' 2> /dev/null |
# Extract relevant json
sed -nE '/^ *var +options *= */ { s///; s/;$//p; }' |
# Extract desired url, i.e. the 720p in this case
jq -r '.multiSRC[][] | select( .label == "720p" ) | .src'
I would use GNU AWK for this in the following way:
wget --quiet -O - https://www.aparat.com/video/video/embed/videohash/lXhkG/vt/frame | awk 'match($0, /http[^"]*720[^"]*/){print substr($0,RSTART,RLENGTH)}'
gives output
https:\/\/caspian1.cdn.asset.aparat.com\/aparat-video\/de54245e862b62249b6b7958c734276547445778-720p.apt?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6IjY1OTcxYTRkNGZiMjkyYjk0NjM0Mjk2ODVkOTc3YjEwIiwiZXhwIjoxNjY5ODIxNDM2LCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.NI2_6nwOxLEOxhWghsR2bOqzrXINXqqscbduHpCWwok
Explanation: I use wget with information such as the progress bar turned off (--quiet) and writing to standard output (-O -), which is piped into awk. For each line, awk matches against the regular expression http[^"]*720[^"]*, that is, http followed by zero or more (*) non-quote characters, followed by 720, followed by zero or more non-quote characters. If there is a match, I print the substring of the line containing that match. The match string function sets the RSTART and RLENGTH variables, which I then use in substr. Note: this might give a false positive if there are other URLs containing 720.
(tested in GNU Wget 1.20.3 and GNU Awk 5.0.1)
Using any awk:
$ cat file | awk 'match($0,/"https?:\\\/\\\/[^"]*-720p\.apt\?[^"]*"/) { print substr($0,RSTART+1,RLENGTH-2) }'
https:\/\/caspian1.asset.aparat.com\/aparat-video\/de54245e862b62249b6b7958c734276547445778-720p.apt?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6ImViODhjZDNlYzZhYzk3OTBhZDc3MWJhMzIyNWQ3NmZlIiwiZXhwIjoxNjY5ODE4Mjc5LCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.e6do9Ha9EkDS46NZDoHT2dYHSOezu_TbdGAGblfi2tM
The contents of file are what you provided in pastebin, obviously just replace cat file with your curl command.
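The URLs printed above still contain the JSON-escaped \/ sequences. A minimal sketch of the same match() approach with an extra gsub() that turns \/ back into /, assuming any POSIX awk; the sample string and the example.com URLs are invented stand-ins for the real response:

```shell
# Hypothetical, shortened stand-in for the JSON fragment in the response.
sample='{"label":"480p","src":"https:\/\/example.com\/v-480p.apt?x=1"},{"label":"720p","src":"https:\/\/example.com\/v-720p.apt?x=2"}'

url=$(printf '%s\n' "$sample" | awk '
  # Match a quoted http(s) URL that contains -720p.apt and no inner quotes.
  match($0, /"https?:[^"]*-720p\.apt[^"]*"/) {
    u = substr($0, RSTART + 1, RLENGTH - 2)   # drop the surrounding quotes
    gsub(/\\\//, "/", u)                      # unescape \/ to /
    print u
  }')

echo "$url"
```

If the real response spreads the URL over several lines or uses a different escaping scheme, the pattern would need adjusting.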

How to insert argument in awk script?

I'm writing a shell script that shuts down some services, and I'm trying to get a service's PID with the following awk script.
However, this awk script can't get the PID. What's wrong with it?
ps -ef | awk -v port_no=10080 '/[m]ilk.*port=port_no/{print $2}'
The result of ps -ef is like this:
username 13155 27705 0 16:06 pts/2 00:00:00 /home/username/.rbenv/versions/2.3.6/bin/ruby /home/username/.rbenv/versions/2.3.6/bin/milk web --no-browser --host=example.com --port=10080
This process is working with a different port argument as well, so I want to kill the process only working on port=10080.
The awk script below works fine, but when I specify the port no using awk -v like the above, it doesn't work well.
ps -ef | awk '/[m]ilk.*port=10080/{print $2}'
awk version: GNU Awk 4.0.2
The syntax for pattern matching with /../ does not work with variables in the regular expression. You need to use the ~ syntax for it.
awk -v port_no=10080 '$0 ~ "[m]ilk.*port="port_no{print $2}'
Note that in the regex string on the right-hand side of ~, the literal part is inside double quotes "..", while the variable holding the port number is outside the quotes so that awk substitutes its value.
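A minimal sketch that compares the two forms on the sample ps line from the question, so it runs without the milk process; the variable names are made up, and the trailing "$" anchor is an addition (not part of the answer above) that keeps --port=10080 from also matching a longer port such as 100801 when the port is the last argument:

```shell
# The ps -ef line from the question, used as fixed sample input.
ps_line='username 13155 27705 0 16:06 pts/2 00:00:00 /home/username/.rbenv/versions/2.3.6/bin/ruby /home/username/.rbenv/versions/2.3.6/bin/milk web --no-browser --host=example.com --port=10080'

# Inside /.../ the text port_no is literal, so this pattern never matches.
static=$(printf '%s\n' "$ps_line" |
  awk -v port_no=10080 '/[m]ilk.*port=port_no/ { print $2 }')

# A dynamic regex is built by string concatenation and applied with ~.
dynamic=$(printf '%s\n' "$ps_line" |
  awk -v port_no=10080 '$0 ~ ("[m]ilk.*port=" port_no "$") { print $2 }')

echo "static:  '$static'"
echo "dynamic: '$dynamic'"
```

The parentheses around the concatenation are optional here, but they make it clearer where the pattern ends and the action begins.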
This task is easily accomplished using pgrep:
$ pgrep -f '[m]ilk.*port=10080'
Have a look at man pgrep for details.

How to put this command in a Makefile?

I have the following command I want to execute in a Makefile but I'm not sure how.
The command is docker rmi -f $(docker images | grep "<none>" | awk "{print \$3}")
The command between $(..) should produce output that is fed to docker rmi, but this is not working from within the Makefile. I think that's because $ is treated specially in a Makefile, but I'm not sure how to modify the command to fit in there.
Any ideas?
$ in Makefiles needs to be doubled to prevent substitution by make:
docker rmi -f $$(docker images | grep "<none>" | awk "{print \$$3}")
Also, it'd be simpler to use a single-quoted string in the awk command to prevent expansion of $3 by the shell:
docker rmi -f $$(docker images | grep "<none>" | awk '{print $$3}')
I really recommend the latter. It's usually better to have awk code in single quotes because it tends to contain a lot of $s, and all the backslashes hurt readability.

piping to awk hangs

I am trying to pipe tshark output to awk. The tshark command works fine on its own, and when piped to other programs such as cat, it works fine (real time printing of output). However, when piped to awk, it hangs and nothing happens.
sudo tshark -i eth0 -l -f "tcp" -R 'http.request.method=="GET"' -T fields -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport -e tcp.seq -e tcp.ack | awk '{printf("mz -A %s -B %s -tcp \"s=%s sp=%s dp=%s\"\n", $2, $1, $5, $4, $3)}'
Here is a simpler version:
sudo tshark -i eth0 -f "tcp" -R 'http.request.method=="GET"' | awk '{print $0}'
And to compare, the following works fine (although is not very useful):
sudo tshark -i eth0 -f "tcp" -R 'http.request.method=="GET"' | cat
Thanks in advance.
I had the same problem.
I have found some partial "solutions" that are not completely portable.
Some of them suggest using the fflush() or flush() awk functions, or the -W interactive option:
http://mywiki.wooledge.org/BashFAQ/009
I tried both and neither worked, so awk does not seem to be the appropriate command here.
A few of them suggest using gawk, but that did not do the trick for me either.
The cut command has the same problem.
My solution: in my case I just needed to add --line-buffered to grep, without touching the awk command, but in your case I would try:
sed -u
with the proper regular expression. For example:
sed -u 's_\(.*\) \(.*\) \(.*\) DIFF: \(.*\)_\3 \4_'
This expression gives you the 3rd and 4th columns separated by a TAB (typed with the Ctrl+V and TAB combination). With the -u option you get unbuffered output; some sed implementations (BSD sed, for example) also have an -l option that gives line-buffered output.
I hope you find this answer useful, although it is late.
Per our previous messages in comments, maybe it will work to force closing the input and emitting a linefeed.
sudo tshark -i eth0 -f "tcp" -R 'http.request.method=="GET"' ...... \
| {
awk '{print $0}'
printf "\n"
}
Note, no pipe between awk and printf.
I hope this helps.
I found the solution here https://superuser.com/questions/742238/piping-tail-f-into-awk (by John1024).
It says:
"You don't see it in real time because, for purposes of efficiency, pipes are buffered. tail -f has to fill up the buffer, typically 4 kB, before the output is passed to awk."
The proposed solution is to use the unbuffer or stdbuf -o0 commands to disable buffering. It worked for me like this:
stdbuf -o0 tshark -i ens192 -f "ip" | awk '{print $0}'
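Buffering can occur at each stage whose output is a pipe, so when awk itself feeds another command, its output may need flushing too. A minimal sketch, assuming an awk that supports fflush() with no arguments (gawk and mawk do); the sample line is made-up data standing in for tshark -T fields output, and the commented-out live command is untested here:

```shell
# Hypothetical sample of one "ip.src ip.dst srcport dstport" record.
sample='10.0.0.1 10.0.0.2 51000 80'

# fflush() pushes each formatted record out immediately instead of waiting
# for awk's output buffer to fill when stdout is a pipe or a file.
cmd=$(printf '%s\n' "$sample" |
  awk '{ printf("mz -A %s -B %s\n", $2, $1); fflush() }')

echo "$cmd"

# Live capture (requires tshark and root), with both stages flushing per line:
#   sudo stdbuf -oL tshark -i eth0 -f "tcp" | awk '{ print $0; fflush() }'
```

stdbuf only affects programs that rely on the C library's default buffering, so whether it helps depends on how the upstream program writes its output.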