Extracting a number from a stream using sed/grep/awk

I am writing a script and need to extract a number from the output of a shell command. The command and its output are:
$ git branch -a -v --no-abbrev --contains $(git rev-parse HEAD)
* (HEAD detached at c5246b6) c5246b6907e46795741853852462914e7a5f60de Merge pull request 1166 from testPR into dev
remotes/origin-pull/1166/merge c5246b6907e46795741853852462914e7a5f60de Merge pull request 1166 from testPR into dev
I am trying to extract the 1166 from the result by piping it through sed, something like
$ git branch -a -v --no-abbrev --contains $(git rev-parse HEAD) | sed <pattern>
to get the 1166.
My patterns so far don't seem to capture the number I am expecting.

It seems that you're trying to extract the part of your remote branch name between the last two slashes. You can use grep with a Perl-compatible pattern (-P) to achieve that:
$ git branch ... | grep -oP '[^\/]+(?=\/[^\/]+$)'
1166
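Brief explanation:
-o: print only the matched (non-empty) parts of each line
-P: interpret the pattern as a Perl-compatible regular expression, which the lookahead requires
[^\/]+ : one or more non-slash characters; this is the part grep prints
(?=\/[^\/]+$) : a lookahead requiring that the match be followed by the last slash of the line and the slash-free remainder [^\/]+$, without including them in the output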
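Since the question asked about sed, the same extraction can also be sketched without grep -P. This is only a sketch, assuming the remote ref always has the shape .../<number>/merge shown in the question; the function name extract_pr_number is made up for illustration.

```shell
# Hypothetical helper: reads `git branch -a -v` style lines on stdin and prints
# the number sitting between the last two slashes of a ".../<number>/merge" ref.
extract_pr_number() {
    # -n suppresses the default output; the p flag prints only lines where the
    # substitution succeeded, so the "(HEAD detached ...)" line is dropped.
    sed -n 's|.*/\([0-9][0-9]*\)/merge .*|\1|p'
}

# Sample input copied from the question.
sample='* (HEAD detached at c5246b6) c5246b6907e46795741853852462914e7a5f60de Merge pull request 1166 from testPR into dev
  remotes/origin-pull/1166/merge c5246b6907e46795741853852462914e7a5f60de Merge pull request 1166 from testPR into dev'

pr_number=$(printf '%s\n' "$sample" | extract_pr_number)
echo "$pr_number"   # prints 1166
```

In the real pipeline the git branch output would be piped into the function in place of the sample text.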

Not the answer to my exact question, but I was able to get what I want by modifying my command:
git branch -r -v --no-abbrev --contains $(git rev-parse HEAD) | awk '{print $1}'
This returns origin-pull/1166/merge, which is what I want.
Notice the -r in the command; -a returns both local and remote branch info. With only the remote line left, I can get away with a much simpler sed pattern.
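Once the output is reduced to a ref like origin-pull/1166/merge, the number itself can be pulled out without sed at all. A small sketch, assuming the ref always has exactly the three slash-separated parts shown above:

```shell
# Output of the awk '{print $1}' step, as reported above.
ref='origin-pull/1166/merge'

number=${ref#*/}       # drop the shortest prefix ending in "/": 1166/merge
number=${number%%/*}   # drop the longest suffix starting with "/": 1166
echo "$number"         # prints 1166

# Equivalent with cut: split on "/" and keep the second field.
number_cut=$(printf '%s\n' "$ref" | cut -d/ -f2)
echo "$number_cut"     # prints 1166
```

The parameter-expansion form avoids spawning another process; the cut form is easier to read at a glance.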

Related

Running an awk command with $SHELL -c returns different results

I am trying to use awk to print the unique lines returned by a command. For simplicity, assume the command is ls -alh.
If I run the following command in my Z shell, awk shows all lines printed by ls -alh
ls -alh | awk '!seen[$0]++'
However, if I run the same command with $SHELL -c, escaping the ! with a backslash, only the first line of the output is printed.
$SHELL -c "ls -alh | awk '\!seen[$0]++'"
How can I ensure the latter command prints the exact same outputs as the former?
EDIT 1:
I initially thought the ! could be the issue. But changing the expression '!seen[$0]++' to 'seen[$0]++==0' has the same problem.
EDIT 2:
It looks like I should have escaped $ too. Since I do not know the reason behind it, I will not post an answer.
In the second form, $0 is being treated as a shell variable in the double-quoted string. The substitution creates an interestingly mangled awk command:
> print $SHELL -c "ls -alh | awk '\!seen[$0]++'"
/bin/zsh -c ls -alh | awk '!seen[-zsh]++'
The variable is not substituted in the first form since it is inside single quotes.
This answer discusses how single- and double-quoted strings are treated in bash and zsh:
Difference between single and double quotes in Bash
Escaping the $ so that $0 is passed to awk should work, but note that quoting in commands that are parsed multiple times can get really tricky.
> print $SHELL -c "ls -alh | awk '\!seen[\$0]++'"
/bin/zsh -c ls -alh | awk '!seen[$0]++'
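The two quoting options can be compared in a small script. This is a sketch under a few assumptions: it uses sh -c instead of $SHELL -c, a three-line sample instead of ls -alh, and it runs as a script, where ! needs no escaping because history expansion is off.

```shell
# Three lines with one duplicate stand in for the ls -alh output.
sample_lines() { printf '%s\n' alpha beta alpha; }

# Double quotes: the outer shell would expand $0 itself, so \$ keeps it
# literal and awk receives the program !seen[$0]++ unchanged.
deduped=$(sample_lines | sh -c "awk '!seen[\$0]++'")
echo "$deduped"

# Single quotes around the whole -c argument avoid the escaping, but the
# inner awk program then needs the '\'' idiom for its own single quotes.
deduped_single=$(sample_lines | sh -c 'awk '\''!seen[$0]++'\''')
echo "$deduped_single"
```

Both variants hand awk the same program text, so both print alpha and beta once each.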

Gitlab CI variables and script section give different results

I've searched through a lot of examples, but none of them worked for me.
I'm trying to run linters on the changed files when an MR is opened.
My .gitlab-ci.yml:
run_linters:
  image: python:3
  variables:
    FILES: git diff --name-only $CI_MERGE_REQUEST_TARGET_BRANCH_NAME | grep *.py
  before_script:
    - python3 -m pip install black==21.5b1
    - python3 -m pip install flake8==3.9.2
  script:
    - echo $FILES
    - git diff --name-only $CI_MERGE_REQUEST_TARGET_BRANCH_NAME | grep *.py
    - black --check $FILES
    - flake8 $FILES
  only:
    - merge_requests
And I'm getting strange output.
echo $FILES prints git diff --name-only main | grep incoming_file.py
incoming_file.py is the only file in that MR. Why does it appear next to grep?
And git diff in the script section fails with fatal: ambiguous argument 'main': unknown revision or path not in the working tree.
Why does the filename appear next to grep?
Why do the same git diff commands give different results?
Why does the filename appear next to grep?
In bash, an unquoted * is a glob: the shell expands *.py to the matching files in the current directory before grep runs. Since incoming_file.py is the only match, *.py becomes incoming_file.py.
Why do the same git diff commands give different results?
variables:
  FILES: git diff --name-only $CI_MERGE_REQUEST_TARGET_BRANCH_NAME | grep *.py
When you define a variable in the variables section, GitLab doesn't execute the command; it simply populates the variable FILES with the string git diff --name-only $CI_MERGE_REQUEST_TARGET_BRANCH_NAME | grep *.py
Then, in the script section, the runner expands *.py to incoming_file.py and $CI_MERGE_REQUEST_TARGET_BRANCH_NAME to main,
which is why echo prints git diff --name-only main | grep incoming_file.py
Here,
- git diff --name-only $CI_MERGE_REQUEST_TARGET_BRANCH_NAME | grep *.py
you actually execute the command, and you get the error message mentioned above.
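The difference between storing command text and running a command can be reproduced in plain shell. In this sketch, changed_files is a made-up stand-in for the git diff call and the file names are hypothetical; in the CI job the equivalent would presumably be an assignment with $(...) inside the script section, which is an untested assumption here.

```shell
# Stand-in for `git diff --name-only <target branch>` (hypothetical file names).
changed_files() { printf '%s\n' incoming_file.py README.md; }

# A plain assignment stores text; nothing is executed.
FILES_TEXT='changed_files | grep *.py'
echo "$FILES_TEXT"   # prints the command text, not a file list

# Command substitution runs the pipeline. The pattern is quoted so the shell
# does not glob it, and written as a regex: a literal dot, "py", end of line.
FILES=$(changed_files | grep '\.py$')
echo "$FILES"        # prints incoming_file.py
```

Quoting the grep pattern matters as much as the substitution: it keeps the result independent of whichever files happen to sit in the working directory.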

Clone all projects from cgit

I have to download all projects hosted on a cgit instance. There are several hundred repositories, so doing this manually is tedious.
How can it be done?
It seems possible to do it with curl by parsing the pages one by one. But is there a more convenient interface?
There does not seem to be any official or convenient API for CGit to export/clone all its repositories.
You can try these alternatives:
curl -s http://git.suckless.org/ |
  xml sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a" -v '@title' -n |
  grep . |
  while read -r repo
  do git clone "git://git.suckless.org/$repo"
  done
Or:
curl -s http://git.suckless.org/ | xml pyx | awk '$1 == "Atitle" { print $2 }'
Or:
curl -s http://git.suckless.org/ | xml pyx | awk '$1 == "Atitle" { printf("git clone %s\n",$2) }' | s
I suspect this works for only one page of repositories as listed by cgit; you might still have to repeat it for each subsequent page.
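To see what the awk filter does without touching the network, it can be fed a few PYX lines by hand. The sample below is made up for illustration (two links with title attributes), and the clone commands are only printed, not run; building the full git:// URL inside awk is a variation on the commands above, not part of the original answer.

```shell
# Made-up sample of what `xml pyx` emits for two <a title="..."> links:
# "(" opens an element, "A" is an attribute, "-" is text, ")" closes it.
sample_pyx() {
    cat <<'EOF'
(a
Atitle dwm
Ahref /dwm/
-dwm
)a
(a
Atitle st
Ahref /st/
-st
)a
EOF
}

# Print one clone command per title attribute instead of executing it.
clone_commands=$(sample_pyx |
    awk '$1 == "Atitle" { printf("git clone git://git.suckless.org/%s\n", $2) }')
echo "$clone_commands"
```

Reviewing the printed commands first makes it easy to spot links that are not repositories before anything is cloned.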

How to put this command in a Makefile?

I have the following command that I want to execute in a Makefile, but I'm not sure how.
The command is docker rmi -f $(docker images | grep "<none>" | awk "{print \$3}")
The command inside $(..) should produce output that is fed to docker rmi, but this does not work from within the Makefile. I think that's because $ has a special meaning in Makefiles, but I'm not sure how to modify the command to fit.
Any ideas?
$ in Makefiles needs to be doubled to prevent substitution by make:
docker rmi -f $$(docker images | grep "<none>" | awk "{print \$$3}")
Also, it'd be simpler to use a single-quoted string for the awk program, so that the shell does not expand $3:
docker rmi -f $$(docker images | grep "<none>" | awk '{print $$3}')
I really recommend the latter. It's usually better to have awk code in single quotes because it tends to contain a lot of $s, and all the backslashes hurt readability.
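At the shell level, that is, after make has turned each $$ into a single $, the two quoting styles behave the same. The sketch below checks this outside of make and docker; the row is a placeholder standing in for one line of docker images output.

```shell
# One whitespace-separated line stands in for a row of `docker images`
# output (the values are placeholders); the third field is the image ID.
row='<none> <none> 0123456789ab'

# Double quotes: the shell would expand $3, so it has to be escaped.
id_double=$(printf '%s\n' "$row" | awk "{print \$3}")

# Single quotes: the shell leaves $3 alone and awk sees it as field 3.
id_single=$(printf '%s\n' "$row" | awk '{print $3}')

echo "$id_double $id_single"   # prints the same ID twice
```

Inside a Makefile recipe, every $ in either form still has to be doubled so that make passes it through to the shell.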

output awk command result to variable

I ran the following command from the console and it output the correct result, 0:
sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'"
However, if I put it in a shell script and assign it to a variable, awk no longer works; it just outputs the whole result from grep:
replications=`sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'"`
echo "Replications: $replications"
Result: Replications: Under replicated blocks: 0
How can I make awk work again so that it outputs only the 4th column, which is 0, instead of the whole string?
In backtick command substitution, \ followed by $ means just $. From the POSIX standard:
Within the backquoted style of command substitution, backslash shall retain its literal meaning, except when followed by: '$', '`', or '\' (dollar sign, backquote, backslash). (...)
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
And yet more explicitly from the bash manpage:
When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by $, `, or \. The first backquote not preceded by a backslash terminates the command substitution. When using the $(command) form, all characters between the parentheses make up the command; none are treated specially.
So the easiest way is to use
replications=$(sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \$4}'")
but
replications=`sudo -H -u hadoop bash -c "/home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep 'Under replicated blocks' | awk '{print \\\$4}'"`
also works.
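The three variants can be compared without hadoop or sudo. In this sketch, printf stands in for the grep output and sh -c for the sudo ... bash -c wrapper; each variant lives in a function called without arguments, so the function's own $4 is empty.

```shell
# $(...) form: \$ is handled once, by the double quotes, so awk sees $4.
with_dollar_paren() {
    result=$(sh -c "printf 'Under replicated blocks: 0\n' | awk '{print \$4}'")
    printf '%s\n' "$result"
}

# Backticks with a single backslash: the backtick pass turns \$ into $, the
# double quotes then expand the (empty) $4, and awk runs '{print }'.
with_backticks_broken() {
    result=`sh -c "printf 'Under replicated blocks: 0\n' | awk '{print \$4}'"`
    printf '%s\n' "$result"
}

# Backticks with \\\$: the backtick pass leaves \$, which the double quotes
# turn into a literal $4 for awk.
with_backticks_fixed() {
    result=`sh -c "printf 'Under replicated blocks: 0\n' | awk '{print \\\$4}'"`
    printf '%s\n' "$result"
}

with_dollar_paren       # prints 0
with_backticks_broken   # prints Under replicated blocks: 0
with_backticks_fixed    # prints 0
```

The broken variant reproduces the symptom from the question: awk receives an empty print statement and echoes the whole line.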