How to gzip tee output while running in GNU parallel?

Suppose I am running tee inside a command executed by parallel.
I would like to gzip the output from tee:
... | tee --gzip the_file | and_continue

bash process substitution is useful for cases like this. Something like:
... | tee >(gzip -c > the_file) | and_continue
If you're choosing different files in a parallel run and need to format the name differently for each job, take a look at GNU Parallel argument placeholder in bash process substitution for how that has to change (the process substitution must be deferred so that it runs per parallel job).
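A minimal sketch of the per-job form (this assumes bash is the shell GNU Parallel uses to run its jobs, since process substitution is a bash feature; the input file names and the wc -l stage are placeholders):
parallel 'cat {} | tee >(gzip -c > {.}.gz) | wc -l' ::: input_*.txt
Here {} is GNU Parallel's placeholder for the input argument and {.} is the same argument with its extension removed, so each job writes its own compressed copy alongside the normal pipeline output.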

Related

How to put this command in a Makefile?

I have the following command I want to execute in a Makefile but I'm not sure how.
The command is docker rmi -f $(docker images | grep "<none>" | awk "{print \$3}")
The command executed between $(..) should produce output which is fed to docker rmi, but this is not working from within the Makefile. I think that's because $ is treated specially by make, but I'm not sure how to modify the command to fit in there.
Any ideas?
$ in Makefiles needs to be doubled to prevent substitution by make:
docker rmi -f $$(docker images | grep "<none>" | awk "{print \$$3}")
Also, it'd be simpler to use a single-quoted string in the awk command to prevent expansion of $3 by the shell:
docker rmi -f $$(docker images | grep "<none>" | awk '{print $$3}')
I really recommend the latter. It's usually better to have awk code in single quotes because it tends to contain a lot of $s, and all the backslashes hurt readability.
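In context, a hypothetical Makefile target (the clean-images name is made up; recall that the recipe line must start with a tab):
clean-images:
	docker rmi -f $$(docker images | grep "<none>" | awk '{print $$3}')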

Redirect stderr to stdout in C shell

When I run the following command in csh, I get nothing, but it works in bash:
somecommand 2>&1
Is there any equivalent in csh which can redirect standard error to standard output?
The csh shell has never been known for its extensive ability to manipulate file handles in the redirection process.
You can redirect both standard output and error to a file with:
xxx >& filename
but that's not quite what you were after, redirecting standard error to the current standard output.
However, if your underlying operating system exposes the standard output of a process in the file system (as Linux does with /dev/stdout), you can use that method as follows:
xxx >& /dev/stdout
This will force both standard output and standard error to go to the same place as the current standard output, effectively what you have with the bash redirection, 2>&1.
Just keep in mind this isn't a csh feature. If you run on an operating system that doesn't expose standard output as a file, you can't use this method.
However, there is another method. You can combine the two streams into one by sending them to a pipeline with |&; then all you need is a pipeline component that writes its standard input to its standard output. In case you're unaware of such a thing, that's exactly what cat does if you don't give it any arguments. Hence, you can achieve your ends in this specific case with:
xxx |& cat
Of course, there's also nothing stopping you from running bash (assuming it's on the system somewhere) within a csh script to give you the added capabilities. Then you can use the rich redirections of that shell for the more complex cases where csh may struggle.
Let's explore this in more detail. First, create an executable echo_err that will write a string to stderr:
#include <stdio.h>

int main (int argc, char *argv[]) {
    fprintf (stderr, "stderr (%s)\n", (argc > 1) ? argv[1] : "?");
    return 0;
}
Then a control script test.csh which will show it in action:
#!/usr/bin/csh
ps -ef ; echo ; echo $$ ; echo
echo 'stdout (csh)'
./echo_err csh
bash -c "( echo 'stdout (bash)' ; ./echo_err bash ) 2>&1"
The echo of the PID and ps are simply so you can ensure it's csh running this script. When you run this script with:
./test.csh >test.out 2>test.err
(the initial redirection is set up by bash before csh starts running the script), and examine the out/err files, you see:
test.out:
UID PID PPID TTY STIME COMMAND
pax 5708 5364 cons0 11:31:14 /usr/bin/ps
pax 5364 7364 cons0 11:31:13 /usr/bin/tcsh
pax 7364 1 cons0 10:44:30 /usr/bin/bash
5364
stdout (csh)
stdout (bash)
stderr (bash)
test.err:
stderr (csh)
You can see there that the test.csh process is running in the C shell, and that calling bash from within there gives you the full bash power of redirection.
The 2>&1 in the bash command quite easily lets you redirect standard error to the current standard output (as desired) without prior knowledge of where standard output is currently going.
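The same wrapper works for a one-off command at an interactive csh prompt. A minimal sketch (somecommand and both.out are placeholders):
bash -c 'somecommand 2>&1' > both.out
This captures standard output and standard error, merged, into both.out, even though the calling shell is csh.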
I object to the above answer and provide my own. csh DOES have this capability, and here is how it's done:
xxx |& some_exec # will pipe merged output to your some_exec
or
xxx |& cat > filename
or if you just want it to merge streams (to stdout) and not redirect to a file or some_exec:
xxx |& tee /dev/null
As paxdiablo said, you can use >& to redirect both stdout and stderr. However, if you want them separated, you can use the following:
(command > stdoutfile) >& stderrfile
As indicated, the above will redirect stdout to stdoutfile and stderr to stderrfile.
xxx >& filename
Or do this to see everything on the screen and have it go to your file:
xxx |& tee ./logfile
What about just
xxx >& /dev/stdout
???
I think this is the correct answer for csh:
xxx >/dev/stderr
Note that most csh binaries are really tcsh in modern environments:
rmockler> ls -latr /usr/bin/csh
lrwxrwxrwx 1 root root 9 2011-05-03 13:40 /usr/bin/csh -> /bin/tcsh
Using backtick-embedded statements to demonstrate this:
echo "`echo 'standard out1'` `echo 'error out1' >/dev/stderr` `echo 'standard out2'`" | tee -a /tmp/test.txt ; cat /tmp/test.txt
The other suggestions don't work in my csh environment.

run nm | grep commands with cmake execute_process? [duplicate]

I am wondering if it is possible for CMake to run tests like one might run with a configure script. Specifically I want to test if the system I am compiling on has support for the rdtscp instruction.
I am using Linux and if I were using a configure script I could do something like:
cat /proc/cpuinfo | head -n 19 | tail -1 | grep -c rdtscp
which would give me 0 if the rdtscp feature was not present or a 1 if it were. I could then use this to determine whether to #define RDTSCP. I'm wondering if it's possible to do something similar with CMake, even if it's not completely portable (I'm only running under Linux; I'm not using Visual Studio, etc.).
execute_process(COMMAND cat /proc/cpuinfo
                COMMAND head -n 19
                COMMAND tail -1
                COMMAND grep -c rdtscp
                OUTPUT_VARIABLE OUT)
Selecting line 19 exactly makes this brittle. On my desktop (Linux 4.20 on i7-6700k), that line is
wp : yes
Instead, use grep's pattern-matching ability to check the flags : line (the field name is separated from the colon by tabs).
grep -l '^flags[[:space:]]*:.*rdtscp' /proc/cpuinfo prints the filename and exits with success after the first match, or prints nothing and exits with failure status if it doesn't find a match.
I don't know CMake, but based on the other answer presumably you'd use
execute_process(COMMAND grep -l "^flags[[:space:]]*:.*rdtscp" /proc/cpuinfo
                OUTPUT_VARIABLE OUT)
(Note the double quotes: CMake does not strip single quotes from arguments, so a single-quoted pattern would be passed to grep with the quotes included.)
The simpler version of this is just grep -l rdtscp /proc/cpuinfo, but requiring a match in the flags : line will prevent any possible false positive. (To be even more belt-and-suspenders, you could require a space or end-of-line before and after the flag name, maybe with PCRE grep for zero-width assertions, in case some feature flag like XYZrdtscpABC that can be present without RDTSCP support ever becomes a thing, or something like broken_rdtscp.) Or you could just assume that rdtscp is never at the end of a line and look for ^flags.*:.* rdtscp.
Using -l gets grep to exit after the first match, in case you were using head/tail as an optimization to avoid processing more lines on massively multi-core systems like Xeon Phi. It will still read the whole file if there's no match for rdtscp, but probably any massively multi-core system will have RDTSCP, and grep is very fast anyway.
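A hedged sketch of how the result might drive a define in CMake (untested; RDTSCP comes from the question and the variable name is made up, but RESULT_VARIABLE and add_definitions are standard CMake):
execute_process(COMMAND grep -q "^flags[[:space:]]*:.*rdtscp" /proc/cpuinfo
                RESULT_VARIABLE RDTSCP_GREP_STATUS)
# grep exits 0 on a match, and execute_process stores that exit status
if(RDTSCP_GREP_STATUS EQUAL 0)
  add_definitions(-DRDTSCP)
endif()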

How to detect code change frequency?

I am working on a program written by several folks with widely varying skill levels. There are files in there that have never changed (and probably never will, as we're afraid to touch them) and others that are changing constantly.
I wonder, are there any tools out there that would look at the entire repo history (git) and produce analysis on how frequently a given file changes? Or package? Or project?
It would be of value to recognize that (for example) we spent 25% of our time working on a set of packages, which would be indicative of the code's fragility, as compared with code that "just works".
If you're looking for an open-source solution, I'd consider starting with gitstats and look at extending it by grabbing file logs and aggregating that data.
I'd have a look at NChurn:
NChurn is a utility that helps assess the churn level of the files in
your repository. Churn can help you detect which files are changed the
most in their lifetime. This helps identify potential bug hives and
improper design. The best thing to do is to plug NChurn into your build
process and store the history of each run. Then, you can plot the
evolution of your repository's churn.
I wrote something that we use to visualize this information successfully.
https://github.com/bcarlso/defect-density-heatmap
Take a look at the project and you can see what the output looks like in the readme.
You can do what you need by first getting a list of files that have changed in each commit from Git.
~ $ git log --pretty="format:" --name-only | grep -v ^$ > file-changes.txt
~ $ for i in `cat file-changes.txt | cut -d"." -f1,2 | uniq`; do num=`cat file-changes.txt | grep $i | wc -l`; if (( $num > 1 )); then echo $num,0,$i; fi; done | heatmap > results.html
This will give you a tag cloud in which files that churn more show up larger.
I suggest using a command like
git log --follow -p file
That will give you all the changes that happened to the file over its history (including renames). If you want the number of commits that changed the file, then on a UNIX-based OS you can do:
git log --follow --format=oneline Gemfile | wc -l
You can then create a bash script to apply this to multiple files with the name aside.
Hope it helps!
Building on a previous answer, I suggest the following script to parse all project files:
#!/bin/sh
cd "$1"
find . -path ./.git -prune -o -name "*" -exec sh -c 'git log --follow --format=oneline "$1" | wc -l | awk "{ print \$1,\"\\t\",\"$1\" }" ' {} {} \; | sort -nr
cd ..
If you call the script as file_churn.sh you can parse your git project directory calling
> ./file_churn.sh project_dir
Hope it helps.

Ubuntu shell scripting

I'm new to shell scripting. I need to write a script that executes this command to get the process IDs for the tasks...
ps aux | grep java | grep dbConvert2 | awk '{print $2}'
then do some other stuff, and then kill the process IDs that I grabbed earlier...
I know I can do kill -9; I just don't know how to dynamically grab all the PIDs and store them as variables.
Append | xargs kill -9 to your current command.
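Putting it together with the pipeline from the question:
ps aux | grep java | grep dbConvert2 | awk '{print $2}' | xargs kill -9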
[edit]
if you want to do some operations on each id, you can use a for loop, something like:
for my_pid in `YOUR_CMD`; do
    <some stuff with $my_pid>
    kill -9 $my_pid
done
pkill -9 'java.*dbConvert2'
You might want to use pgrep 'pattern' to try different patterns first.
Edit: If your process isn't matched, you might need to use -f (applies to both pgrep and pkill; put it after the -9) to search the entire command line, including arguments.
Example: pkill -9 -f 'java.*dbConvert2'
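To preview what a pattern matches before killing anything, list the matching processes first (same hypothetical pattern as above; some pgrep versions also support -a to print the full command line):
pgrep -fl 'java.*dbConvert2'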