How to detect code change frequency? - code-analysis

I am working on a program written by several folks with widely varying skill levels. There are files in there that have never changed (and probably never will, as we're afraid to touch them) and others that are changing constantly.
I wonder, are there any tools out there that would look at the entire repo history (git) and produce analysis on how frequently a given file changes? Or package? Or project?
It would be valuable to recognize that (for example) we spent 25% of our time working on a particular set of packages, which would be indicative of the code's fragility, as compared with code that "just works".

If you're looking for an open-source solution, I'd probably consider starting with gitstats and look at extending it by grabbing file logs and aggregating that data.
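For instance, aggregating per-file commit counts straight from the git log gives a quick approximation of the data you'd feed into such an extension (a sketch; the -20 cutoff is arbitrary):
git log --pretty=format: --name-only | grep -v '^$' | sort | uniq -c | sort -rn | head -20
This lists the 20 most frequently changed files across the repository's history, along with how many commits touched each.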

I'd have a look at NChurn:
NChurn is a utility that helps assess the churn level of the files in
your repository. Churn can help you detect which files are changed the
most in their lifetime. This helps identify potential bug hives and
improper design. The best thing to do is to plug NChurn into your build
process and store the history of each run. Then, you can plot the
evolution of your repository's churn.

I wrote something that we use to visualize this information successfully.
https://github.com/bcarlso/defect-density-heatmap
Take a look at the project and you can see what the output looks like in the readme.
You can do what you need by first getting a list of files that have changed in each commit from Git.
~ $ git log --pretty="format:" --name-only | grep -v '^$' > file-changes.txt
~ $ for i in $(cut -d'.' -f1,2 file-changes.txt | sort -u); do num=$(grep -c "$i" file-changes.txt); if (( num > 1 )); then echo "$num,0,$i"; fi; done | heatmap > results.html
This will give you a tag cloud in which files that churn more show up larger.

I suggest using a command like
git log --follow -p file
That will give you all the changes that happened to the file over its history (including renames). If you want the number of commits that changed the file, on a UNIX-based OS you can do:
git log --follow --format=oneline Gemfile | wc -l
You can then create a bash script to apply this to multiple files, printing each file's name alongside its count.
Hope it helped!

Building on a previous answer, I suggest the following script to parse all project files:
#!/bin/sh
# Usage: ./file_churn.sh project_dir
cd "$1" || exit 1
# For every file (skipping .git), count the commits that touched it,
# then sort by churn count, descending
find . -path ./.git -prune -o -type f -exec sh -c 'printf "%s\t%s\n" "$(git log --follow --format=oneline -- "$1" | wc -l)" "$1"' sh {} \; | sort -nr
If you save the script as file_churn.sh, you can parse your git project directory by calling
> ./file_churn.sh project_dir
Hope it helps.

Related

--immediate-submit {dependencies} string contains script paths, not job IDs?

I'm trying to use the --immediate-submit on a PBSPro cluster. I tried using an in-place modification of the dependencies string to adapt it to PBSPro, similar to what is done here.
snakemake --cluster "qsub -l wd -l mem={cluster.mem}GB -l ncpus={threads} -e {cluster.stderr} -q {cluster.queue} -l walltime={cluster.walltime} -o {cluster.stdout} -S /bin/bash -W $(echo '{dependencies}' | sed 's/^/depend=afterok:/g' | sed 's/ /:/g')"
This last part gets converted into, for example:
-W depend=afterok: /g/data1a/va1/dk0741/analysis/2018-03-25_marmo_test/.snakemake/tmp.cyrhf51c/snakejob.trimmomatic_pe.7.sh
There are two problems here:
How can I get the dependencies string to output the job ID instead of the script path? The qsub command normally outputs the job ID to stdout, so I'm not sure why it's not doing so here.
How do I get rid of the space after afterok:? I've tried everything!
As an aside, it would be helpful if there were some option to debug the submission or not to delete the tmp.cyrhf51c directory in .snakemake -- is there some way to do this?
Thanks,
David
I suggest using a profile for this instead of trying to find an ad-hoc solution. This will also help with debugging. E.g., there is already a pbs-torque profile available (https://github.com/Snakemake-Profiles/pbs-torque); probably not much needs to change to adapt it to PBSPro?
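For illustration, a minimal sketch of what that might look like (untested; the profile name, path, and values are assumptions, not a verified PBSPro setup):
# A profile is a directory holding a config.yaml whose keys mirror
# snakemake's command-line options, e.g.:
#
#   ~/.config/snakemake/pbspro/config.yaml:
#     cluster: "qsub -l walltime={cluster.walltime} -l ncpus={threads}"
#     jobs: 100
#     immediate-submit: true
#
# The invocation then reduces to:
snakemake --profile pbspro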

Move file, change permissions and rename it keeping the same extension

Using zsh 5.2 on Fedora 24 workstation.
I want to be programmatically able to:
move an image file (can have jpg/ jpeg/ png/ JPG/ PNG extensions)
from /tmp/folder1 to ~/Pictures
This file will have the same few initial characters, e.g. prefix111.jpg or prefix222.png.
rename the file such that samefilename.JPG becomes 20161013.jpg
20161013 is today's date in yyyymmdd format
Note that the extension becomes lowercase
and JPEG or jpeg becomes jpg
change the permissions of the moved file to 644
All at one go.
If there are multiple prefix* files, the command should just fail silently.
I would initially like to do it at the command prompt, with an option to add a cron job later. I mean, will the same zsh command/script work in cron?
I am sure this is doable. However, with my limited shell knowledge, I could only achieve:
mv /tmp/folder1/prefix-*.JPG ~/Pictures/$(date +'%Y%m%d').jpg
Problems with my approach are many. It does not handle capitalization, does not take care of different extensions and does not address the permission issue.
How about this:
#!/bin/sh
FILES="/tmp/folder1/prefix*.jpg /tmp/folder1/prefix*.jpeg /tmp/folder1/prefix*.png /tmp/folder1/prefix*.JPG /tmp/folder1/prefix*.JPEG /tmp/folder1/prefix*.PNG"
# Collect whatever actually matches; fail silently unless exactly one file
set -- $(ls $FILES 2>/dev/null)
if [ $# -ne 1 ]; then
    exit 1
fi
# Pick the lowercase target extension (jpeg/JPEG collapse to jpg)
case "$1" in
    *.[pP][nN][gG]) SUFF=png ;;
    *) SUFF=jpg ;;
esac
DEST=$HOME/Pictures/$(date +'%Y%m%d').$SUFF
mv "$1" "$DEST"
chmod 644 "$DEST"
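Since the question specifically asks about zsh, here is an alternative sketch using zsh's case-insensitive globbing and modifiers (untested, but built only from standard zsh features; paths are taken from the question):
#!/usr/bin/env zsh
setopt extendedglob
# (#i) matches case-insensitively; (N) makes a non-match expand to nothing
files=( /tmp/folder1/prefix*.(#i)(jpg|jpeg|png)(N) )
# Fail silently unless exactly one file matched
(( $#files == 1 )) || exit 1
ext=${files[1]:e}    # :e = extension
ext=${ext:l}         # :l = lowercased
[[ $ext == jpeg ]] && ext=jpg
dest=~/Pictures/$(date +%Y%m%d).$ext
mv -- $files[1] $dest && chmod 644 $dest
This should also run unchanged from cron, provided the crontab entry invokes zsh explicitly.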

Apache Subversion pre-commit to restrict files

I'm new to Apache SVN and I need some help using a pre-commit script to filter which files are being uploaded to my repository.
I searched a lot and found this script in another question, but it didn't work for me.
#!/bin/bash
REPOS=$1
TXN=$2
AWK=/usr/bin/awk
SVNLOOK="/usr/bin/svnlook"
# Put all the restricted formats in variable FILTER
FILTER=".(sh|xls|xlsx|exe|xlsm|XLSM|vsd|VSD|bak|BAK|class|CLASS)$"
# Figure out what directories have changed using svnlook.
FILES=`${SVNLOOK} changed -t ${REPOS} ${TXN} | ${AWK} '{ print $2 }'` > /dev/null
for FILE in $FILES; do
    # Get the base filename to extract its extension
    NAME=`basename "$FILE"`
    # Get the extension of the current file
    EXTENSION=`echo "$NAME" | cut -d'.' -f2-`
    # Check if it contains a restricted format
    if [[ "$FILTER" == *"$EXTENSION"* ]]; then
        echo "Your commit has been blocked because you are trying to commit a restricted file." 1>&2
        echo "Please contact SVN Admin. -- Thank you" 1>&2
        exit 1
    fi
done
exit 0
If I try to use svnlook changed -t repodirectory, it doesn't work either; it complains about a missing subcommand.
I overwrote my pre-commit.tmpl, but it didn't work. Can someone help me?
First, it seems you are using svnlook incorrectly. It should take the parameters:
svnlook changed ${REPOS} -t ${TXN}
-t means 'read from transaction', and TXN is the transaction name itself.
Second, not sure if I understand correctly, but the hook file should be named pre-commit, not pre-commit.tmpl.
Third, pre-commit must have the correct permissions. For testing, try a+rwx.
Update: it is not easy to obtain a transaction object for tests, but you can use svnlook changed -r <revision> <repository_path> and experiment on already committed revisions.
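For example, to experiment against an already committed revision (the revision number and repository path below are placeholders):
svnlook changed -r 42 /path/to/repository
This prints the changed paths in the same format the hook sees for a transaction, so you can test the extension filter against real data.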

No such file or directory from sh script

Looking for the origin of this error message:
Processing: +([^_]).flv
date: +([^_]).flv: No such file or directory
I started getting this at some point in the last few months (can't say when as I wasn't logging my cron output. I know, I know!).
When I originally wrote this, it worked ok for at least two months. I'm wondering if there was an sh update that broke it?
The script runs via crontab and gets all .flv files in the current directory without an underscore and processes each one. It then checks the modified date for files that have been created in the last 24 hours and runs the yamdi meta tag injector for .flv files.
It seems to me like it's not recognizing the pattern as a pattern, and is instead looking for it as an actual file. If I run this script from an ssh shell it works fine; it's only when running via cron that it gives this error.
shopt -s extglob
now=$(date +"%s")
for f in +([^_]).flv; do
    echo "Processing: $f"
    # modification time of the file, in seconds since the epoch
    age=$(date -r "$f" +"%s")
    calc=$(( (now - age) / 60 / 60 ))
    if (( calc < 24 )); then
        echo "$f age=$calc"
        yamdi -i "$f" -o "$f".seek
        rm "$f"
        cp "$f".seek "$f"
        # restore the original modification time ("@" prefix = epoch seconds)
        touch -d "@$age" "$f"
    fi
done
This is most likely a problem of the wrong shell being used; make sure your script's first line represents the right shell:
#!/bin/bash
for bash, or whatever shell you wrote this for. You might also want to check the environment variables that cron sets: it's a very common problem to assume everything is set up correctly, when in fact the environment cron provides to the scripts it executes is quite different.
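One common fix is to set the shell explicitly at the top of the crontab, so that bash (which understands shopt and extglob) runs the job instead of sh; the script path below is a placeholder:
SHELL=/bin/bash
0 * * * * /path/to/flv-script.sh
Alternatively, call the interpreter explicitly in the cron line itself, e.g. /bin/bash /path/to/flv-script.sh.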

How to ignore certain files when branching / checking out?

I'd like to compare a few files from the bazaar branch lp:ubuntu/nvidia-graphics-drivers. I'm mainly interested in the debian subdirectory inside that branch, but due to the binary blob in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files, it takes ages to get just the text files. I've already downloaded 555MB and it's still counting.
Is it possible to retrieve a bazaar branch, including or excluding certain files by one of the following properties:
file size
file extension
file name (include only debian/ for example)
I do not need to push back any changes, nor do I need to view the history of a file. I just want to compare two files in the debian/ directory, files with the .in extension and files without.
As far as I'm aware, no. You're downloading the branch history, not just the individual files. And each file is an integral part of the branch's history.
On the bright side, you only have to check it out once. Unless those binary files change, they'll be skipped the next time you pull from Launchpad.
Depending on the branch's history, you may be able to cut down on the download size by using a lightweight checkout (bzr checkout --lightweight). But that may come back to bite you later, since it means you won't get a local copy of the branch, only the checked-out files. It will work much like SVN, where every operation has to go through the server. As long as you don't need to look at the branch history or commit your changes, that should serve you just fine, I believe.
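For example, using the branch from the question (the target directory name is arbitrary):
bzr checkout --lightweight lp:ubuntu/nvidia-graphics-drivers nvidia-graphics-drivers
cd nvidia-graphics-drivers/debian
This fetches only the working tree rather than the full branch history, which is usually a much smaller download.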
I ended up doing some dirty grep-ing on the HTTP response since bzr info "$branch" and bzr ls -d "$branch" "$directory" did not provide enough information to me.
The Bash script below relies on the workings of Launchpad's front end, Loggerhead. It recursively downloads from a given URL. Currently, it ignores *.run files. Save it as bzrdl in a directory available from $PATH and run it with bzrdl http://launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/debian/. All files will be saved in the current directory; be sure that it's empty to avoid conflicts.
#!/bin/bash
max_retries=5
rooturl="$1"
if ! [[ $rooturl =~ /$ ]]; then
echo "Usage: ${0##*/} URL"
echo "URL must end with a slash. Example URL:"
echo "http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/"
exit 1
fi
tmpdir="$(mktemp -d)"
target="$(pwd)"
# used for holding HTTP response before extracting data
tmp="$(mktemp)"
# url_filter reads download URLs from stdin (piped)
url_filter() {
grep -v '\.run$'
}
get_files_from_dir() {
local slash=/
local dir="$1"
# to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
local storedir="${dir//$slash/.d${slash}}"
mkdir -p "$tmpdir/$storedir" "$target/$dir"
local i subdir
for ((i=0; i<$max_retries; i++ )); do
if wget -O "$tmp" "$rooturl$dir"; then
# store file list
grep -F -B 1 '<img src="/static/images/ico_file_download.gif" alt="Download File" />' "$tmp" |\
grep '^<a' | cut -d '"' -f 2 | url_filter \
> "$tmpdir/$storedir/files"
IFS=$'\n'
for subdir in $(grep -F -B 1 '<img src="/static/images/ico_folder.gif" ' "$tmp" | \
grep -F '<a ' | rev | cut -d / -f 2 | rev); do
IFS=$' \t\n'
get_files_from_dir "$dir$subdir/"
done
return
fi
done
echo "Failed to download directory listing of: $dir" >> "$tmpdir/errors"
}
download_files() {
local slash=/
local dir="$1"
# to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
local storedir="${dir//$slash/.d${slash}}"
local done=false
local subdir
cd "$tmpdir/$storedir"
for ((i=0; i<$max_retries; i++)); do
if wget -B "$rooturl$dir" -nc -i files -P "$target/$dir"; then
done=true
break
fi
done
$done || echo "Failed to download all files from $dir" >> "$tmpdir/errors"
for subdir in *.d; do
download_files "$dir${subdir%%.d}/"
done
}
get_files_from_dir ''
# make *.d expand to nothing if no directories are found
shopt -s nullglob
download_files ''
echo "TMP dir: $tmpdir"
echo "Errors : $(wc -l "$tmpdir/errors" 2>/dev/null | cut -d ' ' -f 2 || echo 0)"
The temporary directory and file are not removed afterwards; that must be done manually. Any errors (failures to download) will be written to $tmpdir/errors.
It's confirmed to work with:
bzrdl http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric/files/head:/debian/
Feel free to correct any mistakes or add improvements.
There is no way to selectively check out a specific directory from a Bazaar branch at the moment, although we do have plans to add such support in the future.
There is definitely too much traffic for the clone you are doing, considering the size of the branch. It's probably a bug in the client implementation.
Here on bzr 2.4 it is still quite slow but not too bad (60s):
localhost:/tmp% bzr branch http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric
Most recent Ubuntu Oneiric version: 275.09.07-0ubuntu1
Packaging branch status: CURRENT
Branched 37 revision(s).
From the log:
[11866] 2011-07-31 00:56:57.007 INFO: Branched 37 revision(s).
56.786 Transferred: 5335kB (95.8kB/s r:5314kB w:21kB)