CMake add_custom_target() to format source code induces clock skew - cmake

The real problem
I want to apply project level source code formatting to all modified files
Current approach
Use add_custom_target in my top-level CMakeLists.txt file to call a script that applies formatting rules to all files the SCM tool reports as modified:
add_custom_target(Name ALL ${PROJECT_SOURCE_DIR}/../cmake/format_files.bash
)
This rule is before any add_subdirectory calls, because reformatting should take place before all compilation.
Per the documentation:
ALL
Indicate that this target should be added to the default build target so that it will be run every time (the command cannot be called ALL).
When CMake itself runs (like any modification to the CMakeLists.txt files), all is good.
The Symptom
Suppose I perform some spacing-related modification to file Foo.hh (my rules replace tabs with spaces, for example). My build is likely to include something like this:
Scanning dependencies of target Foo
make[2]: Warning: File `projects/foo/src/Foo.hh' has modification time 8.7 s in the future
...
make[2]: warning: Clock skew detected. Your build may be incomplete.
I'm pretty sure it's the source formatting script that somehow runs after dependency scanning (or something like that), modifies Foo.hh, and creates the illusion of clock skew.
What I think the question is
What is the right way to force my build process to assert project standards for source code style prior to building, without potentially creating dependency problems?
Is there a better way to introduce formatting to the build process?
Red Herrings
At first, I thought I was dealing with a true clock skew problem; my development environment is on a VMware VM, and we have had some issues with time in the past, but now I'm 99% sure that all the VMs are using host time. Furthermore, a simple test like this (in the same filesystem as my builds) proves there is no intrinsic clock skew:
$ date ; touch foo ; ls --time-style=+%H:%M:%S -l foo ; date
Thu Jan 17 12:48:59 MST 2019
-rw-rw-r--. 1 1001 1001 0 12:48:59 foo
Thu Jan 17 12:48:59 MST 2019
A key facet of the source code formatting process is that there is no deterministic way to know which files might be modified in the script and which will not. Files that comply with project standards are not touched.
For completeness, here is the script:
#!/bin/bash
# This script is intended to format any modified files to project standards
# Change to the project root
cd $(dirname $0)/..
outfile=format.log
file_list=$( git status --short --untracked-files=all src \
| awk '/^( M|\?\?) .*\.(cpp|hh)/ {print $2}' )
# If we haven't changed any files, exit gracefully
[[ -z $file_list ]] && exit 0
# Format the current working set
echo >> ${outfile}
date '+%Y-%m-%dT%H:%M:%S.%N: ' >> ${outfile}
astyle --project $file_list >>${outfile} 2>&1
This script appends to an output file (I'll probably remove that at some point) that looks like this:
2019-01-17T18:54:20.641765133:
Unchanged src/Foo.cpp
Formatted src/Foo.hh
Unchanged src/Bar.cpp

Based on the discussion at https://discourse.cmake.org/t/cmake-pre-build-command/1083, the answer is "don't do that". Formatting can be a target and building can be a target, but having a build step that modifies the dependencies of another build step (after the dependency tree has been evaluated) is bad.
Instead of formatting my code as part of the build, I added it as a CI check on the build server: if formatting would change the code, the build fails. I also created a pre-commit hook to tell me if my code needs formatting. I don't like hooks that change the code checked in; changed code should always be compiled before commit.
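A check of this kind, one that fails when formatting would change the code, could be sketched roughly as follows. This is a minimal illustration, not the author's actual CI script or hook. The names would_reformat and check_files are hypothetical, and the sketch assumes the formatter can run as a filter that reads source on standard input and writes the formatted result to standard output. The standard expand utility (tabs to spaces) serves as a stand-in formatter here.

```shell
#!/bin/sh
# Sketch only: report files that a formatter WOULD change, without modifying them.
# Assumption: the formatter works as a filter (source on stdin, result on stdout).

# would_reformat FILE FILTER [ARGS...]
# Exit status 0 when the filter's output differs from FILE, 1 otherwise.
would_reformat() {
    wr_file=$1
    shift
    if "$@" < "$wr_file" | cmp -s - "$wr_file"; then
        return 1    # identical: the file already complies
    fi
    return 0        # differs: the formatter would change it
}

# check_files FILTER FILE...
# Prints each non-compliant file; returns 1 if at least one was found.
check_files() {
    cf_filter=$1
    shift
    cf_status=0
    for cf_file in "$@"; do
        if would_reformat "$cf_file" "$cf_filter"; then
            echo "needs formatting: $cf_file"
            cf_status=1
        fi
    done
    return "$cf_status"
}
```

A CI job or pre-commit hook could call check_files on the files of interest and fail when it returns non-zero. Because the working tree is never written to, the build never sees a file whose timestamp changed underneath it.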

Related

Conditionally run CMake's CHECK_TYPE_SIZE

Is there a way to conditionally run CMake's CHECK_TYPE_SIZE command? CHECK_TYPE_SIZE is great for figuring out what the size of a struct is, but the problem is that over time someone might modify the header file with the struct to add new fields. Seems like it's an accident waiting to happen when someone updates the struct but forgets (or doesn't know to) blow away the CMake cache. Yes, you could put a note next to the struct to do a cache wipe if updated, but that doesn't really help when you have a multi-person project and someone else updated the header file.
I tried to do an unset(HAS_MYVAR CACHE) but that didn't seem to work as the function doesn't appear to be re-run. Any ideas?
CODE:
CHECK_TYPE_SIZE("my_struct_t" MY_STRUCT_SIZE)
message("Struct size is ${MY_STRUCT_SIZE}")
add_custom_target(get_size)
add_custom_command(TARGET get_size COMMAND echo ${MY_STRUCT_SIZE})
$ cmake -S . -B /tmp/test
...
Struct size is 40
...
$ cd /tmp/test
$ make get_size
40
Built target get_size
$ <modify struct to be larger>
$ make get_size
40
Built target get_size

How to use the program's exit status at compile time?

This question is subsequent to my previous one: How to integrate such kind of source generator into CMake build chain?
Currently, the C source file is generated from XS in this way:
set_source_files_properties(${CMAKE_CURRENT_BINARY_DIR}/${file_src_by_xs} PROPERTIES GENERATED 1)
add_custom_target(${file_src_by_xs}
COMMAND ${XSUBPP_EXECUTABLE} ${XSUBPP_EXTRA_OPTIONS} ${lang_args} ${typemap_args} ${file_xs} >${CMAKE_CURRENT_BINARY_DIR}/${file_src_by_xs}
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
DEPENDS ${file_xs} ${files_xsh} ${_XSUBPP_TYPEMAP_FILES}
COMMENT "generating source from XS file ${file_xs}"
)
The GENERATED property tells CMake not to check for the existence of this source file at configure time, and add_custom_target makes xsubpp re-run on every build. It has to re-run every time because xsubpp generates an incomplete source file even when it fails, so the build could otherwise continue with an incomplete source file.
I found that always re-running the source generator and recompiling its output is time consuming, so I want it to re-run only when the XS files it depends on are modified. However, if I do so, an incomplete generated source file must be deleted.
So my question is: is there any way to remove the generated file only when the program exits abnormally at compile time?
Or, more generally: is there any way to run a command depending on another command's exit status at compile time?
You can always write a wrapper script in your favorite language, e.g. Perl or Ruby, that runs xsubpp and deletes the output file if the command failed. That way you can be sure that if it exists, it is correct.
In addition, I would suggest that you use the OUTPUT keyword of add_custom_command to tell CMake that the file is a result of executing the command. (And, if you do that, you don't have to set the GENERATED property manually.)
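The wrapper idea can be sketched in a few lines of shell. This is only an illustration under stated assumptions: run_or_remove is a hypothetical name, and the generator is assumed to write its result to standard output, as xsubpp does in the question.

```shell
#!/bin/sh
# Sketch of a generator wrapper: keep OUTPUT only if the command succeeds.

# run_or_remove OUTPUT COMMAND [ARGS...]
# Runs COMMAND with stdout redirected to OUTPUT. On failure the (possibly
# incomplete) OUTPUT is deleted and the command's exit status is returned.
run_or_remove() {
    ror_output=$1
    shift
    ror_status=0
    "$@" > "$ror_output" || ror_status=$?
    if [ "$ror_status" -ne 0 ]; then
        rm -f -- "$ror_output"
    fi
    return "$ror_status"
}
```

With such a helper, the build rule would invoke the wrapper instead of redirecting the generator's output directly, so an existing output file always comes from a successful run.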
Inspired by @Lindydancer's answer, I achieved this with multiple COMMANDs in one custom command, so no external wrapper script is needed.
set(source_file_ok ${source_file}.ok)
add_custom_command(
OUTPUT ${source_file} ${source_file_ok}
DEPENDS ${xs_file} ${xsh_files}
COMMAND rm -f ${source_file_ok}
COMMAND xsubpp ...... >${source_file}
COMMAND touch ${source_file_ok}
)
add_library(${xs_lib} ${source_file})
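# add_dependencies() accepts targets, not files, so wrap the outputs in a
# custom target (the name ${xs_lib}_gen is illustrative).
add_custom_target(${xs_lib}_gen DEPENDS ${source_file} ${source_file_ok})
add_dependencies(${xs_lib} ${xs_lib}_gen)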
The custom command runs three commands. The OK file exists only when xsubpp succeeds, and it is added as a dependency of the library. When xsubpp fails, the missing OK file forces the custom command to run again on the next build.
The only flaw is portability: not every OS has touch and rm, so these two commands have to be chosen according to the OS type.

make targets depend on variables

I want (GNU) make to rebuild when variables change. How can I achieve this?
For example,
$ make project
[...]
$ make project
make: `project' is up to date.
...like it should, but then I'd prefer
$ make project IMPORTANTVARIABLE=foobar
make: `project' is up to date.
to rebuild some or all of project.
Make wasn't designed to refer to variable content but Reinier's approach shows us the workaround. Unfortunately, using variable value as a file name is both insecure and error-prone. Hopefully, Unix tools can help us to properly encode the value. So
IMPORTANTVARIABLE = a trouble
# GUARD is a function which calculates md5 sum for its
# argument variable name. Note, that both cut and md5sum are
# members of coreutils package so they should be available on
# nearly all systems.
GUARD = $(1)_GUARD_$(shell echo $($(1)) | md5sum | cut -d ' ' -f 1)
foo: bar $(call GUARD,IMPORTANTVARIABLE)
	@echo "Rebuilding foo with $(IMPORTANTVARIABLE)"
	@touch $@
$(call GUARD,IMPORTANTVARIABLE):
	rm -rf IMPORTANTVARIABLE*
	touch $@
Here you virtually depend your target on a special file named $(NAME)_GUARD_$(VALUEMD5) which is safe to refer to and has (almost) 1-to-1 correspondence with variable's value. Note that call and shell are GNU Make extensions.
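To see what kind of file name the GUARD function produces, the same computation can be sketched in plain shell. The function name guard_name is hypothetical. Unlike the make one-liner, this sketch quotes the value, so runs of whitespace are preserved rather than collapsed, and it assumes md5sum is installed.

```shell
#!/bin/sh
# Sketch: the guard file name for a variable, computed in plain shell.

# guard_name VARNAME VALUE
# Prints VARNAME_GUARD_<md5 of VALUE followed by a newline>, mirroring
#   $(1)_GUARD_$(shell echo $($(1)) | md5sum | cut -d ' ' -f 1)
guard_name() {
    printf '%s_GUARD_%s\n' "$1" \
        "$(printf '%s\n' "$2" | md5sum | cut -d ' ' -f 1)"
}
```

Calling guard_name IMPORTANTVARIABLE "a trouble" prints IMPORTANTVARIABLE_GUARD_ followed by 32 hexadecimal digits. The same value always maps to the same name and a changed value maps to a different one, which is what lets make treat the guard file as a stand-in for the variable's value.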
You could use empty files to record the last value of your variable by using something like this:
someTarget: IMPORTANTVARIABLE.$(IMPORTANTVARIABLE)
	@echo Remaking $@ because IMPORTANTVARIABLE has changed
	touch $@
IMPORTANTVARIABLE.$(IMPORTANTVARIABLE):
	@rm -f IMPORTANTVARIABLE.*
	touch $@
After your make run, there will be an empty file in your directory whose name starts with IMPORTANTVARIABLE. and has the value of your variable appended. This basically contains the information about what the last value of the variable IMPORTANTVARIABLE was.
You can add more variables with this approach and make it more sophisticated using pattern rules -- but this example gives you the gist of it.
You probably want to use ifdef or ifeq depending on what the final goal is. See the manual here for examples.
I might be late with an answer, but here is another way of doing such a dependency with Make conditional syntax (works on GNU Make 4.1, GNU bash, Bash on Ubuntu on Windows version 4.3.48(1)-release (x86_64-pc-linux-gnu)):
1 ifneq ($(shell cat config.sig 2>/dev/null),prefix $(CONFIG))
2 .PHONY: config.sig
3 config.sig:
4 @(echo 'prefix $(CONFIG)' >config.sig &)
5 endif
In the above sample we track the $(CONFIG) variable by writing its value to a signature file. The target of the same name is defined only when the value recorded in the signature file differs from that of the $(CONFIG) variable. Note the prefix on lines 1 and 4: it is needed to distinguish the case where the signature file doesn't exist yet.
Of course, consumer targets specify config.sig as a prerequisite.
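The underlying idea, a signature file that changes only when the tracked value changes, can also be sketched outside of make. The shell function below is an illustration of that idea, not a translation of the make fragment above. The name update_signature is hypothetical. It rewrites the file only when the recorded value differs, so the file's modification time advances only on a real change.

```shell
#!/bin/sh
# Sketch: record a value in a signature file, rewriting it only on change.

# update_signature FILE VALUE
# Returns 0 if FILE was (re)written, 1 if it already held "prefix VALUE".
update_signature() {
    us_file=$1
    us_wanted="prefix $2"
    # A missing file yields an empty string, which can never equal
    # us_wanted because of the "prefix " part, even for an empty VALUE.
    us_current=$(cat "$us_file" 2>/dev/null) || us_current=
    if [ "$us_current" = "$us_wanted" ]; then
        return 1
    fi
    printf '%s\n' "$us_wanted" > "$us_file"
    return 0
}
```

A target that lists the signature file as a prerequisite is then rebuilt only after a call that actually rewrote the file.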

Git - how do I view the change history of a method/function?

So I found the question about how to view the change history of a file, but the change history of this particular file is huge and I'm really only interested in the changes of a particular method. So would it be possible to see the change history for just that particular method?
I know this would require git to analyze the code and that the analysis would be different for different languages, but method/function declarations look very similar in most languages, so I thought maybe someone has implemented this feature.
The language I'm currently working with is Objective-C and the SCM I'm currently using is git, but I would be interested to know if this feature exists for any SCM/language.
Recent versions of git log learned a special form of the -L parameter:
-L :<funcname>:<file>
Trace the evolution of the line range given by "<start>,<end>" (or the function name regex <funcname>) within the <file>. You may not give any pathspec limiters. This is currently limited to a walk starting from a single revision, i.e., you may only give zero or one positive revision arguments. You can specify this option more than once.
...
If “:<funcname>” is given in place of <start> and <end>, it is a regular expression that denotes the range from the first funcname line that matches <funcname>, up to the next funcname line. “:<funcname>” searches from the end of the previous -L range, if any, otherwise from the start of file. “^:<funcname>” searches from the start of file.
In other words: if you ask Git to git log -L :myfunction:path/to/myfile.c, it will now happily print the change history of that function.
git gui blame is hard to make use of in scripts, and whilst git log -G and git log --pickaxe can each show you when the method definition appeared or disappeared, I haven't found any way to make them list all changes made to the body of your method.
However, you can use gitattributes and the textconv property to piece together a solution that does just that. Although these features were originally intended to help you work with binary files, they work just as well here.
The key is to have Git remove from the file all lines except the ones you're interested in before doing any diff operations. Then git log, git diff, etc. will see only the area you're interested in.
Here's the outline of what I do in another language; you can tweak it for your own needs.
Write a short shell script (or other program) that takes one argument -- the name of a source file -- and outputs only the interesting part of that file (or nothing if none of it is interesting). For example, you might use sed as follows:
#!/bin/sh
sed -n -e '/^int my_func(/,/^}/ p' "$1"
Define a Git textconv filter for your new script. (See the gitattributes man page for more details.) The name of the filter and the location of the command can be anything you like.
$ git config diff.my_filter.textconv /path/to/my_script
Tell Git to use that filter before calculating diffs for the file in question.
$ echo "my_file diff=my_filter" >> .gitattributes
Now, if you use -G. (note the .) to list all the commits that produce visible changes when your filter is applied, you will have exactly those commits that you're interested in. Any other options that use Git's diff routines, such as --patch, will also get this restricted view.
$ git log -G. --patch my_file
Voilà!
One useful improvement you might want to make is to have your filter script take a method name as its first argument (and the file as its second). This lets you specify a new method of interest just by calling git config, rather than having to edit your script. For example, you might say:
$ git config diff.my_filter.textconv "/path/to/my_command other_func"
Of course, the filter script can do whatever you like, take more arguments, or whatever: there's a lot of flexibility beyond what I've shown here.
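A filter that takes the function name as its first argument might look like the sketch below. The function name extract_func is hypothetical. Like the sed one-liner earlier, it assumes the definition starts at the beginning of a line with an int return type and that the closing brace sits in column 0. The function name is inserted into the sed pattern unescaped, so it should contain no regex metacharacters.

```shell
#!/bin/sh
# Sketch of a textconv filter that takes the function name as an argument.

# extract_func FUNC FILE
# Prints the lines from "int FUNC(" at the start of a line through the
# next line that starts with "}".
extract_func() {
    sed -n -e "/^int $1(/,/^}/p" "$2"
}

# When saved as a standalone script, forward the script's own arguments
# (Git appends the file name as the last argument):
#   extract_func "$1" "$2"
```

Switching to another function then only requires changing the name in the git config setting, not editing the script.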
The closest thing you can do is to determine the position of your function in the file (e.g. say your function i_am_buggy is at lines 241-263 of foo/bar.c), then run something to the effect of:
git log -p -L 200,300:foo/bar.c
This will open less (or an equivalent pager). Now you can type in /i_am_buggy (or your pager equivalent) and start stepping through the changes.
This might even work, depending on your code style:
git log -p -L /int i_am_buggy\(/,+30:foo/bar.c
This limits the search from the first hit of that regex (ideally your function declaration) to thirty lines after that. The end argument can also be a regexp, although detecting that with regexp's is an iffier proposition.
git log has a '-G' option that can be used to find all differences.
-G Look for differences whose added or removed line matches the
given <regex>.
Just give it a proper regex of the function name you care about. For example,
$ git log --oneline -G'^int commit_tree'
40d52ff make commit_tree a library function
81b50f3 Move 'builtin-*' into a 'builtin/' subdirectory
7b9c0a6 git-commit-tree: make it usable from other builtins
The correct way is to use git log -L :function:path/to/file as explained in eckes's answer.
In addition, if your function is very long, you may want to see only the changes that each commit introduced, as a normal diff does, rather than the whole function, unmodified lines included, for every commit that may have touched just one of those lines.
Normally git log can show differences with -p, but this does not work with -L.
So you have to grep the output of git log -L to show only the lines involved, plus the commit/file headers that give them context. The trick is to add the --color switch and match only terminal-colored lines with a regex. Finally:
git log -L :function:path/to/file --color | grep --color=never -E -e "^(^[\[[0-9;]*[a-zA-Z])+" -3
Note that ^[ must be an actual, literal escape character. You can type it by pressing ^V^[ in bash, that is Ctrl + V, Ctrl + [. Reference here.
Also, the final -3 switch prints 3 lines of output context before and after each matched line. You may want to adjust it to your needs.
Show function history with git log -L :<funcname>:<file> as shown in eckes's answer and the git documentation
If it shows nothing, refer to Defining a custom hunk-header to add something like *.java diff=java to the .gitattributes file to support your language.
Show function history between commits with git log commit1..commit2 -L :functionName:filePath
Show overloaded function history (there may be many functions with the same name but different parameters) with git log -L :sum\(double:filepath
git blame shows you who last changed each line of the file; you can specify the lines to examine so as to avoid getting the history of lines outside your function.

SCONS: making a special script builder depend on output of another builder

I hope the title clarifies what I want to ask because it is a bit tricky.
I have a SCONS SConscript for every subdir as follows (doing it in linux, if it matters):
src_dir
    compiler
        SConscript
        yacc srcs
    scripts
        legacy_script
    data
        SConscript
        data files for the yacc
I use a variant_dir without copy, for example:
SConscript('src_dir/compiler/SConscript', variant_dir = 'obj_dir', duplicate = 0)
The resulting obj_dir after building the yacc is:
obj_dir
    compiler
        compiler_compiler.exe
Now here is the deal.
I have another SConscript in the data dir that needs to do 2 things:
1. compile the data with the yacc compiled compiler
2. Take the output of the compiler and run it with the legacy_script I can't change
(the legacy_script, takes the output of the compiled data and build some h files for another software to depend on)
Number 1 is achieved easily:
linux_env.Command(['output1', 'output2'], 'data/data_files', 'compiler_compiler.exe data_files output1 output2')
My problem is number 2: how do I make the script runner depend on the outputs of another target?
And just to clarify it, I need to make SCONS run (and only if compiler_output changes):
src_dir/script/legacy_script obj_dir/data/compiler_output obj_dir/some_dir/script_output
(the script's usage is: legacy_script input_file output_file)
I hope I made myself clear, feel free to ask some more questions...
I've had a similar problem recently when I needed to compile Cheetah Templates first, which were then used from another Builder to generate HTML files from different sources.
If you define the build output of the first builder as source for the second builder, SCons will run them in the correct order and only if intermediate files have changed.
Wolfgang