Bazel Checkers Support - lint

What options does Bazel provide for creating new or extending existing targets that call C/C++-code checkers such as
Do I need to use a genrule or is there some other target rule for that?
Is my only viable choice here?
In security-critical software industries, such as aviation and automotive, it's very common to use the results of these calls to collect so-called "metric reports".
In these cases, calls to such linters must produce outputs that are further processed by the build actions of these metric report collectors. I cannot find a useful way of reusing Bazel's "extra-actions" for this. Any ideas?

I've written something which uses extra actions to generate a compile_commands.json file used by clang-tidy and other tools, and I'd like to do the same kind of thing for iwyu when I get around to it. I haven't used those other tools, but I assume they fit the same pattern too.
The basic idea is to run an extra action which generates some output for each file (aka C/C++ compilation command), and then find all the output files afterwards (outside of Bazel) and aggregate them. A reasonably complete example is here for reference. Basically, the action listener (written in Python) decodes the extra action proto and extracts the source files, compiler options, etc:
    # argv[1] is the path of the serialized ExtraActionInfo proto passed by Bazel.
    action = extra_actions_base_pb2.ExtraActionInfo()
    with open(argv[1], 'rb') as f:
        action.MergeFromString(f.read())
    cpp_compile_info = action.Extensions[extra_actions_base_pb2.CppCompileInfo.cpp_compile_info]
    compiler = cpp_compile_info.tool
    options = ' '.join(cpp_compile_info.compiler_option)
    source = cpp_compile_info.source_file
    output = cpp_compile_info.output_file
    print('%s %s -c %s -o %s' % (compiler, options, source, output))
If you give the extra action an output template, then it can write that output to a file. If you give the output files distinctive names, you can find them all in the output tree and merge them together however you want.
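The aggregation step that runs outside of Bazel could look roughly like the sketch below. It assumes, purely for illustration, that the listener writes one JSON object per compilation (with at least a "file" key, as in compile_commands.json) and that the output template gives every file the hypothetical suffix ".compile_command.json"; neither detail comes from the example above.

```python
import json
import os


def merge_compile_commands(output_root, suffix=".compile_command.json"):
    """Collect every per-file JSON fragment under output_root into one list.

    The suffix is a hypothetical naming convention; use whatever distinctive
    name the extra action's output template actually produces.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(output_root):
        for name in filenames:
            if name.endswith(suffix):
                with open(os.path.join(dirpath, name)) as f:
                    entries.append(json.load(f))
    # Sort so the result does not depend on directory traversal order.
    entries.sort(key=lambda entry: entry.get("file", ""))
    return entries


def write_compile_commands(output_root, destination):
    """Write the merged entries as a single JSON array."""
    with open(destination, "w") as f:
        json.dump(merge_compile_commands(output_root), f, indent=2)
```

Calling write_compile_commands() with the Bazel output tree and a destination path would produce the merged file. Note that this simple walk also picks up stale fragments left over from earlier builds.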
A more sophisticated option is to use bazel query --output=proto and write code to calculate the extra action output filenames of the targets you're interested in from there. That requires writing more code, but you don't have problems with old output files in the output tree that are accidentally included when aggregating.
FWIW, Aspects are another possibility. However, I think extra actions work acceptably for this.


In CMake, how do I deal with generated source files whose number and names are not known in advance?

Imagine a code generator which reads an input file (say a UML class diagram) and produces an arbitrary number of source files which I want to be handled in my project. (to draw a simple picture let's assume the code generator just produces .cpp files).
The problem is that the number of files generated depends on the input file and thus is not known when writing the CMakeLists.txt file or even in CMake's configure step. E.g.:
    >>> code-gen uml.xml
    generate class1.cpp..
    generate class2.cpp..
    generate class3.cpp..
What's the recommended way to handle generated files in such a case? You could use FILE(GLOB.. ) to collect the file names after running code-gen the first time but this is discouraged because CMake would not know any files on the first run and later it would not recognize when the number of files changes.
I could think of some approaches but I don't know if CMake covers them, e.g.:
- (somehow) define a dependency from an input file (uml.xml in my example) to a variable (list with generated file names)
- in case the code generator can be convinced to tell which files it generates, the output of code-gen could be used to create a list of input file names (this would lead to similar problems, but at least I would not have to use GLOB, which might collect old files)
- just define a custom target which runs the code generator and handles the output files without CMake (I don't like this option)
Update: This question targets a similar problem but just asks how to glob generated files which does not address how to re-configure when the input file changes.
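Together with Tsyvarev's answer and some more googling I came up with the following CMakeLists.txt which does what I want:
    cmake_minimum_required(VERSION 3.6)
    set(IN_FILE "${CMAKE_SOURCE_DIR}/input.txt")
    set_property(DIRECTORY APPEND PROPERTY CMAKE_CONFIGURE_DEPENDS "${IN_FILE}")
    execute_process(
        COMMAND python3 "${CMAKE_SOURCE_DIR}/code-gen" "${IN_FILE}"
        OUTPUT_VARIABLE GENERATED_FILES
        OUTPUT_STRIP_TRAILING_WHITESPACE)
    add_executable(generated main.cpp ${GENERATED_FILES})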
It turns an input file (input.txt) into output files using code-gen and compiles them.
execute_process runs during the configure step, and the set_property() command makes sure CMake is re-run when the input file changes.
Note: in this example the code-generator must print a CMake-friendly list on stdout which is nice if you can modify the code generator. FILE(GLOB..) would do the trick too but this would for sure lead to problems (e.g. old generated files being compiled, too, colleagues complaining about your code etc.)
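For illustration, a code generator that prints a CMake-friendly list might look like the following sketch. The input format (one class name per line) and the function names are assumptions made up for this example, not the actual code-gen script from the answer.

```python
import os
import sys


def generate_sources(input_path, out_dir):
    """Write one .cpp file per non-empty line of input_path; return their paths.

    The one-name-per-line input format is a stand-in for a real model file.
    """
    with open(input_path) as f:
        names = [line.strip() for line in f if line.strip()]
    generated = []
    for name in names:
        path = os.path.join(os.path.abspath(out_dir), name + ".cpp")
        with open(path, "w") as out:
            out.write("// generated from %s\n" % os.path.basename(input_path))
        # CMake prefers forward slashes, also on Windows.
        generated.append(path.replace(os.sep, "/"))
    return generated


def cmake_list(paths):
    """Join paths into the semicolon-separated form CMake treats as a list."""
    return ";".join(paths)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Print only the list on stdout so CMake can capture it unchanged.
    sys.stdout.write(cmake_list(generate_sources(sys.argv[1], os.getcwd())))
```

Printing absolute paths keeps the list usable no matter which directory CMake runs the generator from.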
PS: I don't like to answer my own questions - If you come up with a nicer or cleaner solution in the next couple of days I'll take yours!

Proper way to call a found executable in a custom command?

I have a program on my computer, let's say C:/Tools/generate_v23_debug.exe
I have a FindGenerate.cmake file which allows CMake to find that exact path to the executable.
So in my CMake code, I do:
    if (NOT Generate_FOUND)
        message(FATAL_ERROR "Generator not found!")
    endif()
So CMake has found the executable. Now I want to call this program in a custom command statement. Should I use COMMAND Generator or COMMAND ${GENERATOR_EXECUTABLE}? Will both of these do the same thing? Is one preferred over the other? Is name_EXECUTABLE a variable that CMake will define (it's not in the FindGenerate.cmake file), or is it something specific to someone else's example code I'm looking at? Will COMMAND Generator be expanded to the correct path?
    add_custom_command(
        OUTPUT blahblah.txt
        COMMAND Generator inputfile1.log
        DEPENDS Generator
    )
find_program stores its result into the variable given as a first argument. You can verify this by inserting some debug output:
    find_program(GENERATOR Generate)
    message(STATUS "GENERATOR = ${GENERATOR}")
Note that find_program does not set any additional variables beyond that. In particular, you mentioned Generate_FOUND and GENERATOR_EXECUTABLE in your question and neither of those gets introduced implicitly by the find_program call.
The second mistake in your program is the use of the DEPENDS option on add_custom_command. DEPENDS is used to model inter-target dependencies at build time, not to manipulate control flow in the CMakeLists. For example, an additional custom command can DEPEND on the output of your command (blahblah.txt), but a custom command cannot DEPEND on the result of a previous find operation.
A working example might look something like this:
    find_program(GENERATOR Generate)
    if(NOT GENERATOR)
        message(FATAL_ERROR "Generator not found!")
    endif()
    add_custom_command(
        OUTPUT blahblah.txt
        COMMAND ${GENERATOR} inputfile1.log
    )
P.S.: You asked why the code examples were not properly formatted in your question. You indented everything correctly, but you need an additional newline between normal text and code paragraphs. I edited your question accordingly.

Ignore includes with #pycparser and define multiple Subgraphs in #pydot

I'm trying to create a tool that shows me caller dependencies for legacy code.
I'm parsing a directory of C code with pycparser, and for each file I want to create a subgraph with pydot.
Two questions:
When parsing a C file, the parser follows the #includes, and I also get functions from the included files in my AST. How can I tell whether a function comes from an include or is originally from the current file? Or can I ignore the #includes?
For each file I want to create a subgraph, and then add all functions in this file to this subgraph. I don't know in advance how many subgraphs I have to create...
I have a set of files, where each file is a frozenset with the functions of this file.
Is something like this possible?
    for files in SetOfFiles:
        # how to create a subgraph with the name of files?
        for function in files:
            self.graph.add_node(pydot.Node(function))  # --> add node to subgraph "files"
I hope you get my challenge... any ideas?
I solved the question about pydot, it was quite easy... So I'm left with my pycparser problem :(
    for files in ListOfFuncs:
        cluster_x = pydot.Cluster(files, label=files)
        for functions in files:
            cluster_x.add_node(pydot.Node(functions))
        self.graph.add_subgraph(cluster_x)
I can address the pycparser part. The preprocessor leaves #line directives that specify which file and line the code came from, and pycparser consumes those. You can get that information from the AST it creates (see tests for an example).
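As a rough sketch of how that information can be used: every pycparser AST node carries a coord attribute with a file field, so function definitions can be filtered by their originating file. The helper name below is made up for this example, and it assumes that coord.file matches the path exactly as it was handed to the preprocessor.

```python
def functions_defined_in(ast, filename):
    """Return the names of functions whose definition originates in filename.

    ast is expected to be the FileAST returned by pycparser; nodes are
    recognized by class name so this sketch itself needs no imports.
    """
    names = []
    for node in ast.ext:
        # Only FuncDef nodes are definitions; prototypes are plain Decl nodes.
        if type(node).__name__ != "FuncDef":
            continue
        coord = node.coord
        # coord.file comes from the preprocessor's line markers.
        if coord is not None and coord.file == filename:
            names.append(node.decl.name)
    return names
```

With pycparser installed, the AST would come from something like pycparser.parse_file(path, use_cpp=True), followed by functions_defined_in(ast, path).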

program to reproduce itself and be useful -- not a quine

I have a program which performs a useful task. Now I want to produce the plain-text source code when the compiled executable runs, in addition to performing the original task. This is not a quine, but is probably related.
This capability would be useful in general, but my specific program is written in Fortran 90 and uses Mako Templates. When compiled it has access to the original source code files, but I want to be able to ensure that the source exists when a user runs the executable.
Is this possible to accomplish?
Here is an example of a simple Fortran 90 program which performs a simple task.
    program exampl
      implicit none
      write(*,*) 'this is my useful output'
    end program exampl
Can this program be modified such that it performs the same task (outputs a string when run) and also outputs a Fortran 90 text file containing the source?
It's been so long since I have touched Fortran (and I've never dealt with Fortran 90) that I'm not certain but I see a basic approach that should work so long as the language supports string literals in the code.
Include your entire program inside itself in a block of literals. Obviously you can't include the literals within this, instead you need some sort of token that tells your program to include the block of literals.
Obviously this means you have two copies of the source, one inside the other. As this is ugly I wouldn't do it that way, but rather store your source with the include_me token in it and run it through a program that builds the nested files before you compile it. Note that this program will share a decent amount of code with the routine that recreates the code from the block of literals. If you're going to go this route I would also make the program spit out the source for this program so whoever is trying to modify the files doesn't need to deal with the two copies.
I edited my original program (see question) to add an include statement.
Call this file "exampl.f90":
    program exampl
      implicit none
      write(*,*) "this is my useful output"
      include "exampl_source.f90"
    end program exampl
Then another program (written in Python in this case) reads that source
    import os

    # Read exampl.f90 and replace each line with write(2,*) 'line'.
    # The files are closed before compiling so the output is fully flushed.
    with open('exampl.f90') as f, open('exampl_source.f90', 'w') as g:
        for line in f:
            g.write("write(2,*) '" + line.rstrip() + "'\n")

    # Then compile exampl.f90 (which includes exampl_source.f90).
    os.system('gfortran exampl.f90')
    os.remove('exampl_source.f90')
Running this Python script produces an executable. When the executable is run, it performs the original task and also writes out the source code.

SCONS: making a special script builder depend on output of another builder

I hope the title clarifies what I want to ask because it is a bit tricky.
I have a SCONS SConscript for every subdir as follows (I'm doing this on Linux, if it matters):
yacc srcs
data files for the yacc
I use a variant_dir without copy, for example:
SConscript('src_dir/compiler/SConscript', variant_dir = 'obj_dir', duplicate = 0)
The resulting obj_dir after building the yacc is:
Now here is the deal.
I have another SConscript in the data dir that needs to do 2 things:
1. compile the data with the yacc compiled compiler
2. Take the output of the compiler and run it with the legacy_script I can't change
(the legacy_script, takes the output of the compiled data and build some h files for another software to depend on)
Number 1 is achieved easily:
    linux_env.Command(['output1', 'output2'], 'data/data_files', 'compiler_compiler.exe data_files output1 output2')
My problem is number 2: how do I make the script runner depend on the outputs of another target?
And just to clarify it, I need to make SCONS run (and only if compiler_output changes):
src_dir/script/legacy_script obj_dir/data/compiler_output obj_dir/some_dir/script_output
(the script's usage is: legacy_script input_file output_file)
I hope I made myself clear, feel free to ask some more questions...
I've had a similar problem recently when I needed to compile Cheetah Templates first, which were then used from another Builder to generate HTML files from different sources.
If you define the build output of the first builder as source for the second builder, SCons will run them in the correct order and only if intermediate files have changed.
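A minimal sketch of that chaining is shown below, wrapped in a plain function so it can be read on its own. The function name and its parameters are invented for this example; env.Command and the $SOURCE, $SOURCES and $TARGET variables are standard SCons, while the concrete paths and command lines would have to come from your own build.

```python
def declare_chain(env, compiler, script, data_files, compiler_output, script_output):
    """Declare two SCons commands where the second consumes the first's target."""
    # Step 1: the yacc-built compiler turns the data files into compiler_output.
    compiled = env.Command(compiler_output, data_files,
                           compiler + " $SOURCES $TARGET")
    # Step 2: using the nodes returned by step 1 as the source makes SCons
    # run the script after the compiler, and only when that output changed.
    generated = env.Command(script_output, compiled,
                            script + " $SOURCE $TARGET")
    return generated
```

Inside an SConscript this would be called with the real environment, for example declare_chain(linux_env, ...), passing the compiler, the legacy_script path and the file names from the question.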