locations of loops in source files - optimization

In my pass, I'd like to know the locations of loops. For example, given a for loop such as:
for(int i=0; i<n; i++) { ... }
The line number of for(...) in the source file is what I am interested in. If the .bc file is generated by llvm-gcc with -O0, I can easily get this information by reading the line number of the first instruction of the loop. However, if -O3 is used, this method does not work. How can I still get the loop locations in this case?

In general you cannot, because your loop might be transformed by the compiler (e.g. unrolled, reversed, vectorized, and so on).
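That said, if you compile with -g alongside -O3, some instructions usually keep their debug locations, so you can make a best-effort guess by scanning the loop for the first instruction that still carries one. A rough sketch against a recent LLVM C++ API (the helper name getApproxLoopLine is mine; newer LLVM releases also provide Loop::getStartLoc() for this purpose):

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Best effort: the line of the first instruction in the loop that still
// has a debug location attached; 0 if none survived optimization.
static unsigned getApproxLoopLine(const Loop *L) {
  for (const BasicBlock *BB : L->getBlocks())
    for (const Instruction &I : *BB) {
      const DebugLoc &DL = I.getDebugLoc();
      if (DL)                   // many instructions lose !dbg under -O3
        return DL.getLine();
    }
  return 0;                     // nothing left to report
}

Keep in mind that after unrolling or fusion the recovered line may belong to a transformed instruction rather than to the original for(...) header.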

How can I tell cppcheck to ignore inline assembly?

We have a file with inline assembly for a DSP. Cppcheck thinks there are a load of "variable assigned but not used" lines in the assembly.
Is there any way to tell it to skip checking the inline assembly sections? I couldn't see anything obvious in the manual, and it is a bit tedious to have to suppress each line in turn.
Here's an example of some of the offending lines. It's a context save routine.
inline assembly void save_ctx()
{
asm_begin
.undef global data saved_ctx;
.undef global data p_ctx;
asm_text
...
st XM[p0++], r0;
st XM[p0++], r1;
st XM[p0++], r2;
st XM[p0++], r3;
st XM[p0++], r4;
st XM[p0++], r5;
st XM[p0++], r6;
...
I can turn off the messages with
// cppcheck-suppress unreadVariable
before each line, but it would be better to just tell cppcheck to skip the whole inline assembly section.
Is there any way I can do this, or will we just have to accept lots of repeated comments?
Somewhat counter-intuitive, but thanks to @DavidWohlferd for pointing me in the right direction.
-D__CPPCHECK__ doesn't do the right thing. It tells cppcheck to only check blocks where __CPPCHECK__ (or nothing) is defined, i.e. it completely turns off the combinatorial checking. However, there is a simple but counter-intuitive solution using -U.
Wrap the block with
#define EXCLUDE_CPPCHECK
#ifdef EXCLUDE_CPPCHECK
...
#endif // EXCLUDE_CPPCHECK
Now if you call cppcheck with -UEXCLUDE_CPPCHECK it will skip that block (even though the #define is just before it!) but still do all the other combinations of #define which are used in #if.
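Applied to the routine from the question, the wrap looks like this (EXCLUDE_CPPCHECK is an arbitrary name), and you then run cppcheck -UEXCLUDE_CPPCHECK yourfile.cpp:

#define EXCLUDE_CPPCHECK

#ifdef EXCLUDE_CPPCHECK
inline assembly void save_ctx()
{
    asm_begin
    ...
    asm_text
    ...
    st XM[p0++], r0;
    ...
}
#endif // EXCLUDE_CPPCHECK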
Thank you David and Drew.
According to the man page (I didn't try it myself), you can add these command line options:
--suppress=<spec>
Suppress a specific warning. The format of <spec> is: [error id]:[filename]:[line]. The [filename] and [line] are optional. [error id] may be * to suppress all warnings (for a specified file or files). [filename] may contain the wildcard characters * or ?.
--suppressions-list=<file>
Suppress warnings listed in the file. Each suppression is in the format of <spec> above.
I.e. in your case --suppress=unreadVariable:all_dsp_asm_*.cpp switches it off completely for those particular files. That is IMO usable, as you can put all the DSP inline asm into separate files, so it will not affect your ordinary cppcheck runs.
Or, in the worst case, use the --suppressions-list file, where I guess you can list particular lines ad absurdum to cover the whole inline parts.
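For example, a suppressions file (say dsp.supp; the file names here are hypothetical) could contain:

unreadVariable:all_dsp_asm_ctx.cpp
unreadVariable:all_dsp_asm_irq.cpp:42

and be passed with --suppressions-list=dsp.supp.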
I don't see how to suppress it inline in the source; the inline comment seems to affect only a single line.
Checking a probably more up-to-date version of the manual here, you can also exclude a whole file with -i<filename> (second page).
The options above are on page 11.

Buffering output with AWK

I have an input file which consists of three parts:
inputFirst
inputMiddle
inputLast
Currently I have an AWK script which, given this input, creates an output file consisting of two parts:
outputFirst
outputLast
where outputFirst and outputLast are generated (on the fly) from inputFirst and inputLast respectively. However, to calculate the outputMiddle part (which is only one line) I need to scan the entire input, so I store it in a variable. The problem is that the value of this variable should go between outputFirst and outputLast in the output file.
Is there a way to solve this using a single portable AWK script that takes no arguments? Is there a portable way to create temporary files in an AWK script or should I store the output from outputFirst and outputLast in two variables? I suspect that using variables will be quite inefficient for large files.
All versions of AWK (since at least 1985) can do basic I/O redirection to files or pipelines, just like the shell can, as well as run external commands without I/O redirection.
So, there are any number of ways to approach your problem and solve it without having to read the entire input file into memory. The optimal solution will depend on exactly what you're trying to do and on what constraints you must honour.
A simple approach to the more precise example problem you describe in your comment above might go something like this (see the sketch below): in the BEGIN clause, form two unique filenames with rand() and define your variables; read and sum the first 50 numbers from standard input while also writing them to a temporary file; then read and sum the next 50 numbers while writing them to a second file; finally, in an END clause, use a loop to read the first temporary file back with getline and write it to standard output, print the total sum, read the second temporary file the same way and write it to standard output, and call system("rm " file1 " " file2) to remove the temporary files.
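A sketch of exactly that (the rand()-based temp names are only "unique enough" for illustration):

BEGIN { srand(); f1 = "tmp1." rand(); f2 = "tmp2." rand(); s = 0 }
NR <= 50 { s += $1; print > f1; next }    # first block: sum and save
         { s += $1; print > f2 }          # remaining block: sum and save
END {
    close(f1); close(f2)                  # flush before re-reading
    while ((getline line < f1) > 0) print line
    print s                               # the computed middle line
    while ((getline line < f2) > 0) print line
    system("rm " f1 " " f2)               # remove the temporary files
}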
If the output file is not too large (whatever that means on your system), saving outputLast in a variable is quite reasonable. The first part, outputFirst, can, as described, be generated on the fly. I tried this approach and it worked fine.
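That variant might look like this (again assuming a 50-line first part and a running sum as the middle line):

NR <= 50 { s += $1; print; next }        # emit outputFirst immediately
         { s += $1; buf = buf $0 ORS }   # buffer outputLast in a variable
END      { print s; printf "%s", buf }   # middle line, then the buffered tail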
Print the "first" output while processing the file, then write the remainder to a temporary file until you have written the middle.
Here is a self-contained shell script which processes its input files and writes to standard output.
#!/bin/sh
t=$(mktemp -t middle.XXXXXXXXX) || exit 127
trap 'rm -f "$t"' EXIT
trap 'exit 126' HUP INT TERM
awk -v temp="$t" 'NR <  500000 { print }
    { s += $1 }
    NR >= 500000 { print >>temp }
    END { print s }' "$@"
cat "$t"
For illustration purposes, I used really big line numbers. I'm afraid your question is still too vague to really obtain a less general answer, but perhaps this can help you find the right direction.

Determining symbol addresses using binutils/readelf

I am working on a project where our verification test scripts need to locate symbol addresses within the build of software being tested. This might be used for setting breakpoints or reading static data from memory. What I am after is to create a map file containing symbol names, base address in memory, and size. Our build outputs an ELF file which has the information I want. I've been trying to use the readelf, nm, and objdump tools to try and to gain the symbol addresses I need.
I originally tried readelf -s file.elf, and that listed some symbols, particularly those which were written in assembler. However, many of the symbols that I wanted were not in there, specifically those that originated within our Ada code.
I used readelf --debug-dump file.elf to dump all debug information. In that output I do see all the symbols, including those from the Ada code, but the format is the DWARF format. Does anyone know why these symbols would not be output by readelf when I ask it to list the symbolic information? Perhaps there is simply an option I am missing.
Now, I could go to the trouble of writing a custom DWARF parser to get the information, but if I can get it using one of the binutils (nm, readelf, objdump) then I'd much prefer a standard solution.
DWARF is the debug information format and tries to reflect the structure of the original source code. Take the following code as an example:
static int one() {
    // something
    return 1;
}

int main(int ac, char **av) {
    return one();
}
After you compile it using gcc -O3 -g, the static function one will be inlined into main. So when you use readelf -s, you will never see the symbol one. However, when you use readelf --debug-dump, you can see that one is a function which has been inlined.
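You can see this for yourself with something like the following (the file name is mine; the exact output varies with compiler and binutils versions):

gcc -O3 -g example.c -o example
readelf -s example | grep one            # nothing: 'one' was inlined away
readelf --debug-dump=info example | grep -B2 DW_AT_inline   # 'one' is still described in DWARF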
So, in this example, the compiler does not prevent you from combining optimization with -g, and you can still debug the executable. Even though the function is optimized away and inlined, gdb can still use the DWARF information to identify the function and the source/line of the current code block inside the inlined function.
The above is just one case of compiler optimization. There are plenty of reasons why the symbol addresses from readelf -s and from the DWARF information might not match.

Reading MANY files at once in Fortran

I have 500,000 files which I need to read in Fortran and each file has ~14,000 entries in it (each entry is only about 100 characters long). I need to process each line for each file at a time. For example, I need to process line 1 for all 500,000 files before moving on to line 2 from the files and so forth.
I cannot open them all at once (I tried making an array of file pointers and opening them all) because there will be too many files open at once. Instead, I would like to do something as follows:
do iline = 1,Nlines
   do ifile = 1,Nfiles
      ! open the file
      ! read a line
      ! close the file
   enddo
enddo
In hopes that this would allow me to read one line at a time (from each file) and then move on to the next line (in each file). Unfortunately, each time I open the file it starts me off at line 1 again. Is there any way to open/close a file and then open it again where you left off previously?
Thanks
Unfortunately, it is not possible this way in standard Fortran. Even if you specify
position="ASIS"
the actual position will be unspecified for a unit that is not already connected, and will in fact be the beginning of the file on most systems.
That means you have to use
read(u, *)
enough times to get to the right place in the file.
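That is, something along these lines (unit number and variable names are illustrative):

open(10, file=fnames(ifile), status='old')
do i = 1, iline-1        ! skip the lines already processed
   read(10, *)
end do
read(10, '(a)') line     ! the line we actually want
close(10)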
You could also use stream access. The file would again be opened at the beginning, but you can use
read(u, *, pos=n) number
where n is the position saved before the previous close. You can get that position from
inquire(unit=u, pos=n)
You would open the file with access="STREAM".
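Put together, a sketch (illustrative names; initialise filepos(:) = 1 before the first pass, since stream positions start at 1):

open(10, file=fnames(ifile), access='stream', form='formatted', status='old')
read(10, '(a)', pos=filepos(ifile)) line   ! resume where we left off
inquire(10, pos=filepos(ifile))            ! save the position for the next pass
close(10)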
Also, 500,000 open files is indeed too many. There are ways to inquire about the system limits and to control them, but your compiler runtime may have its own limits too: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
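On Linux, for instance:

ulimit -n          # show the per-process limit on open file descriptors
ulimit -n 4096     # raise it for the current shell (within the hard limit)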
Another solution: couldn't you store the content of the files in memory? A couple of gigabytes is OK today, but it may not be enough for you; at 500,000 files × 14,000 entries × 100 characters, the whole data set is on the order of 700 GB.
You can try using fseek and ftell in something like the following.
! offsets(1:Nfiles) initialized to 0
do iline = 1,Nlines
   do ifile = 1,Nfiles
      open(10, file=fnames(ifile), status='old')
      call fseek(10, offsets(ifile), 0)  ! 0 = seek from start of file
      read(10, '(a)') line               ! read one line
      offsets(ifile) = ftell(10)         ! where the next line begins
      close(10)
   enddo
enddo
The (untested) idea is to store the offset of each file in an array and position the cursor at that place upon opening the file. Then, once a line is read, ftell retrieves the current position, which is saved to memory for the next round. Note that fseek and ftell are compiler extensions (available as intrinsics in gfortran, for example), not standard Fortran. If all entries have the same length, you can dispense with the array and just store one value.
If the files have fixed, i.e., constant, record lengths, you could use direct access. Then you could "directly" read a specific record. A big "if", however.
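A sketch of that (recl units for formatted files are typically characters, but are strictly processor-dependent):

open(10, file=fnames(ifile), access='direct', form='formatted', recl=100, status='old')
read(10, '(a)', rec=iline) line   ! jump straight to record iline
close(10)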
The overhead of all the file opening/closing will be a big performance bottleneck.
You should try to read as much as you can during each open, given whatever memory you have:
pseudocode:
loop until done:
    loop over all files:
        open
        fseek                     ! as in damien's answer
        read N lines into array   ! e.g. N = 100
        save ftell value for file
        close
    end file loop
    loop over N output files:
        open
        write array data
        close

Is there a length limit on g++ variable names?

See title.
Short Answer:
No
Long Answer:
Yes, it has to be small enough that it will fit in memory, but otherwise no, not really. If there is a built-in limit (I don't believe there is), it is so huge you'd be really hard-pressed to reach it.
Actually, you got me really curious, so I created the following Python program to generate code:
#! /usr/bin/env python2.6
import sys

cppcode = """
#include <iostream>
#include <cstdlib>
int main(int argc, char* argv[])
{
    int %s = 0;
    return 0;
}
"""

def longvarname(n):
    s = "x"
    for i in xrange(n):
        s = s + "0"
    return s

def printcpp(n):
    print cppcode % longvarname(n)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        printcpp(int(sys.argv[1]))
This generates C++ code with a variable name of the desired length. Using the following:
./gencpp.py 1048576 > main.cpp
g++ main.cpp -o main
The above gives me no problems (the variable name is roughly 1MB in length). I tried for a gigabyte, but I'm not being so smart with the string construction, and so I decided to abort when gencpp.py took too long.
Anyway, I very much doubt that gcc pre-allocates 1MB for variable names. It is purely bounded by memory.
An additional gotcha: some linkers have a limit on the length of the mangled name. This tends to be an issue with templates and nested classes more than with identifier length as such, but either could trigger a problem, AFAIK.
I don't know what the limit is (or if there is one), but I think it is good practice that there should be one, in order to catch pathological code, for example code created by a runaway code generator. For what it's worth, the C++ Standard suggests a minimum of 1K characters for identifier length (Annex B).