Using valgrind at a specific point while running a program

I want to start using valgrind at a specific point while my program is running. For example, when I compile with -O3, it takes 0.5 hours to reach the desired point; when I compile with -g -ggdb, it takes nearly 2.5 hours.
If I run the -g -ggdb build under valgrind, the program is extremely slow and I cannot predict when it will reach the desired point.
What should I do?

Related

gprof displays no time accumulated

I am trying to use gprof to profile my code. However, when I compile with the flag -pg, gprof displays no time accumulated and the report contains all 0's. The version I have is GNU gprof (GNU Binutils for Ubuntu) 2.34. How can I solve this issue?
The program is definitely running for quite a while (>30 seconds) and it contains only two functions. The same thing happens when I use code and instructions from this article: https://www.thegeekstuff.com/2012/08/gprof-tutorial/.

Valgrind has long pause before running executables

Let me preface this question by saying that I know it takes programs longer to run in valgrind as there is a lot of overhead. This question is not about that.
To ensure that our implementations of data structures have the appropriate runtime, all test cases time out after a certain amount of time (usually around 10 times as long as the teacher-produced solutions take to run in valgrind). I ran the test cases on my laptop early in the day and everything was fine. Later that night I made two very minor changes (adding one to something and adding a counter for something else, both constant-time operations). I reran the tests and timed out on even the most basic test cases, like inserting one node. I was freaking out, so I went to the 24/7 computer lab on campus and ran my code on a virtual machine, and it worked fine. I ran the binaries directly on my laptop and they're speedy. I tried turning my computer off and back on, and that didn't fix anything, so I tried updating valgrind, but it is already up to date. I removed valgrind and reinstalled it, and that didn't fix the problem either. To verify that the problem is with valgrind and not my code, I then made a hello_world.cpp and ran the binary in valgrind with no extra flags. It takes about 15-20 seconds to run. I have absolutely no idea why this is happening. I haven't made any changes to my computer. I've skimmed the valgrind documentation, but I cannot pin down what is wrong. I run Fedora 27.

Limiting data collection of Cachegrind in Valgrind

It is well known that the callgrind tool of the valgrind suite can start and stop data collection from the command line with callgrind_control -i on or callgrind_control -i off. For instance, the following code will collect data only after one hour:
(sleep 3600; callgrind_control -i on) &
valgrind --tool=callgrind --instr-atstart=no ./myprog
Is there a similar option for the cachegrind tool? If so, how can I use it (I cannot find anything in the documentation)? If not, how can I start collecting data after a certain amount of time with cachegrind?
As far as I know, there is no such function for Cachegrind.
However, Callgrind is an extension of Cachegrind, which means that you can use Cachegrind features in Callgrind.
For example:
valgrind --tool=callgrind --cache-sim=yes --branch-sim=yes ./myprog
This will measure your program's cache and branch performance as if you were using Cachegrind.
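A related approach, not mentioned in the answer, is to switch Callgrind's instrumentation on from inside the program when it actually reaches the interesting point, instead of guessing a wall-clock delay (this also bears on the first question above). The sketch below assumes the Valgrind development headers provide <valgrind/callgrind.h> with the CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION client requests, and that the program is run with --instr-atstart=no. hot_phase and run_with_targeted_profiling are hypothetical stand-ins for your own code; the fallback macros exist only so the file compiles where the headers are not installed.

```c
/* Sketch: enable Callgrind instrumentation only around the phase of
 * interest. Assumed run command (flags taken from the answer above):
 *   valgrind --tool=callgrind --instr-atstart=no \
 *            --cache-sim=yes --branch-sim=yes ./myprog
 */
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif

#ifndef HAVE_CALLGRIND_H
/* No-op fallbacks so the file still compiles without Valgrind headers. */
#  define CALLGRIND_START_INSTRUMENTATION do { } while (0)
#  define CALLGRIND_STOP_INSTRUMENTATION  do { } while (0)
#endif

/* Hypothetical stand-in for the code you actually want to analyse. */
static long hot_phase(const int *data, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i] * 2;
    return sum;
}

/* Hypothetical driver: set-up runs uninstrumented, then collection is
 * switched on only for the phase of interest and off again afterwards. */
long run_with_targeted_profiling(int *data, size_t n) {
    for (size_t i = 0; i < n; i++)      /* slow set-up, not instrumented */
        data[i] = (int)(i % 100);

    CALLGRIND_START_INSTRUMENTATION;
    long result = hot_phase(data, n);
    CALLGRIND_STOP_INSTRUMENTATION;
    return result;
}
```

According to the Callgrind manual, code that runs before instrumentation is enabled still pays Valgrind's basic overhead (roughly a 4x slowdown), but not the much larger cost of the cache and branch simulation.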

Optimization in GCC

I have two questions:
(1) I learned somewhere that -O3 is not recommended with GCC, because
The -O3 optimization level may increase the speed of the resulting executable, but can also increase its size. Under some circumstances where these optimizations are not favorable, this option might actually make a program slower. in fact it should not be used system-wide with gcc 4.x. The behavior of gcc has changed significantly since version 3.x. In 3.x, -O3 has been shown to lead to marginally faster execution times over -O2, but this is no longer the case with gcc 4.x. Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.
Suppose I have a workstation (Kubuntu 9.04) with 128 GB of memory and 24 cores that is shared by many users, some of whom may run intensive programs using around 60 GB of memory. Is -O2 a better choice for me than -O3?
(2) I also learned that when a running program crashes unexpectedly, any debugging information is better than none, so the use of -g is recommended for optimized programs, both for development and deployment. But when a program is compiled with -ggdb3 together with -O2 or -O3, will that slow down execution? Assume I am still using the same workstation.
The only way to know for sure is to benchmark your application compiled with -O2 and with -O3. Also, -O3 is made up of individual optimization options that you can turn on and off one at a time. Concerning the warning about larger binaries, note that just comparing the file sizes of executables compiled with -O2 and -O3 will not do much good, because what matters most is the size of the small, critical inner loops. You really have to benchmark.
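To make the benchmarking advice concrete, here is a minimal timing harness, a sketch assuming a POSIX system with clock_gettime and CLOCK_MONOTONIC. kernel and time_kernel are hypothetical names: replace kernel with your own critical loop, build the same file once with -O2 and once with -O3, and compare the reported times. Taking the minimum over several repetitions is one way to reduce noise on a shared machine like the one described in the question.

```c
/* Minimal timing harness for comparing optimisation levels.
 * Hypothetical usage: build the same file twice, e.g.
 *   gcc -O2 bench.c -o bench_O2
 *   gcc -O3 bench.c -o bench_O3
 * and call time_kernel() from main() in each build. */
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for your critical inner loop. */
static double kernel(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}

static double elapsed_seconds(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Run the kernel `reps` times and return the best (minimum) wall time in
 * seconds, or -1.0 if reps <= 0. The kernel's result is written to
 * *result so the computation cannot be discarded as dead code. */
double time_kernel(const double *x, size_t n, int reps, double *result) {
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        double s = kernel(x, n);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = elapsed_seconds(t0, t1);
        if (best < 0.0 || dt < best)
            best = dt;
        *result = s;
    }
    return best;
}
```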
It will result in a larger executable, but there shouldn't be any measurable slowdown.
Try it
You can rarely make accurate judgments about speed and optimisation without any data.
P.S. This will also tell you whether it's worth the effort. How many milliseconds saved in a function used once at startup are worthwhile?
Firstly, it does appear that the compiler team is essentially admitting that -O3 isn't reliable. It seems like they are saying: try -O3 on your critical loops or critical modules, or your Lattice QCD program, but it's not reliable enough for building the whole system or library.
Secondly, the problem with making the code bigger (inline functions and other things) isn't only that it uses more memory. Even if you have extra RAM, it can slow you down. This is because the faster the CPU chip gets, the more it hurts to have to go out to DRAM. They are saying that some programs will run faster WITH the extra routine calls and unexploded branches (or whatever O3 replaces with bigger things) because without O3 they will still fit in the cache, and that's a bigger win than the O3 transformations.
On the other issue, I wouldn't normally build anything with -g unless I was currently working on it.
-g and/or -ggdb just adds debugging symbols to the executable. It makes the executable file bigger, but that part isn't loaded into memory (except when run in a debugger or similar).
As for what's best for performance of -O2 and -O3, there's no silver bullet. You have to measure/profile it for your particular program.
In my experience, GCC does not generate the best assembly with -O2 and -O3. A better way is to apply specific optimization flags individually; this can generate better code than -O2 and -O3 alone, because some flags that are useful for faster code are not enabled by either level.
One good example is code and data prefetching: prefetch instructions are not inserted into your code by -O2 and -O3, but using additional flags for prefetching can make memory-intensive code 2 to 3% faster.
You can find the list of GCC optimization flags at http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.
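The answer relies on compiler flags for prefetching (GCC's -fprefetch-loop-arrays is one such option). As a different, manual technique, GCC and Clang also provide the __builtin_prefetch intrinsic. The sketch below applies it to an indirect (index-driven) access pattern, which hardware prefetchers often cannot predict. sum_indexed and PREFETCH_DISTANCE are hypothetical; whether this helps at all, and by how much, has to be measured on your own hardware.

```c
/* Sketch: explicit software prefetch with __builtin_prefetch (GCC/Clang).
 * PREFETCH_DISTANCE is a hypothetical tuning knob; the best value is
 * machine-specific and must be found by benchmarking. */
#include <stddef.h>

#define PREFETCH_DISTANCE 16

/* Sum values[idx[0]], values[idx[1]], ... while requesting the element
 * needed PREFETCH_DISTANCE iterations from now. */
long sum_indexed(const long *values, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* arguments: address, 0 = prefetch for read, 1 = low temporal locality */
            __builtin_prefetch(&values[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += values[idx[i]];
    }
    return sum;
}
```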
I think this pretty much answers your question:
The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.
If the guys writing the compiler say not to do it, I wouldn't second-guess them.

Unable to find processes unused for half an hour

You can list your background processes with
ps ux
I am looking for a way to find processes that I have not touched for 30 minutes.
How can I find processes that have been unused for half an hour?
Define "untouched" and "unused". You can find out lots of things using the f parameter on ps(1) in BSD-like systems, the -o on Solaris and Sys/V-like systems.
Update
Responding to the comment:
Well, you can do it. Consider, for example, something that runs ps periodically and stores the CPU time used along with the sample time. (Actually, you could do this better with a C program calling the appropriate system calls, but that's really an implementation detail.) Store the sample time and PID, and watch for a PID whose CPU time has not changed over the appropriate interval. This could even be implemented with an awk or perl program like
while true; do
ps _flags_
sleep 30
done | awk -f myprog | tail -f
so that every time awk gets a ps output, it mangles it, identifies candidates, and sends them out to show through tail -f.
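The answer notes that a C program calling the appropriate system calls would do this better than parsing ps output. A minimal sketch of that idea, assuming Linux (it reads /proc/<pid>/stat, which is not portable): parse_cpu_ticks extracts the utime and stime fields, and cpu_idle_for reports whether a process used any CPU time over an interval. All function names are hypothetical, and, as the answer goes on to point out, zero CPU time does not necessarily mean a process is unused.

```c
/* Sketch (Linux-only assumption): sample a process's CPU time from
 * /proc/<pid>/stat and report whether it changed over an interval. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse utime + stime (fields 14 and 15, in clock ticks) from one
 * /proc/<pid>/stat line. The command name is in parentheses and may
 * itself contain spaces or ')', so parsing starts after the LAST ')'.
 * Returns 0 on success, -1 on a malformed line. */
int parse_cpu_ticks(const char *stat_line, unsigned long *ticks) {
    const char *p = strrchr(stat_line, ')');
    unsigned long utime, stime;
    if (p == NULL)
        return -1;
    /* Skip state (3) and fields 4-13, then read utime (14) and stime (15). */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &utime, &stime) != 2)
        return -1;
    *ticks = utime + stime;
    return 0;
}

/* Read the current CPU ticks of `pid`; -1 if the process is gone. */
int read_cpu_ticks(pid_t pid, unsigned long *ticks) {
    char path[64], line[1024];
    FILE *f;
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_cpu_ticks(line, ticks);
}

/* 1 = no CPU time used during the interval, 0 = busy, -1 = error
 * (e.g. the process exited between samples). Blocks for interval_s. */
int cpu_idle_for(pid_t pid, unsigned int interval_s) {
    unsigned long before, after;
    if (read_cpu_ticks(pid, &before) != 0)
        return -1;
    sleep(interval_s);
    if (read_cpu_ticks(pid, &after) != 0)
        return -1;
    return before == after;
}
```

For example, cpu_idle_for(pid, 1800) blocks for 30 minutes. To check all of your processes you would instead take one sample per PID (for instance by walking the numeric entries of /proc), sleep once, and sample again. The values are in clock ticks; sysconf(_SC_CLK_TCK) gives the tick rate.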
But then you may well have daemon processes that don't get called often; it's not clear to me that CPU time alone is a good measure.
That's the point about defining what you really want to do: there's probably a way to do it, but I can't think of a combination of ps flags alone that will do it.