How to understand the gprof outputs?

Here is my source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <windows.h>
#include <string>
#include <vector>

static void set(const char *buf)
{
    static std::string *localstring;
    localstring = new std::string;
    (*localstring) = buf;
    delete localstring;
}

int main()
{
    const uint32_t BUF_SIZE = 1024 * 4;//32;
    char buf[BUF_SIZE+4];
    uint32_t i;
    ULONG T0, T1;
    for(i = 0; i < BUF_SIZE; i++) {
        buf[i] = 'a' + i % 26;
    }
    buf[BUF_SIZE] = 0;  // terminate the string so strlen() inside set() stays within buf
    T0 = GetTickCount();
    for(i = 0; i < 10000000; i++) {
        const int pos = i % BUF_SIZE + 1;
        const char save_ch = buf[pos - 1];
        set(buf);
        buf[pos - 1] = 0;
        buf[pos - 1] = save_ch;
    }
    T1 = GetTickCount();
    printf("Totally %lu mseconds\n", T1 - T0);  // ULONG is unsigned long, hence %lu
    return 0;
}
Here are the profiling results:
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 50.00      0.09     0.09                             _fu0___ZNSs4_Rep20_S_empty_rep_storageE
 38.89      0.16     0.07                             _fu1___ZNSs4_Rep20_S_empty_rep_storageE
  5.56      0.17     0.01                             std::string::assign(char const*, unsigned int)
  5.56      0.18     0.01                             strlen
 %          the percentage of the total running time of the
time        program used by this function.

cumulative  a running sum of the number of seconds accounted
 seconds    for by this function and those listed above it.

 self       the number of seconds accounted for by this
seconds     function alone.  This is the major sort for this
            listing.

calls       the number of times this function was invoked, if
            this function is profiled, else blank.

 self       the average number of milliseconds spent in this
ms/call     function per call, if this function is profiled,
            else blank.

 total      the average number of milliseconds spent in this
ms/call     function and its descendents per call, if this
            function is profiled, else blank.

name        the name of the function.  This is the minor sort
            for this listing. The index shows the location of
            the function in the gprof listing. If the index is
            in parenthesis it shows where it would appear in
            the gprof listing if it were to be printed.
                     Call graph (explanation follows)

granularity: each sample hit covers 4 byte(s) for 5.56% of 0.18 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     50.0    0.09    0.00                 _fu0___ZNSs4_Rep20_S_empty_rep_storageE [1]
-----------------------------------------------
                                                 <spontaneous>
[2]     38.9    0.07    0.00                 _fu1___ZNSs4_Rep20_S_empty_rep_storageE [2]
-----------------------------------------------
                                                 <spontaneous>
[3]      5.6    0.01    0.00                 std::string::assign(char const*, unsigned int) [3]
-----------------------------------------------
                                                 <spontaneous>
[4]      5.6    0.01    0.00                 strlen [4]
-----------------------------------------------
 This table describes the call tree of the program, and was sorted by
 the total amount of time spent in each function and its children.

 Each entry in this table consists of several lines.  The line with the
 index number at the left hand margin lists the current function.
 The lines above it list the functions that called this function,
 and the lines below it list the functions this one called.
 This line lists:
     index      A unique number given to each element of the table.
                Index numbers are sorted numerically.
                The index number is printed next to every function name so
                it is easier to look up where the function in the table.

     % time     This is the percentage of the `total' time that was spent
                in this function and its children.  Note that due to
                different viewpoints, functions excluded by options, etc,
                these numbers will NOT add up to 100%.

     self       This is the total amount of time spent in this function.

     children   This is the total amount of time propagated into this
                function by its children.

     called     This is the number of times the function was called.
                If the function called itself recursively, the number
                only includes non-recursive calls, and is followed by
                a `+' and the number of recursive calls.

     name       The name of the current function.  The index number is
                printed after it.  If the function is a member of a
                cycle, the cycle number is printed between the
                function's name and the index number.

 For the function's parents, the fields have the following meanings:

     self       This is the amount of time that was propagated directly
                from the function into this parent.

     children   This is the amount of time that was propagated from
                the function's children into this parent.

     called     This is the number of times this parent called the
                function `/' the total number of times the function
                was called.  Recursive calls to the function are not
                included in the number after the `/'.

     name       This is the name of the parent.  The parent's index
                number is printed after it.  If the parent is a
                member of a cycle, the cycle number is printed between
                the name and the index number.

 If the parents of the function cannot be determined, the word
 `<spontaneous>' is printed in the `name' field, and all the other
 fields are blank.

 For the function's children, the fields have the following meanings:

     self       This is the amount of time that was propagated directly
                from the child into the function.

     children   This is the amount of time that was propagated from the
                child's children to the function.

     called     This is the number of times the function called
                this child `/' the total number of times the child
                was called.  Recursive calls by the child are not
                listed in the number after the `/'.

     name       This is the name of the child.  The child's index
                number is printed after it.  If the child is a
                member of a cycle, the cycle number is printed
                between the name and the index number.

 If there are any cycles (circles) in the call graph, there is an
 entry for the cycle-as-a-whole.  This entry shows who called the
 cycle (as parents) and the members of the cycle (as children.)
 The `+' recursive calls entry shows the number of function calls that
 were internal to the cycle, and the calls entry for each member shows,
 for that member, how many times it was called from other members of
 the cycle.
Index by function name

   [3] std::string::assign(char const*, unsigned int)
   [1] _fu0___ZNSs4_Rep20_S_empty_rep_storageE
   [2] _fu1___ZNSs4_Rep20_S_empty_rep_storageE
   [4] strlen
According to the printed message, the program actually runs for about 15 seconds, yet the profiling results account for only 0.18 seconds. Why is there such a big difference, and how should the profiling output be read?
BR, Ruochen

The main thing you're doing is new, string assignment, and delete.
If any of those spend their time in the OS, or in code that is not linked into your build (such as a runtime DLL), the sampler won't see it. Simple as that.
That's not gprof's only problem.

Related

Trade off between Linear and Binary Search

I have a list of elements to be searched for in a dataset of variable length. I have tried binary search and found that it is not always efficient when the goal is to search for a whole list of elements.
I did the following study and concluded that if the number of elements to be searched for is less than 5% of the data, binary search is efficient; otherwise linear search is better.
Below are the details:
Number of elements: 100000
Number of elements to be searched for: 5000
Number of iterations (binary search) = log2(N) x SearchCount = log2(100000) x 5000 ≈ 83048
A single linear pass over the data needs about N = 100000 iterations, and any further increase in the number of search elements leads to more binary-search iterations than that.
Any thoughts on this?
I call the function below only if the number of elements to be searched for is less than 5%.
private int SearchIndex(ref List<long> entitylist, ref long[] DataList, int i, int len, ref int listcount)
{
    int Start = i;
    int End = len - 1;
    int mid;
    while (Start <= End)
    {
        mid = (Start + End) / 2;
        long target = DataList[mid];
        if (target == entitylist[listcount])
        {
            i = mid;
            listcount++;
            return i;
        }
        else
        {
            if (target < entitylist[listcount])
            {
                Start = mid + 1;
            }
            if (target > entitylist[listcount])
            {
                End = mid - 1;
            }
        }
    }
    listcount++;
    return -1; //if the element in the list is not in the dataset
}
In the code I return the index rather than the value because I need to work with the index in the calling function. If i = -1, the calling function resets it to the previous i and calls the function again with a new element to search for.

In your problem you are looking for M values in an N long array, N > M, but M can be quite large.
Usually this is approached as M independent binary searches (perhaps with the small optimization of starting each search from the previous result), which costs O(M*log(N)).
However, since the M values are sorted too, you can find all of them in one linear pass over both arrays, which makes the problem O(N). This is better than O(M*log(N)) when M is large.
But there is a third option: since the M values are sorted, split them in half as well. Look up the middle value first; once you have found its index, the values to its left only need to be searched to the left of that index, and the values to its right only to the right of it.
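The first look-up is on all N values, the next two on N/2 values each (on average), then four on N/4 values, and so on. Summing over the levels, this should scale as O(M*log(N/M)) rather than O(log(M)*log(N)): every one of the M values needs at least one look-up, so the cost can never drop below M.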
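A rough count, assuming 2^m - 1 ≈ M keys (so the splitting has m full levels), keys spread evenly enough that level k performs 2^k searches on ranges of about N/2^k values, and each search on r values costing about log2 r probes. The second step uses the identity sum_{k=0}^{m-1} k 2^k = (m-2) 2^m + 2:

```latex
\sum_{k=0}^{m-1} 2^k \log_2\frac{N}{2^k}
  = \sum_{k=0}^{m-1} 2^k\,(\log_2 N - k)
  = 2^m\,(\log_2 N - m + 2) - \log_2 N - 2
  \approx M\left(\log_2\frac{N}{M} + 2\right)
  = O\!\left(M \log\frac{N}{M}\right)
```

For M much smaller than N this is far below the M log2 N of independent searches, and it never falls below M.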
Here is some test code; I have slightly modified your code without altering its functionality.
With M=100000 and N=1000000, the "M binary searches" approach takes about 1.8M iterations, which is more than the 1M needed to scan the N values linearly. With what I suggest, it takes just 272K iterations.
Even when the M values are tightly "collapsed" (e.g. consecutive), so that the linear search is in its best condition (100K iterations would be enough to get all of them, see the comments in the code), the algorithm performs very well.

Understanding the output number of digits when dividing two floats [duplicate]

I am puzzled: I have no explanation for why this test passes when using the double data type but fails when using the float data type. Consider the following snippet of code.
float total = 0.00;
for ( int i = 0; i < 100; i++ ) total += 0.01;
One would expect total to be 1.00; however, it is equal to 0.99. Why is this the case? I compiled with both GCC and Clang, and both compilers give the same result.

Try this:
#include <stdio.h>

int main(){
    float total = 0.00;
    int i;
    for (i = 0; i < 100; i++)
        total += 0.01;
    printf("%f\n", total);
    if (total == 1.0)
        puts("Precise");
    else
        puts("Rounded");
}
At least on most machines, you'll get an output of "Rounded". In other words, the result is merely close to 1.0: printf rounds it to a few decimal places, so it can look like 1.00 without actually being 1.0. Change total to a double and you'll still get "Rounded".

The decimal value 0.01 has to be expressed as the binary series: a1*(1/2) + a2*(1/2)^2 + a3*(1/2)^3 + etc., where each aN is zero or one.
I leave it to you to figure out the specific values of a1, a2, ... and how many fractional bits (aN) are required. In some cases a decimal fraction cannot be represented by a finite series of (1/2)^n values, and 0.01 is one of them: 0.01 = 1/100 = 1/(2^2 * 5^2), and any fraction whose reduced denominator contains a factor other than 2 has an infinite binary expansion.
So the series for 0.01 never terminates, and it runs past the number of bits stored in a float (the full word minus the sign and exponent bits, i.e. 23 fraction bits). A double stores more bits (52 fraction bits), so its approximation of 0.01 is much closer, but it is still not exact.

What do the operators '<<' and '>>' do?

I was following 'A Tour of Go' on http://tour.golang.org.
Table 15 has some code that I cannot understand. It defines two constants with the following syntax:
const (
    Big   = 1 << 100
    Small = Big >> 99
)
It's not clear at all to me what this means. I tried modifying the code and running it with different values to see what changes, but I still couldn't understand what is going on.
Then it uses the operator again in table 24, where it defines a variable with the following syntax:
MaxInt uint64 = 1<<64 - 1
And when it prints the variable, it prints:
uint64(18446744073709551615)
Where uint64 is the type. But I can't understand where 18446744073709551615 comes from.

They are Go's bitwise shift operators.
Here's a good explanation of how they work for C (they work in the same way in several languages).
Basically, 1<<64 - 1 corresponds to 2^64 - 1 = 18446744073709551615. (In Go, << binds more tightly than -, so the expression means (1<<64) - 1.)
Think of it this way. In decimal, if you start from 001 (which is 10^0) and shift the 1 to the left, you end up with 010, which is 10^1. Shift it again and you get 100, which is 10^2. So shifting to the left is equivalent to multiplying by 10 once per shift.
Binary works the same way, but in base 2, so 1<<64 means multiplying 1 by 2 sixty-four times (i.e. 2^64).

That's the same as in all languages of the C family: a bit shift.
See http://en.wikipedia.org/wiki/Bitwise_operation#Bit_shifts
This operation is commonly used to multiply or divide an unsigned integer by powers of 2:
b := a >> 1 // divides by 2
1<<100 is simply 2^100 (that's Big), so Small = Big>>99 is 2^100 / 2^99 = 2.
1<<64 - 1 is 2^64 - 1, the biggest unsigned integer you can represent in 64 bits. (By the way, you can't represent 1<<64 itself in a 64-bit integer; the point of table 15 is to demonstrate that Go lets you use such values in numeric constants anyway.)

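The >> and << operators are shift operations. In Go they are logical shifts when the left operand is unsigned; when it is signed, >> is an arithmetic (sign-extending) shift. You can read more about logical shifts here:
http://en.wikipedia.org/wiki/Logical_shift
You can also check all the Go operators on their web page.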

It's a logical shift (for unsigned operands; a signed left operand gets an arithmetic right shift):
every bit in the operand is simply moved a given number of bit positions, and the vacant bit-positions are filled in, usually with zeros
Go Operators:
<<   left shift    integer << unsigned integer
>>   right shift   integer >> unsigned integer

'while' Loop in Objective-C

The following program repeatedly takes the remainder of a number divided by 10, adds these remainders together and displays the total.
#import <Foundation/Foundation.h>

int main (int argc, char * argv[]) {
    @autoreleasepool {
        int number, remainder, total;
        NSLog(@"Enter your number");
        scanf("%i", &number);
        while (number != 0)
        {
            remainder = number % 10;
            total += remainder;
            number /= 10;
        }
        NSLog(@"%i", total);
    }
    return 0;
}
My questions are:
Why is the program set to continue as long as the number is not equal to 0? Shouldn't it continue as long as the remainder is not equal to 0?
At what point is the remainder removed from the value of number? Why is there no number -= remainder statement before number /= 10?
[Bonus question: Does Objective-C get any easier to understand?]

The reason we loop while number != 0 instead of checking remainder is that a loop on remainder != 0 would stop at the first 0 digit (for example, when the input is exactly divisible by 10), so we wouldn't get the proper output (the sum of the base 10 digits).
The remainder is dropped because of integer division. Remember, an integer cannot hold a fractional part, so when we divide 16 by 10, we don't get 1.6, we just get 1.
And yes, Objective-C does get easier over time (but, as a side note, this code uses essentially no Objective-C features; it's basically C with an NSLog call).
Note, however, that the output won't always be what you expect: in C and Objective-C (unlike languages such as D or JS), a local variable is not automatically initialized, and the code assumes total starts at 0. Reading the uninitialized total is undefined behavior; declare it as int total = 0; instead.

It checks whether number is not equal to zero because remainder may very well never become zero. If we input 5, the first time through the loop remainder would be set to 5 (because 5 % 10 = 5), and number would go to zero because 5 / 10 = 0.5, and ints do not store fractional values, so the .5 is truncated and number becomes zero.
The remainder does not get removed from the value of number in this code. I think that you may be confused about what the modulo operator does (see this explanation).
Bonus answer: learning a programming language is difficult at first, but very rewarding in the long run (if you stick with it). Each new language that you learn after your first will most likely be easier to learn too, because you will understand general programming constructs and practices. The best of luck on your endeavor!

Algorithm for max and min? (Objective-C)

This is a part of a book I'm reading to learn Objective-C.
The following defines a macro called MAX that gives the maximum of two values:
#define MAX(a,b) ( ((a) > (b)) ? (a) : (b) )
Then there are some exercises in the book that ask the reader to define a macro (MIN) to find the minimum of two values, and another called MAX3 that gives the maximum of 3 values. I think these two definitions will look similar to MAX, but I don't understand how the MAX formula finds the maximum value. I mean, if I just did this:
int limits = MAX (4,8);
it'll just assign limits the value of 8. What does that have to do with finding a variable's maximum value?

I think you are confusing value and variable. The macro example you listed expands to a comparison between two values and returns the greater of the two values (i.e. which is greater, a or b). So you are right, int limits = MAX(4,8) just assigns 8 to limits and has nothing to do with finding the maximum value you can store in limits.
The header limits.h defines many values like INT_MAX that will tell you information about the min/max values of variable types on your system.

To break it apart:
The declaration:
#define MAX(a,b)
If a is greater than b, use a else use b:
( ((a) > (b)) ? (a) : (b) )
Then to create a MIN expression, use a similar form with the comparison flipped from > to <:
#define MIN(a,b) ( ((a) < (b)) ? (a) : (b) )
Then to create a MAX3 expression, you can combine them:
#define MAX3(a,b,c) ( MAX(a, MAX(b,c)) )
Specifically, this macro is intended for scalars (C built-in types) that can be compared using < or >. If you passed an Objective-C object, the macro would compare the objects' addresses, and MAX would return the one with the higher address (it is very rare that you actually want to compare the addresses of Objective-C instances).
Also note that this is the classic example of how macros can bite you. With macros, the preprocessor simply expands the parameters in place (a textual copy/paste), so int limits = MAX (4,8) literally expands to int limits = ( ((4) > (8)) ? (4) : (8) ). If you write MAX(x,++y), it expands to int limits = ( ((x) > (++y)) ? (x) : (++y) ), so y is incremented once for the comparison and a second time whenever x is not greater than the incremented y.

Generally, you will use a MAX() or MIN() macro to get whichever is the higher/lower of a pair of variables, of a variable and a constant, or even of a pair of macro constants or other non-literal constant expressions. You generally won't pass 2 literal constants as you have done in your question.

Algorithm for max (Objective-C)
// get max value
- (float)maxValue:(NSArray *)arrValue
{
    // Start from the first element (nil, and therefore 0.0, for an empty array)
    // so that arrays containing only negative numbers also work.
    float maxValue = [[arrValue firstObject] floatValue];
    for (NSNumber *value in arrValue) {
        float compareValue = [value floatValue];
        if (compareValue > maxValue) {
            maxValue = compareValue;
        }
    }
    return maxValue;
}

NSArray *number = [NSArray arrayWithObjects:[NSNumber numberWithFloat:57.02], [NSNumber numberWithFloat:55.02], [NSNumber numberWithFloat:45.02], nil];
NSLog(@"%f", [self maxValue:number]);
result 57.020000
