I am using Valgrind's Callgrind to profile a GTK program, and then I use KCachegrind to read the result. I have captured a screenshot of KCachegrind here: http://i41.tinypic.com/168spk0.jpg. It says the function gtk_moz_embed_new() cost '15.61%'.
But I don't understand how that is possible. The function gtk_moz_embed_new() literally has one line; it just calls g_object_new():
GtkWidget *
gtk_moz_embed_new(void)
{
return GTK_WIDGET(g_object_new(GTK_TYPE_MOZ_EMBED, NULL));
}
Can you please help me understand the result, or how to use KCachegrind?
Thank you.
If I remember correctly, that should mean (more or less) that the function gtk_moz_embed_new() was executing for 15.61% of the time the app was running.
You see, that function's return statement calls other functions (or classes or whatever) that also take time to execute. Only when they are all done does gtk_moz_embed_new() actually return a value. It's the very same reason main() takes ~99% of the time to execute: it finishes execution only after all the code included in it has executed.
Note that the Self value for gtk_moz_embed_new() is 0, which is its "exclusive cost", meaning that the function itself did not really take any time to execute (it's really only a return statement).
But to be exact:
1.1 What is the difference between 'Incl.' and 'Self'?
These are cost attributes for functions regarding some event type. As functions can call each other, it makes sense to distinguish the cost of the function itself ('Self Cost') and the cost including all called functions ('Inclusive Cost'). 'Self' is sometimes also referred to as 'Exclusive' cost.
So e.g. for main(), you will always have an inclusive cost of almost 100%, whereas the self cost is negligible when the real work is done in another function.
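To see the distinction concretely, here is a minimal C sketch (hypothetical names, not from your program) that you can run under Callgrind yourself: a one-line wrapper behaves exactly like gtk_moz_embed_new(), with a Self cost near 0% but an Incl. cost near 100%, because all the work happens in the function it calls.
#include <stdio.h>
/* Does the actual work: this is where the Self cost ends up. */
static long do_heavy_work(void)
{
    long sum = 0;
    for (long i = 0; i < 100000000; ++i)
        sum += i % 7;
    return sum;
}
/* One-line wrapper: Self cost ~0, but its Incl. cost includes do_heavy_work(). */
static long wrapper(void)
{
    return do_heavy_work();
}
int main(void)
{
    printf("%ld\n", wrapper());
    return 0;
}
Profiling this with valgrind --tool=callgrind and opening the output in KCachegrind should show wrapper() with a near-zero Self cost and an inclusive cost of roughly 100%, which is the same situation as gtk_moz_embed_new() in your screenshot.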
Related
I'm currently working on a project that does not include GSAP (GreenSock's JS tweening library), but since it's super easy to create your own custom easing functions with its visual editor, I was wondering if there is a way to break down the desired ease function so that it can be reused in a CreateJS tween.
Example:
var myEase = CustomEase.create("myCustomEase", [
{s:0,cp:0.413,e:0.672},{s:0.672,cp:0.931,e:1.036},
{s:1.036,cp:1.141,e:1.036},{s:1.036,cp:0.931,e:0.984},
{s:0.984,cp:1.03699,e:1.004},{s:1.004,cp:0.971,e:0.988},
{s:0.988,cp:1.00499,e:1}
]);
So that it turns into something like:
var myEase = function(t, b, c, d) {
//Some magic algorithm performed on the 7 bezier/control points above...
}
(Here is what the graph would look like for this particular easing method.)
I took the time to port and optimize the original GSAP-based CustomEase class... but due to license restrictions / legal matters (basically a grizzly bear that I do not want to poke with a stick...), posting the ported code would violate the license.
However, it's fair game for my own use, so I believe it's only fair that I guide you and point you to the resources that made it possible.
The original code (not directly compatible with CreateJS) can be found here:
https://github.com/art0rz/gsap-customease/blob/master/CustomEase.js (it looks like the author was also asked to take down the repo on GitHub - sorry if the rest of this post makes no sense at all!)
Note that CreateJS's easing methods only take a "time ratio" value (not time, start, end, duration like GSAP's easing methods do). That time ratio is really all you need, given that it goes from 0.0 (your start value) to 1.0 (your end value).
With a little bit of effort, you can discard those parameters from the ease() method and trim down the final returned expression.
Optimizations:
I took a few extra steps to optimize the above code.
1) In the constructor, you can store the segments.length value directly as a this.length property on the CustomEase instance to cut down a bit on the number of property lookups in the ease() method (where qty is set).
2) There are a few redundant calculations done per Segment that can be eliminated in the ease() method. For instance, the s.cp - s.s and s.e - s.s operations can be precalculated and stored in a couple of properties on each Segment (in its constructor).
3) Finally, I'm not sure why it was designed this way, but you can unwrap the function() {...}(); wrappers that return the constructors for each class. Perhaps it was used to trap the scope of some variables, but I don't see why it couldn't have wrapped the entire thing instead of encapsulating each one separately.
Need more info? Leave a comment!
Out of curiosity, I ran an experiment in Objective-C:
My assumption before starting the experiment was that a function returning a value should take more time (in nanoseconds) than a function with no return value.
I wrote two functions containing the same code, but one returning a value and the other not.
-(void)methodNotReturningTheValue
{
    // some code + @"" (the string is created but not returned)
}
-(NSString *)methodReturningTheValue
{
    // the same code
    return @"";
}
The timestamp (from [[NSDate date] timeIntervalSince1970], in seconds) was captured before and after calling each function. Below are the results:
Example:
Time before calling methodNotReturningTheValue: 1411033150.946451
Time after calling methodNotReturningTheValue: 1411033150.946978
Difference (before and after) for methodNotReturningTheValue: 0.000527
Time before calling methodReturningTheValue: 1411033150.947947
Time after calling methodReturningTheValue: 1411033150.948464
Difference (before and after) for methodReturningTheValue: 0.000517
The results are not consistent. Sometimes the time consumed by methodReturningTheValue is greater, and sometimes methodNotReturningTheValue's is greater. Maybe fluctuation in [[NSDate date] timeIntervalSince1970] is preventing the experiment from accurately measuring the time consumed by the returning and non-returning functions.
Am I off track here? Is there any direction or solution to settle this curiosity?
My query: how can one demonstrate the performance difference between a function that returns a value and a function without any return (void), in any language?
Thank You!
A value returned from a method is stored in a register (at least under ARM, for values <= 16 bytes; see the accepted answer to this question), so that difference between the methods is irrelevant when a reference is returned.
A method returning a struct, on the other hand, is more relevant, given that the struct must be copied back to the caller.
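To make that copy cost visible, here is a rough plain-C sketch (hypothetical names; timed with clock() over a tight loop, since a single call is lost in timer noise, and best compiled without optimizations because an optimizer may elide the copies):
#include <stdio.h>
#include <time.h>
typedef struct { char payload[4096]; } Big;   /* far too large for registers */
/* The whole struct must be copied back to the caller. */
static Big returnsBigStruct(void) { Big b = {{0}}; return b; }
/* A plain int comes back in a register. */
static int returnsRegisterValue(void) { return 42; }
int main(void)
{
    volatile int sink = 0;
    clock_t t0 = clock();
    for (int i = 0; i < 1000000; ++i) { Big b = returnsBigStruct(); sink += b.payload[0]; }
    clock_t t1 = clock();
    for (int i = 0; i < 1000000; ++i) { sink += returnsRegisterValue(); }
    clock_t t2 = clock();
    printf("struct return: %.3fs, register return: %.3fs (sink=%d)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sink);
    return 0;
}
The struct-returning loop should take noticeably longer than the register-returning loop, while two functions that differ only in returning a register-sized value (as in the Objective-C experiment above) will stay within timer noise of each other.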
I'm wrestling with the concept of code "order of execution", and so far my research has come up short. I'm not sure if I'm phrasing it incorrectly; it's possible there is a more appropriate term for the concept. I'd appreciate it if someone could shed some light on my various stumbling blocks below.
I understand that if you call one method after another:
[self generateGrid1];
[self generateGrid2];
Both methods are run, but generateGrid2 doesn't necessarily wait for generateGrid1 to finish. But what if I need it to? Say generateGrid1 does some complex calculations (that take an unknown amount of time) and populates an array that generateGrid2 uses for its calculations. This needs to happen every time an event is fired; it's not just a one-time initialization.
I need a way to call methods sequentially, but have some methods wait for others. I've looked into callbacks, but the concept is always married to delegates in all the examples I've seen.
I'm also not sure when to make the determination that I can't reasonably expect a line of code to have finished executing in time for its result to be used. For example:
float myVar = [self complexFloatCalculation];
if (myVar <= 10.0f) {} else {}
How do I determine whether something will take long enough to warrant checks for "is this other thing done before I start my thing?" Just trial and error?
Or what if I'm passing a method call as a parameter of another method? Does it wait for the argument to be evaluated before executing the outer method?
[self getNameForValue:[self getIntValue]];
I understand that if you call one method after another:
[self generateGrid1];
[self generateGrid2];
Both methods are run, but generateGrid2 doesn't necessarily wait for generateGrid1 to finish. But what if I need it to?
False. generateGrid1 will run, and then generateGrid2 will run. This sequential execution is the very basis of procedural languages.
Technically, the compiler is allowed to rearrange statements, but only if the end result would be provably indistinguishable from the original. For example, look at the following code:
int x = 3;
int y = 4;
x = x + 6;
y = y - 1;
int z = x + y;
printf("z is %d", z);
It really doesn't matter whether the x+6 or the y-1 line happens first; the code as written does not make use of either of the intermediate values other than to calculate z, and that can happen in either order. So if the compiler can for some reason generate more efficient code by rearranging those lines, it is allowed to do so.
You'd never be able to see the effects of such rearranging, though, because as soon as you try to use one of those intermediate values (say, to log it), the compiler will recognize that the value is being used, and get rid of the optimization that would break your logging.
So really, the compiler is not required to execute your code in the order provided; it is only required to generate code that is functionally identical to the code you provided. This means that you actually can see the effects of these kinds of optimizations if you attach a debugger to a program that was compiled with optimizations in place. This leads to all sorts of confusing things, because the source code the debugger is tracking does not necessarily match up line-for-line with the compiled code the compiler generated. This is why optimizations are almost always turned off for debug builds of a program.
Anyway, the point is that the compiler can only do these sorts of tricks when it can prove that there will be no effect. Objective-C method calls are dynamically bound, meaning that the compiler has absolutely no guarantee about what will actually happen at runtime when a method is called. Since the compiler can't make any guarantees about what will happen, it will never reorder Objective-C method calls. But again, this just falls back to the same principle I stated earlier: the compiler may change the order of execution, but only if it is completely imperceptible to the user.
In other words, don't worry about it. Your code will always run top-to-bottom, each statement waiting for the one before it to complete.
In general, most method calls that you see in the style you described are synchronous; that means they'll have the effect you desire, running in the order the statements were coded, where the second call will only run after the first call finishes and returns.
Also, when a method takes parameters, its parameters are evaluated before the method is called.
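As a small illustration of that last point, here is a plain-C sketch (hypothetical names mirroring the example above): the inner call runs to completion and produces its value before the outer call begins.
#include <stdio.h>
/* Runs first: its return value is needed before the outer call can start. */
static int getIntValue(void)
{
    puts("getIntValue runs first");
    return 7;
}
/* Runs second, receiving the already-computed value. */
static void getNameForValue(int value)
{
    printf("getNameForValue receives %d\n", value);
}
int main(void)
{
    getNameForValue(getIntValue());   /* same shape as [self getNameForValue:[self getIntValue]] */
    return 0;
}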
float pi = 3.14;
float (^piSquare)(void) = ^(void){ return pi * pi; };
float (^piSquare2)(void) = ^(void){ return pi * pi; };
[piSquare isEqualTo: piSquare2]; // -> want it to behave like -isEqualToString...
To expand on Laurent's answer.
A Block is a combination of implementation and data. For two blocks to be equal, they would need to have both the exact same implementation and have captured the exact same data. Comparison, thus, requires comparing both the implementation and the data.
One might think comparing the implementation would be easy. It actually isn't because of the way the compiler's optimizer works.
While comparing simple data is fairly straightforward, blocks can capture objects-- including C++ objects (which might actually work someday)-- and comparison may or may not need to take that into account. A naive implementation would simply do a byte level comparison of the captured contents. However, one might also desire to test equality of objects using the object level comparators.
Then there is the issue of __block variables. A block, itself, doesn't actually have any metadata related to __block captured variables as it doesn't need it to fulfill the requirements of said variables. Thus, comparison couldn't compare __block values without significantly changing compiler codegen.
All of this is to say that, no, it isn't currently possible to compare blocks and to outline some of the reasons why. If you feel that this would be useful, file a bug via http://bugreport.apple.com/ and provide a use case.
Putting aside issues of compiler implementation and language design, what you're asking for is provably undecidable (unless you only care about detecting 100% identical programs). Deciding if two programs compute the same function is equivalent to solving the halting problem. This is a classic consequence of Rice's Theorem: Any "interesting" property of Turing machines is undecidable, where "interesting" just means that it's true for some machines and false for others.
Just for fun, here's the proof. Assume we can create a function to decide if two blocks are equivalent, called EQ(b1, b2). Now we'll use that function to solve the halting problem. We create a new function HALT(M, I) that tells us if Turing machine M will halt on input I like so:
BOOL HALT(M, I) {
    return EQ(
        ^{ return 0; },
        ^{ M(I); return 0; }
    );
}
If M(I) halts then the blocks are equivalent, so HALT(M,I) returns YES. If M(I) doesn't halt then the blocks are not equivalent, so HALT(M,I) returns NO. Note that we don't have to execute the blocks -- our hypothetical EQ function can compute their equivalence just by looking at them.
We have now solved the halting problem, which we know is not possible. Therefore, EQ cannot exist.
I don't think this is possible. Blocks can be roughly seen as advanced functions (with access to global or local variables). The same way you cannot compare functions' content, you cannot compare blocks' content.
All you can do is to compare their low-level implementation, but I doubt that the compiler will guarantee that two blocks with the same content share their implementation.
I noted that the parameter of taskDelay is of type int, which means the number could be negative. I'm just wondering how the function is going to react when passed a negative number.
Most functions would validate the input, and just return early/return 0/set the parameter in question to a default value.
I presume there's no critical need to do this in production, and you probably have some code lying around that you could test with.... why not give it a go?
The documentation doesn't address it, and the only error codes they do define don't cover this case. The most correct answer, therefore, is that the results are undefined.
See the VxWorks / Tornado II FAQ for this gem, however:
taskDelay(-1) shows another bug in the vxWorks timer/tick code. It has the (side) effect of setting vxTicks to zero. This corrupts the localtime (and probably other things). In fact taskDelay(x) will have the same effect if vxTicks + x >= 0x100000000. If the system clock rate is 100Hz this happens after about 500 days (because vxTicks wraps). At faster clock rates it will happen sooner. Anyone trying for several years uptime?
Oh, there is an undocumented upper limit on the clock rate. At rates above 4294 select() will fail to convert its 'usec' time into the correct number of ticks. (From: David Laight, dsl@tadpole.co.uk)
Assuming this bug is old, I would hope that it would either return an error or do the same thing as taskDelay(0), which puts your task at the end of the ready queue.
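If you want to guard against this in your own code rather than rely on that behaviour, one option is a small wrapper that clamps negative arguments before calling taskDelay(). This is only a sketch: safeTaskDelay is a hypothetical helper name, and clamping to zero (rather than returning ERROR) is an assumption about the policy you want.
#include <vxWorks.h>
#include <taskLib.h>
STATUS safeTaskDelay(int ticks)
{
    if (ticks < 0)
        ticks = 0;   /* taskDelay(0) just yields to other ready tasks of the same priority */
    return taskDelay(ticks);
}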
The task's delay tick count will effectively go 10, 9, ..., 1, 0 for taskDelay(10).
For taskDelay(-10) it will effectively go -10, -11, ..., -2147483648, then wrap to 2147483647, ..., 1, 0 - in other words, an extremely long delay.