Some doubts about the graph generated by gprof and gprof2dot - gprof

I used gprof2dot to generate the graph below, which visualized my program's profiling output.
I have some doubts about the graph:
First, why isn't the root of the call tree main()? And why is the root Bat_Read(), which never even appears in my program - it is only declared in a .h file?
Second, GMatrix is a C++ class without an explicit destructor, so it is unreasonable for it to call the two functions shown in the graph. Attributing almost half of the running time to it is also illogical.
Third, what is the long function at the bottom of the graph that takes 6.94 percent of the time?
You can open the graph in a new tab and magnify it so you can see it clearly.

I just magnified the image so I could read it.
The function at the bottom is very wide only because it has an extremely long name; it is just the _M_erase method of a red-black tree. It is called half a million times from galois_w16_region_multiply. Its size draws your attention to it, but in fact it appears in only about 7% of samples.
If you take every block in the diagram that has no parents, and add up their inclusive percents, you get 100%.
All of this indicates that gprof's method of propagating time upwards through the call graph is flaky, so it thinks things are at the top, when in fact it just couldn't figure out who the caller was.
You can't tell much from this. You might consider alternatives to gprof.
ADDED: gprof inserts some entry code into every function compiled with the -pg flag. So when A calls B, the code in B tries to figure out which routine called it, by taking the return address and looking it up in a table of functions. It uses that to increment a counter saying how many times A called B. If for some reason it cannot figure out the correct caller, you get mistakes like the ones in this graph. For example, it says that the routines
~vector
~GMatrix
galois_w32_region_multby_2
galois_get_log_table
Bat_Read
are at the tops of call chains (have no callers among your functions).
What's more, it thinks that main was called by Bat_Read.
This is typical of gprof.

Related

How should I organise a pile of singly used functions?

I am writing a C++ OpenCV-based computer vision program. The basic idea of the program could be described as follows:
Read an image from a camera.
Do some magic to the image.
Display the transformed image.
The implementation of the core logic of the program (step 2) boils down to a sequence of OpenCV function calls for image processing - roughly 50 calls in total. Some temporary image objects are created to store intermediate results, but, apart from that, no additional entities are created. The functions from step 2 are used only once.
I am confused about how to organise this type of code (which feels more like a script). I used to create several classes for each logical step of the image processing. Say, here I could create 3 classes like ImagePreprocessor, ImageProcessor, and ImagePostprocessor, and split the above-mentioned 50 OpenCV calls and temporary images between them accordingly. But it doesn't feel like a reasonable OOP design. The classes would be nothing more than a way to store the function calls.
The main() function would still just create a single object of each class and call their methods in sequence:
image_preprocessor.do_magic(img);
image_processor.do_magic(img);
image_postprocessor.do_magic(img);
Which is, to my impression, essentially the same thing as calling 50 OpenCV functions one by one.
I am starting to question whether this type of code requires an OOP design at all. After all, I could simply provide a function do_magic(), or three functions preprocess(), process(), and postprocess(). But this approach doesn't feel like good practice either: it is still just a pile of function calls, merely split across different functions.
I wonder, are there common practices for organising this script-like kind of code? And what would the approach be if this code were part of a large OOP system?
Usually, in image processing, you have a pipeline of various image processing modules. The same applies to video processing, where each image is processed according to its timestamp order in the video.
Constraints to consider before designing such pipeline:
The order of execution of these modules is not always the same. Thus, the pipeline should be easily configurable.
All modules of the pipeline should be executable in parallel with each other.
Each module of the pipeline may also have a multithreaded operation. (Out of the scope of this answer, but a good idea when a single module becomes the bottleneck of the pipeline.)
Each module should easily adhere to the design and have the flexibility of internal implementation changes without affecting other modules.
The benefit of preprocessing of a frame by one module should be available to later modules.
Proposed Design.
Video Pipeline
A video pipeline is a collection of modules. For now, assume a module is a class whose process method is called with some data. How each module is executed depends on how the modules are stored in the VideoPipeline. To explain further, consider the two categories below:
Here, let's say we have modules A, B, and C which always execute in the same order. We will discuss the solution with a video of frames 1, 2, and 3.
a. Linked List: In a single-threaded application, frame 1 is processed first by A, then B, then C. The process is repeated for the next frame, and so on. So a linked list seems like an excellent choice for a single-threaded application.
For a multi-threaded application, speed is what matters. So, of course, you would want all your modules running in parallel on, say, a 128-core machine. This is where the Pipeline class comes into play. If each Pipeline element runs in a separate thread, the whole application, which may have 10 or 20 modules, starts running multithreaded. Note that the single-threaded/multithreaded approach can be made configurable.
b. Directed Acyclic Graph: The linked-list implementation above can be improved further when you have high processing power and want to reduce the latency between input and the pipeline's response. Such a case is when module C depends not on B but on A. In that case, a frame can be processed in parallel by modules B and C using a DAG-based implementation. However, I wouldn't recommend this, as the benefits are small compared to the added complexity: the outputs of modules B and C must then be managed by, say, a module D that depends on B or C or both, and the number of scenarios grows.
Thus, for simplicity's sake, let's use the LinkedList-based design.
Pipeline
Create a linked list of PipelineElement.
Make process method of pipeline call process method of the first element.
PipelineElement
First, the PipelineElement processes the information by calling its ImageProcessor (read below). The PipelineElement passes a Packet (of all data; read below) to the ImageProcessor and receives the updated packet.
If the next element is not null, call the next PipelineElement's process method and pass the updated packet along.
If the next element of a PipelineElement is null, stop. This element is special, as it holds an Observer object. All other PipelineElements have their Observer field set to null.
FrameReader (VideoReader/ImageReader)
For the video/image reader, create an abstract class. Whether you process a video, an image, or several of them, processing is done one frame at a time, so create an abstract class (interface) ImageProcessor.
A FrameReader object stores reference to the pipeline.
For each frame, it pushes the information in by calling process method of Pipeline.
ImageProcessor
There is no Pre- or Post-ImageProcessor. For example, retinex processing is usually used as post-processing, but some applications can use it as pre-processing. The retinex processing class will simply implement ImageProcessor. Each element holds its ImageProcessor and the next PipelineElement object.
Observer
A special class which extends PipelineElement and provides a meaningful output using GUI or disk.
Multithreading
1. Make each element's process method run in its own thread.
2. Each thread will poll messages from a BlockingQueue (of small size, say 2-3 frames) that acts as a buffer between two PipelineElements. Note: the queue helps average out the speed of each module, so small jitters (a module taking too long on one frame) do not affect the video output rate, giving smooth playback.
Packet
A packet stores all the information, such as the input frame or a Configuration class object. This way you can store intermediate calculations as well as observe the effect of changing an algorithm's configuration in real time using a Configuration Manager.
To conclude, each element can now process in parallel: the first element processes the nth frame while the second element processes the (n-1)th frame, and so on. With this, though, other issues will pop up, such as pipeline bottlenecks and extra delays when less core power is available to each element.
This structure lends itself to the pipes and filters architecture (see Pattern-Oriented Software Architecture Volume 1: A System of Patterns by Frank Buschmann):
The Pipes and Filters architectural pattern provides a structure for
systems that process a stream of data. Each processing step is
encapsulated in a filter component. Data is passed through pipes
between adjacent filters. Recombining filters allows you to build
families of related systems.
See also this short description (with images) from the Enterprise Integration Patterns book.

Simple game design: I'm scared of tons of loops

I started making a simple 2D game that runs on a LAN, using C++ and the SFML library. The game uses a typical update function every frame, with a loop to change the state of the objects. The game class stores a vector/list of players and monsters, and two maps (one for the tileset - just graphics - and a second holding terrain mechanics - wall, ground, etc.).
In the loop, I call a Think() function (which does move/jump/attack, etc.) on every monster (different monsters behave differently, but all inherit from the abstract class Monster with their appropriate overrides).
The problem is:
For every monster I need to loop through every other object to check collision
For every monster I need to find near objects (by its coords) so the monster can behave according to what it is seeing
For every non-living object (like a flying fireball or any other projectile), I need to update its coords according to the elapsed time (this is easy) but again check collisions
For every player, I need to loop through all other players/non-living objects/monsters to collect information about nearby objects, to send the appropriate game state to them.
I'm scared of how many loops/nested loops this game would have.
I've seen that some games implement the world as small instanced maps, so the loops always run over a small amount of data, and since every map is separate, it's easy to find anything and send updates to players.
I could apply this approach to every floor with ease, but floor 0 would still be really huge (an array of around 5000x5000 walkable tiles).
I'm now thinking of changing the world map array to a class that stores references to each object by its coordinates. I came up with the idea that sorting objects by their coords would improve the loops' performance or even replace them.
Is this a correct design? Or does exist a better idea?
You should not worry too much about having many loops. You can always optimize once you run into problems.
However, for collision you should avoid checking each object against all the others, as this requires n^2 checks. Still, this only matters if you really do run into performance problems. If that happens, the default approach is to use a grid, updated once per frame (or less often), that records each object's position. This means each cell knows about all the objects in it.
Then, if you want to find collisions for a single object, you just check it, with the objects in the same cell and in adjacent cells.
If you have a big amount of objects, you might consider a dynamically adjusting grid, which can be achieved via a quadtree for example. But in most cases a simple statically defined grid should be sufficient.

LabVIEW: Programmatically setting FPGA I/O variables (templates?)

Question
Is there a way to programmatically set what FPGA variables I am reading from or writing to so that I can generalize my main simulation loop for every object that I want to run? The simulation loops for each object are identical except for which FPGA variables they read and write. Details follow.
Background
I have code that uses LabVIEW OOP to define a bunch of things that I want to simulate. Each thing then has an update method that runs inside a Timed Loop on an RT controller, takes a cluster of inputs, and returns a cluster of outputs. Some of these inputs come from an FPGA, and some of the outputs are passed back to the FPGA for some processing before being sent out to hardware.
My problem is that I have a separate simulation VI for every thing in my code, since different values are read from and returned to the FPGA for each thing. This is a pain for maintainability and seems to cry out for a better method. The problem is illustrated below. The important parts are the FPGA input and output nodes (change for every thing), and the input and output clusters for the update method (always the same).
Is there some way to define a generic main simulation VI and then programmatically (maybe with properties stored in my things) tell it which specific inputs and outputs to use from the FPGA?
If so then I think the obvious next step would be to make the main simulation loop a public method for my objects and just call that method for each object that I need to simulate.
Thanks!
The short answer is no. Unfortunately, once you get down to the hardware level with LabVIEW FPGA, things become very static and rely on hard-coded IO access. This is typically handled exactly the way you have presented your current approach. However, you may be able to encapsulate the IO access with a bit of trickery.
Consider this: define the IO nodes on your diagram as interfaces and abstract them away behind a function (or VI or method, whichever term you prefer). You can implement this with either a dynamic VI call or an object-oriented approach.
The data types defined by your interface are well known, because you are pushing and pulling them from clusters that do not change.
By abstracting away the hardware IO with a method call you can then maintain a library of function calls that represent unique hardware access for every "thing" in your system. This will encapsulate changes to the hardware IO access within a piece of code dedicated to that job.
Using dynamic VI calls is ugly but you can use the properties of your "things" to dictate the path to the exact function you need to call for that thing's IO.
An object oriented approach might have you create a small class hierarchy with a root object that represents generic IO access (probably doing nothing) with children overriding a core method call for reading or writing. This call would take your FPGA reference in and spit out the variables every hardware call will return (or vice versa for a read). Under the hood it is taking care of deciding exactly which IO on the FPGA to access. Example below:
Keep in mind that this is nowhere near functional, I just wanted you to see what the diagram might look like. The approach will help you further generalize your main loop and allow you to embed it within a public call as you had suggested.
This looks like an object-mapping problem, which LabVIEW doesn't have great support for, but it can be done.
My code maps one cluster to another assuming the control types are the same using a 2 column array as a "lookup."

How can I check chess rules without a circular dependency?

I'm writing a chess program in C++. I ran into an issue that I probably should have foreseen. The way the program finds all possible moves is by trying to move each piece to every square on the board. The function that does that is called calculateAllPossibleMoves. Each move is tested by cloning the game, and calling move on the piece being tested. The move function will throw an exception when a move is invalid. If no exception is thrown, then that move is valid, and it's added to the list of possible moves.
A move is not valid if it results in your own king being in check. So I have a function (let's call it inCheck) that uses the find-all-possible-moves function to see whether one of the opponent's pieces checks the king.
The problem is, the aforementioned move function relies on the inCheck function to find out whether the move results in check. inCheck uses calculateAllPossibleMoves to find all the moves that could potentially lead to the king. calculateAllPossibleMoves finds all the possible moves by simulating the move using the normal move function. This code runs forever because it's mutually recursive.
To try to fix it, I introduced an edge case by passing in an integer. When I call move, it decrements the integer and passes it along, so each nested call to move receives a lower number. That way, infinite recursion is impossible.
Still, the results vary, and when I increase the number, the program takes a very long time to run. Is there a cleaner way to solve this problem?
Why don't you have an abstract class ChessPiece, inherited by all chess pieces, with 32 such objects in an array inside another class Game, supported by a class BoardPosition, or something similar?
Then, to list all moves, just go through the active (not captured) pieces, and for each one cycle through its moves (most are a direction plus a number of squares). A move is valid if (1) there is no piece in the way and no same-color piece at the target position, AND (2) no opponent's piece not taken by the tested move has, among its possible moves on the updated board, the chance to capture your king (because that's what check is -- mate just means that you cannot avoid the opponent taking your king).
What you really need is a calculateIntersectingMoves function which simply detects if any pieces have a valid move which intersects the square in question, and call it for check detection.
If calculateAllPossibleMoves is accurately named, it's horribly inefficient to use simply to see if there are any pieces which can reach a specific square; worse, it leads to the exact circularity you mentioned.

Getting the world's contactListener in Box2D

I'm writing a game for Mac OS using cocos2D and Box2D. I've added a b2ContactListener subclass to my world as follows:
contactListener = new ContactListener();
world->SetContactListener(contactListener);
This works perfectly, but I am unsure of the best/accepted way to access the contact listener from other classes that don't currently have a direct reference to the contact listener.
I know I can pass a reference to other classes that need it, but what I was wondering is if there is a better way. More specifically, although I can't find a method to do this, is there some equivalent of this:
world->GetContactListener();
in Box2D?
The reason I am trying to do this is simply because I would prefer to move some game logic (i.e. whether a body is able to jump based on information from the contact listener) to the relevant classes themselves, rather than putting everything in the main gameplay class.
Thanks!
A contact listener just serves as an entry point for the four functions BeginContact, EndContact, PreSolve and PostSolve. Typically it has no member variables, so there is no reason to get it, because there is nothing to get from it.
When one of these functions is called during a world Step, you can make a note of which two things touched/stopped touching etc, but you should not change anything in the world right away, until the time step is complete.
I think the crux of this question is the method used to 'make a note' of which things touched, but that's really up to you and depends on what kind of information you need. For example if you're only interested in BeginContact, then the absolute simplest way might be to just store which two fixtures touched as a list of pairs:
std::vector< std::pair<b2Fixture*, b2Fixture*> > thingsThatTouched;
//in BeginContact
thingsThatTouched.push_back( std::make_pair(contact->GetFixtureA(), contact->GetFixtureB()) );

//after the time step
for (size_t i = 0; i < thingsThatTouched.size(); i++) {
    b2Fixture* fixtureA = thingsThatTouched[i].first;
    b2Fixture* fixtureB = thingsThatTouched[i].second;
    // ... do something clever ...
}
thingsThatTouched.clear(); //important!!
For this to work, you'll need to make the thingsThatTouched list visible to the contact listener functions; it could be a global variable, or you could give the contact listener class a pointer to it, or have a global function that returns a pointer to the list.
If you need to keep track of more information such as what things stopped touching, or do something after the time step based on how hard things impacted when they touched etc, it will take a bit more work and becomes more specific. You might find these tutorials useful:
This one uses BeginContact/EndContact to update a list of which other things a body is touching, and uses it to decide if a player can jump at any given time:
http://www.iforce2d.net/b2dtut/jumpability
This one uses a similar method to look at what type of surfaces are currently under a car tire, to decide how much friction the surface has:
http://www.iforce2d.net/b2dtut/top-down-car
This one uses PreSolve to decide whether two bodies (arrow and target) should stick together when they collide, based on the speed of the impact. The actual 'sticking together' processing is done after the time step finishes:
http://www.iforce2d.net/b2dtut/sticky-projectiles
I think you can simply call GetContactList and then iterate over the contacts if you need to process them somewhere else.