Understanding the wording from gprof

I wanted to measure my program's time to completion, and came across the GNU profiler gprof. It seems to be very powerful, but I don't understand some of its output.
Could someone please clarify what "sample hit covers x byte(s)" and "X% of X seconds" mean in the line below? I understand the latter works out to 0.00926 seconds, but I don't understand its context:
granularity: each sample hit covers 2 byte(s) for 0.02% of 46.30 seconds
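For reference, this is the back-of-the-envelope arithmetic behind the 0.00926 seconds I mentioned (just my own check, in Python):

    # My own check of the "0.02% of 46.30 seconds" figure
    total_runtime = 46.30         # seconds, as reported by gprof
    fraction_per_sample = 0.0002  # 0.02%
    sample_period = fraction_per_sample * total_runtime
    print(sample_period)          # 0.00926 seconds per sample, roughly 100 samples per second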

Related

QPSK works in simulation but not with SDR

I'm going to start off by saying that I'm very new to SDR and GNU Radio. This may be a dumb question, but I have been googling and testing things for about two months now trying to get this to work without success. Any help or pointers would be appreciated!
I'm attempting to use GNU Radio 3.8 to transfer a file using differential QPSK. I've tried to follow the tutorials on the wiki as well as several similar academic papers I found on the internet (which also seem to be based on the wiki tutorial). None of them worked on their own, but by combining what actually works from each one, I managed to create a flowgraph, sans hardware, that does indeed send and receive the data from a file. Here's the flowgraph and here is a screenshot of the results. The results show the four constellation points, and the data from the file source matches up perfectly with the data having gone through the entire transmit+receive chain. In the simulation I have a throttle block and a channel model block where the LimeSDR Source and LimeSDR Sink blocks would be. So far so good (at least as far as I can tell).
When I actually start transmitting this signal with the SDR, the received data no longer matches up with what is transmitted. Here's the flowgraph I've been using for the transmission. I added a protocol formatter and some FEC blocks that I could have removed for this illustration, but the point is that simply looking at what bits are going into the modulator vs. what's being recovered, the two do not match up. The constellation looks good (as far as I can tell) but the bits are all wrong. Here's a screenshot showing the bits being transmitted. You'll notice in the screenshot of the transmitted signal that the signal has a repeating series of three flat-top "1"s surrounded on both sides by a period of "0"s (at times 1.5ms and 3.5ms). This is a screenshot of the received bits. At times 1ms and 3ms you can see that it has significantly more transitions between 1 and 0 than it should.
So at this point I'm stumped. The simulation worked but the real-world test does not. I've messed around with the RRC filter properties a significant amount. I have no clue if the values I have chosen are correct, as I have not found a tutorial or explanation on how to choose them. I just looked at some of the example flowgraphs, made some guesses as to how those values were derived, and applied those guesses to my use case. It worked well in the simulation, so I thought it would be fine in the real-world test. I've tried a variety of samples-per-symbol values, but my goal is a 4800 bit per second transfer speed, and using different samples per symbol didn't help anyway. What should I change in order to get this to work?
Bonus question: The constellation object has QPSK and DQPSK, and the constellation modulator has a differential checkbox. What is the best practice combination of selections to get a differential QPSK modulation?
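For reference, this is the rate arithmetic I have been working from for the 4800 bit/s goal (just my own sketch; the samples-per-symbol value is one of the ones I tried, not something I know to be correct):

    # Rate arithmetic for my 4800 bit/s goal, ignoring any FEC/framing overhead
    bit_rate = 4800                            # target bits per second
    bits_per_symbol = 2                        # QPSK/DQPSK carries 2 bits per symbol
    symbol_rate = bit_rate / bits_per_symbol   # 2400 symbols per second
    sps = 4                                    # samples per symbol (one of the values I tried)
    sample_rate = symbol_rate * sps            # 9600 complex samples per second
    print(symbol_rate, sample_rate)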

Why do solutions like Pusher claim to be "real time"?

I have been using Pusher for some time now. I always assumed "real time" meant "instantaneous". Lately I stumbled upon this article: https://en.wikipedia.org/wiki/Real-time_computing, and a sentence grabbed my attention:
"Real-time programs must guarantee response within specified time
constraints"
They give an example based on audio processing:
"Consider an audio DSP example; if a process requires 2.01 seconds to
analyze, synthesize, or process 2.00 seconds of sound, it is not
real-time. However, if it takes 1.99 seconds, it is or can be made
into a real-time DSP process."
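In code terms, I read that example as a simple deadline check, something like this (my own sketch, not from the article):

    def is_real_time(processing_seconds, audio_seconds=2.00):
        # "Real time" in this sense: the work finishes within the span of data it covers
        return processing_seconds <= audio_seconds

    print(is_real_time(2.01))  # False - falls behind, not real-time
    print(is_real_time(1.99))  # True  - keeps up, can be made real-time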
My questions:
1. Does this definition only apply to hardware/electronic devices, or can it be applied to software too?
2. If it applies to software, does it apply to remote services like Pusher?
3. What is the time constraint for Pusher to be considered "real time"?
4. What is the time constraint for other services like WebRTC or Firebase?
Sorry for the lengthy post that doesn't specifically answer your question, but I hope it will help you better understand where the "real time" definition comes from.
Yes, it is an understandable confusion that "real time" means "instantaneous". But if you really start to think about it you will soon find out that "instantaneous" is difficult to define.
What does instantaneous mean? A response time of 0 (zero) seconds (as in 0 sec 0 ms 0 ns 0 ps) from the time of the command to the time of the response is physically impossible. We can then try to say that instantaneous means the command-response time is perceived as instantaneous, i.e. it would not be seen as a delay. But then... what exactly does "perceived as instantaneous" mean? Perceived by humans? OK, that is good, we are getting somewhere. The human eye and the brain's image processing form a very complex machine, and it does not really work simply in fps, but we can use data to approximate something. A human eye can "perceive an image flashed on the screen for 1/250th of a second". That would be 0.004 seconds, or 250 fps. So by this approximation a graphical program would be real time if it had a response time < 0.004 sec, or ran faster than 250 fps. But we know that in practice games are perceived as smooth by most people at just 60 fps, or 0.01666 seconds. So now we have two different answers. Can we somehow justify them both? Yes. We can say that in theory real time would mean 0.004 seconds, but in practice 0.01666 seconds is enough.
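To make those numbers concrete, the frame-time arithmetic is just the reciprocal of the frame rate (a trivial sketch of my own):

    def frame_time(fps):
        # Seconds available per frame at a given frame rate
        return 1.0 / fps

    print(frame_time(250))  # 0.004 s   - the "image flashed for 1/250th of a second" figure
    print(frame_time(60))   # ~0.01666 s - what most people already perceive as smooth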
We could be happy and stop here, but we are on a journey of discovery. So let's think further. Would you want a "real time" avionics automation system to have a 0.01666-second response time? Would you deem acceptable a 0.01666-second response time for a "real time" nuclear plant system? Would an oil control system, where a valve physically takes 15 seconds to close, be defined as "real time" if the command-completion time is 0.01666 seconds? The answer to all these questions is most definitely no. Why? Answer that and you answer why "real time" is defined as it is: "Real-time programs must guarantee response within specified time constraints".
I am sorry, I am not familiar at all with Pusher, but I can answer your first question and part of your second one: "real time" can be applied to any system that needs to "react" or respond to some form of input. Here "system" is more generic than you might think. A brain would qualify, but in the context of engineering it means the whole stack: hardware + software.
Does this definition only apply to hardware/electronic devices, or can it be applied to software too?
It applies to software too. Anything that has hard time constraints. There are real-time operating systems, for example, and even a real-time specification for Java.
If it applies to software, does it apply to remote services like Pusher?
Hard to see how, if a network is involved. More probably they just mean 'timely', or maybe it's just a sloppy way of saying 'push model', as the name implies. Large numbers of users on this site seem to think that 'real-time' means 'real-world'. In IT it means a system that is capable of meeting hard real-time constraints. The Wikipedia definition you cited is correct but the example isn't very satisfactory.
What is the time constraint for Pusher to be considered "real time"?
The question is malformed. The real question is whether Pusher can actually meet hard real-time constraints at all, and only then what their minimum value might be. It doesn't seem at all likely without operating system and network support.
What is the time constraint for other services like WebRTC or Firebase?
Ditto.
Most interpretations of the term "real-time" refer to the traditional static type, often referred to as "hard real-time." Although there is not much of a consensus on the meanings of the terms "hard real-time" and "soft real-time," I provide definitions, based on scientific first principles, of these and other essential terms in Introduction to Fundamental Principles of Dynamic Real-Time Systems.

NEON simulator output (pipeline information, stalls, execution cycles) not clear

I have some problems understanding the output of the NEON simulator. The output it generates is cryptic and there is no proper documentation for understanding it.
For example, in the figure above, the first column's information is not clearly explained.
What does lc mean? Sometimes the syntax given below doesn't match the data format in the table.
The code and data are at http://pulsar.webshaker.net/ccc/sample-55d49530 . I found some help at Some doubts in optimizing the neon code, but it is not completely clear.
It is not 'LC', it is '1C', i.e. one-C, meaning that the instruction takes one cycle.

Visual Studio 2010 Code Profiling During Debug

I am working on a Visio add-in in VS2010 Professional and am looking for hot spots (specifically around a COM object) while debugging the application. I have found a number of profilers that can profile existing .NET applications, but none of them (that I have seen) support debugging. Furthermore, because this is a .NET add-in rather than a full standalone executable, I'm not sure how they'd fare.
Profilers I've looked into:
EQATEC
Slimtune
CLR Profiler
nprof
VS2010 Performance Profiler -- Note that this one requires Ultimate or Premium while I am using Professional.
Has anyone found a profiler that can be used during a VS2010 debug session?
I've made this point before on SO, and so have others.
If your object is to improve performance, as measured by wall-clock time, by far the best tool is just the debugger itself, and its "Pause" button.
Let me show you why.
First, let's look at a good profiler
Among profilers, ANTS is probably as good as they come.
When I run it on an app, the top of the screen looks like this:
Notice that you have to choose a time span to look at, and you have to choose if you want to look at CPU time or File I/O time.
Within that time span, you see something like this:
which is trying to show what ANTS thinks is the "hot path", considering only CPU time.
Of course it emphasizes inclusive "Time With Children (%)", and that's good.
In a big code base like this, notice how extremely small the self-time "Time (%)" is?
That's typical, and you can see why.
What this says is that you should certainly ignore functions that have low inclusive percent, because even if you could reduce them to no-ops, your overall time in that interval would go down by no more than their inclusive percent.
So you look at the functions with high inclusive percent, and you try to find something in them to make them take less time, generally by either a) having them make fewer calls to sub-functions, or b) having the function itself be called less.
If you find something and fix it, you get a certain percent speedup. Then you can try it all again.
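To put a rough number on that (my own back-of-the-envelope arithmetic, not anything the profiler computes for you): a routine's inclusive fraction is a hard ceiling on the speedup you can get from it.

    def max_speedup(inclusive_fraction):
        # Even reducing the routine to a no-op leaves (1 - fraction) of the interval,
        # so the overall speedup can never exceed 1 / (1 - fraction).
        return 1.0 / (1.0 - inclusive_fraction)

    print(max_speedup(0.03))  # ~1.03x at best for a routine at 3% inclusive
    print(max_speedup(0.50))  # 2x at best for one at 50% inclusive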
When you cannot find anything to fix, you declare victory and put away your profiler for another day.
Notice that there might have been additional problems that you could have fixed for more speedup, but if the profiler didn't help you find them, you've assumed they are not there.
These can be really big sleepers.
Now let's take some manual samples
I just randomly paused the app six times during the phase that was bugging me because it was making me wait.
Each time I took a snapshot of the call stack, and I took a good long look at what the program was doing and why it was doing it.
Three of the samples looked like this:
External Code
Core.Types.ResourceString.getStringFromResourceFile Line 506
Core.Types.ResourceString.getText Line 423
Core.Types.ResourceString.ToString Line 299
External Code
Core.Types.ResourceString.getStringFromResourceFile Line 528
Core.Types.ResourceString.getText Line 423
Core.Types.ResourceString.ToString Line 299
Core.Types.ResourceString.implicit operator string Line 404
SplashForm.pluginStarting Line 149
Services.Plugins.PluginService.includePlugin Line 737
Services.Plugins.PluginService.loadPluginList Line 1015
Services.Plugins.PluginService.loadPluginManifests Line 1074
Services.Plugins.PluginService.DoStart Line 95
Core.Services.ServiceBase.Start Line 36
Core.Services.ServiceManager.startService Line 1452
Core.Services.ServiceManager.startService Line 1438
Core.Services.ServiceManager.loadServices Line 1328
Core.Services.ServiceManager.Initialize Line 346
Core.Services.ServiceManager.Start Line 298
AppStart.Start Line 95
AppStart.Main Line 42
Here is what it is doing.
It is reading a resource file (that's I/O, so looking at CPU time would not see it).
The reason it is reading it is to get the name of a plugin.
The reason the name of the plugin is in a resource file is that there might be a future requirement to internationalize that string.
Anyway, the reason it is being fetched is so the name can be displayed on a splash screen during the loading of the plugin.
Presumably the reason for this is, if the user is wondering what is taking so long, the splash screen will show them what's happening.
Those six samples proved that if the name was not displayed, or if it was displayed but was gotten in some more efficient way, then startup speed of the app would approximately double.
I hope you can see that no profiler that works by showing measurements could have yielded this insight this quickly.
Even if the profiler showed inclusive percent by wall-clock time, not CPU, it still would have left the user trying to puzzle out just what was going on, because in summarizing the times of the routines, it loses almost all of the explanatory context that tells you whether what the program is doing is necessary.
The human tendency when looking only at summary statistics, and looking at the code, is to say "I can see what it's doing, but I don't see any way to improve it."
So what about "statistical significance"?
I hear this all the time, and it comes from naiveté about statistics.
If three out of six samples show a problem, that means the most likely actual percent used by the problem is 3/6=50%.
It also means if you did this many times, on average the cost would be (3+1)/(6+2) which is also 50%.
If you save 50% of time, that gives a 2x speedup.
There is a probability that the cost could be as small as 20%, in which case the speedup would be only 1.25x.
There is an equal probability that the cost could be as large as 80%, in which case the speedup would be 5x (!).
So yes, it is a gamble.
The speedup could be less than estimated, but it will not be zero, and it is equally likely to be dramatically large.
If more precision is required, more samples can be taken, but if one sacrifices the insight that comes from examining samples to get statistical precision, the speedups may well not be found.
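For anyone who wants the arithmetic behind those numbers spelled out, here is a small sketch (my own shorthand for the estimates above, not a rigorous treatment):

    def estimated_fraction(hits, samples):
        # Rule-of-succession estimate of the fraction of time spent in the problem
        return (hits + 1) / (samples + 2)

    print(estimated_fraction(3, 6))  # 0.5 - the 50% estimate from 3 hits in 6 samples

    # Speedup if a fraction f of the time is removed is 1 / (1 - f):
    print(1 / (1 - 0.2))  # 1.25x if the true cost is only 20%
    print(1 / (1 - 0.5))  # 2x    if it is 50%
    print(1 / (1 - 0.8))  # 5x    if it is as large as 80%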
P.S. This link shows the key importance of finding all the problems - not missing any.
After digging in the Extension Manager in VS2010, I found dotTrace. This gives the ability to attach to a running process (Visio in my case) during debugging.
This tool's 10 day trial helped out, but at $400 it still feels a little steep. I still hope to find a cheaper way to accomplish the same task.

Acceptable load time

I am working on a point-of-sale (POS vending machine) project which has many images on the screen, and the customer is expected to browse almost all of them. Here are my questions:
Can you please suggest test cases for testing the load time for images?
What is the acceptable load time for these images on screen?
Are there any standards for testing this kind of acceptable load time?
"What is an acceptable loading time?" is a very broad question, one that has been studied as a research question for human computer interaction issues. In general the answer depends on:
How predictable the loading time is? (does it vary according to time of day, e.g. from 9am to 2am. unpredictable is usually the single most annoying thing about waiting)
How good is the feedback to the user? (does it look like it's broken or have a nice progress bar during the waiting? knowing it's nearly there can help ease the pain, even if the loading times are always consistent)
Who are the users and what other systems have they used previously? If it was all writing in a book before then waiting 2 minutes for images is going to be positively slow. If you're replacing something that took 3 minutes then it's pretty fast.
Ancillary input issues, e.g. does it buffer presses whilst loading and also move items around on the display so people press before it's finished and accidentally press the wrong thing? Does it annoyingly eat input soon after you've started to input it so you have to type/scan it again?
In terms of testing I'm assuming you're not planning on observing users and asking "how hard would you like to hurt this proxy for frustration?" What you can realistically test is how it copes under realistic loads and how accurate the predictions are.
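On the measurement side, a test can be as simple as timing each image load against a budget you pick; here is a minimal sketch (load_image, the file names, and the 2-second budget are placeholders for whatever your POS software and requirements actually use):

    import time

    IMAGE_PATHS = ["item_001.png", "item_002.png"]  # placeholder image files
    BUDGET_SECONDS = 2.0                            # placeholder acceptance threshold

    def load_image(path):
        # Stand-in for however the POS screen actually loads and renders an image
        with open(path, "rb") as f:
            return f.read()

    def test_image_load_times():
        for path in IMAGE_PATHS:
            start = time.perf_counter()
            load_image(path)
            elapsed = time.perf_counter() - start
            assert elapsed <= BUDGET_SECONDS, f"{path} took {elapsed:.3f}s (budget {BUDGET_SECONDS}s)"

    test_image_load_times()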