Inherent time in jprofiler - jprofiler

Consider the below method template:
methodA()
{
Print (abc); // Instruction 1
Calculate(a+b+c); // Instruction 2
Call methodB();// Instruction 3
Call methodC();// Instruction 4
Print(abcd); // Instruction 5
for(; ;) // Instruction 6
{
. ..
}
}
Inherent time for methodA() in JProfiler shows the total time taken by methodA() alone. Is this inherent time the sum of CPU time + I/O wait time or is it just CPU time?

The time type depends on the thread state selector in the top-right corner of the call tree view. If it is set to "Runnable", the displayed times measure the time when the CPU was in the runnable state. If it set to "All states", it includes I/O, waiting and blocking.

As per this page http://resources.ej-technologies.com/jprofiler/help/doc/index.html
The inherent time is defined as the total time of a method minus the
time of its child nodes.

Related

How intensive is getting time?

Here's a low level question. How CPU intensive is getting system time?
What is the source of the time? I know there is a hardware clock on the bios chip but I'm thinking that getting data from outside the CPU and RAM will need some hardware synchronization which may delay the read so I'm guessing the CPU may have its own clock. Feel free to correct me if I'm wrong in any way.
Does getting time incur a heavy system function call or is it in any way dependent on the used programming language?
I have just tested it using a C++ program:
clock_t started = clock();
clock_t endClock = started + CLOCKS_PER_SEC;
long itera = 0;
for (; clock() < endClock; itera++)
{
}
I get about 23 million iterations per second (Windows 7, 32bit, Visual Studio 2015, 2.6 GHz CPU). In terms of your question, I would not call this intensive.
In debug mode, I measured 18 million iterations per second.
In case the time is transformed into a localized timestamp, complicated calendar calculations (timezone, daylight saving time, ...) might significantly slow down the loop.
It is not easy to tell what happens inside the clock() call. For my system, it calls QueryPerfomanceCounter, but this recurs to other system functions as explained here.
Tuning
To reduce the time measurement overhead even further, you can measure in every 10th, 100th ... iteration.
The following measures once in 1024 iterations:
for (; (itera & 0x03FF) || (clock() < endClock); itera++)
{
}
This brings up the loop per second count to some 500 million.
Tuning with Timer Thread
The following yields a further improvement of some 10% paid with additional complexity:
std::atomic<bool> processing = true;
// launch a timer thread to clear the processing flag after 1s
std::thread t([&processing]() {
std::this_thread::sleep_for(std::chrono::seconds(1));
processing = false;
});
for (; (itera & 0x03FF) || processing; itera++)
{
}
t.join();
An extra thread is started which sleeps for one second and then sets a control variable. The main thread executes the loop until the timer threads signals the end of processing.

Sleep blocks whole program (Smalltalk Squeak)

I'm making a N*N queens problem with gui.
I want the gui to stop for x seconds each move of every queen, problem is, the program just stacks all the waits together and then runs everything at speed.
I'm giving the code here: http://pastebin.com/s2VT0E49
EDIT:
This is my workspace:
board := MyBoard new initializeWithStart: 8.
Transcript show:'something'.
3 seconds asDelay wait.
board solve.
3 seconds asDelay wait.
board closeBoard.
This is where i want the wait to happen
canAttack: testRow x: testColumn
| columnDifference squareMark |
columnDifference := testColumn - column.
((row = testRow
or: [row + columnDifference = testRow])
or: [row - columnDifference = testRow]) ifTrue: [
squareDraw := squareDraw
color: Color red.
0.2 seconds asDelay wait.
^ true ].
squareDraw := squareDraw color: Color black.
^ neighbor canAttack: testRow x: testColumn
Since you're using Morphic you should use stepping for animation, not processes or delays. In your Morph implement a step method. This will be executed automatically and repeatedly. Also implement stepTime to answer the interval in milliseconds, e.g. 4000 for every 4 seconds.
Inside the step method, calculate your new state. If each queen is modeled as a separate Morph and you just move the positions, then Morphic will take care of updating the screen. If you have your own drawOn: method then call self changed in your step method so that Morphic will later invoke your drawing code.
See this tutorial: http://static.squeak.org/tutorials/morphic-tutorial-1.html
The process you're suspending is the one your program is running in. This process also happens to be the UI process. So when you suspend your program you also suspend the UI and therefore the UI elements never get a chance to update themselves. Try running your program in a separate process:
[ MyProgram run ] forkAt: Processor userBackgroundPriority.
Note that the UI process usually runs at priority 40. #userBackgroundPriority is 30. This makes sure that you can't lock up the UI.
To make your workspace code work insert this before the delay:
World doOneCycle.
This will cause the Morphic world to be redisplayed.
Note that this is quick-and-very-dirty hack and not the proper way to do it (see my other answer). Delays block the whole UI process, whereas the whole point of Morphic is that you can do many things simultaneously while your code is executing.

Sleep in Smalltalk Squeak

I'm dealing with N*N queens problem and gui of it.
I want to sleep for a few seconds each move so the viewer can see the process.
How do I put smalltalk to sleep?
Thank you
Instead of sleeping you can just wait.
5 seconds asDelay wait.
e.g. if you select and print it the following, it will wait 5 seconds before printing the result (2)
[
5 seconds asDelay wait.
1 + 1
] value
The comment of the Delay class explains what it does.
I am the main way that a process may pause for some amount of time. The simplest usage is like this:
(Delay forSeconds: 5) wait.
An instance of Delay responds to the message 'wait' by suspending the caller's process for a certain amount of time. The duration of the pause is specified when the Delay is created with the message forMilliseconds: or forSeconds:. A Delay can be used again when the current wait has finished. For example, a clock process might repeatedly wait on a one-second Delay.
A delay in progress when an image snapshot is saved is resumed when the snapshot is re-started. Delays work across millisecond clock roll-overs.
For a more complex example, see #testDelayOf:for:rect: .
Update: (based on comment)
wait will pause the execution flow, which means that in the example earlier, the 1 + 1 will get executed (execution flow resumed) only after the wait period has ended.
So in your class you can have...
MyBoard>>doStep
self drawBoard.
5 seconds asDelay wait.
self solve.
5 seconds asDelay wait.
self destroyBoard.

How can I (reasonably) precisely perform an action every N milliseconds?

I have a machine which uses an NTP client to sync up to internet time so it's system clock should be fairly accurate.
I've got an application which I'm developing which logs data in real time, processes it and then passes it on. What I'd like to do now is output that data every N milliseconds aligned with the system clock. So for example if I wanted to do 20ms intervals, my oututs ought to be something like this:
13:15:05:000
13:15:05:020
13:15:05:040
13:15:05:060
I've seen suggestions for using the stopwatch class, but that only measures time spans as opposed to looking for specific time stamps. The code to do this is running in it's own thread, so should be a problem if I need to do some relatively blocking calls.
Any suggestions on how to achieve this to a reasonable (close to or better than 1ms precision would be nice) would be very gratefully received.
Don't know how well it plays with C++/CLR but you probably want to look at multimedia timers,
Windows isn't really real-time but this is as close as it gets
You can get a pretty accurate time stamp out of timeGetTime() when you reduce the time period. You'll just need some work to get its return value converted to a clock time. This sample C# code shows the approach:
using System;
using System.Runtime.InteropServices;
class Program {
static void Main(string[] args) {
timeBeginPeriod(1);
uint tick0 = timeGetTime();
var startDate = DateTime.Now;
uint tick1 = tick0;
for (int ix = 0; ix < 20; ++ix) {
uint tick2 = 0;
do { // Burn 20 msec
tick2 = timeGetTime();
} while (tick2 - tick1 < 20);
var currDate = startDate.Add(new TimeSpan((tick2 - tick0) * 10000));
Console.WriteLine(currDate.ToString("HH:mm:ss:ffff"));
tick1 = tick2;
}
timeEndPeriod(1);
Console.ReadLine();
}
[DllImport("winmm.dll")]
private static extern int timeBeginPeriod(int period);
[DllImport("winmm.dll")]
private static extern int timeEndPeriod(int period);
[DllImport("winmm.dll")]
private static extern uint timeGetTime();
}
On second thought, this is just measurement. To get an action performed periodically, you'll have to use timeSetEvent(). As long as you use timeBeginPeriod(), you can get the callback period pretty close to 1 msec. One nicety is that it will automatically compensate when the previous callback was late for any reason.
Your best bet is using inline assembly and writing this chunk of code as a device driver.
That way:
You have control over instruction count
Your application will have execution priority
Ultimately you can't guarantee what you want because the operating system has to honour requests from other processes to run, meaning that something else can always be busy at exactly the moment that you want your process to be running. But you can improve matters using timeBeginPeriod to make it more likely that your process can be switched to in a timely manner, and perhaps being cunning with how you wait between iterations - eg. sleeping for most but not all of the time and then using a busy-loop for the remainder.
Try doing this in two threads. In one thread, use something like this to query a high-precision timer in a loop. When you detect a timestamp that aligns to (or is reasonably close to) a 20ms boundary, send a signal to your log output thread along with the timestamp to use. Your log output thread would simply wait for a signal, then grab the passed-in timestamp and output whatever is needed. Keeping the two in separate threads will make sure that your log output thread doesn't interfere with the timer (this is essentially emulating a hardware timer interrupt, which would be the way I would do it on an embedded platform).
CreateWaitableTimer/SetWaitableTimer and a high-priority thread should be accurate to about 1ms. I don't know why the millisecond field in your example output has four digits, the max value is 999 (since 1000 ms = 1 second).
Since as you said, this doesn't have to be perfect, there are some thing that can be done.
As far as I know, there doesn't exist a timer that syncs with a specific time. So you will have to compute your next time and schedule the timer for that specific time. If your timer only has delta support, then that is easily computed but adds more error since the you could easily be kicked off the CPU between the time you compute your delta and the time the timer is entered into the kernel.
As already pointed out, Windows is not a real time OS. So you must assume that even if you schedule a timer to got off at ":0010", your code might not even execute until well after that time (for example, ":0540"). As long as you properly handle those issues, things will be "ok".
20ms is approximately the length of a time slice on Windows. There is no way to hit 1ms kind of timings in windows reliably without some sort of RT add on like Intime. In windows proper I think your options are WaitForSingleObject, SleepEx, and a busy loop.

Precisely time a function call

I am using a microcontroller with a C51 core. I have a fairly timeconsuming and large subroutine that needs to be called every 500ms. An RTOS is not being used.
The way I am doing it right now is that I have an existing Timer interrupt of 10 ms. I set a flag after every 50 interrupts that is checked for being true in the main program loop. If the Flag is true the subroutine is called. The issue is that by the time the program loop comes round to servicing the flag, it is already more than 500ms,sometimes even >515 ms in case of certain code paths. The time taken is not accurately predictable.
Obviously, the subroutine cannot be called from inside the timer interrupt due to that large time it takes to execute.The subroutine takes 50ms to 89ms depending upon various conditions.
Is there a way to ensure that the subroutine is called in exactly 500ms each time?
I think you have some conflicting/not-thought-through requirements here. You say that you can't call this code from the timer ISR because it takes too long to run (implying that it is a lower-priority than something else which would be delayed), but then you are being hit by the fact that something else which should have been lower-priority is delaying it when you run it from the foreground path ('program loop').
If this work must happen at exactly 500ms, then run it from the timer routine, and deal with the fall-out from that. This is effectively what a pre-emptive RTOS would be doing anyway.
If you want it to run from the 'program loop', then you will have to make sure than nothing else which runs from that loop ever takes more than the maximum delay you can tolerate - often that means breaking your other long-running work into state-machines which can do a little bit of work per pass through the loop.
I don't think there's a way to guarantee it but this solution may provide an acceptable alternative.
Might I suggest not setting a flag but instead modifying a value?
Here's how it could work.
1/ Start a value at zero.
2/ Every 10ms interrupt, increase this value by 10 in the ISR (interrupt service routine).
3/ In the main loop, if the value is >= 500, subtract 500 from the value and do your 500ms activities.
You will have to be careful to watch for race conditions between the timer and main program in modifying the value.
This has the advantage that the function runs as close as possible to the 500ms boundaries regardless of latency or duration.
If, for some reason, your function starts 20ms late in one iteration, the value will already be 520 so your function will then set it to 20, meaning it will only wait 480ms before the next iteration.
That seems to me to be the best way to achieve what you want.
I haven't touched the 8051 for many years (assuming that's what C51 is targeting which seems a safe bet given your description) but it may have an instruction which will subtract 50 without an interrupt being possible. However, I seem to remember the architecture was pretty simple so you may have to disable or delay interrupts while it does the load/modify/store operation.
volatile int xtime = 0;
void isr_10ms(void) {
xtime += 10;
}
void loop(void) {
while (1) {
/* Do all your regular main stuff here. */
if (xtime >= 500) {
xtime -= 500;
/* Do your 500ms activity here */
}
}
}
You can also use two flags - a "pre-action" flag, and a "trigger" flag (using Mike F's as a starting point):
#define PREACTION_HOLD_TICKS (2)
#define TOTAL_WAIT_TICKS (10)
volatile unsigned char pre_action_flag;
volatile unsigned char trigger_flag;
static isr_ticks;
interrupt void timer0_isr (void) {
isr_ticks--;
if (!isr_ticks) {
isr_ticks=TOTAL_WAIT_TICKS;
trigger_flag=1;
} else {
if (isr_ticks==PREACTION_HOLD_TICKS)
preaction_flag=1;
}
}
// ...
int main(...) {
isr_ticks = TOTAL_WAIT_TICKS;
preaction_flag = 0;
tigger_flag = 0;
// ...
while (1) {
if (preaction_flag) {
preaction_flag=0;
while(!trigger_flag)
;
trigger_flag=0;
service_routine();
} else {
main_processing_routines();
}
}
}
A good option is to use an RTOS or write your own simple RTOS.
An RTOS to meet your needs will only need to do the following:
schedule periodic tasks
schedule round robin tasks
preform context switching
Your requirements are the following:
execute a periodic task every 500ms
in the extra time between execute round robin tasks ( doing non-time critical operations )
An RTOS like this will guarantee a 99.9% chance that your code will execute on time. I can't say 100% because whatever operations your do in your ISR's may interfere with the RTOS. This is a problem with 8-bit micro-controllers that can only execute one instruction at a time.
Writing an RTOS is tricky, but do-able. Here is an example of small ( 900 lines ) RTOS targeted at ATMEL's 8-bit AVR platform.
The following is the Report and Code created for the class CSC 460: Real Time Operating Systems ( at the University of Victoria ).
Would this do what you need?
#define FUDGE_MARGIN 2 //In 10ms increments
volatile unsigned int ticks = 0;
void timer_10ms_interrupt( void ) { ticks++; }
void mainloop( void )
{
unsigned int next_time = ticks+50;
while( 1 )
{
do_mainloopy_stuff();
if( ticks >= next_time-FUDGE_MARGIN )
{
while( ticks < next_time );
do_500ms_thingy();
next_time += 50;
}
}
}
NB: If you got behind with servicing your every-500ms task then this would queue them up, which may not be what you want.
One straightforward solution is to have a timer interrupt that fires off at 500ms...
If you have some flexibility in your hardware design, you can cascade the output of one timer to a second stage counter to get you a long time base. I forget, but I vaguely recall being able to cascade timers on the x51.
Ah, one more alternative for consideration -- the x51 architecture allow two levels of interrupt priorities. If you have some hardware flexibility, you can cause one of the external interrupt pins to be raised by the timer ISR at 500ms intervals, and then let the lower-level interrupt processing of your every-500ms code to occur.
Depending on your particular x51, you might be able to also generate a lower priority interrupt completely internal to your device.
See part 11.2 in this document I found on the web: http://www.esacademy.com/automation/docs/c51primer/c11.htm
Why do you have a time-critical routine that takes so long to run?
I agree with some of the others that there may be an architectural issue here.
If the purpose of having precise 500ms (or whatever) intervals is to have signal changes occuring at specific time intervals, you may be better off with a fast ISR that ouputs the new signals based on a previous calculation, and then set a flag that would cause the new calculation to run outside of the ISR.
Can you better describe what this long-running routine is doing, and what the need for the specific interval is for?
Addition based on the comments:
If you can insure that the time in the service routine is of a predictable duration, you might get away with missing the timer interrupt postings...
To take your example, if your timer interrupt is set for 10 ms periods, and you know your service routine will take 89ms, just go ahead and count up 41 timer interrupts, then do your 89 ms activity and miss eight timer interrupts (42nd to 49th).
Then, when your ISR exits (and clears the pending interrupt), the "first" interrupt of the next round of 500ms will occur about a ms later.
Given that you're "resource maxed" suggests that you have your other timer and interrupt sources also in use -- which means that relying on the main loop to be timed accurately isn't going to work, because those other interrupt sources could fire at the wrong moment.
If I'm interpretting your question correctly, you have:
a main loop
some high priority operation that needs to be run every 500ms, for a duration of up to 89ms
a 10ms timer that also performs a small number of operations.
There are three options as I see it.
The first is to use a second timer of a lower priority for your 500ms operations. You can still process your 10ms interrupt, and once complete continue servicing your 500ms timer interrupt.
Second option - doe you actually need to service your 10ms interrupt every 10ms? Is it doing anything other than time keeping? If not, and if your hardware will allow you to determine the number of 10ms ticks that have passed while processing your 500ms op's (ie. by not using the interrupts themselves), then can you start your 500ms op's within the 10ms interrupt and process the 10ms ticks that you missed when you're done.
Third option: To follow on from Justin Tanner's answer, it sounds like you could produce your own preemptive multitasking kernel to fill your requirements without too much trouble.
It sounds like all you need is two tasks - one for the main super loop and one for your 500ms task.
The code to swap between two contexts (ie. two copies of all of your registers, using different stack pointers) is very simple, and usually consists of a series of register pushes (to save the current context), a series of register pops (to restore your new context) and a return from interrupt instruction. Once your 500ms op's are complete, you restore the original context.
(I guess that strictly this is a hybrid of preemptive and cooperative multitasking, but that's not important right now)
edit:
There is a simple fourth option. Liberally pepper your main super loop with checks for whether the 500ms has elapsed, both before and after any lengthy operations.
Not exactly 500ms, but you may be able to reduce the latency to a tolerable level.