How to measure precisely CPU usage in Background Tasks?


The CPU usage quota for Background Tasks in WinRT is 1 second, or 2 seconds if the app is on the lock screen. The question is how to measure this CPU usage accurately: I'd like to know whether my code runs within this 2-second quota. I guess that just reading DateTime.Now before and after the execution of the task is not the right approach, since that measures elapsed wall-clock time rather than CPU time.
The MSDN article about Background Tasks:
Supporting your app with background tasks

I had the same problem.
If you start Task Manager, under the App history tab, you can see the statistics of Resource usage by various apps. One of them is CPU Time. The problem is that it's not the average, but it only displays the total CPU usage time.
If you need the average time, the trick is to keep a count in your app for any background activity, and divide the whole time by that, so you will get an average time.

I used GetProcessTimes WinAPI.
The documentation says “desktop apps only”, but technically, it is present even on the phones:
[DllImport( "KERNELBASE.DLL", SetLastError = true )]
static extern IntPtr GetCurrentProcess();
// NB! Undocumented API, won't pass marketplace checks.
[DllImport( "KERNELBASE.DLL", SetLastError = true )]
[return: MarshalAs( UnmanagedType.Bool )]
static extern bool GetProcessTimes( IntPtr hProcess, out long lpCreationTime, out long lpExitTime, out long lpKernelTime, out long lpUserTime );
On the PC replace KERNELBASE.DLL with Kernel32.dll.
That won’t pass marketplace certification, but should be enough for you to benchmark your background task.
Call GetProcessTimes when started, calculate long startTime = KernelTime + UserTime. Call GetProcessTimes when finished, calculate ( KernelTime + UserTime ) - startTime, and you get your data. The unit of measure is 100ns ticks, just like in TimeSpan.
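As a rough illustration of the arithmetic described above (not of the API call itself), the following C sketch turns two kernel/user snapshots into milliseconds and compares the result with a quota. The helper names are hypothetical; only the 100 ns tick unit and the 1 s / 2 s quotas come from the discussion above.

```c
#include <stdint.h>
#include <stdbool.h>

/* One tick as reported by GetProcessTimes is 100 ns, so 10,000 ticks = 1 ms. */
#define TICKS_PER_MS 10000LL

/* Hypothetical helper: CPU time consumed between two snapshots, in ticks.
   Each snapshot is kernel time + user time, as described above. */
int64_t cpu_ticks_used(int64_t start_kernel, int64_t start_user,
                       int64_t end_kernel, int64_t end_user)
{
    return (end_kernel + end_user) - (start_kernel + start_user);
}

/* Hypothetical helper: convert 100 ns ticks to whole milliseconds. */
int64_t ticks_to_ms(int64_t ticks)
{
    return ticks / TICKS_PER_MS;
}

/* Hypothetical helper: true if the measured CPU time fits within quota_ms. */
bool within_quota(int64_t ticks, int64_t quota_ms)
{
    return ticks <= quota_ms * TICKS_PER_MS;
}
```

With snapshots of this kind, a task that used 7,000,000 ticks consumed 700 ms of CPU time and would fit within either quota.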

Related

How intensive is getting time?

Here's a low level question. How CPU intensive is getting system time?
What is the source of the time? I know there is a hardware clock on the bios chip but I'm thinking that getting data from outside the CPU and RAM will need some hardware synchronization which may delay the read so I'm guessing the CPU may have its own clock. Feel free to correct me if I'm wrong in any way.
Does getting time incur a heavy system function call or is it in any way dependent on the used programming language?

I have just tested it using a C++ program:
clock_t started = clock();
clock_t endClock = started + CLOCKS_PER_SEC;
long itera = 0;
for (; clock() < endClock; itera++)
{
}
I get about 23 million iterations per second (Windows 7, 32bit, Visual Studio 2015, 2.6 GHz CPU). In terms of your question, I would not call this intensive.
In debug mode, I measured 18 million iterations per second.
In case the time is transformed into a localized timestamp, complicated calendar calculations (timezone, daylight saving time, ...) might significantly slow down the loop.
It is not easy to tell what happens inside the clock() call. On my system it calls QueryPerformanceCounter, but that in turn relies on other system functions, as explained here.
Tuning
To reduce the time measurement overhead even further, you can measure in every 10th, 100th ... iteration.
The following measures once in 1024 iterations:
for (; (itera & 0x03FF) || (clock() < endClock); itera++)
{
}
This brings up the loop per second count to some 500 million.
Tuning with Timer Thread
The following yields a further improvement of some 10% paid with additional complexity:
std::atomic<bool> processing{true};
// launch a timer thread to clear the processing flag after 1s
std::thread t([&processing]() {
std::this_thread::sleep_for(std::chrono::seconds(1));
processing = false;
});
for (; (itera & 0x03FF) || processing; itera++)
{
}
t.join();
An extra thread is started which sleeps for one second and then clears a control variable. The main thread executes the loop until the timer thread signals the end of processing.

Canonical way(s) of determining system time in a microcontroller

Every so often I start a bare metal microcontroller project and end up implementing a system time measurement using a random timer unit.
I have been working with ARM Cortex-M devices for a (albeit short) while now and have typically used the SysTick ("System Tick") interrupt to create a timer with 1ms resolution. I recently stumbled over a post that suggested chaining two Programmable Interrupt Timers (on a Kinetis KL25Z device) in order to create an interrupt-less 32-bit millisecond timer, at the cost of sacrificing two PIT channels which may come in handy later on.
So I was wondering if there are some (sort of) canonical ways to determine the system time on a microcontroller, preferably for Kinetis KL2xZ devices as I currently work with these, but not necessarily so.

The canonical method as you put it is exactly as you have done - using systick. That is the single timer device defined by the Cortex-M architecture; any other timer hardware is external to the core and vendor specific.
Some parts (STM32F2 for example) include 32 bit timer/counter hardware, so you would not need to chain two.
The best approach is to abstract timer services by defining a generic timer API that you implement for all parts you need so that the application layer is identical for all parts. For example in this case you might simply implement the standard library clock() function and define CLOCKS_PER_SEC.
If you are using two free-running cascaded timers, you must ensure high/low word consistency when combining the two counter values:
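#include <stdint.h>
#include <time.h>
clock_t clock( void )
{
    uint16_t lo_word = 0 ;
    uint16_t hi_word = 0 ;
    /* Re-read the high word until it is stable, so a low-word rollover
       between the two reads cannot produce an inconsistent value. */
    do
    {
        hi_word = readTimerH() ;
        lo_word = readTimerL() ;
    } while( hi_word != readTimerH() ) ;
    /* Widen before shifting so the shift is done in 32 bits. */
    return (clock_t)(((uint32_t)hi_word << 16) | lo_word) ;
}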

I just looked into the KL25 Sub-Family Reference Manual, Chapter 34 Real Time Clock (RTC), section 34.3.2 Time counter (the numbering may differ between document versions).
The RTC has two registers that make up the time counter:
a 32-bit seconds counter
a 16-bit prescaler register that increments once every 32.768 kHz clock cycle
The Reference Manual says:
Always write to the prescaler register before writing to the seconds register,
because the seconds register increments on the falling edge of bit 14 of the prescaler
register.
This means that to calculate the system time you read rtc_sec_counter and add the sub-second part held in the low 15 bits (bits 0 to 14) of prescalar_reg.
You can even create macros that give the system time in microseconds or milliseconds from the combination of rtc_sec_counter and prescalar_reg, or in seconds directly from rtc_sec_counter.
The prescaler is clocked at 32.768 kHz, so 32768 prescaler ticks make up one second:
#define PRESCALAR_TICK 32768
#define KHZ 1000
#define MHZ 1000000
/// Extract the low 15 bits of prescalar_reg (0..32767) and multiply by MHZ before dividing, for better precision.
/// The product does not fit in 32 bits, so the arithmetic is done in 64 bits (needs <stdint.h>).
#define GET_SYS_US ((uint32_t)((((uint64_t)(prescalar_reg & 0x7FFF)) * MHZ) / PRESCALAR_TICK))
#define GET_SYS_MS ((GET_SYS_US) / KHZ)
If you need the full time including the seconds counter, use the macros below (a 32-bit result wraps after about 71 minutes in microseconds and about 49 days in milliseconds):
#define GET_SYS_US_32bit ((rtc_sec_counter * MHZ) + GET_SYS_US)
#define GET_SYS_MS_32bit ((rtc_sec_counter * KHZ) + GET_SYS_MS)
To use this you must of course initialise the RTC of your microcontroller first.
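The conversion can also be written as plain functions, which makes the overflow handling easier to see. This is only a sketch: the function names are hypothetical, the register values are passed in as arguments rather than read from hardware, and reading the two registers consistently (the seconds counter may tick between the two reads) is not shown.

```c
#include <stdint.h>

#define RTC_TICKS_PER_SEC 32768u   /* 32.768 kHz prescaler clock */

/* Hypothetical helper: combine the seconds counter and the prescaler value
   into microseconds. Only the low 15 bits of the prescaler count within the
   current second; 64-bit math avoids overflow in the multiplication. */
uint64_t rtc_to_us(uint32_t seconds, uint16_t prescaler)
{
    uint64_t sub_ticks = prescaler & 0x7FFFu;
    return (uint64_t)seconds * 1000000u
         + (sub_ticks * 1000000u) / RTC_TICKS_PER_SEC;
}

/* Hypothetical helper: the same time stamp in whole milliseconds. */
uint64_t rtc_to_ms(uint32_t seconds, uint16_t prescaler)
{
    return rtc_to_us(seconds, prescaler) / 1000u;
}
```

For example, a seconds value of 1 with a prescaler value of 16384 (half of 32768) corresponds to 1,500,000 microseconds.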

Implementing pid code with 6 variables

I am trying to implement this PID code that I have; mainly I need to know what information to pass into the function. There are six variables to pass, but I don't really know what to enter.
A bit of background: I am automating my home brewery, and although it is all up and running, the temperature control of the RIMS is all over the place. For those not familiar with what a RIMS is, it is a way of ensuring that the grains being soaked are kept at a very constant temperature. It does this by using a pump to take fluid from the bottom of the soaking vessel and pass it over a heating element, which heats the fluid as it goes by if needed. The code I have running at the moment is dumb, so I need to replace it with something more intelligent, like a PID!
To drive the heater element I plan on using a simple function, called every second, that will change the amount of time the element is powered from 100ms to 1000ms depending on how much temperature correction is needed.
OK, so I have the code; it's just a question of how to use it! I want to get it up and running in a stand-alone Windows Forms project using VB.NET. I know I need to play with the PID values so they suit my application, but what should I enter to get started?
Many thanks for any help!
Public Function PID_output(ByVal process As Double, ByVal setpoint As Double, ByVal Gain As Double, _
ByVal Integ As Double, ByVal deriv As Double, ByVal deltaT As Double) As Double
Dim Er As Double
Dim Derivative As Double
Dim Proportional As Double
Static Olderror As Double
Static Cont As Double
Static Integral As Double
Static Limiter_Switch As Double
Limiter_Switch = 1
Er = setpoint - process
If ((Cont >= 1 And Er > 0) Or (Cont <= 0 And Er < 0) Or (Integ >= 9999)) Then
Limiter_Switch = 0
Else
Limiter_Switch = 1
End If
Integral = Integral + Gain / Integ * Er * deltaT * Limiter_Switch
Derivative = Gain * deriv * (Er - Olderror) / deltaT
Proportional = Gain * Er
Cont = Proportional + Integral + Derivative
Olderror = Er
If (Cont > 1) Then
Cont = 1
End If
If (Cont < 0) Then
Cont = 0
End If
Return (Cont)
End Function

Using PID for a heating process is a bit trickier than typical PID applications. The reason is that when you need to heat, you energize a heater, and the power of that heater determines how much energy you can pump into the system and therefore how fast you can heat it up. Cooling, however, normally happens only through losses to the environment.
What I'm trying to say is: the system is not symmetrical.
May I suggest using a different set of controller parameters depending on the "direction" of your control. If the process is cooler than your setpoint, use controller 1. If the process is hotter than your setpoint, use controller 2.
For controller 1, I definitely advise using the D term, because the D term acts as a kind of limiter on how fast your controller builds up heat in the process. The worry is that many heat control systems have considerable thermal inertia and lag (delay) in reading back the feedback. This often results in high overshoot (the process reaches and passes the setpoint by a considerable amount). If exaggerated, it will oscillate (fluctuate) forever :)
Also, for the cooling effect, say the environment is at 20 degrees. A setpoint of 100 is one thing; a setpoint of 1000 is totally different, because the Delta T will be different (80 and 980 degrees respectively) and the system's tendency to cool down is a function of Delta T.
Cooling is not linear but exponential (like a capacitor discharging through a constant resistor). If your setpoint is not going to change from day to day, you are fine. Otherwise, you'd better divide your setpoint range into regions and use different controller parameters for different setpoints too.
Where to start:
There are different rules of thumb for PID tuning. Look at the Ziegler-Nichols method. Basically, let your process cool down (initial conditions), then apply full heating power and record the temperature-versus-time graph. This graph is called the step response, and from it you can estimate the thermal inertia and the system lag. Those two values give you typical starting PID values when you look them up in the Ziegler-Nichols method.
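As a hedged sketch of that last step, the classic Ziegler-Nichols open-loop (reaction curve) rules for a PID controller can be written as a small C function. The struct and function names are hypothetical, and the three inputs are assumed to have been read off the recorded step response: the process gain (temperature rise per unit of heater output), the dead time, and the time constant. The resulting values map onto the Gain, Integ (integral time) and deriv (derivative time) arguments of the function in the question, and are only starting points that usually need further tuning.

```c
typedef struct {
    double gain;   /* proportional gain, corresponds to Gain */
    double integ;  /* integral time in seconds, corresponds to Integ */
    double deriv;  /* derivative time in seconds, corresponds to deriv */
} pid_params;

/* Ziegler-Nichols open-loop rules for a PID controller:
     Kp = 1.2 * T / (K * L),  Ti = 2 * L,  Td = 0.5 * L
   process_gain  (K): temperature change per unit of controller output
   dead_time     (L): seconds before the temperature starts to respond
   time_constant (T): seconds, from the slope of the step response */
pid_params zn_open_loop(double process_gain, double dead_time,
                        double time_constant)
{
    pid_params p;
    p.gain  = 1.2 * time_constant / (process_gain * dead_time);
    p.integ = 2.0 * dead_time;
    p.deriv = 0.5 * dead_time;
    return p;
}
```

For instance, with an assumed process gain of 50 degrees per unit output, 30 s of dead time and a 600 s time constant, this gives a gain of 0.48, an integral time of 60 s and a derivative time of 15 s.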

How can I (reasonably) precisely perform an action every N milliseconds?

I have a machine which uses an NTP client to sync up to internet time, so its system clock should be fairly accurate.
I've got an application which I'm developing which logs data in real time, processes it and then passes it on. What I'd like to do now is output that data every N milliseconds, aligned with the system clock. So for example, if I wanted 20ms intervals, my outputs ought to be something like this:
13:15:05:000
13:15:05:020
13:15:05:040
13:15:05:060
I've seen suggestions for using the Stopwatch class, but that only measures time spans as opposed to looking for specific time stamps. The code to do this is running in its own thread, so it shouldn't be a problem if I need to do some relatively blocking calls.
Any suggestions on how to achieve this to a reasonable (close to or better than 1ms precision would be nice) would be very gratefully received.

I don't know how well it plays with C++/CLR, but you probably want to look at multimedia timers.
Windows isn't really real-time, but this is as close as it gets.

You can get a pretty accurate time stamp out of timeGetTime() when you reduce the time period. You'll just need some work to get its return value converted to a clock time. This sample C# code shows the approach:
using System;
using System.Runtime.InteropServices;
class Program {
static void Main(string[] args) {
timeBeginPeriod(1);
uint tick0 = timeGetTime();
var startDate = DateTime.Now;
uint tick1 = tick0;
for (int ix = 0; ix < 20; ++ix) {
uint tick2 = 0;
do { // Burn 20 msec
tick2 = timeGetTime();
} while (tick2 - tick1 < 20);
var currDate = startDate.Add(new TimeSpan((tick2 - tick0) * 10000));
Console.WriteLine(currDate.ToString("HH:mm:ss:ffff"));
tick1 = tick2;
}
timeEndPeriod(1);
Console.ReadLine();
}
[DllImport("winmm.dll")]
private static extern int timeBeginPeriod(int period);
[DllImport("winmm.dll")]
private static extern int timeEndPeriod(int period);
[DllImport("winmm.dll")]
private static extern uint timeGetTime();
}
On second thought, this is just measurement. To get an action performed periodically, you'll have to use timeSetEvent(). As long as you use timeBeginPeriod(), you can get the callback period pretty close to 1 msec. One nicety is that it will automatically compensate when the previous callback was late for any reason.

Your best bet is using inline assembly and writing this chunk of code as a device driver.
That way:
You have control over instruction count
Your application will have execution priority

Ultimately you can't guarantee what you want because the operating system has to honour requests from other processes to run, meaning that something else can always be busy at exactly the moment that you want your process to be running. But you can improve matters using timeBeginPeriod to make it more likely that your process can be switched to in a timely manner, and perhaps being cunning with how you wait between iterations - eg. sleeping for most but not all of the time and then using a busy-loop for the remainder.

Try doing this in two threads. In one thread, use something like this to query a high-precision timer in a loop. When you detect a timestamp that aligns to (or is reasonably close to) a 20ms boundary, send a signal to your log output thread along with the timestamp to use. Your log output thread would simply wait for a signal, then grab the passed-in timestamp and output whatever is needed. Keeping the two in separate threads will make sure that your log output thread doesn't interfere with the timer (this is essentially emulating a hardware timer interrupt, which would be the way I would do it on an embedded platform).

CreateWaitableTimer/SetWaitableTimer and a high-priority thread should be accurate to about 1ms. I don't know why the millisecond field in your example output has four digits, the max value is 999 (since 1000 ms = 1 second).

Since, as you said, this doesn't have to be perfect, there are some things that can be done.
As far as I know, there is no timer that syncs with a specific time. So you will have to compute your next time and schedule the timer for that specific time. If your timer only supports deltas, the delta is easily computed, but this adds more error, since you could easily be kicked off the CPU between the time you compute your delta and the time the timer is entered into the kernel.
As already pointed out, Windows is not a real-time OS. So you must assume that even if you schedule a timer to go off at ":0010", your code might not even execute until well after that time (for example, ":0540"). As long as you properly handle those issues, things will be "ok".
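The "compute your next time" step can be sketched in C as follows. The function name is hypothetical, and now_ms is assumed to be the current clock reading in milliseconds (since midnight or since some epoch). Recomputing the delay from the clock on every iteration, rather than always waiting a fixed period, means that a late wake-up does not push all later outputs off the grid.

```c
#include <stdint.h>

/* Hypothetical helper: milliseconds to wait from now_ms until the next
   multiple of period_ms. Returns period_ms when now_ms is exactly on a
   boundary, so the caller always waits for a boundary in the future. */
uint32_t ms_until_next_boundary(uint64_t now_ms, uint32_t period_ms)
{
    return period_ms - (uint32_t)(now_ms % period_ms);
}
```

If the clock reads 13 ms past a 20 ms boundary, the function returns 7; the caller would wait roughly that long and then recompute after each output.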

20ms is approximately the length of a time slice on Windows. There is no way to hit 1ms kind of timings in windows reliably without some sort of RT add on like Intime. In windows proper I think your options are WaitForSingleObject, SleepEx, and a busy loop.

Precisely time a function call

I am using a microcontroller with a C51 core. I have a fairly time-consuming and large subroutine that needs to be called every 500ms. An RTOS is not being used.
The way I am doing it right now is that I have an existing timer interrupt of 10 ms. I set a flag after every 50 interrupts, and the main program loop checks that flag. If the flag is true, the subroutine is called. The issue is that by the time the program loop comes round to servicing the flag, more than 500ms has already passed, sometimes even >515 ms for certain code paths. The time taken is not accurately predictable.
Obviously, the subroutine cannot be called from inside the timer interrupt because of the long time it takes to execute. The subroutine takes 50ms to 89ms depending upon various conditions.
Is there a way to ensure that the subroutine is called in exactly 500ms each time?

I think you have some conflicting/not-thought-through requirements here. You say that you can't call this code from the timer ISR because it takes too long to run (implying that it is a lower-priority than something else which would be delayed), but then you are being hit by the fact that something else which should have been lower-priority is delaying it when you run it from the foreground path ('program loop').
If this work must happen at exactly 500ms, then run it from the timer routine, and deal with the fall-out from that. This is effectively what a pre-emptive RTOS would be doing anyway.
If you want it to run from the 'program loop', then you will have to make sure that nothing else which runs from that loop ever takes longer than the maximum delay you can tolerate. Often that means breaking your other long-running work into state machines which do a little bit of work per pass through the loop.
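A minimal sketch of that state-machine idea, in C: a long-running job (here a byte checksum, chosen purely for illustration) keeps its progress in a struct and processes only a bounded chunk per call, so one pass through the main loop never blocks for long. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { JOB_IDLE, JOB_RUNNING, JOB_DONE } job_state;

typedef struct {
    job_state      state;
    const uint8_t *data;
    size_t         length;
    size_t         position;   /* how far the job has got so far */
    uint32_t       checksum;
} checksum_job;

/* Set the job up; no work is done yet. */
void job_start(checksum_job *job, const uint8_t *data, size_t length)
{
    job->state    = JOB_RUNNING;
    job->data     = data;
    job->length   = length;
    job->position = 0;
    job->checksum = 0;
}

/* Process at most `chunk` bytes, then return to the main loop. */
void job_step(checksum_job *job, size_t chunk)
{
    size_t end;
    if (job->state != JOB_RUNNING)
        return;
    end = job->position + chunk;
    if (end > job->length)
        end = job->length;
    while (job->position < end)
        job->checksum += job->data[job->position++];
    if (job->position == job->length)
        job->state = JOB_DONE;
}
```

The main loop would call job_step() once per pass, alongside the check for the 500ms flag; the chunk size bounds the extra latency this job can add.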

I don't think there's a way to guarantee it but this solution may provide an acceptable alternative.
Might I suggest not setting a flag but instead modifying a value?
Here's how it could work.
1/ Start a value at zero.
2/ Every 10ms interrupt, increase this value by 10 in the ISR (interrupt service routine).
3/ In the main loop, if the value is >= 500, subtract 500 from the value and do your 500ms activities.
You will have to be careful to watch for race conditions between the timer and main program in modifying the value.
This has the advantage that the function runs as close as possible to the 500ms boundaries regardless of latency or duration.
If, for some reason, your function starts 20ms late in one iteration, the value will already be 520 so your function will then set it to 20, meaning it will only wait 480ms before the next iteration.
That seems to me to be the best way to achieve what you want.
I haven't touched the 8051 for many years (assuming that's what C51 is targeting, which seems a safe bet given your description), but it may have an instruction which will subtract 500 without an interrupt being possible. However, I seem to remember the architecture was pretty simple, so you may have to disable or delay interrupts while it does the load/modify/store operation.
volatile int xtime = 0;
void isr_10ms(void) {
xtime += 10;
}
void loop(void) {
while (1) {
/* Do all your regular main stuff here. */
if (xtime >= 500) {
xtime -= 500;
/* Do your 500ms activity here */
}
}
}

You can also use two flags - a "pre-action" flag, and a "trigger" flag (using Mike F's as a starting point):
#define PREACTION_HOLD_TICKS (2)
#define TOTAL_WAIT_TICKS (10)
volatile unsigned char preaction_flag;
volatile unsigned char trigger_flag;
static unsigned char isr_ticks;
interrupt void timer0_isr (void) {
    isr_ticks--;
    if (!isr_ticks) {
        isr_ticks=TOTAL_WAIT_TICKS;
        trigger_flag=1;
    } else {
        if (isr_ticks==PREACTION_HOLD_TICKS)
            preaction_flag=1;
    }
}
// ...
int main(void) {
    isr_ticks = TOTAL_WAIT_TICKS;
    preaction_flag = 0;
    trigger_flag = 0;
    // ...
    while (1) {
        if (preaction_flag) {
            // Shortly before the deadline: stop other work and wait for the trigger
            preaction_flag=0;
            while(!trigger_flag)
                ;
            trigger_flag=0;
            service_routine();
        } else {
            main_processing_routines();
        }
    }
}

A good option is to use an RTOS or write your own simple RTOS.
An RTOS to meet your needs will only need to do the following:
schedule periodic tasks
schedule round robin tasks
perform context switching
Your requirements are the following:
execute a periodic task every 500ms
in the spare time in between, execute round-robin tasks ( doing non-time-critical operations )
An RTOS like this will guarantee a 99.9% chance that your code will execute on time. I can't say 100% because whatever operations you do in your ISRs may interfere with the RTOS. This is a problem with 8-bit micro-controllers that can only execute one instruction at a time.
Writing an RTOS is tricky, but do-able. Here is an example of small ( 900 lines ) RTOS targeted at ATMEL's 8-bit AVR platform.
The following is the Report and Code created for the class CSC 460: Real Time Operating Systems ( at the University of Victoria ).

Would this do what you need?
#define FUDGE_MARGIN 2 //In 10ms increments
volatile unsigned int ticks = 0;
void timer_10ms_interrupt( void ) { ticks++; }
void mainloop( void )
{
unsigned int next_time = ticks+50;
while( 1 )
{
do_mainloopy_stuff();
if( ticks >= next_time-FUDGE_MARGIN )
{
while( ticks < next_time );
do_500ms_thingy();
next_time += 50;
}
}
}
NB: If you got behind with servicing your every-500ms task then this would queue them up, which may not be what you want.
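One further point to watch with a free-running tick counter: assuming unsigned int is 16 bits wide, as is typical for 8051 compilers, a 10ms tick wraps around after about 655 seconds, and a plain >= or < comparison against next_time then gives the wrong answer. A common workaround, sketched below in C with a hypothetical helper name, is to compare the difference instead of the raw values. It is valid as long as the two values are less than 32768 ticks apart.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: true once `now` has reached or passed `target`,
   even if the 16-bit counter has wrapped in between. The unsigned
   subtraction wraps modulo 65536, so a small result means "at or after"
   and a large one means "still before". */
bool tick_reached(uint16_t now, uint16_t target)
{
    return (uint16_t)(now - target) < 0x8000u;
}
```

In the loop above, the two comparisons would become tick_reached(ticks, next_time - FUDGE_MARGIN) and !tick_reached(ticks, next_time). On an 8-bit core the 16-bit read of ticks is itself not atomic, so it may also need to be taken with the timer interrupt briefly disabled.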

One straightforward solution is to have a timer interrupt that fires off at 500ms...
If you have some flexibility in your hardware design, you can cascade the output of one timer to a second stage counter to get you a long time base. I forget, but I vaguely recall being able to cascade timers on the x51.

Ah, one more alternative for consideration: the x51 architecture allows two levels of interrupt priority. If you have some hardware flexibility, you can have the timer ISR raise one of the external interrupt pins at 500ms intervals, and then let your every-500ms code run in the lower-priority interrupt handler.
Depending on your particular x51, you might be able to also generate a lower priority interrupt completely internal to your device.
See part 11.2 in this document I found on the web: http://www.esacademy.com/automation/docs/c51primer/c11.htm

Why do you have a time-critical routine that takes so long to run?
I agree with some of the others that there may be an architectural issue here.
If the purpose of having precise 500ms (or whatever) intervals is to have signal changes occurring at specific time intervals, you may be better off with a fast ISR that outputs the new signals based on a previous calculation, and then sets a flag that causes the new calculation to run outside of the ISR.
Can you better describe what this long-running routine is doing, and what the need for the specific interval is for?
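The fast-ISR idea above could look roughly like this in C. It is a sketch under assumptions: all names are hypothetical, applied_output stands in for whatever hardware register actually drives the signal, and example_calc is a placeholder for the real long-running calculation.

```c
#include <stdint.h>
#include <stdbool.h>

volatile uint8_t pending_output;   /* value computed ahead of time */
volatile uint8_t applied_output;   /* stands in for the real output port */
volatile bool    recalc_needed;

/* Called from the 500 ms timer interrupt: only latch the precomputed value
   and raise a flag, so the ISR stays short and its timing stays exact. */
void on_500ms_tick(void)
{
    applied_output = pending_output;
    recalc_needed  = true;
}

/* Placeholder for the slow calculation (50 ms to 89 ms in the question). */
uint8_t example_calc(uint8_t previous)
{
    return (uint8_t)(previous + 1);
}

/* Called from the main loop: run the slow calculation outside the ISR.
   Returns true if a calculation was performed on this call. */
bool main_loop_poll(uint8_t (*slow_calc)(uint8_t previous))
{
    if (!recalc_needed)
        return false;
    recalc_needed  = false;
    pending_output = slow_calc(applied_output);
    return true;
}
```

The output then changes exactly on the timer tick, while the calculation only has to finish some time before the next tick.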
Addition based on the comments:
If you can ensure that the time spent in the service routine is of a predictable duration, you might get away with missing the timer interrupt postings...
To take your example, if your timer interrupt is set for 10 ms periods, and you know your service routine will take 89ms, just go ahead and count up 41 timer interrupts, then do your 89 ms activity and miss eight timer interrupts (42nd to 49th).
Then, when your ISR exits (and clears the pending interrupt), the "first" interrupt of the next round of 500ms will occur about a ms later.
Given that you're "resource maxed" suggests that you have your other timer and interrupt sources also in use -- which means that relying on the main loop to be timed accurately isn't going to work, because those other interrupt sources could fire at the wrong moment.

If I'm interpreting your question correctly, you have:
a main loop
some high priority operation that needs to be run every 500ms, for a duration of up to 89ms
a 10ms timer that also performs a small number of operations.
There are three options as I see it.
The first is to use a second timer of a lower priority for your 500ms operations. You can still process your 10ms interrupt, and once complete continue servicing your 500ms timer interrupt.
Second option: do you actually need to service your 10ms interrupt every 10ms? Is it doing anything other than timekeeping? If not, and if your hardware allows you to determine the number of 10ms ticks that have passed while processing your 500ms ops (i.e. without using the interrupts themselves), then you can start your 500ms ops within the 10ms interrupt and process the 10ms ticks that you missed when you're done.
Third option: To follow on from Justin Tanner's answer, it sounds like you could produce your own preemptive multitasking kernel to fill your requirements without too much trouble.
It sounds like all you need is two tasks - one for the main super loop and one for your 500ms task.
The code to swap between two contexts (ie. two copies of all of your registers, using different stack pointers) is very simple, and usually consists of a series of register pushes (to save the current context), a series of register pops (to restore your new context) and a return from interrupt instruction. Once your 500ms op's are complete, you restore the original context.
(I guess that strictly this is a hybrid of preemptive and cooperative multitasking, but that's not important right now)
edit:
There is a simple fourth option. Liberally pepper your main super loop with checks for whether the 500ms has elapsed, both before and after any lengthy operations.
Not exactly 500ms, but you may be able to reduce the latency to a tolerable level.
