I am using FreeRTOS on an STM32F4-Nucleo. To get CPU usage I am using vTaskGetRunTimeStats() with the following changes:
void configureTimerForRunTimeStats(void)
{
    g_osRuntimeCounter = 0;
}

unsigned long getRunTimeCounterValue(void)
{
    return g_osRuntimeCounter;
}
The g_osRuntimeCounter increments in a timer callback function. Everything works fine: I get the CPU usage of the running tasks and the idle time. But is there a way to get CPU usage inside an ISR?
The CAN reception and USB-OTG reception are interrupt based and execute a few functions, and I need the CPU usage of those.
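This is not answered in the thread, but one possible approach (an assumption, not something FreeRTOS provides for ISRs out of the box) is to timestamp ISR entry and exit with the Cortex-M4 DWT cycle counter and accumulate the difference; the accumulated count divided by SystemCoreClock times the elapsed wall time then gives the ISR load. A minimal sketch, with illustrative names such as g_canIsrCycles:

/* Sketch only: accumulate cycles spent in the CAN RX ISR using the DWT
 * cycle counter available on the Cortex-M4 (STM32F4). The variable names
 * are illustrative, not part of FreeRTOS or the HAL. */
#include "stm32f4xx.h"

volatile uint32_t g_canIsrCycles = 0;                 /* total cycles spent in the ISR */

void DWT_CycleCounterInit(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   /* enable the trace block */
    DWT->CYCCNT = 0;
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;             /* start the cycle counter */
}

void CAN1_RX0_IRQHandler(void)
{
    uint32_t start = DWT->CYCCNT;

    /* ... normal CAN reception handling ... */

    g_canIsrCycles += DWT->CYCCNT - start;            /* unsigned subtraction handles wrap */
}

Dividing g_canIsrCycles by (SystemCoreClock * elapsed seconds) gives the fraction of CPU time spent in that ISR over the measurement window.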
I'm trying to send a variable-size array of bytes over SPI using interrupts. The system is composed of two Nucleo STM32L432 boards. The sender board works fine, but I'm having issues with the receiver board. Specifically, I noticed that very often some bytes are dropped. Beyond the default initialization provided by CubeMX, I also have the following settings in my init function:
// Trigger RXNE when the FIFO is 1/4 full
LL_SPI_SetRxFIFOThreshold(sw.spi_sw2pc, LL_SPI_RX_FIFO_TH_QUARTER);
// Enable RXNE interrupt
LL_SPI_EnableIT_RXNE(sw.spi_sw2pc);
// Enable SPI
if((SPI3->CR1 & SPI_CR1_SPE) != SPI_CR1_SPE)
{
    // If disabled, enable it
    SET_BIT(sw.spi_sw2pc->CR1, SPI_CR1_SPE);
}
The SPI is set to work at 10 Mbit/s. Can it be that the communication speed is too fast?
Following are the IRQ handler and the callback.
IRQ handler
void SPI3_IRQHandler(void)
{
    /* USER CODE BEGIN SPI3_IRQn 0 */
    /* Check RXNE flag value in ISR register */
    if(LL_SPI_IsActiveFlag_RXNE(SPI3))
    {
        /* Call function Slave Reception Callback */
        SW_rx_callback();
    }
    /* USER CODE END SPI3_IRQn 0 */
    /* USER CODE BEGIN SPI3_IRQn 1 */
    /* USER CODE END SPI3_IRQn 1 */
}
Callback
void SW_rx_callback(void)
{
    // RXNE flag is cleared by reading data from the DR register
    while(LL_SPI_IsActiveFlag_RXNE(SPI3))
    {
        recv_buffer[recv_buffer_index++] = LL_SPI_ReceiveData8(SPI3);
    }

    if(LL_SPI_GetRxFIFOLevel(SPI3) == LL_SPI_RX_FIFO_EMPTY)
    {
        // If there is no more data
        new_data_arrived = true;
        memset(recv_buffer, '\0', recv_buffer_index);
        recv_buffer_index = 0;
    }
}
Thank you in advance for your help.
SPI at 10 Mbit/s means you will get 1.25 million interrupts per second (in the case of 8-bit transfers), and that is a lot to process with interrupts, especially in combination with the HAL.
The STM32L4xx is quite fast (80 MHz), but in this case that means each interrupt call may take no longer than 64 cycles. Interrupt entry takes 12 cycles and interrupt exit takes 10 cycles (in the ideal case with no wait states on the bus), so if your interrupt code takes 42 cycles or more, you can be sure you will miss some bytes.
Here are my suggestions:
First, try enabling some compiler optimizations to speed up the code.
Change the interrupt routine and remove everything unnecessary from the interrupt handler (use a software FIFO and process the received data in the main loop).
But the best solution in your case may be a DMA transfer (a sketch follows below).
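A minimal sketch of the DMA suggestion, assuming a CubeMX-generated HAL handle hspi3 with its RX DMA channel already configured (the question uses the LL drivers, so this only illustrates the approach, not a drop-in replacement; RX_LEN and frame_ready are illustrative names):

/* Sketch only: DMA-based reception, so no per-byte interrupt is needed. */
#include <stdbool.h>
#include "stm32l4xx_hal.h"

extern SPI_HandleTypeDef hspi3;          /* CubeMX-generated handle (assumed) */

#define RX_LEN 64
static uint8_t rx_buf[RX_LEN];
static volatile bool frame_ready = false;

void start_reception(void)
{
    /* The DMA controller fills rx_buf in the background */
    HAL_SPI_Receive_DMA(&hspi3, rx_buf, RX_LEN);
}

/* Called by the HAL from the DMA transfer-complete interrupt */
void HAL_SPI_RxCpltCallback(SPI_HandleTypeDef *hspi)
{
    if (hspi == &hspi3)
    {
        frame_ready = true;                           /* process rx_buf in the main loop */
        HAL_SPI_Receive_DMA(&hspi3, rx_buf, RX_LEN);  /* re-arm for the next frame */
    }
}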
Here's a low level question. How CPU intensive is getting system time?
What is the source of the time? I know there is a hardware clock on the BIOS chip, but I'm thinking that getting data from outside the CPU and RAM needs some hardware synchronization, which may delay the read, so I'm guessing the CPU may have its own clock. Feel free to correct me if I'm wrong in any way.
Does getting the time incur a heavy system call, or is it in any way dependent on the programming language used?
I have just tested it using a C++ program:
clock_t started = clock();
clock_t endClock = started + CLOCKS_PER_SEC;
long itera = 0;
for (; clock() < endClock; itera++)
{
}
I get about 23 million iterations per second (Windows 7, 32bit, Visual Studio 2015, 2.6 GHz CPU). In terms of your question, I would not call this intensive.
In debug mode, I measured 18 million iterations per second.
In case the time is transformed into a localized timestamp, complicated calendar calculations (timezone, daylight saving time, ...) might significantly slow down the loop.
It is not easy to tell what happens inside the clock() call. On my system it calls QueryPerformanceCounter, but that in turn relies on other system functions, as explained here.
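For reference, here is a minimal sketch (assuming Windows, where clock() ends up using this API on my system) of querying the high-resolution counter directly:

/* Sketch only: timing a piece of work with QueryPerformanceCounter. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);            /* counter ticks per second */
    QueryPerformanceCounter(&t0);

    /* ... work to be measured ... */

    QueryPerformanceCounter(&t1);
    printf("elapsed: %.6f s\n",
           (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart);
    return 0;
}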
Tuning
To reduce the time-measurement overhead even further, you can call clock() only in every 10th, 100th, ... iteration.
The following measures once in 1024 iterations:
for (; (itera & 0x03FF) || (clock() < endClock); itera++)
{
}
This brings the loop count per second up to some 500 million.
Tuning with Timer Thread
The following yields a further improvement of some 10%, paid for with additional complexity:
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> processing(true);
// launch a timer thread to clear the processing flag after 1 s
std::thread t([&processing]() {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    processing = false;
});
for (; (itera & 0x03FF) || processing; itera++)
{
}
t.join();
An extra thread is started which sleeps for one second and then clears the processing flag. The main thread executes the loop until the timer thread signals the end of processing.
I have a PIC18F87J11 with an 8 MHz oscillator, and I am using Timer1 as a real-time clock. At the moment I have it toggle an LED every minute. I noticed it works perfectly fine the first few times, but slowly it starts toggling the LED every 59 seconds. Then every few minutes it keeps going down to 58, 57, etc. I don't know if it's impossible to get an accurate clock using the internal oscillator or if I need an external oscillator. My settings look right for Timer1; I just hope I can resolve this issue with the current hardware.
Prescaler 1:8, TMR1 Preload = 15536, Actual Interrupt Time : 200 ms
// Timer 1 Settings
RCONbits.IPEN = 1; // Enable interrupt system priority feature
INTCONbits.GIEL = 1; // Enable low priority interrupts
// 1:8 prescaler
T1CONbits.T1CKPS1 = 1;
T1CONbits.T1CKPS0 = 1;
// Use Internal Clock
T1CONbits.TMR1CS = 0;
// Timer1 overflow interrupt
PIE1bits.TMR1IE = 1;
IPR1bits.TMR1IP = 0; // Timer 1 -> Low priority interrupt group
PIE1bits.TMR1IE = 1; // Enable Timer1 interrupt
// TMR1 Preload = 15536;
TMR1H = 0x3C;
TMR1L = 0xB0;
Interrupt Routine
void interrupt low_priority lowISR(void)
{
    if (PIR1bits.TMR1IF == 1) {
        oneSecond++;
        if (oneSecond == 5) {
            minute_Counter++;
            if (minute_Counter >= 60) {
                // One minute passed
                Printf("\r\n One minute Passed");
                ToggleLed();
                minute_Counter = 0;
            }
            oneSecond = 0;
        }
        // TMR1 Preload = 15536;
        TMR1H = 0x3C;
        TMR1L = 0xB0;
        PIR1bits.TMR1IF = 0;
    }
}
The internal oscillator is a simple RC oscillator (a resistor/capacitor time constant determines its frequency). This kind of circuit may be accurate to only +/-10% over the operating temperature range of the device, and the device will be self-heating due to normal operating power dissipation.
An external crystal or other accurate external clock source is required to get accurate timing. Alternatively, if you have some other stable and accurate but low-frequency clock source, such as the output of an RTC with a 32768 Hz crystal, you can use it to calibrate the internal RC oscillator and dynamically adjust it with the OSCTUNE register: by using a timer gated by the low-frequency source, you can determine the actual frequency of INTOSC and adjust accordingly. It will not be perfect, but it will be better (though no better than the precision of the calibrating source, of course).
Some devices have a die temperature sensor that can also be used to compensate, but that is not available on your device.
The RC error can cause serial communications mistiming to the extent that you cannot communicate with a device using asynchronous (UART) serial comms.
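A rough sketch of that OSCTUNE trimming idea: the helper count_cycles_during_reference_gate() stands in for a timer gated by the accurate low-frequency reference, and both it and expected_count are illustrative placeholders, not registers or library functions.

/* Sketch only: nudge the 6-bit TUN field of OSCTUNE until the measured
 * cycle count matches what a perfectly tuned INTOSC would produce. */
#include <xc.h>

extern unsigned int count_cycles_during_reference_gate(void);   /* placeholder */

static signed char tun = 0;     /* software copy of the TUN<5:0> value, -32..+31 */

void trim_intosc(unsigned int expected_count)
{
    unsigned int measured = count_cycles_during_reference_gate();

    if (measured > expected_count && tun > -32)
        tun--;                                   /* INTOSC running fast: trim it down */
    else if (measured < expected_count && tun < 31)
        tun++;                                   /* INTOSC running slow: trim it up */

    OSCTUNE = (OSCTUNE & 0xC0) | (tun & 0x3F);   /* update TUN<5:0>, leave the other bits */
}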
There is some relevant material in the datasheet you linked, "2.5.3 INTERNAL OSCILLATOR OUTPUT FREQUENCY AND TUNING", on p38.
The datasheet says that
"The INTOSC frequency may drift as VDD or temperature changes."
Are VDD and temperature stable?
It notes three ways to deal with this by tuning the OSCTUNE register. All three of them need an external "oscillator":
dealing with errors of the EUSART... this signal has to come from somewhere.
a peripheral clock
the CCP module in capture mode. You may use any stable AC signal as input.
Good luck!
Reload the timer as soon as it expires; the delay between the timer overflow and the reload adds to the total period. So this will solve your problem.
void interrupt low_priority lowISR(void)
{
    if (PIR1bits.TMR1IF)
    {
        PIR1bits.TMR1IF = 0;
        TMR1H = 0x3C;
        TMR1L = 0xAF;
        /* rest of the code here */
        . . . .
    }
}
One more recommendation: do not load up the ISR; keep it simple.
For all timing, time, and frequency applications, the first and most important thing to do is to CALIBRATE THE CRYSTAL OSCILLATOR. The oscillator and its crystal MUST run at their nominal frequency to better than 1 part per million (1 ppm). Crystals straight out of the factory (except some very specialized and expensive ones costing hundreds of dollars) do not run exactly at their nominal frequency. If this calibration is not done, all time- and frequency-related functions will be off, because the oscillator frequency is used as the reference for all of the PIC's internal functions. The calibration must be done against an accurate frequency counter by adjusting one of the capacitors from the crystal pins to ground. Processor routines alone for frequency (and time) calibration are not accurate enough.
I am running this code on AIX 6.1
while(true)
{
    int a = rand();              // generate a random integer value
    void* test = malloc(a*a);    // allocate a large chunk of memory
    usleep(3000000);             // sleep for 3 sec
    free(test);                  // release the memory block
}
using MALLOCTYPE=buckets
My observations are:
The resident set size (real memory) and the data section size of the process are continuously increasing. This is checked with the command ps v PID.
The pg sp value shown in topas for the process is slowly increasing.
Can someone explain this behavior?
On free, memory is not released to the AIX OS; it is reserved for reuse. With MALLOCOPTIONS=disclaim, free releases memory back to the AIX OS and there is no increase in memory utilization. But with MALLOCOPTIONS=disclaim, CPU utilization is almost 2-3 times higher.
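For illustration only (an assumption, not part of the answer above): the same effect can be had for an individual block by calling AIX's disclaim() subroutine explicitly before freeing it, which is essentially what the disclaim malloc option makes free() do.

/* Sketch only: return the backing pages of a block to the OS before
 * freeing it, assuming AIX's disclaim() from <sys/shm.h>. */
#include <stdlib.h>
#include <sys/shm.h>

void release_block(void *p, size_t len)
{
    /* Tell the VMM the contents are no longer needed, so the real memory
       can be reclaimed without being paged out first. */
    disclaim((char *)p, (unsigned int)len, ZERO_MEM);
    free(p);
}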
CLOCK_MONOTONIC does not seem available, so clock_gettime is out.
I've read in some places that mach_absolute_time() might be the right way to go, but after reading that it is a 'CPU dependent value', it instantly made me wonder whether it is using rdtsc underneath. Thus, the value could drift over time even if it is monotonic. Also, issues with thread affinity could result in meaningfully different results from calling the function (making it not monotonic across all cores).
Of course, that is just speculation. Does anyone know how mach_absolute_time actually works? I'm actually looking for a replacement to clock_gettime(CLOCK_MONOTONIC... or something like it for OSX. No matter what the clock source is, I expect at least millisecond precision and millisecond accuracy.
I'd just like to understand what clocks are available, which clocks are monotonic, if certain clocks drift, have thread affinity issues, aren't supported on all Mac hardware, or take a 'super high' number of cpu cycles to execute.
Here are the links I was able to find about this topic (some are already dead links and not findable on archive.org):
https://developer.apple.com/library/mac/#qa/qa1398/_index.html
http://www.wand.net.nz/~smr26/wordpress/2009/01/19/monotonic-time-in-mac-os-x/
http://www.meandmark.com/timing.pdf
Thanks!
Brett
The Mach kernel provides access to system clocks, out of which at least one (SYSTEM_CLOCK) is advertised by the documentation as being monotonically incrementing.
#include <mach/clock.h>
#include <mach/mach.h>
clock_serv_t cclock;
mach_timespec_t mts;
host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &cclock);
clock_get_time(cclock, &mts);
mach_port_deallocate(mach_task_self(), cclock);
mach_timespec_t has nanosecond precision. I'm not sure about the accuracy, though.
Mac OS X supports three clocks:
SYSTEM_CLOCK returns the time since boot time;
CALENDAR_CLOCK returns the UTC time since 1970-01-01;
REALTIME_CLOCK is deprecated and is the same as SYSTEM_CLOCK in its current implementation.
The documentation for clock_get_time says the clocks are monotonically incrementing unless someone calls clock_set_time. Calls to clock_set_time are discouraged as it could break the monotonic property of the clocks, and in fact, the current implementation returns KERN_FAILURE without doing anything.
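A minimal usage sketch built from the calls above, measuring an interval with SYSTEM_CLOCK (error handling omitted):

/* Sketch only: timing an interval with the SYSTEM_CLOCK service. */
#include <mach/clock.h>
#include <mach/mach.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    clock_serv_t cclock;
    mach_timespec_t t0, t1;

    host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &cclock);
    clock_get_time(cclock, &t0);
    sleep(1);                                   /* work being timed */
    clock_get_time(cclock, &t1);
    mach_port_deallocate(mach_task_self(), cclock);

    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("elapsed: %.6f s\n", elapsed);
    return 0;
}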
After looking up a few different answers for this I ended up defining a header which emulates clock_gettime on mach:
#include <sys/types.h>
#include <sys/_types/_timespec.h>
#include <mach/mach.h>
#include <mach/clock.h>
#ifndef mach_time_h
#define mach_time_h
/* The opengroup spec isn't clear on the mapping from REALTIME to CALENDAR
being appropriate or not.
http://pubs.opengroup.org/onlinepubs/009695299/basedefs/time.h.html */
// XXX only supports a single timer
#define TIMER_ABSTIME -1
#define CLOCK_REALTIME CALENDAR_CLOCK
#define CLOCK_MONOTONIC SYSTEM_CLOCK
typedef int clockid_t;
/* the mach kernel uses struct mach_timespec, so struct timespec
   is loaded from <sys/_types/_timespec.h> for compatibility */
// struct timespec { time_t tv_sec; long tv_nsec; };
int clock_gettime(clockid_t clk_id, struct timespec *tp);
#endif
and in mach_gettime.c
#include "mach_gettime.h"
#include <mach/mach_time.h>
#define MT_NANO (+1.0E-9)
#define MT_GIGA UINT64_C(1000000000)
// TODO create a list of timers,
static double mt_timebase = 0.0;
static uint64_t mt_timestart = 0;
// TODO be more careful in a multithreaded environment
int clock_gettime(clockid_t clk_id, struct timespec *tp)
{
kern_return_t retval = KERN_SUCCESS;
if (clk_id == TIMER_ABSTIME)
{
if (!mt_timestart) { // only one timer, initialized on the first call to the TIMER
mach_timebase_info_data_t tb = { 0 };
mach_timebase_info(&tb);
mt_timebase = tb.numer;
mt_timebase /= tb.denom;
mt_timestart = mach_absolute_time();
}
double diff = (mach_absolute_time() - mt_timestart) * mt_timebase;
tp->tv_sec = diff * MT_NANO;
tp->tv_nsec = diff - (tp->tv_sec * MT_GIGA);
}
else // other clk_ids are mapped to the corresponding mach clock_service
{
clock_serv_t cclock;
mach_timespec_t mts;
host_get_clock_service(mach_host_self(), clk_id, &cclock);
retval = clock_get_time(cclock, &mts);
mach_port_deallocate(mach_task_self(), cclock);
tp->tv_sec = mts.tv_sec;
tp->tv_nsec = mts.tv_nsec;
}
return retval;
}
Just use Mach Time.
It is public API, it works on macOS, iOS, and tvOS and it works from within the sandbox.
Mach Time returns an abstract time unit that I usually call "clock ticks". The length of a clock tick is system specific and depends on the CPU. On current Intel systems a clock tick is in fact exactly one nanosecond but you cannot rely on that (may be different for ARM and it certainly was different for PowerPC CPUs). The system can also tell you the conversion factor to convert clock ticks to nanoseconds and nanoseconds to clock ticks (this factor is static, it won't ever change at runtime). When your system boots, the clock starts at 0 and then monotonically increases with every clock tick thereafter, so you can also use Mach Time to get the uptime of your system (and, of course, uptime is monotonic!).
Here's some code:
#include <stdio.h>
#include <inttypes.h>
#include <mach/mach_time.h>
int main() {
    uint64_t clockTicksSinceSystemBoot = mach_absolute_time();
    printf("Clock ticks since system boot: %"PRIu64"\n",
        clockTicksSinceSystemBoot
    );

    static mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);

    // Cast to double is required to make this a floating point division,
    // otherwise it would be an integer division and only the result would
    // be converted to floating point!
    double clockTicksToNanoseconds = (double)timebase.numer / timebase.denom;

    uint64_t systemUptimeNanoseconds = (uint64_t)(
        clockTicksToNanoseconds * clockTicksSinceSystemBoot
    );
    uint64_t systemUptimeSeconds = systemUptimeNanoseconds / (1000 * 1000 * 1000);

    printf("System uptime: %"PRIu64" seconds\n", systemUptimeSeconds);
}
You can also put a thread to sleep until a certain Mach Time has been reached. Here's some code for that:
// Sleep for 750 ns
uint64_t machTimeNow = mach_absolute_time();
uint64_t clockTicksToSleep = (uint64_t)(750 / clockTicksToNanoseconds);
uint64_t machTimeIn750ns = machTimeNow + clockTicksToSleep;
mach_wait_until(machTimeIn750ns);
As Mach Time has no relation to any wallclock time, you can play around with your system date and time setting as you like, that won't have any effect on Mach Time.
There's one special consideration, though, that may make Mach Time unsuitable for certain use cases: The CPU clock is not running while your system is asleep! So if you make a thread wait for 5 minutes and after 1 minute the system goes to sleep and stays asleep for 30 minutes, the thread is still waiting another 4 minutes after the system has woken up as the 30 minutes sleep time don't count! The CPU clock was resting as well during that time. Yet in other cases this is exactly what you want to happen.
Mach Time is also a very precise way to measure time spent. Here's some code showing that task:
// Measure time
uint64_t machTimeBegin = mach_absolute_time();
sleep(1);
uint64_t machTimeEnd = mach_absolute_time();
uint64_t machTimePassed = machTimeEnd - machTimeBegin;
uint64_t timePassedNS = (uint64_t)(
    machTimePassed * clockTicksToNanoseconds
);
printf("Thread slept for: %"PRIu64" ns\n", timePassedNS);
You will see that the thread doesn't sleep for exactly one second. That's because it takes some time to put a thread to sleep and to wake it back up again, and even when awake, it won't get CPU time immediately if all cores are already busy running threads at that moment.
Update (2018-09-26)
Since macOS 10.12 (Sierra) there also exists mach_continuous_time. The only difference between mach_continuous_time and mach_absolute_time is that continuous time also advances while the system is asleep. So in case this was a problem so far and a reason for not using Mach Time, 10.12 and up offer a solution. The usage is exactly the same as described above.
Also, starting with macOS 10.9 (Mavericks) there is mach_approximate_time, and in 10.12 there is also mach_continuous_approximate_time. These two are identical to mach_absolute_time and mach_continuous_time, with the only difference that they are faster yet less accurate. The standard functions require a call into the kernel, as the kernel takes care of Mach Time, and such a call is somewhat expensive, especially on systems that already have a Meltdown fix. The approximate versions don't always have to call into the kernel: they use a clock in user space that is only synchronized with the kernel clock from time to time, to prevent it from running too far out of sync. A small deviation is thus always possible, which is why it is only the "approximate" Mach Time.
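A minimal sketch of mach_continuous_time with the same timebase conversion as above (assumes macOS 10.12 or later):

/* Sketch only: nanoseconds since boot, including time spent asleep. */
#include <mach/mach_time.h>
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);

    uint64_t ticks = mach_continuous_time();
    uint64_t nanoseconds = ticks * timebase.numer / timebase.denom;

    printf("Nanoseconds since boot (including sleep): %"PRIu64"\n", nanoseconds);
    return 0;
}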
Also starting with macOS 10.9 (Mavericks), there is a mach_approximate_time and in 10.12 there's also a mach_continuous_approximate_time. These two are identical to mach_absolute_time and mach_continuous_time with the only difference, that they are faster yet less accurate. The standard functions require a call into the kernel as the kernel takes care of Mach Time. Such a call is somewhat expensive, especially on systems that already have a Meltdown fix. The approximate versions won't have to always call into the kernel. They use a clock in user space that is only synchronized with the kernel clock from time to time to prevent that it is running too far out of sync, yet a small deviation is always possible and thus it is only the "approximate" Mach Time.