Monotonic clock on OSX - objective-c

CLOCK_MONOTONIC does not seem available, so clock_gettime is out.
I've read in some places that mach_absolute_time() might be the right way to go, but after reading that it was a 'CPU dependent value', it instantly made me wonder if it is using RDTSC underneath. Thus, the value could drift over time even if it is monotonic. Also, thread affinity issues could result in meaningfully different results from calling the function (making it not monotonic across all cores).
Of course, that is just speculation. Does anyone know how mach_absolute_time actually works? I'm actually looking for a replacement to clock_gettime(CLOCK_MONOTONIC... or something like it for OSX. No matter what the clock source is, I expect at least millisecond precision and millisecond accuracy.
I'd just like to understand what clocks are available, which clocks are monotonic, whether certain clocks drift or have thread affinity issues, whether they are supported on all Mac hardware, and whether any take a 'super high' number of CPU cycles to execute.
Here are the links I was able to find about this topic (some are already dead links and not findable on archive.org):
https://developer.apple.com/library/mac/#qa/qa1398/_index.html
http://www.wand.net.nz/~smr26/wordpress/2009/01/19/monotonic-time-in-mac-os-x/
http://www.meandmark.com/timing.pdf
Thanks!
Brett

The Mach kernel provides access to system clocks, out of which at least one (SYSTEM_CLOCK) is advertised by the documentation as being monotonically incrementing.
#include <mach/clock.h>
#include <mach/mach.h>
clock_serv_t cclock;
mach_timespec_t mts;
host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &cclock);
clock_get_time(cclock, &mts);
mach_port_deallocate(mach_task_self(), cclock);
mach_timespec_t has nanosecond precision. I'm not sure about the accuracy, though.
Mac OS X supports three clocks:
SYSTEM_CLOCK returns the time since boot time;
CALENDAR_CLOCK returns the UTC time since 1970-01-01;
REALTIME_CLOCK is deprecated and is the same as SYSTEM_CLOCK in its current implementation.
The documentation for clock_get_time says the clocks are monotonically incrementing unless someone calls clock_set_time. Calls to clock_set_time are discouraged as it could break the monotonic property of the clocks, and in fact, the current implementation returns KERN_FAILURE without doing anything.

After looking up a few different answers for this I ended up defining a header which emulates clock_gettime on mach:
#include <sys/types.h>
#include <sys/_types/_timespec.h>
#include <mach/mach.h>
#include <mach/clock.h>
#ifndef mach_time_h
#define mach_time_h
/* The Open Group spec isn't clear on whether the mapping from REALTIME to
CALENDAR is appropriate or not.
http://pubs.opengroup.org/onlinepubs/009695299/basedefs/time.h.html */
// XXX only supports a single timer
#define TIMER_ABSTIME -1
#define CLOCK_REALTIME CALENDAR_CLOCK
#define CLOCK_MONOTONIC SYSTEM_CLOCK
typedef int clockid_t;
/* the mach kernel uses struct mach_timespec, so struct timespec
is loaded from <sys/_types/_timespec.h> for compatibility */
// struct timespec { time_t tv_sec; long tv_nsec; };
int clock_gettime(clockid_t clk_id, struct timespec *tp);
#endif
and in mach_gettime.c
#include "mach_gettime.h"
#include <mach/mach_time.h>
#define MT_NANO (+1.0E-9)
#define MT_GIGA UINT64_C(1000000000)
// TODO create a list of timers,
static double mt_timebase = 0.0;
static uint64_t mt_timestart = 0;
// TODO be more careful in a multithreaded environment
int clock_gettime(clockid_t clk_id, struct timespec *tp)
{
kern_return_t retval = KERN_SUCCESS;
if( clk_id == TIMER_ABSTIME)
{
if (!mt_timestart) { // only one timer, initialized on the first call to the TIMER
mach_timebase_info_data_t tb = { 0 };
mach_timebase_info(&tb);
mt_timebase = tb.numer;
mt_timebase /= tb.denom;
mt_timestart = mach_absolute_time();
}
double diff = (mach_absolute_time() - mt_timestart) * mt_timebase;
tp->tv_sec = diff * MT_NANO;
tp->tv_nsec = diff - (tp->tv_sec * MT_GIGA);
}
else // other clk_ids are mapped to the corresponding mach clock_service
{
clock_serv_t cclock;
mach_timespec_t mts;
host_get_clock_service(mach_host_self(), clk_id, &cclock);
retval = clock_get_time(cclock, &mts);
mach_port_deallocate(mach_task_self(), cclock);
tp->tv_sec = mts.tv_sec;
tp->tv_nsec = mts.tv_nsec;
}
return retval;
}

Just use Mach Time.
It is public API, it works on macOS, iOS, and tvOS and it works from within the sandbox.
Mach Time returns an abstract time unit that I usually call "clock ticks". The length of a clock tick is system specific and depends on the CPU. On current Intel systems a clock tick is in fact exactly one nanosecond but you cannot rely on that (may be different for ARM and it certainly was different for PowerPC CPUs). The system can also tell you the conversion factor to convert clock ticks to nanoseconds and nanoseconds to clock ticks (this factor is static, it won't ever change at runtime). When your system boots, the clock starts at 0 and then monotonically increases with every clock tick thereafter, so you can also use Mach Time to get the uptime of your system (and, of course, uptime is monotonic!).
Here's some code:
#include <stdio.h>
#include <inttypes.h>
#include <mach/mach_time.h>
int main ( ) {
uint64_t clockTicksSinceSystemBoot = mach_absolute_time();
printf("Clock ticks since system boot: %"PRIu64"\n",
clockTicksSinceSystemBoot
);
static mach_timebase_info_data_t timebase;
mach_timebase_info(&timebase);
// Cast to double is required to make this a floating point division,
// otherwise it would be an integer division and only the result would
// be converted to floating point!
double clockTicksToNanosecons = (double)timebase.numer / timebase.denom;
uint64_t systemUptimeNanoseconds = (uint64_t)(
clockTicksToNanosecons * clockTicksSinceSystemBoot
);
uint64_t systemUptimeSeconds = systemUptimeNanoseconds / (1000 * 1000 * 1000);
printf("System uptime: %"PRIu64" seconds\n", systemUptimeSeconds);
}
You can also put a thread to sleep until a certain Mach Time has been reached. Here's some code for that:
// Sleep for 750 ns
uint64_t machTimeNow = mach_absolute_time();
uint64_t clockTicksToSleep = (uint64_t)(750 / clockTicksToNanosecons);
uint64_t machTimeIn750ns = machTimeNow + clockTicksToSleep;
mach_wait_until(machTimeIn750ns);
As Mach Time has no relation to any wallclock time, you can play around with your system date and time setting as you like, that won't have any effect on Mach Time.
There's one special consideration, though, that may make Mach Time unsuitable for certain use cases: The CPU clock is not running while your system is asleep! So if you make a thread wait for 5 minutes and after 1 minute the system goes to sleep and stays asleep for 30 minutes, the thread is still waiting another 4 minutes after the system has woken up as the 30 minutes sleep time don't count! The CPU clock was resting as well during that time. Yet in other cases this is exactly what you want to happen.
Mach Time is also a very precise way to measure time spent. Here's some code showing that task:
// Measure time
uint64_t machTimeBegin = mach_absolute_time();
sleep(1);
uint64_t machTimeEnd = mach_absolute_time();
uint64_t machTimePassed = machTimeEnd - machTimeBegin;
uint64_t timePassedNS = (uint64_t)(
machTimePassed * clockTicksToNanosecons
);
printf("Thread slept for: %"PRIu64" ns\n", timePassedNS);
You will see that the thread doesn't sleep for exactly one second, that's because it takes some time to put a thread to sleep, to wake it back up again and even when awake, it won't get CPU time immediately if all cores are already busy running a thread at that moment.
Update (2018-09-26)
Since macOS 10.12 (Sierra) there also exists mach_continuous_time. The only difference between mach_continuous_time and mach_absolute_time is that continuous time also advances while the system is asleep. So if this has been a problem so far and a reason for not using Mach Time, 10.12 and up offer a solution. The usage is exactly the same as described above.
Also, starting with macOS 10.9 (Mavericks), there is mach_approximate_time, and in 10.12 there is also mach_continuous_approximate_time. These two are identical to mach_absolute_time and mach_continuous_time, with the only difference that they are faster yet less accurate. The standard functions require a call into the kernel, as the kernel takes care of Mach Time, and such a call is somewhat expensive, especially on systems that already have a Meltdown fix. The approximate versions don't always have to call into the kernel: they use a clock in user space that is only synchronized with the kernel clock from time to time to keep it from drifting too far out of sync, yet a small deviation is always possible, and thus it is only the "approximate" Mach Time.
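For completeness, here is a short sketch of the sleep-aware and approximate variants (a sketch only; it assumes macOS 10.12+ and reuses the timebase conversion shown above):
#include <stdio.h>
#include <inttypes.h>
#include <mach/mach_time.h>
int main ( ) {
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);
    double clockTicksToNanoseconds = (double)timebase.numer / timebase.denom;
    // Same timebase as mach_absolute_time(), but keeps counting while the system sleeps.
    uint64_t continuousTicks = mach_continuous_time();
    printf("Continuous time since boot (incl. sleep): %"PRIu64" ns\n",
        (uint64_t)(continuousTicks * clockTicksToNanoseconds));
    // Cheaper user-space variant, synchronized with the kernel clock only occasionally.
    uint64_t approximateTicks = mach_approximate_time();
    printf("Approximate clock ticks: %"PRIu64"\n", approximateTicks);
}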

Related

Canonical way(s) of determining system time in a microcontroller

Every so often I start a bare metal microcontroller project and end up implementing a system time measurement using a random timer unit.
I have been working with ARM Cortex-M devices for a (albeit short) while now and typically used the SysTick ("System Tick") interrupt to create a 1 ms resolution timer. I recently stumbled over a post that suggested chaining two Programmable Interrupt Timers (on a Kinetis KL25Z device) in order to create an interrupt-less 32-bit millisecond timer, however sacrificing two PITs which may come in handy later on.
So I was wondering if there are some (sort of) canonical ways to determine the system time on a microcontroller - preferably for Kinetis KL2xZ devices as I currently work with these, but not necessarily so.
The canonical method as you put it is exactly as you have done - using systick. That is the single timer device defined by the Cortex-M architecture; any other timer hardware is external to the core and vendor specific.
Some parts (STM32F2 for example) include 32 bit timer/counter hardware, so you would not need to chain two.
The best approach is to abstract timer services by defining a generic timer API that you implement for all parts you need so that the application layer is identical for all parts. For example in this case you might simply implement the standard library clock() function and define CLOCKS_PER_SEC.
If you are using two free-running cascaded timers, you must ensure high/low word consistency when combining the two counter values:
#include <stdint.h>
#include <time.h>
clock_t clock( void )
{
uint16_t lo_word = 0 ;
uint16_t hi_word = 0 ;
do
{
hi_word = readTimerH() ;
lo_word = readTimerL() ;
} while( hi_word != readTimerH() ) ;
return (clock_t)((uint32_t)hi_word << 16 | lo_word) ;
}
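Typical use of the abstracted API then looks the same on every part (a sketch; do_work() is a placeholder, and it assumes CLOCKS_PER_SEC has been defined to match the tick rate of the counter):
clock_t start = clock();
do_work();                              /* placeholder for the code being timed */
clock_t elapsed = clock() - start;
uint32_t elapsed_ms = (uint32_t)((uint64_t)elapsed * 1000u / CLOCKS_PER_SEC);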
I just looked into the KL25 Sub-Family Reference Manual.
See Chapter 34, Real Time Clock (RTC), section 34.3.2 "Time counter" (section numbers may differ between document versions).
The RTC has two counter registers:
a 32-bit seconds counter
a 16-bit prescaler register that increments once every 32.768 kHz clock cycle
The reference manual says:
Always write to the prescaler register before writing to the seconds register,
because the seconds register increments on the falling edge of bit 14 of the prescaler
register.
This means that to calculate the system time, you read the seconds counter and add the lower 14 bits of the prescaler register.
You can even create macros that give you the system time in microseconds and milliseconds from the combination of the seconds counter and the prescaler register (and, obviously, seconds from the seconds counter alone).
The prescaler register is clocked at 32.768 kHz, so we can define macros to get the time in microseconds and milliseconds:
#define PRESCALAR_TICK 32768
#define KHZ 1000
#define MHZ 1000000
/// First extract the 14-bit value of prescalar_reg, then multiply it by MHZ before dividing,
/// using 64-bit arithmetic so the intermediate value cannot overflow
#define GET_SYS_US ((((uint64_t)(prescalar_reg & 0x3FFF) * MHZ) / PRESCALAR_TICK))
#define GET_SYS_MS ((GET_SYS_US) / KHZ)
If you need a running time in microseconds or milliseconds that also includes the seconds counter, use the macros below:
#define GET_SYS_US_32bit (((uint64_t)rtc_sec_counter * MHZ) + GET_SYS_US)
#define GET_SYS_MS_32bit ((rtc_sec_counter * KHZ) + GET_SYS_MS)
But to use this information you must, of course, initialise the RTC of your micro.
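As with the cascaded-timer example earlier, the two registers should be read consistently. A rough sketch (rtc_sec_counter and prescalar_reg stand for the RTC seconds and prescaler registers, as in the macros above):
uint32_t sec, frac;
do {                                  /* re-read so a seconds rollover between the two reads
                                         cannot produce an inconsistent pair */
    sec  = rtc_sec_counter;
    frac = prescalar_reg & 0x3FFF;    /* 14-bit sub-second value, per the macros above */
} while (sec != rtc_sec_counter);
uint64_t us = ((uint64_t)sec * MHZ) + (((uint64_t)frac * MHZ) / PRESCALAR_TICK);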

How to make Timer1 more accurate as a real time clock?

I have a PIC18F87J11 with an 8 MHz oscillator and I am using Timer1 as a real time clock. At the moment I have it toggle an LED every 1 minute. I noticed it works perfectly fine the first few times, but slowly it starts toggling the LED every 59 seconds. Then every few minutes it keeps going down to 58, 57, etc. I don't know if it's impossible to get an accurate clock using the internal oscillator or if I need an external oscillator. My settings look right for Timer1; I just hope I can resolve this issue with the current hardware.
Prescaler 1:8, TMR1 Preload = 15536, Actual Interrupt Time : 200 ms
// Timer 1 Settings
RCONbits.IPEN = 1; // Enable interrupt system priority feature
INTCONbits.GIEL = 1; // Enable low priority interrupts
// 1:8 prescaler
T1CONbits.T1CKPS1 = 1;
T1CONbits.T1CKPS0 = 1;
// Use Internal Clock
T1CONbits.TMR1CS = 0;
// Timer1 overflow interrupt
PIE1bits.TMR1IE = 1;
IPR1bits.TMR1IP = 0; // Timer 1 -> Low priority interrupt group
PIE1bits.TMR1IE = 1; // Enable Timer1 interrupt
// TMR1 Preload = 15536;
TMR1H = 0x3C;
TMR1L = 0xB0;
Interrupt Routine
void interrupt low_priority lowISR(void) {
if (PIR1bits.TMR1IF == 1) {
oneSecond++;
if (oneSecond == 5) {
minute_Counter++;
if (minute_Counter >= 60) {
// One minute passed
Printf("\r\n One minute Passed");
ToggleLed();
minute_Counter = 0;
}
oneSecond = 0;
}
// TMR1 Preload = 15536;
TMR1H = 0x3C;
TMR1L = 0xB0;
PIR1bits.TMR1IF = 0;
}}
The internal oscillator is a simple RC oscillator (a resistor/capacitor time constant determines its frequency). This kind of circuit may be accurate to only +/-10% over the operating temperature range of the device, and the device will be self-heating due to normal operating power dissipation.
An external crystal or other accurate external clock source is required to get accurate timing. Alternatively, if you have some other stable and accurate but low-frequency clock source, such as the output from an RTC with a 32768 Hz crystal, you can use that to calibrate the internal RC oscillator and dynamically adjust it with the OSCTUNE register - by using a timer gated by the low-frequency source, you can determine the actual frequency of INTOSC and adjust accordingly. It will not be perfect, but it will be better - though no better than the precision of the calibrating source, of course.
Some devices have a die temperature sensor that can also be used to compensate, but that is not available on your device.
The RC error can cause serial communications mistiming to the extent that you cannot communicate with a device using asynchronous (UART) serial comms.
There is some relevant material in the datasheet you linked: section 2.5.3, "INTERNAL OSCILLATOR OUTPUT FREQUENCY AND TUNING", on p. 38.
The datasheet says that
"The INTOSC frequency may drift as VDD or temperature changes."
Are VDD and temperature stable?
It notes three ways to deal with this by tuning the OSCTUNE register. All three need an external reference "oscillator":
monitoring errors on the EUSART... the reference signal has to come from somewhere;
a peripheral clock;
the CCP module in capture mode. You may use any stable AC signal as input.
Good luck!
Reload the timer as soon as it expires: the delay between the timer overflow and re-arming it adds to the total period, so this will solve your problem.
void interrupt low_priority lowISR(void)
{
if (PIR1bits.TMR1IF)
{
PIR1bits.TMR1IF = 0;
TMR1H = 0x3C;
TMR1L = 0xAF;
/* rest of the code here */
. . . .
}
}
One more recommendation: do not load up the ISR; keep it simple.
For all timing, time and frequency applications, the first and most important thing to do is to CALIBRATE THE CRYSTAL OSCILLATOR!!! The oscillator itself and its crystal MUST run to better than 1 part per million (1 ppm) of the nominal frequency. Crystals straight out of a factory (except some very specialized and expensive ones, in the hundreds of dollars) do not run exactly at their nominal frequency. If the calibration is not done, all time and frequency related functions will be off, because the oscillator frequency is used as the reference for all of the PIC's internal functions. The calibration must be done against an accurate frequency counter by adjusting one of the capacitors from the crystal pins to ground. Processor routines alone are not accurate enough for frequency (and time) calibration.

Acoustic Echo Cancellation (AEC) in embedded software

I am doing a VoIP project on an embedded device. I have built a sample using a 32-bit MCU with a low-grade audio codec. Now I have found there is an echo issue on my device: I can hear what I said coming back from the speaker. I have done some research and found that most applications use a DSP codec with an acoustic echo cancellation feature. However, is it possible to do the acoustic echo cancellation in software, using my 32-bit MCU?
Can you advise an algorithm, or even source code :P, for doing acoustic echo cancellation? I know a sophisticated method is not possible on an MCU, so a simple algorithm is also welcome.
Thank you
[Follow up]: I have tried some AEC code but it did not work well on my MCU, probably because of the limits of the MCU's power. I found that my device became non-real-time when I implemented this code (but VoIP needs a real-time response). In the end I went with a hardware solution by adding an AEC chip, because I did not want to write the code again for another DSP chip.
I had a heck of a time with echo cancellation. I wrote a softphone, and the user can switch their audio input and output devices around to suit their fancy. I tried the Speex echo cancellation library and several other open source libs I found online. None worked well for me. I tried different speaker/mike configurations and the echo was always there in some form or fashion.
I believe it would be very hard to create AEC code that would work for all possible speaker configurations / room sizes / background noises..etc. Finally I sat down and wrote my own echo cancellation module for my softphone with this algorithm.
It's somewhat crude, but it has worked well and is reliable.
variable1:
Keep a record of the average amplitude when the person to whom you're talking is speaking. (Don't factor in quiet time.)
variable2:
Keep a record of the average amplitude on the input (mike), but only when there is voice; again, don't factor in quiet time.
As soon as there's audio to play, cut the mike. Then, assuming the person listening is not talking, turn the mike back on 150-300 ms after the last audible audio frame comes in to be played.
If the audio from the microphone (which you're dropping during playback) is greater than, say, (variable2 * 1.5), start sending the audio input frames for a specified duration, resetting that duration every time the input amplitude reaches (variable2 * 1.5).
That way the person talking will know they are being interrupted, and stop to see what the person is saying. If the person talking doesn't have too noisy of a background, they will probably hear most if not all of the interruption.
Like I said, not the most graceful, but it doesn't use a lot of resources (CPU, memory) and it actually works pretty darn well. I am very pleased with how mine sounds.
To implement it, I just made a few functions.
On a received audio frame, I call a function I called:
void audioin( AEC *ec, short *frame ) {
unsigned int tas=0; /* Total sum of all audio in frame (absolute value) */
int i=0;
for (;i<160;i++)
tas+=ABS(frame[i]);
tas/=160; /* 320 byte frames muLaw */
if (tas>300) { /* I assume this is audible */
lockecho(ec);
ec->lastaudibleframe=GetTickCount64();
unlockecho(ec);
}
return;
}
and before sending a frame, I do:
#define ECHO_THRESHOLD 300 /* ms to keep suppression alive after the last audible frame */
#define ONE_MINUTE 3000 /* 3000 20ms samples */
#define AVG_PERIOD 250 /* 250 20ms samples */
#define ABS(x) ((x)>0?(x):-(x))
char removeecho( AEC *ec, short *aecinput ) {
int tas=0; /* Average absolute amplitude in this signal */
int i=0;
unsigned long long *tot=0;
unsigned int *ctr=0;
unsigned short *avg=0;
char suppressframe=0;
lockecho(ec);
if (ec->lastaudibleframe+ECHO_THRESHOLD > GetTickCount64() ) {
/* If we're still within the threshold for echo (speaker state is ON) */
tot=&ec->t_aiws;
ctr=&ec->c_aiws;
avg=&ec->aiws;
} else {
/* If we're outside the threshold for echo (speaker state is OFF) */
tot=&ec->t_aiwos;
ctr=&ec->c_aiwos;
avg=&ec->aiwos;
}
for (;i<160;i++) {
tas+=ABS(aecinput[i]);
}
tas/=160;
if (tas>200) {
(*tot)+=tas;
(*avg)=(unsigned short)((*tot)/( (*ctr)?(*ctr):1));
(*ctr)++;
if ((*ctr)>AVG_PERIOD) {
(*tot)=(*avg);
(*ctr)=0;
}
}
if ( (avg==&ec->aiws) ) {
tas-=ec->aiwos;
if (tas<0) {
tas=0;
}
if ( ((unsigned short) tas > (ec->aiws*1.5)) && ((unsigned short)tas>=ec->aiwos) && (ec->aiwos!=0) ) {
suppressframe=0;
} else {
suppressframe=1;
}
}
if (suppressframe) { /* Silence frame */
memset(aecinput, 0, 320);
}
unlockecho(ec);
return suppressframe;
}
Which will silence the frame if it needs to. I keep all my variables, like the timers, and amplitude averages in the AEC struct, which I return from a call to
AEC *initecho( void ) {
AEC *ec=0;
ec=(AEC *)malloc(sizeof(AEC));
memset(ec, 0, sizeof(AEC));
ec->aiws=200; /* Just a default guess as to what the average amplitude would be */
return ec;
}
typedef struct aec {
unsigned long long lastaudibleframe; /* time stamp of last audible frame */
unsigned short aiws; /* Average mike input when speaker is playing */
unsigned short aiwos; /*Average mike input when speaker ISNT playing */
unsigned long long t_aiws, t_aiwos; /* Internal running total (sum of PCM) */
unsigned int c_aiws, c_aiwos; /* Internal counters for number of frames for averaging */
unsigned long lockthreadid; /* Thread ID with lock */
int stlc; /* Same thread lock-count */
} AEC;
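Putting it together, the call pattern looks roughly like this (a sketch; rx_frame, tx_frame and send_frame() are placeholders, and lockecho()/unlockecho() must be supplied by your own locking code):
AEC *ec = initecho();
/* playback path: for every received 20 ms frame (160 samples) */
audioin(ec, rx_frame);          /* records the "speaker is audible" timestamp */
/* capture path: for every microphone frame, just before sending */
removeecho(ec, tx_frame);       /* zeroes the frame in place if it looks like echo */
send_frame(tx_frame);           /* suppressed frames simply go out as silence */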
You can adapt it as you need to and play with the idea, but like I said, it actually sounds pretty dang good. The only problem I have is if they have a lot of background noise. But for me, if they pick up their USB handset or are using a headset, they can turn echo cancellation off and not worry about it... but through PC speakers with a mike... I'm pretty happy with it.
I hope it helps, or gives you something to build on...
If you are doing a commercial project, this should be easy: you can integrate commercial echo cancellation software into your VoIP application.

Timing different sections in CUDA kernel

I have a CUDA kernel that calls out to a series of device functions.
What is the best way to get the execution time for each of the device functions?
What is the best way to get the execution time for a section of code in one of the device functions?
In my own code, I use the clock() function to get precise timings. For convenience, I have the macros
enum {
tid_this = 0,
tid_that,
tid_count
};
__device__ float cuda_timers[ tid_count ];
#ifdef USETIMERS
#define TIMER_TIC clock_t tic; if ( threadIdx.x == 0 ) tic = clock();
#define TIMER_TOC(tid) clock_t toc = clock(); if ( threadIdx.x == 0 ) atomicAdd( &cuda_timers[tid] , ( toc > tic ) ? (toc - tic) : ( toc + (0xffffffff - tic) ) );
#else
#define TIMER_TIC
#define TIMER_TOC(tid)
#endif
These can then be used to instrument the device code as follows:
__global__ void mykernel ( ... ) {
/* Start the timer. */
TIMER_TIC
/* Do stuff. */
...
/* Stop the timer and store the results to the "timer_this" counter. */
TIMER_TOC( tid_this );
}
You can then read the cuda_timers in the host code.
A few notes:
The timers work on a per-block basis, i.e. if you have 100 blocks executing the same kernel, the sum of all their times will be stored.
Having said that, the timer assumes that the zeroth thread is active, so make sure you do not call these macros in a possibly divergent part of the code.
The timers count the number of clock ticks. To convert to milliseconds, divide the tick count by the device clock rate in kHz (dividing by the rate in GHz gives nanoseconds); see the host-side sketch after these notes.
The timers can slow down your code a bit, which is why I wrapped them in the #ifdef USETIMERS so you can switch them off easily.
Although clock() returns integer values of type clock_t, I store the accumulated values as float, otherwise the values will wrap around for kernels that take longer than a few seconds (accumulated over all blocks).
The selection ( toc > tic ) ? (toc - tic) : ( toc + (0xffffffff - tic) ) is necessary in case the clock counter wraps around.
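A possible host-side read-out, assuming the enum and cuda_timers above are visible in the host translation unit (cudaDeviceProp.clockRate is reported in kHz, so dividing tick counts by it gives milliseconds):
float host_timers[tid_count];
cudaMemcpyFromSymbol(host_timers, cuda_timers, sizeof(host_timers));
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
for (int i = 0; i < tid_count; i++)
    printf("timer %d: %.3f ms (summed over all blocks)\n", i, host_timers[i] / prop.clockRate);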
P.S. This is a copy of my reply to this question, which didn't get many points there since the timing required was for the whole kernel.

Precisely time a function call

I am using a microcontroller with a C51 core. I have a fairly timeconsuming and large subroutine that needs to be called every 500ms. An RTOS is not being used.
The way I am doing it right now is that I have an existing timer interrupt of 10 ms. I set a flag after every 50 interrupts that is checked for being true in the main program loop. If the flag is true, the subroutine is called. The issue is that by the time the program loop comes round to servicing the flag, it is already more than 500 ms, sometimes even >515 ms in the case of certain code paths. The time taken is not accurately predictable.
Obviously, the subroutine cannot be called from inside the timer interrupt because of the large time it takes to execute. The subroutine takes 50 ms to 89 ms depending upon various conditions.
Is there a way to ensure that the subroutine is called in exactly 500ms each time?
I think you have some conflicting/not-thought-through requirements here. You say that you can't call this code from the timer ISR because it takes too long to run (implying that it is a lower-priority than something else which would be delayed), but then you are being hit by the fact that something else which should have been lower-priority is delaying it when you run it from the foreground path ('program loop').
If this work must happen at exactly 500ms, then run it from the timer routine, and deal with the fall-out from that. This is effectively what a pre-emptive RTOS would be doing anyway.
If you want it to run from the 'program loop', then you will have to make sure that nothing else which runs from that loop ever takes more than the maximum delay you can tolerate - often that means breaking your other long-running work into state machines which can do a little bit of work per pass through the loop (see the sketch below).
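A minimal sketch of that state-machine idea (do_part_a/b/c are placeholders for slices of the long-running work):
typedef enum { STEP_A, STEP_B, STEP_C, STEP_DONE } work_state_t;
static work_state_t state = STEP_A;
void long_work_step(void)            /* called once per pass through the main loop */
{
    switch (state) {
    case STEP_A: do_part_a(); state = STEP_B;    break;  /* each slice stays well under  */
    case STEP_B: do_part_b(); state = STEP_C;    break;  /* the latency you can tolerate */
    case STEP_C: do_part_c(); state = STEP_DONE; break;
    default:     state = STEP_A;                 break;  /* restart when there is new work */
    }
}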
I don't think there's a way to guarantee it but this solution may provide an acceptable alternative.
Might I suggest not setting a flag but instead modifying a value?
Here's how it could work.
1/ Start a value at zero.
2/ Every 10ms interrupt, increase this value by 10 in the ISR (interrupt service routine).
3/ In the main loop, if the value is >= 500, subtract 500 from the value and do your 500ms activities.
You will have to be careful to watch for race conditions between the timer and main program in modifying the value.
This has the advantage that the function runs as close as possible to the 500ms boundaries regardless of latency or duration.
If, for some reason, your function starts 20ms late in one iteration, the value will already be 520 so your function will then set it to 20, meaning it will only wait 480ms before the next iteration.
That seems to me to be the best way to achieve what you want.
I haven't touched the 8051 for many years (assuming that's what C51 is targeting, which seems a safe bet given your description) but it may have an instruction which will subtract 500 without an interrupt being possible. However, I seem to remember the architecture was pretty simple, so you may have to disable or delay interrupts while it does the load/modify/store operation (see the sketch after the code below).
volatile int xtime = 0;
void isr_10ms(void) {
xtime += 10;
}
void loop(void) {
while (1) {
/* Do all your regular main stuff here. */
if (xtime >= 500) {
xtime -= 500;
/* Do your 500ms activity here */
}
}
}
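One way to handle the race mentioned above on an 8051, where access to the 16-bit xtime variable is not atomic, is to briefly mask interrupts around the read-modify-write (a sketch; EA is the global interrupt enable bit from the usual C51 SFR headers):
unsigned char due = 0;
EA = 0;                       /* mask interrupts around the non-atomic 16-bit access */
if (xtime >= 500) {
    xtime -= 500;
    due = 1;
}
EA = 1;
if (due) {
    /* Do your 500ms activity here */
}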
You can also use two flags - a "pre-action" flag, and a "trigger" flag (using Mike F's as a starting point):
#define PREACTION_HOLD_TICKS (2)
#define TOTAL_WAIT_TICKS (10)
volatile unsigned char preaction_flag;
volatile unsigned char trigger_flag;
static unsigned char isr_ticks;
interrupt void timer0_isr (void) {
isr_ticks--;
if (!isr_ticks) {
isr_ticks=TOTAL_WAIT_TICKS;
trigger_flag=1;
} else {
if (isr_ticks==PREACTION_HOLD_TICKS)
preaction_flag=1;
}
}
// ...
int main(...) {
isr_ticks = TOTAL_WAIT_TICKS;
preaction_flag = 0;
trigger_flag = 0;
// ...
while (1) {
if (preaction_flag) {
preaction_flag=0;
while(!trigger_flag)
;
trigger_flag=0;
service_routine();
} else {
main_processing_routines();
}
}
}
A good option is to use an RTOS or write your own simple RTOS.
An RTOS to meet your needs will only need to do the following:
schedule periodic tasks
schedule round robin tasks
perform context switching
Your requirements are the following:
execute a periodic task every 500ms
in the extra time in between, execute round-robin tasks (doing non-time-critical operations)
An RTOS like this will give you a 99.9% chance that your code will execute on time. I can't say 100% because whatever operations you do in your ISRs may interfere with the RTOS. This is a problem with 8-bit microcontrollers that can only execute one instruction at a time.
Writing an RTOS is tricky, but doable. Here is an example of a small (900 line) RTOS targeted at ATMEL's 8-bit AVR platform.
The following is the Report and Code created for the class CSC 460: Real Time Operating Systems ( at the University of Victoria ).
Would this do what you need?
#define FUDGE_MARGIN 2 //In 10ms increments
volatile unsigned int ticks = 0;
void timer_10ms_interrupt( void ) { ticks++; }
void mainloop( void )
{
unsigned int next_time = ticks+50;
while( 1 )
{
do_mainloopy_stuff();
if( ticks >= next_time-FUDGE_MARGIN )
{
while( ticks < next_time );
do_500ms_thingy();
next_time += 50;
}
}
}
NB: If you got behind with servicing your every-500ms task then this would queue them up, which may not be what you want.
One straightforward solution is to have a timer interrupt that fires off at 500ms...
If you have some flexibility in your hardware design, you can cascade the output of one timer to a second stage counter to get you a long time base. I forget, but I vaguely recall being able to cascade timers on the x51.
Ah, one more alternative for consideration -- the x51 architecture allows two levels of interrupt priorities. If you have some hardware flexibility, you can cause one of the external interrupt pins to be raised by the timer ISR at 500 ms intervals, and then let the lower-priority interrupt processing of your every-500ms code occur.
Depending on your particular x51, you might be able to also generate a lower priority interrupt completely internal to your device.
See part 11.2 in this document I found on the web: http://www.esacademy.com/automation/docs/c51primer/c11.htm
Why do you have a time-critical routine that takes so long to run?
I agree with some of the others that there may be an architectural issue here.
If the purpose of having precise 500 ms (or whatever) intervals is to have signal changes occurring at specific time intervals, you may be better off with a fast ISR that outputs the new signals based on a previous calculation, and then sets a flag that causes the new calculation to run outside of the ISR.
Can you better describe what this long-running routine is doing, and what the need for the specific interval is for?
Addition based on the comments:
If you can ensure that the time in the service routine is of a predictable duration, you might get away with missing the timer interrupt postings...
To take your example, if your timer interrupt is set for 10 ms periods, and you know your service routine will take 89ms, just go ahead and count up 41 timer interrupts, then do your 89 ms activity and miss eight timer interrupts (42nd to 49th).
Then, when your ISR exits (and clears the pending interrupt), the "first" interrupt of the next round of 500ms will occur about a ms later.
The fact that you're "resource maxed" suggests that you have your other timer and interrupt sources in use as well -- which means that relying on the main loop to be timed accurately isn't going to work, because those other interrupt sources could fire at the wrong moment.
If I'm interpreting your question correctly, you have:
a main loop
some high priority operation that needs to be run every 500ms, for a duration of up to 89ms
a 10ms timer that also performs a small number of operations.
There are three options as I see it.
The first is to use a second timer of a lower priority for your 500ms operations. You can still process your 10ms interrupt, and once complete continue servicing your 500ms timer interrupt.
Second option - do you actually need to service your 10 ms interrupt every 10 ms? Is it doing anything other than timekeeping? If not, and if your hardware will allow you to determine the number of 10 ms ticks that have passed while processing your 500 ms ops (i.e. by not using the interrupts themselves), then you can start your 500 ms ops within the 10 ms interrupt and process the 10 ms ticks that you missed when you're done.
Third option: To follow on from Justin Tanner's answer, it sounds like you could produce your own preemptive multitasking kernel to fill your requirements without too much trouble.
It sounds like all you need is two tasks - one for the main super loop and one for your 500ms task.
The code to swap between two contexts (ie. two copies of all of your registers, using different stack pointers) is very simple, and usually consists of a series of register pushes (to save the current context), a series of register pops (to restore your new context) and a return from interrupt instruction. Once your 500ms op's are complete, you restore the original context.
(I guess that strictly this is a hybrid of preemptive and cooperative multitasking, but that's not important right now)
edit:
There is a simple fourth option. Liberally pepper your main super loop with checks for whether the 500ms has elapsed, both before and after any lengthy operations.
Not exactly 500ms, but you may be able to reduce the latency to a tolerable level.