I am writing a small utility that will get the hardware and software time and print in a file.
This is to check whether both are in sync. I am searching for a vxworks function that prints the hardware time along with milliseconds.
Thanks
I looked this one up for you in the VxWorks 7.0 Manual.
try clock() - if this doesn't work (not able to test it) search the manual for terms like 'Time' and 'Clock' - yielded good results to me.
Related
I'm just starting to learn FreeRTOS and have set up the Posix/Linux simulator on my laptop. The Blinky demo works fine but the Full demo breaks after around 5000 ticks with the error Error: StreamBuffer - tick count 50000. The Demo file produces a trace dump when you exit the application but so far as I can see it is completely unreadable. It appears as seemingly random text characters as if I have the wrong encoding or an incorrect baud rate (if I was using a physical device). I can only assume that the dump files are not supposed to be viewed as a normal text file but I cannot find this in the documentation.
Thanks in advance
That is almost certainly a false positive error that will be due to running so many self monitoring tests at the same time - some of which assume one or more of their test tasks is the only task running at the highest priority. The trace comes from the Percepio tool, see https://www.freertos.org/trace
I'm trying to write a code in which every 1 ms a number plused one , should be replaced the old number . (something like a chronometer ! ) .
the problem is whenever the cpu usage increases because of some other programs running on the pc, this 1 milliseconds is also increased and timing in my program changes !
is there any way to prevent cpu load changes affecting timing in my program ?
It sounds as though you are trying to generate an analogue output waveform with a digital-to-analogue converter card using software timing, where your software is responsible for determining what value should be output at any given time and updating the output accordingly.
This is OK for stationary or low-speed signals but you are trying to do it at 1 ms intervals, in other words to output 1000 samples per second or 1 ks/s. You cannot do this reliably on a desktop operating system - there are too many other processes going on which can use CPU time and block your program from running for many milliseconds (or even seconds, e.g. for network access).
Here are a few ways you could solve this:
Use buffered, hardware-clocked output if your analogue output device supports it. Instead of writing one sample at a time, you send the device a waveform or array of samples and it outputs them at regular intervals using a timing signal generated in hardware. Unfortunately, low-end DAQ devices often don't support hardware-clocked output.
Instead of expecting the loop that writes your samples to the AO to run every millisecond, read LabVIEW's Tick Count (ms) value in the loop and use that as an index to your array of samples: rather than trying to output every sample, your code will now say 'what time is it now, and therefore what should the output be?' That won't give you a perfect signal out but at least now it should keep the correct frequency rather than be 'slowed down' - instead you will see glitches imposed on the signal whenever the loop can't keep up. This is easy to test and maybe it will be adequate for your needs.
Use a real-time operating system instead of a desktop OS. In the case of LabVIEW this would mean using the Real-Time software module and either a National Instruments hardware device that supports RT, such as the CompactRIO series, or installing the RT OS on a dedicated PC if the hardware is compatible. This is not a cheap option, obviously (unless it's strictly for personal, home use). In any case you would need to have an RT-compatible driver for your output device.
Use your computer's sound output as the output device. LabVIEW has functions for buffered sound output and you should be able to get reliable results. You'll need to upsample your signal to one of the sound output's available sample rates, probably 44.1 ks/s. The drawbacks are that the output level is limited in range and is not calibrated, and will probably be AC-coupled so you can't output a DC or very low-frequency signal. However if the level is OK for what you want to connect it to, or you can add suitable signal conditioning, this could be a neat solution. If you need the output level to be calibrated you could simultaneously measure it with your DAQ card and scale the sound waveform you're outputting to keep it correct.
The answer to your question is "not on a desktop computer." This is why products like LabVIEW Real-Time and dedicated deterministic hardware exist: you need a computer built around dedication to a particular process in order to consistently serve that process. Every application in a regular Windows/Mac/Linux desktop system has the problem you are seeing of potentially being interrupted by other system processes, particularly in its UI layer.
There is no way to prevent cpu load changes from affecting timing in your program unless the computer has a realtime clock.
If it doesn't have a realtime clock, there is no reason to expect it to behave deterministically. Do you need for your program to run at that pace?
having trouble understanding the exact role of an interpreter. to quote wikipedia - "Programs in interpreted languages[1] are not translated into machine code however, although their interpreter (which may be seen as an executor or processor) typically consists of directly executable machine code (generated from assembly and/or high level language source code)."
my doubt is about this statement - "interpreter (which may be seen as an executor or processor) typically consists of directly executable machine code" ? what does that mean? interpreter is supposed to be a program .How can it 'execute' code by itself ? they have re-stated this fact by saying " interpreter is different from language translators like compilers". Can anyone clarify please ? Also what is the difference (if any) between interpreted language and machine code ?
Compiler:
Transforms your code into binary machine code which can be directly executed by the CPU. Example: C, Fortran
Interpreter:
Is a program that executes the code written by the programmer without an additional step of transformation. Example: Bash scripts, Formulas in Excel
Actually it is not that easy any more. There are many concepts between these two pols. Java is compiled into an intermediate language that is then interpreted, just-in-time compilers compile small parts of interpreted code to speed them up.
"How can it 'execute' code by itself?" Take the Excel example. If you type a calculation into a cell, Excel somehow executes the code, right? But Excel does not compile the code and run it, but it parses it and executes in a general way. Excel has a sum function that in the end is executed on the processor as an add machine command, but there is a lot to do for Excel in between.
I will briefly describe an emulator to explain the main concept mentioned in the question.
Suppose I am using Mame, a video game emulator, and select the old classic arcade "Miss PacMan". Looking at the schematic or looking directly at a PCB inside an arcade video game, it is easy to find the processor : the zilog Z80, the only large chip with 40 pins. Now, if we get the technical data for that processor, we can find the binary encoding for each instruction it can execute. Basically, it get a 8-bit data (value ranging from 0 to 255) which tells the processor what to do. In the case of the emulator, it read the byte (the exact same bytes as would do the Z80 processor inside the original miss pac-man electronic board), determine what a Z80 would do and simulate the instruction.
Some classic video game may have use a x86 processor, similar to the one currently used in most PC. Even when selecting such a game in Mame, the emulator would still read the bytes as found in that game and interpret each one the way the x86 processor would do. In other words, the emulator would not take advantage of the fact that the PC and the emulated game are using a similar processor. It would perform the same steps to emulate any game no matter if the PC on which Mame is running share any similitude with the original game.
You are asking how an interpreter could execute code? The interpreter is a program (the interpreter is just a software, not a physical processor). The wording is effectively confusing. For this sentence to make sense, we would need all the following conditions:
1 - the program to interpret is already in binary, in a machine language that can be executed directly by the processor used in your PC
2 - the program location, the exact address used, is the same as the location that you can reserve in your PC
3 - any library and any I/O occupy the exact same address
When all these condition can be meet, the interpreter could just tell the processor on your PC to stop executing the code from the interpreter but instead, "jump" in the code of the program to be interpreted. Anyone could then say : it is not an interpreter, it is just a launcher.
Maybe such an interpreter which actually does not interpret but let your processor do the real job is still useful in the following way: it could let your processor perform some of the work, but request the generation of an exception when the code to be interpreted is executing some type of instruction. For example, let the code running, but generate a "general protection error" or "trap" or "exception" when trying to execute any of the variant of "IN" or "OUT". The interpreter would take note of the I/O port being written or it would choose a value to give instead of allowing to read a real I/O port. The interpreter would then manage to get the processor "jump" in the program to interpret at the location just after the instruction "IN" or "OUT".
Normally, an interpreter read an ASCII text file, the original source code (which could be Unicode instead of ASCII), determine line by line, word by word, what a compiler would do, then simulate the task on the fly. When the original compiler would need to read many lines to fully understand the current task, the interpreter would also need to read all these lines before being able to simulate the same task.
A big advantage of an interpreter is that it can not crash. Because every instruction is simulated, it is not sensitive to any bug or malicious code. That was a big advantage at the time when computers needed to reboot after encountering any bug, at a time where reboot was taking 10 minutes or more.
Today, with fast SSD to reboot in 5 second and with reliable operating systems which can trap any error in one process and close that process without affecting the stability of the machine, there is less incentive to prefer a slow interpreter over a much faster JIT or much much faster binary executable
I am somewhat familiar with the CUDA visual profiler and the occupancy spreadsheet, although I am probably not leveraging them as well as I could. Profiling & optimizing CUDA code is not like profiling & optimizing code that runs on a CPU. So I am hoping to learn from your experiences about how to get the most out of my code.
There was a post recently looking for the fastest possible code to identify self numbers, and I provided a CUDA implementation. I'm not satisfied that this code is as fast as it can be, but I'm at a loss as to figure out both what the right questions are and what tool I can get the answers from.
How do you identify ways to make your CUDA kernels perform faster?
If you're developing on Linux then the CUDA Visual Profiler gives you a whole load of information, knowing what to do with it can be a little tricky. On Windows you can also use the CUDA Visual Profiler, or (on Vista/7/2008) you can use Nexus which integrates nicely with Visual Studio and gives you combined host and GPU profile information.
Once you've got the data, you need to know how to interpret it. The Advanced CUDA C presentation from GTC has some useful tips. The main things to look out for are:
Optimal memory accesses: you need to know what you expect your code to do and then look for exceptions. So if you are always loading floats, and each thread loads a different float from an array, then you would expect to see only 64-byte loads (on current h/w). Any other loads are inefficient. The profiling information will probably improve in future h/w.
Minimise serialization: the "warp serialize" counter indicates that you have shared memory bank conflicts or constant serialization, the presentation goes into more detail and what to do about this as does the SDK (e.g. the reduction sample)
Overlap I/O and compute: this is where Nexus really shines (you can get the same info manually using cudaEvents), if you have a large amount of data transfer you want to overlap the compute and the I/O
Execution configuration: the occupancy calculator can help with this, but simple methods like commenting the compute to measure expected vs. measured bandwidth is really useful (and vice versa for compute throughput)
This is just a start, check out the GTC presentation and the other webinars on the NVIDIA website.
If you are using Windows... Check Nexus:
http://developer.nvidia.com/object/nexus.html
The CUDA profiler is rather crude and doesn't provide a lot of useful information. The only way to seriously micro-optimize your code (assuming you have already chosen the best possible algorithm) is to have a deep understanding of the GPU architecture, particularly with regard to using shared memory, external memory access patterns, register usage, thread occupancy, warps, etc.
Maybe you could post your kernel code here and get some feedback ?
The nVidia CUDA developer forum forum is also a good place to go for help with this kind of problem.
I hung back because I'm no CUDA expert, and the other answers are pretty good IF the code is already pretty near optimal. In my experience, that's a big IF, and there's no harm in verifying it.
To verify it, you need to find out if the code is for sure not doing anything it doesn't really have to do. Here are ways I can see to verify that:
Run the same code on the vanilla processor, and either take stackshots of it, or use a profiler such as Oprofile or RotateRight/Zoom that can give you equivalent information.
Running it on a CUDA processor, and doing the same thing, if possible.
What you're looking for are lines of code that have high occupancy on the call stack, as shown by the fraction of stack samples containing them. Those are your "bottlenecks". It does not take a very large number of samples to locate them.
I'm writing a Cocoa OS X (Leopard 10.5+) end-user program that's using timestamps to calculate statistics for how long something is being displayed on the screen. Time is calculated periodically while the program runs using a repeating NSTimer. [NSDate date] is used to capture timestamps, Start and Finish. Calculating the difference between the two dates in seconds is trivial.
A problem occurs if an end-user or ntp changes the system clock. [NSDate date] relies on the system clock, so if it's changed, the Finish variable will be skewed relative to the Start, messing up the time calculation significantly. My question:
1. How can I accurately calculate the time between Start and Finish, in seconds, even when the system clock is changed mid-way?
I'm thinking that I need a non-changing reference point in time so I can calculate how many seconds has passed since then. For example, system uptime. 10.6 has - (NSTimeInterval)systemUptime, part of NSProcessInfo, which provides system uptime. However, this won't work as my app must work in 10.5.
I've tried creating a time counter using NSTimer, but this isn't accurate. NSTimer has several different run modes and can only run one at a time. NSTimer (by default) is put into the default run mode. If a user starts manipulating the UI for a long enough time, this will enter NSEventTrackingRunLoopMode and skip over the default run mode, which can lead to NSTimer firings being skipped, making it an inaccurate way of counting seconds.
I've also thought about creating a separate thread (NSRunLoop) to run a NSTimer second-counter, keeping it away from UI interactions. But I'm very new to multi-threading and I'd like to stay away from that if possible. Also, I'm not sure if this would work accurately in the event the CPU gets pegged by another application (Photoshop rendering a large image, etc...), causing my NSRunLoop to be put on hold for long enough to mess up its NSTimer.
I appreciate any help. :)
Depending on what's driving this code, you have 2 choices:
For absolute precision, use mach_absolute_time(). It will give the time interval exactly between the points at which you called the function.
But in a GUI app, this is often actually undesirable. Instead, you want the time difference between the events that started and finished your duration. If so, compare [[NSApp currentEvent] timestamp]
Okay so this is a long shot, but you could try implementing something sort of like NSSystemClockDidChangeNotification available in Snow Leopard.
So bear with me here, because this is a strange idea and is definitely non-derterministic. But what if you had a watchdog thread running through the duration of your program? This thread would, every n seconds, read the system time and store it. For the sake of argument, let's just make it 5 seconds. So every 5 seconds, it compares the previous reading to the current system time. If there's a "big enough" difference ("big enough" would need to definitely be greater than 5, but not too much greater, to account for the non-determinism of process scheduling and thread prioritization), post a notification that there has been a significant time change. You would need to play around with fuzzing the value that constitutes "big enough" (or small enough, if the clock was reset to an earlier time) for your accuracy needs.
I know this is kind of hacky, but barring any other solution, what do you think? Might that, or something like that, solve your issue?
Edit
Okay so you modified your original question to say that you'd rather not use a watchdog thread because you are new to multithreading. I understand the fear of doing something a bit more advanced than you are comfortable with, but this might end up being the only solution. In that case, you might have a bit of reading to do. =)
And yeah, I know that something such as Photoshop pegging the crap out of the processor is a problem. Another (even more complicated) solution would be to, instead of having a watchdog thread, have a separate watchdog process that has top priority so it is a bit more immune to processor pegging. But again, this is getting really complicated.
Final Edit
I'm going to leave all my other ideas above for completeness' sake, but it seems that using the system's uptime will also be a valid way to deal with this. Since [[NSProcessInfo processInfo] systemUptime] only works in 10.6+, you can just call mach_absolute_time(). To get access to that function, just #include <mach/mach_time.h>. That should be the same value as returned by NSProcessInfo.
I figured out a way to do this using the UpTime() C function, provided in <CoreServices/CoreServices.h>. This returns Absolute Time (CPU-specific), which can easily be converted into Duration Time (milliseconds, or nanoseconds). Details here: http://www.meandmark.com/timingpart1.html (look under part 3 for UpTime)
I couldn't get mach_absolute_time() to work properly, likely due to my lack of knowledge on it, and not being able to find much documentation on the web about it. It appears to grab the same time as UpTime(), but converting it into a double left me dumbfounded.
[[NSApp currentEvent] timestamp] did work, but only if the application was receiving NSEvents. If the application went into the foreground, it wouldn't receive events, and [[NSApp currentEvent] timestamp] would simply continue to return the same old timestamp again and again in an NSTimer firing method, until the end-user decided to interact with the app again.
Thanks for all your help Marc and Mike! You both definitely sent me in the right direction leading to the answer. :)