CUDA - NVIDIA driver crash while running

I run a raytracer in CUDA with N bounces (each ray bounces N times).
I view the results using OpenGL.
When N is small (1-4) everything works great. When I make N big (~10), each of the roughly 800x1000 threads has to do a lot of computing, and that's when the screen goes black and then comes back on, with a message that my NVIDIA driver crashed.
I searched online and now think the cause is some sort of watchdog timer, since I use the same graphics card for my display and my computing (the computation takes more than 2 seconds, so the driver resets itself).
Is there a command to make the host (CPU) wait for the device (GPU) for as long as it takes?
What do I need to do? I'm stuck :(
Thanks.

Based on your description, you are running on Windows Vista or Windows 7. Windows operating systems have a watchdog timer, as you guessed. The watchdog timer only applies to GPUs with displays attached.
The easiest solution is to run 2 or more GPUs, and run CUDA on GPU(s) without a display attached.
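If you do go the multi-GPU route, you can also check programmatically which device is subject to the watchdog: the CUDA runtime reports it as the kernelExecTimeoutEnabled field of cudaDeviceProp. A minimal sketch (error checking omitted) that picks the first device without a kernel execution timeout:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Returns the first CUDA device that has no kernel execution timeout
    // (i.e. no display watchdog), falling back to device 0 otherwise.
    int pickComputeDevice()
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            if (prop.kernelExecTimeoutEnabled == 0) {
                printf("Using device %d (%s), no watchdog\n", i, prop.name);
                return i;
            }
        }
        printf("All devices have the watchdog enabled, using device 0\n");
        return 0;
    }

Call cudaSetDevice(pickComputeDevice()) before any allocations or kernel launches.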
You can disable the watchdog timer. See this question for more details. However, you should do so with care: remember that while a long-running kernel is executing on your primary display GPU, your computer will be completely unresponsive (at least, you won't be able to see what it is doing) until the kernel completes.
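If a second GPU is not an option and you would rather not disable the watchdog, the usual workaround is to keep every individual kernel launch short, for example by doing one bounce per launch and keeping the ray state resident in device memory between launches. A rough sketch of the idea; RayState and traceOneBounce below are hypothetical stand-ins for your own ray payload and kernel, not part of any API:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical ray payload kept resident in device memory between launches.
    struct RayState {
        float origin[3], direction[3], colour[3];
        int   alive;
    };

    // Hypothetical kernel that advances every ray by a single bounce.
    __global__ void traceOneBounce(RayState* rays, int numRays)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numRays || !rays[i].alive) return;
        // ... intersect the scene, shade, update rays[i], clear alive on a miss ...
    }

    void render(RayState* d_rays, int numRays, int numBounces)
    {
        const int threads = 256;
        const int blocks  = (numRays + threads - 1) / threads;

        for (int bounce = 0; bounce < numBounces; ++bounce) {
            traceOneBounce<<<blocks, threads>>>(d_rays, numRays);

            // Each launch stays well under the watchdog limit, and the host
            // waits here for the device to finish before launching the next one.
            cudaError_t err = cudaDeviceSynchronize();
            if (err != cudaSuccess) {
                // cudaErrorLaunchTimeout here means a launch still ran too long.
                fprintf(stderr, "bounce %d failed: %s\n",
                        bounce, cudaGetErrorString(err));
                break;
            }
        }
    }

This also answers the "make the host wait" part of the question: cudaDeviceSynchronize blocks the CPU until the GPU is done, but it does not lift the driver's timeout on any single kernel launch.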

Related

NVIDIA GPU slows down unexpectedly

We have an application that takes data from a camera and processes it in real time on the GPU to render a scene. The GPU is an NVIDIA RTX 3000 in a Lenovo T15 laptop.
Our application starts and all goes well for a first rendering session: FPS holds at 30, GPU power draw is around 70 W, and CPU usage is around 50%. An application session lasts a few minutes and results are fully rendered in real time.
Then we initiate a second application session and the FPS plummets and rendering lags, which is unacceptable for our application (we are in the health space). Power draw drops to 30 W, and neither the GPU nor the CPU appears to be loaded at all.
We tried moving part of the initialization code to the point where we detect the beginning of a session; that works better and lets us handle a few sessions in a row, but the behavior eventually reappears.
Temperature does not seem to be involved here, so we do not think the GPU is being thermally throttled.
Any suggestions on where to look and how to get more deterministic behavior? Anything specific we could reset between sessions?
Thanks a lot for any help.

Advanced GPU control needed for BrowserWindows

I would like to limit how much GPU RAM is used, so as to prevent the app from running out of memory on Windows. Does anyone have any expert advice?
I am using Electron to build an automatic player for Windows. The player plays a mix of videos (H.264-encoded MP4), HTML and JPEG content based on a schedule (sort of like a presentation).
I tested the app on several Windows devices, and the results vary greatly!
All devices are tiny computers by Asus. In general I noticed two distinct differences:
On devices that have no hardware acceleration, the Chromium GPU process uses about 30 MB of shared RAM, and this number never changes, regardless of the content played. The CPU, however, carries all the load here, meaning it is decoding the MP4s (H.264) in software instead of hardware.
On devices with hardware acceleration the CPU load is of course lower, but the RAM used by the Chromium GPU process varies greatly. While displaying JPEG or HTML content it sits around 0.5 GB; when MP4s kick in, it easily goes up to 2 GB and more.
On the stronger devices without hardware acceleration this is not a big issue; they have 8 GB of shared memory or more and don't crash. However, some of the other devices have only 4 GB of shared memory and can run out of memory quite easily.
When memory runs out, either the app crashes completely (a memory overflow message is displayed) or it just hangs (it keeps running but doesn't do anything anymore, usually just displaying a white screen).
I know that I can pass certain flags to BrowserWindow using app.commandLine.appendSwitch.
These are a few of the flags I tried and the effect they had (I found a list of them here):
--force-gpu-mem-available-mb=600 ==> no effect whatsoever; the process behaves as before and still surpasses 2 GB of RAM.
--disable-gpu ==> This one obviously worked, but it is undesirable because it disables hardware acceleration completely.
--disable-gpu-memory-buffer-compositor-resources ==> no change
--disable-gpu-memory-buffer-video-frames ==> no change
--disable-gpu-rasterization ==> no change
--disable-gpu-sandbox ==> no change
Why do some of these command line switches have no effect on the GPU behaviour? All devices have an onboard GPU and shared RAM. I know the command line switches are applied at startup, because when I check the processes in the Windows Task Manager I can see the switches have been passed to the processes (using the command line column in Task Manager). So the switches are loaded but still appear to be ignored.

How does a CPU idle (or run below 100%)?

I first learned about how computers work in terms of a primitive single stored program machine.
Now I'm learning about multitasking operating systems, scheduling, context switching, etc. I think I have a fairly good grasp of it all, except for one thing. I have always thought of a CPU as something which is just charging forward non-stop. It always knows where to go next (program counter), and it goes to that instruction, etc, ad infinitum.
Clearly this is not the case since my desktop computer CPU is not always running at 100%. So how does the CPU shut itself off or throttle itself down, and what role does the OS play in this? I'm guessing there's an input on the CPU somewhere which allows it to power down... and the OS can set this if it has nothing to schedule, but the next logical question is how does it start back up again? I'm guessing either one of two things:
It never shuts down completely, just runs at a very low frequency waiting for the scheduler to get busy again
It shuts down completely but is woken up by interrupts
I searched all over for info on this and came up fairly empty-handed. Any insight would be much appreciated.
The answer is that it depends on the hardware, the operating system, and the way the operating system has been configured.
And it could involve either or both of the strategies you proposed.
Another possibility for machines based on the x86 architecture, is that x86 has an HLT instruction that causes the core to stop until it receives an external interrupt. So the "Idle" task could simply execute HLT in a tight loop.
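For illustration, such an idle loop looks roughly like the sketch below. HLT is a privileged instruction, so this is kernel-mode code describing what an OS idle task does, not something a normal user program can run:

    /* Sketch of an x86 idle task (kernel mode, GCC inline assembly).
     * "sti" re-enables interrupts and "hlt" stops the core until the next
     * interrupt (timer tick, device IRQ, ...); the pair is the classic
     * idle idiom because sti takes effect only after the following
     * instruction, so no interrupt can slip in between the two.        */
    static void idle_task(void)
    {
        for (;;) {
            __asm__ volatile ("sti; hlt");
        }
    }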
Just go to Task Manager, Performance tab, and watch the CPU usage while you're doing absolutely nothing on your computer; it never stops fluctuating. With an operating system like Windows running, the CPU is always doing something; it never completely shuts down.
Even having your monitor display an image requires the CPU to run code, and so on.
Everything runs through the CPU; like your brain, it controls everything, and nothing would function without it.
Some CPUs do have a 'wait for interrupt' instruction which allows the CPU to stop executing instructions when there is nothing to do, and will not re-awake until there is an interrupt event. This is particularly useful in microcontrollers, where they can sit for long periods of time waiting for something to happen.
Intel = HLT (Halt)
ARM = WFI (Wait for interrupt)
Sometimes a 'busy wait' is also used, where the CPU sits in a little 'idle' loop, checking for things to do. In this case, the CPU is still running instructions, but the operating system is in an idle state. It's not as efficient as using HLT.
Modern CPUs can also adjust their power usage, and are capable of reducing clock rates, or shutting down parts of the CPU that aren't being used. In this way, power usage during an active idle state can be less than during active processing, even though the core CPU is still running and executing instructions.
Speaking about the x86 architecture: when an operating system has nothing to do, it can use the HLT instruction.
The HLT instruction stops the CPU until the next interrupt.
See http://en.m.wikipedia.org/wiki/HLT for details.
Other architectures have similar instructions to give the CPU a rest.

Improving the efficiency of Kinect for Windows DTWGestureRecognizer Application

Currently I am using the DTWGestureRecognizer open source tool for Kinect SDK v1.5. I have recorded a few gestures and use them to navigate through Windows 7. I also have implemented voice control for simple things such as opening PowerPoint, Chrome, etc.
My main issue is that the application uses quite a bit of CPU power, which causes it to become slow. During gestures and voice commands, CPU usage sometimes spikes to 80-90%, which makes the application unresponsive for a few seconds. I am running it on a 64-bit Windows 7 machine with an i5 processor and 8 GB of RAM. I was wondering whether anyone with experience using this tool, or Kinect in general, has found ways to make it more efficient and less of a resource hog.
Right now I have removed the sections that display the RGB video and the depth video, but even that did not make a big impact. Any help is appreciated, thanks!
Some of the factors I can think of are:
Reduce the resolution.
Reduce the frames being recorded/processed by the application by using the polling model, i.e. the OpenNextFrame(int millisecondsWait) method of DepthStream, ColorStream and SkeletonStream, instead of the event model.
Make sure the tracking mode is Default rather than Seated (sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Default), as Seated consumes more resources.
Use sensor.MapDepthFrameToColorFrame instead of calling the sensor.MapDepthToColorImagePoint method in a loop.
Last and most important is the algorithm used in the open source tool itself.

High resolution timer on Coldfire (MCF5328)

I've inherited an embedded project that requires some simple, per-function performance profiling. It consists of a Coldfire (MCF5328) running uClinux (2.6.17.7-uc1).
I'm not an expert on either the Coldfire, or uClinux (or Linux for that matter), so excuse my ignorance.
In Windows I would simply use QueryPerformanceCounter to access the x86 high-resolution timer. Record the counter before and after and compare the difference.
I've learned that Linux has a number of variations on QueryPerformanceCounter:
clock_gettime/res
getnstimeofday
ktime_x
Or even access to the Time Stamp Counter via
get_cycles
None of these are available on the uClinux build this device is running. So it appears that the OS has no high-resolution timer access.
Does this mean that the Coldfire itself provides no such feature? Or did the author of the uClinux port leave them out? Is there something on the hardware that I can use, and how would I go about using it?
Given how old your kernel is, you may not have support for high-resolution timers.
If you are writing a kernel driver, the APIs are different. If get_cycles() is stubbed out, it probably means your CPU architecture doesn't support a cycle counter. Since your kernel is very old, do_gettimeofday is probably the best you can do, short of writing a driver to directly query some timer hardware.
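From user space on a system this old, the closest thing is plain gettimeofday(2), which gives microsecond granularity at best; a minimal sketch of timing a function with it (the function under test is left as a placeholder):

    #include <stdio.h>
    #include <sys/time.h>

    /* Microseconds elapsed between two gettimeofday() samples. */
    static long elapsed_usec(struct timeval a, struct timeval b)
    {
        return (b.tv_sec - a.tv_sec) * 1000000L + (b.tv_usec - a.tv_usec);
    }

    int main(void)
    {
        struct timeval start, stop;

        gettimeofday(&start, NULL);
        /* ... function under test ... */
        gettimeofday(&stop, NULL);

        printf("took %ld us\n", elapsed_usec(start, stop));
        return 0;
    }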
I ended up using one of the four DMA timers on the Coldfire. It was a simple matter to enable the timer as a free-running, non-interrupt-generating counter. This gives a 12.5 ns tick (at 80 MHz).
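For anyone doing the same thing: once the timer is free-running, profiling is just a matter of sampling the memory-mapped counter register before and after the code of interest. The register address below is a placeholder, not the real one; the actual DMA timer base address and the mode-register setup come from the MCF5328 reference manual (or your board support code).

    #include <stdint.h>

    /* Placeholder address: substitute the real DTCN0 (DMA timer 0 counter)
     * register address from the MCF5328 reference manual.                */
    #define DTCN0 (*(volatile uint32_t *)0x40000400)

    /* At an 80 MHz timer clock, one tick is 12.5 ns. */
    static uint64_t ticks_to_ns(uint32_t ticks)
    {
        return (uint64_t)ticks * 25u / 2u;
    }

    /* The counter is free-running, so unsigned subtraction handles
     * wrap-around as long as the measured interval is shorter than one
     * full counter period.                                              */
    static uint64_t time_block_ns(void (*fn)(void))
    {
        uint32_t start = DTCN0;
        fn();
        return ticks_to_ns(DTCN0 - start);
    }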