I am running TensorFlow for Windows with a Titan X GPU (12 GB memory). When I try to train a network for images of 256X256X1 with mini-batches larger than 50 images, my computer just crashes and restarts automatically. With smaller mini-batches it runs just fine.
Any clues on what might be causing this?
I've seen similar problems being discussed in some gaming forums, where the PC would just shut down when the GPU was under heavy load. The reason was usually that the GPU was drawing more power than the power supply unit could handle. Check e.g. here or here. So may be it's worth investigating whether your PSU is the culprit.
Edit: May be the program SpeedFan can help you debugging this - it is able to show both voltages and readings of temperature sensors, which would also tell you if your PC is overheating (I've never used the tool myself, and I'm not affiliated with it either, just found it online).
Related
When playing certain games or viewing certain websites, my computer will suddenly crash and my monitor will display "HDMI no signal" the computer cannot be restarted without unplugging it from the wall. Upon viewing the crash report I see event 10016 related to permissions I think, but I'm a moron. Any and all solutions are greatly appreciated. Relevant components are as follows:
Graphics Card: RTX 2080
Power supply: EVGA supernova 1000g2
Storage: Sandisk 500Gb
CPU: Ryzen 2700X
Monitor: Both HP EliteDisplay E222 and another HP monitor
Since you are not supplying your q with the crash report, I can only suspect your problem is rooted to either one of these:
Bug in the accompanying display driver and/or directX installation
Proposed solution : try and obtain the latest version of your RTX 2080, do a 2D and 3D test run afterwards to ensure everythings proper
Fan or cooling related issue. Some games might force your hardwares to work harder, especially over continuous use. Check your fan and coolings to ensure they are moving and cooling as fast as they should. Also install a temp monitoring software if you need to be extra sure.
Hope those help m8
I have a couple questions on hardware for a Deep Learning project I'm starting, I intend to use pyTorch for Neural Networks.
I am thinking about going for an 8th Gen CPU on a z390 (I'll wait month to see if prices drop after 9th gen CPU's are available) so I still get a cheaper CPU that can be upgraded later.
Question 1) Are CPU cores going to be beneficial would getting the latest Intel chips be worth the extra cores, and if cores on CPU will be helpful, should I just go AMD?
I am also thinking about getting a 1080ti and then later on, once I'm more proficient adding two more 2080ti's, I would go for more but it's difficult to find a board to fit 4.
Question 2) Does mixing GPU's effect parallel processing, Should I just get a 2080ti now and then buy another 2 later. And a part b to this question do the lane speeds matter, should I spend more on a board that doesn't slow down the PCIe slots if you utilise more than one.
Question 3) More RAM? 32GB seems plenty. So 2x16gb sticks with a board that can has 4 slots up to 64gb.
The matter when running multi GPU is also the number of available PCIe lanes. If you may go for up to 4 GPUs, I'd go for AMD Threadrippers for the 64 PCIe lanes.
For machine learning in a general manner, core & thread count is quite important, so TR is still a good option, depending on the budget of course.
Few poeple mention that running an instance on each GPU may be more interesting, if you do so, mising GPUs is not a problem.
32GB of ram seems good, no need to go for 4 sticks if your CPU does not support quad channel indeed.
I'm running a tensorflow code on an Intel Xeon machine with 2 physical CPU each with 8 cores and hyperthreading, for a grand total of 32 available virtual cores. However, I run the code keeping the system monitor open and I notice that just a small fraction of these 32 vCores are used and that the average CPU usage is below 10%.
I'm quite the tensorflow beginner and I haven't configured the session in any way. My question is: should I somehow tell tensorflow how many cores it can use? Or should I assume that it is already trying to use all of them but there is a bottleneck somewhere else? (for example, slow access to the hard disk)
TensorFlow will attempt to use all available CPU resources by default. You don't need to configure anything for it. There can be many reasons why you might be seeing low CPU usage. Here are some possibilities:
The most common case, as you point out, is the slow input pipeline.
Your graph might be mostly linear, i.e. a long narrow chain of operations on relatively small amounts of data, each depending on outputs of the previous one. When a single operation is running on smallish inputs, there is little benefit in parallelizing it.
You can also be limited by the memory bandwidth.
A single session.run() call takes little time. So, you end up going back and forth between python and the execution engine.
You can find useful suggestions here
Use the timeline to see what is executed when
Where I work, we do a lot of numerical computations and we are considering buying workstations with NVIDIA video cards because of CUDA (to work with TensorFlow and Theano).
My question is: should these computers come with another video card to handle the display and free the NVIDIA for the GPGPU?
I would appreciate if anyone knows of hard data on using a video card for display and GPGPU at the same time.
Having been through this, I'll add my two cents.
It is helpful to have a dedicated card for computations, but it is definitely not necessary.
I have used a development workstation with a single high-end GPU for both display and compute. I have also used workstations with multiple GPUs, as well as headless compute servers.
My experience is that doing compute on the display GPU is fine as long as demands on the display are typical for software engineering. In a Linux setup with a couple monitors, web browsers, text editors, etc., I use about 200MB for display out of the 6GB of the card -- so only about 3% overhead. You might see the display stutter a bit during a web page refresh or something like that, but the throughput demands of the display are very small.
One technical issue worth noting for completeness is that the NVIDIA driver, GPU firmware, or OS may have a timeout for kernel completion on the display GPU (run NVIDIA's 'deviceQueryDrv' to see the driver's "run time limit on kernels" setting). In my experience (on Linux), with machine learning, this has never been a problem since the timeout is several seconds and, even with custom kernels, synchronization across multiprocessors constrains how much you can stuff into a single kernel launch. I would expect the typical runs of the pre-baked ops in TensorFlow to be two or more orders of magnitude below this limit.
That said, there are some big advantages of having multiple compute-capable cards in a workstation (whether or not one is used for display). Of course there is the potential for more throughput (if your software can use it). However, the main advantage in my experience, is being able to run long experiments while concurrently developing new experiments.
It is of course feasible to start with one card and then add one later, but make sure your motherboard has lots of room and your power supply can handle the load. If you decide to have two cards, with one being a low-end card dedicated to display, I would specifically advise against having the low-end card be a CUDA-capable card lest it get selected as a default for computation.
Hope that helps.
In my experience it is awkward to share a GPU card between numerical computation tasks and driving a video monitor. For example, there is limited memory available on any GPU, which is often the limiting factor in the size of a model you can train. Unless you're doing gaming, a fairly modest GPU is probably adequate to drive the video. But for serious ML work you will probably want a high-performance card. Where I work (Google) we typically put two GPUs in desk-side machines when one is to be used for numerical computation.
This might seem weird, but I'm interesting in creating an electric heater out of my computer, that is program an application, that heats up my PC, and I need some help.
I currently made an application, that runs infinite loops on the GPU (using a little shader), and on the CPU cores, however I'm interesting in getting the ram going too, as well as the several output ports, so.. About the ram heating, just allocate, and start randomly accessing and writing using all 8 cores?
And what about triggering CD-ROM, floppy etc, how do I do this?
How about heater with a purpose? Just run World Community Grid, create tons of heat while making your computer do valuable computations for science. It runs the processors wide open, is stable, and isn't just wasting cycles.
Have a look at How to stress test a computer If your interested in making your own try searching for open source stress test software that you could modify to your liking.
Use Furmark together with LinX/Prime95. Max out your settings. Make sure you have a strong enough PSU.
There`s a torture test option for CPU & RAM in Prime95 that looks like what you want. As for the GPU, there is Furmark which achieves the same kind of stress.
The heat from the other components will likely be not relevant (unless you have something really specific like a physx card) if you stress enough your cpu and gpu imho.