I'm building a workstation and want to get into some heavy CUDA programming. I don't want to go all out and get the Tesla cards, and I've pretty much narrowed it down to either the Quadro 4000 or the GeForce GTX 480, but I don't really understand the difference. On paper the 480 has more cores (480 vs 256 for the Quadro 4000), yet the Quadro 4000 costs almost twice as much as the 480. Can someone explain the difference, and whether it justifies the higher price?
I will be doing scientific computing on it, so everything will be in double precision, if that makes a difference between them.
If you care about neither visualization nor rendering (drawing final results on screen, e.g. ray tracing), then the answer to your question is somewhat simpler, but still not trivial.
I'm not going to go into detail about the differences between Quadro and GeForce cards; I will just underline the points that matter most when choosing between them.
In general:
If you need lots of memory, you need a Tesla or a Quadro. Consumer cards currently max out at 1.5 GB (GTX 480), while Teslas and Quadros go up to 6 GB.
GF10x-series GeForce cards have their double-precision (FP64) performance capped at 1/8 of their single-precision (FP32) performance, while the architecture is capable of 1/2. This is yet another market-segmentation trick, quite popular nowadays among hardware manufacturers. Crippling the GeForce line is meant to give the Tesla line an advantage in HPC; otherwise the GTX 480 is in fact faster than the Tesla 20x0 series: 1.34 vs 1.03 TFLOPS (FP32) and 177.4 vs 144 GB/s memory bandwidth (peak).
Tesla and Quadro cards are (supposed to be) more thoroughly tested and therefore less prone to errors that are pretty much irrelevant in gaming but that can trash the results of scientific computing, where even a single bit flip matters. NVIDIA claims that Tesla cards are QC'd for 24/7 use.
A recent paper (Haque and Pande, Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU) suggests that Tesla is indeed less error prone.
My experience is that GeForce cards tend to be less reliable, especially under constant high load. Proper cooling is very important, as is avoiding overclocked cards, including factory-overclocked models (see Figure 1 of the previously mentioned paper).
So as a rule of thumb:
for development: GeForce (unless you absolutely need >1.5 GB of memory)
for production HPC/scientific computing:
Tesla: if you need lots of memory or FP64 (+reliability?)
Quadro: if you need FP64 and/or advanced rendering features (the new "Fermi" Teslas have rendering capabilities similar to a GeForce)
If you want to use FP64 intensively, forget about GeForce; otherwise
a non-factory-overclocked GeForce: saves money ;)
Back to the specifics of your question:
The two cards you mention are from entirely different leagues and therefore not directly comparable. If you need the Quadro's rendering features, get a Quadro. Otherwise the Quadro is not really worth it, especially not the 4000, which is even slower than a GTX 460 while costing ~3.5x more. I think you're better off with a GTX 470 or 480; just make sure you buy one that runs at stock frequencies.
Note that the crippled GeForce double-precision performance is not an issue in this comparison, but let me elaborate. The Quadro 4000 is a low-end model with, as far as I recall, only 450 MHz shaders (I can't find the reference at the moment, but it is definitely clocked lower than the Quadro 5000, which runs at 513 MHz), which gives it around 115 GFLOPS FP64. At the same time, the capped GTX 480 manages around 168 GFLOPS FP64, and even a GTX 460 reaches around 113 GFLOPS (peak).
Both FP32 performance and memory bandwidth are much lower on the Quadro 4000 compared to the GTX 480 (86.9 vs 177.4 GB/s)!
Note that in terms of theoretical peak performance the GTX 480 (data sheet) is considerably faster than both the Tesla C2050/C2070 and the Quadro 6000, and this is reflected in most applications.
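These peak figures are just shader count × clock × 2 (one fused multiply-add per core per cycle), with the FP64 cap applied on top. If you want to sanity-check the numbers on whatever card you end up with, here is a minimal sketch; it assumes a Fermi-class GPU (32 CUDA cores per SM) and applies the 1/2 vs 1/8 FP64 ratio by hand:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    const int coresPerSM = 32;                 // assumption: Fermi-class card (GTX 4xx, Tesla 20x0, Quadro x000)
    double clockGHz = prop.clockRate / 1e6;    // clockRate is reported in kHz
    double fp32 = prop.multiProcessorCount * coresPerSM * 2.0 * clockGHz;  // GFLOPS; one FMA = 2 FLOPs

    printf("%s: %d SMs @ %.0f MHz -> ~%.0f GFLOPS FP32 peak\n",
           prop.name, prop.multiProcessorCount, clockGHz * 1e3, fp32);
    printf("FP64 peak: ~%.0f GFLOPS on Tesla/Quadro (1/2), ~%.0f GFLOPS on GeForce (1/8)\n",
           fp32 / 2.0, fp32 / 8.0);
    return 0;
}
```

For a stock-clocked GTX 480 (15 SMs at 1401 MHz) this works out to the ~1.34 TFLOPS FP32 and ~168 GFLOPS FP64 quoted above.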
There are some small advantages to the Quadro/Tesla cards not mentioned above:
ECC memory. Going along with the point about fewer bit errors, the Tesla cards and the high-end Quadros (5000 and 6000, not the 4000) have ECC memory, which should decrease the rate of soft errors.
Number of slots (and some related power and cooling issues). The Quadro 4000 is a single-slot card. The Quadro 2000 is a single-slot, three-quarter-length card. While the GeForce GTX 480, 470, and even 460 can outdo the Quadro 4000 in single precision, you're not going to find them in a single slot. That means that if you're putting cards in a 1U server or a blade, or if you want six GPUs working in parallel in a single server for GPGPU work, you can do some interesting things that aren't easy with the GeForce range. If you can massively reduce the number of blades or the amount of rack space you need, then the extra price for each individual card is well worth it. All of this is related to the binning of the chips.
Certainly these advantages don't make any difference for most people. For certain uses, though, they're critical.
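As an aside on the multi-GPU point: driving several cards from one box is simple on the CUDA side; the density argument above is purely about physically fitting them in. A rough sketch (hypothetical kernel, no error checking) of fanning independent work out across every device present:

```
#include <vector>
#include <cuda_runtime.h>

__global__ void work(float *data, int n) {           // hypothetical per-GPU kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);                       // e.g. 6 in a dense GPGPU server
    const int n = 1 << 20;
    std::vector<float*> bufs(count);

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);                           // subsequent calls target this GPU
        cudaMalloc(&bufs[dev], n * sizeof(float));
        work<<<(n + 255) / 256, 256>>>(bufs[dev], n); // launches are asynchronous,
    }                                                 // so the GPUs crunch in parallel
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();                      // wait for this device to finish
        cudaFree(bufs[dev]);
    }
    return 0;
}
```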
The advantages of a "gamer" GPU (GeForce GTX series, like the GTX 780) over a "professional" GPU (Tesla series and Quadro series) for CUDA programming are:
GTX has better single precision performance
GTX has higher memory bandwidth
GTX costs much less
But
Quadro and Tesla generally have more memory
Quadro and Tesla offer better double precision performance
Some Quadro GPUs take only one PCI-E slot instead of two (as mentioned in another answer).
Clearly the choice of GPU depends on what your application needs, but I think that for most applications a GTX is the better choice. For example, in many image-processing applications single precision is enough, and a GTX clearly wins on performance and price. For instance, in this article written by the main developers of the OpenCV GPU library, the authors used an NVidia GTX 580 for benchmarking their results against the CPU. I'd say go with a Quadro or Tesla if you need better double-precision performance or more memory.
It's not obvious looking at the specs, but I think your need for double precision suggests the Quadro 4000 is the better match. Although the GeForce 480 has more cores and twice the memory bandwidth, at its heart it's a gaming card. Quadros are targeted at professional work and are better supported as a result. Also, the fact that the Quadro can do 64x antialiasing (vs. 32x on the GeForce) suggests a more capable card.
Related
I am in the process of buying a GPU for a production deployment. I am trying to decide whether to buy an off-the-shelf GPU like the Nvidia GTX 1060 or spend more money and buy the Jetson TX2. Space is not a constraint for us, but with some googling around I couldn't find a comparison of the two.
How should I go about comparing these two GPUs? Price-wise, the GTX 1060 is much cheaper than the Jetson TX2.
The specs of the two are readily available online, so I won't copy and paste them here.
Specs of TX2 and Specs of GTX 1060
In short, the TX2 is a good choice if:
you are sensitive to power consumption;
you don't need a lot of computational power;
the data throughput is not too heavy;
you have no problem developing in an Ubuntu environment;
Otherwise, the GTX 1060 is the one you should go for if:
you already have a PC as the host device;
this PC has PCIe slots and a sufficient power supply;
you may develop in a Windows environment;
the data throughput is heavy;
the computational requirements are tough;
In my opinion, one should always look at this from a system point of view, finding the optimal balance among computational power, power consumption, software integration, I/O speed, and so on.
I am part of a team working on a 3D game engine with a Vulkan rendering system. So far we have been testing on NVIDIA graphics cards, like the GTX 970, and have had decent performance.
But recently we tested a scene on an AMD card and got really low fps:
For example, rendering a sponza scene:
AMD R9 Fury: 5 fps
NVIDIA GeForce GTX 970: 64 fps
The NVIDIA fps is not great, but much better than on AMD.
Do you guys have any idea what could be causing this difference in fps on the AMD card?
Or do you know how I could go about isolating what is causing the low fps on the AMD card?
Thanks in advance for your help.
AMD drivers have issues when accessing numerous VkDeviceMemory allocations per submission. This is particularly a problem on Windows 7/8, which do not have WDDM 2.0. In fact, if you use too many (~1000) on Windows 7, it is easy to reproduce a BSOD. NVIDIA drivers seem to be doing something behind the scenes and aren't subject to these limitations; however, as a result, their driver implementation may be hiding some opportunities for optimization from the user.
Regardless, the recommendation is to pool your memory allocations, so that VkImages and VkBuffers are bound into the same segmented VkDeviceMemory. There is an open-source library called Vulkan Memory Allocator which aims to help with implementing this behavior (and it is, suspiciously enough, authored by AMD!).
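To make the idea concrete, here is a bare-bones fragment of what such pooling looks like by hand (a sketch, not production code: device, buffers, bufferCount and poolMemoryTypeIndex are assumed to exist already, and picking a memory type that suits every resource is glossed over; VMA handles all of that for you):

```
// One big VkDeviceMemory, many buffers bound at offsets inside it.
VkMemoryAllocateInfo allocInfo = {};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = 256ull << 20;                // a single 256 MB pool
allocInfo.memoryTypeIndex = poolMemoryTypeIndex;        // must be compatible with all resources below
VkDeviceMemory pool;
vkAllocateMemory(device, &allocInfo, nullptr, &pool);   // the only driver-level allocation

VkDeviceSize offset = 0;
for (uint32_t i = 0; i < bufferCount; ++i) {
    VkMemoryRequirements req;
    vkGetBufferMemoryRequirements(device, buffers[i], &req);
    offset = (offset + req.alignment - 1) & ~(req.alignment - 1);  // respect alignment
    vkBindBufferMemory(device, buffers[i], pool, offset);          // sub-allocate, no new VkDeviceMemory
    offset += req.size;
}
```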
I am thinking about getting a Vive and I wanted to check if my PC can handle it. My motherboard and processor are pretty old (Asus M4A79XTD EVO ATX AM3 and AMD Phenom II X4 965 3.4GHz respectively) but I recently upgraded to a GeForce GTX 980 Ti graphics card.
When I ran the Steam VR test program, I was expecting it to say that my graphics card was OK but that my CPU was a bit too slow. Actually, it's the other way round. Screenshot of steamVR.
Your system isn't capable of rendering low quality VR and it appears to be mostly bound by its GPU.
We recommend upgrading your Graphics Card
I've made sure I have updated my NVidia drivers.
When I look in GeForce Experience, I get the picture I was expecting to see:
GeForce Experience screenshot. It thinks my graphics card is OK but my processor doesn't meet the minimum spec.
But since the Steam VR test actually renders something, whereas GeForce Experience just goes by the hardware I've got, it makes me think that my GPU should be capable but something about my setup is throttling it.
I'd love to know what the problem might be. Perhaps it's because I'm using an NVidia card on an AMD-chipset motherboard?
Well, I never found out exactly what the cause was, but the problem is now resolved. I bought a new motherboard, processor, and RAM but kept the graphics card. After getting everything booted up, the system reports "high-quality VR" for both the CPU and the graphics card.
So, for whatever reason, it does seem like the MB/processor was throttling the graphics card in some way.
The Steam VR test only checks whether your rig can hold a steady frame rate above 75 fps. I can run VR on my laptop and it's only got a GTX 960M; my CPU is a little more up to date, an i7 6700K with 16 GB of DDR4. I also have a buddy able to run VR on a 780 Ti.
Typical GPUs today are mostly 32-bit oriented. While they can do double precision, the ALUs basically take 32-bit integers, thread indices and grid sizes are 32-bit, and (I'm assuming) pseudo-pointers correspond to 32-bit unsigned physical addresses as well.
However, some GPUs (Teslas, GTX Titans) come with 6 GB, 8 GB, or 12 GB of memory.
Well, how does that work? I mean, can you address more than 4 GB at once? If so, how? Can you do a[i] = 123 with i being of type unsigned long int? Or is it some segment-offset thing like in the good old days of the 8086? Or maybe each kernel individually can only address 4 GB, but different kernels can address more?
Well, it turns out that GPU pointers (at least on NVIDIA GPUs, and probably on AMD's as well) are 64 bits wide. So there's no problem addressing 4 GB, 40 GB, 400 GB or 4 million GB. It's only on 32-bit host platforms that there might be legacy support for 32-bit pointers.
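A small illustration of what that means in practice (assuming an NVIDIA card that actually has more than 4 GB and compute capability 3.0 or later, so the grid can be large enough): as long as you build the index in a 64-bit type, a[i] = 123 past the 4 GB mark just works.

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(char *a, size_t n) {
    // blockIdx/blockDim are 32-bit, so build the index in 64 bits explicitly
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = 123;
}

int main() {
    size_t n = 6ull << 30;                        // 6 GB -- assumes the card has that much memory
    char *a = nullptr;
    if (cudaMalloc(&a, n) != cudaSuccess) { printf("allocation failed\n"); return 1; }

    int threads = 256;
    size_t blocks = (n + threads - 1) / threads;  // ~25 million blocks, within gridDim.x limits on CC >= 3.0
    fill<<<(unsigned)blocks, threads>>>(a, n);
    cudaDeviceSynchronize();
    cudaFree(a);
    return 0;
}
```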
I am interested in trying out GPU programming. One thing that's not clear to me is what hardware I need. Is any PC with a graphics card good enough? I have very little knowledge of GPU programming, so ideally the initial learning curve would not be steep. If I have to make a lot of hacks just to run some tutorial because my hardware is not good enough, I'd rather buy new hardware.
I have a retired PC (~10 years old) running Ubuntu Linux; I am not sure what graphics card it has, but it must be some old one.
I am also planning to buy a new sub-$500 desktop, which according to my casual research normally comes with an AMD Radeon 7xxx or Nvidia GT 6xx graphics card. I assume any new PC is good enough for learning this kind of programming.
Anyway any suggestion is appreciated.
If you want to use CUDA, you'll need a GPU from NVidia, and their site explains the compute capabilities of their different products.
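If you are not sure what the card in the old PC is, a few lines of code will tell you whether it is usable with CUDA at all and what compute capability it has (a sketch; it only needs the CUDA toolkit installed):

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable GPU found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("Device %d: %s, compute capability %d.%d, %.1f GB\n",
               i, p.name, p.major, p.minor, p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```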
If you want to learn OpenCL, you can start right now with an OpenCL implementation that has a CPU back-end. The basics of writing OpenCL code targeting CPUs or GPUs is the same, and they differ mainly in performance tuning.
For GPU programming, any AMD or NVidia GPU made in the past several years will have some degree of OpenCL support, though there have been some new features introduced with newer generations that can't be easily emulated for older generations.
Intel's integrated GPUs in Ivy Bridge and later support OpenCL, but Intel only provides a GPU-capable OpenCL implementation for Windows, not Linux.
Also be aware that there is a huge difference between mid-range and high-end GPUs in terms of compute capabilities, especially where double-precision arithmetic is concerned. Some low-end GPUs don't support double precision at all, and mid-range GPUs often perform double-precision arithmetic 24 times slower than single precision. If you want to do a lot of double-precision calculations, it's absolutely worth getting a compute-oriented GPU (Radeon 7900 series or GeForce Titan and up).
If you want a low-end system with non-trivial GPU power, your best bet at the moment is probably a system built around an AMD APU.