Query Hardware-Specific Information on Windows With C++ - hardware

Specifically, I want to query a system's GPU for the following:
The name of the GPU, the series (e.g. ATI Radion 5800, NVIDIA GeForce 4 MX, etc.), the BIOS version, the driver version, the GPU clock speed, the GPU memory speed, the memory type, the memory size, the bus width, the bandwidth, the type of bus being used, the vendor.
Any ideas? The application I'm developing just has to display this information to the user.

I suggest querying WMI, using the following WMI objects:
Win32_DisplayConfiguration
- name of GPU
Win32_VideoController
- DAC type, speed
- video RAM size
also try:
CIM_VideoBIOSFeature
CIM_VideoBIOSElement

Related

GPUDirect RDMA out of range pin address by Quadro p620

I want to implement FPGA-GPU RDMA by nvidia quadro p620.
Also, I used common PCIe BAR resources(BAR0 - BAR1 - BAR2) for FPGA registers and other chunk controllers handling which is independent from RDMA in my custom driver.
PCIe managements are OK but direct memory access to GPU ram which is pinned are always wrong. Precisely, i always get 64KB pinned addresses starting from 2955739136 (~2.7GB) by using nvidia_p2p_get_pages() API without any errors but the point is that quadro p620 ram capacity is just 2GB!.
The virtual address obtained by cuMemAlloc() change every time (which is correct) and i pass this address, together the allocated size, to my driver by ioctl sys-call. Also, i linked my custom driver to nvidia driver as the nvidia GPUDirect RDMA document is said.
Well, every things sounds OK, but the physical addresses are out of range!. Why? Does it requirement to have the qudro GPU equal or over 4GB ram address?
I expect to find the right solution to get the correct physical addresses and then DMA data by FPGA bus master.
Thanks
P.S. before this i implemented FPGA direct memory access to system ram over PCIe without any problems.

Why are the GPU utilization rates different for the same data?

On two PCs, the exact same data is exported from “Cinem4D” with the “Redshift” renderer.
Comparing the two, one uses the GPU at 100% while the other uses very little (it uses about the same amount of GPU memory).
Cinema4D, Redshift and GPU driver versions are the same.
GPU is RTX3060
64GB memory
OS is windows 10
M2.SSD
The only difference is the CPU.
12th Gen intel core i9-12900K using GPU at 100%
AMD Ryzen 9 5950 16 Core on the other
is.
Why is the GPU utilization so different?
Also, is it possible to adjust the PC settings to use 100%?

CPU and GPU memory sharing

If the (discrete) GPU has its own video RAM, I have to copy my data from RAM to VRAM to be able to use them. But if the GPU is integrated with the CPU (e.g. AMD Ryzen) and shares the memory, do I still have to make copies, or can they both alternatively access the same memory block?
It is possible to avoid copying in case of integrated graphics, but this feature is platform specific, and it may work differently for different vendors.
How to Increase Performance by Minimizing Buffer Copies on Intel® Processor Graphics article describes how to achieve this for Intel hardware:
To create zero copy buffers, do one of the following:
Use CL_MEM_ALLOC_HOST_PTR and let the runtime handle creating a zero copy allocation buffer for you
If you already have the data and want to load the data into an OpenCL buffer object, then use CL_MEM_USE_HOST_PTR with a buffer allocated at a 4096 byte boundary (aligned to a page and cache line boundary) and a total size that is a multiple of 64 bytes (cache line size).
When reading or writing data to these buffers from the host, use clEnqueueMapBuffer(), operate on the buffer, then call clEnqueueUnmapMemObject().
GPU and CPU memory sharing ?
GPU have multiple cores without control unit but the CPU controls the GPU through control unit. dedicated GPU have its own DRAM=VRAM=GRAM faster then integrated RAM. when we say integrated GPU its mean that GPU placed on same chip with CPU, and CPU & GPU used same RAM memory (shared memory ).
References to other similar Q&As:
GPU - System memory mapping
Data sharing between CPU and GPU on modern x86 hardware with OpenCL or other GPGPU framework

Query amount of VRAM or GPU clock speed

I am writing code to pick a physical device, but I want to put in some logic to prefer newer devices (more VRAM or higher clock speed) in case multiple ones fit my minimum feature requirements.
Is this possible?
Vulkan has no specific API calls to get such GPU details, for that you'd need to go with vendor specific APIs like NVAPI. The only hint may be deviceType member of VkPhysicalDeviceProperties that returns whether it's an integraed, discrete or virtual GPU.
The VRAM size though can be determined by finding the memory heap with the DEVICE_LOCAL bit set using vkGetPhysicalDeviceMemoryProperties. The VkPhysicalDeviceMemoryProperties returned by that function contains all available memory heaps in the memoryHeaps member. The configuration differs esp. between discrete and integrated GPUs, so this may not always be what you're looking for, e.g. on integrated GPUs with shared memory.
Heaps for a discrete GPU: http://vulkan.gpuinfo.org/displayreport.php?id=1432#memoryheaps
Heaps for an integrated GPU: http://vulkan.gpuinfo.org/displayreport.php?id=1200#memoryheaps

How to enable crossfire on AMD Radeon Pro DUO

I am using AMD Radeon Pro duo for my application in opencl.
It has a Dual Fiji GPUs, How can i configure Cross Fire to make them work as one device. I am using clgetdeviceinfo in opencl for checking the device compute units but it's showing 64 for each fiji GPU.
I have total 128 compute units in two GPUS, How to use all of them by using Crossfire.
OpenCL has device fission but not device fusion. Devices can share memory for efficiency but shaders can't be joined.
There are also some functions that can't synchronize between two GPUs yet:
Atomic functions in kernels
Prefetch command(which GPUs global cache?)
clEnqueueAcquireGLObject(which GPU's buffer?)
clCreateBuffer (which device memorry does it choose? we can't choose.)
clEnqueueTask (where does this task go?)
You should partition the encoding work in two pieces and run on both GPUs. This may even need cross-fire to be disabled if drivers have problems with it. This shouldn't be harder than writing a GPGPU encoder.
But you may need to copy data to only one of the devices, then copy half of data to other GPU from that buffer, instead of passing through pci-e twice. The inter-GPU connection must be faster than pci-e.