Does the 32-bit addressing limit (of a 32-bit application) include dedicated video memory?

Can a 32-bit application utilize more than 4GB between system RAM and video RAM?
Context: Some games (for instance, Skyrim) are compiled under 32-bit architectures. I am running a 64-bit system with 16GB of DDR3 on the motherboard and 4GB of GDDR5 on the graphics card. Does the 32-bit architecture of the program limit its address capability to 4GB of total space, or is the graphics memory (which is on-board the graphics card) in a separate address space, thus neatly allowing the total sum of addressable memory to be greater than 4GB?
I ask because I have no way of knowing whether the paged amount in task manager and the amount of graphics memory used in GPU-Z are independent sets (which would seem to indicate that they are separately addressed, as the sum can be greater than 4GB), or if there is overlap between them.

It does not matter which architecture the game is compiled for; it depends on your OS architecture. As you are using 64-bit, it will utilise all the memory.

Related

Why does the amount of memory available to x86 applications fluctuate in vb.net? [duplicate]

What is the maximum amount of memory one can achieve in .NET managed code? Does it depend on the actual architecture (32/64 bits)?
There is no hard, exact figure for .NET code.
If you run on 32-bit Windows, your process can address up to 2 GB, or 3 GB if the /3GB switch is used on Windows Server 2003.
If you run a 64-bit process on a 64-bit box, your process can address up to 8 TB of address space, if that much RAM is present.
This is not the whole story however, since the CLR takes some overhead for each process. At the same time, .NET will try to allocate new memory in chunks; and if the address space is fragmented, that might mean that you cannot allocate more memory, even though some is available.
In C# 2.0 and 3.0 there is also a 2 GB limit on the size of a single object in managed code.
The amount of memory your .NET process can address depends both on whether it is running on a 32- or 64-bit machine and whether it is running as a CPU-agnostic or CPU-specific process.
By default a .NET process is CPU agnostic, so it will run with the process type that is natural to the version of Windows: on 64-bit it will be a 64-bit process, and on 32-bit it will be a 32-bit process. You can, however, force a .NET process to target a particular CPU and, say, make it run as a 32-bit process on a 64-bit machine.
If you exclude the large-address-aware setting, the breakdowns are as follows:
A 32-bit process can address 2 GB
A 64-bit process can address 8 TB
Here is a link to the full breakdown of addressable space based on the various options Windows provides.
http://msdn.microsoft.com/en-us/library/aa366778.aspx
For 64 bit Windows the virtual memory size is 16 TB divided equally between user and kernel mode, so user processes can address 8 TB (8192 GB). That is less than the entire 16 EB space addressable by 64 bits, but it is still a whole lot more than what we're used to with 32 bits.
I have recently been doing extensive profiling around memory limits in .NET on a 32-bit process. We all get bombarded by the idea that we can allocate up to 2 GB (2^31 bytes) in a .NET application, but unfortunately this is not true :(. The application process has that much space to use and the operating system does a great job managing it for us; however, .NET itself seems to have its own overhead, which accounts for approximately 600-800 MB for typical real-world applications that push the memory limit. This means that as soon as you allocate an array of integers that takes about 1.4 GB, you should expect to see an OutOfMemoryException().
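To get a rough feel for how much contiguous address space a process really has, here is a minimal native sketch of my own (it does not measure the CLR overhead described above, only what the OS will hand out in one piece); built as a 32-bit binary it typically reports well under 2 GB, while a 64-bit build reports far more:

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch: probe the largest single contiguous allocation the process can get.
 * Start at 2 GB and halve the request until malloc succeeds. */
int main(void)
{
    size_t size = (size_t)1 << 31;   /* 2 GB */
    while (size > 0) {
        void *p = malloc(size);
        if (p != NULL) {
            printf("largest single block obtained: %zu MB\n",
                   size / (1024 * 1024));
            free(p);
            break;
        }
        size /= 2;                   /* fragmentation shrinks what fits in one piece */
    }
    return 0;
}
```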
Obviously in 64-bit, this limit is hit way later (let's chat in 5 years :)), but the general size of everything in memory also grows (I am finding it's ~1.7 to ~2 times) because of the increased word size.
What I know for sure is that the Virtual Memory idea from the operating system definitely does NOT give you virtually endless allocation space within one process. It is only there so that the full 2 GB is addressable by each of the (many) applications running at one time.
I hope this insight helps somewhat.
I originally answered something related here (I am still a newbie, so I am not sure how I am supposed to do these links):
Is there a memory limit for a single .NET process
The .NET runtime can allocate all the free memory available for user-mode programs in its host. Mind that it doesn't mean that all of that memory will be dedicated to your program, as some (relatively small) portions will be dedicated to internal CLR data structures.
In 32-bit systems, assuming a setup with 4 GB or more (even if PAE is enabled), you should be able to get at the very most roughly 2 GB allocated to your application. On 64-bit systems you should be able to get 1 TB. For more information concerning Windows memory limits, please review this page.
Every figure mentioned there has to be divided by 2, as Windows reserves the higher half of the address space for use by code running in kernel mode (ring 0).
Also, please mind that whenever the limit for a 32-bit system exceeds 4 GB, use of PAE is implied, and thus you still can't really exceed the 2 GB limit unless the OS supports 4GT, in which case you can reach up to 3 GB.
Yes, in a 32-bit environment you are limited to a 4 GB address space, but Windows claims about half of it. On a 64-bit architecture it is, well, a lot bigger; I believe it's 4G * 4G.
And on the Compact Framework it is usually on the order of a few hundred MB.
I think the other answers are quite naive; in the real world, after 2 GB of memory consumption your application will behave really badly. In my experience, GUIs generally become massively clunky and unusable under heavy memory consumption.
This was my experience; obviously the actual cause can be that objects grow too big, so all operations on those objects take too much time.
The following blog post has detailed findings on x86 and x64 max memory. It also has a small tool (source available) which allows easy testing of the different memory options:
http://www.guylangston.net/blog/Article/MaxMemory.

How did the first GPUs get support from CPUs?

I imagine CPUs have to have features that allow them to communicate and work with the GPU, and I can imagine this exists today, but in the early days of GPUs, how did companies get support from large CPU companies to have their devices supported, and what features did CPU companies add to enable this?
You mean special support beyond just being devices on a bus like PCI? (Or even older, ISA or VLB.)
TL:DR: All the special features CPUs have which are useful for improved bandwidth to write (and sometimes read) video memory came after 3D graphics cards were commercially successful. They weren't necessary, just a performance boost.
Once GPUs were commercially successful and popular, and a necessary part of a gaming PC, it made obvious sense for CPU vendors to add features to make things better.
The same IO busses that let you plug in a sound card or network card already have the capabilities to access device memory and MMIO, and device IO ports, which is all that's necessary for video drivers to make a graphics card do things.
Modern GPUs are often the highest-bandwidth devices in a system (especially non-servers), so they benefit from fast buses, hence AGP for a while, until PCI Express (PCIe) unified everything again.
Anyway, graphics cards could work on standard busses; it was only once 3D graphics became popular and commercially important (and fast enough for the PCI bus to be a bottleneck), that things needed to change. At that point, CPU / motherboard companies were fully aware that consumers cared about 3D games, and thus it would make sense to develop a new bus specifically for graphics cards.
(Along with a GART, graphics address/aperture remapping table, an IOMMU that made it much easier / safer for drivers to let an AGP or PCIe video card read directly from system memory. Including I think with addresses under control of user-space, without letting user-space read arbitrary system memory, thanks to it being an IOMMU that only allows a certain address range.)
Before the GART was a thing, I assume drivers for PCI GPUs needed to have the host CPU initiate DMA to the device. Or if bus-master DMA by the GPU did happen, it could read any byte of physical memory in the system if it wanted, so drivers would have to be careful not to let programs pass arbitrary pointers.
Anyway, having a GART was new with AGP, which post-dates early 3D graphics cards like 3dfx's Voodoo and ATI 3D Rage. I don't know enough details to be sure I'm accurately describing the functionality a GART enables.
So most of the support for GPUs was in terms of busses, and thus a chipset thing, not CPUs proper. (Back then, CPUs didn't have integrated memory controllers, instead just talking to the chipset northbridge over a frontside bus.)
Relevant CPU instructions included Intel's SSE and SSE2 instruction sets, which had streaming (NT = non-temporal) stores which are good for storing large amounts of data that won't be re-read by the CPU any time soon, if at all.
SSE4.1 in 2nd-gen Core2 (2008 ish) added a streaming load instruction (movntdqa) which (still) only does anything special if used on memory regions marked in the CPU's page tables or MTRR as WC (aka USWC: uncacheable, write-combining). Copying back from GPU memory to the host was the intended use-case. (Non-temporal loads and the hardware prefetcher, do they work together?)
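As a rough sketch of how those streaming stores are typically used (my own illustration, not from the original answer): copying a buffer from ordinary cacheable RAM into a write-combining (WC) mapped GPU aperture with SSE2 intrinsics. The destination pointer and its WC mapping are assumed to come from the graphics driver.

```c
#include <immintrin.h>
#include <stddef.h>

/* Sketch: stream 16-byte chunks into WC-mapped video memory.
 * Assumes dst and src are 16-byte aligned and len is a multiple of 16. */
static void copy_to_wc(void *dst, const void *src, size_t len)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_load_si128(s + i);  /* normal load from cacheable RAM */
        _mm_stream_si128(d + i, v);         /* NT store: bypasses the cache, write-combines */
    }
    _mm_sfence();                           /* flush WC buffers before telling the GPU */
}
```

(For copying back from video memory, the SSE4.1 _mm_stream_load_si128 mentioned above plays the analogous role on WC memory.)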
The introduction of MTRRs (Memory Type Range Registers) in x86 CPUs is another feature that improved CPU -> GPU write bandwidth, by letting the video aperture be marked as write-combining. Again, this came after 3D graphics were commercially successful for gaming.

CPU and GPU memory sharing

If the (discrete) GPU has its own video RAM, I have to copy my data from RAM to VRAM to be able to use them. But if the GPU is integrated with the CPU (e.g. AMD Ryzen) and shares the memory, do I still have to make copies, or can they both alternatively access the same memory block?
It is possible to avoid copying in the case of integrated graphics, but this feature is platform-specific, and it may work differently for different vendors.
The article How to Increase Performance by Minimizing Buffer Copies on Intel® Processor Graphics describes how to achieve this for Intel hardware:
To create zero copy buffers, do one of the following:
Use CL_MEM_ALLOC_HOST_PTR and let the runtime handle creating a zero copy allocation buffer for you
If you already have the data and want to load the data into an OpenCL buffer object, then use CL_MEM_USE_HOST_PTR with a buffer allocated at a 4096 byte boundary (aligned to a page and cache line boundary) and a total size that is a multiple of 64 bytes (cache line size).
When reading or writing data to these buffers from the host, use clEnqueueMapBuffer(), operate on the buffer, then call clEnqueueUnmapMemObject().
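For illustration, here is a minimal sketch of that pattern in the OpenCL C API (my own code, not from the Intel article; it assumes a context and command queue already exist for the integrated GPU, and omits error handling):

```c
#include <CL/cl.h>
#include <string.h>

/* Sketch: create a buffer the runtime can back with host memory (zero copy
 * on integrated GPUs), map it, fill it from src, and unmap it again. */
cl_mem make_zero_copy_buffer(cl_context ctx, cl_command_queue queue,
                             const void *src, size_t size)
{
    cl_int err;

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                size, NULL, &err);

    void *ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                   0, size, 0, NULL, NULL, &err);
    memcpy(ptr, src, size);
    clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);

    return buf;
}
```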
GPU and CPU memory sharing?
A GPU has many cores without a control unit, and the CPU controls the GPU through its control unit. A dedicated GPU has its own DRAM (VRAM/GRAM), which is faster than the integrated RAM. When we say integrated GPU, it means that the GPU is placed on the same chip as the CPU, and the CPU and GPU use the same RAM (shared memory).
References to other similar Q&As:
GPU - System memory mapping
Data sharing between CPU and GPU on modern x86 hardware with OpenCL or other GPGPU framework

Is the 8088 processor 8 bit or 16 bit?

In Randall Hyde's Art of Assembly it says the 8088 CPU was 8 bits whilst the 8086 was 16 bits solely because of the width of the data bus.
I have always thought that the address size determined the size of the CPU.
Please shed some light on this issue.
From Wikipedia
The 16-bit registers and the one megabyte address range were unchanged, however. In fact, according to the Intel documentation, the 8086 and 8088 have the same execution unit (EU)—only the bus interface unit (BIU) is different.
So the processor is functionally identical, but the memory bus is narrower. The main purpose was compatibility with 8-bit interfaces. When a 16-bit transfer was needed, it took two bus cycles to accomplish what the 8086 could do in one.
There was a greater availability of 8-bit chips at the time.
I can't find any official definition of the property "x-bit CPU"; I suppose it does not exist.
I would say that the "x-bit CPU" property indicates that the CPU can manipulate data (inside the chip) x bits at a time. To be more specific, it has so-called general registers of x bits, so it can add (subtract, divide, multiply, XOR, etc.) x-bit data at once.
The 8086 has 16-bit general registers = 16-bit CPU
The 8088 has 16-bit general registers = 16-bit CPU
The 80510 has 32-bit general registers = 32-bit CPU
Again, the official definition of the property is unknown.
The Wikipedia article about IA-32 says:
The primary defining characteristic of IA-32 is the availability of 32-bit general-purpose processor registers (for example, EAX and EBX), 32-bit integer arithmetic and logical operations, 32-bit offsets within a segment in protected mode, and the translation of segmented addresses to 32-bit linear addresses.
I prefer to think that the bitness of the general registers is enough to determine the x-bit CPU property.
The funny thing is that Intel itself defines CPU bitness using different criteria from time to time. If you look at the official Intel doc, it says the
8088 is an 8-bit HMOS microprocessor
(I suppose they defined it as an 8-bit CPU based on the 8-bit data bus interface; remember this criterion.) At the same time, the same document says that the CPU has a
16-bit internal architecture
That's funny: an 8-bit CPU with a 16-bit internal architecture.
Okay, let's look at another example: the Intel Pentium 510.
They say it is
32-bit microprocessor
The CPU has a 64-bit data bus, so based on the previous example we would have to say that the Pentium 510 is a 64-bit CPU; however, that is wrong.
The conclusion: to determine CPU bitness, look at the size of the general registers.

32-bit vs. 64-bit

I am a little confused.
When we talk about 32-bit architecture and 64-bit architecture, what do we actually mean? Do we mean that a 32-bit architecture has 32-bit registers, OR a 32-bit address bus, OR a 32-bit data bus?
What is generally implied?
I would say that usually, this would mean that a 64-bit system has 64-bit address registers. In modern systems, data registers are usually at least as large as the address registers, so the data registers and data bus would likely be equivalently sized.
A 64-bit system, however, usually does not have a 64-bit address bus. There's no point, since there hasn't been enough RAM manufactured in the history of the planet to need a full 64 bit physical address bus. A given system will have a maximum amount of physical RAM that it can address, based on the width of its address bus.
We mean that we have 64 bits of address space for programs.
This usually means that we have 64-bit registers in the CPU (it makes sense for the registers to match the pointer size), and so on...
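As a small illustration of that point (my own example, not part of the answer above), a program can inspect the pointer width it was compiled for:

```c
#include <stdio.h>

/* Sketch: a 32-bit build prints 4 bytes / 32-bit addresses,
 * a 64-bit build prints 8 bytes / 64-bit addresses. */
int main(void)
{
    printf("pointer size: %zu bytes (%zu-bit addresses)\n",
           sizeof(void *), sizeof(void *) * 8);
    return 0;
}
```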
A 32-bit architecture means that the ALU is capable of computing on 32-bit words. The data bus (width) and the registers are included in this definition, as well as addressing.
It means that the registers and stack (!) have a width of 32/64 bits. Address spaces are often much smaller; see here:
In principle, a 64-bit microprocessor can address 16 exabytes of memory. In practice, it is less than that.
For example, the AMD64 architecture as of 2011 allows 52 bits for physical memory and 48 bits for virtual memory.
wikipedia-link
Well! Thanks a lot for your inputs.
After reading through a lot of articles and online material, I think my confusion is now resolved.
So I would like to briefly summarize.
n-bit CPU:
An n-bit CPU only means that it has n-bit registers, which implies an n-bit word size. Don't give a second thought to the address/data bus size.
As an example, consider the Motorola 68000 processor: it has 32-bit registers, i.e. it is called a 32-bit processor, but it has a 16-bit data bus and a 24-bit address bus. Due to its 24-bit address bus, it can address only 2^24 bytes, i.e. 16 MB of RAM.
The address bus only tells how much RAM can be addressed, whereas the data bus tells how many units of data can be transferred in one cycle.
The 68000 processor can thus transfer only 2 bytes of data per cycle due to its 16-bit data bus.
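To make that arithmetic concrete, here is a small sketch (my own example) that computes the addressable memory implied by a few historical address-bus widths:

```c
#include <stdio.h>

/* Sketch: addressable memory is 2^N bytes for an N-line address bus. */
int main(void)
{
    unsigned widths[] = { 20, 24, 32 };   /* e.g. 8086, 68000, 80386 */
    for (int i = 0; i < 3; i++) {
        unsigned long long bytes = 1ULL << widths[i];
        printf("%2u address lines -> %llu bytes (%llu MB)\n",
               widths[i], bytes, bytes / (1024 * 1024));
    }
    return 0;
}
```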