Tag bits required in TLB - virtual-machine

The machine has a 32-bit virtual address space and a 30-bit physical address space, with a virtual memory page size of 8 KBytes. The translation look-aside buffer (TLB) has 64 total entries and is 4-way set associative. The L1 data cache is 16 KBytes, 2-way set associative, with a 64-byte block/line size. The L1 cache is also physically tagged.
Can someone explain how I could calculate the tag bits in the TLB?
And how many tag bits will be required for each L1 cache block?
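One way to work through both questions is sketched below in Python. It assumes the TLB is indexed with the low bits of the virtual page number and that the cache is indexed with the address bits just above the block offset; the helper `tag_bits` is a hypothetical name used only for this illustration, and status bits such as valid or dirty are not counted.

```python
def log2_exact(n):
    """Return log2(n) for a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1

def tag_bits(addr_bits, offset_bits, num_sets):
    """Bits left for the tag after removing the offset and set-index bits."""
    return addr_bits - offset_bits - log2_exact(num_sets)

# TLB: the tag comes from the virtual page number (address minus page offset).
page_offset_bits = log2_exact(8 * 1024)               # 8 KB pages
tlb_sets = 64 // 4                                    # 64 entries, 4-way
tlb_tag = tag_bits(32, page_offset_bits, tlb_sets)    # 32-bit virtual address

# L1 cache: physically tagged, so start from the 30-bit physical address.
block_offset_bits = log2_exact(64)                    # 64-byte lines
l1_sets = (16 * 1024) // (2 * 64)                     # 16 KB, 2-way
l1_tag = tag_bits(30, block_offset_bits, l1_sets)

print(tlb_tag, l1_tag)
```

Under these assumptions the TLB has 16 sets (4 index bits) and the cache has 128 sets (7 index bits), so the tags are 32 - 13 - 4 and 30 - 6 - 7 bits wide respectively.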

How to build a 16-bit RAM with 1 data register and a U-bus? (using Logisim)

Using LOGISIM, build a working 4-nibble RAM. The RAM must have the following elements: 4 nibbles, addressing circuits, the address register, the data register, the mode register (bit), a u-bus, and a clock. Your RAM must be able to read and write nibbles.
You will run your circuit manually by setting the address, mode and data registers, then you will start the clock. The desired action will take place. For example: if you plug in an address into the address register, and a 1 in the mode register, and a nibble in the data register, and you turn on the clock, then the nibble must be copied into the RAM nibble at the desired address.
Since you have a 4-nibble RAM then you will only have addresses 00, 01, 10, and 11. Mode register set to 1 is for write and 0 is for read.
A u-bus is needed to properly handle read and write operations between the data register and the 4-nibbles of RAM.
Elements allowed:
Connectors:
- Wires
- Input connector
- Output connector (or display)
Built-in machines and gates:
- A Bit (RS, or D)
- Clock
- AND, OR, NOT, XOR, NAND, NOR

What is the Logical Block Address of a sector in a USB Flashdrive?

I am implementing a USB host to read the files stored on a flash drive. To read, I implement the SCSI READ(10) command.
This command has a field called Logical Block Address, which is the address I want to read. I already know the sector number I want to read.
So, are the Logical Block Address and the sector number the same?
I looked into Cylinder-Head-Sector (CHS) addressing, but I don't have information about cylinders or heads.
In common usage in SCSI, a sector is the same as a logical block, so a sector number is a Logical Block Address. It is very likely that your device has 512-byte sectors (512-byte logical blocks). There are some high-performance SSDs and large-capacity spinning-media drives that have 4096-byte sectors. These drives are labelled as "Advanced Format".
CHS addressing isn't supported by SCSI. So, if you somehow have just a sector number, it's probably the SCSI "sector" or logical block address.
All of the integer fields in the typical SCSI commands are in big-endian format. If you're on a typical x86 PC of some kind, your integers will be in little-endian format. Before you put your sector number in the field in your READ(10) command, you'll need to convert it with htobe32() or htonl(). Likewise for the num field, use htobe16() or htons().
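To make the byte ordering concrete, here is a small Python sketch that packs a 10-byte READ(10) command block with the LBA and block count in big-endian order. It is an illustration only: the function name `build_read10_cdb` is made up for this example, the flag, group number, and control bytes are simply left at zero, and in C the same effect is what htobe32()/htobe16() give you.

```python
import struct

READ_10_OPCODE = 0x28  # operation code of READ(10)

def build_read10_cdb(lba, num_blocks):
    """Pack a READ(10) command descriptor block.

    Format string, '>' meaning big-endian with no padding:
      B  opcode
      B  flags (left at 0 here)
      I  32-bit logical block address
      B  group number (left at 0 here)
      H  16-bit transfer length in blocks
      B  control (left at 0 here)
    """
    return struct.pack(">BBIBHB", READ_10_OPCODE, 0, lba, 0, num_blocks, 0)

cdb = build_read10_cdb(0x12345678, 1)
print(cdb.hex(" "))
```

The most significant byte of the LBA lands in byte 2 of the command block, regardless of the host's native byte order.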

Calculating the size of a page table

I have an assignment with the following prompt:
The page size for a virtual memory system is 8KB.
The instruction TLB is direct-mapped with 2 sets and each block contains one translation.
(I don't believe this is relevant for the following 3 questions, as there are two more questions about the TLB.)
The number of bits in a virtual address is 20.
The number of bits in a physical address is 15.
(1) What is the number of virtual pages?
I think I have this one figured out.
Page size = 8 * 2^10 = 8192, so the offset is 13 bits.
Virtual page number = 20 - 13 = 7 bits
Virtual pages = 2^7 pages
(2) What is the number of physical pages?
Here's where I'm a little confused. I think I'm supposed to add the valid, dirty, and reference bits to the physical page number (which is 2 bits, from 15 - 13). However, 5 * 2^7 = 640 bits, which seems incredibly small.
(3) How many bits are used in the virtual address for the page offset?
Answered above, it appears to be 13 bits.
Could anyone point me in the right direction? Thanks!
The valid, dirty, and reference bits are in a page table entry but are not part of the address bits. Therefore, using your results, there are 2^2 = 4 physical pages.
Yes, this does seem small, but realize that there are only 2^15 bytes (32 KB) of physical memory.
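The arithmetic for all three parts can be checked with a few lines of Python. This is just a sketch of the calculation using the numbers from the prompt; the variable names are my own.

```python
PAGE_SIZE = 8 * 1024          # 8 KB pages
VIRTUAL_ADDR_BITS = 20
PHYSICAL_ADDR_BITS = 15

# Page size is a power of two, so the offset width is its log2.
offset_bits = PAGE_SIZE.bit_length() - 1

# Whatever is left of each address after the offset is the page number.
virtual_pages = 2 ** (VIRTUAL_ADDR_BITS - offset_bits)
physical_pages = 2 ** (PHYSICAL_ADDR_BITS - offset_bits)

# Sanity check: the physical pages together make up all of physical memory.
physical_memory_bytes = physical_pages * PAGE_SIZE

print(offset_bits, virtual_pages, physical_pages, physical_memory_bytes)
```

The valid, dirty, and reference bits do not appear anywhere in this calculation, since they affect the size of a page table entry and not the number of pages.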

Memory addresses, pointers, variables, values - what goes on behind the scenes

This is going to be a pretty loaded question but ever since I started learning about pointers I've been very curious about what happens behind the scenes when a program is run.
As far as I know, computer memory is commonly thought of as a long strip of memory divided evenly into individual bytes. Certainly pictures such as the following evoke such a metaphor:
One thing I've been wondering: what do the memory addresses themselves represent? I'm sure it's no coincidence that memory addresses appear as 8-digit hexadecimal values (e.g. 00EB5748). Why is this?
Furthermore, when I declare a variable x, what is happening at the memory level? Is the compiler simply reserving a random address (+however many consecutive addresses it needs for the variable type) for data storage?
Now suppose x is an unsigned int that occupies 2 bytes of memory (i.e. values ranging from 0 to 65535). When I declare x = 12, what is happening? What is it that I'm making equal to 12? When I draw conceptual diagrams, I usually have a box for an address (say &x) pointing to a variable (x) that occupies seemingly nothing, and I'm sure that can't be a fully accurate picture of what's going on.
And what's happening at the binary level? Is the address 00EB5748 treated as 111010110101011101001000 and storing a value of 12 somewhere, or 1100?
Mostly my confusion & curiosity stems from the relationship between memory addresses and actual values being declared (e.g. 12, 'a', -355.2). As another example, suppose our address 00EB5748 is pointing to a char 's' whose value is 115 according to ASCII charts. Is the address describing a position that stores the value 115 in 1 byte, by flipping the appropriate 1s and 0s at that position in memory?
Just open any book. You will see pages. Every page has a number, and consecutive pages have consecutive numbers. Are numbered pages confusing? I think not. Then computer memory should not be confusing either.
Books were the main memory storage devices before the computer era. Computer memory borrows its basic concept from books: a book has pages -> computer memory has memory cells; a book has page numbers -> computer memory has memory addresses.
One thing I've been wondering, what do the memory addresses themselves represent?
Numbers. Every memory cell has a number, like every page in a book.
Furthermore, when I declare a variable x, what is happening at the memory level? Is the compiler simply reserving a random address (+however many consecutive addresses it needs for the variable type) for data storage?
The memory manager marks some memory cells as occupied and tells the compiler the address of the first reserved cell. The compiler associates the name and type of the variable with this address. (This picture is from my head; it may be inaccurate.)
When I declare x = 12, what is happening?
When you declared variable x, memory cells were reserved for it. Now you write 12 into these memory cells. Note that 12 is binary coded in some way, depending on the type of x. If x is an unsigned int that occupies 2 memory cells, then one cell will contain 0 and the other will contain 12, because the binary integer representation of 12 is
0000 0000   0000 1100
|_______|   |_______|
   cell        cell
If 12 is a floating-point number, it will be encoded differently.
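A quick way to see these encodings is to ask Python for the raw bytes. This is only an illustration of the idea: the big-endian byte order and the 32-bit IEEE 754 float format are assumptions chosen for the example, and a real machine may order the two cells the other way round.

```python
import struct

# 12 as a 2-byte unsigned integer: one cell holds 0, the other holds 12.
as_uint16 = (12).to_bytes(2, "big")

# 12 as a 32-bit IEEE 754 float: a completely different bit pattern.
as_float32 = struct.pack(">f", 12.0)

# The character 's' as a single ASCII byte with value 115.
as_char = "s".encode("ascii")

print(as_uint16.hex(), as_float32.hex(), as_char[0])
```

The same number 12 therefore ends up as different bits in memory depending on the type of the variable that holds it.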
A memory address is simply the position of a given byte in memory. The zeroth byte is at 0x00000000. The tenth at 0x0000000A. The 65535th at 0x0000FFFF. And so on.
Local variables live on the stack*. When compiling a block of code, the compiler counts how many bytes are needed to hold all the local variables, and then increments the stack pointer so that all the variables can fit below it (along with some other stuff like frame pointers and return addresses and whatnot). Then it just remembers that, for example, local variable x is at an offset -2 from the stack pointer, foo is at an offset -4 and so on, and uses those addresses whenever those variables are referenced in the following code.
Since the compiler knows that x is at address (stack pointer - 2), that's the location that is set to the value 12 when you do x = 12.
Not entirely sure if I understand this question, but say you want to read the memory at address 0x00EB5748. The control unit in the CPU reads the instruction, sees that it is a load instruction, and passes the address (in binary of course) to the load/store unit, along with some other junk like how many bytes to read. Then the LSU sends that address to some memory (probably L1 cache), and after a certain time gets the value 12 back. Then this data is available to, say, put in a register, or send to the ALU to do arithmetic, or whatever.
That seems to be accurate, yes. Going back to the first question, an address simply means "byte number 0xWHATEVER in memory".
Hope this clarified things a bit at least.
*I should probably explain the stack as well. A stack is a portion of memory reserved for local variables (and some other stuff). It starts at a fixed location in memory, and stops at the memory address contained in a special register called the stack pointer. To begin with, the stack is empty, so the stack pointer just contains the start of the stack. As you put more data on the stack, the SP is incremented. This means that you can always put more data on it simply by putting it at the address in the SP, and then incrementing the SP so that once again anything past that address is free memory.
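The stack described in this footnote can be modelled with a tiny Python class. This is a toy sketch of the simplified picture above, in which the stack pointer is incremented as data is added; `ToyStack` is a made-up name, and on many real architectures the stack actually grows toward lower addresses.

```python
class ToyStack:
    """A fixed block of byte cells plus a stack pointer (SP)."""

    def __init__(self, size=16):
        self.memory = bytearray(size)
        self.sp = 0  # empty stack: SP holds the start of the stack

    def push(self, value):
        # Store at the address in SP, then move SP past the new data.
        self.memory[self.sp] = value
        self.sp += 1

    def pop(self):
        # Move SP back, then read the cell it now points at.
        self.sp -= 1
        return self.memory[self.sp]

stack = ToyStack()
stack.push(12)    # e.g. local variable x
stack.push(115)   # e.g. a char 's'
top = stack.pop()
print(top, stack.sp)
```

Everything at or past the address in SP is treated as free memory, which is why a pop only needs to move the pointer back.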

GPU shared memory size is very small - what can I do about it?

The size of the shared memory ("local memory" in OpenCL terms) is only 16 KiB on most NVIDIA GPUs of today.
I have an application in which I need to create an array of 10,000 integers, so the amount of memory I need is 10,000 * 4 bytes = 40,000 bytes (about 39 KiB).
How can I work around this?
Is there any GPU that has more than 16 KiB of shared memory?
Think of shared memory as explicitly managed cache. You will need to store your array in global memory and cache parts of it in shared memory as needed, either by making multiple passes or some other scheme which minimises the number of loads and stores to/from global memory.
How you implement this will depend on your algorithm - if you can give some details of what it is exactly that you are trying to implement you may get some more concrete suggestions.
One last point: be aware that shared memory is shared between all threads in a block, so you have far less than 16 KiB per thread unless you have a single data structure that is common to all threads in a block.
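The multi-pass idea can be sketched on the CPU side in Python. This is not GPU code: the slice stands in for copying a tile from global memory into shared memory, the sum is a placeholder for whatever the real algorithm does with each tile, and the name `tiled_sum` is invented for the example.

```python
SHARED_BYTES = 16 * 1024               # 16 KiB of shared memory
INT_BYTES = 4                          # size of one integer
TILE_INTS = SHARED_BYTES // INT_BYTES  # integers that fit in one tile

def tiled_sum(data, tile_ints=TILE_INTS):
    """Process data in tiles no larger than the shared memory budget."""
    total = 0
    passes = 0
    for start in range(0, len(data), tile_ints):
        tile = data[start:start + tile_ints]  # stand-in for a load into shared memory
        total += sum(tile)                    # placeholder for the real per-tile work
        passes += 1
    return total, passes

data = list(range(10_000))
total, passes = tiled_sum(data)
print(total, passes)
```

With a 16 KiB budget each tile holds 4096 integers, so the 10,000-element array is covered in three passes.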
All compute capability 2.0 and greater devices (most in the last year or two) have 48 KB of available shared memory per multiprocessor. That being said, Paul's answer is correct in that you likely will not want to load all 10K integers into a single multiprocessor.
You can try to use cudaFuncSetCacheConfig(nameOfKernel, cudaFuncCachePrefer{Shared, L1}) function.
If you prefer L1 to Shared, then 48KB will go to L1 and 16KB will go to Shared.
If you prefer Shared to L1, then 48KB will go to Shared and 16KB will go to L1.
Usage:
// Set the cache preference for the kernel before launching it.
cudaFuncSetCacheConfig(matrix_multiplication, cudaFuncCachePreferShared);
matrix_multiplication<<<bla, bla>>>(bla, bla, bla);