Customizing L2 Cache Sharing between Cores - gem5

I am trying to create a multi-chip, multi-processor design where the L2 caches are private to each chip. For example, I am trying to create the following configuration:
2 Chips, each containing 2 CPU Cores
Each Chip has 2 CPU Cores (each with its own L1 Cache) and a single L2 Cache shared between the two CPUs
Finally, the Main Memory is shared between the 2 Chips
I am using the MOESI_CMP_directory protocol to generate the design, and I am using garnet2.0 to create the topology. From what I have understood, all 4 CPUs share the two L2 Caches, but I want the L2 Cache to be private to each Chip. Is there any way to do that in gem5?
Additional Info:
I checked the memory addresses and the caches accessed through RubyNetwork to confirm that L1-Cache0 accesses L2-Cache0 as well as L2-Cache1. The protocol seems to be working correctly, because the L2 Cache, being the last-level cache, is shared. But I was wondering if I could make some customization so that L1-Cache0/1 requests go only to L2-Cache0 and not L2-Cache1.

I think I know how to resolve this. Two files need modification:
src/mem/ruby/protocol/MESI_Two_Level-L1Cache.sm - In this file the coherence messages from the L1Cache are sent through "actions". The function that controls the mapping, i.e. which L2Cache node receives the coherent request, is "mapAddressToRange". This function is passed certain parameters in the .sm file and can be modified.
src/mem/ruby/slicc_interface/RubySlicc_ComponentMapping.hh - This file contains the implementation of the function "mapAddressToRange", and we can make modifications here as per our requirement (a sketch of what such a change could look like follows below).
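To make that concrete, here is a minimal sketch of the kind of helper one could add, assuming your gem5 tree keeps the mapping helpers in RubySlicc_ComponentMapping.hh; the function name mapAddressToChipLocalL2(), the include path, and the idea of passing a per-chip id from the .sm file are all assumptions for illustration, not gem5's stock API:

```cpp
// Hypothetical variant of the L1 -> L2 mapping, sketched for
// src/mem/ruby/slicc_interface/RubySlicc_ComponentMapping.hh.
// Instead of selecting an L2 bank from address bits (as the stock
// mapAddressToRange() does), it pins every request coming from a given
// chip to that chip's own L2 controller.
#include "mem/ruby/common/MachineID.hh"  // assumed include path; adjust to your tree

inline MachineID
mapAddressToChipLocalL2(Addr addr, MachineType l2_type, NodeID chip_id)
{
    (void)addr;  // unused on purpose: mapping is by chip, not by address bits

    // L1s on chip `chip_id` are steered to L2 controller number `chip_id`,
    // so L1-Cache0/1 requests can only ever reach L2-Cache0, and so on.
    MachineID mach = {l2_type, chip_id};
    return mach;
}
```

The L1Cache controller would then need to know which chip it belongs to (for example via an extra parameter declared in the .sm file) so its request-forwarding actions can call this helper instead of the address-interleaved mapAddressToRange().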

Related

How do operating systems isolate processes from each other?

Assuming the CPU is in protected mode:
When a ring-0 kernel sets up a ring-3 userspace process, which CPU-level data structure does it have to modify to indicate which virtual address space this specific process can access?
Does it just set the Privilege Bit of all other memory segments in the Global Descriptor Table to (Ring) 0?
Each process will have a set of page tables it uses. On x86 that means a page directory with some page tables; the address of the page directory is held in the CR3 register. Every set of page tables has the kernel mapped (with kernel permissions), so when you make a system call the kernel can access its own pages, while user processes cannot. When you do a context switch, you change the address in the CR3 register to point at the page tables of the process that will be executed. Because each process has a different set of page tables, each has a different view of memory. To make sure that no two processes have access to the same physical memory, you should have some kind of physical memory manager that can be queried for a brand-new area of memory not yet mapped in any other page table.
So as long as each Process struct keeps track of its own page table structure, the only CPU-level data structure you have to modify is the CR3 register.
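As a toy illustration of that last sentence, here is a minimal sketch for 32-bit x86 (the Process structure and function names are invented for the example; real kernels do more bookkeeping): switching address spaces boils down to reloading CR3 with the next process's page directory.

```cpp
#include <cstdint>

// Hypothetical per-process bookkeeping: each process remembers the
// physical address of its own page directory.
struct Process {
    uint32_t page_directory_phys;
    // ... saved registers, kernel stack pointer, etc.
};

static inline void load_cr3(uint32_t pd_phys)
{
    // Privileged instruction: raises #GP if executed outside ring 0.
    asm volatile("mov %0, %%cr3" : : "r"(pd_phys) : "memory");
}

void switch_address_space(const Process *next)
{
    // Reloading CR3 flushes the (non-global) TLB entries, so from here on
    // the CPU translates addresses only through `next`'s page tables.
    load_cr3(next->page_directory_phys);
}
```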
It appears that the Global Descriptor Table (GDT) provides a segmentation mechanism that can be used in conjunction with Paging, but is now considered legacy.
By loading the page directory address into the CR3 control register, the Ring 3 process is restricted to the linear memory defined by the paging mechanism. CR3 can only be changed from Ring 0:
In protected mode, the 2 CPL bits in the CS register indicate which ring/privilege level the CPU is on.
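For illustration, a tiny sketch (the helper name is made up) of reading those bits: the low two bits of the CS selector give the current privilege level.

```cpp
#include <cstdint>

// Returns the current ring (0..3). For CS, the selector's low two bits
// (the RPL field) always equal the CPU's current privilege level (CPL).
static inline unsigned current_ring()
{
    uint16_t cs;
    asm volatile("mov %%cs, %0" : "=r"(cs));
    return cs & 0x3;
}
```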
More here:
https://forum.osdev.org/viewtopic.php?f=1&t=31835
https://wiki.osdev.org/Paging
https://sites.google.com/site/masumzh/articles/x86-architecture-basics/x86-architecture-basics
https://en.wikipedia.org/wiki/X86_memory_segmentation
https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4

LR 12.55/TruClient vusers are stuck in Init state not going to running

I created a TruClient Web (IE) protocol script in LR 12.55. When I try to run the script with 50 users, only some go into the Running state (between 25 and 37) and the rest stay stuck in Init forever.
I tried changing Controller -> Options -> Timeout and raised the Init timeout from the default 180 to 999, but it does not resolve the issue. Can anybody comment on how to resolve this?
TruClient runs a real browser for each vuser (virtual user), so system resource consumption is higher than with API-level testing.
It is possible that 50 vusers is too much for your load-generator machine.
I'd suggest checking CPU and memory levels during the run. If either is over 80% utilization, you should split your load between multiple load-generator machines.
If resources are not fully utilized, the failures should be analyzed to determine the root cause.
To further e-Dough's excellent response, you should not expect to execute these virtual users on the same hardware as the controller. You should expect at least three load generators to be involved, two as primary load and one as a control set, in addition to the controller.
Your issue does manifest as the classical "system out of resources" condition. Apply the same best practices for monitoring your load generator health as you would for monitoring your application-under-test infrastructure. You want monitors for the classical finite resource model components (CPU, disk, memory and network), plus sub-components such as a breakout of system versus application CPU, to understand where and how your system is performing. You want to be able to eliminate false negatives on scalability, where your load generators are so unhealthy that they distort your test results: virtual users showing the application as slow when in fact the virtual users are slow because the machine in use is resource constrained.

Is it possible to "wake up" linux kernel process from user space without system call?

I'm trying to modify a kernel module that manages a special piece of hardware.
The user space process performs 2 ioctl() system calls per millisecond to talk with the module. This doesn't meet my real-time requirements because the 2 syscalls sometimes take too long to execute and overrun my time slot.
I know that with mmap I could share a memory area, and this is great, but how can I synchronize the data exchange with the module without ioctl()?
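For reference, a rough sketch of the mmap() side mentioned above, assuming the module exposes a device node with an mmap handler (the /dev/mydevice path and 4 KB size are placeholders); it shows only the shared mapping, not the signalling mechanism the question is asking about:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("/dev/mydevice", O_RDWR);  // hypothetical device node
    if (fd < 0) { perror("open"); return 1; }

    // The module's mmap handler decides which memory backs this mapping.
    void *shared = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    // Reads and writes through `shared` now reach the module's buffer
    // without a syscall per access; telling the module that new data is
    // ready (without ioctl()) is the part that still needs a mechanism.
    munmap(shared, 4096);
    close(fd);
    return 0;
}
```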

OS and/or IIS Caching

Is there a way I can force caching of files at the OS level and/or web server level (IIS)?
The problem I am facing is that there are many static files (XSLTs, for example) that need to be loaded again and again, and I want to load all these files into memory so that no time is wasted on hard disk I/O.
(1) I want to cache at the OS level so that every program that runs on my OS and tries to read a file reads it from memory. I want no changes to program source code; it must happen transparently. For example, read("c:\abc.txt") must not cause a disk I/O; it must read from memory.
(2) Achieving a similar thing in IIS. I've read a few things about output caching for database queries, but how do I achieve it for files?
All suggestions are welcome!
Thanks
You should look into some tricks used by SO itself. One was that they moved all their static content off to another domain for efficiency.
The problem with default setups for Apache (at a minimum) is that the web server will pass all requests through to an app server to see if the content is meant to be dynamic. That's a huge waste for content that you know to be static.
Far better to set up a separate domain for static content without an app server. That way, the static requests are not sent unnecessarily to another layer and the web server can run much faster.
Even in a setup where there's not another layer invoked every time, there are other reasons for a separate domain, as you'll see from that link (specifically removing cookies which both reduces traffic and improves the chances of the Internet caching your data).

Is there any reason to open a file with shared-write access?

I've always opened my files in one of two ways: either read access with shared-read, or read/write access with no sharing.
To me it seems that allowing shared-write could always result in unexpected things happening to the file while you're reading it. Are there any good reasons to open a file in shared-write mode?
If a file is shared by many processes, it is sometimes impractical to lock the whole file (for performance reasons).
In this case, you can lock a region of the file while it is being written.
In Windows you might use the function LockFile().
In Linux/Unix you might use fcntl() or flock().
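As a concrete example of the Linux/Unix side, a minimal sketch using fcntl() with F_SETLKW to lock a single 4 KB region of a file (the file name and offsets are arbitrary):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct flock region = {};
    region.l_type   = F_WRLCK;   // exclusive (write) lock
    region.l_whence = SEEK_SET;
    region.l_start  = 4096;      // lock bytes 4096..8191 only
    region.l_len    = 4096;

    if (fcntl(fd, F_SETLKW, &region) == -1) {  // F_SETLKW waits for the lock
        perror("fcntl");
        close(fd);
        return 1;
    }

    // ... write to the locked region here; other cooperating processes
    // can still lock and write the rest of the file concurrently ...

    region.l_type = F_UNLCK;     // release just this region
    fcntl(fd, F_SETLK, &region);
    close(fd);
    return 0;
}
```

Note that these locks are advisory: they only coordinate processes that also take the lock.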
I'll hazard a guess... one thing it may be used for is parallel computation. Say you have two threads doing some highly parallelizable computation and you need the data to be written to a single file. You're also able to pre-determine the size needed to store the output of each thread (say 50 MB).
So, allocate a 100 MB file, have thread #1 start writing at offset 0 and thread #2 start at 50 MB. When the threads complete you will have your single, composed file (otherwise, using separate files, you'd need to append the result from thread #2 to thread #1's).
ASCII Art Attempt

=================================
|     50MB      |     50MB      |   [100 MB Total FileSize]
|               |               |
=================================
        ^               ^
        |               |
     Thread 1        Thread 2
All that said, I've never done this. It may not even work! You can conceivably just share the File Handle/Stream between threads using some other synchronization mechanism, but then you'd have to also reset the offset on each thread. Perhaps one or the other is more efficient.
On one hand there could be lots of disk thrashing if both threads are always writing simultaneously. Conversely, synchronizing the writes may negate the benefits of concurrency if there is a lot of contention on the write lock. And as often said: profile and test!
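A rough sketch of the layout in the ASCII art above, using a shared descriptor and pwrite() (which takes an explicit offset, so the threads never fight over a shared file position); the file name and sizes are arbitrary:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <thread>
#include <vector>

constexpr off_t kHalf = 50LL * 1024 * 1024;   // 50 MB per thread

static void write_half(int fd, off_t base)
{
    std::vector<char> chunk(1 << 20, 'x');    // 1 MB of dummy data
    for (off_t done = 0; done < kHalf; done += chunk.size())
        pwrite(fd, chunk.data(), chunk.size(), base + done);
}

int main()
{
    int fd = open("composed.bin", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return 1;
    ftruncate(fd, 2 * kHalf);                 // pre-size the 100 MB file

    std::thread t1(write_half, fd, off_t{0}); // thread #1: first half
    std::thread t2(write_half, fd, kHalf);    // thread #2: second half
    t1.join();
    t2.join();
    close(fd);
    return 0;
}
```

With separate processes instead of threads, each would open() the file itself with shared-write access and the pwrite() calls would look the same.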
Anyways, I'm also curious about a "real life" scenario where shared write access has been used and will be watching for more answers!
Consider sockets, which sit at a level lower than file I/O.
Say a server listens on some local port 1999 and relays inbound to all subscribing clients on service port 3128.
The server could read from multiple local clients and relay to multiple remote clients. If the server were an authentication daemon, multiple local applications might attempt to authenticate via the same server (service). The remote clients could be notified that user-x is now authenticated because s/he logged in successfully to one of the apps sharing authentication server.
I don't know what I'm talking about. I'm venturing a guess.