Icache and Dcache in Simple.py configuration of gem5

I am trying to understand the models generated using gem5. I ran build/X86/gem5.opt with the gem5/configs/learning_gem5/part1/simple.py configuration file provided in the gem5 repo.
In the output directory I get a .dot graph of the simulated system.
I have the following questions:
Does this design not have any instruction and data cache? I checked the config.ini file and there were no configuration parameters such as ICache/DCache size.
What is the purpose of adding the icache_port and dcache_port?
system.cpu.icache_port = system.membus.slave
system.cpu.dcache_port = system.membus.slave

Does this design not have any instruction and data cache? I checked the config.ini file and there were no configuration parameters such as ICache/DCache size.
I'm not very familiar with that config, but unless caches were added explicitly somewhere, there aren't any caches.
Just compare it to an se.py run, e.g.:
build/ARM/gem5.opt configs/example/se.py --cmd hello.out \
--caches --l2cache --l1d_size=64kB --l1i_size=64kB --l2_size=256kB
which definitely has caches; e.g., the config.ini from that run at gem5 commit 211869ea950f3cc3116655f06b1d46d3fa39fb3a contains:
[system.cpu.dcache]
size=65536
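If you want to quickly check whether a given run ended up with caches, you can grep the generated config.ini for cache sections (assuming the default m5out output directory):
# no match here means the simulated system has no L1 caches
grep -A 3 '\[system.cpu.icache\]' m5out/config.ini
grep -A 3 '\[system.cpu.dcache\]' m5out/config.ini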
What is the purpose of adding the icache_port and dcache_port?
I'm not very familiar with the port system.
I think ports are used as a way for components to communicate, often in master / slave pairs, e.g. the CPU is a master and the cache is a slave. So here the CPU ports are connected directly to the memory bus, with no cache attached in between.
For example, in the above se.py run you can see this clearly in the generated config diagram: the caches sit between the CPU ports and the memory bus.


How to get stats for several gem5 full system userland benchmark programs under Linux without having to reboot multiple times?

I know how to modify my image, reboot, and re-run it, but that would make my experiments very slow, since boot takes a few minutes.
Is there a way to quickly switch:
command line options
the executable
that is being run after boot?
This is not trivial because the Linux kernel knows about:
the state of the root filesystem
the state of memory, and therefore of kernel CLI options that could be used to modify init
so I can't just switch those after a checkpoint.
This question is inspired by: https://www.mail-archive.com/gem5-users@gem5.org/msg16959.html
Here is a fully automated setup that can help you to do it.
The basic workflow is as follows:
run your benchmark from the init executable that gets passed to the Linux kernel
https://unix.stackexchange.com/questions/122717/how-to-create-a-custom-linux-distro-that-runs-just-one-program-and-nothing-else/238579#238579
https://unix.stackexchange.com/questions/174062/can-the-init-process-be-a-shell-script-in-linux/395375#395375
to run a single benchmark with different parameters without rebooting, do this in your init script (a fuller sketch follows this list):
m5 checkpoint
m5 readfile | sh
and set the contents that m5 readfile returns by writing a file on the host filesystem before restoring the checkpoint:
echo 'm5 resetstats && ./run-benchmark && m5 dumpstats' > path/to/script
build/ARM/gem5.opt configs/example/fs.py --script path/to/script
This is what "configs/boot/hack_back_ckpt.rcS" does, but I think that script is overly complicated.
to modify the executable without having to reboot, attach a second disk image and mount it after the checkpoint is restored:
How to attach multiple disk images in a simulation with gem5 fs.py?
Another possibility would be 9P but it is not working currently: http://gem5.org/WA-gem5
to count only the benchmark instructions, do:
m5 resetstats
./run-benchmark
m5 dumpstats
If that is not precise enough, modify the source of your benchmark with m5ops magic instructions that do resetstats and dumpstats from within the benchmark as shown at: How to count the number of CPU clock cycles between the start and end of a benchmark in gem5?
to make the first boot faster, you can boot with a simple CPU model like the default AtomicSimpleCPU and then switch to a more detailed, slower model after the checkpoint is restored: How to switch CPU models in gem5 after restoring a checkpoint and then observe the difference?
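Putting the pieces together, here is a minimal sketch of the guest init script and the host-side driver. The paths, checkpoint number and CPU type are placeholders, and the exact fs.py flag names can differ between gem5 versions, so treat this as a starting point rather than a recipe:
# --- guest: the init script passed to the kernel (e.g. init=/sbin/init) ---
#!/bin/sh
m5 checkpoint                  # take a checkpoint once boot has finished
m5 readfile | sh               # run whatever script the host provided
# --- host: one invocation per experiment, no reboot needed ---
echo 'm5 resetstats && ./run-benchmark && m5 dumpstats' > path/to/script
build/ARM/gem5.opt configs/example/fs.py --script path/to/script \
    -r 1 --restore-with-cpu=DerivO3CPU --caches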

Access pagemap in gem5 FS mode

I am trying to run an application which uses pagemap in gem5 FS mode.
But I am not able to use pagemap in gem5. It throws the error below:
"assert(pagemap>=0) failed"
The line of code is:
int pagemap = open("/proc/self/pagemap", O_RDONLY);
assert(pagemap >= 0);
Also, if I try to run my application in the gem5 terminal with sudo, it throws the error:
sudo command not found
How can I use sudo in gem5?
These problems are not gem5 specific, but rather image / Linux specific, and would likely happen on any simulator or real hardware. So I recommend that you remove gem5 from the equation completely, and ask a Linux or image specific question next time, saying exactly which image you are using and which kernel configs, and provide a minimal C example that reproduces the problem: this will greatly improve the probability that you will get help.
I have just done open("/proc/self/pagemap", O_RDONLY) successfully with this program and on this fs.py setup on aarch64; see also these comments.
If /proc/<pid>/pagemap is not present for any process, do the following:
ensure that procfs is mounted on /proc. This is normally done with an fstab entry of type:
proc /proc proc defaults 0 0
but your init script needs to actually process fstab (e.g. by running mount -a) for that entry to take effect.
Alternatively, you can mount proc manually with:
mount -t proc proc proc/
you will likely want to ensure that /sys and /dev are mounted as well (a minimal mounting sketch follows this list).
grep the kernel to see if there is some config controlling the file creation.
These kinds of things are often easy to find without knowing anything about the kernel.
If I do:
git grep '"pagemap'
to find the pagemap string, which is likely the creation point, on v4.18 this leads me to fs/proc/base.c, which contains:
#ifdef CONFIG_PROC_PAGE_MONITOR
REG("pagemap", S_IRUSR, proc_pagemap_operations),
#endif
so make sure CONFIG_PROC_PAGE_MONITOR is set.
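For completeness, here is a minimal sketch of the mounts a hand-rolled init script usually needs before files like /proc/<pid>/pagemap show up; the mount points assume the usual layout:
#!/bin/sh
mount -t proc     proc     /proc      # procfs, where pagemap lives
mount -t sysfs    sysfs    /sys
mount -t devtmpfs devtmpfs /dev       # or populate /dev with mdev/udev
# sanity check: the file should now exist for every process
ls -l /proc/self/pagemap
# if the kernel exposes its config, verify CONFIG_PROC_PAGE_MONITOR is set
zcat /proc/config.gz 2>/dev/null | grep PROC_PAGE_MONITOR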
sudo: most embedded / simulator images don't have it; you just log in as root directly and can do anything by default without it. This can be seen from the conventional # in the prompt instead of $.

How to make /var/log symlink to a persistent storage in Yocto Rocko

I'm building a Yocto-based distribution with systemd and journald in its core.
Unfortunately I cannot get Yocto to store all logs in /var/log -> /data/log. I need journald logs, as well as some other logs that are written there after multi-user.target, to be persistent. /data is the persistent partition.
I have a very similar problem to this but unfortunately I couldn't modify it to work properly in my setup.
From my understanding there are two things I need to modify:
the volatiles file in base-files, which I believe is a config file for systemd-tmpfiles. It should tell it to create at runtime everything that journald needs. Here I modified one line:
L+ root root 0755 /var/log /data/log
fs-perms.txt
${localstatedir}/log link /data/log
I also tried to pull it off with VOLATILE_LOG_DIR set to "no" (fs-perms-persistent-log.txt modified), but to no avail. I also tried adding some kind of var.conf to /etc/tmpfiles.d with a config similar to the one above. It hasn't worked either.
I run watch ls -l on the resulting rootfs/var and see that var/log gets symlinked to /data/log for a short while, but later it's overridden somewhere to point to volatile/log once again.
I would greatly appreciate any advice, because it seems like I'm overcomplicating this; it should be very easy. After all, it's just making Yocto create a symlink. But I guess /var/log is too important a directory to just let me ln -sf /data/log /var/log.
I would also like to hear about the implications of this approach.
Other than wearing out my eMMC. We can live with that because the log activity is very low compared to some other actions performed on the device. I'm mostly interested in mount order and the like. If I remember correctly, journald will use a memory buffer until it has /var/log/journal created for it, so I should be fine. But what should I do to ensure everything's in place before the logs are flushed? Do I need to modify systemd services to include RequiresMountsFor= or After=?
I want to be as defensive as possible so I'm looking forward to what you guys have to say on the topic.
EDIT:
Maybe I can just add a bind mount from /var/log to /data/log? If that is actually the solution, I'd also like to know whether there are any hidden hindrances down the road.
You can mount your persistent partition by tweaking the base-files recipe (base-files_%.bbappend):
do_install_append () {
cat >> ${D}${sysconfdir}/fstab <<EOF
# Data partition
/dev/mmcblk0p4 /data auto defaults,sync,noauto 0 2
EOF
}
dirs755 += "/data"
Then you can tweak volatile-binds (volatile-binds.bbappend):
VOLATILE_BINDS = "\
/data/var/lib /var/lib\n\
/data/var/log /var/log\n\
/data/var/spool /var/spool\n\
/data/var/srv /srv\n\
"
This should hopefully help. I have not tested it fully here, but I hope it provides you a starting point.
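Once the image is built, a quick way to check on the target whether the mounts actually ended up where you expect (plain busybox-compatible commands; adjust the paths to your layout):
# did the data partition get mounted?
mount | grep ' /data '
# is /var/log now backed by the persistent partition?
mount | grep ' /var/log '
ls -ld /var/log /data/var/log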
A persistent log data option made it into Yocto 2.4: https://bugzilla.yoctoproject.org/show_bug.cgi?id=6132
Log data can be made persistent by defining the following in your distro config:
VOLATILE_LOG_DIR = "no"

Speed up gsutil rsync between s3 and gs

I want to rsync a bucket with 100M files between s3 and gs. I've got a c3.8xlarge instance and did a quick dry run:
$ time gsutil -m rsync -r -n s3://s3-bucket/ gs://gs-bucket/
Building synchronization state...
At source listing 10000...
^C
real 4m11.946s
user 0m0.560s
sys 0m0.268s
About 4 minutes for 10k files. At this rate, it's going to take 27 days just to compute the sync state. Anything I can do to speed this up?
I also noticed [and fixed] the following warning:
WARNING: gsutil rsync uses hashes when modification time is not available at
both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer
than usual. For help installing the extension, please see "gsutil help crcmod".
Are the file hashes being computed, or am I just waiting for the listing of 100M files?
When setting up a sync process between two buckets, the first iteration is going to be the slowest because it needs to copy all of the data in source-bucket to dest-bucket. For cross-provider syncs, this is further slowed down by the need for two separate connections per object -- one to pull the data from the source to the host machine, and another to funnel it through from the host to the destination (gsutil refers to this as "daisy-chain" mode).
For the initial sync (and possibly subsequent syncs as well) between buckets, you might be better off using GCS's transfer service, which allows GCS to copy the objects on your behalf. This tends to be much faster than doing all the work with one machine running gsutil.
As for the warning, it's a general warning that's printed at the beginning of the command execution if you don't have the crcmod C extension installed, regardless of what's present at the destination.
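If you want to get rid of the warning and speed up the checksumming, installing the compiled crcmod is usually enough. A sketch for a Debian/Ubuntu host (package names differ on other distros):
# install a compiler and Python headers so crcmod's C extension can build
sudo apt-get install gcc python3-dev python3-setuptools
# reinstall crcmod so pip rebuilds it with the C extension
sudo pip3 uninstall -y crcmod
sudo pip3 install --no-cache-dir -U crcmod
# verify: this should now report "compiled crcmod: True"
gsutil version -l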
Skyplane is a much faster alternative for transferring data between clouds (up to 110x for large files). You can transfer data with the command:
# copy data
skyplane cp -r s3://aws-bucket-name/ gcs://google-bucket-name/
# sync data
skyplane sync -r s3://aws-bucket-name/ gcs://google-bucket-name/

Initrd, Ramdisk, Initramfs, uclinux

I am working on porting uClinux to the ColdFire M5272C3 board. Right now I have the kernel running from RAM with romfs as my root filesystem.
I am not clear about a few terms, what they mean and when to use them.
Please explain them to me in the simplest possible manner:
Q1: What is initrd? Why do we need it?
Q2: What is a ramdisk? Why and where do we need it?
Q3: What is initramfs? Why and where do we use it?
Q4: What is ramfs? Why and where do we use it?
Also, please point me to a document or reference book for in-depth knowledge of these terms.
Thanks
Phogat
A ramdisk merely refers to an in-memory disk image. It is implemented using the ramfs VFS driver in the kernel. The contents of the ramdisk would be wiped on the next reboot or power-cycle.
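To get a feel for what that means in practice, here is a minimal sketch; the mount point and size are arbitrary:
# create an in-memory filesystem; everything in it is lost on reboot/power-cycle
mkdir -p /mnt/ramdisk
mount -t ramfs ramfs /mnt/ramdisk
# or, if you want a size cap, use tmpfs instead:
mount -t tmpfs -o size=64M tmpfs /mnt/ramdisk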
I'll give you details about initrd and initramfs next.
In simple terms, both initrd and initramfs refer to an early-stage userspace root filesystem (aka rootfs) that lets you run a very minimal filesystem in memory.
The documentation at Documentation/filesystems/ramfs-rootfs-initramfs.txt in the Linux kernel source tree also gives you a lengthy description of what these are.
What is initrd?
One common case where such an early-stage filesystem is needed is to load driver modules for hard disk controllers. If the drivers were only present on the hard drive, it becomes a chicken-and-egg problem. Having these drivers as part of this early-stage rootfs lets the kernel load the drivers for any detected hard disk controllers before it mounts the actual root filesystem from the hard drive. Another solution would be to build all the driver modules into the kernel, but that increases the size of the kernel binary. This kind of filesystem image is commonly referred to as an initrd: a (usually compressed) filesystem image that the kernel loads into a RAM-backed block device and mounts as its initial root.
The bootloader loads the kernel image at one memory address and the initrd image at another, tells the kernel where to find the initrd, passes the boot arguments to the kernel, and then hands control to the kernel to continue the boot process.
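To illustrate the same hand-off on an emulator (the machine type and file names here are just placeholders), QEMU plays the bootloader's role when invoked like this:
# QEMU places the kernel and the initrd in guest memory and tells the kernel
# where to find the initrd, just like a bootloader would
qemu-system-arm -M virt -m 256M \
    -kernel zImage \
    -initrd initramfs.cpio.gz \
    -append "console=ttyAMA0 rdinit=/init" \
    -nographic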
So how is it different from initramfs then?
initramfs is an even earlier-stage filesystem compared to initrd, and it can be built directly into the kernel image (controlled by the kernel config, of course).
As far as I know, both initrd and initramfs are controlled by this single kernel config option, but it could have changed in recent kernels.
config BLK_DEV_INITRD
I'm not going to go deep into how to build your own initramfs, but I can tell you it just uses the cpio format to store the files and can be configured using usr/Kconfig while building the kernel. Even if you do not specify your own initramfs image but have turned on support for initramfs, the kernel automatically embeds a very simple one containing /dev/console, /root and some other files/directories.
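If you do want to roll your own, the archive is plain newc-format cpio; a minimal sketch, assuming ./my-rootfs holds your root tree with an executable /init:
cd my-rootfs
find . -print0 | cpio --null -o -H newc | gzip -9 > ../initramfs.cpio.gz
# either point the kernel build at it via CONFIG_INITRAMFS_SOURCE,
# or hand it to the bootloader/emulator as the initrd image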
In addition there is also the newer tmpfs filesystem, which is commonly used to implement in-memory filesystems. In fact, newer kernels use tmpfs instead of ramfs for the rootfs that the initramfs is extracted into.
UPDATE:
Just happened to stumble upon a similar question
This might also be useful