gem5 SE mode simulation hangs at the end when using ruby memory model and DerivO3CPU - gem5

I am trying to run gem5 in SE mode with number of cpus 2, cpu type DerivO3CPU, and ruby memory model using following command, but the simulation does not stop even after results are produced , ie "Hello world!" is printed to stdout
build/X86_MESI_Three_Level/gem5.opt configs/example/se.py -n 2 --ruby --cpu-type=DerivO3CPU -c 'tests/test-progs/hello/bin/x86/linux/hello;tests/test-progs/hello/bin/x86/linux/hello'
I started using ruby memory model after reading from gem5 email archive that classic memory does not work with multicore DerivO3CPU.

Related

How to disconnect Google Colab after finishing script

I'm using Google Colab Pro+ to train a computer vision model. Sometimes execution takes up to 8 hours so after completion I'm still allocating memory and a GPU. I wish I had a line of code to execute after the training finishes. Is there any way to terminate the environment using code?
You should be able to disconnect your runtime by running the following command,
!kill $(ps aux | awk '{print $2}')
This command will extract the PID of each running process and then kill (or stop) them.

Get mapping between /dev/nvidia* and nvidia-smi gpu list

A server with 4 GPUs is used for deep learning.
It often happens that GPU memory is not freed after the training process was terminated (killed). Results shown by nvidia-smi is
Nvidia-smi results
The cuda device 2 is used. (Might be a process launched with CUDA_VISIBLE_DEVICES=2)
Some sub-processes are still alive thus occupy the memory.
One bruce-force solution is to kill all processes created by python using:
pkill -u user_name python
This should be helpful if there is only one process to be cleaned up.
Another solution proposed by pytorch official My GPU memory isn’t freed properly
One may find them via
ps -elf | grep python.
However, if multiple processes are launched and we only want to kill the ones that related to a certain GPU, we can group processes by the gpu index (nvidia0, nvidia1, ...) as:
fuser -v /dev/nvidia*
fuser -v results
As we can see, /dev/nvidia3 is used by some python threads. Thus /dev/nvidia3 corresponds to cuda device 2.
The problem is: I want to kill certain processes launched with setting of CUDA_VISIBLE_DEVICES=2, but I do not know the gpu index (/dev/nvidia0, /dev/nvidia1, ...).
How to find the mapping between CUDA_VISIBLE_DEVICES={0,1,2,3} and /dev/nvidia{0,1,2,3}.
If you set CUDA_DEVICE_ORDER=PCI_BUS_ID environment variable, then the order should be consistent between CUDA and nvidia-smi.
There is also another option (if you are sure the limiting to a specific GPU is done via CUDA_VISIBLE_DEVICES env var). Every process' environment can be examined in /proc/${PID}/environ. The format is partially binary, but grepping through the output usually works (if you force grep to treat the file as text file). This might require root privileges.

How to get stats for several gem5 full system userland benchmark programs under Linux without having to reboot multiple times?

I know how to modify my image, reboot, and re-run it, but that would make my experiments very slow, since boot takes a few minutes.
Is there a way to quickly switch:
command line options
the executable
that is being run after boot?
This is not trivial because the Linux kernel knows about:
the state of the root filesystem
the state of memory, and therefore of kernel CLI options that could be used to modify init
so I can't just switch those after a checkpoint.
This question is inspired from: https://www.mail-archive.com/gem5-users#gem5.org/msg16959.html
Here is a fully automated setup that can help you to do it.
The basic workflow is as follows:
run your benchmark from the init executable that gets passed to the Linux kernel
https://unix.stackexchange.com/questions/122717/how-to-create-a-custom-linux-distro-that-runs-just-one-program-and-nothing-else/238579#238579
https://unix.stackexchange.com/questions/174062/can-the-init-process-be-a-shell-script-in-linux/395375#395375
to run a single benchmark with different parameters without
rebooting, do in your init script:
m5 checkpoint
m5 readfile | sh
and set the contents of m5 readfile on a host filesystem file before restoring the checkpoint with:
echo 'm5 resetstats && ./run-benchmark && m5 dumpstats' > path/to/script
build/ARM/gem5.opt configs/example/fs.py --script path/to/script
This is what the "configs/boot/hack_back_ckpt.rcS" but I think that
script is overly complicated.
to modify the executable without having to reboot, attach a second
disk image and mount after the checkpoint is restored:
How to attach multiple disk images in a simulation with gem5 fs.py?
Another possibility would be 9P but it is not working currently: http://gem5.org/WA-gem5
to only count only benchmark instructions, do:
m5 resetstats
./run-benchmark
m5 dumpstats
If that is not precise enough, modify the source of your benchmark with m5ops magic instructions that do resetstats and dumpstats from within the benchmark as shown at: How to count the number of CPU clock cycles between the start and end of a benchmark in gem5?
to make the first boot faster, you can boot with a simple CPU model like the default AtomicSimpleCPU and then switch to a more detailed, slower model after the checkpoint is restored: How to switch CPU models in gem5 after restoring a checkpoint and then observe the difference?

How to run the gem5 unit tests?

Gem5 has several tests in the source tree, and there is some documentation at: http://www.gem5.org/Regression_Tests but those docs are not very clear.
What tests are there and how to run them?
Unit vs regression tests
gem5 has two kinds of tests:
regression: run some workload (full system or syscall emulation) on the entire simulator
unit: test only a tiny part of the simulator, without running the entire simulator binary
We will cover both on this answer.
Regression tests
2019 regression tests
A new testing framework was added in 2019 and it is documented at: https://gem5.googlesource.com/public/gem5/+/master/TESTING.md
Before sending patches, you basically want to run:
cd tests
./main.py run -j `nproc` -t `nproc`
This will:
build gem5 for the actively supported ISAs: X86, ARM, RISCV with nproc threads due to j
download binaries required to run tests from gem5.org, e.g. http://www.gem5.org/dist/current/arm/ see also: http://gem5.org/Download It was not currently possible to download out or the source tree, which is bad if you have a bunch of git worktrees lying around.
run the quick tests on nproc threads due to -t, which should finish in a few minutes
You can achieve the same as the previous command without cd by passing the tests/ directory as an argument:
./main.py run -j `nproc` -t `nproc` tests
but I wish neither were necessary: https://gem5.atlassian.net/browse/GEM5-397
This is exactly what the automated upstream precommit tests are running as can be seen from tests/jenkins/presubmit.sh.
Stdout contains clear result output of form:
Test: cpu_test_DerivO3CPU_FloatMM-ARM-opt Passed
Test: cpu_test_DerivO3CPU_FloatMM-ARM-opt-MatchStdout Passed
Test: realview-simple-atomic-ARM-opt Failed
Test: realview-simple-atomic-dual-ARM-opt Failed
and details about each test can be found under:
tests/.testing-results/
e.g.:
.testing-results/SuiteUID:tests-gem5-fs-linux-arm-test.py:realview-simple-atomic-ARM-opt/TestUID:tests-gem5-fs-linux-arm-test.py:realview-simple-atomic-ARM-opt:realview-simple-atomic-ARM-opt/
although we only see some minimal stdout/stderr output there which don't even show the gem5 stdout. The stderr file does however contain the full command:
CalledProcessError: Command '/path/to/gem5/build/ARM/gem5.opt -d /tmp/gem5outJtSLQ9 -re '/path/to/gem5/tests/gem5/fs/linux/arm/run.py /path/to/gem5/master/tests/configs/realview-simple-atomic.py' returned non-zero exit status 1
so you can remove -d and -re and re-run that to see what is happening, which is potentially slow, but I don't see another way.
If a test gets stuck running forever, you can find its raw command with Linux' ps aux command since processes are forked for each run.
Request to make it easier to get the raw run commands from stdout directly: https://gem5.atlassian.net/browse/GEM5-627
Request to properly save stdout: https://gem5.atlassian.net/browse/GEM5-608
To further stress test a single ISA, you can run all tests for one ISA with:
cd tests
./main.py run -j `nproc` -t `nproc` --isa ARM --length long --length quick
Each test is classified as either long or quick, and using both --length runs both.
long tests are typically very similar to the default quick ones, but using more detailed and therefore slower models, e.g.
tests/quick/se/10.mcf/ref/arm/linux/simple-atomic/ is quick with a faster atomic CPU
tests/long/se/10.mcf/ref/arm/linux/minor-timing/ is long with a slower Minor CPU
Tested in gem5 69930afa9b63c25baab86ff5fbe632fc02ce5369.
2019 regression tests run just one test
List all available tests:
./main.py list --length long --length quick
This shows both suites and tests, e.g.:
SuiteUID:tests/gem5/cpu_tests/test.py:cpu_test_AtomicSimpleCPU_Bubblesort-ARM-opt
TestUID:tests/gem5/cpu_tests/test.py:cpu_test_AtomicSimpleCPU_Bubblesort-ARM-opt:cpu_test_AtomicSimpleCPU_Bubblesort-ARM-opt
TestUID:tests/gem5/cpu_tests/test.py:cpu_test_AtomicSimpleCPU_Bubblesort-ARM-opt:cpu_test_AtomicSimpleCPU_Bubblesort-ARM-opt-MatchStdout
And now you can run just one test with --uid:
./main.py run -j `nproc` -t `nproc` --isa ARM --uid SuiteUID:tests/gem5/cpu_tests/test.py:cpu_test_AtomicSimpleCPU_FloatMM-ARM
A bit confusingly, --uid must point to a SuiteUID, not TestUID.
Then, when you run the tests, and any of them fails, and you want to run just the failing one, the test failure gives you a line like:
Test: cpu_test_DerivO3CPU_FloatMM-ARM-opt Passed
and the only way to run just the test is to grep for that string in the output of ./main.py list since cpu_test_DerivO3CPU_FloatMM-ARM-opt is not a full test ID, which is very annoyng.
2019 regression tests out of tree
By default, tests/main.py places the build at gem5/build inside the source tree. Testing an out of tree build is possible with --build-dir:
./main.py run -j `nproc` -t `nproc` --isa ARM --length quick --build-dir path/to/my/build
which places the build in path/to/my/build/ARM/gem5.opt instead for example.
If your build is already done, save a few scons seconds with --skip-build option as well:
./main.py run -j `nproc` -t `nproc` --isa ARM --length quick --build-dir path/to/my/build --skip-build
Note however that --skip-build also skips the downloading of test binaries. TODO patch that.
2019 regression tests custom binary download director
Since https://gem5-review.googlesource.com/c/public/gem5/+/24525 you can use the --bin-path option to specify where the test binaries are downloaded, otherwise they just go into the source tree.
This allows you to reuse the large binaries such as disk images across tests in multiple worktrees in a single machine, saving time and space.
Pre-2019 regression tests
This way of running tests is deprecated and will be removed.
The tests are run directly with scons.
But since the test commands are a bit long, there is even an in-tree utility generate tests commands for you.
For example, to get the command to run X86 and ARM quick tests, run:
./util/regress -n --builds X86,ARM quick
The other options besides quick are long or all to do both long and quick at the same time.
With -n it just prints the test commands, and without it it actually runs them.
This outputs something like:
scons \
--ignore-style \
--no-lto \
build/X86/gem5.debug \
build/ARM/gem5.debug \
build/X86/gem5.fast \
build/ARM/gem5.fast \
build/X86/tests/opt/quick/se \
build/X86/tests/opt/quick/fs \
build/ARM/tests/opt/quick/se \
build/ARM/tests/opt/quick/fs
TODO: why does it build gem5.debug and gem5.fast, but then runs an /opt/ test?
So note how this would both:
build the gem5 executables, e.g. build/X86/gem5.debug
run the tests, e.g. build/X86/tests/opt/quick/fs
Or get the command to run all tests for all archs:
./util/regress -n all
Then, if you just want to run one of those types of tests, e.g. the quick X86 ones you can copy paste the scons just for that tests:
scons --ignore-style build/X86/tests/opt/quick/se
Running the tests with an out of tree build works as usual by magically parsing the target path: How to build gem5 out of tree?
scons --ignore-style /any/path/that/you/want/build/X86/tests/opt/quick/se
or you can pass the --build-dir option to util/regress:
./util/regress --build-dir /any/path/that/you/want all
The tests that boot Linux on the other hand require a Linux image with a specific name in the M5_PATH, which is also annoying.
This would however be very slow, not something that you can run after each commit: you are more likely to want to run just the quick tests for your ISA of interest.
Pre-2019 regression tests: Run just one test
If you just append the path under tests in the source tree to the test commands, it runs all the tests under a given directory.
For example, we had:
scons --ignore-style build/X86/tests/opt/quick/se
and we notice that the following path exists under tests in the source tree:
quick/se/00.hello/ref/x86/linux/simple-atomic/
so we massage the path by removing ref to get the final command:
scons build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-atomic
Pre-2019 regression tests: Find out the exact gem5 CLI of the command run
When you run the tests, they output to stdout the m5out path.
Inside the m5out path, there is a simout with the emulator stdout, which contains the full gem5 command line used.
For example:
scons --ignore-style build/X86/tests/opt/quick/se
outputs:
Running test in /any/path/that/you/want/build/ARM/tests/opt/quick/se/00.hello/arm/linux/simple-atomic.
and the file:
/any/path/that/you/want/build/ARM/tests/opt/quick/se/00.hello/arm/linux/simple-atomic
contains:
command line: /path/to/mybuild/build/ARM/gem5.opt \
-d /path/to/mybuild/build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-atomic \
--stats-file 'text://stats.txt?desc=False' \
-re /path/to/mysource/tests/testing/../run.py \
quick/fs/10.linux-boot/arm/linux/realview-simple-atomic
Pre-2019 regression tests: Re-run just one test
If you just run a test twice, e.g. with:
scons build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-atomic
scons build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-atomic
the second run will not really re-run the test, but rather just compare the stats from the previous run.
To actually re-run the test, you must first clear the stats generated from the previous run before re-running:
rm -rf build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-atomic
Pre-2019 regression tests: Get test results
Even this is messy... scons does not return 0 success and 1 for failure, so you have to parse the logs. One easy way it to see:
scons --ignore-style build/X86/tests/opt/quick/se |& grep -E '^\*\*\*\*\* '
which contains three types of results: PASSSED, CHANGED or FAILED
CHANGED is mostly for stat comparisons that had a great difference, but those are generally very hard to maintain and permanently broken, so you should focus on FAILED
Note that most tests currently rely on SPEC2000 and fail unless you have access to this non-free benchmark...
Unit tests
The unit tests, which compile to separate executables from gem5, and just test a tiny bit of the code.
There are currently two types of tests:
UnitTest: old and deprecated, should be converted to GTest
GTest: new and good. Uses Google Test.
Placed next to the class that they test, e.g.:
src/base/cprintf.cc
src/base/cprintf.hh
src/base/cprintftest.cc
Compile and run all the GTest unit tests:
scons build/ARM/unittests.opt
Sample output excerpt:
build/ARM/base/cprintftest.opt --gtest_output=xml:build/ARM/unittests.opt/base/cprintftest.xml
Running main() from gtest_main.cc
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CPrintf
[ RUN ] CPrintf.Misc
[ OK ] CPrintf.Misc (0 ms)
[ RUN ] CPrintf.FloatingPoint
[ OK ] CPrintf.FloatingPoint (0 ms)
[ RUN ] CPrintf.Types
[ OK ] CPrintf.Types (0 ms)
[ RUN ] CPrintf.SpecialFormatting
[ OK ] CPrintf.SpecialFormatting (0 ms)
[----------] 4 tests from CPrintf (0 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (0 ms total)
[ PASSED ] 4 tests.
Compile and run just one test file:
scons build/ARM/base/cprintf.test.opt
./build/ARM/base/cprintf.test.opt
List available test functions from a test file, and run just one of them:
./build/ARM/base/cprintftest.opt --gtest_list_tests
./build/ARM/base/cprintftest.opt SpecialFormatting
Tested on gem5 200281b08ca21f0d2678e23063f088960d3c0819, August 2018.
Unit tests with SimObjects
As of 2019, the unit tests are quite limited, because devs haven't yet found a proper way to test SimObjects in isolation, which make up the bulk of the simulator, and are tightly bound to the rest of the simulator. This unmerged patch attempted to address that: https://gem5-review.googlesource.com/c/public/gem5/+/15315
It might be possible to work around it with Google Mock, which is already present in-tree, but it is not clear if anyone has the patience to mock out enough of SimObject to actually make such tests.
I believe the only practical solution is to embed all tests into gem5.opt, and then have a --test <testname> option that runs tests instead of running simulating. This way we get a single binary without duplicating binary sizes, but can still access everything.
Related issue: https://gem5.atlassian.net/browse/GEM5-433
Continuous integration
20.1 Nightlies enabled
As mentioned at: https://www.gem5.org/project/2020/10/01/gem5-20-1.html a Jenkins that runs the long regressions was added at: https://jenkins.gem5.org/job/Nightly/
2019 CI
Around 2019-04 a precommit CI that runs after every pull request after the maintainer gives +1.
It uses a magic semi-internal Google provided Jenkins setup called Kokoro which provides low visibility on configuration.
See this for example: https://gem5-review.googlesource.com/c/public/gem5/+/18108 That server does not currently run nightlies. The entry point is tests/jenkins/presubmit.sh.
Nightlies were just disabled to start with.
What is the environment of the 2019 CI?
In-tree Docker images are used: https://askubuntu.com/questions/350475/how-can-i-install-gem5/1275773#1275773
Pre-2019 CI update
here was a server running somewhere that runs the quick tests for all archs nightly and posts them on the dev mailing list, adding to the endless noise of that enjoyable list :-)
Here is a sample run: https://www.mail-archive.com/gem5-dev#gem5.org/msg26855.html
As of 2019Q1, gem5 devs are trying to setup an automated magic Google Jenkins to run precommit tests, a link to a prototype can be found at: https://gem5-review.googlesource.com/c/public/gem5/+/17456/1#message-e9dceb1d3196b49f9094a01c54b06335cea4ff88 This new setup uses the new testing system in tests/main.py.
Pre-2019 CI: Why so many tests are CHANGED all the time?
As of August 2018, many tests have been CHANGED for a long time.
This is because stats can vary due to a very wide number of complex
factors. Some of those may be more accurate, others no one knows,
others just bugs.
Changes happen so often that devs haven't found the time to properly
understand and justify them.
If you really care about why they changed, the best advice I have is to bisect them.
But generally your best bet is to just re-run your old experiments on the newer gem5 version, and compare everything there.
gem5 is not a cycle accurate system simulator, so absolute values or
small variations are not meaningful in general.
This also teaches us that results obtained with small margins are
generally not meaningful for publication since the noise is too great.
What that error margin is, I don't know.

What is cfinteractive process in Yocto?

I am facing a problem with my Camera image fetching program stop working. When the program has no response, I captured the following info by ps command:
What is the first process cfinteractive?
cfinteractive is a kernel thread for the Interactive governor of the CPUFreq driver. You can verify that your system is using this governor with:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
And you can temporarily disable CPU frequency scaling with the command:
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
You need to do that for each of your CPUs.