Mono human-readable GC statistics at runtime - mono

Is there a Mono profiler mode similar to Java -Xloggc?
I would like to see a human-readable GC report while my application is running. Currently Mono can be run with the --profile=log option, but the output is in a binary format and I have to run mprof-report every time I want to read it. The output file also contains a lot of information that is not interesting to me.
I tried to reduce the file size by specifying heapshot=14400000ms to collect statistics only every few hours, but it didn't help much. After a week I had a few gigabytes of logs.
I also tried the "sample" profiler, but the overhead was too high.

You can use Mono's trace filters for this. Set MONO_LOG_MASK to gc and MONO_LOG_LEVEL to debug, then run your app normally. You will get human-readable GC statistics while the app is running:
$ export MONO_LOG_MASK=gc
$ export MONO_LOG_LEVEL=debug
$ mono ... # run your application normally ..
...
# notice the human readable GC output
mono: GC_MAJOR: (LOS overflow) pause 26.00ms, total 26.06ms, bridge 0.00ms major 31472K/0K los 1575K/0K
Mono: GC_MINOR: (Nursery full) pause 2.30ms, total 2.35ms, bridge 0.00ms promoted 31456K major 31456K los 5135K
Mono: GC_MINOR: (Nursery full) pause 2.43ms, total 2.45ms, bridge 0.00ms promoted 31456K major 31456K los 8097K
Mono: GC_MINOR: (Nursery full) pause 1.80ms, total 1.82ms, bridge 0.00ms promoted 31472K major 31472K los 11425K
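
If you want to summarise this output rather than read it by eye, the lines are regular enough to parse. The following is a minimal Python sketch, assuming the log lines keep exactly the shape shown above; the regular expression, the field names and the parse_gc_line helper are my own and are not part of Mono.

```python
import re

# Matches the common prefix of the GC_MAJOR / GC_MINOR lines shown above.
# The group names are illustrative labels, not Mono terminology.
GC_LINE = re.compile(
    r"GC_(?P<kind>MAJOR|MINOR): \((?P<reason>[^)]*)\) "
    r"pause (?P<pause_ms>[\d.]+)ms, total (?P<total_ms>[\d.]+)ms"
)


def parse_gc_line(line):
    """Return a dict for a GC_MAJOR/GC_MINOR line, or None for any other line."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "reason": match.group("reason"),
        "pause_ms": float(match.group("pause_ms")),
        "total_ms": float(match.group("total_ms")),
    }


# Sample lines copied from the output above.
SAMPLE = [
    "mono: GC_MAJOR: (LOS overflow) pause 26.00ms, total 26.06ms, bridge 0.00ms major 31472K/0K los 1575K/0K",
    "Mono: GC_MINOR: (Nursery full) pause 2.30ms, total 2.35ms, bridge 0.00ms promoted 31456K major 31456K los 5135K",
    "Mono: GC_MINOR: (Nursery full) pause 2.43ms, total 2.45ms, bridge 0.00ms promoted 31456K major 31456K los 8097K",
    "Mono: GC_MINOR: (Nursery full) pause 1.80ms, total 1.82ms, bridge 0.00ms promoted 31472K major 31472K los 11425K",
]

records = [r for r in map(parse_gc_line, SAMPLE) if r is not None]
minor_pauses = [r["pause_ms"] for r in records if r["kind"] == "MINOR"]
print("minor collections:", len(minor_pauses), "longest pause (ms):", max(minor_pauses))
```

In practice you would feed it the lines of the process output instead of SAMPLE.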

Related

A full-system simulation running gem5 is missing files

I am trying to run a full-system simulation in gem5, but the X86 full-system files on the official website can no longer be downloaded. Can someone help me?
I need the compressed packages of the X86 full-system files, "x86-system.tar.bz2" and "config-x86.tar.bz2".
Thanks.
Here is a complete example of how to run a full-system (FS) simulation in gem5. Start by downloading your OS image:
https://www.gem5.org/documentation/gem5art/main/disks
from gem5.utils.requires import requires
from gem5.components.boards.x86_board import X86Board
from gem5.components.memory.single_channel import SingleChannelDDR3_1600
from gem5.components.cachehierarchies.ruby.mesi_two_level_cache_hierarchy import (
    MESITwoLevelCacheHierarchy,
)
from gem5.components.processors.simple_switchable_processor import (
    SimpleSwitchableProcessor,
)
from gem5.coherence_protocol import CoherenceProtocol
from gem5.isas import ISA
from gem5.components.processors.cpu_types import CPUTypes
from gem5.resources.resource import Resource
from gem5.simulate.simulator import Simulator
from gem5.simulate.exit_event import ExitEvent
# This runs a check to ensure the gem5 binary is compiled to X86 and supports
# the MESI Two Level coherence protocol.
requires(
    isa_required=ISA.X86, coherence_protocol_required=CoherenceProtocol.MESI_TWO_LEVEL
)
# Here we set up a MESI Two Level Cache Hierarchy.
cache_hierarchy = MESITwoLevelCacheHierarchy(
    l1d_size="32KiB",
    l1d_assoc=8,
    l1i_size="32KiB",
    l1i_assoc=8,
    l2_size="256KiB",
    l2_assoc=16,
    num_l2_banks=1,
)
# Set up the system memory.
# Note that DDR3_1600 defaults to a size of 8GiB. However, a current
# limitation of the X86 board is that it can only accept memory systems up to
# 3GiB. As such, we must fix the size.
memory = SingleChannelDDR3_1600("2GiB")
# Here we set up the processor. This is a special switchable processor in which
# a starting core type and a switch core type must be specified. Once a
# configuration is instantiated, a user may call processor.switch() to switch
# from the starting core types to the switch core types. In this simulation we
# start with TIMING cores to simulate the OS boot, then switch to the O3 cores
# for the command we wish to run after boot.
processor = SimpleSwitchableProcessor(
    starting_core_type=CPUTypes.TIMING, switch_core_type=CPUTypes.O3, num_cores=2
)
# Here we set up the board. The X86Board allows for Full-System X86 simulations.
board = X86Board(
    clk_freq="3GHz", processor=processor, memory=memory, cache_hierarchy=cache_hierarchy
)
# This is the command to run after the system has booted. The first m5 exit
# stops the simulation so we can switch the CPU cores from TIMING to O3 and
# continue the simulation to run the echo command, sleep for a second, then
# call m5 exit again to terminate the simulation. After the simulation has
# ended you may inspect m5out/system.pc.com_1.device to see the echo output.
command = (
    "m5 exit;" + "echo 'This is running on O3 CPU cores.';" + "sleep 1;" + "m5 exit;"
)
# Here we set the Full System workload. The set_kernel_disk_workload function
# for the X86Board takes a kernel, a disk image, and, optionally, the contents
# of the "readfile". In the case of the "x86-ubuntu-18.04-img", the readfile is
# executed as a script after the system boots.
board.set_kernel_disk_workload(
    kernel=Resource("x86-linux-kernel-5.4.49"),
    disk_image=Resource("x86-ubuntu-18.04-img"),
    readfile_contents=command,
)
# Here we want to override the default behavior for the first m5 exit event.
# Instead of exiting the simulator, we just want to switch the processor. The
# second m5 exit reverts to the default behavior, where the simulator run
# exits.
simulator = Simulator(
    board=board,
    on_exit_event={
        ExitEvent.EXIT: (func() for func in [processor.switch])
    },
)
simulator.run()
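
The one-shot generator passed as on_exit_event is the least obvious part of the script. The following is a plain-Python sketch of the behaviour the comment describes, assuming the simulator consumes one item from the generator per exit event and falls back to its default once the generator is exhausted. It does not use gem5 at all: run_exit_events and fake_switch are hypothetical stand-ins, not gem5 APIs.

```python
def run_exit_events(handler, num_exits):
    """Model of the described behaviour, not gem5's actual implementation.

    Each exit event pulls one item from the user-supplied generator. Once the
    generator is exhausted, the default behaviour (exit the run) applies.
    """
    log = []
    for _ in range(num_exits):
        try:
            next(handler)  # runs the next user-supplied function
            log.append("custom")
        except StopIteration:
            log.append("default")  # default behaviour: the run ends
            break
    return log


switched = []


def fake_switch():
    """Hypothetical stand-in for processor.switch()."""
    switched.append(True)


# Same shape as the generator in the script above: one function, called lazily.
handler = (func() for func in [fake_switch])
log = run_exit_events(handler, num_exits=2)
print(log, switched)
```

Because the generator expression is lazy, fake_switch is only called when the first exit event arrives, not when the generator is created.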

Cannot see Cache level data movement in Gem5 simulations

I am using the following CLI:
M5_PATH=/home/febin/Storage/Gem5/gem5ist/m5/system/ Gem5/gem5/build/X86/gem5.opt --debug-flags=Cache,Exec,DRAM,TLB Gem5/gem5/configs/example/fs.py --kernel x86_64-vmlinux-2.6.22.9 --num-cpus=64 --num-dirs=64 --caches --elastic-trace-en --num-l2caches=16 --ruby --network=garnet2.0 --topology=Mesh_XY --mesh-rows=8 --command-line="paper3/Blackscholes/blackscholes.out 1 paper3/Blackscholes/in_16.txt paper3/Blackscholes/output.txt" >> paper3/Gem5_fs
I am able to see the Exec, DRAM and TLB traces, but I cannot see any output from Cache. The same happens in SE simulations. Why is this?
As mentioned by Daniel, you have to use --debug-flags RubyCache for Ruby.
The flag is different because Ruby models the caches itself, separately from the classic memory system.
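
If you generate your gem5 command lines from a script, the choice of flag can be made in one place. This is a minimal sketch using only the two flag names mentioned in this thread (Cache and RubyCache); the helper names cache_debug_flag and debug_flags_arg are made up for illustration.

```python
def cache_debug_flag(uses_ruby):
    """Pick the cache debug flag: RubyCache with --ruby, Cache for classic caches."""
    return "RubyCache" if uses_ruby else "Cache"


def debug_flags_arg(other_flags, uses_ruby):
    """Build the --debug-flags argument from the cache flag plus any other flags."""
    flags = [cache_debug_flag(uses_ruby)] + list(other_flags)
    return "--debug-flags=" + ",".join(flags)


# The command in the question passes --ruby, so the Ruby flag is the one needed.
arg = debug_flags_arg(["Exec", "DRAM", "TLB"], uses_ruby=True)
print(arg)
```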

How to count the return instructions executed over a period of a program (from the hardware perspective)

I've been working on ROP recently. When using perf to collect hardware counter data, I want to measure the number of return instructions executed by a given piece of code, but the perf interface seems to provide only branch instructions.
If you're only on x86 with a recent Intel CPU:
perf list on my Skylake shows there's a hardware counter for br_inst_retired.near_return. That will count only ret instructions, not other branches. But see erratum SKL091 for branch-instruction counters.
perf stat -e instructions,br_inst_retired.near_return,... ./a.out may be what you're looking for. Or maybe attaching perf stat to an already-running program, or maybe -I 1000 to print accumulated counts over intervals.
But note that if you're looking for ROP gadgets, you can find a C3 opcode inside what normally decodes as some other instruction. So restricting yourself only to ret instructions that actually run during the target program's normal execution is more limiting than it needs to be.
e.g. a 4-byte immediate might usefully decode as something + ret if you jump to the immediate.
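
To make the last point concrete, here is a small Python sketch that scans raw machine-code bytes for the C3 opcode. The five bytes are the standard encoding of mov eax, 0xc358; the helper name ret_byte_offsets is made up, and a real gadget finder would also need a disassembler to check what the bytes before each C3 decode to.

```python
RET_OPCODE = 0xC3  # near return


def ret_byte_offsets(code):
    """Return every offset in `code` that holds a C3 byte."""
    return [i for i, byte in enumerate(code) if byte == RET_OPCODE]


# mov eax, 0x0000c358 is encoded as B8 followed by the little-endian immediate.
MOV_EAX_IMM = bytes([0xB8, 0x58, 0xC3, 0x00, 0x00])

offsets = ret_byte_offsets(MOV_EAX_IMM)

# Jumping one byte into the instruction decodes 58 (pop rax/eax) and then
# C3 (ret), even though the program never executes a ret at this address.
gadget = MOV_EAX_IMM[offsets[0] - 1 : offsets[0] + 1]
print(offsets, gadget.hex())
```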

Switching the system does not work

I had the following situation: I'm in a live user mode debugging session and I wanted to show the win32k!_W32Process structure. Unfortunately, win32k is a kernel mode SYS file, so the symbols are not available in the user mode session.
I know that I can always load a DLL, EXE or SYS as a dump file and then inspect the symbols. Usually I would do that via File/Open Crash Dump.
This time, I wanted to show the participants of a debugging workshop that it's possible to debug multiple systems at the same time, so I opened the Win32K.sys via WinDbg's command prompt:
0:003> |
. 0 id: 10fc attach name: [...]\NetHeaps.exe
0:003> .opendump C:\Windows\winsxs\[...]\win32k.sys
Loading Dump File [C:\Windows\winsxs\[...]\win32k.sys]
Opened 'C:\Windows\winsxs\[...]\win32k.sys'
||0:0:003>
As we can now see, we have 2 systems and I'm currently on the live debugging system:
||0:0:003> ||
. 0 Live user mode: <Local>
1 Image file: C:\Windows\winsxs\[...]\win32k.sys
I thought I could switch to the other system now, but that does not work:
||0:0:003> ||1s
^ Illegal debuggee error in '||1s'
I would not have worried too much, but it can't find the symbols of win32k in this case:
||0:0:003> .reload
Reloading current modules
...........................
||0:0:003> dt win32k!_W32Process
Symbol win32k!_W32Process not found.
The problem is not in the || command, it's in the .opendump command.
The help says:
After you use the .opendump command, you must use the g (Go) command to finish loading the dump file.
Be aware that this will also run your live process. Therefore, freeze the threads first (~*f) and unfreeze later (~*u).
After that you can switch the system and display the type:
||1:1:004> ||
0 Live user mode: <Local>
. 1 Image file: C:\Windows\winsxs\[...]\win32k.sys
||1:1:004> dt _W32Process
win32k!_W32PROCESS
+0x000 Process : Ptr64 _EPROCESS
+0x008 RefCount : Uint4B
+0x00c W32PF_Flags : Uint4B
[...]

Remote Proc fails to load FreeRTOS Elf

I am using this port of FreeRTOS and I am loading it onto the Cortex-M3 within an OMAP4430. This works fine using the remote proc framework and I am able to use RPMsg to communicate with it.
Sometimes, however, rproc fails to load the ELF and gives the following error:
rproc remoteproc1: bad phdr da 0x0 mem 0x10310
rproc remoteproc1: Failed to load program segments: -22
rproc remoteproc1: rproc_boot() failed -22
This seems to happen when the ELF file gets too large: it fails when the size is 377331 bytes, but not when I remove a bunch of print statements and bring the size down to 342563 bytes.
I have tracked the error message down to this piece of code: http://lxr.free-electrons.com/source/drivers/remoteproc/remoteproc_elf_loader.c?v=3.9#L188. It seems that rproc_da_to_va is unable to find a segment in memory large enough to fit the ELF.
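
The check behind that error can be modelled roughly as follows. This is a Python sketch of the carveout lookup as I read it in the linked kernel source, not the kernel code itself. The segment values (da 0x0, size 0x10310) come from the error message above, while the carveout sizes are hypothetical and chosen only to show one failing and one passing case.

```python
from typing import NamedTuple, Optional, Sequence


class Carveout(NamedTuple):
    da: int      # device address where the region starts
    length: int  # size of the region in bytes


def find_carveout(carveouts: Sequence[Carveout], da: int, length: int) -> Optional[Carveout]:
    """Return the first carveout that fully contains [da, da + length), else None."""
    for carveout in carveouts:
        offset = da - carveout.da
        if offset < 0:
            continue  # segment starts before this region
        if offset + length > carveout.length:
            continue  # segment runs past the end of this region
        return carveout
    return None


# Hypothetical carveouts: the first region is too small for the segment.
too_small = [Carveout(da=0x0, length=0x10000), Carveout(da=0x80000000, length=0x100000)]
big_enough = [Carveout(da=0x0, length=0x100000), Carveout(da=0x80000000, length=0x100000)]

# Segment from the error message: da 0x0, memory size 0x10310.
print(find_carveout(too_small, 0x0, 0x10310))   # no match, so "bad phdr"
print(find_carveout(big_enough, 0x0, 0x10310))  # match
```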
How can I make sure that there is enough memory for the size of my ELF? Can I tell the kernel that I specifically want a certain region preallocated for this kind of thing? Is there some way to ensure that this part of my ELF remains small?
Thanks!
Make sure that the FreeRTOS configuration constants configTEXT_SIZE and configDATA_SIZE agree with the amounts demanded by your linker script. For example, if your linker script contains
MEMORY
{
    TEXT (rwx) : ORIGIN = 0x00000000, LENGTH = 1M
    DATA (rwx) : ORIGIN = 0x80000000, LENGTH = 1M
}
then you should set configTEXT_SIZE and configDATA_SIZE to 0x100000.
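
The arithmetic is simply 1M = 1024 * 1024 = 0x100000 bytes. If you want to check the two places against each other automatically, here is a minimal Python sketch that extracts the LENGTH values from a MEMORY block like the one above. The memory_lengths helper is my own, and it assumes decimal lengths with an optional K or M suffix, which is only a subset of what a linker script allows.

```python
import re

UNIT = {"": 1, "K": 1024, "M": 1024 * 1024}

# Matches lines such as: TEXT (rwx) : ORIGIN = 0x00000000, LENGTH = 1M
REGION = re.compile(
    r"(\w+)\s*\([^)]*\)\s*:\s*ORIGIN\s*=\s*\w+\s*,\s*LENGTH\s*=\s*(\d+)([KM]?)"
)


def memory_lengths(script):
    """Map each region name in a MEMORY block to its length in bytes."""
    return {
        name: int(number) * UNIT[unit]
        for name, number, unit in REGION.findall(script)
    }


LINKER_SCRIPT = """
MEMORY
{
    TEXT (rwx) : ORIGIN = 0x00000000, LENGTH = 1M
    DATA (rwx) : ORIGIN = 0x80000000, LENGTH = 1M
}
"""

lengths = memory_lengths(LINKER_SCRIPT)
for name, size in lengths.items():
    print(name, hex(size))
```

The printed values are what configTEXT_SIZE and configDATA_SIZE should be set to for this script.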