spec2017 gem5 x86 MOVNTDQ tried to write to unmapped address - gem5

I am running spec2017 on gem5 (X86 arch) in SE mode, but I found that some benchmarks, such as 549.fotonik3d_r, hit this problem:
build/X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable!
build/X86/arch/x86/faults.cc:165: panic: Tried to write unmapped address 0x7ffff7fff048.
PC: (0x51d240=>0x51d249).(1=>2), Instr: MOVNTDQ_M_XMM : cda DS:[rdi + 0x2008]
Memory Usage: 16945308 KBytes
Program aborted at tick 661950210922
--- BEGIN LIBC BACKTRACE ---
/home/qishao/Project/gem5/build/X86/gem5.opt(+0x77d320)[0x560f34385320]
/home/qishao/Project/gem5/build/X86/gem5.opt(+0x7a3a23)[0x560f343aba23]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f9deff71520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f9deffc5a7c]
The benchmark runs fine on X86KvmCPU, but fails with the atomic and timing CPU models. I don't know which part is wrong: the way I compiled spec2017 or the way it runs in SE mode.
Thanks for your help.

This happens because the address is beyond the stack region, so I expanded the maximum stack size in src/arch/x86/process.cc. After that I hit another fault, at address 0x7fff_ffff_ffff_0048, which is above the current stack base 0x7fff_ffff_ffff_0000. I therefore implemented similar code to make the stack grow upward as well, shown in the diff below. It seems to work now, and since I run in single-threaded mode the stack can probably get away with growing this way. But I wonder how others got past this problem.
@@ -445,6 +450,18 @@ MemState::fixupFault(Addr vaddr)
         return true;
     }
+    if (vaddr > _stackBase) {
+        while (vaddr > _stackBase) {
+            DPRINTF(Vma, "Growing stack base upward from %#x to cover addr %#x.\n",
+                    _stackBase, vaddr);
+            _ownerProcess->allocateMem(_stackBase, _pageBytes);
+            _stackBase += _pageBytes;
+            _maxStackSize += _pageBytes;
+            inform("Increasing stack size by one page.");
+        }
+        return true;
+    }
+
     return false;
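For reference, here is a sketch of what the first fix (enlarging the stack limit) can look like. In recent gem5 sources the 64-bit SE-mode stack is set up in the X86_64Process constructor in src/arch/x86/process.cc; the surrounding lines are approximate and the enlarged value (512 MiB here) is just an example, so adjust for your gem5 version:
@@ X86_64Process::X86_64Process(...) // src/arch/x86/process.cc (approximate)
     Addr stack_base = 0x7FFFFFFFF000ULL;
-    Addr max_stack_size = 8 * 1024 * 1024;
+    Addr max_stack_size = 512 * 1024 * 1024;  // enlarged limit, example value
     Addr next_thread_stack_base = stack_base - max_stack_size;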

Related

At which address is qemu expecting to find the image?

I'm working with the qemu riscv32 emulator. I have managed to boot a simple hello-world image I got from github, but I haven't managed to boot my own image. I suspect this is because I originally built my image without a linker script, and it is therefore being loaded at the wrong address. I'm trying to understand how the qemu boot sequence works in order to fix this.
This is the linker script I'm using
OUTPUT_ARCH( "riscv" )
OUTPUT_FORMAT("elf32-littleriscv")
ENTRY( _start )
SECTIONS
{
    /* text: test code section */
    . = 0x20400000;
    .text : { *(.text) }
    /* gnu_build_id: readonly build identifier */
    .gnu_build_id : { *(.note.gnu.build-id) }
    /* rodata: readonly data segment */
    .rodata : { *(.rodata) }
    /* data: Initialized data segment */
    . = 0x80000000;
    .data : { *(.data) }
    .sdata : { *(.sdata) }
    .debug : { *(.debug) }
    . += 0x1000;
    stack_top = .;
    /* End of uninitialized data segment */
    _end = .;
}
And this is the qemu command I'm executing:
qemu-system-riscv32 -nographic -machine sifive_e -bios none -kernel hello
# with -s -S when debugging
The source code is not very relevant; it is just a small assembly file that writes "hello".
My main question is:
How can I know at which address is qemu expecting to find the image?
Other questions I would like to answer:
With gdb, I have noticed that qemu starts executing at address 0x1004 (before I do anything). I was expecting it to be 0x0. Why is this?
I have read that qemu can use U-Boot. Does it use it, or any other bootloader, by default?
If so, is there any way to load an image at address 0x0 without any sort of bootloader intervening? (I ask this for debugging purposes, because the first time you try a new arch. you possibly want to keep everything as simple as possible.)
Does the -kernel option just load the provided image, or does it do something more? (like loading a Linux kernel and executing the provided image on top of it)
I'm using the sifive_e emulator, therefore I have gone to the SiFive E series datasheet (like this one) to check the memory map and find the starting address. This is what I have found:
Those addresses are very different from those specified in the linker script above. It seems I'm looking at the wrong place; where can I find the SiFive E boot address?
EDIT
With regards to the last question about the memory map, I found the answer. It is explained here (5.16) and here (chapter 6)

How to define multiple errors for -XX:AbortVMOnException

I am depending on libraries that hang on certain errors, and I cannot fix them! The offenders are currently StackOverflowError and OutOfMemoryError, but there might be more.
I am trying to upgrade the unrecoverable hang to an exit/abort. However, I cannot figure out how to pass multiple different errors to -XX:AbortVMOnException, as only the latest argument is active in:
JAVA_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:AbortVMOnException=java.lang.StackOverflowError -XX:AbortVMOnException=java.lang.OutOfMemoryError" foo
There can be only one value for the AbortVMOnException option.
JVM does a substring search when checking if the exception class matches AbortVMOnException value. E.g. -XX:AbortVMOnException=Error will cause the VM to abort on any throwable with Error in its name: java.lang.StackOverflowError, java.lang.OutOfMemoryError, java.lang.NoClassDefFoundError, etc.
To add a custom callback on the desired exception types, you may use the JVM TI approach described in this answer. You'd only need to replace
if (strcmp(class_name + 1, fatal_error_class) == 0) {
with
if (strstr(fatal_error_class, class_name + 1) != NULL) {
and then it will be possible to specify multiple exception types that cause a VM exit:
java -agentpath:/path/to/libabort.so=java/lang/StackOverflowError,java/lang/OutOfMemoryError ...
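For context, here is a rough sketch of where that check sits inside the agent. It assumes the structure of the agent from the linked answer: fatal_error_class holds the options string passed after -agentpath=...=, and the matching happens in the JVM TI Exception event callback. Agent_OnLoad, option parsing, and the exact abort path are omitted or assumed here:
#include <jvmti.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *fatal_error_class;   // set in Agent_OnLoad from the agent options (assumed)

static void JNICALL
ExceptionCallback(jvmtiEnv *jvmti, JNIEnv *jni, jthread thread,
                  jmethodID method, jlocation location, jobject exception,
                  jmethodID catch_method, jlocation catch_location) {
    char *class_name;
    // Signature looks like "Ljava/lang/OutOfMemoryError;"
    jvmti->GetClassSignature(jni->GetObjectClass(exception), &class_name, NULL);
    class_name[strlen(class_name) - 1] = 0;   // drop trailing ';'

    // Substring search: one options string can now list several classes,
    // e.g. "java/lang/StackOverflowError,java/lang/OutOfMemoryError".
    if (strstr(fatal_error_class, class_name + 1) != NULL) {
        fprintf(stderr, "Aborting VM on %s\n", class_name + 1);
        exit(1);
    }
    jvmti->Deallocate((unsigned char *) class_name);
}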

Paraview looping with SaveScreenshot on a server is very slow

I mean to get a series of snapshots, at a sequence of time steps, of a layout with two views (one RenderView + one LineChartView).
For this I put together a script, see below.
I do
ssh -X myserver
and there I run
~/ParaView-5.4.1-Qt5-OpenGL2-MPI-Linux-64bit/bin/pvbatch myscript.py
The script is extremely slow to run. I can think of the following possible reasons/bottlenecks:
Communication of the graphic part (ssh -X) from the remote server to my computer.
Display of graphics in my computer.
Processing in the server.
Is there a way to assess which is the bottleneck, with my current resources?
(For instance, I know I could get a faster communication to assess item 1, but I cannot do that now.)
Is there a way to accelerate pvbatch?
The answer likely depends on my system, but perhaps there are generic actions I can take.
Creation of the layout with two views
...
ans = GetAnimationScene()
time_steps = ans.TimeKeeper.TimestepValues
for istep in range(len(time_steps)):
    tstep = time_steps[istep]
    ans.AnimationTime = tstep
    fname = "combo" + '-' + '{:08d}'.format(istep) + '.png'
    print("Exporting image " + fname + " for time step " + str(tstep))
    SaveScreenshot(fname, viewLayout1, quality=100)
Why do you need the -X ?
Just set DISPLAY to :0 and do not forward graphics.
The bottleneck is most likely the rendering on your local display. If your server has an X server, you can perform the rendering on the server by setting the DISPLAY environment variable accordingly, as Mathieu explained.
If your server does not have an X server running, then the best option is to build ParaView on your server with either the OSMesa backend or the EGL backend (if the server has a compatible graphics card).

flash write efm32zg fails with while (DMA->CHENS & DMA_CHENS_CH0ENS)

I am attempting to create a bootloader which allows me to update a processor's software remotely.
I am using the Keil uVision compiler (V5.20.0.0).
Flash.c, startup_efm32zg.s, startup_efm32zg.c and em_dma.c are configured to execute from RAM (code, zero-init data, other data) via their options/properties tabs.
The stack size is configured as 0x0000 0800 via the startup_efm32zg.s Configuration Wizard tab.
I am using Silicon Labs' flash.c and flash.h, with RAMFUNC removed since it is redundant with the Keil configuration above.
I modified the flash.c code slightly so that it stays in the FLASH_write function (supposedly in RAM) until the DMA is done doing its thing.
I moved the
while (DMA->CHENS & DMA_CHENS_CH0ENS);
line down to the end of the function and added a little wrapper around it like this:
/* Activate channel 0 */
DMA->CHENS = DMA_CHENS_CH0ENS;
if (DMA->CHENS & DMA_CHENS_CH0ENS)
{
    /* Start the transfer */
    MSC->WRITECMD = MSC_WRITECMD_WRITETRIG;
    /* Wait until transfer is done */
    while (DMA->CHENS & DMA_CHENS_CH0ENS)
    {
        // do nothing here
    }
}
FLASH_init() is called as part of the initial setup prior to entering my infinite loop.
When called upon to update the flash.....
(1): I disable interrupts.
(2): I call FLASH_erasePage starting at 0x0000 2400. This works.
(3): I call FLASH_write.
FLASH_write(&startAddress, (uint32_t *)flashBuffer, (BLOCK_SIZE/4));
Where:
startAddress = 0x00002400,
flashBuffer = a buffer of type uint8_t flashBuffer[256],
#define BLOCK_SIZE 256.
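Putting steps (1)-(3) together, the calling code looks roughly like the sketch below. The flash.c function names and the call form follow the question; the header names, the __disable_irq()/__enable_irq() intrinsics, and the wrapper function are assumptions for illustration only:
#include <stdint.h>
#include "em_device.h"   // device/CMSIS header providing __disable_irq() (assumed)
#include "flash.h"       // Silicon Labs FLASH_init / FLASH_erasePage / FLASH_write

#define BLOCK_SIZE 256

static uint8_t flashBuffer[BLOCK_SIZE];   // filled elsewhere with the new image data

void write_app_block(void)
{
    uint32_t startAddress = 0x00002400;   // first page after the 9 KB bootloader

    __disable_irq();                        // (1) no interrupts while flashing
    FLASH_erasePage(startAddress);          // (2) erase the target page
    FLASH_write(&startAddress,              // (3) program one 256-byte block
                (uint32_t *) flashBuffer,
                BLOCK_SIZE / 4);            //     length in 32-bit words
    __enable_irq();
}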
It gets stuck here in the function:
while (DMA->CHENS & DMA_CHENS_CH0ENS)
Eventually the debugger execution stops and the Call Stack clears to be left with 0x00000000 and ALL of memory is displayed as 0xAA.
I have set aside 9K of flash for the bootloader. After a build I am told:
Program size: Code=7524 RO-data=304 RW-data=664 ZI-data=3432
Target Memory Options for Target1:
IROM1: Start[0x0] Size[0x2400]
IRAM1: Start[0x20000000] Size:[0x1000]
So .... what on earth is going on? Any help?
One of my other concerns is that it is supposed to be executing from RAM. When I look in the Call Stack at the Location/Value for FLASH_write after having stepped into the FLASH_write function, I see 0x000008A4. This is flash!(?)
I've tried the whole RAMFUNC thing too, with the same results.

Magento - fetching products and looping through is getting slower and slower

I'm fetching around 6k articles from the Magento database. Traversing them is very fast at the beginning (just a few ms per iteration) but gets slower and slower. The loop takes about 8 hours to run, and by the end each iteration of the foreach takes about 16-20 seconds! It seems like MySQL is getting slower and slower towards the end, but I cannot explain why.
$product = Mage::getModel('catalog/product');
$data = $product->getCollection()->addAttributeToSelect('*')->addAttributeToFilter('type_id', 'simple');
$num_products = $product->getCollection()->count();
echo 'exporting '.$num_products."\n";
print "starting export\n";
$start_time = time();
foreach ($data as $tProduct) {
    // doing some stuff, no sql !
}
Does anyone know why it is so slow? Would it be faster to fetch just the IDs and select each product one by one?
The script running this code has a constant memory usage of:
VIRT RES SHR S CPU% MEM%
680M 504M 8832 R 90.0 6.3
Regards, Alex
Oh well, shot-in-the-dark time. If you are running Magento 1.4.x.x prior to 1.4.2.0, you have a memory leak that shows exactly this symptom: it eats up more and more memory, eventually leading to memory exhaustion. Profile exports that took 3-8 minutes under 1.3.x.x will now take 2-5 hours, if they don't throw an error before completion. Another symptom is exports that fail without finishing and without giving any indication of why: the window freezes or gives some sort of funky completion message with no output.
The Array Of Death(tm) has been noted and here's the official repair in the new version. Maybe Data Will Flow again!
Excerpt from 1.4.2.0rc1 /lib/Varien/Db/Select.php that has been patched for the memory leak:
public function __construct(Zend_Db_Adapter_Abstract $adapter)
{
    parent::__construct($adapter);
    if (!in_array(self::STRAIGHT_JOIN_ON, self::$_joinTypes)) {
        self::$_joinTypes[] = self::STRAIGHT_JOIN_ON;
        self::$_partsInit = array(self::STRAIGHT_JOIN => false) + self::$_partsInit;
    }
}
Excerpt from 1.4.1.1 /lib/Varien/Db/Select.php with the memory leak (no in_array() guard, so the static $_joinTypes array gains a duplicate entry on every constructor call):
public function __construct(Zend_Db_Adapter_Abstract $adapter)
{
    parent::__construct($adapter);
    self::$_joinTypes[] = self::STRAIGHT_JOIN_ON;
    self::$_partsInit = array(self::STRAIGHT_JOIN => false) + self::$_partsInit;
}