GEM5 Simulation using HMC memory structure

I am new to gem5.
I have a problem with a gem5 simulation.
I am working on a simulation using the HMC memory structure, and I ran it with the following command:
gem5$ ./build/X86/gem5.opt ./configs/example/hmctest.py
However, this simulation gave me an error:
panic: Memory Size not divisible by page size
To solve this problem, I referred to the patch at https://gem5-review.googlesource.com/c/public/gem5/+/6061 , but it did not solve my error...
Is there anybody who can give me some advice?
I will wait for your help!
Thank you.

I also met the same problem. After reading the code, I found that this error occurs because of a bug in the way HMC.py defines the address ranges of the HMC memory devices and serial links: in gem5 the end address of an address range is not inclusive by default (for more details, you can check addr_range.hh). HMC.py, however, treats the end address as inclusive, so a "-1" is subtracted when defining the address ranges.
The easiest way to solve this problem is to remove all the "-1" terms when defining "ser_ranges", "ser_rangeX" and "addr_ranges_vaults" in HMC.py. Since this is only a change to a Python configuration script, you don't need to recompile gem5 at all. I've tried it and it works!
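For illustration, here is a minimal sketch of the kind of change meant here. The variable name and the bounds are illustrative, not copied from HMC.py, and the snippet is only meaningful inside a gem5 configuration script:

# Inside a gem5 config script (assumes the usual gem5 config import):
from m5.objects import AddrRange

# Buggy style: treats the end address as inclusive, leaving the range
# one byte short, so its size is no longer divisible by the page size.
vault_range = AddrRange(0x00000000, 0x20000000 - 1)

# Fixed style: gem5's AddrRange end is exclusive, so drop the "-1".
vault_range = AddrRange(0x00000000, 0x20000000)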

Related

How to interpret Hardware watchdog exceptions on an ESP chip?

For one of our projects we have a hardware watchdog reset which happens on roughly 0.1% of our devices each day, resulting in many unwanted hardware resets.
We are trying to figure out what causes this hardware watchdog reset, but we have failed to find anything relevant in our code which would result in this behavior.
We are using the Arduino core version 2.4.2; we are not sure how long the problem has been affecting our solution, since we had other issues which have now mainly been resolved.
Luckily our devices send us their reboot reasons when they reconnect; we are receiving the following:
ResetReason=Hardware Watchdog;ResetInfo=Fatal exception:4 flag:1 (WDT)
epc1:0x40102329 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000
depc:0x00000000;
When we ran this through the EspStackTraceDecoder we ended up with:
0x40102329: wDev_ProcessFiq at ??:?
A search through various projects that have asked similar questions showed that most of them seemed to involve a DNS query, but not all, so it seems to be a more general issue?
What additional information could we extract that might help us identify the issue?
Some additional information:
Memory is stable and we have ~15-17 KB of free heap, depending on the mode and the amount of data in the send/receive queues.
Our side of the code uses yield, delay, etc., so the S/W watchdog should always be fed. This also applies to the async callback code.
Check whether you are doing any invalid memory reads. The main reason for a HW WDT reset is that it triggers when the software (or the CPU) is no longer making progress: your CPU might be stuck executing some instruction and never returning.

USB behavior not as expected

I am porting the USB driver from an STM32F4 device to an STM32L4 device. It almost works: during enumeration it sends and receives the information, but the data is not exactly the same as from the "plain" STM Cube generated project. I have the same settings in both projects but get strange results.
I lost a week trying to find the solution; maybe someone here has had a similar problem and can help me out. Sorry for the images, but there is no other way of posting this information on SO.
As you can see, the packets are almost the same, but not identical. After the 25th transmission the board stalls and accepts only a very limited number of requests.
Both capture files from Wireshark (in Wireshark and text formats) are here:
https://gitlab.com/diymat/usb-problem/tree/master
The ep* files are from my port, the stmcdc* ones from the STM Cube generated project. Both were captured running on the same hardware.
Is it the same clock config?
Is the 48 MHz USB clock OK?
Using a non-crystal or external oscillator to get the 48 MHz USB clock commonly leads to issues on large transfers, even on the F4; I can't say for the L4.
So it may appear to work with HID and rather short packets, but start misbehaving on larger transfers.
I would like to be of a bit more help, but I would definitely need more information to give you a good direction to head in. That being said, there could be a number of things wrong, so I will try to cover what I can think of. Let's start with some of the most obvious.
Clock configuration or hardware can cause problems due to faulty components or an incorrect software/hardware selection. This can cause a number of issues, and what you are seeing could be a symptom of that.
If you are using the generated FATFS middleware from ST, a small stack size in the configuration of the L4 can cause this exact problem. It may work until the PC register receives an erroneous address, which can lead to some sort of fault or, in some cases, just a bad result returned inside the FATFS or USB peripheral code from ST; it will then return a dirty sector on the USB window read from FATFS, which results in terminated operations on the peripheral.
I see you're using STM32Cube, so you can edit the stack and heap sizes by opening the file startup_stm32l475xx.s and editing the two values as you see fit for your application.
; Amount of memory (in bytes) allocated for Stack
; Tailor this value to your application needs
; <h> Stack Configuration
;   <o> Stack Size (in Bytes) <0x0-0xFFFFFFFF:8>
; </h>

Stack_Size      EQU     0x400

                AREA    STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem       SPACE   Stack_Size
__initial_sp

; <h> Heap Configuration
;   <o> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
; </h>

Heap_Size       EQU     0x200
Try increasing the stack size and see what happens. Good luck finding your solution!

Enabling 10 Hz sampling rate in Ublox modules

I'm using a u-blox NEO-M8N-0-01 GNSS module.
This module supports up to 5 Hz GPS+GLONASS and 10 Hz GPS-only.
However, when I try to change the sampling rate (via UBX-CFG-RATE in the Messages view) I can only increase it to 5 Hz (measurement period = 200 ms). Any value below 200 ms is rejected (it turns the box pink).
This happens even if I only produce the NMEA message GxGGA.
The way I made it GPS-only was via UBX-CFG-GNSS.
Has anyone encountered this issue?
Thanks in advance,
Roi Yozevitch
You don't say how you are setting the rate, but going by your description I'm assuming you are using the u-blox u-center software.
There is a simple explanation for this issue and a simple solution: their software has a bug (or wasn't updated to match the final specification of the part).
The solution is to not use u-center; it's the PC software that's complaining, not the receiver. The receiver itself doesn't care what the spec sheet says, it will try its best to run at whatever rate you request.
Sending commands directly, I've managed to get a fairly reliable 10 Hz GPS+GLONASS. There is the occasional missing point, but most of the time it keeps up.
Running GPS-only you can go faster than 10 Hz. If you play with the settings and restrict it to 8 channels, 18-19 Hz is fairly reliable. Unfortunately 20 Hz is pushing it too far; you end up getting positions at 10 Hz.
Obviously, when running at these update rates, make sure that your baud rate is high enough to cope with the requested messages and rate.
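As a concrete illustration of "sending commands directly", here is a minimal Python sketch that builds a raw UBX-CFG-RATE message (class 0x06, ID 0x08, framed with the standard two-byte Fletcher checksum) and writes it to the receiver over a serial port. The port name and baud rate below are assumptions for your setup, and it needs pyserial installed:

import struct
import serial  # pyserial

def ubx_message(msg_class, msg_id, payload):
    # UBX frame: sync 0xB5 0x62, class, id, little-endian length, payload, checksum
    frame = struct.pack('<BBBBH', 0xB5, 0x62, msg_class, msg_id, len(payload)) + payload
    ck_a = ck_b = 0
    for b in frame[2:]:  # checksum covers class, id, length and payload
        ck_a = (ck_a + b) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return frame + bytes([ck_a, ck_b])

# UBX-CFG-RATE payload: measRate=100 ms (10 Hz), navRate=1 cycle, timeRef=1 (GPS time)
cfg_rate = ubx_message(0x06, 0x08, struct.pack('<HHH', 100, 1, 1))

with serial.Serial('/dev/ttyUSB0', 115200, timeout=1) as port:  # port/baud are assumptions
    port.write(cfg_rate)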

Can't erase the first 256KB of Nor Flash(M29W256GL) using BDI2000

I'm porting u-boot-2013.10 to an MPC8306-based board. Previously, I could erase the first several sectors of the NOR flash using the BDI2000. But after some time, when the porting task was nearly done (I mean that I could use gdb to trace the code execution and see the u-boot code run into the command-line main loop, though there was no serial output at the time due to a wrong serial port configuration), the first 256 KB of the NOR flash could no longer be erased, even after a power-off reset. Other sectors can be erased normally.
The NOR flash is a Micron M29W256GL with a block size of 128 KB. I'm sure the WP# pin has been pulled high, so there is no hardware protection on the first block.
When I configure the jumper on the board to change the PowerPC reset configuration word so that the MPC8306 does not fetch boot code at power-up, the problem remains.
I used to run u-boot-1.1.6 on this board and erased that version of u-boot many times without the problem mentioned above. I guess u-boot-2013.10 takes some new approach to flash manipulation, for example a non-volatile protection on the first 256 KB of flash.
Is there someone who can help me solve this problem? I would very much appreciate your help.

How does one use dynamic recompilation?

It came to my attention that some emulators and virtual machines use dynamic recompilation. How do they do that? In C I know how to call a function in RAM using typecasting (although I never tried it), but how does one read opcodes and generate code for them? Does one need premade assembly chunks that are copied and patched together? Is the assembly written in C? If so, how do you find the length of the code? How do you account for system interrupts?
-edit-
System interrupts and how to (re)compile the data are what I am most interested in. Upon more research I heard of one person (no source available) who used JS: he read the machine code, output JS source and used eval to 'compile' the JS source. Interesting.
It sounds like I MUST have knowledge of the target platform's machine code to dynamically recompile.
Yes, absolutely. That is why parts of the Java Virtual Machine (namely, the JIT) must be rewritten for every architecture.
When you write a virtual machine, you have a particular host-architecture in mind, and a particular guest-architecture. A portable VM is better called an emulator, since you would be emulating every instruction of the guest-architecture (guest-registers would be represented as host-variables, rather than host-registers).
When the guest- and host-architectures are the same, as with VMware, there are a ton of (pretty neat) optimizations you can do to speed up the virtualization - today we are at the point where this type of virtual machine is BARELY slower than running directly on the processor. Of course, it is extremely architecture-dependent - you would probably be better off rewriting most of VMware from scratch than trying to port it.
It's quite possible - though obviously not trivial - to disassemble code from a memory pointer, optimize the code in some way, and then write the optimized code back - either to the original location or to a new location with a jump patched into the original location.
Of course, emulators and VMs don't have to rewrite at run time; they can do this at load time.
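As a toy illustration of that disassemble-optimize-patch loop (in Python, over an invented instruction list rather than real machine code): a block is read from "memory", a redundant instruction pair is peephole-optimized away, the shorter block is written to a free location, and a jump is patched over the original entry point:

# "Memory" holds instructions for a made-up stack machine; None marks free space.
mem = [
    ("PUSH", 1),
    ("PUSH", 0),     # PUSH 0 followed by ADD adds zero: a removable no-op pair
    ("ADD", None),
    ("RET", None),
] + [None] * 4

def optimize(block):
    # Peephole pass: drop any PUSH 0 immediately followed by ADD.
    out, i = [], 0
    while i < len(block):
        if block[i] == ("PUSH", 0) and i + 1 < len(block) and block[i + 1][0] == "ADD":
            i += 2
        else:
            out.append(block[i])
            i += 1
    return out

optimized = optimize(mem[0:4])
new_loc = 4                                        # first free slot
mem[new_loc:new_loc + len(optimized)] = optimized  # write the optimized block to a new location
mem[0] = ("JMP", new_loc)                          # patch a jump at the original entry point

print(mem)  # entry is now ('JMP', 4), and the optimized block lives at index 4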
This is a wide-open question, and I'm not sure where you want to go with it. Wikipedia covers the generic topic with a generic answer. The native code being emulated or virtualized is replaced with native code; the more the code is run, the more is replaced.
I think you need to do a few things. First, decide if you are talking about emulation or a virtual machine like VMware or VirtualBox. In an emulation the processor and hardware are emulated using software: the next instruction is read by the emulator, the opcode is pulled apart by code, and you determine what to do with it. I have been doing some 6502 emulation and static binary translation, which is dynamic recompilation but preprocessed instead of done in real time.

So your emulator may see an LDA #10, load A immediate. The emulator knows it has to read the next byte, which is the immediate value; it has a variable in its code for the A register and puts the immediate value in that variable. Before completing the instruction the emulator needs to update the flags: in this case the Z flag is clear, the N flag is clear, and C and V are untouched.

But what if the next instruction is a load X immediate? No big deal, right? Well, the load X will also modify the Z and N flags, so the next time you execute the load A instruction you may figure out that you don't have to compute the flags, because they will be destroyed anyway; they are dead code in the emulation. You can continue with this kind of thinking: say you see code that copies the X register to the A register, pushes the A register on the stack, then copies the Y register to the A register and pushes that on the stack; you could replace that chunk with simply pushing the X and Y registers on the stack. Or you may see a couple of add-with-carries chained together to perform a 16-bit add and store the result in adjacent memory locations. Basically, you are looking for operations that the processor being emulated couldn't do directly but that are easy to do in the emulation.

Static binary translation, which I suggest you look into before dynamic recompilation, performs this analysis and translation in a static manner, that is, before you run the code. Instead of emulating, you translate the opcodes to C, for example, and remove as much dead code as you can (a nice feature is that the C compiler can remove more dead code for you).
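To make the flag bookkeeping above concrete, here is a minimal Python sketch of the interpretation step: a toy 6502-style fetch/decode loop handling only LDA #imm and LDX #imm (opcodes 0xA9 and 0xA2, as on a real 6502), recomputing the N and Z flags after each load. After the LDX, the flags computed by the LDA are dead work - exactly what a recompiler can eliminate:

class Tiny6502:
    """Toy interpreter for two 6502 opcodes: LDA #imm and LDX #imm."""
    def __init__(self, program):
        self.mem = bytearray(program)
        self.pc = 0
        self.a = self.x = 0
        self.z = self.n = False          # zero and negative flags

    def step(self):
        op = self.mem[self.pc]
        self.pc += 1
        if op == 0xA9:                   # LDA #imm
            self.a = self.mem[self.pc]; self.pc += 1
            self.z, self.n = self.a == 0, bool(self.a & 0x80)
        elif op == 0xA2:                 # LDX #imm
            self.x = self.mem[self.pc]; self.pc += 1
            self.z, self.n = self.x == 0, bool(self.x & 0x80)
        else:
            raise NotImplementedError(hex(op))

cpu = Tiny6502(bytes([0xA9, 0x10, 0xA2, 0x00]))  # LDA #$10 ; LDX #$00
cpu.step()   # flags now describe A, but the next step overwrites them (dead work)
cpu.step()
print(cpu.a, cpu.x, cpu.z, cpu.n)                # 16 0 True False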
Once the concepts of emulation and translation are understood, you can then try to do it dynamically; it is certainly not trivial. I would suggest first doing a static translation of a binary to the machine code of the target processor, which is a good exercise. I wouldn't attempt dynamic run-time optimizations until I had succeeded in performing them statically against a binary.
Virtualization is a different story: you are talking about running the same processor on the same processor, x86 on x86 for example. The beauty here is that, on any reasonably modern x86 processor, you can take the program being virtualized and run the actual opcodes on the actual processor, with no emulation. You set up traps built into the processor to catch things, so loading values into AX and adding BX, etc., all happen in real time on the processor. What happens when the program reads or writes memory depends on your trap mechanism: if the addresses are within the virtual machine's RAM space, there are no traps; but say the program writes to an address which is the virtualized UART - you have the processor trap that, and then VMware or whatever decodes that write and emulates it by talking to a real serial port. That one instruction, though, wasn't real-time; it took quite a while to execute.

What you could do, if you chose to, is replace the instruction or set of instructions that write a value to the virtualized serial port, and have them write to a different address that could be the real serial port or some other location that is not going to cause a fault forcing the VM manager to emulate the instruction. Or add some code in the virtual memory space that performs the UART write without a trap, and have the original code branch to this UART write routine instead. The next time you hit that chunk of code it runs in real time.
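Here is a toy Python sketch of that trap-and-emulate idea, with an entirely invented address map: ordinary RAM accesses take the fast untrapped path, while writes that land in the virtual UART's address window "trap" into the VM manager's device model:

RAM_SIZE = 0x10000
UART_BASE = 0xF000               # invented address window for the virtual UART
UART_END = 0xF008

ram = bytearray(RAM_SIZE)

def uart_emulate_write(reg, value):
    # The VM manager's device model; register 0 is assumed to be TX data.
    if reg == 0:
        print(chr(value), end="")

def vm_write(addr, value):
    if UART_BASE <= addr < UART_END:
        uart_emulate_write(addr - UART_BASE, value)   # trap: emulate the device
    else:
        ram[addr] = value                             # untrapped fast path

for ch in b"hello\n":
    vm_write(UART_BASE, ch)      # each of these writes "traps" into the device model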
Another thing you can do is, for example, emulate and, as you go, translate to a virtual intermediate bytecode, like LLVM's. From there you can translate from the intermediate machine to the native machine, eventually replacing large sections of the program if not the whole thing. You still have to deal with the peripherals and I/O.
Here's an explanation of how dynamic recompilation is done for the 'Rubinius' Ruby interpreter:
http://www.engineyard.com/blog/2010/making-ruby-fast-the-rubinius-jit/
This approach is typically used by environments with an intermediate byte code representation (like Java or .NET). The byte code contains enough "high level" structures (high level in the sense of higher level than machine code) that the VM can take chunks of the byte code and replace them with a compiled memory block. The VM typically decides which parts get compiled by counting how many times the code has already been interpreted, since compilation itself is a complex and time-consuming process. So it is useful to compile only the parts which get executed many times.
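A minimal Python sketch of that counting heuristic; interpret_block and compile_block below are hypothetical callbacks standing in for the VM's interpreter and JIT compiler, and the threshold value is made up:

HOT_THRESHOLD = 100          # made-up value: compile after 100 interpretations

hit_counts = {}
compiled_blocks = {}         # block id -> compiled (native) version

def run_block(block_id, interpret_block, compile_block):
    # Fast path: a compiled version already exists, run it directly.
    if block_id in compiled_blocks:
        return compiled_blocks[block_id]()
    # Slow path: interpret, and count how often this block is hit.
    hit_counts[block_id] = hit_counts.get(block_id, 0) + 1
    if hit_counts[block_id] >= HOT_THRESHOLD:
        compiled_blocks[block_id] = compile_block(block_id)  # JIT the hot block
    return interpret_block(block_id)

# Demo with trivial stand-ins:
interp = lambda b: "interpreted " + b
comp = lambda b: (lambda: "compiled " + b)
for _ in range(150):
    result = run_block("loop_body", interp, comp)
print(result)    # after 100 interpretations this comes from the compiled version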
but how does one read opcodes and generate code for them?
The scheme of the opcodes is defined by the specification of the VM, so the VM opens the program file and interprets it according to the spec.
Does one need premade assembly chunks that are copied and patched together? Is the assembly written in C?
This process is an implementation detail of the VM; typically there is an embedded compiler which is capable of transforming the VM opcode stream into machine code.
How do you account for system interrupts?
Very simply: you don't. The code in the VM can't interact with real hardware; the VM interacts with the OS and transfers OS events to the code by jumping to or calling specific parts inside the interpreted code. Every event in the code or from the OS must pass through the VM.
Also, hardware virtualization products can use some kind of JIT. A typical use case in the x86 world is the translation of 16-bit real mode code to 32- or 64-bit protected mode code, so as not to be forced to emulate a CPU in real mode. Also, a software-only VM replaces jump instructions in the executing code with jumps into the VM control software, which at each branch scans the following code path for jump instructions and replaces them before jumping to the real code destination. But I doubt the jump replacement qualifies as JIT compilation.
IIS does this by shadow copying: after compilation it copies the assemblies to a temporary place and runs them from there.
Imagine that a user changes some files. IIS will then recompile the assemblies in the following steps:
Recompile (all requests handled by the old code)
Copy the new assemblies (all requests handled by the old code)
All new requests are handled by the new code, all in-flight requests by the old.
I hope this is helpful.
A virtual machine loads "byte code" or "intermediate language", not machine code; therefore, I suppose, it just recompiles the byte code more efficiently once it has more runtime data.
http://en.wikipedia.org/wiki/Just-in-time_compilation