Access rights in RISC-V linkerscripts - embedded

When programming ARM-based microcontrollers, I'm used to see a MEMORY{..} segment in the linkerscript like this:
MEMORY
{
FLASH (rx): ORIGIN = 0x08000000, LENGTH = 128K
RAM (xrw): ORIGIN = 0x20000000, LENGTH = 32K
}
The access rights are easy to understand:
r: read
w: write
x: execute
I'm making my first steps in the world of RISC-V based microcontrollers. The GD32VF103CBT6 microcontroller from GigaDevice has the following MEMORY{..} segment in its linkerscript:
MEMORY
{
/* Run in FLASH */
flash (rxai!w) : ORIGIN = 0x08000000, LENGTH = 64k
ram (wxa!ri) : ORIGIN = 0x20000000, LENGTH = 20k
/* Run in RAM */
/* flash (rxai!w) : ORIGIN = 0x20000000, LENGTH = 15k */
/* ram (wxa!ri) : ORIGIN = 0x20003C00, LENGTH = 5K */
}
How should I interpret these access rights?

They're not really "access rights", but rather "what kind of sections may be placed here".
From the GNU LD documentation (with some formatting mangled in the process of quotint :
The attr string must consist only of the following characters:
‘R’
Read-only section
‘W’
Read/write section
‘X’
Executable section
‘A’
Allocatable section
‘I’
Initialized section
‘L’
Same as ‘I’
‘!’
Invert the sense of any of the attributes that follow
If an unmapped section matches any of the listed attributes other than ‘!’, it will be placed in the memory region. The ‘!’ attribute reverses the test for the characters that follow, so that an unmapped section will be placed in the memory region only if it does not match any of the attributes listed afterwards. Thus an attribute string of ‘RW!X’ will match any unmapped section that has either or both of the ‘R’ and ‘W’ attributes, but only as long as the section does not also have the ‘X’ attribute.
With that background, I would interpret your config as follows:
flash (rxai!w) : ORIGIN = 0x08000000, LENGTH = 64k
...means that the "flash" region may contain anything except writeable sections, and
ram (wxa!ri) : ORIGIN = 0x20000000, LENGTH = 20k
...means that the "ram" region may contain anything except read-only and initialized sections.

Related

ELF sections and initialising thread-local storage on baremetal applications

I am porting Musl libc for a baremetal project. Musl only targets ELF executables, so I have to "bootstrap" an ELF environment before handing control to __libc_start_main. The main data structure of interest is the system args/environment/auxiliary vectors. They can be mocked something like this:
long elf_vec[] = {
reinterpret_cast<long>("baremetal"),
0, // Argc/Argv
0, // Environment
AT_PAGESZ, 4096,
AT_UID, 1000,
AT_EUID, 1000,
AT_GID, 1000,
AT_EGID, 1000,
AT_SECURE, false
0, // Auxilliary vector
};
Musl also uses this information to initialise thread-local storage. By inspecting the ELF headers and extracting a TLS section, Musl will ensure that enough thread-local storage is allocated.
What I'm not sure how to do correctly is mock this in a bare-metal environment. To my knowledge, it's not possible to get the linker to directly embed the ELF headers into a program. I can't use an ELF library to extract the headers, as the Boot ROM contains a binary image. The approach I have tried is to create my own ELF headers:
Elf64_Phdr tls;
tls.p_vaddr = reinterpret_cast<Elf64_Addr>(&__tls_start); // linker variable to TLS region
tls.p_type = PT_TLS; // TLS header
tls.p_align = sizeof(uintptr_t);
tls.p_filesz = __tdata_end - __tls_start; // End of tdata region
tls.p_memsz = __tls_end - __tls_start; // End of tbss region
tls.p_offset = 0; // Invalid value
tls.p_flags = 0; // Invalid value
long elf_vec[] = {
/* ... */
AT_PHDR, reinterpret_cast<long>(&tls),
AT_PHNUM, 1,
AT_PHENT, sizeof(tls),
/* ... */
};
It "appears" to work, but I'm not confident in this solution without a lot of testing and auditing the code-base. Is this approach along the lines of what is required? Or am I over-looking a simpler solution to embedding ELF headers into a binary baremetal image?
Another option I've considered is using a second-stage bootloader that can perform ELF image loading, like U-Boot, although the target SoC does not have a vendor supported solution.

RISC-V inline assembly using memory not behaving correctly

This system call code is not working at all. The compiler is optimizing things out and generally behaving strangely:
template <typename... Args>
inline void print(Args&&... args)
{
char buffer[1024];
auto res = strf::to(buffer) (std::forward<Args> (args)...);
const size_t size = res.ptr - buffer;
register const char* a0 asm("a0") = buffer;
register size_t a1 asm("a1") = size;
register long syscall_id asm("a7") = ECALL_WRITE;
register long a0_out asm("a0");
asm volatile ("ecall" : "=r"(a0_out)
: "m"(*(const char(*)[size]) a0), "r"(a1), "r"(syscall_id) : "memory");
}
This is a custom system call that takes a buffer and a length as arguments.
If I write this using global assembly it works as expected, but program code has generally been extraordinarily good if I write the wrappers inline.
A function that calls the print function with a constant string produces invalid machine code:
0000000000120f54 <start>:
start():
120f54: fa1ff06f j 120ef4 <public_donothing-0x5c>
-->
120ef4: 747367b7 lui a5,0x74736
120ef8: c0010113 addi sp,sp,-1024
120efc: 55478793 addi a5,a5,1364 # 74736554 <add_work+0x74615310>
120f00: 00f12023 sw a5,0(sp)
120f04: 00a00793 li a5,10
120f08: 00f10223 sb a5,4(sp)
120f0c: 000102a3 sb zero,5(sp)
120f10: 00500593 li a1,5
120f14: 06600893 li a7,102
120f18: 00000073 ecall
120f1c: 40010113 addi sp,sp,1024
120f20: 00008067 ret
It's not loading a0 with the buffer at sp.
What am I doing wrong?
It's not loading a0 with the buffer at sp.
Because you didn't ask for a pointer as an "r" input in a register. The one and only guaranteed/supported behaviour of T foo asm("a0") is to make an "r" constraint (including +r or =r) pick that register.
But you used "m" to let it pick an addressing mode for that buffer, not necessarily 0(a0), so it probably picked an SP-relative mode. If you add asm comments inside the template like "ecall # 0 = %0 1 = %1 2 = %2" you can look at the compiler's asm output and see what it picked. (With clang, use -no-integrated-as so asm comments in the template come through in the -S output.)
Wrapping a system call does need the pointer in a specific register, i.e. using "r" or +"r"
asm volatile ("ecall # 0=%0 1=%1 2=%2 3=%3 4=%4"
: "=r"(a0_out)
: "r"(a0), "r"(a1), "r"(syscall_id), "m"(*(const char(*)[size]) a0)
: // "memory" unneeded; the "m" input tells the compiler which memory is read
);
That "m" input can be used instead of the "memory" clobber, not instead of an "r" pointer input. (For write specifically, because it only reads that one area of pointed-to memory and has no other side-effects on memory user-space can see, only on kernel write write buffers and file-descriptor positions which aren't C objects this program can access directly. For a read call, you'd need the memory to be an output operand.)
With optimization disabled, compilers do typically pick another register as the base for the "m" input (e.g. 0(a5) for GCC), but with optimization enabled GCC picks 0(a0) so it doesn't cost extra instructions. Clang still picks 0(a2), wasting an instruction to set up that pointer, even though the "=r"(a0_out) is not early-clobber. (Godbolt, with a very cut-down version of the function that doesn't call strf::to, whatever that is, just copies a byte into the buffer.)
Interestingly, with optimization enabled for my cut-down stand-alone version of the function without fixing the bug, GCC and clang do happen to put a pointer to buffer into a0, picking 0(a0) as the template expansion for that operand (see the Godbolt link above). This seems to be a missed optimization vs. using 16(sp); I don't see why they'd need the buffer address in a register at all.
But without optimization, GCC picks ecall # 0 = a0 1 = 0(a5) 2 = a1. (In my simplified version of the function, it sets a5 with mv a5,a0, so it did actually have the address in a0 as well. So it's a good thing you had more code in your function to make it not happen to work by accident, so you could find the bug in your code.)

can't edit integer with .noinit attribute on STM32L476 with custom linker file

I'm doing my first attempt working with linker files. In the end i want to have a variable that keeps it's value after reset. I'm working with an STM32L476.
To achieve this i modified the Linker files: STM32L476JGYX_FLASH.ld and STM32L476JGYX_RAM.ld to include a partition called NOINT.
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
RAM2 (xrw) : ORIGIN = 0x10000000, LENGTH = 32K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 1024K -0x100
NOINIT (rwx) : ORIGIN = 0x8000000 + 1024K - 0x100, LENGTH = 0x100
}
/* Sections */
SECTIONS
{
...
/* Global data not cleared after reset. */
.noinit (NOLOAD): {
KEEP(*(*.noinit*))
} > NOINIT
...
In the main.c i initialize the variable reset_count as a global variable.
__attribute__((section(".noinit"))) volatile uint32_t reset_count = 0;
The =0 part is just for simplification. I actually want to set reset_count to zero somewhere in a function.
When i run the program and step through the initialization i would expect to see the value of reset_count as 0. But somehow i always get 0xFFFFFFFF. It seems like i can't edit the reset_count variable. Can anybody tell me how i can make this variable editable?
It is not clear from the question whether you want to have a variable that keeps its value when power is removed, or just while power stays on but hardware reset is pulsed.
If you want something that keeps its value when power is removed, then your linker script is ok to put the block in flash memory, but you need to use the functions HAL_FLASH_Program etc. to write to it, you can't just make an assignment. In addition, you could simplify the linker script by instead of creating the NOINIT output region, just putting >FLASH.
If you want a variable that just persists across reset wile power stays up then you need to put the variable into SRAM not FLASH, for example like this:
.noinit (NOLOAD) :
{
*(.noinit*)
}
> RAM2
Note that you don't need to use KEEP unless you want to link a section that is unreferenced, which will not be the case if you actually use the variables, and you don't need another * immediately before .noinit unless you section names don't start with a ., which they should.
You will not be able to write to the flash memory as simply as that. If you use ST HAL, there is a flash module that provides HAL_FLASH_Program() function.
Alternatively, if the data you are trying to store is 128 bytes or less and you have an RTC backup battery, you can use the RTC backup registers (RTC_BKPxR) to store your data.

(STM32) Erasing flash and writing to flash gives HAL_FLASH_ERROR_PGP error (using HAL)

Trying to write to flash to store some configuration. I am using an STM32F446ze where I want to use the last 16kb sector as storage.
I specified VOLTAGE_RANGE_3 when I erased my sector. VOLTAGE_RANGE_3 is mapped to:
#define FLASH_VOLTAGE_RANGE_3 0x00000002U /*!< Device operating range: 2.7V to 3.6V */
I am getting an error when writing to flash when I use FLASH_TYPEPROGRAM_WORD. The error is HAL_FLASH_ERROR_PGP. Reading the reference manual I read that this has to do with using wrong parallelism/voltage levels.
From the reference manual I can read
Furthermore, in the reference manual I can read:
Programming errors
It is not allowed to program data to the Flash
memory that would cross the 128-bit row boundary. In such a case, the
write operation is not performed and a program alignment error flag
(PGAERR) is set in the FLASH_SR register. The write access type (byte,
half-word, word or double word) must correspond to the type of
parallelism chosen (x8, x16, x32 or x64). If not, the write operation
is not performed and a program parallelism error flag (PGPERR) is set
in the FLASH_SR register
So I thought:
I erased the sector in voltage range 3
That gives me 2.7 to 3.6v specification
That gives me x32 parallelism size
I should be able to write WORDs to flash.
But, this line give me an error (after unlocking the flash)
uint32_t sizeOfStorageType = ....; // Some uint I want to write to flash as test
HAL_StatusTypeDef flashStatus = HAL_FLASH_Program(TYPEPROGRAM_WORD, address++, (uint64_t) sizeOfStorageType);
auto err= HAL_FLASH_GetError(); // err == 4 == HAL_FLASH_ERROR_PGP: FLASH Programming Parallelism error flag
while (flashStatus != HAL_OK)
{
}
But when I start to write bytes instead, it goes fine.
uint8_t *arr = (uint8_t*) &sizeOfStorageType;
HAL_StatusTypeDef flashStatus;
for (uint8_t i=0; i<4; i++)
{
flashStatus = HAL_FLASH_Program(TYPEPROGRAM_BYTE, address++, (uint64_t) *(arr+i));
while (flashStatus != HAL_OK)
{
}
}
My questions:
Am I understanding it correctly that after erasing a sector, I can only write one TYPEPROGRAM? Thus, after erasing I can only write bytes, OR, half-words, OR, words, OR double words?
What am I missing / doing wrong in above context. Why can I only write bytes, while I erased with VOLTAGE_RANGE_3?
This looks like an data alignment error, but not the one related with 128-bit flash memory rows which is mentioned in the reference manual. That one is probably related with double word writes only, and is irrelevant in your case.
If you want to program 4 bytes at a time, your address needs to be word aligned, meaning that it needs to be divisible by 4. Also, address is not a uint32_t* (pointer), it's a raw uint32_t so address++ increments it by 1, not 4. As far as I know, Cortex M4 core converts unaligned accesses on the bus into multiple smaller size aligned accesses automatically, but this violates the flash parallelism rule.
BTW, it's perfectly valid to perform a mixture of byte, half-word and word writes as long as they are properly aligned. Also, unlike the flash hardware of F0, F1 and F3 series, you can try to overwrite a previously written location without causing an error. 0->1 bit changes are just ignored.

allocPhysPages function in System

I am trying to implement a custom physical allocation for which I need to modify the allocPhysPages() function in the System class. There is this section of code in that function
AddrRange m5opRange(0xffff0000, 0x100000000);
if (m5opRange.contains(next_return_addr)) {
warn("Reached m5ops MMIO region\n");
return_addr = 0xffffffff;
pagePtr_arr[chiplet_id] = 0xffffffff >> PageShift;
}
My question is can I comment out this section of code which checks if the allocated physical address is in the MMIO region or not. I don't want to take care of the MMIO region to reduce the complexity. If I comment this out then would it lead to any kind of failure during simulation?
Note: allocPhysPages() is located in src/sim/system.cc