ELF sections and initialising thread-local storage on baremetal applications - embedded

I am porting Musl libc for a baremetal project. Musl only targets ELF executables, so I have to "bootstrap" an ELF environment before handing control to __libc_start_main. The main data structure of interest is the system args/environment/auxiliary vectors. They can be mocked something like this:
long elf_vec[] = {
reinterpret_cast<long>("baremetal"),
0, // Argc/Argv
0, // Environment
AT_PAGESZ, 4096,
AT_UID, 1000,
AT_EUID, 1000,
AT_GID, 1000,
AT_EGID, 1000,
AT_SECURE, false,
0, // Auxiliary vector
};
Musl also uses this information to initialise thread-local storage. By inspecting the ELF headers and extracting a TLS section, Musl will ensure that enough thread-local storage is allocated.
What I'm not sure how to do correctly is mock this in a bare-metal environment. To my knowledge, it's not possible to get the linker to directly embed the ELF headers into a program. I can't use an ELF library to extract the headers, as the Boot ROM contains a binary image. The approach I have tried is to create my own ELF headers:
Elf64_Phdr tls;
tls.p_vaddr = reinterpret_cast<Elf64_Addr>(&__tls_start); // linker variable to TLS region
tls.p_type = PT_TLS; // TLS header
tls.p_align = sizeof(uintptr_t);
tls.p_filesz = __tdata_end - __tls_start; // End of tdata region
tls.p_memsz = __tls_end - __tls_start; // End of tbss region
tls.p_offset = 0; // Invalid value
tls.p_flags = 0; // Invalid value
long elf_vec[] = {
/* ... */
AT_PHDR, reinterpret_cast<long>(&tls),
AT_PHNUM, 1,
AT_PHENT, sizeof(tls),
/* ... */
};
It "appears" to work, but I'm not confident in this solution without a lot of testing and auditing of the codebase. Is this approach along the lines of what is required? Or am I overlooking a simpler way to embed ELF headers into a binary baremetal image?
Another option I've considered is using a second-stage bootloader that can perform ELF image loading, like U-Boot, although the target SoC does not have a vendor-supported solution.
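One way to sanity-check a mocked vector like the one above, before handing it to the libc, is to walk it the way a consumer would: scan the key/value pairs up to AT_NULL, pick up AT_PHDR/AT_PHNUM/AT_PHENT, then look for a PT_TLS entry. The sketch below is not Musl's code. The names Phdr64, find_tls_phdr and the k* constants are made up for illustration (they stand in for the <elf.h> definitions), and it assumes an LP64 target where a pointer fits in a long.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for Elf64_Phdr (same field order as the ELF64 program header).
struct Phdr64 {
    std::uint32_t p_type;
    std::uint32_t p_flags;
    std::uint64_t p_offset;
    std::uint64_t p_vaddr;
    std::uint64_t p_paddr;
    std::uint64_t p_filesz;
    std::uint64_t p_memsz;
    std::uint64_t p_align;
};

// Stand-ins for AT_NULL, AT_PHDR, AT_PHENT, AT_PHNUM and PT_TLS.
constexpr long kAtNull = 0, kAtPhdr = 3, kAtPhent = 4, kAtPhnum = 5;
constexpr std::uint32_t kPtTls = 7;

// Walk (key, value) pairs until AT_NULL, then scan the program headers.
// Returns nullptr when the vector does not describe a PT_TLS header.
inline const Phdr64* find_tls_phdr(const long* auxv) {
    const unsigned char* phdr = nullptr;
    std::size_t phnum = 0, phent = 0;
    for (; auxv[0] != kAtNull; auxv += 2) {
        if (auxv[0] == kAtPhdr)  phdr  = reinterpret_cast<const unsigned char*>(auxv[1]);
        if (auxv[0] == kAtPhnum) phnum = static_cast<std::size_t>(auxv[1]);
        if (auxv[0] == kAtPhent) phent = static_cast<std::size_t>(auxv[1]);
    }
    if (phdr == nullptr || phent < sizeof(Phdr64)) return nullptr;
    for (std::size_t i = 0; i < phnum; ++i) {
        const Phdr64* ph = reinterpret_cast<const Phdr64*>(phdr + i * phent);
        if (ph->p_type == kPtTls) return ph;
    }
    return nullptr;
}
```

Running this over the auxiliary part of elf_vec at boot and comparing the returned p_filesz/p_memsz against the linker symbols would at least catch a malformed vector early.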

Related

can't edit integer with .noinit attribute on STM32L476 with custom linker file

I'm making my first attempt at working with linker files. In the end I want to have a variable that keeps its value after reset. I'm working with an STM32L476.
To achieve this I modified the linker files STM32L476JGYX_FLASH.ld and STM32L476JGYX_RAM.ld to include a region called NOINIT.
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
RAM2 (xrw) : ORIGIN = 0x10000000, LENGTH = 32K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 1024K -0x100
NOINIT (rwx) : ORIGIN = 0x8000000 + 1024K - 0x100, LENGTH = 0x100
}
/* Sections */
SECTIONS
{
...
/* Global data not cleared after reset. */
.noinit (NOLOAD): {
KEEP(*(*.noinit*))
} > NOINIT
...
In main.c I define the variable reset_count as a global variable.
__attribute__((section(".noinit"))) volatile uint32_t reset_count = 0;
The =0 part is just for simplification. I actually want to set reset_count to zero somewhere in a function.
When I run the program and step through the initialization I would expect to see the value of reset_count as 0. But somehow I always get 0xFFFFFFFF. It seems like I can't edit the reset_count variable. Can anybody tell me how I can make this variable editable?
It is not clear from the question whether you want to have a variable that keeps its value when power is removed, or just while power stays on but hardware reset is pulsed.
If you want something that keeps its value when power is removed, then your linker script is right to put the block in flash memory, but you need to use the functions HAL_FLASH_Program etc. to write to it; a plain assignment has no effect, which is why you keep reading 0xFFFFFFFF, the erased-flash value. In addition, you could simplify the linker script: instead of creating the NOINIT memory region, just place the section with >FLASH.
If you want a variable that just persists across reset while power stays up, then you need to put the variable into SRAM, not flash, for example like this:
.noinit (NOLOAD) :
{
*(.noinit*)
}
> RAM2
Note that you don't need KEEP unless you want to link a section that is unreferenced, which will not be the case if you actually use the variables. You also don't need another * immediately before .noinit unless your section names don't start with a ., which they should.
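One thing to keep in mind with the SRAM variant: after a real power-up the contents of a NOLOAD section are undefined, so the code needs some way to tell "first boot" from "reset with valid contents". A common pattern is a marker word next to the data. The sketch below is only an illustration; NoInitBlock, kNoInitMagic and note_reset are hypothetical names, and the marker value is arbitrary.

```cpp
#include <cstdint>

// Hypothetical layout for a block placed in a NOLOAD section such as .noinit.
struct NoInitBlock {
    std::uint32_t magic;        // marks the contents as initialised
    std::uint32_t reset_count;  // survives resets while SRAM stays powered
};

constexpr std::uint32_t kNoInitMagic = 0x5AFEC0DEu;  // arbitrary marker (assumption)

// On target the block would be declared along the lines of:
//   __attribute__((section(".noinit"))) volatile NoInitBlock g_noinit;
// Call once early in main(); returns the number of resets since power-up.
inline std::uint32_t note_reset(volatile NoInitBlock& blk) {
    if (blk.magic != kNoInitMagic) {
        // First boot after power-up: SRAM contents are undefined, so initialise.
        blk.magic = kNoInitMagic;
        blk.reset_count = 0;
    } else {
        // Marker intact: this is a reset with power still applied.
        blk.reset_count = blk.reset_count + 1;
    }
    return blk.reset_count;
}
```

A single marker word can match by chance after power-up; a checksum over the block would be a stricter check if that matters for the application.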
You will not be able to write to the flash memory as simply as that. If you use ST HAL, there is a flash module that provides HAL_FLASH_Program() function.
Alternatively, if the data you are trying to store is 128 bytes or less and you have an RTC backup battery, you can use the RTC backup registers (RTC_BKPxR) to store your data.

Why is the pCode type const uint32_t*? (pCode in VkShaderModuleCreateInfo )?

I just read the Shader Modules Vulkan tutorial, and I didn't understand something.
Why is createInfo.pCode a const uint32_t* rather than an unsigned char* or uint8_t*? Is it faster (because each pointer step now moves 4 bytes)?
VkShaderModule createShaderModule(const std::vector<char>& code) {
VkShaderModuleCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
createInfo.codeSize = code.size();
createInfo.pCode = reinterpret_cast<const uint32_t*>(code.data());
VkShaderModule shaderModule;
if (vkCreateShaderModule(device, &createInfo, nullptr, &shaderModule) != VK_SUCCESS) {
throw std::runtime_error("failed to create shader module!");
}
return shaderModule;
}
A SPIR-V module is defined as a stream of 32-bit words. Passing in data using a uint32_t pointer tells the driver that the data is 32-bit aligned and allows the shader compiler in the driver to directly access the data using aligned 32-bit loads.
This is usually faster (in most CPU designs) than random unaligned access, in particular for cases that cross cache lines.
This is also more portable C/C++. Using direct unaligned memory access instructions is possible in most CPU architectures, but not standard in the language. The portable alternative using a byte stream requires assembling 4 byte loads and merging them, which is less efficient than just making a direct aligned word load.
Note using reinterpret_cast here assumes the data is aligned correctly. For the base address of std::vector it will work (data allocated via new must be aligned enough for the largest supported primitive type), but it's one to watch out for if you change where the code comes from in future.
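If the bytes may come from somewhere whose alignment you don't control, one way to sidestep the reinterpret_cast assumption is to copy them into a std::vector<uint32_t> first, which is aligned for uint32_t by construction. This is only a sketch; to_spirv_words is a hypothetical helper, not part of Vulkan or the tutorial.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Copy raw bytes into 32-bit words so the buffer is suitably aligned.
// Rejects sizes that cannot be a whole number of SPIR-V words.
inline std::vector<std::uint32_t> to_spirv_words(const std::vector<char>& bytes) {
    if (bytes.empty() || bytes.size() % sizeof(std::uint32_t) != 0) {
        throw std::runtime_error("SPIR-V size must be a non-zero multiple of 4");
    }
    std::vector<std::uint32_t> words(bytes.size() / sizeof(std::uint32_t));
    std::memcpy(words.data(), bytes.data(), bytes.size());  // byte-wise copy, no aliasing issues
    return words;
}
```

With that, createInfo.pCode can be set to words.data() without a cast, and createInfo.codeSize to words.size() * sizeof(uint32_t), since codeSize is given in bytes.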
According to VUID-VkShaderModuleCreateInfo-pCode-parameter, the specification requires that:
pCode must be a valid pointer to an array of codeSize / 4 uint32_t values
pCode being a uint32_t* allows this requirement to be partially validated by the compiler, making it one less thing to worry about for users of the API.
Because a SPIR-V word is 32 bits.
It should not matter for performance. It is just a type.

How do I map a file into the virtual memory manager in OSX?

I am trying to map a file into OS X's virtual memory manager. How do I do this on Mac OS X using Objective-C?
Use mmap, e.g.:
FILE* f = fopen(...);
// Map the file into memory.
// Need the file size.
fseek(f, 0, SEEK_END); // seek to end of file
off_t fileSize = ftello(f); // get current file pointer
fseek(f, 0, SEEK_SET); // seek back to beginning of file
_mappedSize = fileSize;
// fileno() is the portable way to get the descriptor behind a FILE*
_mappedAddress = mmap(0, _mappedSize, PROT_READ, MAP_PRIVATE, fileno(f), 0);
// mmap returns MAP_FAILED (not NULL) on error
... use _mappedAddress as a pointer to your data
// Finally free up
munmap(_mappedAddress, _mappedSize);
fclose(f);
Using mmap() works, of course. Another option, given that you're using Cocoa, is to use NSData or NSMutableData. You can create the data object using -initWithContentsOfURL:options:error: with NSDataReadingMappedIfSafe or NSDataReadingMappedAlways in the options. There are two different options because mapping a file is not necessarily safe. If the file is on a file system that may disappear spontaneously (network file system, removable drive), then having it mapped opens your app to crashes. The former option only maps if that's not likely to happen. Otherwise, it reads the data into memory. The latter option always maps, leaving it to you to cope with the potential for crashes.
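For the plain mmap route, a slightly more complete sketch might look like the following. It uses only POSIX calls (open, fstat, mmap, munmap), so it should be usable from Objective-C++ as well; MappedFile, map_file_readonly and unmap_file are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct MappedFile {
    const void* data = nullptr;
    std::size_t size = 0;
};

// Map a whole file read-only. Returns false on any error or for an empty file.
inline bool map_file_readonly(const char* path, MappedFile& out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return false;
    }
    std::size_t size = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;
    out.data = p;
    out.size = size;
    return true;
}

// Release the mapping and reset the handle.
inline void unmap_file(MappedFile& m) {
    if (m.data != nullptr) munmap(const_cast<void*>(m.data), m.size);
    m = MappedFile{};
}
```

Using fstat for the size avoids the fseek/ftello round trip, and checking against MAP_FAILED matters because mmap does not return a null pointer on failure.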

Developing a simple bootloader for an Embedded system

I've been tasked with developing a simple bootloader for an embedded system. We are not running any OS or RTOS so I want it to be really simple.
This code will be stored in a ROM and the processor will begin execution at power on.
My goal is to have a first part written in ASM which would take care of the following operations:
Initialize the processor
Copy the .data segment from ROM to RAM
Clear the .bss segment in RAM
Call main
Main would be obviously written in C and perform higher level operations like self-test etc...
Now what I really don't know how to do is combine these two programs into a single one. I found a crappy tool that basically uses objcopy to gather the .text and .data sections from executables and prepends some asm, but this seems to be a really ugly way to do it and I was wondering if someone could point me in the right direction?
You can (in principle) link the object file generated from the assembler code like you would link any object from your program.
The catch is that you need to lay out the generated executable so that your startup code is in the beginning. If you use GNU ld, the way to do that is a linker script.
Primitive setup (not checked for syntax errors):
MEMORY
{
FLASH (RX) : ORIGIN = 0, LENGTH = 256K
RAM (RWX) : ORIGIN = 0x40000000, LENGTH = 4M
}
SECTIONS
{
.bootloader 0 : AT(0) { bootloader.o(.text) } >FLASH AT>FLASH
.text : { _stext = .; *(.text .text.* .rodata .rodata.*); _etext = .; } >FLASH AT>FLASH
.data : { _sdata = .; *(.data .data.*); _edata = .; _sdata_load = LOADADDR(.data); } >RAM AT>FLASH
.bss (NOLOAD) : { _sbss = .; *(.bss .bss.*); _ebss = .; } >RAM
}
The basic idea is to give the linker a rough idea of the memory map, then assign sections from the input files to sections in the final program.
The linker keeps the distinction between the "virtual" and the "load" address for every output section, so you can tell it to generate a binary where the code is relocated for the final addresses, but the layout in the executable is different (here, I tell it to place the .data section in RAM, but append it to the .text section in flash).
Your bootloader can then use the symbols provided (_sdata, _edata, _sdata_load) to find the data section both in RAM and in flash, and copy it.
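The copy-and-clear step itself is small. A rough sketch, written in C++ here so it can be tried on a host (the question asks for assembly, but the logic is the same): init_ram is a hypothetical name, and on target the arguments would be the addresses of _sdata_load, _sdata, _edata, _sbss and _ebss. It assumes the sections are word-aligned and a multiple of 4 bytes long.

```cpp
#include <cstdint>

// Copy initialised data from its load address (flash) to its run address (RAM),
// then zero the .bss range. All ranges are [start, end) in 32-bit words.
inline void init_ram(const std::uint32_t* data_load,
                     std::uint32_t* data_start, std::uint32_t* data_end,
                     std::uint32_t* bss_start, std::uint32_t* bss_end) {
    for (std::uint32_t* dst = data_start; dst < data_end; ++dst) {
        *dst = *data_load++;  // .data: flash image -> RAM
    }
    for (std::uint32_t* dst = bss_start; dst < bss_end; ++dst) {
        *dst = 0;             // .bss: must read as zero before main() runs
    }
}
```

If the linker script does not guarantee 4-byte alignment for these sections, a byte-wise loop (or explicit ALIGN(4) in the script) would be the safer choice.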
Final caveat: if your program uses static constructors, you also need a constructor table, and the bootloader needs to call the static constructors.
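Calling the static constructors usually amounts to walking a table of function pointers that the linker script collects. A minimal sketch, assuming the table is exposed as a [begin, end) range; with GNU toolchains the conventional symbols are __init_array_start and __init_array_end, but whether your linker script defines them is an assumption, and run_init_array is a hypothetical name.

```cpp
// Type of one entry in the constructor table.
using InitFn = void (*)();

// Call every non-null entry in [begin, end) in table order.
// On target this might be invoked as:
//   run_init_array(__init_array_start, __init_array_end);
inline void run_init_array(const InitFn* begin, const InitFn* end) {
    for (const InitFn* p = begin; p != end; ++p) {
        if (*p != nullptr) {
            (*p)();
        }
    }
}
```

This has to run after .data and .bss are set up and before main is called, since constructors may touch global state.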
Simon is right on. Simpler linker scripts than that will work just fine for what you are doing, but the bottom line is that the linker is what takes the objects and makes the binary. So, depending on the linker you are using, you have to understand the ways you can tell that linker what to do and then have it do it. Unfortunately I don't think there is an industry standard for this; you have to go linker by linker and understand each one. And certainly with GNU ld there are many very complicated linker scripts out there; some folks live to solve things in the linker.

How do I get the Keil RealView MDK-ARM toolchain to link a region for execution in one area in memory but have it store it in another?

I'm writing a program that updates flash memory. While I'm erasing/writing flash I would like to be executing from RAM. Ideally I'd link my code to an execution region that's stored in flash that on startup I would copy to the RAM location to which it is linked.
I don't include any of the normal generated C/C++ initialization code so I can't just tag my function as __ram.
If I could do the above then the debugger's symbols would be relevant for the code copied to RAM and I'd be able to debug business as usual.
I'm thinking that something along the lines of OVERLAY/RELOC might help but I'm not sure.
Thanks,
Maybe your application code can do it manually. Something like
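typedef int (*RamFuncPtr)(int arg1); // or whatever the signature is
// Copy byte by byte; FunctionSize is the size of the code in bytes.
const unsigned char *pSourceAddr = (const unsigned char *)&FunctionInFlash;
const unsigned char *pSourceEnd = pSourceAddr + FunctionSize;
unsigned char *pDestAddr = (unsigned char *)&RamReservedForFunction;
while (pSourceAddr < pSourceEnd) // '<', not '<=', so exactly FunctionSize bytes are copied
{
*pDestAddr++ = *pSourceAddr++;
}
// Call the copy through a function pointer to the RAM address.
result = ((RamFuncPtr)&RamReservedForFunction)(argument1);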
You should be able to get the linker definition file to export symbols for the FunctionInFlash and RamReservedForFunction addresses.