I am trying to implement a custom physical allocation scheme, for which I need to modify the allocPhysPages() function in the System class. That function contains this section of code:
AddrRange m5opRange(0xffff0000, 0x100000000);
if (m5opRange.contains(next_return_addr)) {
warn("Reached m5ops MMIO region\n");
return_addr = 0xffffffff;
pagePtr_arr[chiplet_id] = 0xffffffff >> PageShift;
}
My question is: can I comment out this section of code, which checks whether the allocated physical address falls in the MMIO region? I would rather not handle the MMIO region, to reduce complexity. If I comment it out, would it lead to any kind of failure during simulation?
Note: allocPhysPages() is located in src/sim/system.cc
I have seen that there are quite a few questions about jumping from an app to the ST system bootloader, for example this one. These use the method of setting the MSP and PC then doing the jump with a function pointer.
This seems to cause an issue with the system bootloader dual-bank management whereby the first jump fails and a second jump needs to be done.
My question is - would it be possible/better to use the user option bytes to jump to the bootloader instead?
Since the OB register is read during boot in the OBL phase, if we set both the "nBOOT1 bit" and the "nBOOT_SEL bit", clear the "nBOOT0 bit", and then do a soft reset, would this avoid the empty check weirdness and let us jump to the bootloader in one go?
(Just for context - this would be the first step of doing updates via CAN as the MCU in question has a CAN bootloader built in)
Thanks in advance!
After some time tinkering with a dev board and with some help from Tilen Majerle I found that this is indeed possible and does work well.
I added the following in my main() while(1) loop so that when the blue button is pressed, the user option bits are modified and a reset is performed.
I found that we don't have to do the soft reset ourselves as the HAL_FLASH_OB_Launch() function triggers the reset for us, after which we should boot into system memory according to the reference manual page 67.
Also I found that the flash and option bytes must be unlocked before setting the option bytes, but not locked afterwards or the reset won't occur.
Here is the code to do it:
if(HAL_GPIO_ReadPin(BUTTON_GPIO_Port, BUTTON_Pin) == GPIO_PIN_RESET)
{
// Basic de-bounce for testing
HAL_Delay(100);
while(HAL_GPIO_ReadPin(BUTTON_GPIO_Port, BUTTON_Pin) == GPIO_PIN_RESET)
{
__NOP();
}
// Read, modify & write user option bits
// nBOOT1 = 1, nBOOT_SEL = 1, nBOOT0 = 0; will select system memory as boot area
uint32_t optBits = FLASH->OPTR;
optBits = (optBits | FLASH_OPTR_nBOOT1 | FLASH_OPTR_nBOOT_SEL);
optBits &= ~(FLASH_OPTR_nBOOT0);
// Unlock flash
HAL_FLASH_Unlock();
// Clear OPTLOCK
HAL_FLASH_OB_Unlock();
// Set up struct with desired bits
FLASH_OBProgramInitTypeDef optionBytesSetting = {0};
optionBytesSetting.OptionType = OPTIONBYTE_USER;
optionBytesSetting.USERConfig = optBits;
optionBytesSetting.USERType = OB_USER_nBOOT0;
// Write Option Bytes
HAL_FLASHEx_OBProgram(&optionBytesSetting);
HAL_Delay(10);
// Reload the option bytes; this triggers the reset
HAL_FLASH_OB_Launch();
NVIC_SystemReset(); // is not reached
}
I verified that the flash OPTR register is modified correctly (it goes from 0xFFFFFEAA to 0xFBFFFEAA, essentially just the nBOOT0 bit is cleared as the other two bits were already set). The MCU does reset at HAL_FLASH_OB_Launch() as expected and pausing the program reveals that after reset it is running the system bootloader based on the PC address.
I also verified it using STM32CubeProgrammer which allows me to view the PC and option bytes, plus lets me set nBOOT0 back to 1 and boot the board to my app.
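The bit manipulation above can be checked on a host without any hardware. This is a minimal sketch, not the HAL's own code: the nBOOT0 position (bit 26) follows from the register values quoted above, while the nBOOT1 and nBOOT_SEL positions and both function names are assumptions for illustration. In a real project, use the FLASH_OPTR_* masks from the device header.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. nBOOT0 = bit 26 matches the
 * 0xFFFFFEAA -> 0xFBFFFEAA change reported above; the other two are assumptions. */
#define OPTR_nBOOT_SEL (UINT32_C(1) << 24)
#define OPTR_nBOOT1    (UINT32_C(1) << 25)
#define OPTR_nBOOT0    (UINT32_C(1) << 26)

/* Return the OPTR value that selects system memory as the boot area:
 * nBOOT1 = 1, nBOOT_SEL = 1, nBOOT0 = 0. All other bits are preserved. */
uint32_t optr_select_system_memory(uint32_t optr)
{
    optr |= OPTR_nBOOT1 | OPTR_nBOOT_SEL;
    optr &= ~OPTR_nBOOT0;
    return optr;
}

/* Return the OPTR value with nBOOT0 set back to 1, leaving the rest untouched. */
uint32_t optr_restore_nboot0(uint32_t optr)
{
    return optr | OPTR_nBOOT0;
}
```

Feeding 0xFFFFFEAA to optr_select_system_memory() gives 0xFBFFFEAA, the value observed above, and optr_restore_nboot0() maps it back.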
As for reverting the OB settings programmatically, you could either use the Write Memory command before jumping to the app, or you could use the Go command to jump to the app then modify the option bytes first thing in your app.
This is my first attempt at working with linker files. In the end I want to have a variable that keeps its value after reset. I'm working with an STM32L476.
To achieve this I modified the linker files STM32L476JGYX_FLASH.ld and STM32L476JGYX_RAM.ld to include a memory region called NOINIT.
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
RAM2 (xrw) : ORIGIN = 0x10000000, LENGTH = 32K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 1024K -0x100
NOINIT (rwx) : ORIGIN = 0x8000000 + 1024K - 0x100, LENGTH = 0x100
}
/* Sections */
SECTIONS
{
...
/* Global data not cleared after reset. */
.noinit (NOLOAD): {
KEEP(*(*.noinit*))
} > NOINIT
...
In main.c I initialize the variable reset_count as a global variable.
__attribute__((section(".noinit"))) volatile uint32_t reset_count = 0;
The =0 part is just for simplification. I actually want to set reset_count to zero somewhere in a function.
When I run the program and step through the initialization, I would expect to see the value of reset_count as 0. But somehow I always get 0xFFFFFFFF. It seems like I can't edit the reset_count variable. Can anybody tell me how I can make this variable editable?
It is not clear from the question whether you want to have a variable that keeps its value when power is removed, or just while power stays on but hardware reset is pulsed.
If you want something that keeps its value when power is removed, then your linker script is ok to put the block in flash memory, but you need to use the functions HAL_FLASH_Program etc. to write to it, you can't just make an assignment. In addition, you could simplify the linker script by instead of creating the NOINIT output region, just putting >FLASH.
If you want a variable that just persists across reset while power stays up, then you need to put the variable into SRAM, not FLASH, for example like this:
.noinit (NOLOAD) :
{
*(.noinit*)
}
> RAM2
Note that you don't need to use KEEP unless you want to link a section that is unreferenced, which will not be the case if you actually use the variables, and you don't need another * immediately before .noinit unless your section names don't start with a ., which they should.
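With the variable in a NOLOAD SRAM section, its content after power-up is undefined, so the application needs some way to tell a cold start from a reset. A minimal sketch of one common approach, a magic marker, is below. The struct, the function name and the marker value are all hypothetical. On the target the object would carry __attribute__((section(".noinit"))); the attribute is left out here so the sketch also builds on a host.

```c
#include <stdint.h>

#define PERSIST_MAGIC UINT32_C(0xC0FFEE42)  /* arbitrary marker value (assumption) */

/* Data meant to survive a reset while power stays up. On the target, the
 * object would be declared with __attribute__((section(".noinit"))). */
struct persist {
    uint32_t magic;        /* equals PERSIST_MAGIC once initialized */
    uint32_t reset_count;  /* resets seen since power-up */
};

/* Call once early in main(). Returns the updated reset count.
 * If the marker is missing, the memory is treated as uninitialized
 * (first boot after power-up) and the counter starts at 0. */
uint32_t persist_on_boot(volatile struct persist *p)
{
    if (p->magic != PERSIST_MAGIC) {
        p->magic = PERSIST_MAGIC;
        p->reset_count = 0;
    } else {
        p->reset_count++;
    }
    return p->reset_count;
}
```

There is a small chance that random power-up content matches the marker; a checksum over the struct would reduce that risk further.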
You will not be able to write to the flash memory as simply as that. If you use ST HAL, there is a flash module that provides HAL_FLASH_Program() function.
Alternatively, if the data you are trying to store is 128 bytes or less and you have an RTC backup battery, you can use the RTC backup registers (RTC_BKPxR) to store your data.
I'm trying to understand how the entire L1/L2 flushing works. Suppose I have a compute shader like this one
layout(std430, set = 0, binding = 2) buffer Particles{
Particle particles[];
};
layout(std430, set = 0, binding = 4) buffer Constraints{
Constraint constraints[];
};
void main(){
const uint gID = gl_GlobalInvocationID.x;
for (int pass=0;pass<GAUSS_SEIDEL_PASSES;pass++){
// first query the constraint, which contains particle_id_1 and particle_id_2
const Constraint c = constraints[gID*GAUSS_SEIDEL_PASSES+pass];
// read newest positions
vec3 position1 = particles[c.particle_id_1].position;
vec3 position2 = particles[c.particle_id_2].position;
// modify position1 and position2
position1 += something;
position2 -= something;
// update positions
particles[c.particle_id_1].position = position1;
particles[c.particle_id_2].position = position2;
// in the next iteration, different constraints may use the updated positions
}
}
From what I understand, initially all data resides in L2. When I read particles[c.particle_id_1].position I copy some of the data from L2 to L1 (or directly to a register).
Then in position1 += something I modify L1 (or the register). Finally, in particles[c.particle_id_1].position = position1, I flush the data from L1 (or a register) back to L2, right? So if I then have a second compute shader that I want to run after this one, and that second shader reads the positions of particles, I do not need to synchronize Particles. It would be enough to just put an execution barrier, without a memory barrier:
void vkCmdPipelineBarrier(
VkCommandBuffer commandBuffer,
VkPipelineStageFlags srcStageMask, // here I put VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT
VkPipelineStageFlags dstStageMask, // here I put VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT
VkDependencyFlags dependencyFlags, // here nothing
uint32_t memoryBarrierCount, // here 0
const VkMemoryBarrier* pMemoryBarriers, // nullptr
uint32_t bufferMemoryBarrierCount, // 0
const VkBufferMemoryBarrier* pBufferMemoryBarriers, // nullptr
uint32_t imageMemoryBarrierCount, // 0
const VkImageMemoryBarrier* pImageMemoryBarriers); // nullptr
Vulkan's memory model does not care about "caches" as caches. Its model is built on the notion of availability and visibility. A value produced by GPU command/stage A is "available" to GPU command/stage B if the command/stage A has an execution dependency with command/stage B. A value produced by GPU command/stage A is "visible" to GPU command/stage B if command/stage A has a memory dependency with command/stage B with regard to the particular memory in question and the access modes that A wrote it and B will access it.
If a value is not both available and visible to a command/stage, then attempting to access it yields undefined behavior.
The implementation of availability and visibility will involve clearing caches and the like. But as far as the Vulkan memory model is concerned, this is an implementation detail it doesn't care about. Nor should you: understand the Vulkan memory model and write code that works within it.
Your pipeline barrier creates an execution dependency, but not a memory dependency. Therefore, values written by CS processes before the barrier are available to CS processes afterwards, but not visible to them. You need to have a memory dependency to establish visibility.
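A minimal sketch of what adding the memory dependency could look like, assuming the first dispatch only writes the buffer through shader stores and the second only reads it through shader loads. The function name is hypothetical, and it needs the Vulkan headers and a command buffer in the recording state, so it is a fragment rather than a standalone program.

```c
#include <stddef.h>
#include <vulkan/vulkan.h>

/* Record a compute-to-compute barrier that carries both an execution
 * dependency (the stage masks) and a memory dependency (the access masks). */
void record_compute_to_compute_barrier(VkCommandBuffer commandBuffer)
{
    VkMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .pNext = NULL,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT, /* writes of the first dispatch */
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,  /* reads of the second dispatch */
    };

    vkCmdPipelineBarrier(commandBuffer,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, /* srcStageMask */
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, /* dstStageMask */
                         0,                                    /* dependencyFlags */
                         1, &barrier,                          /* one global memory barrier */
                         0, NULL,                              /* no buffer barriers */
                         0, NULL);                             /* no image barriers */
}
```

If the second shader also writes the particles, dstAccessMask would need VK_ACCESS_SHADER_WRITE_BIT as well. A VkBufferMemoryBarrier restricted to the Particles buffer would be a narrower alternative to the global barrier used here.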
However, if you want a GPU level understanding... it all depends on the GPU. Does the GPU have a cache hierarchy, an L1/L2 split? Maybe some do, maybe not.
It's kind of irrelevant anyway, because merely writing a value to an address in memory is not equivalent to a "flush" of the appropriate caches around that memory. Even using the coherent qualifier would only cause a flush for compute shader operations executing within that same dispatch call. It would not be guaranteed to affect later dispatch calls.
Implementation-dependent. For all we know, a device might have no cache at all, or in future it might be some quantum magic bs.
Shader assignment operation does not imply anything about anything. There's no "L1" or "L2" mentioned anywhere in the Vulkan specification. It is a concept that does not exist.
We should completely divorce ourselves from the cache stuff, and all the mental baggage that comes with it.
What is important here is that when you read something, then that thing needs to be "visible to" the reading agent (irrespective of what kind of device you use, and what obscure memory architecture it might have). If it is not "visible to", then you might be reading garbage.
When you write something, this does not happen automatically. The writes are not "visible to" anyone.
First you put your writes into src* part of a memory dependency (e.g. via a pipeline barrier). That will make your writes "available from".
Then you put your reader into dst* that will take all referenced writes that are "available from" and make them "visible to" the second synchronization scope.
If you really want to shoehorn this into a cache system concept, don't think of it as levels of cache. Think of it as separate caches. That something is already in some cache does not mean it is in the particular cache the consumer needs.
I am trying to write to flash to store some configuration. I am using an STM32F446ZE, where I want to use the last 16 KB sector as storage.
I specified VOLTAGE_RANGE_3 when I erased my sector. VOLTAGE_RANGE_3 is mapped to:
#define FLASH_VOLTAGE_RANGE_3 0x00000002U /*!< Device operating range: 2.7V to 3.6V */
I am getting an error when writing to flash when I use FLASH_TYPEPROGRAM_WORD. The error is HAL_FLASH_ERROR_PGP. Reading the reference manual I read that this has to do with using wrong parallelism/voltage levels.
From the reference manual I can read
Furthermore, in the reference manual I can read:
Programming errors
It is not allowed to program data to the Flash
memory that would cross the 128-bit row boundary. In such a case, the
write operation is not performed and a program alignment error flag
(PGAERR) is set in the FLASH_SR register. The write access type (byte,
half-word, word or double word) must correspond to the type of
parallelism chosen (x8, x16, x32 or x64). If not, the write operation
is not performed and a program parallelism error flag (PGPERR) is set
in the FLASH_SR register
So I thought:
I erased the sector in voltage range 3
That gives me 2.7 to 3.6v specification
That gives me x32 parallelism size
I should be able to write WORDs to flash.
But this line gives me an error (after unlocking the flash):
uint32_t sizeOfStorageType = ....; // Some uint I want to write to flash as test
HAL_StatusTypeDef flashStatus = HAL_FLASH_Program(TYPEPROGRAM_WORD, address++, (uint64_t) sizeOfStorageType);
auto err= HAL_FLASH_GetError(); // err == 4 == HAL_FLASH_ERROR_PGP: FLASH Programming Parallelism error flag
while (flashStatus != HAL_OK)
{
}
But when I start to write bytes instead, it goes fine.
uint8_t *arr = (uint8_t*) &sizeOfStorageType;
HAL_StatusTypeDef flashStatus;
for (uint8_t i=0; i<4; i++)
{
flashStatus = HAL_FLASH_Program(TYPEPROGRAM_BYTE, address++, (uint64_t) *(arr+i));
while (flashStatus != HAL_OK)
{
}
}
My questions:
Am I understanding it correctly that after erasing a sector, I can only write one TYPEPROGRAM? Thus, after erasing I can only write bytes, OR, half-words, OR, words, OR double words?
What am I missing or doing wrong in the above context? Why can I only write bytes, even though I erased with VOLTAGE_RANGE_3?
This looks like a data alignment error, but not the one related to the 128-bit flash memory rows mentioned in the reference manual. That one probably applies to double word writes only, and is irrelevant in your case.
If you want to program 4 bytes at a time, your address needs to be word aligned, meaning that it needs to be divisible by 4. Also, address is not a uint32_t* (pointer), it's a raw uint32_t, so address++ increments it by 1, not 4. As far as I know, the Cortex-M4 core automatically converts unaligned accesses on the bus into multiple smaller aligned accesses, but this violates the flash parallelism rule.
BTW, it's perfectly valid to perform a mixture of byte, half-word and word writes as long as they are properly aligned. Also, unlike the flash hardware of F0, F1 and F3 series, you can try to overwrite a previously written location without causing an error. 0->1 bit changes are just ignored.
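The alignment points above can be sketched as a few small helpers. This is only an illustration under stated assumptions: the function names are hypothetical, and flash_word_after_write() is a simple model of the "0->1 bit changes are ignored" behaviour described above, not something taken from the HAL.

```c
#include <stdint.h>

/* A word write needs an address divisible by 4. */
int is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0u;
}

/* Round an address up to the next word boundary (unchanged if already aligned). */
uint32_t align_up_to_word(uint32_t addr)
{
    return (addr + 3u) & ~UINT32_C(3);
}

/* Advance a raw (non-pointer) address by one word: +4, not address++. */
uint32_t next_word_address(uint32_t addr)
{
    return addr + (uint32_t)sizeof(uint32_t);
}

/* Model of programming over existing content: bits can only go 1 -> 0,
 * so the stored result is the AND of the old and new values. */
uint32_t flash_word_after_write(uint32_t old_value, uint32_t new_value)
{
    return old_value & new_value;
}
```

In the question's loop, that would mean checking is_word_aligned(address) before the first word write and replacing address++ with address = next_word_address(address).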
When I read a source about the processes and threads in the operating system, I faced this sentence and it sounded weird to me:
When a program is executed and handled by the processor, it converts into a process. A process needs to use the data and code segment in the memory.
I think the first sentence is true naturally. However, I cannot understand why the process needs to use solely data and code segment?
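#include <stdio.h>
#include <stdlib.h>
int x = 10; /* initialized global */
int y;      /* uninitialized global */
int main(void){
int *array = malloc(sizeof(int) * 4); /* heap allocation */
printf("x and y are %d %d\n", x, y);
free(array);
return 0;
}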
I think that when this code is executed, the generated process uses the bss, data, heap and code segments. In my opinion, a process can benefit from any segment of the memory.
If my thoughts are wrong, can anyone explain the reason?
A process has to store in memory:
Code.
Heap.
Stack.
Data.
BSS.
Except for really trivial ones, a program will use all these segments. Take a look at Wikipedia's explanation of what the segments contain.
I think in the sentence the author didn't want to go into details and refers to Stack/Heap/Data/BSS as the data of your program, not the actual data segment.
This statement is not correct.
When a program is executed and handled by the processor, it converts into a process. A process needs to use the data and code segment in the memory.
A process has to exist before a program can be executed. On many non-Unix systems a single process runs multiple programs.
I think that when this code is executed, the generated process use bss, data, heap and code segment. In my opinion, a process can benefit from any segment of the memory.
The linker defines the program segments. The loader follows the instructions of the linker to create the address space.
"bss, data, heap, and code" is a bad way to envision the address space.
There is:
Executable data
Read only data
Read/write data that can be
initialized
uninitialized
Heap and stack are just read/write data. The operating system cannot even tell what data is stack and what is heap. It's all just memory.
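To see these kinds of memory in a running process, a small sketch like the one below prints one address from each. The names are hypothetical, and the section names in the comments (.data, .bss, .rodata) are what a typical ELF toolchain uses, not something the C language guarantees.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 10;  /* read/write, initialized: typically .data */
int zeroed_global;            /* read/write, uninitialized: typically .bss, starts at 0 */
const char message[] = "hi";  /* read-only data: typically .rodata */

/* Print one address from each kind of memory the process uses.
 * Returns 0 on success, -1 if the heap allocation fails. */
int print_layout(void)
{
    int on_stack = 0;
    int *on_heap = malloc(sizeof *on_heap);
    if (on_heap == NULL) {
        return -1;
    }

    printf("read-only data : %p\n", (const void *)message);
    printf("initialized    : %p\n", (void *)&initialized_global);
    printf("uninitialized  : %p\n", (void *)&zeroed_global);
    printf("heap           : %p\n", (void *)on_heap);
    printf("stack          : %p\n", (void *)&on_stack);

    free(on_heap);
    return 0;
}
```

Calling print_layout() from main() shows that heap and stack addresses are ordinary read/write memory; the exact values vary between runs and systems.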