How to set up physical memory protection (PMP) in RISC-V?

I am trying to write a small program that uses RISC-V PMP. I'm using SaxonSoc (https://github.com/SpinalHDL/SaxonSoc), which means I have access to the hardware description and to simulation waveforms.
I am trying to understand why this small test is not working:
int main (){
volatile uint32_t * volatile mem=(uint32_t * volatile)( SYSTEM_RAM_A_CTRL+ 0x2000) ;
*mem=0x15;
main_println32x("mem :",*mem);
u32 new_pmpcfg0 =1<<7 ; //setting the L bit so restrictions can be applied to M mode
new_pmpcfg0 =(new_pmpcfg0) | 3<<3 ; // A=3=NAPOT ; R=W=X=0
u32 new_pmpaddr0=(u32)( SYSTEM_RAM_A_CTRL+ 0x2000) ;
__asm__ volatile ("csrw pmpaddr0, %0"
: /* output: none */
: "r" (new_pmpaddr0) /* input : from register */
: /* clobbers: none */);
__asm__ volatile ("csrw pmpcfg0, %0"
: /* output: none */
: "r" (new_pmpcfg0) /* input : from register */
: /* clobbers: none */);
*mem=0x19; // I expect an exception here
main_println32x("mem :",*mem);
}
Simulation shows that the CSRs are configured correctly.
Unfortunately, I don't get the exception I'm expecting.
Any idea what I am missing here?

My first thought would be that pmpaddr0 is not set correctly. As you are using NAPOT encoding, the start address and the region size are both encoded in pmpaddr as per Table 3.10 in the RISC-V privileged spec: the register holds address bits [33:2] (so the byte address must be shifted right by 2), and the number of trailing one bits selects the region size.
I would suggest testing this by setting pmpaddr0 to all ones ('1) to cover the whole address space, so the address you are accessing will be matched against that PMP region. It should then raise an exception, as the access violates the region's permissions.
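As a rough sketch of the NAPOT encoding described in the spec: the base address is shifted right by 2 and the low bits are filled with ones according to the region size. The helper name pmp_napot_encode is hypothetical (it is not part of SaxonSoc), and the caller chooses the region size; the question's code does not specify one.

```c
#include <stdint.h>

/* Hypothetical helper (not from SaxonSoc): build a pmpaddr value for A=NAPOT.
 * base : byte address of the region, must be aligned to size
 * size : region size in bytes, must be a power of two and at least 8
 * Returns 0 and writes *pmpaddr on success, -1 on invalid input. */
int pmp_napot_encode(uint32_t base, uint32_t size, uint32_t *pmpaddr)
{
    if (size < 8u || (size & (size - 1u)) != 0u)
        return -1;                      /* not a power of two >= 8 */
    if ((base & (size - 1u)) != 0u)
        return -1;                      /* not naturally aligned */

    /* Address bits [33:2], then (log2(size) - 3) trailing one bits. */
    *pmpaddr = (base >> 2) | ((size >> 3) - 1u);
    return 0;
}
```

For instance, assuming a 4 KiB region at byte address 0x80002000 (an example value only), the encoded register value is 0x200009FF, which differs from writing the raw byte address into pmpaddr0 as the question's code does.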

Can we have dirty data on l1 cache in gpu?

I've read about some of the common write policies in GPU microarchitectures. For most GPUs the write policy is the same as in the picture below (taken from the GPGPU-Sim manual). Based on that picture, I have a question: can we have dirty data in the L1 cache?
The L1 on some GPU architectures is a write-back cache for global accesses. Note that this topic varies by GPU architecture, e.g. for whether global activity is cached in L1.
Speaking generally, then, yes you can have dirty data. By this I mean that the data in the L1 cache is modified (compared to what is otherwise in global space or the L2 cache) and it has not yet been "flushed" or updated into the L2 cache. (You can also have "stale" data - data in the L1 that has not been modified, but is not consistent with the L2.)
We can create a simple proof point for this (dirty data).
The following code, when executed on a cc7.0 device (and probably some other architectures as well), will not give the expected answer of 1024.
This is due to the fact that the L1, which is a separate entity per SM, is not immediately flushed to the L2. It therefore has "dirty data" by the above definition.
(The code is broken for this reason. Don't use this code. It's just a proof point.)
#include <iostream>
#include <cuda_runtime.h>
constexpr int num_blocks = 1024;
constexpr int num_threads = 32;
struct Lock {
int *locked;
Lock() {
int init = 0;
cudaMalloc(&locked, sizeof(int));
cudaMemcpy(locked, &init, sizeof(int), cudaMemcpyHostToDevice);
}
~Lock() {
if (locked) cudaFree(locked);
locked = NULL;
}
__device__ __forceinline__ void acquire_lock() {
while (atomicCAS(locked, 0, 1) != 0);
}
__device__ __forceinline__ void unlock() {
atomicExch(locked, 0);
}
};
__global__ void counter(Lock lock, int *total) {
if (threadIdx.x == 1) {
lock.acquire_lock();
*total = *total + 1;
// __threadfence(); uncomment this line to fix
lock.unlock();
}
}
int main() {
int *total_dev;
cudaMalloc(&total_dev, sizeof(int));
int total_host = 0;
cudaMemcpy(total_dev, &total_host, sizeof(int), cudaMemcpyHostToDevice);
{
Lock lock;
counter<<<num_blocks, num_threads>>>(lock, total_dev);
cudaDeviceSynchronize();
cudaMemcpy(&total_host, total_dev, sizeof(int), cudaMemcpyDeviceToHost);
std::cout << total_host << std::endl;
}
cudaFree(total_dev);
}
In case there is any further doubt about whether this is a proper proof (e.g. to dispel arguments about things being "optimized into a register" etc.) we can study the resultant sass code. The end of the above kernel has code that looks like this:
/*0130*/ LDG.E.SYS R0, [R4] ; /* 0x0000000004007381 */
// load *total /* 0x000ea400001ee900 */
/*0140*/ IADD3 R7, R0, 0x1, RZ ; /* 0x0000000100077810 */
// add 1 /* 0x004fd00007ffe0ff */
/*0150*/ STG.E.SYS [R4], R7 ; /* 0x0000000704007386 */
// store *total /* 0x000fe8000010e900 */
/*0160*/ ATOMG.E.EXCH.STRONG.GPU PT, RZ, [R2], RZ ; /* 0x000000ff02ff73a8 */
//lock.unlock /* 0x000fe200041f41ff */
/*0170*/ EXIT ;
Since the result register has definitely been stored to the global space, we can infer that if another thread (in another SM) reads an unexpected value in global space for *total it must be due to the fact that the store from another SM has not reached the L2, i.e. has not reached device-wide consistency/coherency. Therefore the data in some other SM is "dirty". We can (presumably) rule out the "stale" case here (the data in the other L1 was written, but I have "old" data in my L1) because the global load indicated above does not happen until the lock is acquired in the SM.
Note that the above code "fails" on cc7.0 devices (and probably some other device architectures). It does not necessarily fail on the GPU you are using. But it is still "broken".

FIONBIO for ioctl under Fusion

I'm working on a project based on uCOS and the Fusion standard (rather than POSIX) and I want to set my socket into non-blocking mode. The POSIX ioctl command would be ioctl(data,FIONBIO, TRUE); but I can't seem to get it going under Fusion.
In the comments of the header fclioctl.h, I see the following:
/*
* The UNIX definition was as follows:
*
* int ioctl( int fd, int cmd, ... )
*
* But since POSIX does not include "ioctl" as part of it's requirements for
* Fusion the format follows more closely to Win32.
*
* TO get information about a device, a handle to the device or a device in
* it's device stack must be obtained.
*/
fclIoResult_t fclIoctl
(
fclHandle_t hDevice, /* Handle to device */
fclIoCode_t nIoControlCode, /* Function to perform */
fclIoBuffer_t pInBuffer, /* Data to the device */
fclIoSize_t nInBufferSize, /* Size of data to the device */
fclIoBuffer_t pOutBuffer, /* Data from the device */
fclIoSize_t nOutBufferSize, /* Size of buffer to receive data */
fclIoSize_t* pnBytesReturned /* Actual number of bytes received */
);
and for fclIoCode_t, I only see:
/*
* IOCTL Types
*/
typedef unsigned char FIO_BYTE;
typedef unsigned int FIO_WORD;
typedef u32 FIO_DWORD;
#ifndef FCL_IOCODE_T
typedef u32 fclIoCode_t;
#define FCL_IOCODE_T fclIoCode_t
#endif
Does anybody have experience with Fusion and may be able to help out here?

STM32 Nucleo EEPROM Emulator ee_init issue

I am trying to get the EEPROM emulator for STM32 working. I have followed the example given for an STM32L47x board, but I am still running into issues. When I call EE_Init I end up with a stack overflow. I am not too familiar with this emulator and am using the default configuration from the example.
This is how I am initializing everything.
EE_Status ee_status = EE_OK;
/* Enable and set FLASH Interrupt priority */
/* FLASH interrupt is used for the purpose of pages clean up under interrupt */
HAL_NVIC_SetPriority(FLASH_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(FLASH_IRQn);
HAL_FLASH_Unlock();
if(__HAL_PWR_GET_FLAG(PWR_FLAG_SB) == RESET)
{
/* Blink LED_OK (Green) twice at startup */
LEDInterface_toggleColor(GREEN);
HAL_Delay(100);
LEDInterface_toggleColor(NONE);
HAL_Delay(100);
LEDInterface_toggleColor(GREEN);
HAL_Delay(100);
LEDInterface_toggleColor(NONE);
ee_status = EE_Init(EE_FORCED_ERASE);
if(ee_status != EE_OK)
{
while(1);
}
These are the eeprom_emul_conf.h settings, which I also have not changed:
/* Configuration of eeprom emulation in flash, can be custom */
#if defined (STM32L4R5xx) || defined (STM32L4R7xx) || defined (STM32L4R9xx) || defined (STM32L4S5xx) || defined (STM32L4S7xx) || defined (STM32L4S9xx)
#define START_PAGE_ADDRESS 0x08100000U /*!< Start address of the 1st page in flash, for EEPROM emulation */
#else
#define START_PAGE_ADDRESS 0x08080000U /*!< Start address of the 1st page in flash, for EEPROM emulation */
#endif
#define CYCLES_NUMBER 1U /*!< Number of 10Kcycles requested, minimum 1 for 10Kcycles (default),
for instance 10 to reach 100Kcycles. This factor will increase
pages number */
#define GUARD_PAGES_NUMBER 2U /*!< Number of guard pages avoiding frequent transfers (must be multiple of 2): 0,2,4.. */
/* Configuration of crc calculation for eeprom emulation in flash */
#define CRC_POLYNOMIAL_LENGTH LL_CRC_POLYLENGTH_16B /* CRC polynomial length 16 bits */
#define CRC_POLYNOMIAL_VALUE 0x8005U /* Polynomial to use for CRC calculation */
I end up in the osal_hooks.c file, where I get stuck in this while loop:
#if defined(DOXYGEN)
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
#else
OSAL_WEAK_FN(void, vApplicationStackOverflowHook)( TaskHandle_t xTask, char *pcTaskName )
#endif
{
volatile char * name = pcTaskName;
(void)name;
while (1)
{
;
}
}
I'm sure I need to change where I allocate the memory, but what is the best way to go about this? Thank you.

Writing to non-volatile memory without disrupting UART interrupts execution on STM32F4XX

I get several overrun errors on the UART peripheral because I keep receiving UART data while my code is stalled executing a write operation on flash.
I'm using interrupts for the UART and, as explained in Application Note AN3969:
EEPROM emulation firmware runs from the internal Flash, thus access to
the Flash will be stalled during operations requiring Flash erase or
programming (EEPROM initialization, variable update or page erase). As
a consequence, the application code is not executed and the interrupt
can not be served.
This behavior may be acceptable for many applications, however for
applications with realtime constraints, you need to run the critical
processes from the internal RAM.
In this case:
Relocate the vector table in the internal RAM.
Execute all critical processes and interrupt service routines from the internal RAM. The compiler provides a keyword to declare functions
as a RAM function; the function is copied from the Flash to the RAM at
system startup just like any initialized variable. It is important to
note that for a RAM function, all used variable(s) and called
function(s) should be within the RAM.
So I searched the internet and found AN4808, which provides examples of how to keep interrupts running during flash operations.
I went ahead and modified my code:
Linker script: added the vector table to SRAM and defined a .ramfunc section
/* stm32f417.dld */
ENTRY(Reset_Handler)
MEMORY
{
ccmram(xrw) : ORIGIN = 0x10000000, LENGTH = 64k
sram : ORIGIN = 0x20000000, LENGTH = 112k
eeprom_default : ORIGIN = 0x08004008, LENGTH = 16376
eeprom_s1 : ORIGIN = 0x08008000, LENGTH = 16k
eeprom_s2 : ORIGIN = 0x0800C000, LENGTH = 16k
flash_unused : ORIGIN = 0x08010000, LENGTH = 64k
flash : ORIGIN = 0x08020000, LENGTH = 896k
}
_end_stack = 0x2001BFF0;
SECTIONS
{
. = ORIGIN(eeprom_default);
.eeprom_data :
{
*(.eeprom_data)
} >eeprom_default
. = ORIGIN(flash);
.vectors :
{
_load_vector = LOADADDR(.vectors);
_start_vector = .;
*(.vectors)
_end_vector = .;
} >sram AT >flash
.text :
{
*(.text)
*(.rodata)
*(.rodata*)
_end_text = .;
} >flash
.data :
{
_load_data = LOADADDR(.data);
. = ALIGN(4);
_start_data = .;
*(.data)
} >sram AT >flash
.ramfunc :
{
. = ALIGN(4);
*(.ramfunc)
*(.ramfunc.*)
. = ALIGN(4);
_end_data = .;
} >sram AT >flash
.ccmram :
{
_load_ccmram = LOADADDR(.ccmram);
. = ALIGN(4);
_start_ccmram = .;
*(.ccmram)
*(.ccmram*)
. = ALIGN(4);
_end_ccmram = .;
} > ccmram AT >flash
.bss :
{
_start_bss = .;
*(.bss)
_end_bss = .;
} >sram
. = ALIGN(4);
_start_stack = .;
}
_end = .;
PROVIDE(end = .);
Reset handler: added copying of the vector table (and the .ramfunc section) to SRAM
void Reset_Handler(void)
{
unsigned int *src, *dst;
/* Copy vector table from flash to RAM */
src = &_load_vector;
dst = &_start_vector;
while (dst < &_end_vector)
*dst++ = *src++;
/* Copy data section from flash to RAM */
src = &_load_data;
dst = &_start_data;
while (dst < &_end_data)
*dst++ = *src++;
/* Copy data section from flash to CCM RAM */
src = &_load_ccmram;
dst = &_start_ccmram;
while (dst < &_end_ccmram)
*dst++ = *src++;
/* Clear the bss section */
dst = &_start_bss;
while (dst < &_end_bss)
*dst++ = 0;
SystemInit();
SystemCoreClockUpdate();
RCC->AHB1ENR = 0xFFFFFFFF;
RCC->AHB2ENR = 0xFFFFFFFF;
RCC->AHB3ENR = 0xFFFFFFFF;
RCC->APB1ENR = 0xFFFFFFFF;
RCC->APB2ENR = 0xFFFFFFFF;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOBEN;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOCEN;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIODEN;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOEEN;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOFEN;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOGEN;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOHEN;
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOIEN;
RCC->AHB1ENR |= RCC_AHB1ENR_CCMDATARAMEN;
main();
while(1);
}
system_stm32f4xx.c: uncommented the VECT_TAB_SRAM define
/*!< Uncomment the following line if you need to relocate your vector Table in
Internal SRAM. */
#define VECT_TAB_SRAM
#define VECT_TAB_OFFSET 0x00 /*!< Vector Table base offset field.
This value must be a multiple of 0x200. */
Added a definition of RAMFUNC to set section attributes :
#define RAMFUNC __attribute__ ((section (".ramfunc")))
Added RAMFUNC before the UART-related functions and prototypes so they run from RAM.
RAMFUNC void USART1_IRQHandler(void)
{
uint32_t sr = USART1->SR;
USART1->SR & USART_SR_ORE ? GPIO_SET(LED_ERROR_PORT, LED_ERROR_PIN_bp):GPIO_CLR(LED_ERROR_PORT, LED_ERROR_PIN_bp);
if(sr & USART_SR_TXE)
{
if(uart_1_send_write_pos != uart_1_send_read_pos)
{
USART1->DR = uart_1_send_buffer[uart_1_send_read_pos];
uart_1_send_read_pos = (uart_1_send_read_pos + 1) % USART_1_SEND_BUF_SIZE;
}
else
{
USART1->CR1 &= ~USART_CR1_TXEIE;
}
}
if(sr & (USART_SR_RXNE | USART_SR_ORE))
{
USART1->SR &= ~(USART_SR_RXNE | USART_SR_ORE);
uint8_t byte = USART1->DR;
uart_1_recv_buffer[uart_1_recv_write_pos] = byte;
uart_1_recv_write_pos = (uart_1_recv_write_pos + 1) % USART_1_RECV_BUF_SIZE;
}
}
My target runs properly with the vector table and the UART function in RAM, but I still get an overrun on the USART. I'm also not disabling interrupts when performing the flash write operation.
I also tried to run the code from CCM RAM instead of SRAM but, as I saw in this post, code can't be executed from CCM RAM on STM32F4XX...
Any ideas? Thanks.
Any attempt to read from flash while a write operation is ongoing causes the bus to stall.
In order not to be blocked by flash writes, I think not only the interrupt code but also the interrupted function has to run from RAM; otherwise the core cannot proceed to a state where interrupts are possible.
Try relocating the flash handling code to RAM.
If it's possible, I'd advise switching to an MCU with two independent banks of flash memory, like the pin- and software-compatible 427/429/437/439 series. You can dedicate one bank to program code and the other to EEPROM-like data storage, then writing the second bank won't disturb code running from the first bank.
As suggested, it might be necessary to execute code from RAM; or, rather, make sure that no flash read operations are performed while the write is in progress.
To test, you might want to compile the entire executable for RAM, rather than flash (i.e., place everything into RAM and not use the flash at all).
You could then use gdb to load the binary and start execution... test your uart and make sure it is working as expected. At least this way you can be sure the flash is unused.
Some micros have READ WHILE WRITE sections that do not have a problem performing multiple operations simultaneously.
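A minimal sketch of the "relocate the flash handling code to RAM" suggestion, reusing the .ramfunc section from the question's linker script. The function name flash_wait_idle, the poll limit, and passing the status register by pointer are assumptions made so the snippet compiles on its own; on the target the pointer would be &FLASH->SR, where the busy flag (BSY) is bit 16 on STM32F4.

```c
#include <stdint.h>

/* On the ARM target, place the function in the .ramfunc section so that it
 * executes from SRAM; on a host build the attribute is simply dropped. */
#if defined(__arm__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC
#endif

#define FLASH_BUSY_MASK (1u << 16)   /* BSY bit of FLASH_SR on STM32F4 */

/* Poll the flash status register until the busy flag clears.
 * Returns 0 once the flash is idle, -1 if it stayed busy for max_polls reads.
 * The loop touches only the register and its arguments, so it performs no
 * flash reads of its own while running from RAM. */
RAMFUNC int flash_wait_idle(volatile const uint32_t *status_reg,
                            uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if ((*status_reg & FLASH_BUSY_MASK) == 0u)
            return 0;
    }
    return -1;
}
```

The code that starts the erase or program operation, and anything it calls before the busy flag clears, would need the same placement; a single call back into flash stalls the core until the operation ends.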

LPC824 microcontroller ADC demo HardFault problem

I'm trying to program an LPC824 microcontroller board (https://www.switch-science.com/catalog/2265/) with LPCOpen.
I'm using it with an LPC-Link 2 debugger board.
My goal is to get some information from a pressure sensor with the ADC.
My code stops with a HardFault when executing the NVIC_EnableIRQ function (on line 92).
If I don't use the NVIC interrupt controller, my code works and I can get a value from the sensor with the ADC.
What am I doing wrong?
Here is my adc.c code:
#include "board.h"
static volatile int ticks;
static bool sequenceComplete = false;
static bool thresholdCrossed = false;
#define TICKRATE_HZ (100) /* 100 ticks per second */
#define BOARD_ADC_CH 2
/**
* @brief Handle interrupt from ADC sequencer A
* @return Nothing
*/
void ADC_SEQA_IRQHandler(void) {
uint32_t pending;
/* Get pending interrupts */
pending = Chip_ADC_GetFlags(LPC_ADC);
/* Sequence A completion interrupt */
if (pending & ADC_FLAGS_SEQA_INT_MASK) {
sequenceComplete = true;
}
/* Threshold crossing interrupt on ADC input channel */
if (pending & ADC_FLAGS_THCMP_MASK(BOARD_ADC_CH)) {
thresholdCrossed = true;
}
/* Clear any pending interrupts */
Chip_ADC_ClearFlags(LPC_ADC, pending);
}
/**
* @brief Handle interrupt from SysTick timer
* @return Nothing
*/
void SysTick_Handler(void) {
static uint32_t count;
/* Every 1/2 second */
if (count++ == TICKRATE_HZ / 2) {
count = 0;
Chip_ADC_StartSequencer(LPC_ADC, ADC_SEQA_IDX);
}
}
/**
* @brief main routine for ADC example
* @return Function should not exit
*/
int main(void) {
uint32_t rawSample;
int j;
SystemCoreClockUpdate();
Board_Init();
/* Setup ADC for 12-bit mode and normal power */
Chip_ADC_Init(LPC_ADC, 0);
Chip_ADC_Init(LPC_ADC, ADC_CR_MODE10BIT);
/* Need to do a calibration after initialization and trim */
Chip_ADC_StartCalibration(LPC_ADC);
while (!(Chip_ADC_IsCalibrationDone(LPC_ADC))) {
}
/* Setup for maximum ADC clock rate using synchronous clocking */
Chip_ADC_SetClockRate(LPC_ADC, ADC_MAX_SAMPLE_RATE);
Chip_ADC_SetupSequencer(LPC_ADC, ADC_SEQA_IDX,
(ADC_SEQ_CTRL_CHANSEL(BOARD_ADC_CH) | ADC_SEQ_CTRL_MODE_EOS));
Chip_Clock_EnablePeriphClock(SYSCTL_CLOCK_SWM);
Chip_SWM_EnableFixedPin(SWM_FIXED_ADC2);
Chip_Clock_DisablePeriphClock(SYSCTL_CLOCK_SWM);
/* Setup threshold 0 low and high values to about 25% and 75% of max */
Chip_ADC_SetThrLowValue(LPC_ADC, 0, ((1 * 0xFFF) / 4));
Chip_ADC_SetThrHighValue(LPC_ADC, 0, ((3 * 0xFFF) / 4));
Chip_ADC_ClearFlags(LPC_ADC, Chip_ADC_GetFlags(LPC_ADC));
Chip_ADC_EnableInt(LPC_ADC,
(ADC_INTEN_SEQA_ENABLE | ADC_INTEN_OVRRUN_ENABLE));
Chip_ADC_SelectTH0Channels(LPC_ADC, ADC_THRSEL_CHAN_SEL_THR1(BOARD_ADC_CH));
Chip_ADC_SetThresholdInt(LPC_ADC, BOARD_ADC_CH, ADC_INTEN_THCMP_CROSSING);
/* Enable ADC NVIC interrupt */
NVIC_EnableIRQ(ADC_SEQA_IRQn);
Chip_ADC_EnableSequencer(LPC_ADC, ADC_SEQA_IDX);
SysTick_Config(SystemCoreClock / TICKRATE_HZ);
/* Endless loop */
while (1) {
/* Sleep until something happens */
__WFI();
if (thresholdCrossed) {
thresholdCrossed = false;
printf("********ADC threshold event********\r\n");
}
/* Is a conversion sequence complete? */
if (sequenceComplete) {
sequenceComplete = false;
/* Get raw sample data for channels 0-11 */
for (j = 0; j < 12; j++) {
rawSample = Chip_ADC_GetDataReg(LPC_ADC, j);
/* Show some ADC data */
if (rawSample & (ADC_DR_OVERRUN | ADC_SEQ_GDAT_DATAVALID)) {
printf("Chan: %d Val: %d\r\n", j, ADC_DR_RESULT(rawSample));
printf("Threshold range: 0x%x ",
ADC_DR_THCMPRANGE(rawSample));
printf("Threshold cross: 0x%x\r\n",
ADC_DR_THCMPCROSS(rawSample));
printf("Overrun: %s ",
(rawSample & ADC_DR_OVERRUN) ? "true" : "false");
printf("Data Valid: %s\r\n\r\n",
(rawSample & ADC_SEQ_GDAT_DATAVALID) ?
"true" : "false");
}
}
}
}
}
A hard fault usually means that you tried to execute code outside the allowed addresses. If you have enabled an interrupt but not registered its handler in the vector table, the MCU will jump to whatever address is written there instead, after which the program crashes.
How to fix that depends on the tool chain. Assuming LPCXpresso, you have several options for setting up libraries (I don't know about LPCOpen specifically), so where to find the vector table differs from case to case. However, this works quite similarly on most MCUs, ARM or not. Somewhere in a "crt start-up" file you should have something along the lines of this:
void (* const g_pfnVectors[])(void) = ...
This is an array of function pointers that forms the vector table, which is allocated in memory at address 0 on Cortex-M. You have to place your function at the relevant interrupt vector. For example, it may say something like
PIN_INT0_IRQHandler, // PIO INT0
If that's the interrupt you should implement, then you replace that line:
#include "my_irq_stuff.h"
...
void (* const g_pfnVectors[])(void) =
...
my_INT0, // PIO INT0
This assumes that my_irq_stuff.h contains the prototype of my_INT0, the interrupt service routine. The actual routine should be implemented in the corresponding .c file.
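A small host-compilable sketch of the idea: the table is an array of function pointers, and on Cortex-M the handler for a peripheral interrupt sits at index 16 + IRQ number, because the first 16 slots hold the initial stack pointer and the core exceptions. The names vector_table and Default_Handler, the table length of 48, and the IRQ number 16 for ADC sequencer A are assumptions for illustration; take the real values from your chip header and start-up file.

```c
#include <stddef.h>

typedef void (*isr_t)(void);

#define CORE_EXCEPTION_SLOTS 16  /* initial SP + core exception vectors */
#define ADC_SEQA_IRQ_NUMBER  16  /* assumed; use ADC_SEQA_IRQn from the chip header */

static volatile int adc_seqa_events;

/* Catch-all for interrupts that have no dedicated handler. */
void Default_Handler(void) { for (;;) { } }

/* The handler name must match what the table below refers to. */
void ADC_SEQA_IRQHandler(void) { adc_seqa_events++; }

/* Illustrative table only: slots not listed here are NULL in this sketch,
 * whereas a real start-up file fills every slot (usually with a default
 * handler) and places the table at the start of flash via the linker script. */
const isr_t vector_table[48] = {
    [3] = Default_Handler,                                        /* HardFault */
    [CORE_EXCEPTION_SLOTS + ADC_SEQA_IRQ_NUMBER] = ADC_SEQA_IRQHandler,
};
```

If the slot for an enabled interrupt holds a stale or null address instead, the first occurrence of that interrupt sends the core to an invalid location, which matches the HardFault described above.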