I am trying to write on the flash memory of an STM32L476RG with the HAL_FLASH_Program function but it always returns an error.
static FLASH_EraseInitTypeDef EraseInitStruct;
uint32_t PAGEError;
uint32_t Address = 0x080FFF10;
uint64_t data = 5;
/* Unlock the Flash to enable the flash control register access *************/
HAL_FLASH_Unlock();
/* Erase the user Flash area*/
/* Fill EraseInit structure*/
EraseInitStruct.TypeErase = FLASH_TYPEERASE_PAGES;
EraseInitStruct.Page = Address;
EraseInitStruct.NbPages = 1;
if (HAL_FLASHEx_Erase(&EraseInitStruct, &PAGEError) != HAL_OK)
{
/*Error occurred while page erase.*/
HAL_FLASH_GetError ();
}
/*Write into flash*/
HAL_StatusTypeDef status = HAL_FLASH_Program(FLASH_TYPEPROGRAM_DOUBLEWORD, 0x1FFF7000, data);
if (status== HAL_OK)
{
printf("it works\n\r");
}
else
{
/* Error occurred while writing data in Flash memory*/
HAL_FLASH_GetError();
}
HAL_FLASH_Lock();
I tried to find wthe flash error code with the HAL_FLASH_GetError() function.
The error code I get is "168" (0xa8 in Hex) and I have no idea to what it corresponds.
My questions :
What error is the code 168 (0xa8 in Hex)
what do i need to change so that HAL_FLASH_Program works properly
The problem is how the fields in EraseInitStruct are being set. The HAL driver for some STM32 parts expects an address. However, the HAL library for the STM32L476 expects a page number.
typedef struct
{
uint32_t TypeErase; /*!< Mass erase or page erase.
This parameter can be a value of #ref FLASH_Type_Erase */
uint32_t Banks; /*!< Select bank to erase.
This parameter must be a value of #ref FLASH_Banks
(FLASH_BANK_BOTH should be used only for mass erase) */
uint32_t Page; /*!< Initial Flash page to erase when page erase is disabled
This parameter must be a value between 0 and (max number of pages in the bank - 1)
(eg : 255 for 1MB dual bank) */
uint32_t NbPages; /*!< Number of pages to be erased.
This parameter must be a value between 1 and (max number of pages in the bank - value of initial page)*/
} FLASH_EraseInitTypeDef;
So you need to set the page number correctly, and also specify which flash bank you are trying to erase:
EraseInitStruct.Banks = FLASH_BANK_2;
EraseInitStruct.Page = 255u;
It is good practice to check the result of all HAL function calls, and abort the operation if there is an error.
#Lundin brought up a good point about possibly being unable to erase / program the flash bank that you are running code from. This is an issue for some devices, but the reference manual for the STM32L476 (in section 3.3.5) says this is ok:
... during a program/erase operation to the Flash memory, any attempt to read the same Flash memory bank will stall the bus. The read operation will proceed correctly once the program/erase operation has completed.
Related
I'm trying to use FreeRTOS to write ADC data to SD card on the STM32F7 and I'm using V1 of the CMSIS-RTOS API. I'm using mail queues and I have a struct that holds an array.
typedef struct
{
uint16_t data[2048];
} ADC_DATA;
on the ADC half/Full complete interrupts, I add the data to the queue and I have a consumer task that writes this data to the sd card. My issue is in my Consumer Task, I have to do a memcpy to another array and then write the contents of that array to the sd card.
void vConsumer(void const * argument)
{
ADC_DATA *rx_data;
for(;;)
{
writeEvent = osMailGet(adcDataMailId, osWaitForever);
if(writeEvent.status == osEventMail)
{
// write Data to SD
rx_data = writeEvent.value.p;
memcpy(sd_buff, rx_data->data, sizeof(sd_buff));
if(wav_write_result == FR_OK)
{
if( f_write(&wavFile, (uint8_t *)sd_buff, SD_WRITE_BUF_SIZE, (void*)&bytes_written) == FR_OK)
{
file_size+=bytes_written;
}
}
osMailFree(adcDataMailId, rx_data);
}
}
This works as intended but if I try to change this line to
f_write(&wavFile, (uint8_t *)rx_data->data, SD_WRITE_BUF_SIZE, (void*)&bytes_written) == FR_OK)
so as to get rid of the memcpy, f_write returns FR_DISK_ERR. Can anyone help shine a light on why this happens, I feel like the extra memcpy is useless and you should just be able to pass the pointer to the queue straight to f_write.
So just a few thoughts here:
memcpy
Usually I copy only the necessary amount of data. If I have the size of the actual data I'll add a boundary check and pass it to memcpy.
Your problem
I am just guessing here, but if you check the struct definition, the data field has the type uint16_t and you cast it to a byte pointer. Also the FatFs documentation expects a void* for the type of buf.
EDIT: Could you post more details of sd_buff
I wanted to ask, how will behave DMA SPI rx in STM32 in following situation.
I have a specified (for example) 96 Bytes array called A which is intended to store the data received from the SPI. I turn on my circular SPI DMA which operates on each Byte, is configured to 96 Byte.
Is it possible, when DMA will fill my 96 Bytes array, the Transfer Complete interrupt will went off, to quickly copy the 96 Byte array to another - B, before circular DMA will start writing to A(and destroy the data saved in B)?
I want to transfer(every time when I will get new data from A in B) data from B quickly over USB to PC.
I'm just thinking how to transmit continous data stream SPI from STM32 over USB to PC, because a block of 96 Bytes of data transferred by USB once per certain time is easier I think than stream in real time SPI to USB by STM32? I don't know it's even possible
For that to work, you would have to be able to guarantee that you can copy all the data before the next SPI byte is received and transferred to the start of the buffer. Whether that were possible would depend on the clock speed of the processor and the speed of the SPI, and be able to guarantee that no higher priority interrupts occur that might delay the transfer. To be safe it would need an exceptionally slow SPI speed, and in that case would probably not need to use DMA at all.
All in all it is a bad idea and entirely unnecessary. The DMA controller has a "half-transfer" interrupt for exactly this purpose. You will get the HT interrupt when the first 48 bytes are transferred, and the DMA will continue transferring the remaining 48 bytes while you copy lower half buffer. When you get the transfer complete you transfer the upper half. That extends the time you have to transfer the data from the receive time of a single byte to the receive time of 48 bytes.
If you actually need 96 bytes on each transfer, then you simply make your buffer 192 bytes long (2 x 96).
In pseudo-code:
#define BUFFER_LENGTH 96
char DMA_Buffer[2][BUFFER_LENGTH] ;
void DMA_IRQHandler()
{
if( DMA_IT_Flag(DMA_HT) == SET )
{
memcpy( B, DMA_Buffer[0], BUFFER_LENGTH ) ;
Clear_IT_Flag(DMA_HT) ;
}
else if( DMA_IT_Flag(DMA_TC) == SET )
{
memcpy( B, DMA_Buffer[1], BUFFER_LENGTH ) ;
Clear_IT_Flag(DMA_TC) ;
}
}
With respect to transferring the data to a PC over USB, first of all you need to be sure that your USB transfer rate is at least as fast or faster than the SPI transfer rate. It is likely that the USB transfer is less deterministic (because it is controlled by the PC host - that is you can only output data on the USB when the host explicitly asks for it), so even if the the average transfer rate is sufficient, there may be latency that requires further buffering, so rather then simply copying from the DMA buffer A to a USB buffer B, you may need a circular buffer or FIFO queue to feed the USB. On the other hand, if you already have the buffer DMA_Buffer[0], DMA_Buffer[1] and B you already effectively have a FIFO of three blocks of 96 bytes, which may be sufficient
In one of my projects I faced a similar problem. The task was to transfer data coming from an external ADC chip (connected with SPI) to PC over full speed USB. The data was (8 ch x 16-bit) and I was requested to achieve the fastest sampling frequency possible.
I ended up with a triple buffer solution. There is 4 possible states a buffer can be in:
READY: Buffer is full with data, ready to be send over USB
SENT: Buffer is already sent and outdated
IN_USE: DMA (requested by SPI) is currently filling this buffer
NEXT: This buffer is considered empty and will be used when IN_USE is full.
As the timing of the USB request can't be synchonized with with the SPI process, I believe a double buffer solution wouldn't work. If you don't have a NEXT buffer, by the time you decide to send the READY buffer, DMA may finish filling the IN_USE buffer and start corrupting the READY buffer. But in a triple buffer solution, READY buffer is safe to send over USB, as it won't be filled even the current IN_USE buffer is full.
So the buffer states look like this as the time passes:
Buf0 Buf1 Buf2
==== ==== ====
READY IN_USE NEXT
SENT IN_USE NEXT
NEXT READY IN_USE
NEXT SENT IN_USE
IN_USE NEXT READY
Of course, if the PC don't start USB requests fast enough, you may still loose a READY buffer as soon as it turns into NEXT (before becoming SENT). PC sends USB IN requests asynchronously with no info about the current buffer states. If there is no READY buffer (it's in SENT state), the STM32 responds with a ZLP (zero length package) and the PC tries again after 1 ms delay.
For the implementation on STM32, I use double buffered mode and I modify M0AR & M1AR registers in the DMA Transfer Complete ISR to address 3 buffers.
BTW, I used (3 x 4000) bytes buffers and achieved 32 kHz sampling frequency at the end. USB is configured as vendor specific class and it uses bulk transfers.
Generally using circular DMA only works if you trigger on the half full/half empty, otherwise you don't have enough time to copy information out of the buffer.
I would recommend against copying the data out the buffer during the interrupt. Rather use the data directly from the buffer without an additional copy step.
If you do the copy in the interrupt, you are blocking other lower priority interrupts during the copy. On a STM32 a simple naive byte copy of 48 bytes may take additional 48*6 ~ 300 clock cycles.
If you track the buffers read and write positions independently, you just need to update a single pointer and post a delayed a notification call to the consumer of the buffer.
If you want a longer period then don't use circular DMA, rather use normal DMA in 48 byte blocks and implement circular byte buffer as a data structure.
I did this for a USART at 460k baud that receives asynchronously variable length packets. If you ensure that the producer only updates the write pointer and the consumer only updates the read pointer you can avoid data races in most of it. Note that the read and write of an aligned <=32 bit variable on cortex m3/m4 is atomic.
The included code is a simplified version of the circular buffer with DMA support that I used. It is limited to buffer sizes that are 2^n and uses Templates and C++11 functionality so it may not be suitable depending on your development/platform constraints.
To use the buffer call getDmaReadBlock() or getDMAwriteBlock() and get the DMA memory address and block length. Once the DMA completes use skipRead() / skipWrite() to increment the read or write pointers by the actual amount that was transferred.
/**
* Creates a circular buffer. There is a read pointer and a write pointer
* The buffer is full when the write pointer is = read pointer -1
*/
template<uint16_t SIZE=256>
class CircularByteBuffer {
public:
struct MemBlock {
uint8_t *blockStart;
uint16_t blockLength;
};
private:
uint8_t *_data;
uint16_t _readIndex;
uint16_t _writeIndex;
static constexpr uint16_t _mask = SIZE - 1;
// is the circular buffer a power of 2
static_assert((SIZE & (SIZE - 1)) == 0);
public:
CircularByteBuffer &operator=(const CircularByteBuffer &) = default;
CircularByteBuffer(uint8_t (&data)[SIZE]);
CircularByteBuffer(const CircularByteBuffer &) = default;
~CircularByteBuffer() = default;
private:
static uint16_t wrapIndex(int32_t index);
public:
/*
* The number of byte available to be read. Writing bytes to the buffer can only increase this amount.
*/
uint16_t readBytesAvail() const;
/**
* Return the number of bytes that can still be written. Reading bytes can only increase this amount.
*/
uint16_t writeBytesAvail() const;
/**
* Read a byte from the buffer and increment the read pointer
*/
uint8_t readByte();
/**
* Write a byte to the buffer and increment the write pointer. Throws away the byte if there is no space left.
* #param byte
*/
void writeByte(uint8_t byte);
/**
* Provide read only access to the buffer without incrementing the pointer. Whilst memory accesses outside the
* allocated memeory can be performed. Garbage data can still be read if that byte does not contain valid data
* #param pos the offset from teh current read pointer
* #return the byte at the given offset in the buffer.
*/
uint8_t operator[](uint32_t pos) const;
/**
* INcrement the read pointer by a given amount
*/
void skipRead(uint16_t amount);
/**
* Increment the read pointer by a given amount
*/
void skipWrite(uint16_t amount);
/**
* Get the start and lenght of the memeory block used for DMA writes into the queue.
* #return
*/
MemBlock getDmaWriteBlock();
/**
* Get the start and lenght of the memeory block used for DMA reads from the queue.
* #return
*/
MemBlock getDmaReadBlock();
};
// CircularByteBuffer
// ------------------
template<uint16_t SIZE>
inline CircularByteBuffer<SIZE>::CircularByteBuffer(uint8_t (&data)[SIZE]):
_data(data),
_readIndex(0),
_writeIndex(0) {
}
template<uint16_t SIZE>
inline uint16_t CircularByteBuffer<SIZE>::wrapIndex(int32_t index){
return static_cast<uint16_t>(index & _mask);
}
template<uint16_t SIZE>
inline uint16_t CircularByteBuffer<SIZE>::readBytesAvail() const {
return wrapIndex(_writeIndex - _readIndex);
}
template<uint16_t SIZE>
inline uint16_t CircularByteBuffer<SIZE>::writeBytesAvail() const {
return wrapIndex(_readIndex - _writeIndex - 1);
}
template<uint16_t SIZE>
inline uint8_t CircularByteBuffer<SIZE>::readByte() {
if (readBytesAvail()) {
uint8_t result = _data[_readIndex];
_readIndex = wrapIndex(_readIndex+1);
return result;
} else {
return 0;
}
}
template<uint16_t SIZE>
inline void CircularByteBuffer<SIZE>::writeByte(uint8_t byte) {
if (writeBytesAvail()) {
_data[_writeIndex] = byte;
_writeIndex = wrapIndex(_writeIndex+1);
}
}
template<uint16_t SIZE>
inline uint8_t CircularByteBuffer<SIZE>::operator[](uint32_t pos) const {
return _data[wrapIndex(_readIndex + pos)];
}
template<uint16_t SIZE>
inline void CircularByteBuffer<SIZE>::skipRead(uint16_t amount) {
_readIndex = wrapIndex(_readIndex+ amount);
}
template<uint16_t SIZE>
inline void CircularByteBuffer<SIZE>::skipWrite(uint16_t amount) {
_writeIndex = wrapIndex(_writeIndex+ amount);
}
template <uint16_t SIZE>
inline typename CircularByteBuffer<SIZE>::MemBlock CircularByteBuffer<SIZE>::getDmaWriteBlock(){
uint16_t len = static_cast<uint16_t>(SIZE - _writeIndex);
// full is (write == (read -1)) so on wrap around we need to ensure that we stop 1 off from the read pointer.
if( _readIndex == 0){
len = static_cast<uint16_t>(len - 1);
}
if( _readIndex > _writeIndex){
len = static_cast<uint16_t>(_readIndex - _writeIndex - 1);
}
return {&_data[_writeIndex], len};
}
template <uint16_t SIZE>
inline typename CircularByteBuffer<SIZE>::MemBlock CircularByteBuffer<SIZE>::getDmaReadBlock(){
if( _readIndex > _writeIndex){
return {&_data[_readIndex], static_cast<uint16_t>(SIZE- _readIndex)};
} else {
return {&_data[_readIndex], static_cast<uint16_t>(_writeIndex - _readIndex)};
}
}
`
I have the nucleo board (nucleo-L4R5ZI) and want to write a code to be able to send data from a uC to a PC via the USB. I followed some tutorials, used STM32CubeMx, other solutions found across the Internet, but anyways I failed. I can open the vcp on the PC side (using Hterm, TeraTerm and Realterm), but cannot get any data.
I use Eclipse and the buildin debugger, which I flashed to JLink.
The main loop:
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_USB_DEVICE_Init();
HAL_Delay(10000);
uint8_t HiMsg[] = "0123456789987654321001234567899876543210\r\n";
while (1)
{
if( CDC_Transmit_FS(HiMsg, 20) == USBD_OK )
{
HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_7); // blue LED
}
HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_14); // red LED
HAL_Delay(1000);
}
}
After executing this function, the blue LED lights only once and never changes its state (does not blink). It means that the CDC_Transmit_FS(...) returns USBD_OK only once, and next calls give USBD_Busy.
The MX_USB_DEVICE_Init() looks as follow:
void MX_USB_DEVICE_Init(void)
{
USBD_Init(&hUsbDeviceFS, &FS_Desc, DEVICE_FS);
USBD_RegisterClass(&hUsbDeviceFS, &USBD_CDC);
USBD_CDC_RegisterInterface(&hUsbDeviceFS, &USBD_Interface_fops_FS);
USBD_Start(&hUsbDeviceFS);
USBD_CDC_Init (&hUsbDeviceFS, &USBD_CDC); // I need to init it somewhere so I think here is a good place
}
The CDC_Transmit_FS looks like that:
uint8_t CDC_Transmit_FS(uint8_t* Buf, uint16_t Len)
{
uint8_t result = USBD_OK;
/* USER CODE BEGIN 7 */
CDC_Init_FS();
USBD_CDC_HandleTypeDef *hcdc = (USBD_CDC_HandleTypeDef*)hUsbDeviceFS.pClassData;
if (hcdc->TxState != 0){
return USBD_BUSY;
}
USBD_CDC_SetTxBuffer(&hUsbDeviceFS, Buf, Len);
result = USBD_CDC_TransmitPacket(&hUsbDeviceFS);
CDC_Init_FS();
/* USER CODE END 7 */
return result;
}
Does anyone know how to make it running? What do I miss?
Best,
This part look suspicious:
result = USBD_CDC_TransmitPacket(&hUsbDeviceFS);
CDC_Init_FS();
The call to CDC_Init_FS() probably kills the packet before it had a chance to be sent to the host.
SO... I found the solution!
I can confirm that the code above works (just remove the CDC_Init_FS)!
Acctully, it was a driver problem. for windows 10 you also need to install it despide what's written in the reference
I have been trying to understand the following code which is writing to micro controller flash. The Microcontroller is TIVA ARM Cortex M4. I have read the Internal Memory Chapter 8 of Tiva™ TM4C123GH6PM Microcontroller Data sheet. At high level I understand Flash Memory Address (FMA), Flash Memory Data (FMD), and Flash Memory Control (FMC) and Boot Configuration (BOOTCFG).
Below are definitions for some of the variable used in the function.
#define FLASH_FMA_R (*((volatile uint32_t *)0x400FD000))
#define FLASH_FMA_OFFSET_MAX 0x0003FFFF // Address Offset max
#define FLASH_FMD_R (*((volatile uint32_t *)0x400FD004))
#define FLASH_FMC_R (*((volatile uint32_t *)0x400FD008))
#define FLASH_FMC_WRKEY 0xA4420000 // FLASH write key (KEY bit of FLASH_BOOTCFG_R set)
#define FLASH_FMC_WRKEY2 0x71D50000 // FLASH write key (KEY bit of FLASH_BOOTCFG_R cleared)
#define FLASH_FMC_MERASE 0x00000004 // Mass Erase Flash Memory
#define FLASH_FMC_ERASE 0x00000002 // Erase a Page of Flash Memory
#define FLASH_FMC_WRITE 0x00000001 // Write a Word into Flash Memory
#define FLASH_FMC2_R (*((volatile uint32_t *)0x400FD020))
#define FLASH_FMC2_WRBUF 0x00000001 // Buffered Flash Memory Write
#define FLASH_FWBN_R (*((volatile uint32_t *)0x400FD100))
#define FLASH_BOOTCFG_R (*((volatile uint32_t *)0x400FE1D0))
#define FLASH_BOOTCFG_KEY 0x00000010 // KEY Select
This function is used to erase a section of the flash. The function is called from a start address to and end address. I have not fully comprehended how this code works.
//------------Flash_Erase------------
// Erase 1 KB block of flash.
// Input: addr 1-KB aligned flash memory address to erase
// Output: 'NOERROR' if successful, 'ERROR' if fail (defined in FlashProgram.h)
// Note: disables interrupts while erasing
int Flash_Erase(uint32_t addr){
uint32_t flashkey;
if(EraseAddrValid(addr)){
DisableInterrupts(); // may be optional step
// wait for hardware idle
while(FLASH_FMC_R&(FLASH_FMC_WRITE|FLASH_FMC_ERASE|FLASH_FMC_MERASE)){
// to do later: return ERROR if this takes too long
// remember to re-enable interrupts
};
FLASH_FMA_R = addr;
if(FLASH_BOOTCFG_R&FLASH_BOOTCFG_KEY){ // by default, the key is 0xA442
flashkey = FLASH_FMC_WRKEY;
} else{ // otherwise, the key is 0x71D5
flashkey = FLASH_FMC_WRKEY2;
}
FLASH_FMC_R = (flashkey|FLASH_FMC_ERASE); // start erasing 1 KB block
while(FLASH_FMC_R&FLASH_FMC_ERASE){
// to do later: return ERROR if this takes too long
// remember to re-enable interrupts
}; // wait for completion (~3 to 4 usec)
EnableInterrupts();
return NOERROR;
}
return ERROR;
}
Questions: How does the function exit out of the two while loops? How are variables FLASH_FMC_WRITE, FLASH_FMC_ERASE, and FLASH_FMC_MERASE changed? Can '0' be written as part of the erase process?
FLASH_FMC_WRITE, FLASH_FMC_ERASE, and FLASH_FMC_MERASE are individual bits in the FLASH_FMC_R register value (a bitfield). Look in the part's reference manual (or maybe datasheet) at the description of the FLASH_FMC_R register and you will find the description of these bits and more.
The while loops repeatedly read the FLASH_FMC_R register value and exit when the specified bits are set. The flash memory controller sets these bits when it's appropriate (read the reference manual).
Erasing flash means setting all bits to 1 (all bytes to 0xFF). Writing flash means setting select bits to 0. You cannot change a bit from 0 to 1 with a write.
You need to erase to do that. This is just the way flash works.
I need help with the uart communication I am trying to implement on my Proteus simulation. I use a PIC18f4520 and I want to display on the virtual terminal the values that have been calculated by the microcontroller.
Here a snap of my design on Proteus
Right now, this is how my UART code looks like :
#define _XTAL_FREQ 20000000
#define _BAUDRATE 9600
void Configuration_ISR(void) {
IPR1bits.TMR1IP = 1; // TMR1 Overflow Interrupt Priority - High
PIE1bits.TMR1IE = 1; // TMR1 Overflow Interrupt Enable
PIR1bits.TMR1IF = 0; // TMR1 Overflow Interrupt Flag
// 0 = TMR1 register did not overflow
// 1 = TMR1 register overflowed (must be cleared in software)
RCONbits.IPEN = 1; // Interrupt Priority High level
INTCONbits.PEIE = 1; // Enables all low-priority peripheral interrupts
//INTCONbits.GIE = 1; // Enables all high-priority interrupts
}
void Configuration_UART(void) {
TRISCbits.TRISC6 = 0;
TRISCbits.TRISC7 = 1;
SPBRG = ((_XTAL_FREQ/16)/_BAUDRATE)-1;
//RCSTA REG
RCSTAbits.SPEN = 1; // enable serial port pins
RCSTAbits.RX9 = 0;
//TXSTA REG
TXSTAbits.BRGH = 1; // fast baudrate
TXSTAbits.SYNC = 0; // asynchronous
TXSTAbits.TX9 = 0; // 8-bit transmission
TXSTAbits.TXEN = 1; // enble transmitter
}
void WriteByte_UART(unsigned char ch) {
while(!PIR1bits.TXIF); // Wait for TXIF flag Set which indicates
// TXREG register is empty
TXREG = ch; // Transmitt data to UART
}
void WriteString_UART(char *data) {
while(*data){
WriteByte_UART(*data++);
}
}
unsigned char ReceiveByte_UART(void) {
if(RCSTAbits.OERR) {
RCSTAbits.CREN = 0;
RCSTAbits.CREN = 1;
}
while(!PIR1bits.RCIF); //Wait for a byte
return RCREG;
}
And in the main loop :
while(1) {
WriteByte_UART('a'); // This works. I can see the As in the terminal
WriteString_UART("Hello World !"); //Nothing displayed :(
}//end while(1)
I have tried different solution for WriteString_UART but none has worked so far.
I don't want to use printf cause it impacts other operations I'm doing with the PIC by adding delay.
So I really want to make it work with WriteString_UART.
In the end I would like to have someting like "Error rate is : [a value]%" on the terminal.
Thanks for your help, and please tell me if something isn't clear.
In your WriteByte_UART() function, try polling the TRMT bit. In particular, change:
while(!PIR1bits.TXIF);
to
while(!TXSTA1bits.TRMT);
I don't know if this is your particular issue, but there exists a race-condition due to the fact that TXIF is not immediately cleared upon loading TXREG. Another option would be to try:
...
Nop();
while(!PIR1bits.TXIF);
...
EDIT BASED ON COMMENTS
The issue is due to the fact that the PIC18 utilizes two different pointer types based on data memory and program memory. Try changing your declaration to void WriteString_UART(const rom char * data) and see what happens. You will need to change your WriteByte_UART() declaration as well, to void WriteByte_UART(const unsigned char ch).
Add delay of few miliseconds after line
TXREG = ch;
verify that pointer *data of WriteString_UART(char *data) actually point to
string "Hello World !".
It seems you found a solution, but the reason why it wasn't working in the first place is still not clear. What compiler are you using?
I learned the hard way that C18 and XC8 are used differently regarding memory spaces. With both compilers, a string declared literally like char string[]="Hello!", will be stored in ROM (program memory). They differ in the way functions use strings.
C18 string functions will have variants to access strings either in RAM or ROM (for example strcpypgm2ram, strcpyram2pgm, etc.). XC8 on the other hand, does the job for you and you will not need to use specific functions to choose which memory you want to access.
If you are using C18, I would highly recommend you switch to XC8, which is more recent and easier to work with. If you still want to use C18 or another compiler which requires you to deal with program/data memory spaces, then here below are two solutions you may want to try. The C18 datasheet says that putsUSART prints a string from data memory to USART. The function putrsUSART will print a string from program memory. So you can simply use putrsUSART to print your string.
You may also want to try the following, which consists in copying your string from program memory to data memory (it may be a waste of memory if your application is tight on memory though) :
char pgmstring[] = "Hello";
char datstring[16];
strcpypgm2ram(datstring, pgmstring);
putsUSART(datstring);
In this example, the pointers pgmstring and datstring will be stored in data memory. The string "Hello" will be stored in program memory. So even if the pointer pgmstring itself is in data memory, it initially points to a memory address (the address of "Hello"). The only way to point to this same string in data memory is to create a copy of it in data memory. This is because a function accepting a string stored in data memory (such as putsUSART) can NOT be used directly with a string stored in program memory.
I hope this could help you understand a bit better how to work with Harvard microprocessors, where program and data memories are separated.