I'm writing an SPI-to-Wishbone component with Chisel3, and to test it on a real FPGA I have to invert the polarity of the reset (rstn).
To do that, I used RawModule for my top module and withClockAndReset() to change the reset polarity:
class TopSpi2Wb extends RawModule {
val clock = IO(Input(Clock()))
val rstn = IO(Input(Bool()))
// ...
withClockAndReset(clock, !rstn) {
val spi2Wb = Module(new Spi2Wb(dwidth, awidth))
// module connections IO
It seemed to work until I tried to instantiate a synchronous memory on the Wishbone port, with this kind of code:
class TopSpi2Wb extends RawModule {
val clock = IO(Input(Clock()))
val rstn = IO(Input(Bool()))
// ...
withClockAndReset(clock, !rstn) {
val spi2Wb = Module(new Spi2Wb(dwidth, awidth))
val wmem = SyncReadMem(1 << awidth, UInt(dwidth.W))
val ackReg = RegInit(false.B)
val datReg = RegInit(0.U(dwidth.W))
ackReg := false.B
datReg := 0.U(dwidth.W)
when( && {
datReg := DontCare
datReg :=, & & !
// ...
That compiles without error, but I can't manage to read or write the memory correctly (simulating with Icarus). The emitted Verilog seems to drop the memory instantiation.
Maybe it's discouraged to write Chisel logic such as registers and memories directly in a RawModule, and one should only instantiate a single top module there?
Indeed, if I wrap a top module in the RawModule, it works much better:
// Testing Spi2Wb with a memory connexion
// and reset inverted
class TopSpi2Wb extends RawModule {
// Clock & Reset
val clock = IO(Input(Clock()))
val rstn = IO(Input(Bool()))
// Simple blink
val blink = IO(Output(Bool()))
// SPI
val mosi = IO(Input(Bool()))
val miso = IO(Output(Bool()))
val sclk = IO(Input(Bool()))
val csn = IO(Input(Bool()))
val dwidth = 8
val awidth = 7
withClockAndReset(clock, !rstn) {
val spi2Wb = Module(new Spi2WbMem(dwidth, awidth))
blink := := mosi
miso := := sclk := csn
With the code above, all connections and register instantiations for the Wishbone memory are done in the standard module Spi2WbMem().

As jkoenig asked, I rewrote the module to reproduce the bug and ... fixed it!
Sorry for the inconvenience.
I had some difficulty finding the bug because Icarus didn't dump the content of the memory, so I thought it was not generated.
I think that my initial module wrapping fixed the bug without my realizing it.


Kotlin: Is there a tool that allows me to control parallelism when executing suspend functions?

I'm trying to execute a certain suspend function multiple times, in such a way that no more than N of them are executing at the same time.
For those acquainted with Akka and Scala streaming libraries, something like mapAsync.
I did my own implementation using one input channel (as in Kotlin channels) and N output channels, but it seems cumbersome and not very efficient.
The code I'm currently using is somewhat like this:
val inChannel = Channel<T>()
val outChannels = (0..n).map{
var i = 0
for(t in inChannel){
i = ((i+1)%n)
outChannels.forEach{outChannel ->
for(t in outChannel){
Of course it has error management and everything, but still...
Edit: I did the following test, and it failed.
test("Parallelism is correctly capped") {
val scope = CoroutineScope(Dispatchers.Default.limitedParallelism(3))
var num = 0
(1..100).map {
scope.launch {
num ++
println("started $it")
You can use the limitedParallelism-function on a Dispatcher (experimental in v1.6.0), and use the returned dispatcher to call your asynchronous functions. The function returns a view over the original dispatcher which limits the parallelism to a limit you provide. You can use it like this:
val limit = 2 // Or some other number
val dispatcher = Dispatchers.Default
val limitedDispatcher = dispatcher.limitedParallelism(limit)
for (n in 0..100) {
scope.launch(limitedDispatcher) {
Your question, as asked, calls for @marstran's answer. If what you want is that no more than N coroutines are being actively executed at any given time (in parallel), then limitedParallelism is the way to go:
val maxThreads: Int = TODO("some max number of threads")
val limitedDispatcher = Dispatchers.Default.limitedParallelism(maxThreads)
elements.forEach { elt ->
scope.launch(limitedDispatcher) {
Now, if what you want is to even limit concurrency, so that at most N coroutines are run concurrently (potentially interlacing), regardless of threads, you could use a Semaphore instead:
val maxConcurrency: Int = TODO("some max number of concurrency coroutines")
val semaphore = Semaphore(maxConcurrency)
elements.forEach { elt ->
scope.async {
semaphore.withPermit {
You can also combine both approaches.
Other answers already explained that it depends whether you need to limit parallelism or concurrency. If you need to limit concurrency, then you can do this similarly to your original solution, but with only a single channel:
val channel = Channel<T>()
repeat(n) {
launch {
for(t in channel){
Also note that offer() in your example does not guarantee that the task will ever be executed. If the next consumer in the round robin is still occupied with the previous task, the new task is simply ignored.

Delay implementation in STM32 using for loop

I am using a NUCLEO-L476RG development board.
I am learning to write GPIO drivers for the STM32 family.
I have implemented simple logic that turns on an LED when a push button is pressed.
I have a strange issue:
Edit 1: The breadboard LED turns ON when the line temp = 10 is commented out; it doesn't turn ON when that line is present. It seems that if I add any line of code to that while loop, the LED does not turn ON.
The breadboard LED turns ON when the delay() call is commented out; it doesn't turn ON when delay() is called.
What could be the issue?
I have powered the board through its mini USB connector, and the clock is configured as MSI at 4 MHz.
#define delay() for(uint32_t i=0; i<=50000; i++);
int main(void)
GPIO_Handle_t NucleoUserLED,NucleoUserPB,BreadBoardLED,BreadBoardPB;
uint8_t inputVal,BBinpVal;
uint32_t temp;
//User green led in the nucleo board connected to PA5
NucleoUserLED.pGPIO = GPIOA;
NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinNumber = GPIO_PIN_5;
NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinMode = GPIO_MODE_OP;
NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinPuPdControl = GPIO_IP_NO_PUPD;
NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinOpType = GPIO_OP_TYPE_PP;
//User blue button in the nucleo connected to PC13
NucleoUserPB.pGPIO = GPIOC;
NucleoUserPB.GPIO_Pin_Cfg.GPIO_PinNumber = GPIO_PIN_13;
NucleoUserPB.GPIO_Pin_Cfg.GPIO_PinMode = GPIO_MODE_IP;
NucleoUserPB.GPIO_Pin_Cfg.GPIO_PinPuPdControl = GPIO_IP_NO_PUPD;
//User led in the bread board connected to PC8
BreadBoardLED.pGPIO = GPIOC;
BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinNumber = GPIO_PIN_8;
BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinMode = GPIO_MODE_OP;
BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinPuPdControl = GPIO_IP_NO_PUPD;
BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinOpType = GPIO_OP_TYPE_PP;
//User DPDT connected in the breadboard connected to PC6
BreadBoardPB.pGPIO = GPIOC;
BreadBoardPB.GPIO_Pin_Cfg.GPIO_PinNumber = GPIO_PIN_6;
BreadBoardPB.GPIO_Pin_Cfg.GPIO_PinMode = GPIO_MODE_IP;
BreadBoardPB.GPIO_Pin_Cfg.GPIO_PinPuPdControl = GPIO_IP_PU;
* Controlling the IO present in the nucleo board *
inputVal = GPIO_ReadInputPin(NucleoUserPB.pGPIO, NucleoUserPB.GPIO_Pin_Cfg.GPIO_PinNumber);
BBinpVal = GPIO_ReadInputPin(BreadBoardPB.pGPIO, BreadBoardPB.GPIO_Pin_Cfg.GPIO_PinNumber);
if(inputVal == 0)
GPIO_ToggleOutputPin(NucleoUserLED.pGPIO, NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinNumber);
* Controlling the IO present in the bread board *
if (BBinpVal == 0 )
GPIO_WriteOutputPin(BreadBoardLED.pGPIO, BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinNumber, 1);
GPIO_WriteOutputPin(BreadBoardLED.pGPIO, BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinNumber, 0);
return 0;
It is not a function, only a macro definition.
Your loop is likely to be optimized out.
Define it as:
void inline __attribute__((always_inline)) delay(uint32_t delay)
{
    while(delay--) __asm("");
}
Bear in mind that 50000 can be quite long if you run on low clock settings.
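To illustrate the point about the empty loop being removed, here is a minimal sketch of a busy-wait whose counter is volatile. The name delay_loop and the returned iteration count are assumptions for illustration (the count is returned only so the function can be checked on a PC); the actual time per iteration still depends on the clock and on compiler settings.

```c
#include <stdint.h>

/* Busy-wait for `count` loop iterations.
 * The counter is volatile, so every read and write of it is an
 * observable access and the compiler has to keep the loop, even
 * with optimization enabled. */
static uint32_t delay_loop(uint32_t count)
{
    volatile uint32_t remaining = count;
    uint32_t iterations = 0;

    while (remaining != 0u) {
        remaining--;
        iterations++;
    }
    /* Returned only so the sketch can be checked on a host machine. */
    return iterations;
}
```

On the target you would call it as delay_loop(50000u) and ignore the return value.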
Not sure what the issue is because "it is not working" is not very specific.
However there are "quality" issues:
That is an inappropriate use of a macro - there is no benefit over using a function. The function call overhead argument does not hold - it is a delay, it is supposed to take time!
The empty-loop counter is not declared volatile - the compiler at any optimisation level other than the minimum is likely to remove the loop altogether.
A for-loop for a delay is a crude and generally non-deterministic solution, with a period that will change between compilers, with different compiler options and on different targets or with different clock speeds. STM32 is a Cortex-M device and given that you should use the SYSTICK counter for this. For example, as a minimum something like:
volatile uint32_t tick = 0 ;
void SysTick_Handler(void)
{
    tick++ ;
}
void delayms( uint32_t millisec )
{
    static bool init = false ;
    if( !init )
    {
        SysTick_Config( SystemCoreClock / 1000 ) ;
        init = true ;
    }
    uint32_t start = tick ;
    while( tick - start < millisec ) ;
}
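One detail of the wait condition tick - start < millisec is worth spelling out: because both values are unsigned 32-bit, the subtraction is done modulo 2^32, so the comparison stays correct even when the tick counter wraps around. A small host-side sketch of the same comparison (the helper name still_waiting is an assumption for illustration, it is not part of any library):

```c
#include <stdbool.h>
#include <stdint.h>

/* True while fewer than `timeout` ticks have passed since `start`.
 * Unsigned subtraction is modulo 2^32, so the elapsed count is still
 * right when `now` has wrapped from 0xFFFFFFFF back to 0. */
static bool still_waiting(uint32_t now, uint32_t start, uint32_t timeout)
{
    return (uint32_t)(now - start) < timeout;
}
```

Comparing now < start + timeout instead would break at the wrap, which is why the subtraction form is the usual choice.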
The issue was solved by declaring the iterator as a global variable. Now the LED turns on when the push button is pressed.
Previous implementation
#define delay() for(uint32_t i=0; i<=50000; i++);
Working implementation
uint32_t temp;
void delay(void)
for(temp = 0;temp<=50000;temp++)
Can anyone tell me how declaring the variable as global solves the issue?
Find the working implementation below
#include <stdint.h>
#include "stm32l476xx.h"
#include "stm32l476xx_gpoi_driver.h"
#if !defined(__SOFT_FP__) && defined(__ARM_FP)
#warning "FPU is not initialized, but the project is compiling for an FPU. Please initialize the FPU before use."
uint32_t temp;
void delay(void)
for(temp = 0;temp<=50000;temp++)
int main(void)
GPIO_Handle_t NucleoUserLED,NucleoUserPB,BreadBoardLED,BreadBoardPB;
volatile uint8_t inputVal,BBinpVal;
//User green led in the nucleo board connected to PA5
NucleoUserLED.pGPIO = GPIOA;
NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinNumber = GPIO_PIN_5;
NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinMode = GPIO_MODE_OP;
NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinPuPdControl = GPIO_IP_NO_PUPD;
NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinOpType = GPIO_OP_TYPE_PP;
//User blue button in the nucleo connected to PC13
NucleoUserPB.pGPIO = GPIOC;
NucleoUserPB.GPIO_Pin_Cfg.GPIO_PinNumber = GPIO_PIN_13;
NucleoUserPB.GPIO_Pin_Cfg.GPIO_PinMode = GPIO_MODE_IP;
NucleoUserPB.GPIO_Pin_Cfg.GPIO_PinPuPdControl = GPIO_IP_NO_PUPD;
//User led in the bread board connected to PC8
BreadBoardLED.pGPIO = GPIOC;
BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinNumber = GPIO_PIN_8;
BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinMode = GPIO_MODE_OP;
BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinPuPdControl = GPIO_IP_NO_PUPD;
BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinOpType = GPIO_OP_TYPE_PP;
//User DPDT connected in the breadboard connected to PC6
BreadBoardPB.pGPIO = GPIOC;
BreadBoardPB.GPIO_Pin_Cfg.GPIO_PinNumber = GPIO_PIN_6;
BreadBoardPB.GPIO_Pin_Cfg.GPIO_PinMode = GPIO_MODE_IP;
BreadBoardPB.GPIO_Pin_Cfg.GPIO_PinPuPdControl = GPIO_IP_PU;
* Controlling the IO present in the nucleo board *
inputVal = GPIO_ReadInputPin(NucleoUserPB.pGPIO, NucleoUserPB.GPIO_Pin_Cfg.GPIO_PinNumber);
BBinpVal = GPIO_ReadInputPin(BreadBoardPB.pGPIO, BreadBoardPB.GPIO_Pin_Cfg.GPIO_PinNumber);
if(inputVal == 0)
GPIO_ToggleOutputPin(NucleoUserLED.pGPIO, NucleoUserLED.GPIO_Pin_Cfg.GPIO_PinNumber);
* Controlling the IO present in the bread board *
temp = 10;
if (BBinpVal == 0 )
GPIO_WriteOutputPin(BreadBoardLED.pGPIO, BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinNumber, 1);
GPIO_WriteOutputPin(BreadBoardLED.pGPIO, BreadBoardLED.GPIO_Pin_Cfg.GPIO_PinNumber, 0);
return 0;
The issue is solved.
There was a bug in the driver layer I had written.
Whenever a GPIO is configured as an input, the output-related registers for that pin should be set to their reset values, or the driver should not implement the output-related API for that pin.
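A rough sketch of what that fix can look like, using a mock register block so it can be tried on a PC. Everything here is an assumption for illustration: gpio_regs_t stands in for the device header's register struct, gpio_config_input is a hypothetical name, and 0 is assumed to be the reset value of the output-related fields for the pin in question (check the reference manual for your pin).

```c
#include <stdint.h>

/* Hypothetical stand-in for the GPIO register block. */
typedef struct {
    uint32_t moder;   /* 2 bits per pin: 00 = input, 01 = output */
    uint32_t otyper;  /* 1 bit per pin: output type */
    uint32_t ospeedr; /* 2 bits per pin: output speed */
    uint32_t pupdr;   /* 2 bits per pin: pull-up / pull-down */
    uint32_t odr;     /* 1 bit per pin: output data */
} gpio_regs_t;

/* Configure `pin` (0..15) as an input with the given 2-bit pull setting. */
static void gpio_config_input(gpio_regs_t *gpio, uint8_t pin, uint8_t pupd)
{
    uint32_t two_bit_mask = 3u << (2u * pin);
    uint32_t one_bit_mask = 1u << pin;

    gpio->moder &= ~two_bit_mask; /* 00 = input mode */
    gpio->pupdr = (gpio->pupdr & ~two_bit_mask)
                | ((uint32_t)(pupd & 3u) << (2u * pin));

    /* Output-only fields: clear them (assumed reset value 0) instead of
     * writing whatever happens to be in an uninitialised config struct. */
    gpio->otyper  &= ~one_bit_mask;
    gpio->ospeedr &= ~two_bit_mask;
    gpio->odr     &= ~one_bit_mask;
}
```

The point is only that the input path never copies output settings into the registers; other pins in the same port are left untouched.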

Traveling salesman with random initial solution, optimization algorithm returning unexpected result

I know traveling salesman is well known, but I need some help on why my optimization algorithm is returning an unexpected result. I have created an initial solution by selecting cities in a random order. I have also created a class with a constructor with the distance matrix and initial solution as parameters. The optimization algorithm is very simple; it swaps two cities and checks if the route distance has been improved, and if it has improved the best solution should be updated. This goes on for 6 iterations. The problem is that the best solution seems to be updated and overwritten even when the condition for overwriting it is not met. I will add an image showing the results from a test run.
It seems like the variable bestSolution is overwritten but not bestDistance. I must have some sort of tunnel vision, because I can't figure this one out even if the code is really simple. Can someone please chime in why bestSolution is overwritten and returned with unexpected result?
Code example below:
package RandomMethod
import GreedyHeuristic
import java.util.*
fun main(args: Array<String>) {
/*A B C*/
val distances = arrayOf(/*A*/ intArrayOf(0, 2, 7),
/*B*/ intArrayOf(2, 0, 9),
/*C*/ intArrayOf(7, 9, 0))
val initalSolution = findRandomRoute(distances)
println("Initial solution: $initalSolution")
println("Total distance: ${findTotalDistance(distances, initalSolution)}\n")
val optimizedSolution = GreedyHeuristic(distances, initalSolution).optimize()
println("\nOptimized solution with Greedy Heuristic: $optimizedSolution")
println("Total distance: ${findTotalDistance(distances, optimizedSolution)}")
fun areAllCitiesVisited(isCityVisited: Array<Boolean>): Boolean {
for (visited in isCityVisited) {
if (!visited) return false
return true
fun findTotalDistance(distances: Array<IntArray>, orderToBeVisited: MutableList<Int>): Int {
var totalDistance = 0
for (i in 0..orderToBeVisited.size - 2) {
val fromCityIndex = orderToBeVisited.get(i)
val toCityIndex = orderToBeVisited.get(i + 1)
totalDistance += distances[fromCityIndex].get(toCityIndex)
return totalDistance
fun findRandomRoute(distances: Array<IntArray>): MutableList<Int> {
val visitedCities: Array<Boolean> = Array(distances.size, {i -> false})
// Find starting city index. 0 = A, 1 = B, 2 = C .... N = X
var currentCity = Random().nextInt(distances.size)
val orderToBeVisited: MutableList<Int> = mutableListOf(currentCity)
visitedCities[currentCity] = true
while (!areAllCitiesVisited(visitedCities)) {
currentCity = Random().nextInt(distances.size)
if (!visitedCities[currentCity]) {
visitedCities[currentCity] = true
return orderToBeVisited
And the class for optimization:
import java.util.*
class GreedyHeuristic(distances: Array<IntArray>, initialSoltion: MutableList<Int>) {
val mInitialSolution: MutableList<Int> = initialSoltion
val mDistances: Array<IntArray> = distances
fun optimize(): MutableList<Int> {
var bestSolution = mInitialSolution
var newSolution = mInitialSolution
var bestDistance = findTotalDistance(mDistances, bestSolution)
var i = 0
while (i <= 5) {
println("best distance at start of loop: $bestDistance")
var cityIndex1 = Integer.MAX_VALUE
var cityIndex2 = Integer.MAX_VALUE
while (cityIndex1 == cityIndex2) {
cityIndex1 = Random().nextInt(mInitialSolution.size)
cityIndex2 = Random().nextInt(mInitialSolution.size)
val temp = newSolution.get(cityIndex1)
newSolution.set(cityIndex1, newSolution.get(cityIndex2))
newSolution.set(cityIndex2, temp)
val newDistance: Int = findTotalDistance(mDistances, newSolution)
println("new distance: $newDistance\n")
if (newDistance < bestDistance) {
println("New values gived to solution and distance")
bestSolution = newSolution
bestDistance = newDistance
println("The distance of the best solution ${findTotalDistance(mDistances, bestSolution)}")
return bestSolution
fun findTotalDistance(distances: Array<IntArray>, orderToBeVisited: MutableList<Int>): Int {
var totalDistance = 0
for (i in 0..orderToBeVisited.size - 2) {
val fromCityIndex = orderToBeVisited.get(i)
val toCityIndex = orderToBeVisited.get(i + 1)
totalDistance += distances[fromCityIndex].get(toCityIndex)
return totalDistance
Kotlin (and JVM languages in general) doesn't copy values unless you specifically ask it to. This means that, when you do this:
var bestSolution = mInitialSolution
var newSolution = mInitialSolution
You're not setting bestSolution and newSolution to separate copies of mInitialSolution, but rather making them point at the same MutableList, so mutating one mutates the other. Which is to say: your problem isn't that bestSolution is getting overwritten, it's that you're accidentally modifying it every time you modify newSolution.
You then reuse newSolution for every iteration of your while loop without ever creating a new list. This leads us to two things:
Because newSolution still aliases bestSolution, modifying the former also modifies the latter.
bestSolution = newSolution doesn't do anything.
As mentioned in a comment, the easiest way to fix this is by making strategic use of .toMutableList(), which will force copying the list. You can achieve this by making this change at the top:
var bestSolution = mInitialSolution.toMutableList()
var newSolution = mInitialSolution.toMutableList()
Then inside the loop:
bestSolution = newSolution.toMutableList()
Incidentally: as a general rule, you should probably return and accept List rather than MutableList unless you specifically want it to be part of the contract of your function that you're going to mutate things in place. In this particular case, it would've forced you to either do something icky (like unsafe-casting mInitialSolution to MutableList, which should set off all sorts of warning bells in your head) or copy the list (which would've nudged you towards the right answer).

I am getting the val cannot be reassigned compile time error. But I have declared the variable as `var` only

I get a "val cannot be reassigned" compile-time error on a variable that I declared with var. Can't we change the array value?
Array.kt:11:3: error: val cannot be reassigned
import java.util.Scanner
fun main(args: Array< String>){
println("Enter the no")
val scanner = Scanner(System.`in`)
var nos = Array<Int>(5){0}
var i : Int = 1
for (i in 1..3){
nos[i] = scanner.nextInt()
i = i+1
println("Given values $nos")
The for (i in 1..3) ... statement redefines i for the scope of its body, where it becomes a val (it's actually a separate variable that shadows the i declared outside the loop).
You can fix the code by using different names for these variables, or, in your case, by simply removing var i: Int = 1 and i = i + 1:
val scanner = Scanner(System.`in`)
var nos = Array<Int>(5) { 0 }
for (i in 1..3) {
nos[i] = scanner.nextInt()
println("Given values $nos")
UPD (answering the comment): You can iterate in the opposite direction or using a non-unit step by building a progression with functions downTo and step, both described here in the reference.
var i : Int = 1
for (i in 1..3){
nos[i] = scanner.nextInt()
i = i+1
In this code you declared not one, but two variables with the name i because the for header creates its own declaration. Within the loop, only the version declared in the for header is visible, and that one is a val by definition.
Having said that, I'm unclear on what you were trying to achieve since everything looks like it would work just the way you want it without trying to update i in the loop.

UART Transmit failing after UART Receive thread starts in STM32 HAL Library

I am using STM32F1 (STM32F103C8T6) in order to develop a project using FreeRTOS.
The following is my GPIO and USART1 interface configuration:
GPIO_InitTypeDef GPIO_InitStruct;
GPIO_InitStruct.Pin = GPIO_PIN_9;
GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
GPIO_InitStruct.Speed = GPIO_SPEED_HIGH;
HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
GPIO_InitStruct.Pin = GPIO_PIN_10;
GPIO_InitStruct.Pull = GPIO_NOPULL;
HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
huart1.Instance = USART1;
huart1.Init.BaudRate = 9600;//115200;
huart1.Init.WordLength = UART_WORDLENGTH_8B;
huart1.Init.StopBits = UART_STOPBITS_1;
huart1.Init.Parity = UART_PARITY_NONE;
huart1.Init.Mode = UART_MODE_TX_RX;
huart1.Init.HwFlowCtl = UART_HWCONTROL_NONE;
HAL_NVIC_SetPriority(USART1_IRQn, 0, 0);
The question is: Why does UART transmit work before threads start but not after threads started or from threads? I want to transmit data from threads. i.e
int main(void)
uart_transmit_buffer[0] = 'H';
uart_transmit_buffer[1] = 'R';
uart_transmit_buffer[2] = '#';
uint8_t nums_in_tr_buf = 0;
nums_in_tr_buf = sizeof(uart_transmit_buffer)/sizeof(uint8_t);
state = HAL_UART_Transmit(&huart1, uart_transmit_buffer, nums_in_tr_buf, 5000);
for (;;);
static void A_Random_Thread(void const *argument)
if (conditionsMet()) //Executed once when a proper response received.
uart_transmit_buffer[0] = 'H';
uart_transmit_buffer[1] = 'R';
uart_transmit_buffer[2] = '#';
uint8_t nums_in_tr_buf = 0;
nums_in_tr_buf = sizeof(uart_transmit_buffer)/sizeof(uint8_t);
state = HAL_UART_Transmit(&huart1, uart_transmit_buffer, nums_in_tr_buf, 5000);
I have made sure no thread is in deadlock. The problem is that HAL_UART_Transmit returns HAL_BUSY.
Furthermore, I have dedicated one thread to receiving and parsing information from UART RX and I suspect this might be a cause of the problem. The following is the code:
static void UART_Receive_Thread(void const *argument)
uint32_t count;
(void) argument;
int j = 0, word_length = 0;
for (;;)
if (uart_line_ready == 0)
HAL_UART_Receive(&huart1, uart_receive_buffer, UART_RX_BUFFER_SIZE, 0xFFFF);
if (uart_receive_buffer[0] != 0)
if (uart_receive_buffer[0] != END_OF_WORD_CHAR)
uart_line_buffer[k] = uart_receive_buffer[0];
uart_receive_buffer[0] = 0;
uart_receive_buffer[0] = 0;
uart_line_ready = 1;
word_length = k;
k = 0;
if (uart_line_ready == 1)
for (j = 0; j <= word_length; j++)
UART_RECEIVED_COMMAND[j] = uart_line_buffer[j];
for (j = 0; j <= word_length; j++)
uart_line_buffer[j] = 0;
uart_line_ready = 0;
AssignReceivedData (word_length); //Results in uint8_t * RECEIVED_DATA
//Should be no delay in order not to miss any data..
Another cause to the problem I suspect could be related to interrupts of the system (Also please notice initialization part, I configured NVIC):
void USART1_IRQHandler(void)
Any help or guidance to this issue would be highly appreciated. Thanks in advance.
From what I can see HAL_UART_Transmit would've worked with the F4 HAL (v1.4.2) if it weren't for __HAL_LOCK(huart). The RX thread would lock the handle and then the TX thread would try to lock as well and return HAL_BUSY. HAL_UART_Transmit_IT and HAL_UART_Receive_IT don't lock the handle for the duration of the transmit/receive.
That, however, may cause problems with the State member, since it is updated non-atomically by the helper functions UART_Receive_IT and UART_Transmit_IT, though I don't think it would affect operation.
You could modify the function to allow simultaneous RX and TX. You'd have to update this every time they release a new version of the HAL.
The problem is that the ST HAL isn't meant to be used with an RTOS. It says so in the definition of the macro __HAL_LOCK. Redefining it to use the RTOS's mutexes might be worth trying as well. The same goes for HAL_Delay(), which could use the RTOS's thread sleep function.
In general though, sending via a blocking function in a thread should be fine, but I would not receive data using a blocking function in a thread. You are bound to experience overrun errors that way.
Likewise, if you put too much processing in the receive interrupt you might also run into overrun errors. I prefer using DMA for reception, followed by interrupts if I've run out of DMA streams. The interrupt only copies the data to a buffer, similarly to the DMA. A processRxData thread is then used to process the actual data.
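The "interrupt only copies the data to a buffer" idea can be sketched as a small single-producer, single-consumer ring buffer. This is a generic illustration, not HAL code: rx_ring_t, rx_ring_put, rx_ring_get and the size of 64 are all made-up names and values, and it assumes a single-core MCU where exactly one interrupt writes and exactly one thread reads.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_RING_SIZE 64u /* hypothetical size; must be a power of two */

typedef struct {
    volatile uint32_t head; /* written only by the producer (interrupt) */
    volatile uint32_t tail; /* written only by the consumer (thread) */
    uint8_t data[RX_RING_SIZE];
} rx_ring_t;

/* Called from the receive interrupt: store one byte, refuse it if full. */
static bool rx_ring_put(rx_ring_t *rb, uint8_t byte)
{
    if ((uint32_t)(rb->head - rb->tail) >= RX_RING_SIZE) {
        return false; /* full: the caller can count this as an overrun */
    }
    rb->data[rb->head & (RX_RING_SIZE - 1u)] = byte;
    rb->head = rb->head + 1u;
    return true;
}

/* Called from the processing thread: fetch one byte if one is available. */
static bool rx_ring_get(rx_ring_t *rb, uint8_t *out)
{
    if (rb->head == rb->tail) {
        return false; /* empty */
    }
    *out = rb->data[rb->tail & (RX_RING_SIZE - 1u)];
    rb->tail = rb->tail + 1u;
    return true;
}
```

The interrupt handler would call rx_ring_put() with each received byte, and the processing thread would drain the buffer with rx_ring_get() and do the parsing outside interrupt context.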
When using FreeRTOS, you have to set interrupt priority to 5 or above, because below 5 is reserved for the OS.
So change your code to set the priority to:
HAL_NVIC_SetPriority(USART1_IRQn, 5, 0);
The problem turned out to be something to do with blocking statements.
Since UART_Receive_Thread has HAL_UART_Receive inside and that is blocking the thread until something is received, that results in a busy HAL (hence, the HAL_BUSY state).
The solution was to use non-blocking calls without changing anything else.
That is, using HAL_UART_Receive_IT and HAL_UART_Transmit_IT together and avoiding the blocking calls worked.
Thanks for all the suggestions that led to this solution.