My .c program create threads. But one of them is killed by system after 10 min running time. How to protect this thread? - raspbian

My .c program create threads. But one of them is killed by system after 10 min running time. How to protect this thread?
I creates some threads do period work. But one of them is killed by system. I notice this because the this thread does not print the period information after about 10 min running time.
How to protect this thread from being killed? Anybody met this before?
int main(void){
/* threads */
pthread_t thrid_up;
pthread_t thrid_down;
pthread_create( &thrid_up, NULL, (void * (*)(void *))thread_up, NULL);
pthread_create( &thrid_down, NULL, (void * (*)(void *))thread_down, NULL);
while (!exit_sig && !quit_sig) {
//statistics
...
}
}
void thread_up(void) {
while (!exit_sig && !quit_sig) {
//deal interrupt and send the messages to the server
...
}
}
void thread_down(void) {
while (!exit_sig && !quit_sig) {
//deal the messages sent by the server
...
}
}
The thread_up thread is killed by the system everytime when this program runs about 10 min long. But thread_down works well all the time.

Related

Why does xQueueReceive throw an unhandled exception (LoadProhibited)?

I'm working on an esp32 FreeRTOS application with two tasks. Its purpose is to take UART messages received from a peripheral device and transmit them via mqtt to a central broker.
The first task reads input from Serial1, processes the contents into a message structure, and adds it to a FreeRTOS queue:
typedef struct {
int length;
char buffer[AZ_EL_MAX_MESSAGE_LENGTH];
} tag_message_t;
void uart_read_task(void * pvParameters){
BaseType_t xStatus;
tag_message_t tag_message;
while(true) {
while(Serial1.available())
{
first_char = Serial1.read();
if (first_char == '+') // Indicates the beginning of a message
{
for(int i = 0; i < AZ_EL_MAX_MESSAGE_LENGTH; i++)
{
message_buffer[i] = Serial1.read();
if (message_buffer[i] == '\n') // End of message received
{
ESP_LOGV(TAG, "Message found: %s", message_buffer);
strncpy(tag_message.buffer, message_buffer, i + 1);
tag_message.length = i + 1;
xStatus = xQueueSend(xMessagesToSendQueue, (void*) &tag_message, 0);
if (xStatus != pdPASS)
ESP_LOGW(TAG, "Failed to queue message.");
break;
}
}
}
}
vTaskDelay(pdMS_TO_TICKS(20)); // Wait the minimum BLE advertisement period for messages to come in, i.e. 20ms
}
}
The main loop() (which is technically the second FreeRTOS task) then attempts to receive from that queue and transmit over MQTT to a local broker:
void setup()
{
Serial.begin(115200);
// Configure and start WiFi
configure_network();
connect_network();
// Configure the MQTT connection
configure_mqtt_client();
// Configure and create the inter-task queues
xMessagesToSendQueue = xQueueCreate(100, sizeof(tag_message_t));
if (xMessagesToSendQueue == NULL) {
ESP_LOGE(TAG, "Unable to create messaging queue. Will not create UART handling message queue.");
delay(10000);
esp_restart();
} else {
ESP_LOGI(TAG, "Messaging queue generated");
configure_uart();
xTaskCreate(uart_read_task, "UART_Processing", 20000, NULL, 1, NULL);
}
}
void loop()
{
const TickType_t xTicksToWait = pdMS_TO_TICKS(100); // milliseconds to wait
tag_message_t received_message;
if (network_connected) {
connect_mqtt_client();
while(mqtt_client.connected())
{
mqtt_client.loop();
// Process messages on the xMessagesToSendQueue
if (xMessagesToSendQueue != NULL)
{
ESP_LOGI(TAG, "Processing message");
if (xQueueReceive(xMessagesToSendQueue, &received_message, xTicksToWait) == pdPASS)
{
ESP_LOGD(TAG, "Received message, transmitting.");
if(!mqtt_client.publish("aoa", received_message.buffer, received_message.length));
ESP_LOGW(TAG, "Failed transmission.");
}
else
{
vTaskDelay(pdMS_TO_TICKS(50));
}
}
else
{
ESP_LOGE(TAG, "Messages queue is null.");
}
}
} else {
ESP_LOGE(TAG, "WARNING Device not connected to the network. Reconnecting.");
connect_network();
}
delay(5000);
}
I've verified that the MQTT broker works, that it connects to WiFi, and it can properly read messages from Serial1. HOWEVER, the xQueueReceive() call in loop() throws a LoadProhibited exception every time it's called.
Can anyone tell me what I'm getting wrong here?
All, thank you for your help. It turns out this wasn't a FreeRTOS issue. After a little research (i.e. reading this and watching a more experienced engineer explain things: [link]https://hackaday.com/2017/08/17/secret-serial-port-for-arduinoesp32/) it turns out ESP32 Serial1 pins are connected to flash memory.
Every time I tried Serial1.read() or Serial1.readBytesUntil() the ESP32 crashed. Turns out reading the flash is taboo?
I replaced '''Serial1.read()''' with '''Serial2.read()''' and others. That fixed everything. Now I'm off to optimizing my queues!
You are trying to receive an address on the Queue without casting it.
2 solutions: You either declare received_message as a Pointer:
tag_message_t* received_message;
...
received_message = new tag_message_t;
if (xQueueReceive(xMessagesToSendQueue, received_message, xTicksToWait) == pdPASS)
and don't forget to delete it right after usage.
Or you can cast it after receiving :
if (xQueueReceive(xMessagesToSendQueue, &received_message, xTicksToWait) == pdPASS)
{
received_message = *(static_cast<tag_message_t*>(&received_message));
ESP_LOGD(TAG, "Received message, transmitting.");
if(!mqtt_client.publish("aoa", received_message.buffer, received_message.length));
ESP_LOGW(TAG, "Failed transmission.");
}
or any other sort of dereferencing you might wanna try.
There is also the possibility xQueueReceive is already doing that! So let's be sure of what is going on by adding this:
ESP_LOGD(TAG, "Received message, transmitting. %s", received_message.buffer);
Right after you get the message.
I don't think tag_message is being deleted inside the task, so if the struct is still valid and present, if you properly parse/cast its address, you should be able to acquire the message without any issues.

How to put BG96 on power save mode between sending messages to Azure IoT Hub over HTTP

I'm using a Nucleo L496ZG, X-NUCLEO-IKS01A2 and the Quectel BG96 module to send sensor data (temperature, humidity etc..) to Azure IoT Central over HTTP.
I've been using the example implementation provided by Avnet here, which works fine but it's not power optimized and with a 6700mAh battery pack it only lasts around 30 hours sending telemetry ever ~10 seconds. Goal is for it to last around a week. I'm open to increasing the time between messages but I also want to save power in between sending.
I've gone over the Quectel BG96 manuals and I've tried two things:
1) powering off the device by driving the PWRKEY and turning it back on when I need to send a message
I've gotten this to work, kinda… until I get a hardfault exception which happens seemingly randomly anywhere from within ~5 minutes of running to 2 hours (messages successfully sending prior to the exception). Output of crash log parser is the same every time:
Crash location = strncmp [0x08038DF8] (based on PC value)
Caller location = _findenv_r [0x0804119D] (based on LR value)
Stack Pointer at the time of crash = [20008128]
Target and Fault Info:
Processor Arch: ARM-V7M or above
Processor Variant: C24
Forced exception, a fault with configurable priority has been escalated to HardFault
A precise data access error has occurred. Faulting address: 03060B30
The caller location traces back to my .map file and I don't know what to make of it.
My code:
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE file in the project root for full license information.
//#define USE_MQTT
#include <stdlib.h>
#include "mbed.h"
#include "iothubtransporthttp.h"
#include "iothub_client_core_common.h"
#include "iothub_client_ll.h"
#include "azure_c_shared_utility/platform.h"
#include "azure_c_shared_utility/agenttime.h"
#include "jsondecoder.h"
#include "bg96gps.hpp"
#include "azure_message_helper.h"
#define IOT_AGENT_OK CODEFIRST_OK
#include "azure_certs.h"
/* initialize the expansion board && sensors */
#include "XNucleoIKS01A2.h"
static HTS221Sensor *hum_temp;
static LSM6DSLSensor *acc_gyro;
static LPS22HBSensor *pressure;
static const char* connectionString = "xxx";
// to report F uncomment this #define CTOF(x) (((double)(x)*9/5)+32)
#define CTOF(x) (x)
Thread azure_client_thread(osPriorityNormal, 10*1024, NULL, "azure_client_thread");
static void azure_task(void);
EventFlags deleteOK;
size_t g_message_count_send_confirmations;
/* create the GPS elements for example program */
BG96Interface* bg96Interface;
//static int tilt_event;
// void mems_int1(void)
// {
// tilt_event++;
// }
void mems_init(void)
{
//acc_gyro->attach_int1_irq(&mems_int1); // Attach callback to LSM6DSL INT1
hum_temp->enable(); // Enable HTS221 enviromental sensor
pressure->enable(); // Enable barametric pressure sensor
acc_gyro->enable_x(); // Enable LSM6DSL accelerometer
//acc_gyro->enable_tilt_detection(); // Enable Tilt Detection
}
void powerUp(void) {
if (platform_init() != 0) {
printf("Error initializing the platform\r\n");
return;
}
bg96Interface = (BG96Interface*) easy_get_netif(true);
}
void BG96_Modem_PowerOFF(void)
{
DigitalOut BG96_RESET(D7);
DigitalOut BG96_PWRKEY(D10);
DigitalOut BG97_WAKE(D11);
BG96_RESET = 0;
BG96_PWRKEY = 0;
BG97_WAKE = 0;
wait_ms(300);
}
void powerDown(){
platform_deinit();
BG96_Modem_PowerOFF();
}
//
// The main routine simply prints a banner, initializes the system
// starts the worker threads and waits for a termination (join)
int main(void)
{
//printStartMessage();
XNucleoIKS01A2 *mems_expansion_board = XNucleoIKS01A2::instance(I2C_SDA, I2C_SCL, D4, D5);
hum_temp = mems_expansion_board->ht_sensor;
acc_gyro = mems_expansion_board->acc_gyro;
pressure = mems_expansion_board->pt_sensor;
azure_client_thread.start(azure_task);
azure_client_thread.join();
platform_deinit();
printf(" - - - - - - - ALL DONE - - - - - - - \n");
return 0;
}
static void send_confirm_callback(IOTHUB_CLIENT_CONFIRMATION_RESULT result, void* userContextCallback)
{
//userContextCallback;
// When a message is sent this callback will get envoked
g_message_count_send_confirmations++;
deleteOK.set(0x1);
}
void sendMessage(IOTHUB_CLIENT_LL_HANDLE iotHubClientHandle, char* buffer, size_t size)
{
IOTHUB_MESSAGE_HANDLE messageHandle = IoTHubMessage_CreateFromByteArray((const unsigned char*)buffer, size);
if (messageHandle == NULL) {
printf("unable to create a new IoTHubMessage\r\n");
return;
}
if (IoTHubClient_LL_SendEventAsync(iotHubClientHandle, messageHandle, send_confirm_callback, NULL) != IOTHUB_CLIENT_OK)
printf("FAILED to send! [RSSI=%d]\n", platform_RSSI());
else
printf("OK. [RSSI=%d]\n",platform_RSSI());
IoTHubMessage_Destroy(messageHandle);
}
void azure_task(void)
{
//bool tilt_detection_enabled=true;
float gtemp, ghumid, gpress;
int k;
int msg_sent=1;
while (true) {
powerUp();
mems_init();
/* Setup IoTHub client configuration */
IOTHUB_CLIENT_LL_HANDLE iotHubClientHandle = IoTHubClient_LL_CreateFromConnectionString(connectionString, HTTP_Protocol);
if (iotHubClientHandle == NULL) {
printf("Failed on IoTHubClient_Create\r\n");
return;
}
// add the certificate information
if (IoTHubClient_LL_SetOption(iotHubClientHandle, "TrustedCerts", certificates) != IOTHUB_CLIENT_OK)
printf("failure to set option \"TrustedCerts\"\r\n");
#if MBED_CONF_APP_TELUSKIT == 1
if (IoTHubClient_LL_SetOption(iotHubClientHandle, "product_info", "TELUSIOTKIT") != IOTHUB_CLIENT_OK)
printf("failure to set option \"product_info\"\r\n");
#endif
// polls will happen effectively at ~10 seconds. The default value of minimumPollingTime is 25 minutes.
// For more information, see:
// https://azure.microsoft.com/documentation/articles/iot-hub-devguide/#messaging
unsigned int minimumPollingTime = 9;
if (IoTHubClient_LL_SetOption(iotHubClientHandle, "MinimumPollingTime", &minimumPollingTime) != IOTHUB_CLIENT_OK)
printf("failure to set option \"MinimumPollingTime\"\r\n");
IoTDevice* iotDev = (IoTDevice*)malloc(sizeof(IoTDevice));
if (iotDev == NULL) {
return;
}
setUpIotStruct(iotDev);
char* msg;
size_t msgSize;
hum_temp->get_temperature(&gtemp); // get Temp
hum_temp->get_humidity(&ghumid); // get Humidity
pressure->get_pressure(&gpress); // get pressure
iotDev->Temperature = CTOF(gtemp);
iotDev->Humidity = (int)ghumid;
iotDev->Pressure = (int)gpress;
printf("(%04d)",msg_sent++);
msg = makeMessage(iotDev);
msgSize = strlen(msg);
sendMessage(iotHubClientHandle, msg, msgSize);
free(msg);
iotDev->Tilt &= 0x2;
/* schedule IoTHubClient to send events/receive commands */
IOTHUB_CLIENT_STATUS status;
while ((IoTHubClient_LL_GetSendStatus(iotHubClientHandle, &status) == IOTHUB_CLIENT_OK) && (status == IOTHUB_CLIENT_SEND_STATUS_BUSY))
{
IoTHubClient_LL_DoWork(iotHubClientHandle);
ThisThread::sleep_for(100);
}
deleteOK.wait_all(0x1);
free(iotDev);
IoTHubClient_LL_Destroy(iotHubClientHandle);
powerDown();
ThisThread::sleep_for(300000);
}
return;
}
I know PSM is probably the way to go since powering on/off the device draws a lot of power but it would be useful if someone had an idea of what is happening here.
2) putting the device to PSM between sending messages
The BG96 library in the example code I'm using doesn't have a method to turn on PSM so I tried to implement my own. When I tried to run it, it basically runs into an exception right away so I know it's wrong (I'm very new to embedded development and have no prior experience with AT commands).
/** ----------------------------------------------------------
* this is a method provided by current library
* #brief Tx a string to the BG96 and wait for an OK response
* #param none
* #retval true if OK received, false otherwise
*/
bool BG96::tx2bg96(char* cmd) {
bool ok=false;
_bg96_mutex.lock();
ok=_parser.send(cmd) && _parser.recv("OK");
_bg96_mutex.unlock();
return ok;
}
/**
* method I created in an attempt to use PSM
*/
bool BG96::psm(void) {
return tx2bg96((char*)"AT+CPSMS=1,,,”00000100”,”00000001”");
}
Can someone tell me what I'm doing wrong and provide any guidance on how I can achieve my goal of having my device run on battery for longer?
Thank you!!
I got Power Saving Mode working by using Mbed's ATCmdParser and the AT+QPSMS commands as per Quectel's docs. The modem doesn't always go into power saving mode right away so that should be noted. I also found that I have to restart the modem afterwards or else I get weird behaviour. My code looks something like this:
bool BG96::psm(char* T3412, char* T3324) {
_bg96_mutex.lock();
if(_parser.send("AT+QPSMS=1,,,\"%s\",\"%s\"", T3412, T3324) && _parser.recv("OK")) {
_bg96_mutex.unlock();
}else {
_bg96_mutex.unlock();
return false;
}
return BG96Ready(); }//restarts modem
To send a message to Azure, the modem will need to be manually woken up by driving the PWRKEY to start bi-directional communication, and a new client handle needs to be created and torn down every time since Azure connection uses keepAlive and the modem will be unreachable when it's in PSM.

Is there any limitations to number of running processes on host?

In platform file I have only one host:
<host id="Worker1" speed="100Mf" core="101"/>
Then in worker.c I create 101 (or > 100) processes expecting that each on each core one process will be launched. But I noticed that only 100 first processes able to execute task or write with XBT_INFO:
int worker(int argc, char *argv[])
{
for (int i = 0; i < 101; ++i) {
MSG_process_create("x", slave, NULL, MSG_host_self());
}
return 0;
}
int slave(){
MSG_task_execute(MSG_task_create("kotok", 1e6, 0, NULL));
MSG_process_kill(MSG_process_self());
return 0;
}
Other processes above 100 first ones are unable to manage and kill:
[ 1.000000] (0:maestro#) Oops ! Deadlock or code not perfectly clean.
[ 1.000000] (0:maestro#) 1 processes are still running, waiting for something.
[ 1.000000] (0:maestro#) Legend of the following listing: "Process <pid> (<name>#<host>): <status>"
[ 1.000000] (0:maestro#) Process 102 (x#Worker1): waiting for execution synchro 0x26484d0 (kotok) in state 2 to finish
UPDATE
Here some code functions are:
main
int main(int argc, char *argv[])
{
MSG_init(&argc, argv);
MSG_create_environment(argv[1]); /** - Load the platform description */
MSG_function_register("worker", worker);
MSG_launch_application(argv[2]); /** - Deploy the application */
msg_error_t res = MSG_main(); /** - Run the simulation */
XBT_INFO("Simulation time %g", MSG_get_clock());
return res != MSG_OK;
}
deployment.xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4">
<process host="Worker1" function="worker">
<argument value="0"/>
</process>
</platform>
There is actually an internal limit for the size of a maxmin system (the core of SimGrid), which is 100, and may be hit in this case. I just added a flag to make this limit configurable. Could you pull the last version, and try setting maxmin/concurrency_limit to 1000 and see if it fixes your issue?
the number of process that can be started on a host has nothing to do with the number of cores. As on a real machine, you can have several processes running "simultaneously" thanks to time sharing mechanisms. It's the same here. when the number of running processes is greater than the number of cores (be it 1 or more), they have to share the resources.
The cause of your issue is elsewhere, but you do not provide a full minimal working example (main? deployment file?) and it's hard to help.

List of all running processes in Contiki OS

is there a possibility to list all running processes in contiki os and output the result on the debugging output (i.e. UART) ?
Insert this in the contiki platform.c and main():
struct process *p;
uint8_t ps;
int n;
int
main(void) /*contiki main() here */
{
n=0;
while(1)
{
//...
//...
/*************************************************************/
if(n==100)
{
uint8_t ps=process_nevents();
PRINTF("there are %u events in the queue", ps);
PRINTF("\n\n");
PRINTF("Processes:");
for(p = PROCESS_LIST(); p != NULL; p = p->next)
{
char namebuf[30];
strncpy(namebuf, PROCESS_NAME_STRING(p), sizeof(namebuf));
PRINTF("%s", namebuf);
PRINTF("\n\n");
n=0;
}
}
n +=1;
/*********************************************************************/
//...
//...
}
return 0;
}
this will output the running processes every 100th iteration of the main loop
if you use UART as debugging port you have to redirect the output of PRINTF() to the correct port by i.e. on atmega128rfa1
/* Second rs232 port for debugging or slip alternative */
rs232_init(RS232_PORT_1, USART_BAUD_9600,USART_PARITY_NONE |
USART_STOP_BITS_1 | USART_DATA_BITS_8);
/* Redirect stdout */
/* #if RF230BB_CONF_LEDONPORTE1 || defined(RAVEN_LCD_INTERFACE) */
rs232_redirect_stdout(RS232_PORT_1);
the contiki shell source code contains very useful commands that can easily be used for debugging without using the entire shell, see
http://anrg.usc.edu/contiki/index.php/Contiki_Shell

SSH fork and children

I have a program where i ssh into a server and gets data. Here is the code... I fork it and the child executes the query and the parent waits for the child for a predetermined amount of time (in function timeout) and then kills the child. I did that part because sometimes, i am not exactly sure why, but the ssh connection stops and doesnot exit. That is there is a "ssh -oConnectTimeout=60 blah blah" in the processes list for a long and the timeout function doesnt seem to work. What am i doing wrong here? The last time this problem occured, there was an ssh in process list for 5 days and still it didnot timeout and the program had stopped because it was waiting for the child. There are those extra wait() functions because previously i was getting a lot of defunct processes a.k.a zombies. So i took the easy way out..
c = fork();
if(c==0) {
close(fd[READ]);
if (dup2(fd[WRITE],STDOUT_FILENO) != -1)
execlp("ssh", "ssh -oConnectTimeout=60", serverDetails.c_str(), NULL);
_exit(1);
}else{
if(timeout(c) == 1){
kill(c,SIGTERM);
waitpid(c, &exitStatus, WNOHANG);
wait(&exitStatus);
return 0;
}
wait(&exitStatus);
}
This is the timeout function.
int timeout(int childPID)
{
int times = 0, max_times = 10, status, rc;
while (times < max_times){
sleep(5);
rc = waitpid(childPID, &status, WNOHANG);
if(rc < 0){
perror("waitpid");
exit(1);
}
if(WIFEXITED(status) || WIFSIGNALED(status)){
/* child exits */
break;
}
times++;
}
if (times >= max_times){
return 1;
}
else return 0;
}
SIGTERM just asks for a polite termination of the process. If it's got stuck, then it won't respond to that, and you'll need to use SIGKILL to kill it. Probably after trying SIGTERM and waiting a little while.
The other possibility is that it's waiting for the output pipe to the parent process to not be full - maybe there's enough output to fill the buffer, and the child is waiting on that rather than the network.