CAN error counters and interrupts - embedded

I'm using the bxCAN peripheral of an STM32F3 MCU in an environment where
1.) it is essential that the node is detached from the network once the REC/TEC has reached the warning level (waiting for the bus-off condition is not an option)
2.) the baud rate of the host network is unknown
3.) the connection might be sporadic as the node is connected by the user
Due to 1.), the STM32 HAL CAN driver is used in IT mode, and whenever the error callback is called with the EWG flag set, it shuts down the transceiver and deinitializes the bxCAN. If the REC is over the limit, it is easily recovered by configuring the bxCAN in silent mode, assuming there is traffic on the CAN bus. However, if the TEC is over the limit, the bxCAN won't be able to transmit another frame, as the error interrupt is triggered instantly once enabled -> we are in a deadlock.
I tried decrementing the TEC by transmitting frames in silent loopback mode, but it seems successful transmissions do not affect the TEC in this mode.
I suppose the question is not specific to this peripheral but is valid for other CAN implementations as well.
Any suggestions are welcome.

I have implemented a work-around that seems to work fine, with the following requirements:
1.) whenever the CAN error ISR is triggered, it disconnects the node from the bus (the transceiver is powered off)
2.) not all interrupt sources are enabled, only the ones of higher severity than the last error state (e.g. in the PASSIVE state the WARNING and PASSIVE interrupts are disabled and the BUSOFF interrupt is enabled)
3.) the last error state, and thus the interrupt sources, are updated whenever a.) an error ISR is triggered or b.) polling the CAN peripheral at a high frequency shows a change in the error state
4.) whenever attempting a connection to the bus, the REC must heal in listen-only mode first; for this, traffic is required on the bus.
With these requirements implemented the node is able to fail silently but recover to normal operation.
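For illustration, here is a minimal sketch of requirements 1.) and 2.) on top of the current ST HAL CAN driver; transceiver_off() is a hypothetical board-specific helper, and the exact masking policy shown is an assumption:

    #include "stm32f3xx_hal.h"           /* brings in the HAL CAN API */

    void transceiver_off(void);          /* hypothetical board helper */

    void HAL_CAN_ErrorCallback(CAN_HandleTypeDef *hcan)
    {
        uint32_t err = HAL_CAN_GetError(hcan);

        transceiver_off();               /* requirement 1.): leave the bus */

        if (err & HAL_CAN_ERROR_EWG) {
            /* WARNING reached: mask it, keep the more severe sources armed */
            HAL_CAN_DeactivateNotification(hcan, CAN_IT_ERROR_WARNING);
            HAL_CAN_ActivateNotification(hcan,
                    CAN_IT_ERROR_PASSIVE | CAN_IT_BUSOFF);
        } else if (err & HAL_CAN_ERROR_EPV) {
            /* PASSIVE reached: only BUSOFF can still escalate */
            HAL_CAN_DeactivateNotification(hcan,
                    CAN_IT_ERROR_WARNING | CAN_IT_ERROR_PASSIVE);
            HAL_CAN_ActivateNotification(hcan, CAN_IT_BUSOFF);
        }
    }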

Related

Issue with USB deregistration on Linux kernel

I am using Linux 4.19.55 armv7l on an OMAP3 processor. On my target there is a USB modem that gets its power from a GPIO pin value (defined under /sys/class/gpio). There are occasions when I change the value parameter of this GPIO pin to bring down the hardware, and while doing so I frequently get an error (thrown by musb_handle_intr_disconnect inside drivers/usb/musb/musb_core.c) as follows:
"musb_handle_intr_disconnect 843: unhandled DISCONNECT transition (a_idle)"
I tried debugging the issue by mounting debugfs and capturing data from the bus in question using usbmon. The bus ID is identified from the lsusb output and confirmed by observing /sys/kernel/debug/usb/devices. I observe that usbmon is unable to capture data whenever the mentioned error shows up; in a no-error scenario usbmon does capture the traffic from the bus. Please help me figure out how to debug this issue.
I just checked that a commit on the kernel branch fixes this issue, which is present inside the states handled by the glue layer. This is the required commit.

Does the operation of the CAN peripheral in STM32 wait for the execution of the ISR routine code?

I'm developing a stack layer on an STM32L433 microcontroller that uses the CAN protocol; a fundamental part of the stack is the authentication of the devices.
During authentication it can occur that two (or more) devices start to send a CAN message (the authentication message) with the same identifier and different payloads (a true random value). In this case every device should be able to detect whether this message was sent first by another device.
I have studied this case and three situations can occur:
the devices start to send the message at the same time; in this case only one device is able to send the message, because all the other devices detect an error and abort the transmission.
only one device is able to send the message and occupy the bus before the other devices load the transmission MAILBOX of the CAN peripheral, or before the CAN peripherals of the other devices set the message that is going to be sent to the SCHEDULED state.
In this case, the devices that were not able to send the message will receive the reception interrupt; within the reception ISR I am able to abort the transmission.
only one device is able to send the message and occupy the bus, while the CAN peripherals of all the other devices have a message in the SCHEDULED state and are waiting for the bus to become idle.
In this case the devices that were not able to send the message will also receive the reception interrupt. Here too I thought to abort the transmission within the reception ISR (like in situation 2), but I'm not sure this is guaranteed for all messages: if the CAN peripheral sets the message that is going to be sent to the TRANSMIT state before the code inside the ISR executes, the abort operation will have no effect.
My question (related to situation 3) is: is a message in the SCHEDULED state in the transmission MAILBOX only set to the TRANSMIT state after the code in the reception ISR has executed, or is this not guaranteed?
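(For illustration, a minimal sketch of the abort-from-the-reception-ISR approach from situations 2 and 3, using the current ST HAL API; AUTH_MSG_ID is a placeholder, and all three mailboxes are aborted for brevity:)

    #include "stm32l4xx_hal.h"           /* brings in the HAL CAN API */

    #define AUTH_MSG_ID 0x123u           /* placeholder identifier */

    void HAL_CAN_RxFifo0MsgPendingCallback(CAN_HandleTypeDef *hcan)
    {
        CAN_RxHeaderTypeDef hdr;
        uint8_t data[8];

        if (HAL_CAN_GetRxMessage(hcan, CAN_RX_FIFO0, &hdr, data) != HAL_OK)
            return;

        if (hdr.StdId == AUTH_MSG_ID)
            /* has no effect if the mailbox already won arbitration and is
               transmitting - exactly the race described in situation 3 */
            HAL_CAN_AbortTxRequest(hcan,
                    CAN_TX_MAILBOX0 | CAN_TX_MAILBOX1 | CAN_TX_MAILBOX2);
    }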
To answer your third case first: no, it is not guaranteed that your message is not on the bus while you are receiving, because interrupts have some latency too, and within this time the mailbox may go ahead with the transmission.
Your "authentication" also sounds a bit troublesome, since nobody from outside could decide which ECU was actually the one that won arbitration and sent that specific message.
We have ECUs in vehicles which decide at runtime, according to certain methods, where they are mounted, by pin and by some CAN reception, but only in listen mode; TX is disabled in the stack. After detection has completed, we switch configurations, restart the communication stack, and further initialize the software going up.
But these "setups" are usually defined beforehand, e.g. by master/slave roles (vehicle/private bus communication), by connector pins connected to GND / OPEN / UBAT, or by some bus message which tells the ECU which bus it is on.
That seems to be more reliable than your method.

Suspend operation of lwIP Raw API

I am working on a project using a Zynq (Picozed devboard). The application is run bare-metal, uses lwIP TCP in RAW mode and basically behaves like this:
Receive a batch of data via Ethernet, which is stored in RAM.
Process the batch of data.
Send back the processed data via Ethernet.
The problem is, I need to measure the execution time of the processing part. However, running lwIP in RAW mode forces me to call tcp_fasttmr() and tcp_slowtmr() every 250/500 ms, which makes accurate measurement pretty hard. Whenever I don't call the tcp_tmr() functions for some time, I start repeatedly receiving error messages via UART ("unable to alloc pbuf in recv_handler"). It seems this is printed from some ISR related to error handling, but I cannot find the exact location.
My question is, how do I suspend the network functionality so I don't need to call the tcp_tmr() functions periodically? I tried closing the connection, disabling the interface (netif_set_down()), and disabling the timer interrupt, but none of it seems to have an effect on my problem.
I don't know anything about that devboard or the microcontroller on it, but you should have an ethernetif.c (lwIP port) file which contains the processing of an Ethernet receive interrupt or similar. This should be calling the lwIP function netif->input with a packet to process.
Disabling the interface won't stop this behaviour; it will just stop the higher-level processing of the packet. If you are only timing the execution for debugging purposes, you could try disabling the Ethernet receive interrupt and stop calling tcp_tmr() until you have processed the packets.
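A sketch of that approach, assuming hypothetical board helpers (on a Zynq they would typically wrap the GIC, e.g. XScuGic_Disable()/XScuGic_Enable() on the EMAC interrupt ID):

    #include <stdint.h>

    /* hypothetical board-specific helpers and the routine being measured */
    void eth_rx_irq_disable(void);
    void eth_rx_irq_enable(void);
    uint32_t read_cycle_counter(void);
    void process_batch(uint8_t *buf, uint32_t len);

    uint32_t timed_process(uint8_t *buf, uint32_t len)
    {
        eth_rx_irq_disable();                 /* no new pbufs get queued */
        uint32_t t0 = read_cycle_counter();
        process_batch(buf, len);              /* the section being timed */
        uint32_t cycles = read_cycle_counter() - t0;
        eth_rx_irq_enable();                  /* resume RX, then resume calling
                                                 tcp_fasttmr()/tcp_slowtmr() */
        return cycles;
    }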

understanding the concept of running a program in interrupt handler

Early Cisco routers running the IOS operating system enhanced their packet processing speed by doing packet switching within the interrupt handler instead of in a "regular" operating system process. Doing packet processing in the interrupt handler ensured that context switching within the operating system did not affect the packet processing. As I understand it, an interrupt handler is a piece of software in the operating system meant for handling interrupts. How should I understand the concept of packet switching done within the interrupt handler?
The use of interrupts is preferred when an event requires immediate attention by the operating system, or by a program which installed an interrupt service routine. This is as opposed to polling, where software periodically checks whether a condition exists which indicates that the event has occurred.
Interrupt service routines aren't commonly meant to do a lot of work themselves. They are rather written to reach their end as quickly as possible, so that normal execution can resume ("normal execution" meaning the location and state of the processing that was interrupted). The reason is that it must be avoided that the same interrupt occurs again while its handler is still executing, as it may then be ignored, lead to incorrect results, or, even worse, to software failure (crashes). So what an interrupt service routine usually does is: read any data associated with the event and store it in a queue, signal that the queue has changed, arrange things so that another interrupt may occur, and then resume by restoring the pre-interrupt context. The queued data associated with that interrupt can now be processed asynchronously, without risking that interrupts pile up.
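As an illustration, a minimal sketch of that pattern in C; the data register HW_RX_DATA and the consumer process_byte() are placeholders:

    #include <stdint.h>

    #define HW_RX_DATA (*(volatile uint8_t *)0x40001000u) /* placeholder
                                                             peripheral address */
    #define QLEN 64

    static volatile uint8_t  queue[QLEN];
    static volatile uint16_t q_head, q_tail;

    void process_byte(uint8_t byte);            /* placeholder consumer */

    void rx_isr(void)                           /* runs in interrupt context */
    {
        uint8_t byte = HW_RX_DATA;              /* read the event's data (on
                                                   many peripherals this also
                                                   clears the interrupt) */
        uint16_t next = (q_head + 1) % QLEN;
        if (next != q_tail) {                   /* drop on overflow, never block */
            queue[q_head] = byte;
            q_head = next;                      /* signals the queue mutation */
        }
    }                                           /* returning restores the
                                                   pre-interrupt context */

    void main_loop(void)                        /* normal execution */
    {
        while (q_tail != q_head) {              /* drain asynchronously */
            process_byte(queue[q_tail]);
            q_tail = (q_tail + 1) % QLEN;
        }
    }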
The following is the procedure for executing interrupt-level switching:
Look up the memory structure to determine the next-hop address and outgoing interface.
Do an Open Systems Interconnection (OSI) Layer 2 rewrite, also called MAC rewrite, which means changing the encapsulation of the packet to comply with the outgoing interface.
Put the packet into the tx ring or output queue of the outgoing interface.
Update the appropriate memory structures (reset timers in caches, update counters, and so forth).
The interrupt which is raised when a packet is received from the network interface is called the "RX interrupt". This interrupt is dismissed only when all the above steps are executed. If any of the first three steps above cannot be performed, the packet is sent to the next switching layer. If the next switching layer is process switching, the packet is put into the input queue of the incoming interface for process switching and the interrupt is dismissed. Since interrupts cannot be interrupted by interrupts of the same level and all interfaces raise interrupts of the same level, no other packet can be handled until the current RX interrupt is dismissed.
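Schematically, the interrupt-level path could be pictured like this (hypothetical C types and helpers, not actual IOS code):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { uint32_t dst_addr; /* ... */ } packet_t;
    typedef struct { int out_if;        /* ... */ } route_t;

    route_t *cache_lookup(uint32_t dst_addr);
    bool mac_rewrite(packet_t *pkt, route_t *r);
    void tx_ring_enqueue(int out_if, packet_t *pkt);
    void update_caches_and_counters(route_t *r);
    void punt_to_next_switching_layer(packet_t *pkt);

    void rx_interrupt(packet_t *pkt)
    {
        route_t *r = cache_lookup(pkt->dst_addr);  /* step 1: next hop and
                                                      outgoing interface */
        if (r == NULL || !mac_rewrite(pkt, r)) {   /* step 2: L2/MAC rewrite */
            punt_to_next_switching_layer(pkt);     /* e.g. queue the packet
                                                      for process switching */
            return;                                /* interrupt dismissed */
        }
        tx_ring_enqueue(r->out_if, pkt);           /* step 3: tx ring/queue */
        update_caches_and_counters(r);             /* step 4 */
    }                                              /* interrupt dismissed */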
Different interrupt switching paths can be organized in a hierarchy, from the one providing the fastest lookup to the one providing the slowest lookup. The last resort used for handling packets is always process switching. Not all interfaces and packet types are supported in every interrupt switching path. Generally, only those that require examination and changes limited to the packet header can be interrupt-switched. If the packet payload needs to be examined before forwarding, interrupt switching is not possible. More specific constraints may exist for some interrupt switching paths. Also, if the Layer 2 connection over the outgoing interface must be reliable (that is, it includes support for retransmission), the packet cannot be handled at interrupt level.
The following are examples of packets that cannot be interrupt-switched:
Traffic directed to the router (routing protocol traffic, Simple Network Management Protocol (SNMP), Telnet, Trivial File Transfer Protocol (TFTP), ping, and so on). Management traffic can be sourced by and directed to the router; it is handled by specific task-related processes.
OSI Layer 2 connection-oriented encapsulations (for example, X.25). Some tasks are too complex to be coded in the interrupt-switching path because there are too many instructions to run, or timers and windows are required. Some examples are features such as encryption, Local Area Transport (LAT) translation, and Data-Link Switching Plus (DLSW+).
More here: http://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-software-releases-121-mainline/12809-tuning.html

How can the processor recognize the device requesting the interrupt?

1) How can the processor recognize the device requesting the interrupt?
2) Given that different devices are likely to require different ISR, how can the processor obtain the starting address in each case?
3) Should a device be allowed to interrupt the processor while another interrupt is being serviced?
4) How should two or more simultaneous interrupt requests be handled?
1) How can the processor recognize the device requesting the interrupt?
The CPU has several interrupt lines, and if you need more devices than there are lines there's an "interrupt controller" chip (sometimes called a PIC) which will multiplex several devices and which the CPU can interrogate.
2) Given that different devices are likely to require different ISRs, how can the processor obtain the starting address in each case?
That's difficult. It may be by convention (the same type of device always on the same line), or it may be configured, e.g. in the BIOS setup.
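One common mechanism behind "by convention" is an interrupt vector table: an array of handler addresses indexed by interrupt number, which the hardware consults to find the starting address. A sketch in the style of a small MCU (handler names and the section name are illustrative):

    typedef void (*isr_t)(void);

    void uart_isr(void);                 /* illustrative device handlers */
    void timer_isr(void);

    /* the CPU jumps to vector_table[irq_number]; each device or
       interrupt line owns a fixed slot by convention */
    __attribute__((section(".isr_vector")))
    const isr_t vector_table[] = {
        uart_isr,                        /* line 0: UART  */
        timer_isr,                       /* line 1: timer */
    };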
3) Should a device be allowed to interrupt the processor while another interrupt is being serviced?
When there's an interrupt, further interrupts are disabled. However, the interrupt service routine (i.e. the device-specific code which the CPU is executing) may re-enable interrupts if it is willing to be interrupted.
4) How should two or more simultaneous interrupt requests be handled?
Each interrupt has a priority: the higher-priority interrupt is handled first.
The concept of defining the priority among devices so as to know which one is to be serviced first in case of simultaneous requests is called priority interrupt system. This could be done with either software or hardware methods.
SOFTWARE METHOD – POLLING
In this method, all interrupts are serviced by branching to the same service program. This program then checks each device to see whether it is the one generating the interrupt. The order of checking is determined by the assigned priorities: the device with the highest priority is checked first, and the remaining devices are checked in descending order of priority.
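A sketch of the polling scheme, with placeholder status checks and service routines:

    #include <stdbool.h>

    /* placeholder per-device status checks and service routines */
    bool dev_a_has_irq(void);  void service_dev_a(void);  /* highest priority */
    bool dev_b_has_irq(void);  void service_dev_b(void);
    bool dev_c_has_irq(void);  void service_dev_c(void);  /* lowest priority  */

    void shared_isr(void)      /* common entry point for all interrupts */
    {
        if (dev_a_has_irq())       service_dev_a();   /* checked first */
        else if (dev_b_has_irq())  service_dev_b();
        else if (dev_c_has_irq())  service_dev_c();   /* checked last  */
    }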
HARDWARE METHOD – DAISY CHAINING
The daisy-chaining method involves connecting all the devices that can request an interrupt in a serial manner. This configuration is governed by the priority of the devices. The device with the highest priority is placed first.