CANopen node becomes stuck in pre-operational state - embedded

I have 2 nodes (x and y) on a CAN bus using CANopen. Using a temporary node "z", I send an NMT message to put all nodes into the pre-operational state, and then a command to put y into the operational state. I then send a bunch of extended-ID messages on the bus intended for node y; node x does not know of these in its dictionary. While I am sending to y, node monitoring reports that node x is in the pre-operational state. All seems fine. Upon completion of sending data to node y, I send a command to put all nodes into the operational state. Node x is stuck in the pre-operational state according to its NMT state code. Debugging, I found that the RX FIFO in node x's CANopen stack is overflowing. Shouldn't it be ignoring all these extended messages when in pre-operational mode? I even tried stopped mode, with the same result of a stuck x. What's going on here?

For any CAN bus node, you have to read all incoming messages continuously and ignore the ones of no interest. Filter settings in the CAN controller can help a bit, but to build rugged applications you must always be prepared for any CAN message with any ID to appear at any time. The best way to ensure this is to read the RX FIFO continuously and, each time, keep reading until it is empty.
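As a minimal sketch, assuming a hypothetical CAN HAL (can_rx_pending(), can_rx_read()) and a CANopen stack entry point (canopen_dispatch()) - none of these are a real API - the receive path could look like this:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame type and driver calls - substitute your HAL's own. */
typedef struct {
    uint32_t id;
    bool     extended;            /* true for 29-bit identifiers */
    uint8_t  dlc;
    uint8_t  data[8];
} can_frame_t;

bool can_rx_pending(void);                   /* provided by your driver       */
void can_rx_read(can_frame_t *f);            /* provided by your driver       */
void canopen_dispatch(const can_frame_t *f); /* your CANopen stack's entry    */

void can_poll(void)
{
    can_frame_t f;
    /* Keep reading until the RX FIFO is empty so it can never overflow. */
    while (can_rx_pending()) {
        can_rx_read(&f);
        if (f.extended)
            continue;             /* not ours, but reading it freed FIFO space */
        canopen_dispatch(&f);     /* hand known 11-bit COB-IDs to the stack    */
    }
}

The point is that frames of no interest are still read out of the FIFO; they are simply discarded afterwards.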
A CANopen node remains in the pre-operational state as long as there are errors. Optionally, it may send out an EMCY message telling the nature of the error, and then another with all bits set to zero when the error is cleared; in that case, the NMT master should wait for the EMCY "error cleared" message before sending out Start Remote Node.
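For illustration, a sketch of that NMT-master behavior under the same hypothetical HAL (the COB-IDs and command specifier come from CiA 301; can_send() and can_frame_t are the placeholders from the sketch above):

#define NMT_COB_ID        0x000u
#define NMT_CMD_START     0x01u    /* NMT command specifier: Start Remote Node */
#define EMCY_BASE_COB_ID  0x080u   /* EMCY COB-ID = 0x80 + node ID             */

void can_send(const can_frame_t *f);         /* provided by your driver */

/* EMCY error code 0x0000 in bytes 0..1 means "error reset / no error". */
bool emcy_is_clear(const can_frame_t *f, uint8_t node_id)
{
    return f->id == (EMCY_BASE_COB_ID + node_id) &&
           f->dlc >= 2 &&
           f->data[0] == 0x00 && f->data[1] == 0x00;
}

void nmt_start_node(uint8_t node_id)
{
    can_frame_t f = { .id = NMT_COB_ID, .extended = false, .dlc = 2,
                      .data = { NMT_CMD_START, node_id } };
    can_send(&f);
}

The master would check incoming frames with emcy_is_clear() and only call nmt_start_node() once the clear message has been seen.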


Does "error frames" on CAN bus delay/ impair the communication?

The quote below is from a document by Texas Instruments.
The error frame is a special message that violates the formatting rules of a CAN message. It is transmitted when a node detects an error in a message, and causes all other nodes in the network to send an error frame as well. The original transmitter then automatically retransmits the message. An elaborate system of error counters in the CAN controller ensures that a node cannot tie up a bus by repeatedly transmitting error frames.
Also, this Wikipedia page provides more information on error frames.
As mentioned in several answers (link1, link2), CAN bus is half-duplex, that is, the nodes cannot transmit and receive data at the same time.
In general, a modern car contains more than 50 ECUs (nodes) on a CAN network. In case of an error, if the nodes were to send error frames one after another, the CAN bus would be occupied for quite a long time.
So, what am I missing here? Do the nodes send their error frames simultaneously, with the hardware resolving that collision? What happens if a node transmits a different or corrupted error frame?
The other nodes will not send error frames one after the other in most cases. If there is an error on the bus, it is very likely that all nodes will perceive it. They will then all send their error frames at (close to) the same time. As they all expect to see "their" error frame, it does not matter who gets there first.
In the (unusual) case of an error being noticed by only one node (perhaps due to some transient within that ECU), it will transmit an error frame. The other ECUs will react to this error frame (which is "simply" a violation of the stuffing rules) with their own error frames. But again, they will all see it at the same time, so the case described above applies: they will all transmit their "own" error frame at about the same time.
As noted by @Lundin in the question comments, error frames are very unusual, so the impact on the bus loading is not of major concern.
I do not understand this part of your question:
What happens if a node transmits a different or corrupted error frame?
A node "cannot" transmit a different error-frame - it would not be an error-frame. An error-frame being corrupted is very unlikely as it is a string of dominant bits, which are driven hard, and usually by several to many ECUs at a time. If it were to happen, I think (but would have to check the spec) the ECUs would notice this as another error and transmit another error-frame.
A node that repeatedly sends active error frames first goes into the "Warning" state and then later into the "Bus Off" state. This prevents a broken node from becoming a "babbling idiot".
See the Bosch CAN specification, page 63.

USB transaction error completion code for Address Device Command with BSR = 0

I am writing a driver for the xHCI and I have been having a problem with the address assignment.
Here's what I am doing:
1. Reset the controller by writing 1 to USBCMD.HCRST and wait for the USBSTS.CNR (Controller Not Ready) flag to clear.
2. Set the maximum slots enabled in the CONFIG register to 16 (which is the max supported by my xHC).
3. Allocate the Device Context Base Address Array (DCBAA) at a 64-byte-aligned memory location and write that address to the DCBAAP register.
4. Allocate the scratchpad buffers, write their addresses into an array, then write the address of that array into the 0th entry of the DCBAA.
5. Allocate a command ring and write its address to the CRCR.
6. Set the R/S bit and wait for the CNR flag to clear.
7. Allocate an event ring, and an event ring segment table with one entry, and write the address of the table to ERSTBA.
8. The software then waits until a device is connected on port 12. I connect the device to port 12 and can see the port go to the enabled state, as indicated by PP, CCS, and PED being 1, and PR and PLS being 0.
9. Submit an Enable Slot command; I get an event on the event ring with a success completion code and a slot ID.
10. Allocate the output and input device context data structures and initialize them to valid values.
11. Allocate a transfer ring and write its address to the input context.
12. Submit the first Address Device command with BSR = 1; I get an event on the event ring with a success completion code.
13. Here is where things go wrong: I submit the second Address Device command with BSR = 0 and get a USB Transaction Error completion code (a rough sketch of this command submission follows this list).
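For reference, a rough sketch of how such an Address Device command TRB can be built and submitted. The field positions follow xHCI 1.2 section 6.4.3.4; cmd_ring_enqueue(), ring_doorbell(), and wait_for_command_event() are hypothetical placeholders, not a real API:

#include <stdint.h>

struct xhci_trb {
    uint64_t parameter;   /* input context pointer (16-byte aligned) */
    uint32_t status;      /* reserved for this command               */
    uint32_t control;     /* cycle, BSR, TRB type, slot ID           */
};

#define TRB_TYPE_ADDRESS_DEVICE  11u

void     cmd_ring_enqueue(const struct xhci_trb *trb);  /* placeholder */
void     ring_doorbell(unsigned db, unsigned target);   /* placeholder */
uint32_t wait_for_command_event(void);   /* returns the completion code */

uint32_t address_device(uint64_t input_ctx_phys, uint8_t slot_id,
                        int bsr, uint32_t cycle)
{
    struct xhci_trb trb = {0};
    trb.parameter = input_ctx_phys;
    trb.control   = (cycle & 1u)                     /* Cycle bit               */
                  | ((bsr ? 1u : 0u) << 9)           /* BSR (Block Set Address) */
                  | (TRB_TYPE_ADDRESS_DEVICE << 10)  /* TRB Type = 11           */
                  | ((uint32_t)slot_id << 24);       /* Slot ID                 */
    cmd_ring_enqueue(&trb);
    ring_doorbell(0, 0);             /* doorbell 0, target 0: command ring */
    return wait_for_command_event();
}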
In the xHCI spec rev 1.2, it says the USB Transaction Error is the result of a USB3 DPP (Data Packet Payload) error. I looked in the USB 3.2 rev 1 spec, and it says a DPP error can happen due to any of the following:
1. CRC incorrect
2. DPP aborted
3. DPP missing
4. Data Length in the Setup DPH does not match the actual data payload length.
I've been working on this for about 2-3 weeks now and getting nowhere. I did see a post here where someone had this issue with the EHCI and fixed it by reducing their delays. I tried something similar, but it still didn't fix it.
Does anyone have an idea why I am getting this error, or how I should go about fixing it?
Something I want to note: if I skip step 8 and continue to the Enable Slot command, I still get the same results. That is, I still get a device slot and a successful first Address Device command even though there is no device connected, which makes me think that the USB device isn't even talking. I'm using a thumb drive; I tried different thumb drives and different ports, and still got the same issue. It is a USB 3.0 thumb drive on a SuperSpeed port.

How to simulate a LIN Slave Node in CANoe

I cannot find a comprehensive guide on how to build a synthetic LIN slave in a CANoe configuration, so I would like to create one here.
Scenario: an ECU acts as the LIN master and communicates with n LIN slaves. The goal is to add a synthetic slave to the CANoe simulation, acting as a replacement for one of the physical slaves. Since there is no way to dynamically activate or deactivate a LIN node, our setup would be n-1 physical slaves and 1 synthetic slave, plus the master. Here, the master is under test; in particular, we want to assess its ability to react to certain slave responses by mocking the slave and triggering whatever frame is needed. Let's assume there will be a GUI or something for that; it is not in scope for this question.
I am able to add a new node to the simulation setup, assign it to the LIN network and, if active, it connects to the red line indicating the simulated bus. An LDF was created and added to the configuration, and I know the linFrame ID the node should communicate.
The node is to be simulated via CAPL script. I am stuck on the transmission part:
on ???
{
// This is my call: as LIN slave I should output something.
output(myLinFrame);
}
Where should I add my logic to update and transmit the message?
The most basic thing I tried was to key-bind it, but the output only goes out in the frame's next associated slot of the LDF schedule, and of course it is then key-bound:
on key 'A'
{
// prepare new content...
output(myLinFrame);
}
This question relates to an older question of mine regarding LIN censorship.
Final note: I have very limited slots for CANoe licenses to test whatever code I come up with, so I need to prepare and research in advance.
In this scenario, should I use linUpdateResponse()?
You should create on linFrame ... event handlers.
These event handlers are called once a frame has been put on the bus.
Inside the event handler, you can use linUpdateResponse (or also output) to modify the frame that will be sent the next time. That is, the call does not send an immediate response; rather, it modifies the internal state of the simulated slave so that the updated frame is sent the next time the master schedules it.

What is the difference between an error active node and an error passive node in CAN?

I understand the concept of the TEC and REC counters in CAN. Will an error active node send active error frames upon detection of an error?
Once the TEC count goes above 127, the error active node becomes error passive. Does this mean it will start transmitting passive error frames?
Also, when other nodes detect that a node is transmitting active error frames, do they automatically transmit passive error frames? Can these nodes be referred to as error passive nodes?
This is the confusion that needs clearing up.
Yes, it will stop sending out so-called active error frames with dominant bit sequences and switch to recessive ones. Other nodes will not respond, but will increase their REC counters. Once the error frame is sent, bus arbitration restarts as usual, with the highest-priority frame winning.
Quoting an article from CAN-CiA:
Fault confinement
The CAN data link layers detect all communication errors with a very high probability. A node detecting an error condition sends an Error Flag and discards the currently transmitted frame. All nodes receiving an Error Flag discard the message, too. In case of local failures, all other nodes recognize the Error Frame sent by the node(s) that detected it and send an Error Frame themselves a second time, which results in an eventually overlapping Error Frame. The active Error Frame is made of six dominant bits and an 8-bit recessive delimiter followed by the IMF (intermission field). This local error globalization method guarantees network-wide data consistency, an important feature in distributed control systems.
If all errors are detected with a very high probability, permanent errors may lead to an unacceptable delay in transmitting messages. In the worst-case, all communication is aborted by means of Error Frames. In order to avoid this, the CAN protocol introduces two error counters: one for received messages (REC) and one for transmitted messages (TEC). They are increased and decreased according to the rules as specified in ISO 11898-1, the standard of the CAN data link layer protocols.
If one of the counters exceeds 127, the node transitions to the error passive state. In this state, the node transmits passive Error Flags made of six recessive bits. Such a flag is overwritten by the dominant bits of a transmitting node. This means that an error passive node can't inform the other nodes about an incorrectly received frame. This is a critical situation from the viewpoint of the system. If a transmitting node permanently produced Error Flags, this would also delay and in the worst case (a high-priority message) block the other communication. Therefore, the node is forced into the bus-off state if the TEC reaches 256. In the bus-off state, the node transmits only recessive bit-levels. Transitioning back to the error active state requires two conditions: a reset and the occurrence of 128 sequences of 11 recessive bit-times. This means that the remaining nodes are able to transmit 128 data frames before the node in bus-off recovers and integrates itself again into the network as an error active node.
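The counter thresholds can be summarized in a small illustrative model; real CAN controllers implement this in silicon, and the comparison points below follow ISO 11898-1:

/* Illustrative fault-confinement model only, not application code. */
typedef enum { ERROR_ACTIVE, ERROR_PASSIVE, BUS_OFF } can_error_state_t;

can_error_state_t can_error_state(unsigned tec, unsigned rec)
{
    if (tec >= 256)                 /* node must stop driving the bus        */
        return BUS_OFF;             /* only recessive levels are output      */
    if (tec > 127 || rec > 127)     /* either counter above 127              */
        return ERROR_PASSIVE;       /* error flags become six recessive bits */
    return ERROR_ACTIVE;            /* error flags are six dominant bits     */
}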

Apache Storm Join Pattern - At least once

I'm implementing a bolt in Storm that receives messages from a RabbitMQ spout (https://github.com/ppat/storm-rabbitmq).
Each event I have to process in Storm arrives as two messages from Rabbit, so I have a fieldsGrouping on the bolt so that both messages arrive at the same bolt instance.
In my first approach I would:
Receive the first tuple and save the message in memory
Ack the first tuple
When the second tuple arrived, fetch the first from memory and emit a new tuple anchored to the second tuple from the spout.
This worked, but I could lose messages if a worker died, because I would ack the first tuple before getting the second and processing it.
I changed this to:
Receive the first tuple and save it in memory
When the second tuple arrived, fetch the first from memory, emit a new tuple anchored to both input tuples, and ack both input tuples.
The in-memory cache is a Guava cache with time-based expiration, and when a tuple is evicted due to timeout I fail() it in the topology so that it gets reprocessed later.
This seemed to work, but when I ran some tests I got into a situation where the system would stop fetching messages from the Rabbit queue.
The prefetch on the queue is set to 5, and the spout has setMaxSpoutPending set to 7. In the RabbitMQ interface I see 5 unacked messages.
In the Storm logs I see the same tuples being evicted from the cache over and over again.
I understand that the problem is that the spout will only fetch 5 messages, which can all be the first parts of pairs. I can increase the prefetch, but that is no guarantee that this will not happen in production.
So my question is: how do I implement such a join in Storm while handling these problems?
Storm does not provide a good solution for this... What you would need is reliable storage that buffers the first tuple (i.e., a stateful operator). You could then ack the first tuple immediately and recover the state after a failure.
As far as I know, Trident supports some state handling, but I have never used it.
As a second alternative, you could use a distributed key-value store (like Cassandra) as the buffer. Of course, this would be a hand-written solution, i.e., you would need to code all the Cassandra interactions yourself.
Last but not least, you could switch to a stream processing system that does support stateful operators, like Apache Flink. (Disclaimer: I am a committer at Flink.)