Implementing an interval while generating flits in gem5

I'm trying to implement the following.
Consider sending packets from router 3 to router 45, with one condition: the interval between consecutive packets must be 20 cycles.
That is, after sending packet 1, wait 20 cycles, then send packet 2, and so on.
However, this delay does not affect a packet sent from router 3 to any router other than 45.
For example, a packet P1 is sent from 3 to 45. Then another packet, P2, is to be sent from 3 to 45, so it needs to wait for 20 cycles. In the meantime, a packet P3 is to be sent from 3 to 44. It is sent right away, and the next packet from 3 to 44 (P4) then needs to wait at least 20 cycles after P3 is sent.
The packets P2 and P4 should wait in a buffer before they are injected into router 3.
I'd appreciate suggestions on how to implement this. My thinking is that NetworkInterface.cc could hold these packets in some sort of buffer similar to the flit buffer, but I'm finding it difficult to take the idea further. Any ideas (even better, code) would be very helpful.
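
One way to think about the buffering logic is a queue per destination plus a record of the earliest cycle at which each destination may be served again. The sketch below is a language-neutral model written in Python, not gem5 code: the class `DestinationThrottle` and its methods `enqueue` and `wakeup` are hypothetical names, and it assumes something like the network interface calls `wakeup` once per cycle before injecting flits.

```python
from collections import defaultdict, deque


class DestinationThrottle:
    """Hypothetical model: release at most one packet per destination every `interval` cycles."""

    def __init__(self, interval=20):
        self.interval = interval
        self.queues = defaultdict(deque)  # destination -> packets waiting to be injected
        self.next_allowed = {}            # destination -> earliest cycle for the next send

    def enqueue(self, dest, packet):
        """Buffer a packet instead of injecting it directly."""
        self.queues[dest].append(packet)

    def wakeup(self, cycle):
        """Return the (dest, packet) pairs that may be injected in this cycle."""
        released = []
        for dest, queue in self.queues.items():
            if queue and cycle >= self.next_allowed.get(dest, 0):
                released.append((dest, queue.popleft()))
                self.next_allowed[dest] = cycle + self.interval
        return released


# Replay the example from the question: P1, P2 go to router 45; P3, P4 go to router 44.
throttle = DestinationThrottle(interval=20)
for dest, pkt in [(45, "P1"), (45, "P2"), (44, "P3"), (44, "P4")]:
    throttle.enqueue(dest, pkt)

log = [(cycle, dest, pkt) for cycle in range(41) for dest, pkt in throttle.wakeup(cycle)]
print(log)
```

In this model P1 and P3 leave in cycle 0, while P2 and P4 stay buffered until cycle 20, because each destination keeps its own timer. Carrying the idea over to the C++ network interface would mean keeping the same two maps as members and checking them wherever packets are turned into flits.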

Related

CANOpen network load higher than expected

I am working on a project with a master computer connected via a CANOpen network to 4 slaves.
At each time step, the computer receives a measurement message from each slave, and sends them a control message. In total, 4 messages are received and 4 messages are sent at each time sample.
The message sent is a PDO with 6 data bytes (8 bytes including COB-ID)
The message received is a PDO with 8 data bytes (10 bytes including COB-ID)
My CAN network is configured at 1 Mbit/s, and I run my program at 1000 Hz (1 ms sampling time). As the total load resulting from the messages described is 576 bits/cycle, the total load expected in the network is 576 kbit/s, or about 57% of the bus capacity.
What I see, however, is that:
The controlling computer measures a load of ~86% (with minima of 68% and peaks of 100%).
A USB CAN bus analyser I connect to the network registers message traffic (count-wise) that is around half of what I nominally expect (i.e., 4 sent, 4 received each cycle, for 50 seconds should result in 50k messages, while I only see 18-25k). Moreover, I receive 1-2 error messages per cycle from the slave devices saying that the network is overloaded. Before it is pointed out: even counting the size of these messages as part of the traffic wouldn't come close to explaining the anomaly in load.
What I'd like to know is whether my way of calculating the CANOpen network load is correct. For instance, are there any protocol-specific handshakes, CRCs, or any other extra bytes sent simply to make the network work? I could not find anything like that on the CANOpen wiki page, but I do know there are such additions to messages in the original CAN bus standard.
In a CAN message, there is more than the data to be transmitted.
There is also the arbitration ID (11 or 29 bits, depending on whether you use CAN 2.0A or 2.0B), a 15-bit CRC, a 7-bit EOF marker, the control field, and some other reserved bits.
Depending on the data, there may also be stuff bits.
Using CAN 2.0B and assuming 48 bits (6 bytes) of data, you get a message size of roughly 132 bits, and roughly 151 bits for your 64-bit (8-byte) messages.
Summing this up (4 × 132 + 4 × 151), you get roughly 1132 bits per cycle, which is too much for a 1 Mbit/s bus at 1000 Hz: only 1000 bit times are available in each 1 ms cycle.
Hope that helps.
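
The frame sizes above can be bounded with a small calculation. The sketch below is not part of the original answer. It assumes the commonly quoted classic CAN frame layout (44 overhead bits with an 11-bit identifier, 64 with a 29-bit identifier, plus a 3-bit interframe space) and the usual worst-case estimate of one stuff bit per four stuffable bits. The function names `can_frame_bits` and `bits_per_cycle` are made up for illustration.

```python
def can_frame_bits(data_bytes, extended=False, worst_case_stuffing=False):
    """Estimated bits on the wire for one classic CAN data frame (assumed layout)."""
    overhead = 64 if extended else 44   # SOF, arbitration, control, CRC, ACK, EOF
    stuffable = 54 if extended else 34  # non-data bits from SOF through CRC that can be stuffed
    bits = 8 * data_bytes + overhead
    if worst_case_stuffing:
        # Worst case: one stuff bit after every four bits of the stuffable region.
        bits += (stuffable + 8 * data_bytes - 1) // 4
    return bits + 3                     # 3-bit interframe space between frames


def bits_per_cycle(extended, worst_case_stuffing):
    """Traffic from the question: 4 frames with 6 data bytes and 4 frames with 8 data bytes."""
    return (4 * can_frame_bits(6, extended, worst_case_stuffing)
            + 4 * can_frame_bits(8, extended, worst_case_stuffing))


for extended in (False, True):
    low = bits_per_cycle(extended, worst_case_stuffing=False)
    high = bits_per_cycle(extended, worst_case_stuffing=True)
    label = "29-bit" if extended else "11-bit"
    print(f"{label} identifiers: {low} to {high} bits per 1 ms cycle")
```

Under these assumptions the estimate of roughly 1132 bits falls inside the 29-bit range (984 to 1200 bits). Even with 11-bit identifiers the traffic needs 824 to 1000 bits per cycle, that is about 82% to 100% of a 1 Mbit/s bus, well above the 57% obtained by counting data bytes alone.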

How to send/receive variable length protocol messages independently on the transmission layer

I'm writing a very specific application protocol to enable communication between 2 nodes. Node 1 is an embedded platform (a microcontroller), while node 2 is a common computer.
The protocol defines messages of variable length. This means that sometimes node 1 sends a message of 100 bytes to node 2, while another time it sends a message of 452 bytes.
The protocol shall be independent of how the messages are transmitted. For instance, the same message can be sent over USB, Bluetooth, etc.
Let's assume that a protocol message is defined as:
| Length (4 bytes) | ...Payload (variable length)... |
I'm struggling with how the receiver can tell how long the incoming message is. So far, I have thought about 2 approaches.
1st approach
The sender sends the length first (4 bytes, always fixed size), and the message afterwards.
For instance, the sender does something like this:
// assuming that the parameters of send() are: data, length of data
send(msg_length, 4)
send(msg, msg_length - 4)
While the receiver side does:
msg_length = receive(4)
msg = receive(msg_length - 4)  // msg_length includes the 4 length bytes, matching the sender above
This may be OK with some "physical protocols" (e.g. UART), but with more complex ones (e.g. USB), transmitting the length in a separate packet may introduce some overhead, because an additional USB packet (with its control data and ACK packets) has to be transmitted for only 4 bytes.
However, with this approach the receiver side is pretty simple.
2nd approach
The alternative would be that the receiver keeps receiving data into a buffer, and at some point tries to find a valid message. Valid means: finding the length of the message first, and then its payload.
Most likely this approach requires adding some "start message" byte(s) at the beginning of the message, such that the receiver can use them to identify where a message is starting.
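
The buffering idea can be sketched as a small reassembler that accepts bytes in whatever chunks the transport delivers and yields complete messages. This is only an illustration under stated assumptions: the length field is little-endian and counts the payload only, the link is a reliable ordered byte stream (so no start marker or checksum is handled), and the names `Framer`, `feed`, and `encode` are hypothetical.

```python
import struct

# Assumed header: 4-byte little-endian length that counts the payload only.
HEADER = struct.Struct("<I")


def encode(payload):
    """Build one message so the length and payload go out in a single send."""
    return HEADER.pack(len(payload)) + payload


class Framer:
    """Accumulates received bytes and extracts complete messages."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk):
        """Append a received chunk; return the payloads completed so far."""
        self.buffer += chunk
        messages = []
        while len(self.buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self.buffer)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # payload not fully received yet; wait for more data
            messages.append(bytes(self.buffer[HEADER.size:end]))
            del self.buffer[:end]
        return messages


# Two messages (100 and 452 bytes) arriving in arbitrary 64-byte chunks.
stream = encode(b"a" * 100) + encode(b"b" * 452)
framer = Framer()
received = []
for start in range(0, len(stream), 64):
    received += framer.feed(stream[start:start + 64])
print([len(m) for m in received])
```

Because the sender passes header and payload to the transport together, no extra packet is spent on the 4 length bytes, and the receiver does not depend on how the transport splits the data. On a link that can drop or corrupt bytes, a start marker or checksum as described above would still be needed to resynchronise.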

How does TensorFlow sync tensors which share a buffer between different step-ids

I have a problem in my contrib implementation for distributed TensorFlow. Not to bother you with non-relevant details, the solution applies a certain message protocol in order to utilize RDMA writes directly from/to source/destination tensors, to save memory copies on the CPU.
Let's say I have 2 sides, A and B, and A wants to receive a tensor from B.
The protocol is as follows:
A sends a REQUEST message to B.
B looks up the tensor locally (BaseRendezvousMgr::RecvLocalAsync) and sends a META-DATA response to A.
A uses the metadata to allocate the destination tensor, and sends an ACK to B containing the destination address.
B receives the ACK, and performs a remote DMA write to the destination address.
Between the REQUEST and the ACK, B keeps the local tensor alive (and Ref() > 0), by saving it in a local map (the REQUEST copies the tensor to the local map, the ACK pops it from the map).
To validate my solution, I added a checksum calculation at each step. Occasionally, I see that the checksum changes between the REQUEST and the ACK. This happens when I run PS with two workers:
Line 1 is REQUEST to worker 0.
Line 2 is ACK to worker 0.
Line 3 is REQUEST to worker 1.
Line 4 is ACK to worker 1.
The last value on each line is the checksum. The error happens about 50% of the time. I always see it on line 4.
I also saw that the problematic tensor has a shared buffer for all step-ids (this is a given. I can't control it). So it is very likely that some other thread changed the tensor's content between lines 3 and 4, which is something I want to prevent.
So the question is how? What prevented the content from changing between lines 1 and 2, and 2 and 3? To emphasize, the time elapsed between lines 3 and 4 is less than 0.04 seconds, while the time elapsed between 2 and 3 is almost 2.5 seconds.
Thanks for your help. Code will be posted if required.
Are you using tf.Variable for the shared buffer? If so, constructing it with tfe.Variable (to enable reasonable read-write semantics) or tf.get_variable(..., use_resource=True) will make any synchronization issues go away.
Otherwise this is hard to understand without knowing more about the generating graph.

Recvfrom() return value

I'm using UDP packets and I'd like to clear up a few points:
1 - What exactly does "recvfrom" return? I mean, if I send a packet with 450 bytes of payload + 20 bytes of IP header + 8 bytes of UDP header, does recvfrom return 478 bytes as a whole, or could it be something like:
it received 10 bytes, 300 bytes, 100 bytes, 68 bytes?
2 - Is the return value of "recvfrom" related to packet fragmentation?
Note:
* I'm assuming that "recvfrom" was successful
* I chose 450 bytes to be sure that I'm below the minimum MTU
For a UDP socket, recvfrom() reads the UDP data, so it returns 450, provided you supply a buffer that is at least 450 bytes big.
If you supply a buffer that is smaller than the received data, the data will be truncated, and recvfrom() will read as much data as fits in the buffer you give it.
The IP layer is the part that fragments a UDP packet; the receiving host reassembles it. This is transparent to the sending and receiving applications.
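
A quick loopback experiment illustrates the first point. The sketch below uses Python's standard `socket` module, whose `recvfrom` wraps the system call and returns the payload bytes together with the sender's address. The helper name `udp_roundtrip` is made up, and the example assumes the loopback interface is available.

```python
import socket


def udp_roundtrip(payload, bufsize):
    """Send one datagram over loopback and return what a single recvfrom() call reads."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # avoid blocking forever if the datagram is lost
        sender.sendto(payload, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(bufsize)
        return data
    finally:
        sender.close()
        receiver.close()


data = udp_roundtrip(b"x" * 450, bufsize=2048)
print(len(data))  # 450: the UDP payload only, delivered in one piece
```

The IP and UDP headers are never part of the returned data, and one call returns one whole datagram rather than a sequence of partial reads.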

Go back N protocol

I'm trying to implement the Go back N protocol on two separate client and server applications. Say my sequence numbers must fit 3 bits, so 2^3 = 8 max numbers, and 2^3 - 1 = 7 window size.
I initially send my whole window. The first two packets (0 and 1) are received correctly. Packet 2 is dropped. When the Receiver gets packets 3 through 6, it was expecting 2, so it must nack the packet it got saying it wants 2.
Sender    Receiver
0         0
1         1
2         (packet dropped)
3         nack2
4         nack2
5         nack2
6         nack2
When the Sender receives the first nack2, it understands that 0 and 1 have been received (through piggybacking) and moves its window forward, but it must also resend its window starting at sequence number 2 (so 2-3-4-5-6 and possibly 7-0). By the time the Sender receives the second nack2, it has already sent those packets. Because of the protocol, the Sender will again resend its entire window, including 2. Now the Receiver will possibly receive 2 (and the others), but in the second nack2 batch it will re-receive 2, which is out of sequence, so it will have to nack its expected packet, and so on. Am I correct in all these assumptions?
If I am, it seems to me that Go Back N is sending a lot more packets than Stop and Wait, especially the more you increase its window size. What am I not getting?
The solution I found to this problem was to simply use more bits to represent the sequence number and therefore have a larger MAX. If your MAX is 2 * Window size, then a delayed 2 cannot be misinterpreted as a proper ACK.
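
The effect of a larger sequence space can be illustrated with the window test a sender might apply to an incoming ack or nack number. This is a minimal sketch, not a full Go-Back-N implementation: the helper `in_window` is hypothetical, and the scenario assumes a nack2 that is delayed until the sender has moved on by seven packets.

```python
def in_window(seq, base, window, modulus):
    """True if `seq` is one of the `window` sequence numbers starting at `base`, modulo `modulus`."""
    return (seq - base) % modulus < window


WINDOW = 7
# The stale nack2 was generated when the window started at packet 2.
# Assume the sender has since advanced by seven packets, to absolute packet number 9.

# 3-bit sequence numbers: packet 9 is numbered 9 % 8 == 1, so the window covers 1..7.
stale_accepted_3bit = in_window(2, base=9 % 8, window=WINDOW, modulus=8)

# 4-bit sequence numbers: packet 9 is numbered 9, so the window covers 9..15.
stale_accepted_4bit = in_window(2, base=9 % 16, window=WINDOW, modulus=16)

print(stale_accepted_3bit, stale_accepted_4bit)
```

With 8 sequence numbers the stale 2 lands inside the current window and looks like a fresh request, while with 16 it falls outside and can be discarded.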