How do PCIe error messages find their root port?

Simple question, but I cannot find the answer in the spec or in the MindShare book. MSI has its capability structure that tells the device where to send its interrupt messages. Is there a similar register telling the device where to send error messages? Is there some dedicated address/BDF with a "pass upstream" meaning? Or does the device always use all zeroes for the destination, and the root port picks the message up on the way?
Thanks

Section 2.2.4 of the PCIe spec (rev 4.0) describes address- and ID-based routing. Memory reads and writes are routed based on the memory address. Completions are routed by ID.
Section 2.2.8 describes the routing of messages, including error messages. In particular, see Table 2-17 on page 105, section 2.2.8.3, and Table 2-21. Table 2-21 shows that the Routing field for Error Signaling Messages is always 0, indicating that they are routed toward the Root Complex.
The MSI capabilities don't have any special information about PCIe routing. An MSI is performed using a memory write transaction, so it is routed by address.
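To make the Table 2-21 point concrete, here is a small sketch of decoding the routing subfield from the first byte of a TLP header. The field layout (Fmt in bits 7:5, Type in bits 4:0, with Msg TLPs encoded as Type = 0b10rrr) is from the spec; the helper name is mine:

```python
def tlp_msg_routing(byte0: int) -> int:
    """Extract the routing subfield (Type[2:0]) from the first TLP header
    byte. Only valid for Msg/MsgD TLPs, whose Type field is 0b10rrr."""
    typ = byte0 & 0x1F            # Type is the low 5 bits
    assert (typ >> 3) == 0b10, "not a Msg TLP"
    return typ & 0x7              # 0b000 means "routed to Root Complex"

# First header byte of an error message: Fmt=001 (4DW, no data), Type=10000.
# The routing subfield comes back 0, i.e. implicitly routed to the Root Complex.
assert tlp_msg_routing(0x30) == 0
```

So an error message carries no destination address or ID at all; every switch simply forwards it toward its upstream port until it reaches the root.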

Related

Instance Mapping - nServiceBus

We can use "Instance Mapping" to route messages to the same endpoint hosted on multiple physical servers. What would be the impact if one of the physical machines goes down for any reason? Will the NServiceBus framework start routing messages to the remaining active "physical" machine?
Regards
You're talking about MSMQ, which uses store-and-forward: a message is first stored locally on the sending server before it's forwarded to the destination machine.
There are two options to scale out:
Distributor
Sender Side Distribution
From your question I assume you chose Sender Side Distribution, using endpoint instance mapping. That same document has a section about the limitations, which mentions:
Sender-side distribution does not use message processing confirmations (the distributor approach). Therefore the sender has no feedback on the availability of workers and, by default, sends the messages in a round-robin behavior. Should one of the nodes stop processing, the messages will pile up in its input queue. As such, nodes running in sender-side distribution mode require more careful monitoring compared to distributor workers.
So the messages keep being sent to the machine that is down. If it is entirely unreachable, the messages will remain in the Outgoing queue on the sender machine. Otherwise they'll be stored on the incoming queue on the processing machine.
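For reference, sender-side distribution is driven by an instance-mapping file along these lines (the endpoint and machine names here are placeholders, not taken from your setup):

```xml
<endpoints>
  <!-- Each message sent to "Sales" is round-robined across these instances -->
  <endpoint name="Sales">
    <instance machine="MachineA"/>
    <instance machine="MachineB"/>
  </endpoint>
</endpoints>
```

Note that nothing in this file reacts to a machine going down; the round-robin keeps including it, which is exactly the limitation quoted above.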

What's the difference between MQ (RabbitMQ, ActiveMQ...) and network libraries (ACE, Asio, libevent...)?

Currently, we plan to upgrade our product to use an MQ (RabbitMQ or ActiveMQ) for message transfer between server and client. Right now we use a network library (evpp) for this.
Because I haven't used an MQ before, beyond the long list of new features I can't figure out the essential difference between the two, and I don't know exactly when and where we should use an MQ versus when a plain network library is fine.
Our purpose in wanting an MQ is to address the unreliability of communication, such as message loss and other problems caused by an unstable network environment.
I hope someone familiar with both can clear up my confusion. Thanks in advance.
Message queuing systems (MQ, Qpid, RabbitMQ, Kafka, etc.) are higher-layer systems purpose-built for handling messages reliably and flexibly.
Network programming libraries/frameworks (ACE, asio, etc.) are helpful tools for building message queueing (and many other types of) systems.
Note that in the case of ACE, which encompasses much more than just networking, you can use a message queuing system like the above and drive it with a program that also uses ACE's classes for thread management, OS abstraction, event handling, etc.
As in any network programming, when a client sends a request to the server, the server responds with a response. But for this to happen, the following conditions must be met:
The server must be UP and running
The client should be able to make some sort of connection between them
The connection should not break while the server is sending the response to the client or vice-versa
But with a message queue, whatever the server wants to tell the client is placed in a message queue, i.e., a separate server/instance. The client listens to the message queue and processes the message; on a positive acknowledgement from the client, the message is removed from the queue. Obviously a connection has to be made by the server to push a message to the message-queue instance, but even if the client is down, the message stays in the queue.
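The store/forward/acknowledge cycle described above can be sketched with an in-process queue standing in for the broker (a toy illustration of the semantics, not of any particular MQ product's API):

```python
import queue

# The "broker": messages stored here survive until a consumer acknowledges them.
broker = queue.Queue()

def publish(msg):
    broker.put(msg)           # store: succeeds even with no consumer connected

def consume():
    msg = broker.get()        # forward: delivered when a consumer asks for it
    try:
        result = msg.upper()  # "process" the message
    except Exception:
        broker.put(msg)       # no ack: requeue the message for redelivery
        raise
    broker.task_done()        # positive acknowledgement removes it for good
    return result

publish("hello")              # works even though nothing is listening yet
assert consume() == "HELLO"
```

With a plain network library, the `publish` call would fail outright whenever the peer is unreachable; the broker in the middle is what buys you the decoupling.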

Archiving Windows Server Service Bus messages

The MSDN documentation for the BrokeredMessage.Complete method (http://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.brokeredmessage.complete.aspx) describes the method as this: "Completes the receive operation of a message and indicates that the message should be marked as processed and deleted or archived."
In my use of this method I've only ever seen the message deleted once it is processed. This is the one and only mention I've seen in the MSDN documentation, blogs, or anywhere else of Service Bus being capable of archiving old messages.
I could archive the message myself as part of my code that reads and processes a message and then marks it complete. But is it possible to make Windows Server Service Bus archive completed messages for me? If so, how do you turn on and configure this feature?
In case the difference matters, I am using the locally hosted Windows Server Service bus, not the Azure version.
No, Service Bus doesn't archive your messages. I'm going to follow up w/ the documentation folks on what that was supposed to express.

ARP Timeouts. Why fixed periodic?

This one's been bugging me for years.
Basic question: Is there some reason ARP has to be implemented with fixed timeouts on ARP cache entries?
I do a lot of work in real-time circles. We do most of our inter-system communications these days over dedicated UDP/IP links. This mostly works reliably in real time, but for one nit: ARP entry timeouts.
The way typical implementations do ARP is the following:
When a client asks to send an IP packet to an IP address with an unknown MAC address, instead of sending that IP packet the stack sends out an ARP request. If an upper layer (TCP) resends, that's no problem. But since we use UDP, the original message is lost. At startup time this is OK, but in the middle of operation it is a Bad Thing™.
(Dynamic) ARP table entries are removed from the ARP table periodically, even if we just got a packet from that system a millisecond ago. This means the Bad Thing™ happens to our system regularly.
The obvious solution (which we use religiously) is to make all the ARP entries static. However, that's a royal PITA (particularly on RTOSes, where finding an interface's MAC address is not always a matter of a couple of easy GUI clicks).
Back when we wrote our own IP stack, I solved this problem by never (ever) timing out ARP table entries. That has obvious drawbacks. A more robust and perfectly reasonable solution might be to refresh the entry timeout whenever a packet from the same MAC/IP combo is seen. That way an entry would only get timed-out if it hadn't communicated with the stack in that amount of time.
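The refresh-on-traffic policy described above could be sketched like this (names and structure are illustrative, not any particular stack's internals):

```python
import time

class ArpCache:
    """Sketch of a refresh-on-traffic ARP cache: an entry only expires if
    no packet from that IP/MAC pair has been seen for `timeout` seconds."""

    def __init__(self, timeout=60.0):
        self.timeout = timeout
        self.entries = {}          # ip -> (mac, last_seen)

    def packet_seen(self, ip, mac):
        # Any received packet refreshes the entry, resetting its clock.
        self.entries[ip] = (mac, time.monotonic())

    def lookup(self, ip):
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, last_seen = entry
        if time.monotonic() - last_seen > self.timeout:
            del self.entries[ip]   # expired: fall back to a real ARP request
            return None
        return mac
```

An actively chattering peer never gets evicted, while a silent one still ages out, which preserves the invalidation property the RFC asks for.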
But now we're using our vendor's IP stack, and we're back to the stupid ARP timeouts. We have enough leverage with this vendor that I could perhaps get them to use a less inconvenient scheme. However, the universality of this brain-dead timeout algorithm leads me to believe it might be a required part of the implementation.
So that's the question. Is this behavior somehow required?
RFC1122 Requirements for Internet Hosts discusses this.
2.3.2.1 ARP Cache Validation
An implementation of the Address Resolution Protocol (ARP)
[LINK:2] MUST provide a mechanism to flush out-of-date cache
entries. If this mechanism involves a timeout, it SHOULD be
possible to configure the timeout value.
...
DISCUSSION:
The ARP specification [LINK:2] suggests but does not
require a timeout mechanism to invalidate cache entries
when hosts change their Ethernet addresses. The
prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
has significantly increased the likelihood that cache
entries in hosts will become invalid, and therefore
some ARP-cache invalidation mechanism is now required
for hosts. Even in the absence of proxy ARP, a long-
period cache timeout is useful in order to
automatically correct any bad ARP data that might have
been cached.
Networks can be very dynamic; DHCP servers can assign the same IP address to different computers when old lease times expire (making current ARP data invalid), there can be IP conflicts that will never be noticed unless ARP requests are periodically made, etc.
It also provides a mechanism for checking if a host is still on the network. Imagine you're streaming a video over UDP to some IP address 192.168.0.5. If you cache the MAC address of that machine forever, you'll just keep spamming out UDP packets even if the host goes down. Doing an ARP request every now and then will stop the stream with a destination unreachable error because no one responded with a MAC for that IP.
It originated in distrust of routing protocols, especially in the non-Ethernet world (particularly MIT's CHAOS networks). Chris Moon, one of the early "ARPAnauts", was quoted specifically about this in the original ARP RFC.
You can, of course, keep the other guys' ARP caches from timing out by proactively broadcasting your own ARP announcements. Most Ethernet layers will accept gratuitous ARP responses into their caches without trying to correlate them to ARP requests they have previously sent.
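For reference, a gratuitous ARP frame is small enough to build by hand. Here is a stdlib-only sketch (the MAC and IP are placeholders; actually transmitting the frame needs a raw socket and root privileges, which is omitted):

```python
import socket
import struct

def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP reply: sender and target IP are both our own,
    broadcast to the whole segment so everyone refreshes their cache."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    bcast = b"\xff" * 6
    # Ethernet header: dst broadcast, src our MAC, EtherType 0x0806 (ARP)
    eth = bcast + hw + struct.pack("!H", 0x0806)
    # ARP body: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4,
    # oper=2 (reply), then sender hw/proto and target hw/proto addresses.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2) + hw + addr + bcast + addr
    return eth + arp

frame = gratuitous_arp("02:00:00:00:00:01", "192.168.0.5")
assert len(frame) == 42   # 14-byte Ethernet header + 28-byte ARP body
```

Broadcasting one of these periodically (well inside the peers' cache timeout) keeps their entries fresh without touching their stack configuration.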

upnp, device presentation address

I am programming something with UPnP and I would like to clear something up. :)
When I search with a multicast M-SEARCH message, devices must respond with something like this (it is not the complete message, I shortened it):
HTTP/1.1 200 OK
CACHE-CONTROL: max-age = seconds until advertisement expires
DATE: when response was generated
EXT:
LOCATION: URL for UPnP description for root device
So, LOCATION is some IP and port from which I can get the description. Now I would like to search for a specific device with unicast. My question is: can I use this address when searching for this device with a UNICAST search message, or do devices listen on another address for unicast M-SEARCH messages? :) I have read the UPnP device architecture PDF.
Yes, you can and must continue with unicast. LOCATION is not only "some IP and port" but the full resource location of the device description. You can't multicast to a single address :) There is no more "searching for this device" to do: you simply ask for that URL with HTTP GET, and you will get back some more URLs for the device's specific services.
If you are "programming something about upnp" and have this kind of fundamental question, I recommend downloading the UPnP specifications bundle and reading the document UPnP-arch-DeviceArchitecture. It describes, in understandable steps, how the searching and querying phases of UPnP work.
Update upon OP's clarification:
Rediscovery of a device happens as unicast request to the IP known from original response (to multicast SEARCH) and either the standard port 1900 or a specific port, if the device announced itself via NOTIFY multicast message with a SEARCHPORT.UPNP.ORG value. So if the device needs rediscovery and didn't announce itself, the port defaults to 1900. See also the next page 32 in UPnP device architecture, description of HOST header field.
Note, however, that such rediscovery should rarely, if ever, be necessary. UPnP devices are expected to announce themselves upon connecting to the network with a NOTIFY multicast packet of type ssdp:alive, and type ssdp:byebye upon disconnecting. Furthermore, most devices have evented variables to which control points should subscribe automatically, renewing their subscription within a fixed lease time (by default 30 minutes). So an abrupt disappearance of a device will be discovered anyway (by a failed subscription renewal).
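If you do implement the unicast rediscovery described above, the request is the same M-SEARCH text, just addressed to the device itself. A stdlib-only sketch (the address is a placeholder; per the device architecture, HOST carries the device's own address rather than the SSDP multicast address, and MX is omitted for unicast searches):

```python
import socket

def unicast_msearch(host: str, port: int = 1900, st: str = "ssdp:all") -> bytes:
    """Build a unicast M-SEARCH request aimed at one known device."""
    return (
        "M-SEARCH * HTTP/1.1\r\n"
        f"HOST: {host}:{port}\r\n"
        'MAN: "ssdp:discover"\r\n'   # required, including the quotes
        f"ST: {st}\r\n"
        "\r\n"
    ).encode("ascii")

# Sending it is a plain UDP datagram to the device's address:
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(unicast_msearch("192.168.0.10"), ("192.168.0.10", 1900))
```

The device answers with the same HTTP/1.1 200 OK response shown in the question, LOCATION header included.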