I'm learning how to use tf.records and in the official tutorial they mention you can print a tf.train.Example message (which is a primitive of the protobuf protocol if I get it right).
I understand that tf.records are used to serialize the data, and that they use the protobuf protocol in this case. I also understand that using tf.train.Feature, tf.train.Features and tf.train.Example one can convert the data into the right format.
My question is what does it mean to print a messege in this context? (the tutorial shows how to print an tf.train.Example message)
A message is classically thought of as a collection of bytes that are conveyed from one process/thread to another process/thread. Typically (but not necessarily), the collection of bytes means something to the sender and receiver, e.g. it's an object that has been serialised somehow (perhaps using Google Protocol Buffers). So, an object can become a message by serialising it and placing the bytes into an array that one might term a "message".
It's not necessarily the case the processes handling the collection of bytes will deserialise them. For example, a process that is simply going to pass them onwards down another connection need not actually deserialise them, if it already knows where the bytes are supposed to be sent.
The means by which a message is conveyed is typically some sort of queue / pipe / socket / stream / etc. Where it gets interesting is that most data transports of this sort are stream connections; whatever bytes you push in one end comes out the other. So, then, how to use those for sending messages?
The answer is that there has to be some way of demarcating between messages. There's lots of ways of doing that, but these days it makes far more sense to use something like ZeroMQ, which takes care of all that for you (and more besides). ZeroMQ is a library / protocol that allows a program to transfer a collection of bytes from one process/thread to another via stream connections, and ensure that the receiving program gets the collection in one nice and complete buffer. Those bytes could be objects serialised by Google Protocol Buffer, or serialised in some other way (there's lots). HTTP is also used as a way of moving objects around, e.g. a page of HTML.
So the pattern is object -> serialisation -> message buffer -> some sort of byte transport that demarcates one message from another -> message buffer -> deserialisation -> object.
An advantage of serialisations like Protocol Buffers is that the sender and receiver need not be written in the same language, or share anything at all except for the .proto file. Other approaches to serialisation often involves marking up class definitions in the program source code, which then makes it difficult to deserialise data in another language.
Also in languages like C/C++ one might get away with simply copying the bytes at the object's address from one place to another. This can be a total disaster if the destination is a different machine; endianness etc. can matter a lot. There are serialisation standards that get close to this, specifically Cap'n Proto (see this).
There are variations. Within a process, "passing a message" can simply mean passing ownership of an object around. Ownership can be by convention, i.e. if I've just written the object pointer to a message queue, I won't mutate the object anymore. I think in Rust it's even expressed by the language syntax, in that once object ownership has been given up the language won't let you mutate the object (worked out at compile time, part of what makes Rust so good). The net result looks like message transfer, but in fact all that's happened is a pointer (typically, 64bits) has been copied from A to B, not the entire data in the object. This is a lot faster.
EDIT
So, How Does a Message Transport Protocol Work?
It's worth digging into how something like ZeroMQ works. For it to be able to pass whole application messages across a stream connection, it needs operate some sort of protocol. That protocol is itself going to involve objects (Protocol Data Units) being "serialised" (well, converted to an agreed wire format), pushed through the stream connection, deserialised, and understood by the ZeroMQ library that's on the receiving end. And, when gets on down to it, ZeroMQ is using TCP (over a network), and that too is a protocol built on IP. And that goes on down to Ethernet frames.
So, there's protocols running atop protocols, running atop other protocols (in fact, this is the Layer Model of how computer interconnectedness works).
Why That Matters, and What Can Go Wrong
It's useful to bearing this protocol layering in mind. Sometimes, one might have a requirement to (for example), take very strong measures against buffer overflows, perhaps to prevent remote exploitation. That might be a reason to pick a serialisation technology that helps guard against such things - e.g. Protocol Buffers. However, when picking such a technology, one has to realise that the requirement is met provided that all of the protocol layerings are equally robust. There's no point using, say, Protocol Buffers and declaring oneself safe against buffer overflows, if the OS's IP stack is broken and exploitable.
This is well illustrated by the Heartbleed bug in OpenSSL (see here). This was caused effectively by a weakly specified protocol (see RFC6520); it's defined in English language, and requires the programmer to read this, code up the protocol by hand, and pay attention to all the strictures written in the document. The associated RFC5426 even says:
This document deals with the formatting of data in an external
representation. The following very basic and somewhat casually
defined presentation syntax will be used. The syntax draws from
several sources in its structure. Although it resembles the
programming language "C" in its syntax and XDR [XDR] in both its
syntax and intent, it would be risky to draw too many parallels. The
purpose of this presentation language is to document TLS only; it has
no general application beyond that particular goal.
The Heartbleed bug in OpenSSL was a result of the coding up of the English language spec being done wrong, and given that highlighted statement perhaps it's no great surprise. Applications that were using OpenSSL were wide, wide open to exploitation, even thought the applications themselves (e.g. Web servers) were very well written implementations of, say, HTTPS.
Now, had the designers of TLS chosen to use a decent and strict serialisation technology - perhaps even Google Protocol Buffers (plus some message demarcation) - to define the PDUs in TLS, it would have been far more likely that Heartbleed wouldn't have happened. Specifically, the payload_length field in a request / response would have been taken care of inside Google Protocol Buffers, thereby removing responsibility for handling the length of the payload from the developer.
What's interesting is to compare protocol specifications as written in RFCs with those that tend to be found in the world of telephony (regulated by the International Telephony Union). The ITU's specifications and tools are very "comprehensive" (that ought to be an acceptably neutral way of describing them). A lot of telephony uses ASN.1, which is not disimilar to (and substantially pre-dates) Google Protocol Buffers, but allows for very strict definitions of messages, requires pretty comprehensive tools to do it right, but is bang up to date (it even has JSON as a wire format these days).
"But", one points out, "what if the ASN.1 tools (or Google Protocol Buffers) has a bug?". Well indeed that is a problem, and that has indeed happened to ASN.1 (from one of the commercial ASN.1 tools vendors, can't rememeber which). But the point is that if there's one library that is widely used for defining lots of interfaces, then there's a greater chance of bugs being identified (I myself have found and reported bugs in commercial ASN.1 tools). Whereas if a messaging protocol is defined using, say, English language, there's only ever going to be a very few set of eyes on how well the developer has coded up the meaning of that English language.
Not Everyone Has Got the Message
What I find disappointing is that, across a large portion of the software world, there's still resistance to using tools like Google Protocol Buffers, ASN.1. There's also projects that, having identified the need for such things, go and invent their own.
One such example is dBus - which to be fair is pretty good. However they did go an invent their own serialisation technology for specifying dBus messages; I'm not sure what they gained over using something mature and off-the-shelf.
Google themselves, when they first announced Google Protocol Buffers to the world, were asked "Why didn't you use ASN.1?", and the Googler on the stage had to admit to never having heard of it. So, Googlers in Google hadn't used Google to Google for "binary serialisation technologies"; they'd just gone ahead and wrote their own, and GPB is missing a ton of useful features. Oh, the irony. They'd not even have had to write a toolset from scratch; they could have simply adopted and improved on one of the open source ASN.1 implementations.
Transliteration Problem
This fragmentation and proliferation causes problems. Say, for example, in your project you want to be able to transfer some of your messages into a dBus service on Linux. To do that, you've got a .proto defining your messages, which is great for communicating in/out of Tensor Flow, but fundamentally useless for dBus, which speaks its own format. You'd end up having something like
MyProtoMsg ipMsg;
MyEquivalentDBusMsg opMsg;
opMsg.field1 = ipMsg.field1;
opMsg.field2 = ipMsg.field2;
opMsg.field3 = ipMsg.field3;
and so on. Very laborious, very unmaintainable, and needlessly consumes resources. The other option would be simply to wrap up your GPB encoded messages in a byte array in a dBus message, but one feels that's missing the point (it bypasses any opportunity for dBus to assert that messages it's passing are correctly formed and within specifications).
If the world agreed on the One True Serialisation technology then the flexibility in object / message exchange would be fantastic.
If the question doesn't belong on stackoverflow, sorry for the noise. I couldn't find a better suiting site within stackexchange.
Question
There are definitions of:
Selective Forwarding Middlebox (abbreviated SFM) defined in RFC 7667 Section 3.7
Selective Forwarding Unit (abbreviated SFU) defined in WebRTC Glossary
What is the difference of these things? Are they essentially the same?
They are the same. The usage of SFU in a WebRTC context predates RFC 7667 and is hence a much more commonly used term (ironically the RFC itself still mentions 'SFU' in one place without defining the term).
See also this commit which does a simple replace of SFU with SFM.
I'd like to integrate a PLC with a computer. Set outputs and read inputs. I've looked at Modbus and its simple although if I want to act on the change in a input I would need to poll the input to detect the change. Are there any open and common protocols used by PLC's that would push/update on sensor/input change rather than requiring polling?
OPC UA (Unified Architecture) is an open protocol standard implemented on many PLCs with many PC client implementations available. It supports both "subscription" and "event" mechanisms, in addition to polling and other communication services.
Open and common, and also simple to implement, I don't think there are.
You should look for terms like "report by exception" and "unsolicited reporting". DNP3 for example has this feature, it's widely used in electrical applications, but it is not simple to implement, nor is it open.
Depending on your controller, maybe you can look at MQQT, there is support for Arduinos and RPi's, and also industrial controllers like WISE-5231
The two previous answer's are decent. As Nelson mentioned, you haven't specified which controller you are using. You also haven't mentioned what on the computer you'd like to integrate with the PLC. Beckhoff's TwinCAT PLCs can use MQTT, OPC-UA as well as a host of other protocols. They also offer libraries to use their ADS protocol.
As part of ADS, you can either set up an ADS server on your machine (it's very easy) and have your PLC's write to the server. The more typical way is to subscribe to variables/structure in the PLC using this ADS mechanism from within your program's runtime. An event will be fired when the variable struct changes (you can specify how much it should have changed by, if an analog value).
The method you pick is probably dictated by your architecture. If you have many PLCs, I would set up an ADS server in your computer, if you have a handful, subscribe from your program. Of course, you can mix and match these approaches too.
Here is a page of examples: https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_adssamples_net/html/tcsample_net_intro.htm&id=8269274592628480035
Is there a "smart" UDP protocol analyzer that can help me reverse engineer a message based protocol?
I'm using Wireshark to do the sniffing, but if there's a tool that can detect regularities in the protocol (repeated strings, bits of the protocol that are CRC/Checksum or length, ...) and aid the process that would help.
You are asking for a universal inference engine. The best way to try to recover the protocol (assuming you are in a jurisdiction that permits this) is to understand the underlying message transfer from the beginning of a session, and then trying to manually simulate the behaviour of each party through a sequence of ping-pong message trials. This way you develop an understanding of the message structures and their functioning.
Using the UDP frame boundaries is a good place to start looking for structure.
If you have no documentation, you will find that even if you gain a good understanding of the protocol, expect to be surprised many times during the project.
If you can, have your existing systems carry out exactly the scenario you need to use, and then simply replicate the same sequence with payload (and any checksum) changes only. This way you can possibly achieve the requirement without a comprehensive understanding of the protocol.
For an example of the effort in doing this you could look at a historical review of the Samba project at A bit of history and a bit of fun.
Is there any module/definition available for a class/schema for representing the topology, connection, access details etc of networking devices ? The intent is to use this for automation, and to manage routers/servers as objects rather than as tcl keyed lists/arrays which gets unwieldy.
Look at SNMP (Simple Network Management Protocol). Most network devices and services, from IIS to Cisco routers, provide some sort of SNMP interface that may provide the capabilities for which you are searching. Specific implementations and capabilities vary between vendors and devices, but the protocol is standardized and very widely implemented.
The word topology in the context of communication nework refers to the way in which how devices are connectd over a network. Its important types are
BUS
RING
STAR
etc
Look into MIB2 (SNMP based). You should note there exists 10's of different MIBs to representing various networking technologies / solutions. You can even devise your private MIB to suit your needs.
You should refer to relevant IETF drafts explaining the nomenclature used in MIBs (when I find the reference, I'll post it).
I could also suggest you perform searches on keywords such as "OSS", "Network Management", "NMS".