When two peers use WebRTC with TURN as a relay server, we've noticed that from time to time the data inside a Send Indication or ChannelData message is actually a valid STUN Binding Request (type 0x0001). The other peer responds in the same way with a valid Binding Response (type 0x0101). This happens repeatedly throughout the conversation. Both peers are forced to use the TURN server. What is the purpose of encapsulating an ordinary STUN message inside the data attribute of a TURN frame? Is it described in any document?
Here is an example of a ChannelData frame:
[0x40,0x00,0x00,0x70,0x00,0x01,0x00,0x5c,0x21,0x12,0xa4,0x42,0x71,0x75,0x6d,0x6a,0x6f,0x66,0x69,0x6f...]
0x40,0x00 - channel number
0x00,0x70 - length of data
0x00,0x01,0x00,0x5c,0x21,0x12... - data, which can be parsed as a Binding Request
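The frame layout above can be decoded with a few lines of code. This is a minimal sketch (function and field names are illustrative, not from any particular library) that splits a ChannelData frame per RFC 5766 and checks whether the payload starts like a STUN Binding Request:

```javascript
// Parse a TURN ChannelData frame: 2-byte channel number, 2-byte payload
// length, then the payload. A STUN Binding Request payload starts with
// message type 0x0001, a 2-byte length, and the magic cookie 0x2112A442.
function parseChannelData(bytes) {
  const channelNumber = (bytes[0] << 8) | bytes[1]; // 0x4000-0x7FFF
  const length = (bytes[2] << 8) | bytes[3];        // payload length in bytes
  const payload = bytes.slice(4, 4 + length);
  const isBindingRequest =
    payload[0] === 0x00 && payload[1] === 0x01 &&   // type 0x0001
    payload[4] === 0x21 && payload[5] === 0x12 &&   // magic cookie
    payload[6] === 0xa4 && payload[7] === 0x42;
  return { channelNumber, length, isBindingRequest };
}
```

Feeding it the example bytes above yields channel number 0x4000, length 0x70 (112), and a positive Binding Request match.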
These are ICE connectivity checks (described in RFC 5245 and the newer RFC 8445) running via TURN, as well as the consent freshness checks described in RFC 7675.
I have gone through RFC 5389 and RFC 5245 and the newer RFC 8445.
I understand how STUN works in returning the Server Reflexive Address or Relayed Address. The request is sent to the STUN server.
My fundamental question is about ICE connectivity check using STUN. RFC 8445 states on Page 10:
"...At the end of
this process, each ICE agent has a complete list of both its
candidates and its peer's candidates. It pairs them up, resulting in
candidate pairs. To see which pairs work, each agent schedules a
series of connectivity checks. Each check is a STUN request/response
transaction that the client will perform on a particular candidate
pair by sending a STUN request from the local candidate to the remote
candidate."
To perform connectivity checks on candidate pairs, the STUN message must, at a minimum, carry the target IP address, port, and protocol. Where is this STUN message structure described? Where can I find details of how STUN completes this connectivity check?
I can understand the difficulty in interpreting the RFC's description of the process. I will attempt to simplify:-
Suppose I obtain the candidates at my end (A) as:-
IP1,P1
RIP2,P2
TIP3,P3
Similarly, my peer (B) has their own set:
(B)IP1,P4
(B)RIP2,P5
(B)TIP3,P6
Let's fast-forward to the future, where we have good media flow. Obviously, for the direction of media from A->B, we have two transport addresses. Since UDP is used to send media, the socket has a source address and a destination address. Let us call them SrcIP_A, SrcPort_A and SrcIP_B, SrcPort_B.
It must be clear that SrcIP_A,SrcPort_A is one of A's candidates and SrcIP_B,SrcPort_B is one of B's candidates.
Now, coming back to the present: from the perspective of A, in order to achieve smooth media flow from A->B, we just need to lock down the pair we will eventually use from the set we already have.
Here is where STUN comes into the picture. Remember that a STUN request is sent to a particular IP and port, and the response tells you the NATted external address that the STUN server observed in the request.
So A creates 9 pairs, matching each entry in its own candidate set with each of its peer's. It then sends a STUN request from the base (RFC 8445, page 14) of each of its own candidates to each remote candidate. Now, the remote side B MUST have STUN server logic implemented on its own side for when it receives traffic on its candidates. So, basically, the receiving socket needs to be able to distinguish between media and STUN packets. In the case of the latter, it will send back a STUN response indicating from where it received the request.
Let's assume that while iterating, A is at the following combinations.
IP1,P1 vs RIP2,P5: Here the request might reach B, since the reflexive address RIP2,P5 will reach inside the NAT. The observed address returned will be the reflected address of IP1,P1. On the side of A, when the response is received, it will discard this pair, since the contained address is not IP1,P1.
RIP2,P2 vs (B)IP1,P4: This will clearly fail, since you cannot send to IP1,P4, which is a private address.
RIP2,P2 vs RIP2,P5: Here the request might reach B, since the reflexive address RIP2,P5 will reach inside the NAT. The observed address returned will also be RIP2,P2, so this can be marked as the "valid pair".
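The pairing step described above can be sketched as follows. This is a minimal illustration with made-up names; a real ICE agent would also prune redundant pairs and sort them by priority per RFC 8445:

```javascript
// Form candidate pairs: every local candidate is matched with every
// remote candidate (3 x 3 = 9 pairs in the example above). Connectivity
// checks are then scheduled for each pair.
function formCandidatePairs(localCandidates, remoteCandidates) {
  const pairs = [];
  for (const local of localCandidates) {
    for (const remote of remoteCandidates) {
      pairs.push({ local, remote, state: "frozen" }); // checks run later
    }
  }
  return pairs;
}

const pairs = formCandidatePairs(
  ["IP1,P1", "RIP2,P2", "TIP3,P3"],
  ["(B)IP1,P4", "(B)RIP2,P5", "(B)TIP3,P6"]
);
// pairs.length is 9, one entry per combination
```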
Hope I have been clear.
You'll find the STUN message structure described in RFC 5389, section 6: https://www.rfc-editor.org/rfc/rfc5389#page-10.
Notable pieces of the description:
STUN messages are encoded in binary using network-oriented format
(most significant byte or octet first, also commonly known as big-
endian). The transmission order is described in detail in Appendix B
of RFC 791 [RFC0791]. Unless otherwise noted, numeric constants are
in decimal (base 10).
All STUN messages MUST start with a 20-byte header followed by zero
or more Attributes. The STUN header contains a STUN message type,
magic cookie, transaction ID, and message length.
The most significant 2 bits of every STUN message MUST be zeroes.
This can be used to differentiate STUN packets from other protocols
when STUN is multiplexed with other protocols on the same port.
The message type defines the message class (request, success
response, failure response, or indication) and the message method
(the primary function) of the STUN message. Although there are four
message classes, there are only two types of transactions in STUN:
request/response transactions (which consist of a request message and
a response message) and indication transactions (which consist of a
single indication message). Response classes are split into error
and success responses to aid in quickly processing the STUN message.
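Based on the header layout quoted above, here is a minimal sketch of parsing and validating the 20-byte STUN header (function and field names are illustrative):

```javascript
// Every STUN message starts with: 2-byte type (top two bits zero),
// 2-byte attribute length, 4-byte magic cookie, 12-byte transaction ID.
const MAGIC_COOKIE = 0x2112a442; // fixed value from RFC 5389

function parseStunHeader(buf) {
  if (buf.length < 20) return null;
  const type = (buf[0] << 8) | buf[1];
  if (type & 0xc000) return null; // top two bits MUST be zeroes
  const length = (buf[2] << 8) | buf[3]; // length of attributes after header
  const cookie =
    ((buf[4] << 24) | (buf[5] << 16) | (buf[6] << 8) | buf[7]) >>> 0;
  if (cookie !== MAGIC_COOKIE) return null;
  // The class bits (C1 at bit 8, C0 at bit 4) are interleaved with the
  // method bits: request=0, indication=1, success=2, error=3.
  const cls = ((type >> 7) & 0x2) | ((type >> 4) & 0x1);
  return { type, length, class: cls, transactionId: buf.slice(8, 20) };
}
```

For a Binding Request (type 0x0001) this yields class 0; for a Binding success response (type 0x0101) it yields class 2 — which is exactly how an agent multiplexing STUN with media on one port tells the two transaction directions apart.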
While reading about the USB protocol at
http://www.beyondlogic.org/usbnutshell/usb4.shtml
it is said that an interrupt endpoint is unidirectional and periodic.
Yet I see in the description of the IN interrupt endpoint that the host initiates with an IN token, and then a data packet is sent from device to host.
"If an interrupt has been queued by the device, the function will send
a data packet containing data relevant to the interrupt when it
receives the IN Token."
So, if the data packet is sent on this IN endpoint from device to host, doesn't that mean the same endpoint is used for both transmit and receive?
I believe the terminology "unidirectional" applies only to data, not to token and handshake packets. An "IN" endpoint is for reading data and an "OUT" endpoint is for writing data; that's why it's called unidirectional.
But the control endpoint is bidirectional, because you can read or write data using it. Check the standard USB commands like "Get Descriptor" and "Set Descriptor".
The WebRTC RTCPeerConnection interface has a createDataChannel method and an ondatachannel event handler. How do these interact? How do I create a single data channel that can be used to send/receive data between two peers?
Also, the RTCDataChannelInit constructor has a negotiated field, which by default is set to false and says it results in the channel being announced in-band. What happens if it's set to true?
Firstly: to create any data channel, the peers need to exchange an SDP offer/answer that negotiates the properties of the SCTP connection used by all data channels. This doesn't happen by default; you must call createDataChannel before calling createOffer for the offer to contain this SCTP information (an "m=application" section in the SDP).
If you don't do this, the data channel state will be stuck forever at connecting.
With that out of the way, there are two ways to negotiate a data channel between two peers:
In-band negotiation
This is what occurs by default, if the negotiated field is not set to true. One peer calls createDataChannel, and the other connects to the ondatachannel EventHandler. How this works:
Peer A calls createDataChannel.
Normal offer/answer exchange occurs.
Once the SCTP connection is up, a message is sent in-band from Peer A to Peer B to tell it about the data channel's existence.
On Peer B, the ondatachannel EventHandler is invoked with a new data channel, created from the in-band message. It has the same properties as the data channel created by Peer A, and now these data channels can be used to send data bidirectionally.
The advantage of this approach is that data channels can be created dynamically at any time, without the application needing to do additional signaling.
Out-of-band negotiation
Data channels can also be negotiated out-of-band. With this approach, instead of calling createDataChannel on one side and listening for ondatachannel on the other side, the application just calls createDataChannel on both sides.
Peer A calls createDataChannel("my-channel", {negotiated: true, id: 0}).
Peer B also calls createDataChannel("my-channel", {negotiated: true, id: 0}).
Normal offer/answer exchange occurs.
Once the SCTP connection is up, the channels will instantly be usable (readyState will change to open). They're matched up by the ID, which is the underlying SCTP stream ID.
The advantage of this approach is that, since no message needs to be sent in-band to create the data channel on Peer B, the channel is usable sooner. This also makes the application code simpler, since you don't even need to bother with ondatachannel.
So, for applications that only use a fixed number of data channels, this approach is recommended.
Note that the ID you choose is not just an arbitrary value. It represents an underlying 0-based SCTP stream ID. And these IDs can only go as high as the number of SCTP streams negotiated by the WebRTC implementations. So, if you use an ID that's too high, your data channel won't work.
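As a minimal illustration of that constraint (this helper is not part of the WebRTC API; maxStreams stands for whatever stream count your implementations actually negotiated):

```javascript
// A stream id passed with {negotiated: true} must be a 0-based integer
// strictly below the number of negotiated SCTP streams; anything outside
// that range gives you a data channel that never opens.
function isUsableDataChannelId(id, maxStreams) {
  return Number.isInteger(id) && id >= 0 && id < maxStreams;
}
```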
What about native applications?
If you're using the native webrtc library instead of the JS API, it works the same way; things just have different names.
C++:
PeerConnectionObserver::OnDataChannel
DataChannelInit::negotiated
DataChannelInit::id
Java:
PeerConnection.Observer.onDataChannel
DataChannel.Init.negotiated
DataChannel.Init.id
Obj-C:
RTCPeerConnectionDelegate::didOpenDataChannel
RTCDataChannelConfiguration::isNegotiated
RTCDataChannelConfiguration::channelId
Here's a very in-depth article about the particulars of peer-to-peer...
https://blog.sessionstack.com/how-javascript-works-webrtc-and-the-mechanics-of-peer-to-peer-connectivity-87cc56c1d0ab
Primary sources...
https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection
https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API
https://developer.mozilla.org/en-US/docs/Web/API/RTCDataChannel
The mother lode of peer-to-peer projects...
https://github.com/kgryte/awesome-peer-to-peer
The heartbeat protocol requires the other end to reply with the same data that was sent to it, to know that the other end is alive. Wouldn't sending a certain fixed message be simpler? Is it to prevent some kind of attack?
At least the size of the packet seems to be relevant, because according to RFC 6520, section 5.1, the heartbeat message will be used with DTLS (i.e., TLS over UDP) for PMTU discovery, which requires messages of different sizes. Apart from that, it might simply be modeled after ICMP ping, where you can also specify the payload content for no particular reason.
Just like with ICMP Ping, the idea is to ensure you can match up a "pong" heartbeat response you received with whichever "ping" heartbeat request you made. Some packets may get lost or arrive out of order and if you send the requests fast enough and all the response contents are the same, there's no way to tell which of your requests were answered.
One might think, "WHO CARES? I just got a response; therefore, the other side is alive and well, ready to do my bidding :D!" But what if the response was actually for a heartbeat request 10 minutes ago (an extreme case, maybe due to the server being overloaded)? If you just sent another heartbeat request a few seconds ago and the expected responses are the same for all (a "fixed message"), then you would have no way to tell the difference.
A timely response is important in determining the health of the connection. From RFC6520 page 3:
... after a number of retransmissions without
receiving a corresponding HeartbeatResponse message having the
expected payload, the DTLS connection SHOULD be terminated.
By allowing the requester to specify the return payload (and assuming the requester always generates a unique payload), the requester can match up a heartbeat response to a particular heartbeat request made, and therefore be able to calculate the round-trip time, expiring the connection if appropriate.
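The matching logic described above can be sketched in a few lines (names are illustrative, not from any TLS library; a clock function is injected so the round-trip time calculation is explicit):

```javascript
// Track pending heartbeat requests by their unique payload, and match
// each response back to its request to compute a round-trip time.
function createHeartbeatTracker(now = Date.now) {
  const pending = new Map(); // payload -> timestamp the request was sent
  return {
    sent(payload) {
      pending.set(payload, now());
    },
    received(payload) {
      if (!pending.has(payload)) return null; // stale or bogus response
      const rtt = now() - pending.get(payload);
      pending.delete(payload);
      return rtt; // caller can terminate the connection if rtt is too high
    },
  };
}
```

With a fixed payload instead of a unique one, two outstanding requests would be indistinguishable and the computed RTT could silently belong to a much older request.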
This of course only makes much sense if you are using TLS over a non-reliable protocol like UDP instead of TCP.
So why allow the requester to specify the length of the payload? Couldn't it be inferred?
See this excellent answer: https://security.stackexchange.com/a/55608/44094
... seems to be part of an attempt at genericity and coherence. In the SSL/TLS standard, all messages follow regular encoding rules, using a specific presentation language. No part of the protocol "infers" length from the record length.
One gain of not inferring length from the outer structure is that it makes it much easier to include optional extensions afterwards. This was done with ClientHello messages, for instance.
In short, YES, it could've been, but for consistency with existing format and for future proofing, the size is spec'd out so that other data can follow the same message.
I built a RESTful API based on express.js that communicates with a remote server through a TCP socket using JSON. Requested URLs are converted into the appropriate JSON messages, a new TCP socket is opened, and the message is sent. When a reply arrives on the same connection, an event is fired, the JSON is evaluated, and a new JSON message is returned as the result of the GET request.
Possible paths:
Async (currently in use) - Open a connection to the server for each
request.
Sync - Create a queue with all the requests and wait for
the response, blocking code.
Track - Send all the requests at once and asynchronously receive the answers. Use a tracker id in each request to relate each request to its answer.
What will be the best direction to go? Is there any common pattern to solve this kind of application?
1 (async, a new connection for each request) is probably the easiest to implement.
If you want to reuse the socket for efficiency, you should come up with your own "keep-alive" mechanism - essentially streaming multiple requests and answers over the same socket.
I'd probably use a double CRLF ('\r\n\r\n') as the delimiter of each JSON request, fire a 'request' event for each one, and simply write back the answer asynchronously. Delimiter-less streaming is possible, but it requires extra parsing when you receive a partial JSON string from the socket.
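A minimal sketch of that delimiter-based framing (names are illustrative): buffer incoming chunks, and emit one parsed JSON message per complete double-CRLF-terminated frame, even when frames arrive split or coalesced.

```javascript
// Returns a function suitable for socket.on('data', ...): it accumulates
// chunks and invokes onMessage with each complete JSON frame, where
// frames are terminated by "\r\n\r\n".
function createFramer(onMessage) {
  let buffer = "";
  return function onData(chunk) {
    buffer += chunk;
    let idx;
    while ((idx = buffer.indexOf("\r\n\r\n")) !== -1) {
      const frame = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 4); // drop frame plus delimiter
      if (frame.length > 0) onMessage(JSON.parse(frame));
    }
  };
}
```

For the "Track" variant, include an id field in each request object and keep a map of id -> pending callback so responses can be matched as they arrive in any order.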