WebRTC: do we need a TURN server if one peer is always using Full Cone Or Address Restricted (but not Port Restricted) NAT? - webrtc

I have been reading a bit about WebRTC, and I don't understand why we need a TURN server if only one peer is using Symmetric NAT and the other is using neither Symmetric nor Port Restricted NAT. Let's say A is using Full Cone NAT and B is using Symmetric NAT:
STUN SERVER will send the correct IP address of B to A, and the correct IP + Port address of A to B.
A tries to connect to B (now A will be able to accept messages from B since it’s in the Dest Address Column).
B tries to connect to A, which will allow requests from A going to B (of course, A needs to update the port to the one received from B instead of the one in the SDP).
Am I missing something, or is this correct (and implemented), or is this too complicated to be implemented?
And if this is correct, then theoretically, if I’m peer A and I'm using Full Cone NAT, any peer B can connect to me (as long as I send the connection request first), without needing a TURN server.
Thanks

If the symmetric NAT environment only changes the port, you would be correct regarding connectivity to Full Cone NAT. The hole punching step would work.
But many enterprise and mobile environments have complex routing schemes and crazy network environments that are different from a legacy home network router. These environments aren't just a little router box that hooks up to a cable modem. It's a complex array of routers and load balancers using a bank of IP addresses. And each outbound connection might get an IP address different from a previous connection. So it's technically "symmetric NAT".
And so after a node within this environment obtains an external IP/port pair from a STUN server, subsequent sends to a peer address might change both the port and the IP address.
As such, the NATs see completely different IP addresses than expected when the UDP packets arrive during the hole punching step. Hence, a relay address (TURN) is needed here.

This question is a little easier if you think in terms of Mapping/Filtering. The other NAT terms don't do a good job of describing how things actually work. My answer comes from RFC 4787 and WebRTC for the Curious: Connecting
Mapping is when your NAT allocates an IP/port for an outbound packet. A remote peer can then send traffic to this mapping. Filtering is the set of rules around who can use these mappings.
Filtering and Mappings can then be address dependent and independent. If a mapping is address dependent it means a new mapping is created for each time you contact a new IP/Port. If a mapping is address independent it means it is re-used no matter where you send traffic. These same rules apply to filtering.
If one peer's NAT uses address-independent mapping and address-independent filtering, I don't believe a TURN server would provide a benefit.
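The mapping/filtering view can be tried out with a toy model. This is a minimal sketch, not a real NAT: the `Nat` class, the `hole_punch` helper, the port numbering, and the way an external IP is picked from a pool are all assumptions made for illustration. Peer A's NAT always uses address-independent mapping; only its filtering varies. Peer B sits behind a "symmetric" NAT that may also rotate through several public IPs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Addr = Tuple[str, int]  # (ip, port)
STUN_SERVER: Addr = ("198.51.100.1", 3478)  # documentation address, hypothetical


@dataclass
class Nat:
    """Toy NAT in front of a single internal socket (RFC 4787 vocabulary)."""
    public_ips: List[str]
    mapping_dependent: bool  # True: a new mapping per destination ("symmetric")
    filtering: str           # "independent", "address" or "address_port"
    _mappings: Dict[Optional[Addr], Addr] = field(default_factory=dict)
    _contacted: Dict[Addr, Set[Addr]] = field(default_factory=dict)
    _next_port: int = 40000

    def send(self, dest: Addr) -> Addr:
        """Send an outbound packet; return the external source address used."""
        key = dest if self.mapping_dependent else None
        if key not in self._mappings:
            # Assumption: a pooled NAT hands out its public IPs round-robin.
            ip = self.public_ips[len(self._mappings) % len(self.public_ips)]
            self._mappings[key] = (ip, self._next_port)
            self._next_port += 1
        ext = self._mappings[key]
        self._contacted.setdefault(ext, set()).add(dest)
        return ext

    def accepts(self, ext: Addr, src: Addr) -> bool:
        """Would an inbound packet from src to mapping ext be let through?"""
        contacted = self._contacted.get(ext, set())
        if not contacted:
            return False
        if self.filtering == "independent":
            return True
        if self.filtering == "address":
            return any(ip == src[0] for ip, _ in contacted)
        return src in contacted


def hole_punch(a: Nat, b: Nat) -> bool:
    """Simulate one round of connectivity checks between A and B."""
    a_srflx = a.send(STUN_SERVER)  # address STUN reports for A
    b_srflx = b.send(STUN_SERVER)  # address STUN reports for B
    a.send(b_srflx)                # A's check towards B's signalled address
    b_src = b.send(a_srflx)        # B's check; its source may differ from b_srflx
    if not a.accepts(a_srflx, b_src):
        return False
    # A learns B's real source address from the packet and answers to it.
    a_reply_src = a.send(b_src)
    return b.accepts(b_src, a_reply_src)
```

In this model, a full-cone A reaches a symmetric B even when B's external IP changes, an address-restricted A only succeeds while B keeps a single public IP, and a port-restricted A fails against any symmetric B.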
If you want TCP connectivity deploying a TURN server is a good idea. Some WebRTC servers support TCP, but I don't believe any browsers generate passive TCP candidates.

Related

Is a single TURN server sufficient for two devices behind restricted NAT configurations?

Let's assume two devices are behind symmetric NATs.
Device_1 ONLY gathers HOST and REFLEXIVE candidates. It sends them to Device_2 in an SDP OFFER.
Device_2 gathers HOST, REFLEXIVE and RELAY candidates. It sends them to Device_1 in an SDP ANSWER.
During ICE connectivity checks, Device_2 will install a permission into its TURN server. This will be the REFLEXIVE candidate of Device_1.
At some point, a STUN indication will be sent from Device_1 (reflexive) to Device_2 (relay). If Device_2 has created a permission for reflexive address of Device_1, the UDP packet should hit the TURN server, get wrapped in a STUN message, and then arrive at Device_2. This connectivity check should pass, and therefore bidirectional traffic should flow.
Is this understanding correct?
         RESTRICTED_NAT                         RESTRICTED_NAT
                |                                      |
Device_1 <----> | <--UDP--> [Device_2_TURN] <--SEND--> | <----> Device_2
Peer            |                                      |        Client
Host                                                            Host
Reflexive                                                       Reflexive
                                                                Relay
I asked a similar question a while ago, "Will ICE negotiations between peers behind two symmetric NAT's result in requiring two TURN servers?", but now I'm having doubts that two TURN servers are required for two peers behind symmetric NATs. The reason is that the permission created on the TURN server only includes an IP address.
https://datatracker.ietf.org/doc/html/rfc5766#section-9.1
When forming a CreatePermission request, the client MUST include at
least one XOR-PEER-ADDRESS attribute, and MAY include more than one
such attribute. The IP address portion of each XOR-PEER-ADDRESS
attribute contains the IP address for which a permission should be
installed or refreshed. The port portion of each XOR-PEER-ADDRESS
attribute will be ignored and can be any arbitrary value. The
various XOR-PEER-ADDRESS attributes can appear in any order.
That means that as long as the same server reflexive IP sends a UDP message to the relayed transport address, it should go through. Meaning, only one TURN server should be used.
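The permission rule quoted above can be sketched as a lookup keyed on the IP address alone. This is an illustrative model, not code from any TURN server: the `Allocation` class and its method names are hypothetical. The 300-second lifetime is the permission lifetime from RFC 5766.

```python
import ipaddress
import time
from typing import Dict, Optional, Union

PERMISSION_LIFETIME = 300.0  # seconds (RFC 5766 permission lifetime)

IpAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


class Allocation:
    """Toy model of the permission table attached to one TURN allocation."""

    def __init__(self) -> None:
        self._permissions: Dict[IpAddress, float] = {}  # peer IP -> expiry time

    def create_permission(self, peer_ip: str, peer_port: int = 0,
                          now: Optional[float] = None) -> None:
        """Install or refresh a permission; the port is deliberately ignored."""
        now = time.monotonic() if now is None else now
        self._permissions[ipaddress.ip_address(peer_ip)] = now + PERMISSION_LIFETIME

    def permits(self, src_ip: str, src_port: int,
                now: Optional[float] = None) -> bool:
        """Check whether a datagram from (src_ip, src_port) may be relayed."""
        now = time.monotonic() if now is None else now
        expiry = self._permissions.get(ipaddress.ip_address(src_ip))
        return expiry is not None and now < expiry
```

With a permission installed for Device_1's reflexive IP, a packet from that IP is accepted whatever source port the symmetric NAT picked, while a packet from a different IP is dropped.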
Yes Philipp is right, it depends on the implementation details.
But one criterion you did not specify in your list of assumptions is whether the NATs in front of Device_1 and Device_2 let UDP pass through or not.
The most evil/problematic scenario is if Device_1 and Device_2 can't use UDP to connect to the Internet. In that case they will both have to use TCP to connect to their respective TURN relays. Since most (?) TURN servers today only hand out UDP candidates, in this scenario Device_1 uses TCP to connect to Device_1_TURN. The same applies to Device_2. But since both relays hand out UDP candidates, TURN server now needs to connect to TURN server.
Device_1 <--TCP--> Device_1_TURN <--UDP--> Device_2_TURN <--TCP--> Device_2
In this case you can't get away with just one TURN relay, since Device_1 can't connect via UDP to the UDP based relay candidates of Device_2.
You could reduce this scenario to one TURN server again if the TURN server and Device_2 both implement support for TURN TCP candidates. And then Device_1 needs to have support for ICE TCP and connect via TCP to the TURN server:
Device_1 <--TCP--> Device_2_TURN <--TCP--> Device_2
This depends on how well the turn server software that you use has implemented the RFC. You'll need to check it.
The port is part of the xor-peer-address that gets sent to the TURN server so the natural assumption would be that any lookups happen against the full address.
But in the case of the other side being behind a symmetric NAT the port (and sometimes the IP but that is rare) used will be different. Which is likely the reason for the specific requirement in the RFC.

Understanding SFU's, TURN servers in WebRTC

If I am building a WebRTC app and using a Selective Forwarding Unit media server, does this mean that I will have no need for STUN / TURN servers?
From what I understand, STUN servers are used for clients to discover their public IP / port, and TURN servers are used to relay data between clients when they are unable to connect directly to each other via STUN.
My question is, if I deploy my SFU media server with a public address, does this eliminate the need for STUN and TURN servers? Since data will always be relayed through the SFU and the clients / peers will never actually talk to each other directly?
However, I noticed that the installation guide for Kurento (a popular media server with SFU functionality) contains a section about configuring STUN or TURN servers. Why would STUN or TURN servers be necessary?
You should still use a TURN server when running an SFU. To understand why, it helps to dive into ICE a little. All SFUs work a little differently, but this is true for most.
For each PeerConnection the SFU will listen on a random UDP (and sometimes TCP) port.
This IP/port combination is given to each peer, who then attempts to contact the SFU.
The SFU then checks whether the incoming packets contain a valid hash (keyed with the ICE password, ice-pwd). This ensures that no attacker can connect to this port.
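In ICE, the hash check in the last step is the STUN MESSAGE-INTEGRITY attribute: an HMAC-SHA1 over the message, keyed with the ICE password the receiver advertised in its SDP. The following is a simplified sketch of that check. The function names are hypothetical, and it assumes MESSAGE-INTEGRITY is the last attribute, whereas real ICE checks usually append a FINGERPRINT attribute after it and also carry a USERNAME.

```python
import hashlib
import hmac
import struct

ATTR_MESSAGE_INTEGRITY = 0x0008
MI_ATTR_SIZE = 24  # 4-byte attribute header + 20-byte HMAC-SHA1


def add_message_integrity(msg: bytes, password: str) -> bytes:
    """Append MESSAGE-INTEGRITY to a STUN message (20-byte header + attributes)."""
    msg_type, length = struct.unpack("!HH", msg[:4])
    # The header length must already count the attribute before hashing.
    adjusted = struct.pack("!HH", msg_type, length + MI_ATTR_SIZE) + msg[4:]
    digest = hmac.new(password.encode(), adjusted, hashlib.sha1).digest()
    return adjusted + struct.pack("!HH", ATTR_MESSAGE_INTEGRITY, 20) + digest


def verify_message_integrity(msg: bytes, password: str) -> bool:
    """Check the trailing MESSAGE-INTEGRITY attribute against the password."""
    if len(msg) < 20 + MI_ATTR_SIZE:
        return False
    body, attr = msg[:-MI_ATTR_SIZE], msg[-MI_ATTR_SIZE:]
    attr_type, attr_len = struct.unpack("!HH", attr[:4])
    if attr_type != ATTR_MESSAGE_INTEGRITY or attr_len != 20:
        return False
    expected = hmac.new(password.encode(), body, hashlib.sha1).digest()
    return hmac.compare_digest(expected, attr[4:])
```

A packet signed with a different password, or modified in transit, fails the comparison, so the SFU can drop it without creating any session state.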
A TURN server works as follows:
It provides a single allocation port that peers can connect to. You can use UDP, DTLS, TCP or TLS. You need a valid username/password.
Once authenticated you send packets via this connection and the TURN server relays them for you.
The TURN server will then listen on a random port so that others can then send stuff back to the Peer.
So a TURN server has a few nice things that an SFU doesn't
You only have to listen on a single public port. If you are communicating with a service that is not on the internet, you can have your clients connect only to the allocation port.
You can also make your service available via UDP, DTLS, TCP and TLS. Most ICE implementations only support UDP.
These two factors are really important in government/hospital situations. You have networks that only allow TLS traffic over port 443. So a TURN server is your only solution (you run your allocation on TLS 443)
So you need to design your system to your needs. But IMO you should always run a well configured TURN server in real world environments.

Difference between STUN/TURN(coTURN) servers and Signaling servers (written with socket.io/websocket) in WebRTC?

I am building a video teaching site. I did some research and got a good understanding, except for one thing. When a user wants to connect to another user, P2P, I need a signaling server to get their public IPs and get them connected. Now STUN is doing that job, and TURN will relay the media if the peers cannot connect. If I write a signaling server with WebSocket to communicate the SDP messages and have ICE working, do I need coTURN installed? What exactly will the job of each of them be?
Where exactly I am confused is the work of my simply written WebSocket Signaling server (from what I saw in different tutorials) and the work of the coTURN server I'll install. And how to connect them with the media server I'll install.
A second question: is there a way to use P2P when there are only two or three participants, and get the media servers involved if there are more than that, so that I don't use up the participants' bandwidth too much?
The signaling server is required to exchange messages between peers (SDP packets) until they have established a P2P connection.
A STUN server is there to help a peer discover information about its public IP and to open up firewall ports. The main problem this is solving is that a lot of devices are behind NAT routers within small private networks; NAT basically allows outgoing requests and their response, but blocks any other "unsolicited" incoming requests. You therefore have a Catch-22 scenario when both peers are behind a NAT router and could make an outgoing request, but have nowhere to send it to since the opposite peer doesn't expose anything to make a request to. STUN servers act as a temporary middleman to make requests to, which opens a port on the NAT device to allow the response to come back, which means there's now a known open port the other peer can use. It's a form of hole-punching.
A TURN server is a relay in a publicly accessible location, in case a P2P connection is impossible. There are still cases where hole-punching is unsuccessful, e.g. due to more restrictive firewalls. In those cases the two peers simply cannot talk 1-on-1 directly, and all their traffic is relayed through a TURN server. That's a 3rd party server that both peers can connect to unrestrictedly and that simply forwards data from one peer to the other. One popular implementation of a TURN server is coturn.
Yes, basically all those functions could be fulfilled by a single server, but they’re deliberately separated. The WebRTC specification has absolutely nothing to say about signaling servers, since the signaling mechanism is very unique to each application and could take many different forms. TURN is very bandwidth intensive and must usually be delegated to a larger server farm if you’re hoping to scale at all, so is impractical to mix in with any of the other two functions. So you end up with three separate components.
Regarding multi-peer connections: yes, you can set up a P2P group chat just fine. However, each peer will need to be connected to every other peer, so the number of connections and bandwidth per peer increases with each new peer. That’s probably going to work okay for 3 or 4 peers, but beyond that you may start to run into bandwidth and CPU limits of individual peers, especially if you’re doing decent quality video streaming.
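The growth described above is easy to put into numbers. The sketch below compares a full mesh with a single forwarding media server; the `mesh_load` and `sfu_load` names and the per-stream bitrate are assumptions for illustration, and the forwarding model ignores simulcast and other optimizations.

```python
from typing import Dict


def mesh_load(peers: int, stream_kbps: int) -> Dict[str, int]:
    """Connection count and per-peer bandwidth for a full P2P mesh."""
    others = peers - 1
    return {
        "connections_total": peers * others // 2,
        "connections_per_peer": others,
        "upload_kbps_per_peer": others * stream_kbps,    # one copy per remote peer
        "download_kbps_per_peer": others * stream_kbps,
    }


def sfu_load(peers: int, stream_kbps: int) -> Dict[str, int]:
    """Per-peer load when every peer talks only to one forwarding server."""
    return {
        "connections_per_peer": 1,
        "upload_kbps_per_peer": stream_kbps,             # a single upstream copy
        "download_kbps_per_peer": (peers - 1) * stream_kbps,
    }
```

For example, with an assumed 1500 kbps per video stream, `mesh_load(4, 1500)` gives 6 connections in total and 4500 kbps of upload per peer, while `sfu_load(4, 1500)` keeps the upload at 1500 kbps.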

Is STUN server absolutely necessary for webrtc when I have a socket.io based signaling server?

My understanding about STUN server for webrtc is that when the clients are behind the NAT (in most cases, if not all), the STUN server will help the webrtc clients to identify their addresses and ports. And I also read some article saying that a signaling server is needed for webrtc clients. The signaling server could be a web server, socket.io, or even emailing a url. My first question would be: is the STUN server the signaling server?
Actually now I built a very simple socket.io based service which broadcasts client's session descriptions to all other clients. So I believe the socket.io based server should have enough knowledge about the clients' addresses and ports information. If this is the case, why do we bother to have another STUN server?
The STUN server is NOT the signalling server.
The purpose of the signalling server is to pass information between the peers at the start up of the session(how can they send an offer without knowing who to send to?). This information includes the SDPs that are created on the offers and the answers and also any Ice Candidates that are created by either party.
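The signalling role amounts to forwarding opaque messages between named peers. Here is a minimal in-memory sketch of that idea. The `SignalingRoom` class, the `to`/`from` fields, and the per-peer outbox lists are hypothetical stand-ins for whatever WebSocket or socket.io plumbing the application uses.

```python
import json
from typing import Dict, List


class SignalingRoom:
    """Relays offers, answers and ICE candidates without inspecting them."""

    def __init__(self) -> None:
        # peer id -> queued JSON messages (stand-in for an open WebSocket)
        self._outboxes: Dict[str, List[str]] = {}

    def join(self, peer_id: str) -> None:
        self._outboxes.setdefault(peer_id, [])

    def relay(self, sender: str, raw_message: str) -> bool:
        """Forward a message to the peer named in its 'to' field."""
        message = json.loads(raw_message)
        target = message.get("to")
        if target not in self._outboxes:
            return False
        message["from"] = sender  # stamp the sender; the payload stays untouched
        self._outboxes[target].append(json.dumps(message))
        return True

    def drain(self, peer_id: str) -> List[dict]:
        """Return and clear everything queued for peer_id."""
        queued = self._outboxes.get(peer_id, [])
        self._outboxes[peer_id] = []
        return [json.loads(item) for item in queued]
```

Note that the server only ever sees SDP text and candidate strings. No media passes through it, which is why it does not replace STUN or TURN.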
The reason to have a STUN server is so that the two peers can send the media to each other. The media streams will not hit your signalling server but instead will go straight to the other party(the definition of a peer-to-peer connection), the exception to this would be the case when a TURN server is used.
Media cannot magically go through a NAT or a firewall because the two parties do not have direct access to each other(like they would if they were on the same LAN).
In short STUN server is needed the large majority of the time when the two parties are not on the same network(to get valid connection candidates for peer-to-peer media streaming) and a signalling server is ALWAYS needed(whether they are on different networks or not) so that the negotiation and connection build up can take place. Good explanation of the connection and streaming process
STUN is used to implement the ICE protocol, which tries to find a working network path between the two clients. ICE will also use TURN relay servers (if configured in the RTCPeerConnection) for cases where the two clients (due to NAT/Firewall restrictions) can't make a direct peer-to-peer connection.
STUN servers are used to identify the external address used by the computer on the internet (the outside-the-NAT address) and to attempt to set up a port mapping usable by the peer (if the NAT isn't "symmetric") -- contacting the STUN server will tell you the external IP and port to try to use in ICE. These are the ICE candidates included in the SDP or in the trickle-ICE messages.
For almost-guaranteed connectivity, a server should have TURN servers (preferably supporting UDP and TCP TURN, though UDP is far preferred). Note that unlike STUN, TURN can use appreciable bandwidth, and so can cost money to host. Luckily, most connections succeed without needing to use a TURN server (i.e. they run peer-to-peer)
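As a concrete illustration of "configured in the RTCPeerConnection", an application server can hand the browser a JSON document shaped like the standard RTCConfiguration dictionary. This is a sketch under assumptions: the `ice_configuration` helper is hypothetical, and the host names and credentials in the usage note are placeholders, not real services.

```python
import json
from typing import Dict, List


def ice_configuration(stun_url: str, turn_urls: List[str],
                      turn_username: str, turn_credential: str,
                      relay_only: bool = False) -> Dict[str, object]:
    """Build a dictionary matching the browser's RTCConfiguration shape."""
    return {
        "iceServers": [
            {"urls": [stun_url]},
            {
                "urls": turn_urls,             # UDP first, TCP/TLS as fallback
                "username": turn_username,
                "credential": turn_credential,
            },
        ],
        # "relay" forces all traffic through TURN; "all" lets ICE try direct paths.
        "iceTransportPolicy": "relay" if relay_only else "all",
    }


def ice_configuration_json(*args, **kwargs) -> str:
    """Serialize the configuration for delivery to the client."""
    return json.dumps(ice_configuration(*args, **kwargs))
```

A call such as `ice_configuration("stun:stun.example.org:3478", ["turn:turn.example.org:3478?transport=udp", "turns:turn.example.org:443?transport=tcp"], "user", "secret")` yields an object the client can pass straight to `new RTCPeerConnection(...)`.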
NAT (Network Address Translation) is used to translate a "private IP", which is valid only within a LAN, into a "public IP", which is valid on the WAN.
The problem is that the "public IP" is only visible from outside, so we need a STUN or TURN server to send the "public IP" back to you.
This process enables a WebRTC peer to get a publicly accessible address for itself, and then pass that on to another peer via a signaling mechanism
A STUN server is used to get an external network address.
TURN servers are used to relay traffic if direct (peer to peer) connection fails.
for more you can also refer from below link: https://www.html5rocks.com/en/tutorials/webrtc/infrastructure/#what-is-signaling
In your case, you need STUN. Most clients will be behind NAT, so you need STUN to get the clients' public IPs. But if both your clients were not behind NAT, then you wouldn't need STUN. More generally, no, a STUN server is not strictly required. I know this because I successfully connected 2 WebRTC peers without a STUN server. I used the example code from aiortc, a Python WebRTC/ORTC library, with both clients running locally on my laptop. The signalling channel was my manual copy-pasting. I literally copied the SD (session description) from one peer to the other. Then I copied the SD from the 2nd peer back to the 1st peer.
From the ICE RFC (RFC8445), which WebRTC uses
An ICE agent SHOULD gather server-reflexive and relayed candidates.
However, use of STUN and TURN servers may be unnecessary in certain
networks and use of TURN servers may be expensive, so some
deployments may elect not to use them.
It's not clear that STUN is a requirement for ICE, but the above says it may be unnecessary.
However, signalling has nothing to do with it. This question actually stems from not understanding what STUN does, and how STUN interplays with signalling. I would argue the other 3 answers here do not actually answer these 2 concerns.
Pre-requisite: Understand the basic concepts of NAT. STUN is a tool to go around NAT, so you have to understand it.
Signalling: Briefly, in WebRTC you need to implement your own signalling strategy. You can manually type the local session description created by one peer in the other peer, use WebSockets, socket.io, or any other methods (I saw a joke that smoke signals can be used, but how are you going to pass the following session description (aka. SDP message) through a smoke signal...). Again, I copy pasted something very similar to below:
v=0
o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
s=
c=IN IP4 host.anywhere.com
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 51372 RTP/AVP 31
a=rtpmap:31 H261/90000
m=video 53000 RTP/AVP 32
a=rtpmap:32 MPV/90000
When both peers are not behind NAT, you don't need a STUN server, as the IP addresses located in the session description (the c= field above, known as connection data) generated by each peer would be enough for each peer to send datagrams or packets to each other. In the example above, they've provided the domain name instead of IP address, host.anywhere.com, but this can be resolved to an A record. (Study DNS for more information).
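To make the role of the c= and m= lines concrete, here is a small sketch that pulls the connection address and media ports out of a session description like the one above. The `parse_sdp` function is a hypothetical helper, not part of any library, and it handles only c= and m= lines. In real WebRTC offers the usable addresses are carried in a=candidate lines, which this sketch does not parse.

```python
from typing import Dict, List, Optional


def parse_sdp(sdp: str) -> List[Dict[str, object]]:
    """Return one entry per m= line with its port and effective c= address."""
    session_address: Optional[str] = None
    media: List[Dict[str, object]] = []
    for line in sdp.strip().splitlines():
        line = line.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # not a "<type>=<value>" line
        kind, value = line[0], line[2:]
        if kind == "c":
            # c=<nettype> <addrtype> <connection-address>
            address = value.split()[2]
            if media:
                media[-1]["address"] = address  # media-level override
            else:
                session_address = address       # session-level default
        elif kind == "m":
            # m=<media> <port> <proto> <fmt> ...
            name, port, proto, *formats = value.split()
            media.append({
                "media": name,
                "port": int(port.split("/")[0]),
                "proto": proto,
                "formats": formats,
                "address": None,
            })
    for entry in media:
        if entry["address"] is None:
            entry["address"] = session_address
    return media
```

Applied to the example above, it returns three media entries (audio on port 49170, video on ports 51372 and 53000), all inheriting the session-level address host.anywhere.com.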
Why don't you need a STUN server in this case? From RFC8445:
There are different types of candidates; some are derived from physical or logical network interfaces, and others are discoverable via STUN and TURN.
If you're not using NAT, the client already knows the IP address which peers can directly address, so the additional ICE candidates that STUN would generate would not be helpful (it would just give you the same IP address you already know about).
But when a client is behind a NAT, the IP address it thinks it has won't help a peer contact it. It's like telling you my IP address is 192.168.1.235: it really is, but it's my private IP. The NAT might be on the router, and your client may have no way of asking for the public IP. So STUN is a tool for dealing with this. Specifically,
It provides a means for an endpoint to determine the IP address and port allocated by a NAT that corresponds to its private IP address and port.
STUN basically lets the client find out what its public IP address is. If you were hosting a Call of Duty server from your laptop, and forwarded a port to your machine in the router settings, you still had to look up your public IP address from a website like https://whatismyipaddress.com/. STUN lets a client do this for itself, without you opening a browser.
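The lookup itself is a single UDP round trip. Below is a minimal sketch of a STUN Binding request and the decoding of the XOR-MAPPED-ADDRESS attribute in the reply, following the wire format in RFC 5389. It handles IPv4 only, skips retransmission and authentication, and the function names are my own. Any server passed to `query_stun` is the caller's choice, since none is assumed here.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """20-byte STUN header with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(response: bytes) -> Optional[Tuple[str, int]]:
    """Extract the public (ip, port) from a Binding success response."""
    if len(response) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, min(20 + length, len(response))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            raw_ip = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", raw_ip)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None


def query_stun(host: str, port: int = 3478,
               timeout: float = 2.0) -> Optional[Tuple[str, int]]:
    """Ask a STUN server which address our packets appear to come from."""
    request = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, port))
        response, _ = sock.recvfrom(2048)
    if response[8:20] != request[8:20]:
        return None  # transaction ID mismatch
    return parse_xor_mapped_address(response)
```

The address returned is what ICE calls a server-reflexive candidate. It ends up in the session description or in a trickled candidate message.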
Finally, how does STUN interplay with signalling?
The ICE candidates are generated locally and with the help of STUN (to get client public IP addresses when they're behind NAT) and even TURN. Session descriptions are sent to the peer using the signalling channel. If you don't use STUN, you might find that all the candidates ICE tries fail, and a connection (other than the signalling channel) is never created.

GameKit/Peer-to-peer over internet

For an iOS app I am developing, I want multiple phones to connect to each other and be able to voice chat between those devices.
I have it working when both devices are on the same network. This was quite simple and most of the stuff I want to do, is possible.
But now I am adding internet support, which is quite a hassle. I'll first try to explain how I want to match the devices, using a small webservice I set up.
Server
Start a new GameKit session, with session-mode GKSessionModePeer
Find the "Peer ID" of the server on the session I just created
Create a new CFSocketRef on a free port and keep it ready to accept connections
Send Peer ID and Port number to my webservice, running on an external server.
WebService
Webservice receives the information and stores it together with an ID and the IP address of the client in a database.
Send ID back to Server, which displays the ID
Client
When the user chooses to use the "Online" feature of GameKit to search for games, I ask the user for an ID (where the user should input the ID the server receives).
Client connects to the webservice supplying the ID. The webservice returns the information about the session (IP, PORT, Peer ID) of the server.
The user tries to connect to the IP address, with the port information and set up an input and output stream with the server.
This does not work, of course, because my network does not allow incoming connections on a random port (from an external network).
But now the question is, how do I solve this? I want to be able to set up a peer to peer connection between 2 devices, those devices could be on the same network, but also on separate networks.
Is there a framework, example or anything showing how to do this? I want to be able to send data from device to device, without sending it to a server first.
I'm not aware of any frameworks that do this. I do however have a lot of experience with p2p networking across multiple networks.
One important rule I learned: when communicating between networks, don't create a direct connection unless necessary. There are just too many factors that can (will?) cause issues, such as firewalls, NATs, etc.
Sure, you can let the connection try first. You can try to connect to the given IP addresses*, but in most cases it will fail. Even when using UPnP and NAT-PMP, you'll find that in a lot of cases (more than half?) you won't be able to accept incoming connections at all.
So make sure to have a backup plan. Make a network layer abstraction that doesn't only listen(), but also connects to a server. That way, when you can't connect to the IPs* of the client, you simply setup a connection via the server and the network abstraction takes care of it all.
Let me reiterate the above: don't rely on incoming connections only, always have a backup plan.
* I write IPs because clients can have multiple local/remote IPs. Always iterate over all these IPs when connecting. Example: my phone has 2 local IPv4 addresses (10.0.0.172 and 10.8.0.2), and an IPv6 address ([2001:x:x::6]). Of these three addresses, only the IPv6 address is publicly reachable, and the two IPv4 addresses are on different subnets so whether you can connect to them depends on the subnet that the other client is on. Always try to connect to both, and fall back to a server-proxied connection when it fails.
** I mentioned IPv6, yes. Let's not forget that IPv6 is not limited by NATs, unlike IPv4, and this means that you're far more likely to get a good connection via IPv6 than IPv4, if supported.
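The "try every address, then fall back to the server" advice can be sketched as a small helper. This is an illustration using plain TCP sockets in Python rather than GameKit or CFSocket. The function name, the IPv6-first ordering, and the relay address are assumptions. The `connect` parameter exists so the logic can be exercised without a network.

```python
import socket
from typing import Callable, Iterable, Tuple

Address = Tuple[str, int]


def connect_with_fallback(candidates: Iterable[Address], relay: Address,
                          timeout: float = 3.0,
                          connect: Callable = socket.create_connection):
    """Try each candidate address directly, then fall back to the relay.

    Returns (connection, route) where route is "direct" or "relay".
    """
    # Assumption: IPv6 addresses are tried first because they are more
    # likely to be reachable without NAT traversal.
    ordered = sorted(candidates, key=lambda addr: ":" not in addr[0])
    for address in ordered:
        try:
            return connect(address, timeout), "direct"
        except OSError:
            continue  # unreachable, refused or timed out: try the next one
    # Every direct attempt failed: use the server-proxied connection.
    return connect(relay, timeout), "relay"
```

The caller receives the same kind of connection object either way, which is the network-layer abstraction the answer recommends: the rest of the application does not need to know which route was taken.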