SIP: IPSEC vs TLS - ssl

I am new to VoIP concepts; I just took a course on VoIP. I am interested in implementations of SIP using TLS, IPsec and Digest authentication.
If SIP signaling used IPsec instead of TLS, how would that affect performance? Would the signaling be a little more time-consuming? Would it add to or reduce security?
I was searching for a softphone that supports both TLS and IPsec so that I can analyze both kinds of packets in Wireshark, but I did not come across any. Any suggestions on how to do that?
Thanks in advance!

Softphones don't explicitly support IPsec because it is a network-layer protocol. The lowest level dealt with in the SIP RFC is the transport layer, where it provides three alternatives: UDP, TCP and TLS.
SIP signalling would work fine over IPsec since it's not that time-sensitive. Even a delay of 1 or 2 seconds for SIP requests and responses will be barely noticeable. However, the RTP traffic that carries the call media could well be affected if additional latency is introduced.

Suppose you have to secure SIP signalling between your endpoints (softphones or hardware phones) and the call control engine, and that both the endpoints and the call control engine reside inside the same campus, LAN or private network.
The right choice is to employ TLS between the endpoints and the call control engine; most endpoints support TLS in their firmware or software. The same applies to mobile workers, who establish TLS from the internet to an authenticated front end placed in your corporate DMZ. TLS is well suited to scenarios where no additional client installation is wanted.
On the other hand, to protect SIP inside IPsec you first need to install an IPsec client on the laptop where the SIP client resides. The IPsec client then establishes a secure association with an IPsec gateway placed in your corporate network, which is more complex and expensive to manage. In brief, IPsec is well suited to scenarios where you need privacy between LANs across the public network (IPsec tunnel mode is set up between IPsec gateways).
When you talk about SIP with TLS or IPsec, you cannot neglect RTP.
To have Secure RTP (SRTP) in place, TLS is the choice. During session setup, SIP entities exchange SRTP parameters (such as the crypto suite used to protect RTP) within the SDP body; have a look at the logs to see the SDP inside SIP. At the end of the exchange, the SIP entities are able to secure RTP traffic because they have agreed on a common session encryption key (the symmetric key derived from the key exchange protocol).
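As a rough illustration of what to look for in those logs, the sketch below pulls the `a=crypto` lines out of an SDP body. It assumes the SDES style of key exchange (RFC 4568), which is only one of the ways SRTP can be keyed; the sample SDP, its dummy key material and the helper name `parse_crypto_lines` are made up for the example.

```python
import base64
from typing import List, NamedTuple


class CryptoAttr(NamedTuple):
    tag: int
    suite: str
    key_salt: bytes  # SRTP master key followed by master salt


# Hypothetical SDP fragment; the inline value is base64 of 30 dummy bytes.
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "m=audio 49170 RTP/SAVP 0",
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:"
    + base64.b64encode(bytes(range(30))).decode(),
])


def parse_crypto_lines(sdp: str) -> List[CryptoAttr]:
    """Return every inline-keyed a=crypto attribute found in an SDP body."""
    found = []
    for line in sdp.splitlines():
        if not line.startswith("a=crypto:"):
            continue
        tag, suite, key_params = line[len("a=crypto:"):].split()[:3]
        method, _, key_info = key_params.partition(":")
        if method != "inline":
            continue
        # An optional lifetime and MKI may follow the key, separated by "|".
        key_b64 = key_info.split("|")[0]
        found.append(CryptoAttr(int(tag), suite, base64.b64decode(key_b64)))
    return found


attrs = parse_crypto_lines(SAMPLE_SDP)
```

Because the key travels inside the SDP in this scheme, anyone who can read the SIP message can decrypt the media, which is why the signalling itself has to be protected with TLS.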
With IPsec it is different: no IPsec information is exchanged within SDP or SIP messages. You set up the IPsec tunnel first and then send your traffic inside the tunnel; this traffic can be SIP and RTP alike. Note that in IPsec tunnel mode the original packet is encapsulated in a new IP packet (there is an outer IP header), which means more overhead and processing.
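To get a feel for that overhead, here is a back-of-the-envelope sketch. The numbers are assumptions, not measurements: IPv4 without options, ESP in tunnel mode with AES-CBC (16-byte IV and block size) and HMAC-SHA1-96 (12-byte ICV), no NAT-T UDP encapsulation, and a voice packet carrying 20 ms of G.711. Other ciphers give different sizes.

```python
OUTER_IP = 20     # new outer IPv4 header, no options
ESP_HEADER = 8    # SPI + sequence number
ESP_TRAILER = 2   # pad length + next header


def esp_tunnel_size(inner_len: int, block: int = 16, iv: int = 16, icv: int = 12) -> int:
    """Size in bytes of an IPv4 packet after ESP tunnel-mode encapsulation."""
    # Inner packet + padding + trailer must fill a whole number of cipher blocks.
    encrypted = -(-(inner_len + ESP_TRAILER) // block) * block
    return OUTER_IP + ESP_HEADER + iv + encrypted + icv


rtp_packet = 20 + 8 + 12 + 160  # IPv4 + UDP + RTP headers + 160 bytes of G.711
tunnelled = esp_tunnel_size(rtp_packet)
overhead = tunnelled - rtp_packet
# Under these assumptions: 264 bytes on the wire, 64 bytes (32%) more than the original 200.
```

For small, frequent packets such as voice the relative cost is much higher than for bulk traffic, which is worth keeping in mind when sizing links.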
To summarize:
Securing SIP is possible with both TLS and IPsec; consider your environment. I would use TLS wherever possible.
Securing RTP is possible with Secure RTP (SRTP), keyed over the TLS-protected signalling; it adds encryption, message authentication and integrity, and replay protection to the RTP data.
Securing RTP/SIP with IPsec requires more effort. Consider an IPsec tunnel when you need to protect more than SIP/RTP.
Have a look at the NAT traversal feature when you decide to employ IPsec with voice/video and a NAT device is traversed.

Related

Understanding SFU's, TURN servers in WebRTC

If I am building a WebRTC app and using a Selective Forwarding Unit media server, does this mean that I will have no need for STUN / TURN servers?
From what I understand, STUN servers are used for clients to discover their public IP / port, and TURN servers are used to relay data between clients when they are unable to connect directly to each other via STUN.
My question is, if I deploy my SFU media server with a public address, does this eliminate the need for STUN and TURN servers? Since data will always be relayed through the SFU and the clients / peers will never actually talk to each other directly?
However, I noticed that the installation guide for Kurento (a popular media server with SFU functionality) contains a section about configuring STUN or TURN servers. Why would STUN or TURN servers be necessary?
You should still use a TURN server when running an SFU. To understand why, it helps to dive into ICE a little. All SFUs work a little differently, but this is true for most:
For each PeerConnection, the SFU listens on a random UDP (and sometimes TCP) port.
This IP/port combination is given to each peer, which then attempts to contact the SFU.
The SFU then checks whether the incoming packets contain a valid hash (determined by the ICE username fragment and password). This ensures that no attacker can connect to this port.
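The hash check in the last step can be sketched roughly as follows. This is a simplified illustration of the STUN MESSAGE-INTEGRITY attribute (HMAC-SHA1 keyed with the ICE password), not what any particular SFU does: real ICE connectivity checks also carry USERNAME, PRIORITY, FINGERPRINT and other attributes, which are left out here, and the function names are made up.

```python
import hashlib
import hmac
import os
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
ATTR_MESSAGE_INTEGRITY = 0x0008


def sign_binding_request(ice_pwd: str) -> bytes:
    """Build a bare STUN Binding request ending in MESSAGE-INTEGRITY."""
    # The length field already counts the 24-byte integrity attribute
    # (4-byte attribute header + 20-byte HMAC-SHA1) when the HMAC is computed.
    header = struct.pack("!HHI12s", BINDING_REQUEST, 24, MAGIC_COOKIE, os.urandom(12))
    mac = hmac.new(ice_pwd.encode(), header, hashlib.sha1).digest()
    return header + struct.pack("!HH", ATTR_MESSAGE_INTEGRITY, 20) + mac


def is_valid(packet: bytes, ice_pwd: str) -> bool:
    """Check the trailing MESSAGE-INTEGRITY of a packet built as above."""
    if len(packet) < 44:
        return False
    body, mac = packet[:-24], packet[-20:]
    attr_type, attr_len = struct.unpack("!HH", packet[-24:-20])
    if attr_type != ATTR_MESSAGE_INTEGRITY or attr_len != 20:
        return False
    expected = hmac.new(ice_pwd.encode(), body, hashlib.sha1).digest()
    return hmac.compare_digest(mac, expected)


packet = sign_binding_request("hypothetical-ice-pwd")
```

A packet signed with the right password passes `is_valid`; one from a sender who does not know the password exchanged in the SDP does not.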
A TURN server works as follows:
It provides a single allocation port that peers can connect to. You can use UDP, DTLS, TCP or TLS. You need a valid username/password.
Once authenticated, you send packets via this connection and the TURN server relays them for you.
The TURN server then listens on a random port so that others can send data back to the peer.
So a TURN server has a few nice properties that an SFU doesn't:
You only have to listen on a single public port. If you are communicating with a service that is not on the internet, you can have your clients connect only to the allocation.
You can also make your service available via UDP, DTLS, TCP and TLS. Most ICE implementations only support UDP.
These two factors are really important in government/hospital situations, where networks often allow only TLS traffic over port 443. A TURN server is then your only solution (you run your allocation on TLS 443).
So you need to design your system around your needs. But IMO you should always run a well-configured TURN server in real-world environments.

webrtc peer to peer video chat behind NAT without STUN server

Can I code a website that allows P2P video calls between peers behind NATs without relaying the video data myself as a third-party server (since that is expensive)?
My networking knowledge says it's impossible, but this is not emphasized in any docs I've been reading, so a simple yes/no answer to this question, please.
I assume most computers people use are behind NATs, so they are the norm, not outliers.
Not impossible. Definitely possible, but not 100% reliable.
WebRTC does support peer to peer video conferencing using STUN instead of TURN relays.
Bare minimum for establishing a WebRTC session:
At least a STUN server for clients to self-discover their own IP:port mappings
Your own web service for clients to exchange SDP data generated by WebRTC APIs.
TURN is a superset of STUN that also supports relaying data. While you aren't required to have a TURN server, clients that are behind a "symmetric NAT", or any NAT configuration where the port mapping cannot be predicted, will have difficulty connecting to other endpoints.

Does exchanging SDP insecurely jeopardize the security of a peer connection? [duplicate]

I have a problem. I've developed a web app for one-to-one video calls in the browser using WebRTC, with a signalling server on Node.js (listening on, e.g., port 8181).
Now I would like to implement a MITM attack. I was thinking that Peer_1 should open two RTC peer connections, one to the second peer (Peer_2) and one to the MITM, and likewise for the second peer.
I was also thinking that the signalling server needs to listen on another port for each RTC peer connection received from the two peers (e.g. 8282 for Peer_1 and 8383 for Peer_2).
Am I right? I think so because the signalling server is implemented for one-to-one communication.
This way, the signalling server on port 8181 allows end-to-end communication between Peer_1 and Peer_2, port 8282 carries the signalling path between Peer_1 and the MITM, and port 8383 the one between the MITM and Peer_2.
Am I right or not? Thanks for the support.
Man-in-the-middle refers to interception during transmission, which WebRTC itself is secured against using DTLS and key exchange, so the weak point is usually the signalling server chosen by the application instead.
What you describe, however, sounds like a man on both ends. You have to trust the service (the server) to guarantee whom you're being connected to. If that server is compromised, or either client is compromised (say, by injection), then there's no guarantee whom you're talking to, since a client can easily forward a transmission to another party.

Does WebRTC allow actual peer-to-peer communication?

Is the signaling server used only the first time to establish a connection between 2 peers or is it also used to send and receive data-streams between the peers?
According to the w3c proposal:
An RTCPeerConnection allows two users to communicate directly, browser to browser. Communications are coordinated via a signaling channel which is provided by unspecified means, but generally by a script in the page via the server, e.g. using XMLHttpRequest.
So the server is only used for signalling, not for data transmission. But signalling is not limited to establishing the first connection. The signalling channel is also used for transmitting error messages, metadata such as codecs and codec settings, network data, and keys for secure transmission.
This depends on the network configuration.
If at least one of the peers is not behind a NAT firewall, the peer that is directly on the internet acts as the server, and the signalling server is no longer used after the connection is established.
If both peers are behind a NAT appliance, under certain circumstances it might be possible to negotiate a client-server connection between the peers, and the data is again sent directly between the two peers.
If both peers are behind a NAT firewall that is locked down, all the traffic between the peers passes through a relay server.
Notice also that in the first two cases, a STUN server is used to establish the connection. If the full data is relayed through a server, a TURN server is used.
There is a good explanation in the article and video on html5rocks. They claim only about 14% of all connections need TURN, which seems a really low number to me (it corresponds to only 37% of all clients being behind a locked-down NAT router).
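The step from 14% to 37% is just a square root. Assuming (my assumption, to make the arithmetic explicit) that a connection needs TURN only when both peers sit behind a locked-down NAT and that the two peers are independent, a share p of such clients gives p * p relayed connections:

```python
import math

turn_share = 0.14  # share of connections needing TURN, as quoted above

# p * p = turn_share  =>  p = sqrt(turn_share)
locked_down_share = math.sqrt(turn_share)
print(f"{locked_down_share:.0%}")  # prints "37%"
```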

How does WebRTC work?

I'm interested in peer-to-peer connections in the browser. Since this seems to be possible with WebRTC, I'm wondering how it works exactly.
I've read some explanations and seen diagrams about it, and now it's clear to me that connection establishment goes through the server. The server seems to exchange some data between the clients that are willing to connect to each other, so that they can start a direct connection that is independent of the server.
But that's exactly what I don't understand. Until now, I thought the only way to create a connection was to listen on a port on computer A and connect to that port from computer B. But this does not seem to be the case in WebRTC. I think none of the clients starts to listen on a port. Somehow, they can create a connection without listening on ports and accepting connections. Neither client A nor client B starts acting as a server.
But how? What data is exchanged over the WebRTC server, that the clients can use to connect to each other?
Thanks for your explanations for this :)
Edit
I found this article. It's not related to WebRTC, but I think it answers a part of my question. I'm not sure, though. It would still be cool if someone could explain it to me and give me some additional links.
WebRTC gives an SDP offer to the client JS app to send (however the JS app wants) to the other device, which uses it to generate an SDP answer.
The trick is that the SDP includes ICE candidates (effectively "try to talk to me at this IP address and this port"). ICE works to punch open ports in the firewalls, though if both sides are behind symmetric NATs that generally won't be possible, and an alternative candidate (on a TURN server) can be used.
Once they're talking directly (or via TURN, which is effectively a packet mirror), they can open a DTLS connection and use it to key the DTLS-SRTP media streams and to send DataChannels over DTLS.
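To make the "try to talk to me at this IP address and this port" part concrete, here is a small sketch that reads one ICE candidate line as it appears in SDP. The candidate values are made up (a documentation IP address), and the parser ignores extension fields such as `raddr`/`rport`.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    kind: str  # host, srflx, prflx or relay


def parse_candidate(line: str) -> Candidate:
    """Parse an SDP "a=candidate:" attribute (extension fields are ignored)."""
    fields = line.strip().split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate line")
    foundation = fields[0].split(":", 1)[1]
    return Candidate(foundation, int(fields[1]), fields[2].lower(),
                     int(fields[3]), fields[4], int(fields[5]), fields[7])


# Made-up server-reflexive candidate, i.e. an address learned via STUN.
example = ("a=candidate:842163049 1 udp 1677729535 203.0.113.7 54321 "
           "typ srflx raddr 192.168.1.10 rport 54321")
candidate = parse_candidate(example)
```

The `typ` field tells you where the address came from: `host` is a local interface, `srflx` is the public mapping seen by a STUN server, and `relay` is an address allocated on a TURN server.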
Edit:
Acronyms here: http://blog.1click.io/10-jargons-abbreviations-for-webrtc-fans/. For the rest, there is Google. Most of these are defined by the IETF (http://ietf.org/).
Edit 2:
Firefox and Chrome (and the spec) have moved to using "trickle" for ICE candidates, so the ICE candidates are generally added to the PeerConnection after the fact and exchanged independently of the initial SDP (though you can wait until the initial candidates are ready before sending an offer, and bundle them together).
See https://webrtcglossary.com/trickle-ice/ and https://datatracker.ietf.org/doc/draft-ietf-ice-trickle/
How WebRTC Works
This document provides a quick and abstract introduction to WebRTC. In order to get more information about WebRTC please look at the Further Reading section at the end of this document.
WebRTC
WebRTC (Web Real-Time Communication) is a set of technologies developed for peer-to-peer, duplex, real-time communication between browsers. As its name suggests, it is compatible with the Web, and it is a W3C standard. One of the important features of WebRTC is that it works even behind NAT.
WebRTC uses several technologies to provide real-time peer-to-peer communication between browsers. These technologies are:
SDP (Session Description Protocol)
ICE (Interactive Connectivity Establishment)
RTP (Real-time Transport Protocol)
One more thing is needed to run WebRTC: a Signalling Server. However, there is no defined standard for implementing a signalling server; each implementation does it in its own style. More information about the Signalling Server is given later in this section.
Let's give some quick info about the technologies above.
SDP (Session Description Protocol)
SDP is a simple protocol used to describe which codecs the browsers support. For instance, assume that there are two peers (Client A and Client B) that will be connected through WebRTC. Client A and Client B create SDP strings that define which codecs they support. For example, Client A may support the H264, VP8 and VP9 codecs for video and the Opus and PCM codecs for audio, while Client B may support only H264 for video and only Opus for audio. In this case, the codecs used between Client A and Client B are H264 and Opus. If the peers have no codecs in common, peer-to-peer communication cannot be established.
You may wonder how these SDP strings are sent between the peers. This is where the Signalling Server comes in.
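The codec matching in the Client A / Client B example boils down to an intersection per media kind. The sketch below works on plain codec names rather than real SDP strings, and the function name is made up for illustration.

```python
from typing import Dict, List


def common_codecs(offer: Dict[str, List[str]],
                  answer: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep, per media kind, the offered codecs the answerer also supports."""
    return {
        kind: [codec for codec in codecs if codec in answer.get(kind, [])]
        for kind, codecs in offer.items()
    }


client_a = {"video": ["H264", "VP8", "VP9"], "audio": ["Opus", "PCM"]}
client_b = {"video": ["H264"], "audio": ["Opus"]}

agreed = common_codecs(client_a, client_b)
# agreed == {"video": ["H264"], "audio": ["Opus"]}
```

An empty list for a media kind corresponds to the "no common codec" case, where that stream cannot be set up.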
ICE (Interactive Connectivity Establishment)
ICE is the magic that establishes a connection between peers even if they are behind NAT. Let's assume again that Client A and Client B want to connect, and take a look at how ICE is used for that.
Client A finds out its local address and public Internet address by using a STUN server and sends these addresses to Client B through the Signalling Server. Each address gathered this way is called an ICE candidate.
In the image above, there are two servers. One of them is a STUN server and the other is a TURN server.
The STUN server is used to let Client A learn all of its addresses. For example, our computers generally have one local address in the 192.168.0.0 network, and there is a second address we see when we connect to www.whatismyip.com; this IP address is actually the public IP address of our Internet gateway (modem, router, etc.). So, to define it: a STUN server lets peers learn their public IP addresses in addition to their local ones. Btw, Google provides a free STUN server (stun.l.google.com:19302).
There is one more server in the image, the TURN server. It is used when a peer-to-peer connection cannot be established between the peers. The TURN server just relays the data between peers.
Client B does the same: it gets its local and public IP addresses with the help of the STUN server and sends these addresses to Client A through the Signalling Server.
Client A receives Client B's addresses and tries each of them by sending special pings in order to create a connection with Client B. If Client A receives a response from an address, it puts that address in a list together with its response time and other performance metrics. Finally, Client A chooses the best address according to its performance.
Client B does the same in order to connect to Client A.
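Besides measured performance, the ICE specification (RFC 8445) gives every candidate a priority computed from its type, so that direct addresses are tried before relayed ones. A minimal sketch of that formula, using the type preferences recommended in the RFC and assuming a single network interface (local preference 65535):

```python
# Type preferences recommended by RFC 8445; higher means "try this first".
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(kind: str, component: int = 1, local_pref: int = 65535) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_pref << 8) + (256 - component)


ranked = sorted(["relay", "host", "srflx"], key=candidate_priority, reverse=True)
# ranked == ["host", "srflx", "relay"]
```

This is why a TURN relay only ends up carrying the media when none of the more direct candidates works.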
RTP (Real-time Transport Protocol)
RTP is a mature protocol for transmitting real-time data. It typically runs over UDP. Audio and video are transmitted with RTP in WebRTC. RTP has a sister protocol named RTCP (RTP Control Protocol), which provides QoS feedback for the RTP communication. RTP is also used together with RTSP (Real-Time Streaming Protocol).
Signalling Server
The last part is the Signalling Server, which is not defined in WebRTC. As mentioned above, the Signalling Server is used to send SDP strings and ICE candidates between Client A and Client B. The Signalling Server also decides which peers get connected to each other. WebSocket is generally used by Signalling Servers for communication.
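Since signalling is not standardized, the following is only a toy, in-memory stand-in that shows the server's role: it forwards opaque messages between named peers and never looks at the media. The class, its methods and the message shapes are all hypothetical; a real server would do the same thing over WebSocket connections.

```python
from typing import Dict, List, Tuple


class SignalingRoom:
    """Forwards messages between peers that have joined; stores nothing else."""

    def __init__(self) -> None:
        self.inboxes: Dict[str, List[Tuple[str, dict]]] = {}

    def join(self, peer_id: str) -> None:
        self.inboxes.setdefault(peer_id, [])

    def send(self, sender: str, recipient: str, message: dict) -> None:
        if recipient not in self.inboxes:
            raise KeyError(f"unknown peer {recipient!r}")
        self.inboxes[recipient].append((sender, message))

    def receive(self, peer_id: str) -> List[Tuple[str, dict]]:
        """Return and clear everything queued for this peer."""
        messages, self.inboxes[peer_id] = self.inboxes[peer_id], []
        return messages


room = SignalingRoom()
room.join("A")
room.join("B")
room.send("A", "B", {"type": "offer", "sdp": "v=0 (placeholder)"})
room.send("A", "B", {"type": "candidate", "candidate": "(placeholder)"})
for_b = room.receive("B")
room.send("B", "A", {"type": "answer", "sdp": "v=0 (placeholder)"})
for_a = room.receive("A")
```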
Compatibility
Over the last year, all browsers, including Safari and Edge, have released new versions supporting WebRTC. Chrome, Firefox and Opera have supported WebRTC for a while. The video codec common to all browsers is H264. For audio, Opus is common to all browsers. PCM can also be used as the audio codec, but AAC is not used, even though all browsers support it, because of licensing issues. IP cameras generally support H264 as the video codec and PCM or AAC as the audio codec.
Further Reading and References
WebRTC Samples
ICE Wikipedia
SDP
RTP RFC
Getting Started with WebRTC
WebRTC.org
STUN Server
RTSP Wikipedia
Btw, I am a developer at Ant Media Server, which supports scalable one-to-many WebRTC and peer-to-peer WebRTC connections.
Establishing a P2P WebRTC connection has 3 steps (10,000-foot overview):
Step 1: Signaling: both peers connect to a signaling server (using WebSockets over 80/443, Comet, SIP, etc.) and exchange information (about their media capabilities, public IP:port pairs when they become available, etc.).
Step 2: Discovery: devices connected to LANs or mobile networks are not aware of the public IP (and port) at which they can be reached, so they use STUN/TURN servers located on the public Internet to discover their IP:port pairs (ICE candidates). In the process they punch a hole through the NAT/router, which is used in step 3.
Step 3: P2P connection: once the ICE candidates have been exchanged through the initial signaling channel, each peer is aware of the other's IP:port (and holes have been punched in the NATs/routers), so a peer-to-peer UDP connection can be established.
The scheme above explains the process with 2 devices connected to local networks. It's part of an article I wrote that deals with troubleshooting connection issues but it does a good job of explaining how WebRTC works.
A very good explanation can be found in this book "High Performance Browser Networking (O'Reilly)" http://chimera.labs.oreilly.com/books/1230000000545/ch03.html#STUN_TURN_ICE
which provides the fundamentals on how WebRTC uses ICE technology.
In particular, assuming the IP address of the STUN server is known, the WebRTC application first sends a binding request to the STUN server. The STUN server replies with a response that contains the public IP address and port of the client as seen from the public network.
The application has now discovered its public IP and port tuple, which it can send to the other peer through SDP (note that SDP is sent over an external signalling channel, e.g. a WebSocket established through a web service).
With this mechanism in place, whenever two peers want to talk to each other over UDP, they can use the established public IP and port tuples to exchange data.
Unfortunately, in some cases UDP may be blocked by a firewall. To address this issue, whenever STUN fails, we can use the Traversal Using Relays around NAT (TURN) protocol as a fallback, which can run over UDP and switch to TCP if all else fails.
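The binding request and response described above are small enough to build by hand. The sketch below is a minimal, IPv4-only illustration of the STUN wire format (20-byte header, XOR-MAPPED-ADDRESS attribute); it does no retransmission or transaction-ID checking, and the function names are my own. `query_public_address` needs network access and is not called here; the server it defaults to is the public Google one mentioned earlier on this page.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request() -> bytes:
    """20-byte STUN Binding request with no attributes."""
    return struct.pack("!HHI12s", 0x0001, 0, MAGIC_COOKIE, os.urandom(12))


def parse_xor_mapped_address(response: bytes):
    """Return (ip, port) from XOR-MAPPED-ADDRESS (IPv4 only), or None."""
    msg_len = struct.unpack("!H", response[2:4])[0]
    pos, end = 20, 20 + msg_len
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def query_public_address(server=("stun.l.google.com", 19302), timeout=2.0):
    """Ask a STUN server how this host looks from the public Internet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(), server)
        data, _ = sock.recvfrom(2048)
    return parse_xor_mapped_address(data)
```

The (ip, port) pair returned by the server is what ends up in the SDP as a server-reflexive ICE candidate.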
A WebRTC connection starts with a WebRTC offer.
The caller creates the WebRTC offer and posts it to the signalling server, which passes the offer to the callee. The users are actually passing their SDP (Session Description Protocol) information to each other.
Then we need to exchange the Internet connection details. A STUN server allows clients to discover their public IP address and the type of NAT they are behind; this information is used to establish the media connection. This process is also known as gathering the ICE candidates. This data is also exchanged via the signalling server.
The final step is to exchange the audio and video streams, through a TURN (Traversal Using Relays around NAT) server when a direct connection is not possible. This ensures the connection works even though the users are behind firewalls. This server does a lot of heavy lifting, so its cost is high. When you test your app in dev with different browsers, the browsers connect to each other directly and no TURN server is used.