ICE and WebRTC: What Is This Sorcery? We Explain…

Posted On August 19, 2016 by Sherwin Sim in Blog, Tutorials

We support many types of customers at Temasys. They all have different experience levels and challenges with their projects. Some (in fact quite a few) have attempted to build their own WebRTC-powered applications, which is an admirable achievement. Often these solutions work nicely on a small scale, within a closed network, or in a testing scenario. How things work on the Internet, outside of the lab, is a different prospect. Because of the vast array of network configurations out there, customers sometimes come to Temasys because they can’t connect their users and their media can’t get through real-world networks. Temasys uses WebRTC and the STUN, TURN, and ICE protocols as part of the Temasys Platform. In this post we explain how that works, even if it looks a bit like magic.

WebRTC allows media to go from one computer to another, regardless of the NATs that exist in between them. This is thanks to the Interactive Connectivity Establishment (ICE) protocol, which uses two other protocols, STUN and TURN. Together they help WebRTC dynamically discover candidate paths and find the most direct working path for media to travel between endpoints or peers.

Temasys provides integrated TURN and STUN services that allow our SDKs to send media between users of applications built with those SDKs. Whichever SDK we’re talking about (web, mobile, or embedded), Temasys’s TURN/STUN server is a globally deployable, scalable, and highly available service, which aims to provide a seamless interaction experience for our customers and their users.

What is ICE? What are TURN and STUN? How do they work?

Before understanding why ICE is needed, you must understand what Network Address Translation (NAT) is, and why we care about it. In a closed network, or a world without NAT, information contained in TCP/IP or UDP/IP packets can be sent to and from endpoints pretty easily.

A World Without NAT

Credit: Sam Dutton (HTML5 Rocks)

NAT Exists. Here’s Why.

There are a lot of reasons why we have routers, firewalls and NAT. The biggest need is to maintain the security of our business and personal data. If you have a wireless router in your house, it almost certainly also has a built-in firewall that uses NAT. NAT exists to create bindings between internal private and external public addresses and ports within a network. Each TCP/IP or UDP/IP packet contains a stack of information that it carries with it as it travels around the internet. This information includes a source IP address, source port, a destination IP address and a destination port. The complexities of NAT and how it works are described here. There are a variety of different NATs which may have different restrictions or limitations that are placed on a given network.

Credit: Sam Dutton (HTML5 Rocks)
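To make the idea of a binding concrete, here is a minimal Python sketch of the table a NAT keeps. The class name SimpleNat, the addresses, and the starting port are all made up for illustration; real NATs differ in how they allocate ports, when bindings expire, and which inbound packets they accept.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class SimpleNat:
    """Toy NAT (hypothetical): maps each internal endpoint to one public port."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound: Dict[Endpoint, Endpoint] = {}  # internal -> public
        self.inbound: Dict[Endpoint, Endpoint] = {}   # public -> internal

    def translate_outbound(self, src: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet, creating a binding if needed."""
        if src not in self.outbound:
            public = (self.public_ip, self.next_port)
            self.next_port += 1
            self.outbound[src] = public
            self.inbound[public] = src
        return self.outbound[src]

    def translate_inbound(self, dst: Endpoint) -> Optional[Endpoint]:
        """Return the internal endpoint for an incoming packet, or None to drop it."""
        return self.inbound.get(dst)


nat = SimpleNat("203.0.113.7")
laptop = ("192.168.1.20", 50000)
public = nat.translate_outbound(laptop)
print(public)                                         # ('203.0.113.7', 40000)
print(nat.translate_inbound(public))                  # ('192.168.1.20', 50000)
print(nat.translate_inbound(("203.0.113.7", 12345)))  # None: no binding exists
```

The last call is the important one: a packet arriving at a public port that has no binding has nowhere to go, so it is dropped.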

So Why Are NATs Bad For Real-Time Communication?

In order to exchange media, WebRTC uses the Session Description Protocol (SDP) in an “offer” and “answer” exchange between endpoints or peers. Supported codecs, connectivity details, and protocols are written into the SDP so that clients can decide which media codecs they can send and receive, and where to send them. SDPs are created at the application layer and then sent across the wire, going back and forth as the endpoints negotiate how they will connect and transfer media. That’s fine, but NAT mappings sit at the network layer. NATs only modify the IP headers of a TCP/IP or UDP/IP packet. NATs do not know what an SDP is, or how to rewrite the addresses inside it. The result? Depending on the type of NAT that sits between two peers, media sent to the addresses listed in the SDP will most likely be discarded.

That means that users cannot connect to each other if they’re trying to use your application and they’re stuck behind a restrictive firewall.
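The mismatch is easy to see in a small example. The Python sketch below scans an SDP fragment for its connection lines and reports any address that is private. The SDP text is a trimmed, made-up sample and the function names are only illustrative.

```python
import ipaddress
from typing import List

# A trimmed, made-up SDP fragment; real offers carry many more lines.
SAMPLE_SDP = """v=0
o=- 46117317 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 49170 RTP/AVP 0
c=IN IP4 192.168.1.20
a=rtpmap:0 PCMU/8000
"""


def connection_addresses(sdp: str) -> List[str]:
    """Return the addresses found on SDP connection ('c=') lines."""
    addresses = []
    for line in sdp.splitlines():
        if line.startswith("c="):
            # Format: c=<nettype> <addrtype> <connection-address>
            parts = line[2:].split()
            if len(parts) == 3:
                # Drop any "/ttl" suffix used with multicast addresses.
                addresses.append(parts[2].split("/")[0])
    return addresses


def unreachable_from_internet(sdp: str) -> List[str]:
    """List connection addresses that are private, so not publicly routable."""
    return [a for a in connection_addresses(sdp)
            if ipaddress.ip_address(a).is_private]


print(unreachable_from_internet(SAMPLE_SDP))  # ['192.168.1.20']
```

A NAT would rewrite the source address in the IP header of the packet carrying this SDP, but the 192.168.1.20 inside the body would reach the remote peer unchanged.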

Don’t panic. There is help on the way. This is why we use STUN, TURN, and ICE!

Interactive Connectivity Establishment (ICE)

The ICE protocol is used to generate media traversal candidates which can be used in WebRTC applications, and which can be successfully sent and received through NATs. ICE utilizes different technologies and protocols to overcome the challenges posed by different types of NAT mappings. The two most prominent protocols are STUN and TURN. Both STUN and TURN require implementation of client and server-side components.

STUN

“Session Traversal Utilities for NAT” (STUN) allows a client to learn its public, NAT’d IP address and port. Once it knows these, it can give the correct details to other clients that want to connect to it. This requires a STUN server: the STUN client sends a request to the server, and the server replies with the public IP address and port it saw the request arrive from. This approach does not work for symmetric NATs, however. A symmetric NAT creates a new binding, with an effectively random port, for each destination, so the mapping learned from the STUN server is not the one a peer would need in order to reach the client.

Credit: Sam Dutton (HTML5 Rocks)
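For the curious, this is roughly what that exchange looks like on the wire. The Python sketch below builds a STUN Binding request and decodes the XOR-MAPPED-ADDRESS attribute from a success response, following the message layout in RFC 5389. It handles IPv4 only and skips things a real client needs (retransmission, transaction ID matching, authentication). The helper fake_success_response is a stand-in for a server so the example runs without a network.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build a 20-byte STUN Binding request with no attributes."""
    tid = transaction_id or os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # type (2 bytes), body length (2), magic cookie (4), transaction ID (12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_xor_mapped_address(message: bytes) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    if len(message) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)           # un-XOR the port
            addr = struct.pack("!I", xaddr ^ MAGIC_COOKIE)  # un-XOR the address
            return socket.inet_ntoa(addr), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def fake_success_response(tid: bytes, ip: str, port: int) -> bytes:
    """Build the reply a server would send (hypothetical, for local testing)."""
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, port ^ (MAGIC_COOKIE >> 16), xaddr)
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + tid + attr


request = build_binding_request(b"\x00" * 12)
response = fake_success_response(request[8:], "203.0.113.7", 40000)
print(len(request), parse_xor_mapped_address(response))
```

In a real deployment the request would be sent over UDP to the STUN server and the response read back from the same socket, so that the server sees the NAT binding the media will later use.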

TURN

“Traversal Using Relays around NAT” (TURN) allows clients to send and receive data through an intermediary server. The TURN protocol is an extension to STUN. When endpoints are stuck behind different types of NATs, or when a symmetric NAT is in use, it may be easier to send media through a relay server. This is what TURN does. Clients connect to the TURN server, rather than trying to connect through difficult NATs.

Credit: Sam Dutton (HTML5 Rocks)

The Client Process

Putting everything together: a client will generally need access to a TURN server (which also has STUN protocol capabilities). A client generates a connection offer and starts to gather multiple candidates that could be used to stream media to another client. The two clients exchange the offer, the answer, and their candidates, and then decide how to send media.

Step 1: Allocation

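During the WebRTC offer/answer process, a client gathers candidates to be used for ICE. Each candidate is a potential address/port to receive media. Generally, three types of candidates get generated in this initial process: host candidates (addresses on the client’s own network interfaces), server reflexive candidates (the public address and port learned through STUN), and relayed candidates (an address and port allocated on the TURN server).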

After a handful of ICE candidates are generated, they must be properly formatted and encoded to be sent to the end client. This encoding can be placed in the offer and answer SDP or be sent standalone (trickle ICE).
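As a rough illustration of that formatting step, the Python sketch below computes a candidate’s priority with the formula and recommended type preferences from RFC 8445, then prints each candidate as an SDP attribute line. The Candidate class, the addresses, and the foundations are invented for the example, and the local preference is fixed at 65535, which is only appropriate for a client with a single network interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Type preferences recommended by RFC 8445; higher means more preferred.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535,
                       component: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component)


@dataclass
class Candidate:
    """Illustrative container, not a real library type."""
    foundation: str
    component: int
    cand_type: str
    ip: str
    port: int
    related: Optional[Tuple[str, int]] = None  # base address for srflx/relay

    @property
    def priority(self) -> int:
        return candidate_priority(self.cand_type, component=self.component)

    def to_sdp(self) -> str:
        line = (f"candidate:{self.foundation} {self.component} udp {self.priority} "
                f"{self.ip} {self.port} typ {self.cand_type}")
        if self.related:
            line += f" raddr {self.related[0]} rport {self.related[1]}"
        return line


candidates = [
    Candidate("1", 1, "host", "192.168.1.20", 50000),
    Candidate("2", 1, "srflx", "203.0.113.7", 40000, ("192.168.1.20", 50000)),
    Candidate("3", 1, "relay", "198.51.100.5", 61000, ("203.0.113.7", 40000)),
]
for c in candidates:
    print("a=" + c.to_sdp())
```

Because the type preference sits in the top bits, a host candidate always outranks a server reflexive one, which in turn outranks a relayed one. That ordering is what pushes ICE toward direct paths before it falls back to the TURN relay.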

Step 2: Exchange

An offer is generated and sent to the remote client. The offering client can also send its candidates: they can be packed into the original offer, or sent independently after the offer is sent. The latter is known as trickle ICE. The far-end client, which is now receiving the offer and its accompanying candidates, begins to prepare its answer. The answer is generated in much the same way as the offer, producing an answer SDP. The far-end client can likewise pack its generated candidates into the SDP or send them independently (again, trickle ICE).

Step 3: Verification

As the offer and answer exchanges take place, each client has an ICE agent handling connection management. After sending and receiving all the candidates, a verification process begins.

  1. Each agent matches up its own (local) candidates with its peer’s (remote) candidates, creating candidate pairs.
  2. The agent then sends connectivity checks, one every 20 ms in order of pair priority, as STUN binding requests from the local candidate to the remote candidate.
  3. Upon receipt of the request, the peer agent generates a response.
  4. If the response is received, the check has succeeded.
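The ordering used in steps 1 and 2 can be sketched in a few lines of Python. The pair priority formula is the one given in RFC 8445, where G is the priority of the controlling agent’s candidate and D is that of the controlled agent’s candidate. Everything else is a simplification for illustration: candidates are plain (label, priority) tuples, and the pruning and same-component, same-address-family rules that a real ICE agent applies are left out.

```python
from itertools import product
from typing import List, Tuple

LabelledCandidate = Tuple[str, int]  # (label, candidate priority)


def pair_priority(controlling_prio: int, controlled_prio: int) -> int:
    """pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling_prio, controlled_prio
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(local: List[LabelledCandidate],
                  remote: List[LabelledCandidate],
                  local_is_controlling: bool = True) -> List[Tuple[int, str, str]]:
    """Pair every local candidate with every remote one, highest priority first."""
    pairs = []
    for loc, rem in product(local, remote):
        g, d = (loc[1], rem[1]) if local_is_controlling else (rem[1], loc[1])
        pairs.append((pair_priority(g, d), loc[0], rem[0]))
    return sorted(pairs, reverse=True)


# Priorities are the RFC 8445 values for host, srflx, and relay candidates
# with local preference 65535 and component 1.
local = [("host", 2130706431), ("srflx", 1694498815)]
remote = [("host", 2130706431), ("relay", 16777215)]

for prio, loc, rem in ordered_pairs(local, remote):
    print(f"{loc:>5} -> {rem:<5} priority {prio}")
```

Both agents compute the same priority for a given pair, so they work through their check lists in the same order, which helps them converge on the same working pair.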

As agents perform connectivity checks, they may produce additional candidates known as peer reflexive candidates. This usually happens when there is a symmetric NAT in between clients. During the connectivity check process, a STUN request is sent directly to the client, which can generate a brand new binding. If it does, the STUN response is sent back informing the originating client that a new binding was formed. This allows clients to have a direct media path between them, even in the presence of a symmetric NAT.

Step 4: Coordination

At this point, ICE agents should have an idea of which candidate pairs are successfully working. Now, the ICE agent needs to decide which candidate pair it will use for each component in the media stream. One agent acts as the controlling agent, while the other is the controlled agent. The controlling agent is typically the offerer. The controlling agent will decide when STUN checks are finished, and which candidate pair to use when verification has finished.
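One simple selection policy, assumed here purely for illustration, is for the controlling agent to pick the highest-priority pair whose check succeeded, separately for each component. The Python sketch below shows that policy. The CheckedPair type and the priorities are invented, and real agents are free to apply other criteria.

```python
from typing import Dict, List, NamedTuple


class CheckedPair(NamedTuple):
    """Result of one connectivity check (illustrative type)."""
    component: int   # for an RTP stream, 1 is RTP and 2 is RTCP
    priority: int
    local: str
    remote: str
    succeeded: bool


def nominate(pairs: List[CheckedPair]) -> Dict[int, CheckedPair]:
    """For each component, keep the highest-priority pair whose check succeeded."""
    selected: Dict[int, CheckedPair] = {}
    for pair in pairs:
        if not pair.succeeded:
            continue
        best = selected.get(pair.component)
        if best is None or pair.priority > best.priority:
            selected[pair.component] = pair
    return selected


results = [
    CheckedPair(1, 900, "host", "host", False),   # direct path blocked
    CheckedPair(1, 700, "srflx", "srflx", True),  # works through the NATs
    CheckedPair(1, 100, "relay", "relay", True),  # works through TURN
]
chosen = nominate(results)
print(chosen[1].local, chosen[1].remote)  # srflx srflx
```

Here the relay pair also works, but it is only the fallback: the server reflexive pair has the higher priority, so media takes the direct route.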

Step 5: Great Success!

Now that a candidate pair is selected, media should be sent to and from the clients. Based on the type of connection, media can flow between clients in a variety of ways.

And that, ladies and gentlemen, is how the magic happens. ICE does a fantastic job of attempting to traverse through multiple NATs.

Further Reading

If you’re interested in learning more, let us know! You can always ask questions of our team at dev-support@temasys.io!

And, there are lots of great resources available elsewhere. We’ll highlight more of these as we move along, but I want to give credit for the images in this post to Sam Dutton and his excellent tutorial over at HTML5 Rocks, “WebRTC in the real world: STUN, TURN, and signaling”.
