ICE and WebRTC: What Is This Sorcery? We Explain…
Tim Bell
We support many types of customers at Temasys. They all have different experience levels and challenges with their projects. Some (in fact quite a few) have attempted to build their own WebRTC-powered applications, which is an admirable achievement. Often these solutions work nicely on a small scale, within a closed network, or in a testing scenario. The Internet, and how things work outside of the lab, is a different prospect. Sometimes, due to the vast array of network configurations out there, our customers come to Temasys because they can’t connect their users and their media can’t traverse real-world networks. Temasys uses WebRTC and the STUN, TURN, and ICE protocols as part of the Temasys Platform. In this post we explain how that works. It can seem a bit like magic.
WebRTC allows media to go from one computer to another, regardless of the NATs that sit between them. It does this through the Interactive Connectivity Establishment (ICE) protocol, which uses two other protocols, STUN and TURN, to dynamically discover candidate paths and pick the shortest one that works for media to travel between endpoints or peers.
Temasys provides integrated TURN and STUN services that allow our SDKs to send media between users of applications built with those SDKs. Whichever SDK we’re talking about (web, mobile, or embedded), Temasys’s TURN/STUN server is a globally deployable, scalable, and highly available service that aims to provide a seamless interaction experience for our customers and their users.
What is ICE? What are TURN and STUN? How do they work?
Before understanding why ICE is needed, you must understand what Network Address Translation (NAT) is, and why we care about it. In a closed network, or in a world without NAT, information contained in TCP/IP or UDP/IP packets can be sent to and from endpoints pretty easily.
A World Without NAT
Credit: Sam Dutton (HTML5 Rocks)
NAT Exists. Here’s Why.
There are a lot of reasons why we have routers, firewalls and NAT. One is that NAT lets many devices share a single public IPv4 address. Another big one is keeping our business and personal data secure. If you have a wireless router in your house, it almost certainly also has a built-in firewall that uses NAT. NAT exists to create bindings between internal private and external public addresses and ports within a network. Each TCP/IP or UDP/IP packet contains a stack of information that it carries with it as it travels around the internet. This information includes a source IP address, source port, a destination IP address and a destination port. The complexities of NAT and how it works are described here. There are a variety of different NATs which may have different restrictions or limitations that are placed on a given network.
Credit: Sam Dutton (HTML5 Rocks)
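To make the idea of a NAT binding concrete, here is a minimal sketch in Python. It is a toy model, not how any particular router works: the class name `SimpleNat`, the starting port, and all addresses (taken from documentation ranges) are assumptions made for illustration. It rewrites the source address of outgoing packets and only lets incoming packets through if a binding already exists.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleNat:
    """Toy NAT (hypothetical): one public port per internal (ip, port)."""
    public_ip: str
    next_port: int = 40000
    bindings: dict = field(default_factory=dict)  # (internal ip, port) -> public port

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of an outgoing packet, creating a binding if needed."""
        key = (src_ip, src_port)
        if key not in self.bindings:
            self.bindings[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.bindings[key], dst_ip, dst_port)

    def inbound(self, dst_port):
        """Return the internal (ip, port) for an incoming packet, or None to drop it."""
        for internal, public_port in self.bindings.items():
            if public_port == dst_port:
                return internal
        return None


nat = SimpleNat(public_ip="203.0.113.7")
packet = nat.outbound("192.168.1.10", 5000, "198.51.100.20", 3478)
print(packet)              # ('203.0.113.7', 40000, '198.51.100.20', 3478)
print(nat.inbound(40000))  # ('192.168.1.10', 5000)
print(nat.inbound(40001))  # None: no binding exists, so the packet is dropped
```

The remote side only ever sees the public address and port. The private address never leaves the local network in the packet headers.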
So Why Are NATs Bad For Real-Time Communication?
In order to exchange media, WebRTC uses the Session Description Protocol (SDP) in an “offer” and “answer” exchange between endpoints or peers. Supported codecs, connectivity details, and protocols are written into the SDP so that clients can decide which media codecs they can send and receive, and where to send them. SDPs are created at the application layer and then sent across the wire, going back and forth as the endpoints negotiate how they will connect and transfer media. That’s fine, but NAT mappings sit at the network layer. NATs only modify the address and port headers of a TCP/IP or UDP/IP packet. NATs do not know what an SDP is, or how to rewrite the addresses inside it so that media can be let into or out of the network safely. The result? Depending on the type of NAT that sits between two peers, media will most likely be discarded. Again, here’s why:
- SDPs are not aware of an external NAT’s IP address
- SDPs are not aware of how to handle any port restrictions
- SDPs are not aware of how to handle connections where the NAT does not preserve the port
That means that users cannot connect to each other if they’re trying to use your application and they’re stuck behind a restrictive firewall.
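A small sketch shows the mismatch. The SDP below is a hypothetical, heavily trimmed offer written for this example, and `connection_address` is an illustrative helper, not part of any WebRTC library. The address the client advertises is the private one it knows about, which a peer on the public Internet cannot reach.

```python
import ipaddress

# Hypothetical, trimmed SDP offer. Real offers carry many more lines.
SDP_OFFER = """v=0
o=- 46117317 2 IN IP4 127.0.0.1
s=-
c=IN IP4 192.168.1.10
m=audio 5000 RTP/AVP 111
a=rtpmap:111 opus/48000/2
"""


def connection_address(sdp):
    """Return the address from the first c= (connection) line."""
    for line in sdp.splitlines():
        if line.startswith("c="):
            # Format: c=<nettype> <addrtype> <connection-address>
            return line.split()[2]
    raise ValueError("no c= line found")


addr = connection_address(SDP_OFFER)
is_private = ipaddress.ip_address(addr).is_private
print(addr, is_private)  # 192.168.1.10 True
```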
Don’t panic. There is help on the way. This is why we use STUN, TURN, and ICE!
Interactive Connectivity Establishment (ICE)
The ICE protocol is used to generate media traversal candidates which can be used in WebRTC applications, and which can be successfully sent and received through NATs. ICE utilizes different technologies and protocols to overcome the challenges posed by different types of NAT mappings. The two most prominent protocols are STUN and TURN. Both STUN and TURN require implementation of client and server-side components.
“Session Traversal Utilities for NAT” (STUN) allows clients to learn what their public, NAT’d IP address and port are. Once a client knows this, it can give the correct details to other clients that want to connect to it. This requires a STUN server: the STUN client sends a request through the NAT to the server, and the server replies with the public IP address and port it saw the request come from. STUN alone does not work for symmetric NATs, however. A symmetric NAT creates a new, effectively random port binding for each destination, so the mapping the STUN server reports is not the one a peer would need to use, and STUN cannot communicate that dynamic mapping when negotiating media paths.
Credit: Sam Dutton (HTML5 Rocks)
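For readers who want to see what a STUN exchange looks like on the wire, here is a minimal sketch based on the message layout in the STUN specification (a 20-byte header followed by attributes, with the public address carried in an XOR-MAPPED-ADDRESS attribute). It handles IPv4 only. To keep the example runnable offline, the server reply is built locally by `fake_binding_response`, a hypothetical stand-in for a real STUN server. The function names and addresses are assumptions for illustration.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id):
    """20-byte header: type, length (no attributes), magic cookie, transaction ID."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(message):
    """Return (ip, port) from the first IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset = 20
    while offset + 4 <= 20 + length:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def fake_binding_response(transaction_id, ip, port):
    """Hypothetical stand-in for a server reply, so the sketch needs no network."""
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01,
                       port ^ (MAGIC_COOKIE >> 16), xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


tid = os.urandom(12)
request = build_binding_request(tid)
response = fake_binding_response(tid, "203.0.113.7", 40000)
print(len(request))                        # 20
print(parse_xor_mapped_address(response))  # ('203.0.113.7', 40000)
```

In a real client the request would be sent over UDP to a STUN server, and the address parsed from the reply is what the client advertises to its peer.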
“Traversal Using Relays around NAT” (TURN) allows clients to send and receive data through an intermediary server. The TURN protocol is an extension to STUN. When endpoints are stuck behind different types of NATs, or when a symmetric NAT is in use, it may be easier to send media through a relay server. This is what TURN does. Clients connect to the TURN server, rather than trying to connect to each other through difficult NATs.
Credit: Sam Dutton (HTML5 Rocks)
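The relay idea can be sketched with a toy model. This is not the TURN wire protocol (which also involves authentication, permissions, and lifetimes). It only illustrates the forwarding step. `ToyRelay`, its methods, and the addresses are all hypothetical names chosen for this example.

```python
class ToyRelay:
    """Toy model of a relay server: allocation and forwarding only."""

    def __init__(self, public_ip, first_port=49152):
        self.public_ip = public_ip
        self.next_port = first_port
        self.allocations = {}  # relayed (ip, port) -> client's public (ip, port)

    def allocate(self, client_addr):
        """Reserve a relayed address on the server for this client."""
        relayed = (self.public_ip, self.next_port)
        self.next_port += 1
        self.allocations[relayed] = client_addr
        return relayed

    def forward(self, relayed_addr, payload):
        """Route data a peer sent to a relayed address; None if no allocation."""
        client_addr = self.allocations.get(relayed_addr)
        if client_addr is None:
            return None
        return (client_addr, payload)


relay = ToyRelay("198.51.100.50")
alice_public = ("203.0.113.7", 50001)    # Alice's address as seen by the relay
alice_relayed = relay.allocate(alice_public)
print(alice_relayed)                     # ('198.51.100.50', 49152)

# Bob never talks to Alice's NAT. He sends to the relayed address instead.
delivery = relay.forward(alice_relayed, b"media packet")
print(delivery)                          # (('203.0.113.7', 50001), b'media packet')
```

Because each client only ever opens a connection out to the relay, the NAT in front of it sees ordinary outbound traffic.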
The Client Process
Let’s put everything together. A client will generally require a TURN server (which also has STUN protocol capabilities). The client generates a connection offer and starts gathering multiple candidates that could be used to stream media to another client. The two clients then exchange offers, answers, and candidates, and decide how to send media.
Step 1: Allocation
During the WebRTC offer/answer process, a client gathers candidates to be used for ICE. Each candidate is a potential address/port to receive media. Generally, three types of candidates get generated in this initial process.
- Host Candidate: Host candidates are generated by the client by binding to its locally assigned IP addresses and port. If you have multiple IP addresses, you can generate multiple host candidates.
- Server Reflexive Candidate: Server reflexive candidates are generated by sending STUN messages to a STUN/TURN server. A client sends a query message to the STUN server. That query passes through the NAT, which creates a binding. The response to the query contains the public IP and port that was generated for the binding. This can now be used as a server reflexive candidate.
- Relay Candidate: Relay candidates are gathered in a similar way, but from a TURN server. The client sends an allocation request to the TURN server, which reserves a relayed address and port on the server itself. That relayed address becomes the candidate, and any media that uses it travels to and from the peer through the relay server.
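Each candidate also carries a priority so that agents can prefer direct paths over relayed ones. The sketch below uses the priority formula and the recommended type preferences from the ICE specification (RFC 8445). The function name and the default local preference of 65535 are assumptions for illustration, and real implementations may choose other values.

```python
# Type preferences recommended in RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)"""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_preference << 8)
            + (256 - component_id))


for cand_type in ("host", "srflx", "relay"):
    print(cand_type, candidate_priority(cand_type))
# host 2130706431
# srflx 1694498815
# relay 16777215
```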
After a handful of ICE candidates are generated, they must be properly formatted and encoded to be sent to the end client. This encoding can be placed in the offer and answer SDP or be sent standalone (trickle ICE).
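As an illustration of that encoding, here is a minimal sketch that parses the text form of a candidate as it appears in SDP or in a trickled message. The sample line and the `parse_candidate` helper are hypothetical. The parser covers only the common fields and treats anything after the type as name/value pairs.

```python
def parse_candidate(line):
    """Parse 'candidate:<foundation> <component> <transport> <priority> <ip> <port> typ <type> ...'."""
    if line.startswith("a="):
        line = line[2:]  # candidates embedded in SDP are prefixed with a=
    fields = line.split()
    if len(fields) < 8 or not fields[0].startswith("candidate:") or fields[6] != "typ":
        raise ValueError("not a candidate line")
    cand = {
        "foundation": fields[0].split(":", 1)[1],
        "component": int(fields[1]),
        "transport": fields[2].lower(),
        "priority": int(fields[3]),
        "ip": fields[4],
        "port": int(fields[5]),
        "type": fields[7],
    }
    # Remaining fields are name/value pairs such as raddr/rport (kept as strings).
    extras = fields[8:]
    cand.update(dict(zip(extras[0::2], extras[1::2])))
    return cand


sample = ("a=candidate:842163049 1 udp 1694498815 203.0.113.7 40000 "
          "typ srflx raddr 192.168.1.10 rport 5000")
cand = parse_candidate(sample)
print(cand["type"], cand["ip"], cand["port"])  # srflx 203.0.113.7 40000
print(cand["raddr"], cand["rport"])            # 192.168.1.10 5000
```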
Step 2: Exchange
An offer is generated and sent to the remote client. The offering client’s candidates can be packed into the original offer, or sent independently after the offer is sent. The latter is known as trickle ICE. The far end client, which is now receiving the offer and its accompanying candidates, begins to prepare its answer. The answer is generated in a similar manner to the offer, and an answer SDP is created. The far end client gathers its own candidates too, and can choose to pack them into the answer SDP or send them independently (again, trickle ICE).
Step 3: Verification
As the offer and answer exchanges take place, each client has an ICE agent handling connection management. After sending and receiving all the candidates, a verification process begins.
- Each agent matches up its own (local) candidates with its peer’s (remote) candidates, creating candidate pairs.
- The agent then sends connectivity checks, one every 20 ms, in order of pair priority. Each check is a STUN binding request sent from the local candidate to the remote candidate.
- Upon receipt of the request, the peer agent generates a response.
- If the response is received, the check has succeeded.
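The pairing and ordering step can be sketched as follows. The pair priority formula is the one given in the ICE specification (RFC 8445). The candidate priorities are the values that formula's companion candidate formula yields for typical host, server reflexive, and relay candidates. The sketch assumes the local agent is the controlling one and skips details a real agent handles, such as pairing only matching components and address families, and pruning redundant pairs.

```python
from itertools import product


def pair_priority(controlling_prio, controlled_prio):
    """2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)"""
    g, d = controlling_prio, controlled_prio
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


# Candidate type -> candidate priority (illustrative values).
local = {"host": 2130706431, "srflx": 1694498815, "relay": 16777215}
remote = {"host": 2130706431, "srflx": 1694498815}

# Every local candidate is paired with every remote candidate,
# then the list is sorted so the highest-priority pair is checked first.
checklist = sorted(
    ((pair_priority(lp, rp), lt, rt)
     for (lt, lp), (rt, rp) in product(local.items(), remote.items())),
    reverse=True,
)

order = [(lt, rt) for _, lt, rt in checklist]
for lt, rt in order:
    print(lt, "->", rt)
# host -> host
# host -> srflx
# srflx -> host
# srflx -> srflx
# relay -> host
# relay -> srflx
```

Direct pairs are tried first and relayed pairs last, which is how ICE ends up favoring the shortest working path.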
As agents perform connectivity checks, they may discover additional candidates known as peer reflexive candidates. This usually happens when there is a symmetric NAT in between clients. During the connectivity check process, a STUN request is sent directly to the remote client, and on its way out it can cause the NAT to create a brand new binding. If it does, the remote client sees the request arrive from an address it has not been told about, and the STUN response it sends back tells the originating client what that new address is. This can give clients a direct media path between them, even in the presence of a symmetric NAT.
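A toy model makes the difference visible. `SymmetricNat` below is a hypothetical class written for this example. It hands out a fresh public port for every distinct destination, so the address a STUN server reports is not the address a peer sees during a connectivity check. All addresses are illustrative.

```python
class SymmetricNat:
    """Toy symmetric NAT: a new public port for every (internal, destination) pair."""

    def __init__(self, public_ip, first_port=50000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.bindings = {}  # (internal addr, destination addr) -> public port

    def outbound(self, src, dst):
        """Return the public (ip, port) the destination sees for this flow."""
        key = (src, dst)
        if key not in self.bindings:
            self.bindings[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.bindings[key])


nat = SymmetricNat("203.0.113.7")
client = ("192.168.1.10", 5000)
stun_server = ("198.51.100.20", 3478)
peer = ("198.51.100.99", 6000)

srflx = nat.outbound(client, stun_server)  # what the STUN server reports back
seen_by_peer = nat.outbound(client, peer)  # source address of the connectivity check

print(srflx)         # ('203.0.113.7', 50000)
print(seen_by_peer)  # ('203.0.113.7', 50001)
# The peer did not know about port 50001 beforehand, so it treats this
# address as a new peer reflexive candidate.
```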
Step 4: Coordination
At this point, ICE agents should have an idea of which candidate pairs are working. Now, the ICE agents need to decide which candidate pair to use for each component in the media stream. One agent acts as the controlling agent, while the other is the controlled agent. The controlling agent is typically the offerer. The controlling agent decides when STUN checks are finished, and which candidate pair to use once verification is complete.
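As a rough sketch of that decision, the controlling agent can be thought of as picking the highest-priority pair whose check succeeded. The `CheckedPair` type, the `select_pair` function, and the priority numbers below are hypothetical simplifications. Real agents also signal the chosen pair to the peer, which is left out here.

```python
from typing import NamedTuple, Optional


class CheckedPair(NamedTuple):
    local_type: str
    remote_type: str
    priority: int     # illustrative numbers, not real pair priorities
    succeeded: bool   # result of the connectivity check


def select_pair(pairs) -> Optional[CheckedPair]:
    """Pick the highest-priority pair whose connectivity check succeeded."""
    valid = [p for p in pairs if p.succeeded]
    return max(valid, key=lambda p: p.priority) if valid else None


results = [
    CheckedPair("host", "host", 9000, False),   # different networks: check failed
    CheckedPair("srflx", "srflx", 7000, True),  # hole punching worked
    CheckedPair("relay", "srflx", 1000, True),  # relay works, but is the last resort
]

chosen = select_pair(results)
print(chosen.local_type, chosen.remote_type)  # srflx srflx
```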
Step 5: Great Success!
Now that a candidate pair is selected, media should be sent to and from the clients. Based on the type of connection, media can flow between clients in a variety of ways.
- Host: These clients are on the same LAN and may be behind the same NAT. They can send media directly from host to host.
- Server Reflexive: STUN has successfully figured out how to create a connection (like punching a hole) through the NATs, and media is flowing between the two clients.
- Relay: TURN has successfully allowed media to be sent through an intermediate server that sits between the two NATs. This means the NATs never have to pass traffic directly between each other when that is not permitted.
- Peer Reflexive: During connectivity checks we found a better way to send media directly between clients. A client may be behind a symmetric NAT that does not preserve ports. In this case, a STUN message caused a new binding to be created directly between the clients, in spite of the symmetric NAT. Now, media can flow directly between them.
And that, ladies and gentlemen, is how the magic happens. ICE does a fantastic job of attempting to traverse through multiple NATs.
If you’re interested in learning more, let us know! You can always ask questions of our team at firstname.lastname@example.org!
And there are lots of great resources available elsewhere. We’ll highlight more of these as we move along, but I want to give credit for the images in this post to Sam Dutton and his excellent tutorial over at HTML5 Rocks, “WebRTC in the real world: STUN, TURN, and signaling”.
- The WebRTC Book gives a lot of detail about data and signaling pathways, and includes a number of detailed network topology diagrams.
- Ben Strong’s presentation A Practical Guide to Building WebRTC Apps provides a lot of information about WebRTC topologies and infrastructure.
- The WebRTC chapter in Ilya Grigorik’s High-Performance Browser Networking goes deep into WebRTC architecture, use cases and performance.