|
| 1 | +import { BlogPostLayout } from '@/components/BlogPostLayout' |
| 2 | +import { ThemeImage } from '@/components/ThemeImage' |
| 3 | + |
| 4 | +export const post = { |
| 5 | + draft: false, |
| 6 | + author: 'Floris Bruynooghe', |
| 7 | + date: '2025-09-01', |
| 8 | + title: 'Moving from STUN to QUIC Address Discovery', |
| 9 | + description: |
| 10 | + "Moving STUN into QUIC", |
| 11 | +} |
| 12 | + |
| 13 | +export const metadata = { |
| 14 | + title: post.title, |
| 15 | + description: post.description, |
| 16 | + openGraph: { |
| 17 | + title: post.title, |
| 18 | + description: post.description, |
| 19 | + images: [{ |
| 20 | + url: `/api/og?title=Blog&subtitle=${post.title}`, |
| 21 | + width: 1200, |
| 22 | + height: 630, |
| 23 | + alt: post.title, |
| 24 | + type: 'image/png', |
| 25 | + }], |
| 26 | + type: 'article' |
| 27 | + } |
| 28 | +} |
| 29 | + |
| 30 | +export default (props) => <BlogPostLayout article={post} {...props} /> |
| 31 | + |
| 32 | +# Holepunching |
| 33 | + |
| 34 | +As you probably know, `iroh` is in the business of holepunching. |
| 35 | +The typical scenario is establishing a direct QUIC connection between two devices, like laptops or phones, both on different home networks. |
| 36 | +Home networks tend to have a [NAT] router in front of them, |
| 37 | +and tend to block new incoming connections even when using IPv6. |
| 38 | +To be fair: blocking random incoming connections to a home network is a sensible choice. |
| 39 | + |
| 40 | +[NAT]: https://en.wikipedia.org/wiki/Network_address_translation |
| 41 | + |
| 42 | +The simplified theory of how UDP holepunching works is that both endpoints send a packet to each other at the same time. |
| 43 | +Both routers see the *outgoing* datagram first, and when they receive the *incoming* datagram, it is considered to be the same connection and is allowed in. |
| 44 | +To achieve this in practice you need two things: |
| 45 | + |
| 46 | +- A means of communicating the coordination. |
| 47 | + Iroh uses the relay server as a network path between the two endpoints for this. |
| 48 | + We explained this in more detail in the [iroh on QUIC Multipath] post. |
| 49 | + |
| 50 | +[iroh on QUIC Multipath]: https://www.iroh.computer/blog/iroh-on-QUIC-multipath |
| 51 | + |
| 52 | +- The address the NAT router is going to be using for the other endpoint – this is where you have to send your holepunching datagrams. |
| 53 | + |
| 54 | +The second part is often called "address discovery", and it seems an impossible task. |
| 55 | +How are we supposed to predict how a random router on the internet is going to behave? |
| 56 | + |
| 57 | +# NAT Types |
| 58 | + |
| 59 | +NAT routers have existed for a very long time, |
| 60 | +and as the world has tried to understand them many words have been spilled classifying and naming them. |
| 61 | +It's a confusing mess. |
| 62 | +[RFC 4787] can be used as a jumping-off point to explore the bewildering number of updates and references to older RFCs. |
| 63 | +In practice, however, people today mostly classify NATs into two types: |
| 64 | + |
| 65 | +[RFC 4787]: https://datatracker.ietf.org/doc/rfc4787/ |
| 66 | + |
| 67 | +- Destination Endpoint Independent |
| 68 | +- Destination Endpoint Dependent |
| 69 | + |
| 70 | +Let's unpack that a bit more: |
| 71 | +a NAT router's job is to map an internal IP & port to an external IP & port, |
| 72 | +or let's call this mapping an *internal address* to an *external address* for simplicity.[^addr] |
| 73 | +When a new connection is created from inside the network, an endpoint binds a socket on an internal source address, |
| 74 | +usually leaving exact IP & port choices to the kernel. |
| 75 | +When this endpoint sends out a datagram to the internet, |
| 76 | +the NAT router creates a mapping and sends the datagram from an external address of its choosing. |
| 77 | +Incoming datagrams to this external address are then looked up in the mapping table to deliver back to the original source address of the endpoint. |
| 78 | + |
| 79 | +[^addr]: Technically we are dealing with *socket addresses*, which on IPv4 are indeed an IP address and port, |
| 80 | + but IPv6 adds a scope ID and flow label to the socket address. |
| 81 | + These fields have some advanced uses but are often ignored, |
| 82 | + so it is easier to think of an IP & port 2-tuple. |
| 83 | + So naming this *address* is a bit of a handwavy term, |
| 84 | + though sufficient to understand the needed logic. |
| 85 | + |
| 86 | +A Destination Endpoint Independent mapping is straightforward: |
| 87 | +each unique source address is mapped to one of the available external addresses (an IP address & port combination), |
| 88 | +*regardless* of the destination address of the datagram. |
| 89 | +That means a single source address can send datagrams to many destinations on the internet, |
| 90 | +and they will all share the same external address on the NAT router. |
| 91 | + |
| 92 | +For a Destination Endpoint Dependent mapping, there could be several variations. |
| 93 | +However, a home router typically only has one external IP address, so only the external port can change. |
| 94 | +The NAT router can then pick a new port for each destination, even if the source address remains the same. |
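
The difference between the two mapping types can be sketched with a toy model. This is only an illustration, not how any real router is implemented: the `Nat` class, the sequential port allocation starting at 40000, and all addresses (taken from documentation ranges) are assumptions made for the example.

```python
from itertools import count


class Nat:
    """Toy NAT with a single external IP (hypothetical, for illustration only)."""

    def __init__(self, external_ip, destination_dependent):
        self.external_ip = external_ip
        self.destination_dependent = destination_dependent
        self._ports = count(40000)  # arbitrary starting port for this sketch
        self._mappings = {}

    def map_outgoing(self, source, destination):
        """Return the external (ip, port) used for a datagram source -> destination."""
        # An independent NAT keys the mapping on the source address only;
        # a dependent NAT also includes the destination in the key.
        key = (source, destination) if self.destination_dependent else source
        if key not in self._mappings:
            self._mappings[key] = (self.external_ip, next(self._ports))
        return self._mappings[key]


laptop = ("192.168.1.10", 5000)
server_a = ("198.51.100.1", 443)
server_b = ("203.0.113.7", 443)

independent = Nat("192.0.2.1", destination_dependent=False)
dependent = Nat("192.0.2.1", destination_dependent=True)

# Same source, two destinations: one shared external address.
independent_a = independent.map_outgoing(laptop, server_a)
independent_b = independent.map_outgoing(laptop, server_b)

# Same source, two destinations: a fresh external port per destination.
dependent_a = dependent.map_outgoing(laptop, server_a)
dependent_b = dependent.map_outgoing(laptop, server_b)
```

In this model the independent NAT hands out the same external address for both destinations, while the dependent NAT keeps the IP but changes the port.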
| 95 | + |
| 96 | +Now think back to holepunching: |
| 97 | +you need to know the external address the NAT router will map to, |
| 98 | +in order to send the holepunching datagrams to each other at the same time. |
| 99 | +With Destination Endpoint *Independent* NAT you can use the information from another connection for this. |
| 100 | +Destination Endpoint *Dependent* NAT however makes this much harder. |
| 101 | +There are still tricks you can try, but iroh does not support them yet. |
| 102 | + |
| 103 | + |
| 104 | +# Reflexive Transport Address |
| 105 | + |
| 106 | +This brings us to the fancy term "Reflexive Transport Address". |
| 107 | +Imagine you are a server sitting on the internet and you receive a datagram from an endpoint behind a NAT router. |
| 108 | +The IP header of the received datagram contains the source IP address, |
| 109 | +while the UDP header contains the source port number. |
| 110 | +The IP & port combination the server sees is the external address, the mapped address the NAT router made. |
| 111 | +To send a response, the server needs to send a datagram addressed to this observed source address. |
| 112 | + |
| 113 | +In other words, the source address the server *observes* |
| 114 | +is the address it sends responses to. |
| 115 | +Thus you can build a server that informs a client endpoint about the client's address as observed by the server. |
| 116 | +To the client this is the *Reflexive Transport Address*. |
| 117 | + |
| 118 | +If the client is behind a NAT router this will be a different address than the one the client itself is sending from. |
| 119 | +So a client can use this to detect if it is behind a NAT. |
| 120 | +A client can go even further and use multiple such servers from the same socket: |
| 121 | +if it receives the same reflexive transport address from two of them, |
| 122 | +it is behind a Destination Endpoint Independent NAT. |
| 123 | +If it receives two different reflexive transport addresses, |
| 124 | +it is stuck behind a Destination Endpoint Dependent NAT. |
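
The decision logic above fits in a few lines. This is a minimal sketch under stated assumptions: `classify_nat` and its return strings are hypothetical names, and it assumes every reflexive address was reported for the same local socket.

```python
def classify_nat(local_address, reflexive_addresses):
    """Classify the NAT from reflexive addresses reported by different servers.

    `local_address` is the (ip, port) the socket is bound to, and
    `reflexive_addresses` holds one (ip, port) per server that answered.
    """
    unique = set(reflexive_addresses)
    if not unique:
        return "unknown"
    if unique == {local_address}:
        # Servers see exactly what we send from: nothing rewrote the address.
        return "no NAT"
    if len(reflexive_addresses) < 2:
        # A single report can reveal a NAT but not its mapping behaviour.
        return "NAT, type unknown"
    if len(unique) == 1:
        return "destination endpoint independent"
    return "destination endpoint dependent"


local = ("192.168.1.10", 5000)
same = [("192.0.2.1", 40000), ("192.0.2.1", 40000)]
different = [("192.0.2.1", 40000), ("192.0.2.1", 40001)]

verdict_same = classify_nat(local, same)
verdict_different = classify_nat(local, different)
```

Note that a single report is enough to detect a NAT, but at least two servers are needed to tell the mapping types apart.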
| 125 | + |
| 126 | + |
| 127 | +# STUN |
| 128 | + |
| 129 | +Naturally such servers have existed for a while. |
| 130 | +As part of the standardization around audio-video calls in the form of SIP and WebRTC, |
| 131 | +there was a need for endpoints to learn about their reflexive transport addresses. |
| 132 | +For this the STUN spec was created, |
| 133 | +which by now has evolved into [RFC 8489]. |
| 134 | +A sizable tome. |
| 135 | + |
| 136 | +[RFC 8489]: https://datatracker.ietf.org/doc/html/rfc8489 |
| 137 | + |
| 138 | +Not going to lie about it: I've never read the full STUN spec. |
| 139 | +It contains a lot and can do many things. |
| 140 | +And yet, the part `iroh` actively used is surprisingly small: |
| 141 | + |
| 142 | +- Generate a STUN transaction ID, just a few random bytes. |
| 143 | +- Send a STUN request to a STUN server in a UDP datagram. |
| 144 | +- Wait for a response from the server which matches the request's transaction ID. |
| 145 | + |
| 146 | +That's it. |
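
For a sense of how little is involved, the three steps can be sketched using only the standard library. This is not iroh's code, which is written in Rust. It is a rough illustration of the RFC 8489 wire format that handles only an IPv4 `XOR-MAPPED-ADDRESS` attribute and skips everything else the spec offers; the function names are made up for the example.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value in every STUN header
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request():
    """Return (transaction_id, datagram) for a Binding Request without attributes."""
    transaction_id = os.urandom(12)  # 96 random bits
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return transaction_id, header + transaction_id


def parse_binding_response(datagram, transaction_id):
    """Return the reflexive IPv4 (ip, port), or None if the response is not ours."""
    if len(datagram) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", datagram[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if datagram[8:20] != transaction_id:
        return None  # response to some other request
    offset, end = 20, min(20 + length, len(datagram))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", datagram[offset:offset + 4])
        value = datagram[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # Port and address are XOR-ed with (parts of) the magic cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attribute values are padded to a multiple of 4 bytes.
        offset += 4 + ((attr_len + 3) & ~3)
    return None
```

In real use the request would be sent with `sendto` on a UDP socket to a STUN server of your choice, and the reply read with `recvfrom` under a timeout, resending when the timeout fires.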
| 147 | + |
| 148 | +So why change working systems? |
| 149 | +Let's look at what we don't get from this: |
| 150 | + |
| 151 | +- Encryption. |
| 152 | + While in theory you can encrypt STUN requests using DTLS, it's not something that is done much. |
| 153 | + |
| 154 | +- Reliability. |
| 155 | + It's a simple UDP-based protocol. |
| 156 | + If the request is lost you eventually time out and need to resend it – very primitive. |
| 157 | + |
| 158 | +- Congestion Control. |
| 159 | + You will be sending application traffic over the same sockets as the STUN datagrams. |
| 160 | + However, STUN requests are sent outside of the normal flow of data, |
| 161 | + which makes packet loss much more likely if the application is busy. |
| 162 | + |
| 163 | +All of these are things that are solved in QUIC: |
| 164 | +QUIC is a secure, reliable transport with advanced congestion control and loss detection. |
| 165 | +And we already use it for our application protocol so we won't have two different endpoints sending and receiving on the same socket. |
| 166 | + |
| 167 | + |
| 168 | +# QUIC Address Discovery |
| 169 | + |
| 170 | +This is such an obvious idea that someone already wrote it down as an IETF draft (thanks Marten and Christian!): |
| 171 | +https://quicwg.org/address-discovery/draft-ietf-quic-address-discovery.html |
| 172 | + |
| 173 | +QUIC Address Discovery, or QAD as we call it, is an extension to the QUIC protocol that gets negotiated during the QUIC handshake. |
| 174 | +If negotiated, |
| 175 | +the remote side will send you an OBSERVED_ADDRESS frame containing the reflexive transport address it observed for you. |
| 176 | + |
| 177 | +One of the cool things is that this can happen regardless of the application protocol being used, |
| 178 | +as it happens entirely in QUIC frames. |
| 179 | +So you can still use this connection to carry application data. |
| 180 | + |
| 181 | +Another really nice feature flowing from this is that this isn't a request-response protocol anymore. |
| 182 | +QUIC supports connection migration for clients: |
| 183 | +when your NAT router updates the mapping for some reason, |
| 184 | +or when you move from a Wi-Fi network to mobile data, |
| 185 | +QUIC will detect this and migrate the connection to the new network path, |
| 186 | +without losing any data or breaking the connection. |
| 187 | +And whenever that happens while the QAD extension is negotiated, |
| 188 | +a new reflexive transport address is observed and will be sent in a new OBSERVED_ADDRESS frame. |
| 189 | +Thus this becomes event-based rather than request-response. |
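
From the application's point of view the event-based model might look like the sketch below. Everything here is hypothetical: `ReflexiveAddressWatcher` and its methods are not part of iroh or any QUIC library. It only assumes that the QUIC stack calls `on_observed_address` whenever it receives an OBSERVED_ADDRESS frame.

```python
class ReflexiveAddressWatcher:
    """Keeps the latest reflexive address and notifies listeners when it changes."""

    def __init__(self):
        self.current = None
        self._listeners = []

    def subscribe(self, callback):
        """Register a callable invoked with each new reflexive address."""
        self._listeners.append(callback)

    def on_observed_address(self, address):
        """Assumed hook: called by the QUIC stack for each observed address."""
        if address == self.current:
            return  # nothing changed, no event
        self.current = address
        for callback in self._listeners:
            callback(address)


changes = []
watcher = ReflexiveAddressWatcher()
watcher.subscribe(changes.append)

watcher.on_observed_address(("192.0.2.1", 40000))     # first report
watcher.on_observed_address(("192.0.2.1", 40000))     # unchanged, ignored
watcher.on_observed_address(("198.51.100.9", 51000))  # e.g. after migrating networks
```

There is no polling loop and no timeout to tune: the application only reacts when the address actually changes.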
| 190 | + |
| 191 | + |
| 192 | +# QAD in `iroh` Relay Servers |
| 193 | + |
| 194 | +Since `iroh` 0.32, `iroh` and the relay servers have supported, |
| 195 | +and used, both QAD and STUN. |
| 196 | +Since the 0.90 release we have switched to QAD exclusively. |
| 197 | + |
| 198 | +The work is not finished yet though. |
| 199 | +Iroh still uses a special-purpose QUIC connection for QAD. |
| 200 | +At some point we would like to also support making the normal relay connection over QUIC when possible, |
| 201 | +in addition to the current HTTPS (HTTP/1.1) / WebSocket connection. |
| 202 | +This would be one fewer connection to the relay server and truly allow us to benefit from the event-based nature of QAD. |
| 203 | +This is something for after the 1.0 release however. |
| 204 | + |
| 205 | + |
| 206 | +** ** |
| 207 | + |
| 208 | +### Footnotes |