Commit e33b931

flub, ramfox and matheus23 authored
QAD blog post (#388)
Write a QAD blog post Co-authored-by: ramfox <[email protected]> Co-authored-by: Philipp Krüger <[email protected]>
1 parent d4e58f3 commit e33b931

1 file changed

+208
-0
lines changed

src/app/blog/qad/page.mdx

Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
import { BlogPostLayout } from '@/components/BlogPostLayout'
import {ThemeImage} from '@/components/ThemeImage'

export const post = {
  draft: false,
  author: 'Floris Bruynooghe',
  date: '2025-09-01',
  title: 'Moving from STUN to QUIC Address Discovery',
  description:
    "Moving STUN into QUIC",
}

export const metadata = {
  title: post.title,
  description: post.description,
  openGraph: {
    title: post.title,
    description: post.description,
    images: [{
      url: `/api/og?title=Blog&subtitle=${post.title}`,
      width: 1200,
      height: 630,
      alt: post.title,
      type: 'image/png',
    }],
    type: 'article'
  }
}

export default (props) => <BlogPostLayout article={post} {...props} />

# Holepunching

As you probably know, `iroh` is in the business of holepunching.
The typical scenario is establishing a direct QUIC connection between two devices, like laptops or phones, both on different home networks.
Home networks tend to have a [NAT] router in front of them,
and tend to block new incoming connections even when using IPv6.
To be fair: blocking random incoming connections to a home network is a sensible choice.

[NAT]: https://en.wikipedia.org/wiki/Network_address_translation

The simplified theory of how UDP holepunching works is that both endpoints send a packet to each other at the same time.
Each router sees the *outgoing* datagram first, so when the *incoming* datagram arrives, the router considers it part of the same connection and lets it in.
To achieve this in practice you need two things:

- A means of coordinating the attempt between the two endpoints.
  Iroh uses the relay server as a network path between the two endpoints for this.
  We explained this in more detail in the [iroh on QUIC Multipath] post.

  [iroh on QUIC Multipath]: https://www.iroh.computer/blog/iroh-on-QUIC-multipath

- The address the NAT router is going to be using for the other endpoint – this is where you have to send your holepunching datagrams.

The second part is often called "address discovery", and it seems like an impossible task.
How are we supposed to predict how a random router on the internet is going to behave?

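Setting address discovery aside for a moment, the simplified holepunching theory from this section can be sketched as a toy simulation. This is not how `iroh` or any real router is implemented: `ToyRouter`, `holepunch` and the example addresses are all made up for illustration, and the model assumes each side already knows the other's external address.

```python
from dataclasses import dataclass, field

Addr = tuple[str, int]  # (IP, port)


@dataclass
class ToyRouter:
    """Hypothetical stateful router: it lets a datagram in only if the
    internal host has already sent one to that remote address."""

    external: Addr
    seen_outgoing: set[Addr] = field(default_factory=set)

    def send(self, remote: Addr) -> None:
        # An outgoing datagram opens the "hole" for traffic from `remote`.
        self.seen_outgoing.add(remote)

    def accepts(self, remote: Addr) -> bool:
        # Incoming traffic is only allowed from remotes we sent to first.
        return remote in self.seen_outgoing


def holepunch(a: ToyRouter, b: ToyRouter) -> tuple[bool, bool]:
    """Both sides send 'at the same time': each router sees its own
    outgoing datagram before the other side's datagram arrives."""
    a.send(b.external)
    b.send(a.external)
    return a.accepts(b.external), b.accepts(a.external)


router_a = ToyRouter(external=("203.0.113.1", 40000))
router_b = ToyRouter(external=("198.51.100.2", 50000))

# Without a prior outgoing datagram, an unsolicited packet is dropped.
unsolicited_ok = router_a.accepts(router_b.external)

both_ok = holepunch(router_a, router_b)
print(unsolicited_ok, both_ok)
```

The sketch only works because both routers are handed the other side's external address up front, which is exactly the part that address discovery has to provide.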
# NAT Types

NAT routers have existed for a very long time,
and as the world has tried to understand them many words have been spilled classifying and naming them.
It's a confusing mess.
[RFC 4787] can be used as a jumping-off point to explore the bewildering number of updates and references to older RFCs.
However, practical people today mostly classify NATs into two types:

[RFC 4787]: https://datatracker.ietf.org/doc/rfc4787/

- Destination Endpoint Independent
- Destination Endpoint Dependent

Let's unpack that a bit more:
a NAT router's job is to map an internal IP & port to an external IP & port.
For simplicity, let's call this mapping an *internal address* to an *external address*.[^addr]
When a new connection is created from inside the network, an endpoint binds a socket on an internal source address,
usually leaving exact IP & port choices to the kernel.
When this endpoint sends out a datagram to the internet,
the NAT router creates a mapping and sends the datagram from an external address of its choosing.
Incoming datagrams to this external address are then looked up in the mapping table and delivered back to the original source address of the endpoint.

[^addr]: Technically we are dealing with *socket addresses*, which on IPv4 are indeed an IP address and port,
    but IPv6 adds a scope ID and flow label to the socket address.
    These fields have some advanced uses but are often ignored,
    so it is easier to think of an IP & port 2-tuple.
    So calling this an *address* is a bit handwavy,
    though sufficient to understand the needed logic.

A Destination Endpoint Independent mapping is straightforward:
each unique source address is mapped to one of the available external addresses (an IP address & port combination),
*regardless* of the destination address of the datagram.
That means a single source address can send datagrams to many destinations on the internet,
and they will all share the same external address on the NAT router.

For a Destination Endpoint Dependent mapping there could be several variations.
However, a home router typically only has one external IP address, so only the external port can change.
So the NAT router can pick a new port for each destination, even if the source address remains the same.

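The difference between the two mapping types is easy to see in a small model. The following is only a sketch: `ToyNat`, its port counter and the addresses are invented for illustration, and real routers also expire mappings and may pick ports differently.

```python
import itertools

Addr = tuple[str, int]  # (IP, port)


class ToyNat:
    """Hypothetical NAT with a single external IP, as on a typical home router."""

    def __init__(self, external_ip: str, destination_dependent: bool) -> None:
        self.external_ip = external_ip
        self.destination_dependent = destination_dependent
        self._ports = itertools.count(40000)  # arbitrary starting port
        self._table: dict[tuple, Addr] = {}

    def map_outgoing(self, source: Addr, destination: Addr) -> Addr:
        """Return the external address used for a datagram source -> destination."""
        # Independent: the key is the source only.
        # Dependent: the key is the (source, destination) pair.
        key = (source, destination) if self.destination_dependent else (source,)
        if key not in self._table:
            self._table[key] = (self.external_ip, next(self._ports))
        return self._table[key]


source = ("192.168.1.10", 51234)
server_1 = ("192.0.2.1", 443)
server_2 = ("192.0.2.2", 443)

independent = ToyNat("203.0.113.1", destination_dependent=False)
dependent = ToyNat("203.0.113.1", destination_dependent=True)

ind_1 = independent.map_outgoing(source, server_1)
ind_2 = independent.map_outgoing(source, server_2)
dep_1 = dependent.map_outgoing(source, server_1)
dep_2 = dependent.map_outgoing(source, server_2)

print("independent:", ind_1, ind_2)  # same external address for both destinations
print("dependent:  ", dep_1, dep_2)  # same IP, but a different port per destination
```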
Now think back to holepunching:
you need to know the external address the NAT router will map to,
in order to send the holepunching datagrams to each other at the same time.
With Destination Endpoint *Independent* NAT you can use the information from another connection for this.
Destination Endpoint *Dependent* NAT however makes this much harder.
There are still tricks you can use, but iroh does not support them yet.


# Reflexive Transport Address

This brings us to the fancy term "Reflexive Transport Address".
Imagine you are a server sitting on the internet and you receive a datagram from an endpoint behind a NAT router.
The IP header of the received datagram contains the source IP address,
while the UDP header contains the source port number.
The IP & port combination the server sees is the external address: the mapping the NAT router created.
To send a response, the server needs to send a datagram addressed to this observed source address.

In other words, the source address the server *observes* is the address it sends responses to.
Thus you can build a server that informs a client endpoint about the client's address as observed by the server.
To the client this is the *Reflexive Transport Address*.

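Such a server can be very small. Here is a minimal sketch using plain UDP sockets on the loopback interface; the `serve_once` helper and the `ip:port` text reply are made up for this example and are not a real protocol. Since there is no NAT on loopback, the reflexive address simply equals the client's local address here.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """Reply to one datagram with the source address it was observed from."""
    _payload, observed = server.recvfrom(1500)
    ip, port = observed[0], observed[1]
    # The observed source address is also where the reply has to go.
    server.sendto(f"{ip}:{port}".encode(), observed)


server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # let the kernel pick a free port
server.settimeout(5)
thread = threading.Thread(target=serve_once, args=(server,))
thread.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))
client.settimeout(5)
client.sendto(b"what is my address?", server.getsockname())
reply, _ = client.recvfrom(1500)

reflexive = reply.decode()
local_ip, local_port = client.getsockname()
local = f"{local_ip}:{local_port}"

thread.join()
client.close()
server.close()

# On loopback both are identical; behind a NAT they would differ.
behind_nat = reflexive != local
print(reflexive, local, behind_nat)
```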
If the client is behind a NAT router this will be a different address than the client itself is sending from.
So a client can use this to detect if it is behind a NAT.
A client can go even further and query multiple such servers from the same socket:
if it receives the same reflexive transport address from each of them,
it is behind a Destination Endpoint Independent NAT.
If it receives two different reflexive transport addresses,
it is stuck behind a Destination Endpoint Dependent NAT.


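The decision logic above fits in a few lines. This is a sketch under simplifying assumptions: `classify_nat` is a hypothetical helper, all observations are assumed to come from the same local socket, and the local address is assumed to be a concrete address rather than a wildcard like `0.0.0.0`.

```python
Addr = tuple[str, int]  # (IP, port)


def classify_nat(local: Addr, observed: list[Addr]) -> str:
    """Classify the NAT from reflexive addresses reported by different servers."""
    unique = set(observed)
    if not unique:
        return "unknown"
    if unique == {local}:
        return "no NAT detected"
    if len(observed) < 2:
        return "NAT, mapping type unknown"
    if len(unique) == 1:
        return "destination endpoint independent"
    return "destination endpoint dependent"


local = ("192.168.1.10", 51234)

same = classify_nat(local, [("203.0.113.1", 40000), ("203.0.113.1", 40000)])
different = classify_nat(local, [("203.0.113.1", 40000), ("203.0.113.1", 40001)])
direct = classify_nat(local, [local, local])

print(same, "|", different, "|", direct)
```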
# STUN

Naturally such servers have existed for a while.
As part of the standardization around audio-video calls in the form of SIP and WebRTC,
there was a need for endpoints to learn about their reflexive transport addresses.
For this the STUN spec was created,
which by now has evolved into [RFC 8489].
A sizable tome.

[RFC 8489]: https://datatracker.ietf.org/doc/html/rfc8489

Not going to lie about it: I've never read the full STUN spec.
It contains a lot and can do many things.
And yet, the part `iroh` actively used is surprisingly small:

- Generate a STUN transaction ID, just 12 random bytes.
- Send a STUN request to a STUN server in a UDP datagram.
- Wait for a response from the server which matches the request's transaction ID, and read the reflexive transport address from it.

That's it.

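To give a feel for how small this subset is, here is a sketch of those three steps at the byte level. It is not `iroh`'s implementation: it handles IPv4 only, does no retransmission, and `fake_response` is a stand-in for a real server so that the example runs without network access. The constants follow the STUN message format (Binding request `0x0001`, success response `0x0101`, `XOR-MAPPED-ADDRESS` attribute `0x0020`).

```python
import ipaddress
import os
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def binding_request() -> tuple[bytes, bytes]:
    """Return (transaction_id, datagram) for a Binding request without attributes."""
    txid = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return txid, header + txid


def parse_response(datagram: bytes, txid: bytes):
    """Return the reflexive (ip, port), or None if the response does not match."""
    if len(datagram) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", datagram[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or datagram[8:20] != txid:
        return None
    attrs = datagram[20:20 + length]
    pos = 0
    while pos + 4 <= len(attrs):
        attr_type, attr_len = struct.unpack("!HH", attrs[pos:pos + 4])
        value = attrs[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # Port and address are XORed with (parts of) the magic cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = ipaddress.IPv4Address(xaddr ^ MAGIC_COOKIE)
            return str(ip), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def fake_response(txid: bytes, ip: str, port: int) -> bytes:
    """Stand-in for a STUN server: build a success response for (ip, port)."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = int(ipaddress.IPv4Address(ip)) ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + txid + attr


txid, request = binding_request()
response = fake_response(txid, "203.0.113.7", 40001)
reflexive = parse_response(response, txid)
print(len(request), reflexive)
```

In a real client the request would be sent with a UDP socket and the caller would have to time out and resend if no matching response arrives.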
So why change working systems?
Let's look at what we don't get from this:

- Encryption.
  While in theory you can encrypt STUN requests using DTLS, this is rarely done in practice.

- Reliability.
  It's a simple UDP-based protocol.
  If the request is lost you eventually time out and need to resend it – very primitive.

- Congestion Control.
  You will be sending application traffic over the same sockets as the STUN datagrams.
  However, STUN requests are sent outside of the normal flow of data,
  which makes packet loss much more likely if the application is busy.

All of these are things that are solved in QUIC:
QUIC is a secure, reliable transport with advanced congestion control and loss detection.
And we already use it for our application protocol, so we won't have two different endpoints sending and receiving on the same socket.


# QUIC Address Discovery

This is such an obvious idea that someone already wrote it down as an IETF draft (thanks Marten and Christian!):
https://quicwg.org/address-discovery/draft-ietf-quic-address-discovery.html

QUIC Address Discovery, or QAD as we call it, is an extension to the QUIC protocol that gets negotiated during the QUIC handshake.
If negotiated,
the remote side will send you an OBSERVED_ADDRESS frame containing the reflexive transport address it observed for you.

One of the cool things is that this can happen regardless of the application protocol being used,
as it happens entirely in QUIC frames.
So you can still use this connection to carry application data.

Another really nice feature flowing from this is that this isn't a request-response protocol anymore.
QUIC supports connection migration for clients:
when your NAT router updates the mapping for some reason,
or when you move from a Wi-Fi network to mobile data,
QUIC detects this and migrates the connection to the new network path,
without losing any data or breaking the connection.
And whenever that happens while the QAD extension is negotiated,
a new reflexive transport address is observed and will be sent in a new OBSERVED_ADDRESS frame.
Thus this becomes event-based rather than request-response.


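From the application's point of view, event-based means you react to address reports as they arrive instead of polling for them. The sketch below is purely illustrative: `ObservedAddressTracker` is a hypothetical class, not an `iroh` or QUIC library API, and it assumes each report carries an increasing sequence number so that late or reordered reports can be ignored.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Addr = tuple[str, int]  # (IP, port)


@dataclass
class ObservedAddressTracker:
    """Hypothetical consumer of observed-address reports on one connection."""

    on_change: Callable[[Addr], None]
    latest_seq: int = -1
    current: Optional[Addr] = None

    def handle_report(self, seq: int, addr: Addr) -> None:
        # Ignore stale or reordered reports (assumes increasing sequence numbers).
        if seq <= self.latest_seq:
            return
        self.latest_seq = seq
        # Only notify when the reflexive address actually changed.
        if addr != self.current:
            self.current = addr
            self.on_change(addr)


changes: list[Addr] = []
tracker = ObservedAddressTracker(on_change=changes.append)

tracker.handle_report(0, ("203.0.113.1", 40000))   # first observation
tracker.handle_report(1, ("203.0.113.1", 40000))   # unchanged, no event
tracker.handle_report(3, ("198.51.100.9", 61000))  # e.g. moved to mobile data
tracker.handle_report(2, ("203.0.113.1", 40000))   # arrived late, ignored

print(changes)
```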
# QAD in `iroh` Relay Servers

Since `iroh` 0.32, `iroh` and the relay servers have supported and used both QAD and STUN.
Since the 0.90 release we have switched to QAD exclusively.

The work is not finished yet though.
Iroh still uses a special-purpose QUIC connection for QAD.
At some point we would like to also support making the normal relay connection over QUIC when possible,
in addition to the current HTTPS (HTTP/1.1) WebSocket connection.
This would be one fewer connection to the relay server and truly allow us to benefit from the event-based nature of QAD.
This is something for after the 1.0 release, however.



** **

### Footnotes
