src/app/blog/lets-write-a-dht/page.mdx
+9 -8 (9 additions & 8 deletions)
@@ -47,7 +47,7 @@ So a distributed hash table seen as a black box is just like a hashtable, but sp
## Keys
- Just like a normal hash table, a distributed hash table maps some key type to some value type. Keys in local hash tables can be of arbitrary size. The key that is actually used for lookup is a (e.g. 64 bit) hash of the value, and the hash table has additional logic to deal with rare but inevitable hash collisions. For distributed hash tables, typically you restrict the key to a fixed size and let the application deal with the mapping from the actual key to the hash table keyspace. E.g. the bittorrent mainline DHT uses a 20 byte keyspace, which is the size of a SHA1 hash. The main purpose of the mainline DHT is to find content providers for data based on a SHA1 hash of the data. But even with mainline there are cases where the actual key you want to look up is larger than the keyspace, e.g. bep_0044 where you want to look up some information for an ED25519 public key. In that case mainline does exactly what you would do in a local hash table - it hashes the public key using SHA1 and then uses the hash as the lookup key.
+ Just like a normal hash table, a distributed hash table maps some key type to some value type. Keys in local hash tables can be of arbitrary size. The key that is actually used for lookup is a (e.g. 64 bit) hash of the value, and the hash table has additional logic to deal with rare but inevitable hash collisions. For distributed hash tables, typically you restrict the key to a fixed size and let the application deal with the mapping from the actual key to the hash table keyspace. E.g. the bittorrent mainline DHT uses a 20 byte keyspace, which is the size of a SHA1 hash. The main purpose of the mainline DHT is to find content providers for data based on a SHA1 hash of the data. But even with mainline there are cases where the actual key you want to look up is larger than the keyspace, e.g. [bep_0044] where you want to look up some information for an ED25519 public key. In that case mainline does exactly what you would do in a local hash table - it hashes the public key using SHA1 and then uses the hash as the lookup key.
For iroh we are mainly interested in looking up content based on its BLAKE3 hash. Another use case for the DHT is to look up information for an iroh node id, which is an ED25519 public key. So it makes sense for a clean room implementation to choose a 32 byte keyspace. An arbitrary size key can be mapped to this keyspace using a cryptographic hash function with an astronomically low probability of collisions.
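
As a rough illustration of that mapping, here is a minimal Rust sketch. The `Id` newtype and the use of `blake3::hash` are assumptions for the example, not code from this diff:

```rust
/// Hypothetical 32 byte keyspace id, mirroring the Id type defined later.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Id([u8; 32]);

/// Map an arbitrarily sized application key into the 32 byte keyspace.
fn key_to_id(key: &[u8]) -> Id {
    // BLAKE3 already yields 32 bytes, so the hash output is the keyspace id.
    Id(*blake3::hash(key).as_bytes())
}

fn main() {
    let id = key_to_id(b"some arbitrarily long application level key");
    println!("{:02x?}", id.0);
}
```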
@@ -79,9 +79,9 @@ As mentioned above, in a DHT not every node has all the data. So we need some me
## Kademlia
- The most popular routing algorithm for DHTs is [Kademlia]. The core idea of kademlia is to define a [metric] that gives a scalar distance between any two keys (points in the metric space) that fulfills the metric axioms. DHT nodes have a node id that gets mapped to the metric space, and you store the data on the `k` nodes that are closest to the key.
+ The most popular routing algorithm for DHTs is [Kademlia]. The core idea of Kademlia is to define a [metric] that gives a scalar distance between any two keys (points in the metric space) that fulfills the metric axioms. DHT nodes have a node id that gets mapped to the metric space, and you store the data on the `k` nodes that are closest to the key.
- The metric chosen by kademlia is the XOR metric: the distance of two keys `a` and `b` is simply the bitwise xor of the keys. This is absurdly cheap to compute and fulfills all the metric axioms. It also helps with sparse routing tables, as we will learn later.
+ The metric chosen by Kademlia is the XOR metric: the distance of two keys `a` and `b` is simply the bitwise xor of the keys. This is absurdly cheap to compute and fulfills all the metric axioms. It also helps with sparse routing tables, as we will learn later.
If a node had perfect knowledge of all other nodes in the network, it could give you a perfect answer to the question "where should I store the data for key `key`". Just sort the set of all keys that correspond to node ids by distance to the key and return the `k` smallest values. For small to medium DHTs this is a viable strategy, since modern computers can easily store millions of 32 byte keys without breaking a sweat. But for either extremely large DHTs or nodes with low memory requirements, it is desirable to store just a subset of all keys.
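
To make the "sort everything by XOR distance" idea concrete, here is a minimal sketch; the function names `xor_distance` and `k_closest` are invented for this example and are not taken from the post:

```rust
/// Bitwise XOR distance between two 32 byte keys, per the Kademlia metric.
fn xor_distance(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// With perfect knowledge of all node ids, the k closest nodes to `key` are
/// the ones with the smallest XOR distance. Comparing the distance arrays
/// lexicographically is the same as comparing them as 256 bit big-endian
/// integers.
fn k_closest(nodes: &[[u8; 32]], key: &[u8; 32], k: usize) -> Vec<[u8; 32]> {
    let mut nodes = nodes.to_vec();
    nodes.sort_by_key(|id| xor_distance(id, key));
    nodes.truncate(k);
    nodes
}
```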
@@ -116,17 +116,17 @@ A key property of a DHT compared to more rigid algorithms is that nodes should b
# RPC protocol
- Now that we have a very rough idea what a distributed hashtable is meant to do, let's start defining the protocol that nodes will use to talk to each other. We are going to use [irpc] to define the protocol. This has the advantage that we can simulate a DHT consisting of thousands of node in memory initially for tests, and then use the same code with iroh connections as the underlying transport in production.
+ Now that we have a very rough idea what a distributed hashtable is meant to do, let's start defining the protocol that nodes will use to talk to each other. We are going to use [irpc] to define the protocol. This has the advantage that we can simulate a DHT consisting of thousands of nodes in memory for tests, and then use the same code with iroh connections as the underlying transport in production.
First of all, we need a way to store and retrieve values. This is basically just a key value store API for a multimap. This protocol in isolation is sufficient to implement a tracker, a node that has full knowledge of what is where.
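
A minimal sketch of what that storage part of the protocol could look like as plain Rust types; the message names here are illustrative guesses, not the actual irpc definitions from the post:

```rust
use serde::{Deserialize, Serialize};

/// Illustrative storage messages for a multimap style key value store.
/// A tracker could be built from just these two operations.
#[derive(Debug, Serialize, Deserialize)]
enum StorageMessage {
    /// Store a value under a key; a key may hold many distinct values.
    Set { key: [u8; 32], value: Vec<u8> },
    /// Ask for all values currently stored under a key.
    Get { key: [u8; 32] },
}
```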
<Note>
- Every type we use in the RPC protocol must be serializable using serde so we can [postcard]serialize it. Postcard is a non self-describing format, so we need to make sure to keep the order of the enum cases if we want the protocol to be long term stable. All rpc requests, responses and the overall rpc enum have the `#[derive(Debug, Serialize, Deserialize)]` annotation, but we will omit this from the examples below for brevity.
+ Every type we use in the RPC protocol must be serializable so we can serialize it using [postcard]. Postcard is a non self-describing format, so we need to make sure to keep the order of the enum cases if we want the protocol to be long term stable. All rpc requests, responses and the overall rpc enum have the `#[derive(Debug, Serialize, Deserialize)]` annotation, but we will omit this from the examples below for brevity.
</Note>
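
For illustration, a postcard round trip looks roughly like this. This is a standalone sketch assuming the postcard crate with its alloc feature enabled; the `Message` enum is made up for the example:

```rust
use serde::{Deserialize, Serialize};

// Postcard identifies enum variants by index, not by name, so reordering
// or removing variants silently changes the wire format.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum Message {
    Ping { nonce: u64 },
    Pong { nonce: u64 },
}

fn main() -> Result<(), postcard::Error> {
    let msg = Message::Pong { nonce: 42 };
    let bytes = postcard::to_allocvec(&msg)?; // serialize to Vec<u8>
    let back: Message = postcard::from_bytes(&bytes)?; // and back again
    assert_eq!(msg, back);
    Ok(())
}
```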
## Values
- An id is just a 32 byte blob, with conversions from iroh::NodeId and blake3::Hash.
+ An id is just a 32 byte blob, with conversions from [iroh::NodeId](https://docs.rs/iroh/latest/iroh/type.NodeId.html) and [blake3::Hash](https://docs.rs/blake3/latest/blake3/struct.Hash.html).
```rust
pub struct Id([u8; 32]);
```
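
The conversions mentioned above are outside this hunk; a sketch of how they could look, assuming the usual `as_bytes()` accessors on `iroh::NodeId` and `blake3::Hash`:

```rust
// Sketch only: how the conversions into Id might be written.
impl From<iroh::NodeId> for Id {
    fn from(node_id: iroh::NodeId) -> Self {
        // A NodeId is an ED25519 public key, i.e. exactly 32 bytes.
        Id(*node_id.as_bytes())
    }
}

impl From<blake3::Hash> for Id {
    fn from(hash: blake3::Hash) -> Self {
        // A BLAKE3 hash is also exactly 32 bytes.
        Id(*hash.as_bytes())
    }
}
```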
@@ -308,7 +308,7 @@ impl MemStorage {
## Routing implementation
- Now it looks like we have run out of simple things to do and need to actually implement the routing part. The routing API does not care how the routing table is organized internally - it could just as well be the full set of nodes. But we want to implement the kademlia algorithm to get that nice power law distribution.
+ Now it looks like we have run out of simple things to do and need to actually implement the routing part. The routing API does not care how the routing table is organized internally - it could just as well be the full set of nodes. But we want to implement the Kademlia algorithm to get that nice power law distribution.
So let's define the routing table. First of all we need some simple integer arithmetic like xor and leading_zeros for 256 bit numbers. There are various crates that provide this, but since we don't need anything fancy like multiplication or division, we just quickly implemented it inline.
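
Something along these lines is enough; this is a sketch assuming keys are treated as big-endian `[u8; 32]` values, and the actual inline implementation in the post may differ:

```rust
/// Bitwise XOR of two 256 bit keys.
fn xor(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Leading zero bits of a 256 bit value, 256 if the value is all zeroes.
fn leading_zeros(x: &[u8; 32]) -> u32 {
    let mut total = 0;
    for byte in x {
        if *byte == 0 {
            total += 8;
        } else {
            total += byte.leading_zeros();
            break;
        }
    }
    total
}
```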
@@ -354,7 +354,7 @@ Now assuming that the system has some way to find valid DHT nodes, all we need i
## Insertion
- Insertion means first computing which bucket the node should go into, and then inserting at that index. Computing the bucket index is computing the xor distance to our own node id, then counting leading zeros and flipping the result around, since we want bucket 0 to contain the closest nodes and bucket 255 to contain the furthest away nodes as per kademlia convention.
+ Insertion means first computing which bucket the node should go into, and then inserting at that index. Computing the bucket index is computing the xor distance to our own node id, then counting leading zeros and flipping the result around, since we want bucket 0 to contain the closest nodes and bucket 255 to contain the furthest away nodes as per Kademlia convention.
```rust
fn bucket_index(&self, target: &[u8; 32]) -> usize {
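    // The rest of the function body is cut off by this diff hunk. Going by
    // the paragraph above it is presumably along these lines, where xor and
    // leading_zeros are the inline 256 bit helpers and self.local_id is this
    // node's own 32 byte id (both assumptions, not shown in the hunk):
    //
    //     let distance = xor(&self.local_id, target);
    //     255usize.saturating_sub(leading_zeros(&distance) as usize)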
@@ -1242,5 +1242,6 @@ The next step is to write tests using actual iroh connections. We will have to d