- Adds the ability to shard the main consuming goroutines, alleviating a major CPU bottleneck. See DESIGN.md for more explanation.
- Adds the ability to configure the size of the buffered `chan` waiting for consumption.
- Optimises metric collection on the hot path; uses async instruments for cache metrics (a sketch follows this list).
- Moves various operations out of the hot path where possible.
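To illustrate the third bullet, here is a minimal, hypothetical sketch of moving cache metrics off the hot path with an asynchronous (observable) instrument from the otel-go metric API. The meter name, metric name, and `cacheLen` accessor are assumptions for illustration, not the processor's actual code:

```go
package metrics

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

// registerCacheMetrics registers an observable gauge whose callback runs on
// the metrics-collection cycle, so ConsumeTraces() never pays the cost of
// recording cache sizes synchronously on the hot path.
func registerCacheMetrics(cacheLen func() int) error {
	meter := otel.Meter("atlassiansamplingprocessor") // assumed meter name
	_, err := meter.Int64ObservableGauge(
		"trace_cache_size", // hypothetical metric name
		metric.WithDescription("Number of traces currently held in the primary cache"),
		metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
			// Read the size only when metrics are actually collected.
			o.Observe(int64(cacheLen()))
			return nil
		}),
	)
	return err
}
```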
`pkg/processor/atlassiansamplingprocessor/DESIGN.md` (13 additions, 17 deletions)
@@ -8,8 +8,8 @@ It also contains information about how to run in a production environment.
This section describes, in order, the path a trace takes when consumed by this processor.

-1. `ConsumeTraces()` is invoked. This blocks on send to an unbuffered `chan`, and then returns.
-2. `consumeChan()` reads the `chan` and processes the traces.
+1. `ConsumeTraces()` is invoked. This organises the data by trace ID and shard, does an early decision check, sends to the shard `chan`, and then returns.
+2. `shardListener()` reads its assigned `chan` and processes the traces.
3. The data is organised by trace ID, and the main loop in `processor.go` processes the data one trace ID at a time.
4. The decision caches are accessed to determine if a sampling decision has already been made for the current trace ID.
   If a prior decision exists, this allows us to streamline the processing. When the cache indicates that the trace has
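As a minimal sketch of the routing that the new steps 1–2 describe (all names here are hypothetical; the real processor's shard count, hash function, and channel setup may differ), the trace ID can be hashed to pick a shard so that all spans of a trace reach the same listener, with a configurable buffer on each channel:

```go
package sharding

import (
	"hash/fnv"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// sharder routes trace data to one buffered channel per shard listener.
type sharder struct {
	shards []chan ptrace.Traces
}

func newSharder(numShards, bufferSize int) *sharder {
	s := &sharder{shards: make([]chan ptrace.Traces, numShards)}
	for i := range s.shards {
		s.shards[i] = make(chan ptrace.Traces, bufferSize) // configurable buffer size
	}
	return s
}

// shardFor maps a trace ID to a shard index deterministically, so spans
// belonging to the same trace always land on the same channel.
func (s *sharder) shardFor(id pcommon.TraceID) int {
	h := fnv.New32a()
	h.Write(id[:]) // pcommon.TraceID is a 16-byte array
	return int(h.Sum32()) % len(s.shards)
}
```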
@@ -25,25 +25,21 @@ Least Recently Used (LRU) basis, meaning that adding new data to the cache may i
least recently accessed trace (i.e. the trace that last received a new span the longest time ago). When a trace is
evicted, it is considered "not sampled" and added to the decision cache.

-## Synchronized Goroutine
+## Synchronization / Sharding

-The main operation of the processor is executed as a single goroutine, synchronized through an unbuffered channel.
+The main processing of this component is done by asynchronous goroutines ("shard listeners"), which read off "shards" (channels).
+Trace data is sharded in `ConsumeTraces()` before being sent to the appropriate shard for processing.

In the collector architecture, receivers typically function as servers that accept and process data using multiple goroutines.
Consequently, processors like this one are invoked concurrently through the `ConsumeTraces()` method.
-To ensure synchronization, the processor sends data to a channel, which is then received by a
-dedicated goroutine (`consumeChan()`). This design guarantees that all data is processed by a single goroutine.
-It draws inspiration from the core collector's batch processor.
-
-The decision to synchronize is primarily driven by the need to maintain the integrity of internal
-caches while keeping the design simple. Allowing concurrent access to cached trace data would complicate the
-code significantly and potentially lead to bugs, as experienced in the upstream tail sampling processor.
-
-This is, of course, a trade-off. The processing throughput is limited by the capacity of a single goroutine,
-creating a potential bottleneck. This can be alleviated by deploying more instances of the processor with reduced
-memory allocation per instance (e.g., more nodes, each with less memory). If the bottleneck becomes a significant issue,
-a future enhancement could involve sharding the processor. This would involve splitting the processing workload by trace
-ID and maintaining separate caches and states for each shard.
+To ensure synchronization, the processor sends data to channels, each of which is received by a
+dedicated shard listener. Spans belonging to the same trace are all sent to the same shard, and that shard is
+processed entirely synchronously by the same shard listener. This ensures data integrity, because writes are limited
+to the shard listeners; each shard listener can be thought of as "owning" a section of the caches.
+
+All shard listeners still access the same caches, so a global lock is employed for any operation that may affect data across shards.
+The prime example of such an operation is resizing the caches, which can be performed by any shard listener but may delete data
+belonging to a different shard listener. So, a brief stop-the-world halt occurs while the caches are resized.
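A rough sketch of the ownership and locking model the added lines describe, assuming the "global lock" is realized as a `sync.RWMutex` (all names hypothetical): shard listeners hold the read side during normal processing, while a cache resize takes the write side and briefly halts every listener.

```go
package sharding

import (
	"sync"

	"go.opentelemetry.io/collector/pdata/ptrace"
)

type processor struct {
	globalLock sync.RWMutex // guards cross-shard operations such as cache resizes
	shards     []chan ptrace.Traces
}

// shardListener runs as one goroutine per shard. Writes inside process()
// only touch the section of the caches this shard "owns".
func (p *processor) shardListener(shard chan ptrace.Traces) {
	for td := range shard {
		p.globalLock.RLock() // listeners proceed concurrently under the read lock
		p.process(td)
		p.globalLock.RUnlock()

		if p.resizeNeeded() {
			p.resizeCaches() // may run on any shard listener
		}
	}
}

// resizeCaches may evict entries owned by other shards, so it takes the
// write lock, producing a brief stop-the-world halt across all listeners.
func (p *processor) resizeCaches() {
	p.globalLock.Lock()
	defer p.globalLock.Unlock()
	// ... resize caches and evict entries here ...
}

func (p *processor) process(td ptrace.Traces) { /* per-shard work */ }
func (p *processor) resizeNeeded() bool       { return false /* placeholder */ }
```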