@@ -113,8 +113,8 @@ public void checkProfilerWorking() {
    try (AutoConfigTestProperties props =
        new AutoConfigTestProperties()
            .put(InferredSpansAutoConfig.ENABLED_OPTION, "true")
-           .put(InferredSpansAutoConfig.DURATION_OPTION, "500ms")
-           .put(InferredSpansAutoConfig.INTERVAL_OPTION, "500ms")
+           .put(InferredSpansAutoConfig.DURATION_OPTION, "1000ms")
+           .put(InferredSpansAutoConfig.INTERVAL_OPTION, "1000ms")
            .put(InferredSpansAutoConfig.SAMPLING_INTERVAL_OPTION, "5ms")) {
      OpenTelemetry otel = GlobalOpenTelemetry.get();
      List<SpanProcessor> processors = OtelReflectionUtils.getSpanProcessors(otel);
@@ -159,7 +159,7 @@ public void checkProfilerWorking() {

  private static void doSleep() {
    try {
-     Thread.sleep(100);
+     Thread.sleep(300);
Contributor:

So this has to be long enough for the inferred span code to observe it? WDYT if, instead of sleeping, you did something like:

Span tx = tracer.spanBuilder("my-root").startSpan();
try (Scope scope = tx.makeCurrent()) {
  await().until(() -> !AutoConfiguredDataCapture.getSpans().isEmpty());
} finally {
  tx.end();
}

Could that result in a more reliable test?

Contributor Author:

The inferred span would not be created before the span ends, because that's how it infers the timing. So that suggestion would just time out.

The current sequence should be:

  1. the profiler is started (we waited for that, or the assertion would have failed)
  2. we start a 100ms span
  3. the profiler is sampling at 5ms intervals, so there are more than enough traces to infer a span (we only need 2, and we'd expect around 20 here)
  4. the span finishes and we generate first the regular span, then, after a short delay, the inferred span
  5. the await is waiting for both spans (see the sketch below)
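
For concreteness, a minimal sketch of that sequence, assuming `tracer` and Awaitility's `await()` as in the snippet above, plus AssertJ's `assertThat` for the size check (the span name is made up, and this mirrors the description rather than the exact test code):

// assumes: static org.awaitility.Awaitility.await, static org.assertj.core.api.Assertions.assertThat
Span span = tracer.spanBuilder("profiled-span").startSpan();   // step 2: start the span
try (Scope scope = span.makeCurrent()) {
  doSleep();   // step 3: the profiler keeps sampling this thread every 5ms while we sleep
} finally {
  span.end();  // step 4: ending the span lets the profiler build and emit the inferred span
}
// step 5: wait for both the regular span and the inferred span
await().untilAsserted(
    () -> assertThat(AutoConfiguredDataCapture.getSpans()).hasSizeGreaterThanOrEqualTo(2));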

Hmm, actually, as I write this it occurs to me that the failure I'm suggesting we fix would itself be a timeout failure. In fact, shouldn't this only fail with a timeout, with the "size of" assertion as the sub-reason why it timed out?

Contributor:

> The inferred span would not be created before the span ends, because that's how it infers the timing. So that suggestion would just time out.

That is unfortunate. My experience is that short timeouts in tests can cause instability, and in CI there may be things running in parallel which can also have a negative effect. I'd try a 1s sleep.

> the profiler is sampling at 5ms intervals, so there are more than enough traces to infer a span (we only need 2, and we'd expect around 20 here)

According to the output in https://scans.gradle.com/s/k77oi6bbrx2ew/tests/task/:inferred-spans:test/details/io.opentelemetry.contrib.inferredspans.InferredSpansAutoConfigTest/checkProfilerWorking()?page=eyJvdXRwdXQiOnsiMCI6MX19&top-execution=1 you got 0 stack traces:

FINE: Parsing JFR chunk at offset	
Aug 20, 2025 8:51:13 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler processTraces	
FINE: Processing 0 stack traces	
Aug 20, 2025 8:51:13 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent handle	
FINE: Handling event timestamp=3,215,835,057,958 root=true threadId=1 activation=true	
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent startProfiling	
FINE: Create call tree (ImmutableSpanContext{traceId=7dcd14b33b36f375ace29d344a0a3118, spanId=824f6b362b41c0af, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}(clock-anchor: 1755719857533603501)) for thread 1	
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent handle	
FINE: Handling event timestamp=3,216,133,279,166 root=true threadId=1 activation=false	
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent stopProfiling	
FINE: End call tree (ImmutableSpanContext{traceId=7dcd14b33b36f375ace29d344a0a3118, spanId=824f6b362b41c0af, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}(clock-anchor: 1755719857533603501)) for thread 1	
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent stopProfiling	
FINE: Created no spans for thread 1 (count=0)	
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler processTraces	
FINE: Processing traces took 41,626us
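
Since the failure mode under discussion is effectively the await timing out, one hedged sketch of making that bound explicit and generous, assuming Awaitility and AssertJ static imports plus java.time.Duration (the PR itself instead bumps the durations, as discussed below):

// Sketch only: bound the polling explicitly so a suspended profiler on a busy
// CI runner still has time to emit the inferred span before the assertion gives up.
await()
    .atMost(Duration.ofSeconds(10))  // generous upper bound for shared CI environments
    .untilAsserted(
        () -> assertThat(AutoConfiguredDataCapture.getSpans()).hasSizeGreaterThanOrEqualTo(2));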

Contributor Author:

Which implies that the profiler was suspended for the whole 100ms sleep. That's going to be rare but possible in a shared CI environment, and would explain why the error is rare. I'll bump the numbers even more and apply the same change to the other test failure as well.

    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }