Skip to content

Conversation

@mahendrabishnoi2
Copy link
Member

@mahendrabishnoi2 mahendrabishnoi2 commented Sep 17, 2025

Fixes #7004

This PR adds support for experimental otel.sdk.processor.span.processed metric in simple span processor.
Definition of metric at: https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md

Experimental metrics are behind a feature flag: OTEL_GO_X_OBSERVABILITY

Observability Implementation Checklist

Observability Implementation Checklist

Based on the project Observability guidelines, ensure the following are completed:

Environment Variable Activation

  • Observability features are disabled by default
  • Features are activated through the OTEL_GO_X_OBSERVABILITY environment variable
  • Use consistent pattern with x.Observability.Enabled() check 1
  • Follow established experimental feature pattern 23

Encapsulation

  • Instrumentation is encapsulated within a dedicated struct (e.g., Instrumentation)
  • Instrumentation is not mixed into the instrumented component
  • Instrumentation code is in its own file or package if complex/reused
  • Instrumentation setup doesn't bloat the main component code

Initialization

  • Initialization is only done when observability is enabled
  • Setup is explicit and side-effect free
  • Return errors from initialization when appropriate
  • Use the global Meter provider (e.g., otel.GetMeterProvider())
  • Include proper meter configuration with:
    • Instrumentation package name is used for the Meter
    • Instrumentation version (e.g. Version)
    • Schema URL (e.g. SchemaURL)

Performance

  • Little to no overhead when observability is disabled
  • Expensive operations are only executed when observability is enabled
  • When enabled, instrumentation code paths are optimized to reduce allocation/computation overhead

Attribute and Option Allocation Management

  • Use sync.Pool for attribute slices and options with dynamic attributes
  • Pool objects are properly reset before returning to pool
  • Pools are scoped for maximum efficiency while ensuring correctness

Caching

  • Static attribute sets known at compile time are pre-computed and cached
  • Common attribute combinations use lookup tables/maps

Benchmarking

  • Benchmarks provided for all instrumentation code
  • Benchmark scenarios include both enabled and disabled observability
  • Benchmark results show impact on allocs/op, B/op, and ns/op (use b.ReportAllocs() in benchmarks)

Error Handling and Robustness

  • Errors are reported back to caller when possible
  • Partial failures are handled gracefully
  • Use partially initialized components when available
  • Return errors to caller instead of only using otel.Handle()
  • Use otel.Handle() only when component cannot report error to user

Context Propagation

  • Observability measurements receive the context from the function being measured (don't break context propagation by using context.Background())

Semantic Conventions Compliance

  • All metrics follow OpenTelemetry Semantic Conventions
  • Use the otelconv convenience package for metric semantic conventions
  • Component names follow semantic conventions
  • Use package path scope type as stable identifier for non-standard components
  • Component names are stable unique identifiers
  • Use global counter for uniqueness if necessary
  • Component ID counter is resettable for deterministic testing

Testing

  • Use deterministic testing with isolated state
  • Restore previous state after tests (t.Cleanup())
  • Isolate meter provider for testing
  • Use t.Setenv() for environment variable testing
  • Reset component ID counter for deterministic component names
  • Test order doesn't affect results

Benchmarks

> benchstat bmark.result
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace/internal/observ
cpu: Apple M1 Pro
                             │ bmark.result │
                             │    sec/op    │
SSP/SpanProcessed-8            146.7n ± 15%
SSP/SpanProcessedWithError-8   205.1n ±  3%
geomean                        173.5n

                             │ bmark.result │
                             │     B/op     │
SSP/SpanProcessed-8              280.0 ± 0%
SSP/SpanProcessedWithError-8     408.0 ± 0%
geomean                          338.0

                             │ bmark.result │
                             │  allocs/op   │
SSP/SpanProcessed-8              3.000 ± 0%
SSP/SpanProcessedWithError-8     3.000 ± 0%
geomean                          3.000

Footnotes

  1. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/exporters/stdout/stdouttrace/internal/observ/instrumentation.go#L101-L103

  2. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/exporters/stdout/stdouttrace/internal/x/x.go

  3. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/sdk/internal/x/x.go

@codecov
Copy link

codecov bot commented Sep 17, 2025

Codecov Report

❌ Patch coverage is 95.58824% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.3%. Comparing base (2e5fdd1) to head (3a89a59).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sdk/trace/simple_span_processor.go 82.3% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #7374   +/-   ##
=====================================
  Coverage   86.3%   86.3%           
=====================================
  Files        294     295    +1     
  Lines      25987   26054   +67     
=====================================
+ Hits       22427   22495   +68     
+ Misses      3187    3186    -1     
  Partials     373     373           
Files with missing lines Coverage Δ
sdk/trace/internal/observ/simple_span_processor.go 100.0% <100.0%> (ø)
sdk/trace/simple_span_processor.go 80.8% <82.3%> (+5.3%) ⬆️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mahendrabishnoi2 mahendrabishnoi2 marked this pull request as ready for review September 27, 2025 08:04
@MrAlias
Copy link
Contributor

MrAlias commented Oct 8, 2025

@mahendrabishnoi2 based your results:

> benchstat bmark.result
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace/internal/observ
cpu: Apple M1 Pro
                             │ bmark.result │
                             │    sec/op    │
SSP/SpanProcessed-8            146.7n ± 15%
SSP/SpanProcessedWithError-8   205.1n ±  3%
geomean                        173.5n

                             │ bmark.result │
                             │     B/op     │
SSP/SpanProcessed-8              280.0 ± 0%
SSP/SpanProcessedWithError-8     408.0 ± 0%
geomean                          338.0

                             │ bmark.result │
                             │  allocs/op   │
SSP/SpanProcessed-8              3.000 ± 0%
SSP/SpanProcessedWithError-8     3.000 ± 0%
geomean                          3.000

@mahendrabishnoi2
Copy link
Member Author

My bad, I forgot to run the results through benchstat.

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good. The memory management of attributes during the measurement needs to be optimized though.

flc1125

This comment was marked as outdated.

@mahendrabishnoi2

This comment was marked as outdated.

@mahendrabishnoi2
Copy link
Member Author

Updated benchmarks
0 allocation and 90% decrease in op/s for happy path (without error)

ssp_3.txt
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace/internal/observ
cpu: Apple M1 Pro
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6945 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.9358 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.7210 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.7109 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6756 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6617 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6618 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6620 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6620 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessed-8         	1000000000	         0.6700 ns/op	       0 B/op	       0 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9040986	       130.8 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9043536	       132.3 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9094186	       132.2 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9013471	       135.0 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9140390	       136.9 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 8351335	       142.4 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9088309	       133.6 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 8878872	       133.3 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 9043981	       137.5 ns/op	     232 B/op	       3 allocs/op
BenchmarkSSP/SpanProcessedWithError-8         	 8916673	       134.7 ns/op	     232 B/op	       3 allocs/op
PASS
ok  	go.opentelemetry.io/otel/sdk/trace/internal/observ	21.455s
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace/internal/observ
cpu: Apple M1 Pro
                             │    ssp_1.txt    │              ssp_3.txt               │
                             │     sec/op      │    sec/op     vs base                │
SSP/SpanProcessed-8            146.7000n ± 15%   0.6728n ± 7%  -99.54% (p=0.000 n=10)
SSP/SpanProcessedWithError-8      205.1n ±  3%    134.2n ± 2%  -34.59% (p=0.000 n=10)
geomean                           173.5n          9.500n       -94.52%

                             │ ssp_1.txt  │                ssp_3.txt                │
                             │    B/op    │    B/op     vs base                     │
SSP/SpanProcessed-8            280.0 ± 0%     0.0 ± 0%  -100.00% (p=0.000 n=10)
SSP/SpanProcessedWithError-8   408.0 ± 0%   232.0 ± 0%   -43.14% (p=0.000 n=10)
geomean                        338.0                    ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                             │ ssp_1.txt  │                ssp_3.txt                │
                             │ allocs/op  │ allocs/op   vs base                     │
SSP/SpanProcessed-8            3.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
SSP/SpanProcessedWithError-8   3.000 ± 0%   3.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                        3.000                    ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

@dmathieu dmathieu requested a review from flc1125 October 13, 2025 07:34
@MrAlias MrAlias added this to the v1.39.0 milestone Oct 16, 2025
@MrAlias
Copy link
Contributor

MrAlias commented Oct 17, 2025

@mahendrabishnoi2 this looks ready to merge. Please take a look at the conflicts and resolve them. sdk/trace/internal/x/README.md was moved to sdk/internal/x/README.md.

@MrAlias MrAlias merged commit f7d2882 into open-telemetry:main Oct 17, 2025
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Trace SDK observability - simple processor metrics

4 participants