Contributor

@grcevski commented Oct 30, 2025

This PR fixes the DB and messaging semantic conventions for metrics so that they include the server.address field. We currently populate it with an IP address, so I used the new DNS code to implement eBPF-based reverse DNS (RDNS) for application metrics. I've only tested this in one scenario, the OATS SQL test, and we probably want to do more testing before we enable it by default.
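As a rough illustration of the idea (not the code in this PR), the sketch below maps a peer IP to a hostname seen in captured DNS answers and falls back to the IP when nothing is known. The names `rdnsStore`, `Store`, and `serverAddress` are hypothetical and exist only for this example.

```go
package main

import "fmt"

// rdnsStore is a hypothetical in-memory reverse-DNS table: IP -> hostnames
// observed in DNS answers that resolved to that IP.
type rdnsStore struct {
	names map[string][]string
}

func newRDNSStore() *rdnsStore {
	return &rdnsStore{names: map[string][]string{}}
}

// Store records that hostname resolved to ip, ignoring duplicates.
func (s *rdnsStore) Store(ip, hostname string) {
	for _, n := range s.names[ip] {
		if n == hostname {
			return // already known
		}
	}
	s.names[ip] = append(s.names[ip], hostname)
}

// serverAddress returns the first known hostname for ip, or the IP itself
// when no DNS answer for it has been observed.
func (s *rdnsStore) serverAddress(ip string) string {
	if names := s.names[ip]; len(names) > 0 {
		return names[0]
	}
	return ip
}

func main() {
	s := newRDNSStore()
	s.Store("172.18.0.5", "postgres")
	fmt.Println(s.serverAddress("172.18.0.5")) // postgres
	fmt.Println(s.serverAddress("10.0.0.9"))   // 10.0.0.9
}
```

Falling back to the IP keeps the attribute populated even when the DNS exchange was not captured.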

I'm leaving a couple of comments in the PR code to explain some of the changes.

Closes #753

@grcevski requested a review from a team as a code owner October 30, 2025 19:01
  u8 p_type;
  u8 dns_q;
- u8 _pad1[1];
+ u8 _pad1[2];
Contributor Author

I removed an unused field.

// with other instrumented processes
pid_info pid;
unsigned char buf[k_dns_max_len];
u8 _pad3[4];
Contributor Author

I had to clamp the read size to a maximum for the BPF buffer, so the buffer size needed to be a power of 2. I switched it to 512 and added padding.
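A user-space analogue of the clamp may make the intent clearer. This is a minimal sketch, not the eBPF code: `dnsBufSize`, `isPowerOfTwo`, `clampLen`, and `copyPayload` are hypothetical names, and the 512 simply mirrors the value mentioned above.

```go
package main

import "fmt"

// dnsBufSize is an assumed buffer capacity, mirroring the 512 used above.
const dnsBufSize = 512

// isPowerOfTwo reports whether n is a positive power of two.
func isPowerOfTwo(n int) bool {
	return n > 0 && n&(n-1) == 0
}

// clampLen bounds n to the range [0, limit] so it is always a safe copy length.
func clampLen(n, limit int) int {
	if n < 0 {
		return 0
	}
	if n > limit {
		return limit
	}
	return n
}

// copyPayload copies at most dnsBufSize bytes of payload into dst and
// returns the number of bytes actually copied.
func copyPayload(dst *[dnsBufSize]byte, payload []byte) int {
	n := clampLen(len(payload), dnsBufSize)
	copy(dst[:n], payload[:n])
	return n
}

func main() {
	var buf [dnsBufSize]byte
	big := make([]byte, 600)
	fmt.Println(isPowerOfTwo(dnsBufSize))    // true
	fmt.Println(copyPayload(&buf, big))      // 512
	fmt.Println(copyPayload(&buf, big[:40])) // 40
}
```

Returning the clamped length lets the caller know how much of the buffer holds valid data.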

bpf_clamp_umax(len, 512);
populate_dns_record(req, &p_conn, orig_dport, len, qr, hdr.id, conn_pid);

read_skb_bytes(skb, dns_off, req->buf, len);
Contributor Author

This was a nasty bug. We were reading up to the maximum size of req->buf rather than the exact length, and read_skb_bytes failed to grab the last 16-byte chunk as a result. This left us with garbage at the end of the buffer.
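The general pattern, copying an exact length in fixed-size chunks and then handling the remainder, can be sketched in user space as follows. This assumes a helper that works in 16-byte steps; it is not the actual read_skb_bytes implementation, and `copyExact` and `chunk` are hypothetical names.

```go
package main

import "fmt"

// chunk is the assumed step size of the chunked copy.
const chunk = 16

// copyExact copies exactly n bytes from src to dst in chunk-sized steps,
// then copies the trailing bytes that do not fill a whole chunk.
// n is reduced if either slice is shorter than requested.
func copyExact(dst, src []byte, n int) int {
	if n > len(src) {
		n = len(src)
	}
	if n > len(dst) {
		n = len(dst)
	}
	off := 0
	for ; off+chunk <= n; off += chunk {
		copy(dst[off:off+chunk], src[off:off+chunk])
	}
	if off < n {
		// Remainder: skipping this step would leave stale bytes in dst.
		copy(dst[off:n], src[off:n])
	}
	return n
}

func main() {
	src := make([]byte, 40)
	for i := range src {
		src[i] = byte(i + 1)
	}
	dst := make([]byte, 64)
	n := copyExact(dst, src, len(src))
	fmt.Println(n, dst[39], dst[40]) // 40 40 0
}
```

Only the first n bytes of the destination are meaningful afterwards, so consumers should slice to that length.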

codecov bot commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 30.18868% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.37%. Comparing base (294ca1a) to head (876ed59).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/transform/name_resolver.go | 16.66% | 22 Missing and 3 partials ⚠️ |
| pkg/internal/rdns/store/memory.go | 35.71% | 8 Missing and 1 partial ⚠️ |
| pkg/internal/netolly/flow/reverse_dns.go | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #849      +/-   ##
==========================================
- Coverage   55.42%   55.37%   -0.05%     
==========================================
  Files         251      251              
  Lines       21400    21443      +43     
==========================================
+ Hits        11861    11875      +14     
- Misses       8731     8757      +26     
- Partials      808      811       +3     

| Flag | Coverage Δ |
|---|---|
| integration-test | 23.28% <30.18%> (+0.04%) ⬆️ |
| integration-test-arm | 0.00% <0.00%> (?) |
| integration-test-vm-${ARCH}-${KERNEL_VERSION} | 0.00% <0.00%> (ø) |
| k8s-integration-test | 2.78% <0.00%> (+<0.01%) ⬆️ |
| oats-test | 0.00% <0.00%> (ø) |
| unittests | 46.50% <22.64%> (-0.04%) ⬇️ |


pid_connection_info_t *p_conn,
u16 orig_dport) {

if (size < sizeof(struct dnshdr)) {
Contributor Author

I had to add a second way to read DNS traffic. If OBI is configured with host networking, it may not be able to read some internal pod traffic; in this case it couldn't see the Docker internal DNS traffic between the Docker network and the service itself. I reproduced this in our test examples by setting network_mode: host for OBI.

In this scenario the socket_filter doesn't see the traffic, but the udp_sendmsg and sock_recvmsg kprobes do fire. I added a userspace buffer helper here so we can capture those DNS requests too.

If OBI is running in the network of the target process, we'll see each event twice: once from the socket_filter and once from the kprobes. This isn't a problem, because we deduplicate the DNS requests in user space and won't emit them twice.
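One way such user-space deduplication could work is to key each event and drop repeats that arrive within a short window. The sketch below is an assumption for illustration, not the logic in this PR: `dnsKey`, `deduper`, and `firstSeen` are hypothetical names, and the choice of key fields and the millisecond window are guesses.

```go
package main

import "fmt"

// dnsKey is a hypothetical identity for a DNS event. The same packet seen by
// two capture paths is assumed to produce the same key.
type dnsKey struct {
	ID       uint16 // DNS transaction ID
	Port     uint16 // client-side UDP port
	Response bool   // false for a query, true for a response
}

// deduper remembers when each key was last accepted.
type deduper struct {
	windowMs int64
	seen     map[dnsKey]int64
}

func newDeduper(windowMs int64) *deduper {
	return &deduper{windowMs: windowMs, seen: map[dnsKey]int64{}}
}

// firstSeen reports whether k has not been accepted within the window ending
// at nowMs. Accepted keys are recorded; duplicates are not.
func (d *deduper) firstSeen(k dnsKey, nowMs int64) bool {
	if t, ok := d.seen[k]; ok && nowMs-t < d.windowMs {
		return false
	}
	d.seen[k] = nowMs
	return true
}

func main() {
	d := newDeduper(1000)
	k := dnsKey{ID: 0x1a2b, Port: 53124, Response: true}
	fmt.Println(d.firstSeen(k, 0))    // true: first copy of the event
	fmt.Println(d.firstSeen(k, 5))    // false: second copy, dropped
	fmt.Println(d.firstSeen(k, 2000)) // true: a later, unrelated exchange
}
```

The map in this sketch grows without bound; a real implementation would also need to expire old keys.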

- return &InMemory{
- entries: map[string][]string{},
+ func NewInMemory(cacheSize int) (*InMemory, error) {
+ cache, err := simplelru.NewLRU[string, []string](cacheSize, nil)
Contributor Author

I changed this implementation to an LRU so that the cache's memory use is capped.
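To show why an LRU bounds memory where a plain map does not, here is a minimal standard-library sketch of a size-capped cache. It is not the simplelru package used in the diff; `lruCache`, `newLRU`, `Add`, `Get`, and `Len` are hypothetical names chosen for this example.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the value stored in each list element.
type entry struct {
	key   string
	value []string
}

// lruCache holds at most capacity entries, evicting the least recently used.
type lruCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func newLRU(capacity int) *lruCache {
	if capacity < 1 {
		capacity = 1
	}
	return &lruCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Add inserts or updates key and evicts the oldest entry when over capacity.
func (c *lruCache) Add(key string, value []string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get returns the value for key and marks it as recently used.
func (c *lruCache) Get(key string) ([]string, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Len returns the number of cached entries.
func (c *lruCache) Len() int { return c.order.Len() }

func main() {
	c := newLRU(2)
	c.Add("10.0.0.1", []string{"db"})
	c.Add("10.0.0.2", []string{"kafka"})
	c.Get("10.0.0.1")                    // touch: 10.0.0.2 is now the oldest
	c.Add("10.0.0.3", []string{"redis"}) // evicts 10.0.0.2
	_, ok := c.Get("10.0.0.2")
	fmt.Println(c.Len(), ok) // 2 false
}
```

With a fixed capacity, a busy host that resolves many distinct names cannot grow the store indefinitely; the cost is that rarely used IPs may fall back to unresolved addresses.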

Contributor

@mmat11 left a comment


lgtm!

@grcevski merged commit 82fdcc3 into open-telemetry:main Oct 31, 2025
51 checks passed
@grcevski deleted the update_db_semconv branch October 31, 2025 14:07
@MrAlias added this to the v0.2.0 milestone Nov 3, 2025
@MrAlias mentioned this pull request Nov 3, 2025
Development

Successfully merging this pull request may close these issues.

DB Semantic convention attributes need to be updated

3 participants