Skip to content

Conversation

@gjulianm
Copy link
Contributor

@gjulianm gjulianm commented Nov 4, 2025

What does this PR do?

This PR adds tagging support to the SLURM check, so that processes with GPU activity will get automatically tagged.

Tested in local slurm cluster.

Motivation

https://datadoghq.atlassian.net/browse/EBPF-885
Related agent PR DataDog/datadog-agent#42524 (integration will not fail if the agent PR is not merged, the agent will just return an empty tagset)

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@gjulianm gjulianm added the qa/skip-qa Automatically skip this PR for the next QA label Nov 4, 2025 — with Graphite App
@codecov
Copy link

codecov bot commented Nov 4, 2025

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.00%. Comparing base (c1140d5) to head (0ec636c).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@gjulianm gjulianm marked this pull request as ready for review November 4, 2025 11:59
@gjulianm gjulianm requested a review from a team as a code owner November 4, 2025 11:59
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 598 to 603
new_header = SCONTROL_TAG_MAPPING.get(header, f"slurm_{header.lower()}")
tags.append(f"{new_header}:{value}")

if new_header == "pid":
tags.extend(tagger.tag(f"process://{value}", tagger.ORCHESTRATOR))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard against missing process tags

The new call to tagger.tag assumes the tagger returns a list, but the tagger returns None when it has no tags for an entity (e.g., when the process is not yet known or the companion Agent change is not deployed). Extending tags with None raises TypeError: 'NoneType' object is not iterable, which will crash the SLURM check on nodes where process tags are unavailable—precisely the scenario the description says should be harmless. Wrap the result in or [] (or check for None) before extending so the check keeps running when no tags exist.

Useful? React with 👍 / 👎.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tagger.tag will always return a list, but @codex address that feedback anyways just in case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link

github-actions bot commented Nov 4, 2025

⚠️ The qa/skip-qa label has been added with shippable changes

The following files, which will be shipped with the agent, were modified in this PR and
the qa/skip-qa label has been added.

You can ignore this if you are sure the changes in this PR do not require QA. Otherwise, consider removing the label.

List of modified files that will be shipped with the agent
slurm/changelog.d/21822.added
slurm/datadog_checks/slurm/check.py

@gjulianm gjulianm removed the qa/skip-qa Automatically skip this PR for the next QA label Nov 4, 2025
@steveny91
Copy link
Contributor

@gjulianm Would you mind providing an example tag that is being added here? Or even better add it as a comment in the code?

Copy link
Contributor Author

gjulianm commented Nov 4, 2025

Added example tags as a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants