Resolve failing to submit to hortense GPU partition #250
Ugh... our tests are being cancelled because of a brownout: https://github.blog/changelog/2025-01-15-github-actions-ubuntu-20-runner-image-brownout-dates-and-other-breaking-changes/ It is a good point, though: I guess we should update the Ubuntu version used in these tests. But from a comment in our workflows:
I'm not entirely sure why, but I guess that means we'll lose Python 3.6 support. We should discuss whether that's acceptable to us, or whether we need to do something else to keep that support. The comment does not really explain why Python 3.6 is not supported on newer Ubuntu versions; we could give that another try and see what fails...
I'm fine with this fix, as it resolves your issue. One remark: you might consider still purging the environment, but then setting `SLURM_CONF=/etc/slurm/slurm.conf_dodrio` again using https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.env_vars
Anyway, you could do that in a follow-up PR if you prefer that approach. At least this PR fixes your immediate issue.
Edit: I can't merge this yet anyway because of the failing CI, so you could also still change it in this PR if you want to :)
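A minimal sketch of the suggestion above, assuming a ReFrame 4.x-style configuration file: the partition purges modules in `prepare_cmds` and then sets `SLURM_CONF` again through the partition-level `env_vars`. The system name, hostname pattern, partition name, launcher and `access` flags are hypothetical placeholders, not the actual hortense settings. According to the ReFrame config reference, `prepare_cmds` are emitted before any environment-loading commands, so the variable should end up set after the purge in the generated job script.

```python
# Hypothetical excerpt of a ReFrame site configuration (a plain Python file).
# System/partition names, hostnames, launcher and access flags are
# placeholders, not the real hortense configuration.
site_configuration = {
    'systems': [
        {
            'name': 'hortense',
            'descr': 'Placeholder description',
            'hostnames': ['login.*'],                      # placeholder pattern
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'gpu_example',                  # placeholder name
                    'scheduler': 'slurm',
                    'launcher': 'mpirun',                   # placeholder
                    'access': ['--partition=gpu_example'],  # placeholder
                    'environs': ['default'],
                    # Emitted before any environment loading: start the job
                    # from a clean module environment...
                    'prepare_cmds': ['module purge'],
                    # ...then set SLURM_CONF again for jobs on this partition.
                    'env_vars': [
                        ['SLURM_CONF', '/etc/slurm/slurm.conf_dodrio'],
                    ],
                },
            ],
        },
    ],
    'environments': [
        {'name': 'default'},
    ],
}
```

Note that this only affects the job scripts, not the shell from which ReFrame submits the jobs, which is the objection raised in the next comment.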
If we want to keep running tests with Python 3.6, the approach we're taking in EasyBuild may be helpful: easybuilders/easybuild-framework#4783
That sets the environment variable on the partition. The environment variable does not need to be set on the partition; it only needs to be set on the system from which ReFrame launches the jobs. And I do not think you can set it in the config. That environment variable is also set and unset by a sticky module, so once the environment is purged, the variable will always be unset. So I do not think it would work if I set it in the ci_config, for instance.
I think you can: https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.env_vars The docs are not entirely clear on whether this indeed modifies the environment in which the ReFrame runtime itself runs, rather than the test job. But considering the other options at the ...

But, again, up to you :) I'm also happy to go for the 'don't purge' solution. Either way works. The advantage of being able to purge is that you get a more controlled environment: only what is set in the ReFrame config file will be set/loaded. Let me know what you prefer :) I'll at least retrigger the CI; I think the brownout is over...
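For comparison, a sketch of the system-level variant linked above, again with hypothetical names and patterns. Here `purge_environment` (a system-level option in the ReFrame config reference) asks ReFrame to purge loaded modules before running any tests, and the system-level `env_vars` sets `SLURM_CONF` for this system instead of a single partition. Whether ReFrame also applies these variables to its own runtime environment, and thus to the `sbatch` calls it makes, is exactly what the docs leave unclear, so treat this as something to test rather than a guaranteed fix.

```python
# Hypothetical excerpt: SLURM_CONF set at the *system* level, with ReFrame
# purging loaded modules itself. Names and patterns are placeholders.
site_configuration = {
    'systems': [
        {
            'name': 'hortense',
            'hostnames': ['login.*'],          # placeholder pattern
            'modules_system': 'lmod',
            # Purge any loaded environment modules before running any tests.
            'purge_environment': True,
            # System-level variables, rather than per-partition ones.
            'env_vars': [
                ['SLURM_CONF', '/etc/slurm/slurm.conf_dodrio'],
            ],
            'partitions': [
                {
                    'name': 'gpu_example',      # placeholder name
                    'scheduler': 'slurm',
                    'launcher': 'mpirun',       # placeholder
                    'environs': ['default'],
                },
            ],
        },
    ],
    'environments': [
        {'name': 'default'},
    ],
}
```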
Tested
And you set this at the system level, not at the partition level, in your config? Strange... That's not what I would expect. Also, I don't understand what the difference is, then, between setting this at the system level and at the partition level...
LGTM!
@casparvl can you hit merge? EESSI GPU host injections are now also set up on hortense, and I'm testing as we speak; everything looks good.