Commit 4ed0235

Add Slurm default_auth_key
1 parent 72e7d8d commit 4ed0235

7 files changed: +24 -4 lines changed

community/modules/scheduler/schedmd-slurm-gcp-v6-controller/README.md

Lines changed: 1 addition & 0 deletions
@@ -323,6 +323,7 @@ limitations under the License.
 | <a name="input_controller_startup_scripts_timeout"></a> [controller\_startup\_scripts\_timeout](#input\_controller\_startup\_scripts\_timeout) | The timeout (seconds) applied to each script in controller\_startup\_scripts. If<br/>any script exceeds this timeout, then the instance setup process is considered<br/>failed and handled accordingly.<br/><br/>NOTE: When set to 0, the timeout is considered infinite and thus disabled. | `number` | `300` | no |
 | <a name="input_controller_state_disk"></a> [controller\_state\_disk](#input\_controller\_state\_disk) | A disk that will be attached to the controller instance template to save state of slurm. The disk is created and used by default.<br/> To disable this feature, set this variable to null.<br/><br/> NOTE: This will not save the contents at /opt/apps and /home. To preserve those, they must be saved externally. | <pre>object({<br/> type = string<br/> size = number<br/> })</pre> | <pre>{<br/> "size": 50,<br/> "type": "pd-ssd"<br/>}</pre> | no |
 | <a name="input_create_bucket"></a> [create\_bucket](#input\_create\_bucket) | Create GCS bucket instead of using an existing one. | `bool` | `true` | no |
+| <a name="input_default_auth_key"></a> [default\_auth\_key](#input\_default\_auth\_key) | Default auth key value ex. slurm.key | `string` | `""` | no |
 | <a name="input_deployment_name"></a> [deployment\_name](#input\_deployment\_name) | Name of the deployment. | `string` | n/a | yes |
 | <a name="input_disable_controller_public_ips"></a> [disable\_controller\_public\_ips](#input\_disable\_controller\_public\_ips) | DEPRECATED: Use `enable_controller_public_ips` instead. | `bool` | `null` | no |
 | <a name="input_disable_default_mounts"></a> [disable\_default\_mounts](#input\_disable\_default\_mounts) | DEPRECATED: Use `enable_default_mounts` instead. | `bool` | `null` | no |

community/modules/scheduler/schedmd-slurm-gcp-v6-controller/modules/slurm_files/README.md

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ No modules.
 | <a name="input_controller_startup_scripts"></a> [controller\_startup\_scripts](#input\_controller\_startup\_scripts) | List of scripts to be ran on controller VM startup. | <pre>list(object({<br/> filename = string<br/> content = string<br/> }))</pre> | `[]` | no |
 | <a name="input_controller_startup_scripts_timeout"></a> [controller\_startup\_scripts\_timeout](#input\_controller\_startup\_scripts\_timeout) | The timeout (seconds) applied to each script in controller\_startup\_scripts. If<br/>any script exceeds this timeout, then the instance setup process is considered<br/>failed and handled accordingly.<br/><br/>NOTE: When set to 0, the timeout is considered infinite and thus disabled. | `number` | `300` | no |
 | <a name="input_controller_state_disk"></a> [controller\_state\_disk](#input\_controller\_state\_disk) | A disk that will be attached to the controller instance template to save state of slurm. The disk is created and used by default.<br/> To disable this feature, set this variable to null.<br/><br/> NOTE: This will not save the contents at /opt/apps and /home. To preserve those, they must be saved externally. | <pre>object({<br/> device_name = string<br/> })</pre> | <pre>{<br/> "device_name": null<br/>}</pre> | no |
+| <a name="input_default_auth_key"></a> [default\_auth\_key](#input\_default\_auth\_key) | Default auth key value ex. slurm.key | `string` | `""` | no |
 | <a name="input_disable_default_mounts"></a> [disable\_default\_mounts](#input\_disable\_default\_mounts) | Disable default global network storage from the controller<br/>- /home<br/>- /apps | `bool` | `false` | no |
 | <a name="input_enable_bigquery_load"></a> [enable\_bigquery\_load](#input\_enable\_bigquery\_load) | Enables loading of cluster job usage into big query.<br/><br/>NOTE: Requires Google Bigquery API. | `bool` | `false` | no |
 | <a name="input_enable_chs_gpu_health_check_epilog"></a> [enable\_chs\_gpu\_health\_check\_epilog](#input\_enable\_chs\_gpu\_health\_check\_epilog) | Enable a Cluster Health Sacnner(CHS) GPU health check that slurmd executes as an epilog script after completing a job step from a new job allocation.<br/>Compute nodes that fail GPU health check during epilog will be marked as drained. Find more details at:<br/>https://github.com/GoogleCloudPlatform/cluster-toolkit/tree/main/docs/CHS-Slurm.md | `bool` | `false` | no |

community/modules/scheduler/schedmd-slurm-gcp-v6-controller/modules/slurm_files/main.tf

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ locals {
     # timeouts
     controller_startup_scripts_timeout = var.controller_startup_scripts_timeout
     compute_startup_scripts_timeout = var.compute_startup_scripts_timeout
+    default_auth_key = var.default_auth_key

     munge_mount = local.munge_mount
     slurm_key_mount = var.slurm_key_mount

community/modules/scheduler/schedmd-slurm-gcp-v6-controller/modules/slurm_files/scripts/setup.py

Lines changed: 8 additions & 4 deletions
@@ -18,6 +18,7 @@
 import argparse
 import logging
 import os
+import secrets
 import shutil
 import subprocess
 import stat
@@ -216,8 +217,11 @@ def setup_jwt_key():
     util.chown_slurm(jwt_key, mode=0o400)


-def _generate_key(p: Path) -> None:
-    run(f"dd if=/dev/random of={p} bs=1024 count=1")
+def _generate_key(p: Path, default_value: str = "") -> None:
+    if default_value != "":
+        p.write_text(default_value)
+    else:
+        p.write_bytes(secrets.token_bytes(1024))


 def setup_key(lkp: util.Lookup) -> None:
@@ -234,7 +238,7 @@ def setup_key(lkp: util.Lookup) -> None:
         # Copy key from persistent state disk
         persist = slurmdirs.state / file_name
         if not persist.exists():
-            _generate_key(persist)
+            _generate_key(persist, lkp.cfg.default_auth_key)

         shutil.copyfile(persist, dst)
         if lkp.cfg.enable_slurm_auth:
@@ -247,7 +251,7 @@ def setup_key(lkp: util.Lookup) -> None:
         if dst.exists():
             log.info("key already exists. Skipping key generation.")
         else:
-            _generate_key(dst)
+            _generate_key(dst, lkp.cfg.default_auth_key)
             if lkp.cfg.enable_slurm_auth:
                 util.chown_slurm(dst, mode=0o400)
             else:
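
Note on the setup.py change above: the new `_generate_key` writes the configured `default_auth_key` string to the key file when it is non-empty, and otherwise falls back to 1024 random bytes from `secrets.token_bytes` instead of shelling out to `dd`. Both call sites in `setup_key` only reach it when the key file does not exist yet, so an existing key appears to be left untouched. The following is a minimal standalone sketch of that behavior, not the module's actual code: `generate_key` mirrors the body shown in the diff, while `ensure_key` and `demo` are hypothetical helpers introduced here to stand in for the `exists()` checks and to exercise both branches.

```python
import secrets
import tempfile
from pathlib import Path


def generate_key(p: Path, default_value: str = "") -> None:
    """Mirror of the diff's _generate_key: fixed text if given, else random bytes."""
    if default_value != "":
        # The configured value is written as text, so the file size is the
        # encoded length of the string, not 1024 bytes.
        p.write_text(default_value)
    else:
        p.write_bytes(secrets.token_bytes(1024))


def ensure_key(p: Path, default_value: str = "") -> bool:
    """Hypothetical helper: write a key only if none exists. Returns True if written."""
    if p.exists():
        return False
    generate_key(p, default_value)
    return True


def demo() -> None:
    """Exercise both branches in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        fixed = Path(tmp) / "fixed.key"
        random_key = Path(tmp) / "random.key"

        ensure_key(fixed, "example-key-value")  # placeholder value, an assumption
        ensure_key(random_key)                  # empty default -> random bytes

        print(fixed.read_text())
        print(len(random_key.read_bytes()))

        # A second call does not overwrite the existing key.
        print(ensure_key(fixed, "another-value"))


if __name__ == "__main__":
    demo()
```

One consequence worth keeping in mind when reviewing: because the value is written only when the key file is missing, changing `default_auth_key` on a cluster that already has a key on its state disk would presumably have no effect until that file is removed.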

community/modules/scheduler/schedmd-slurm-gcp-v6-controller/modules/slurm_files/variables.tf

Lines changed: 6 additions & 0 deletions
@@ -501,3 +501,9 @@ variable "controller_network_attachment" {
   type = string
   default = null
 }
+
+variable "default_auth_key" {
+  description = "Default auth key value ex. slurm.key"
+  type = string
+  default = ""
+}

community/modules/scheduler/schedmd-slurm-gcp-v6-controller/slurm_files.tf

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -156,6 +156,7 @@ module "slurm_files" {
   extra_logging_flags = var.extra_logging_flags

   enable_slurm_auth = var.enable_slurm_auth
+  default_auth_key = var.default_auth_key

   enable_bigquery_load = var.enable_bigquery_load
   enable_external_prolog_epilog = var.enable_external_prolog_epilog

community/modules/scheduler/schedmd-slurm-gcp-v6-controller/variables.tf

Lines changed: 6 additions & 0 deletions
@@ -453,6 +453,12 @@ EOD
   default = false
 }

+variable "default_auth_key" {
+  description = "Default auth key value ex. slurm.key"
+  type = string
+  default = ""
+}
+
 variable "cloud_parameters" {
   description = "cloud.conf options. Defaults inherited from [Slurm GCP repo](https://github.com/GoogleCloudPlatform/slurm-gcp/blob/master/terraform/slurm_cluster/modules/slurm_files/README_TF.md#input_cloud_parameters)"
   type = object({
