nadig-google
Contributor

This is the initial check-in of the G4 blueprint. It has been regression-tested on up to 2 nodes. The G4 VM type was introduced in June 2025 and is a family of VMs with NVIDIA RTX PRO 6000 Blackwell GPUs.

The G4 VMs can power a variety of workloads, from cost-efficient inference, to advanced physical AI, robotics simulations, generative AI-enabled content creation, and next-generation game rendering.

nadig-google requested a review from mr0re1 August 31, 2025 10:25
nadig-google requested review from samskillman and a team as code owners August 31, 2025 10:25
nadig-google added the enhancement label (New feature or request) Aug 31, 2025
nadig-google requested a review from abbas1902 August 31, 2025 10:26
nadig-google assigned abbas1902 and unassigned mr0re1 Aug 31, 2025
@@ -0,0 +1,81 @@
# Copyright 2024 Google LLC
Collaborator

Suggested change
- # Copyright 2024 Google LLC
+ # Copyright 2025 Google LLC

new_image:
family: slurm-gcp-6-11-ubuntu-2204-lts-nvidia-570
project: schedmd-slurm-public
disk_size_gb: 200
Collaborator

Suggested change
- disk_size_gb: 200
+ disk_size_gb: 100

enable_placement: false
node_count_static: 1
bandwidth_tier: gvnic_enabled
machine_type: g4-standard-48
Collaborator

How should we expose the fact that g4-standard-384 has 2 NICs? I'd consider adding comments here, followed by a commented-out section below for the second NIC.
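
One way the suggestion above could look, as a minimal sketch rather than a tested configuration. The module ID `g4_nodeset`, the nodeset source path, the second network module `network_2`, and the use of an `additional_networks` setting are assumptions for illustration; the exact field names should be checked against the nodeset module's documentation before uncommenting anything.

```yaml
# Hypothetical sketch: IDs, source path, and the second-NIC wiring are assumptions.
- id: g4_nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use: [network]
  settings:
    enable_placement: false
    node_count_static: 1
    bandwidth_tier: gvnic_enabled
    machine_type: g4-standard-48

    # Per the review discussion, g4-standard-384 has 2 NICs.
    # To use that shape, switch machine_type and attach a second network:
    # machine_type: g4-standard-384
    # additional_networks:
    #   - subnetwork: $(network_2.subnetwork_self_link)  # network_2 is a hypothetical second VPC module
    #     nic_type: GVNIC
```

Keeping the second NIC commented out leaves the default blueprint deployable as-is while documenting what changes for the larger shape.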

enable_controller_public_ips: true
instance_image: $(vars.new_image)
instance_image_custom: true

Collaborator

Suggested change: remove this blank line.

- g4_partition
- slurm_login
settings:
machine_type: e2-standard-2
Collaborator

Suggested change
- machine_type: e2-standard-2

settings:
machine_type: e2-standard-2
endpoint_versions:
enable_controller_public_ips: true
Collaborator

Suggested change
- enable_controller_public_ips: true

deployment_name: ## Set Deployment Name Here ##
region: ## Set GCP Region Here ##
zone: ## Set GCP Zone ID Here ##
new_image:
Collaborator

Suggested change
- new_image:
+ instance_image:

machine_type: e2-standard-2
endpoint_versions:
enable_controller_public_ips: true
instance_image: $(vars.new_image)
Collaborator

Suggested change
- instance_image: $(vars.new_image)
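
Read together with the suggested rename of `new_image` to `instance_image`, this removal appears to rely on the blueprint tooling applying a deployment variable to any module setting with the same name. A minimal sketch of the resulting shape, assuming that behavior holds here; the group name, module ID, and controller source path are illustrative and not taken from this PR.

```yaml
# Hypothetical sketch: assumes same-named deployment variables are passed to module settings.
vars:
  # Renamed from new_image so that it matches the module setting name.
  instance_image:
    family: slurm-gcp-6-11-ubuntu-2204-lts-nvidia-570
    project: schedmd-slurm-public

deployment_groups:
- group: primary                # illustrative group name
  modules:
  - id: slurm_controller        # illustrative module ID
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
    use: [network, g4_partition, slurm_login]
    settings:
      instance_image_custom: true
      # instance_image is intentionally not set here;
      # it is assumed to be inherited from vars.instance_image.
```

If the tooling does not inherit the variable this way, the explicit form `instance_image: $(vars.instance_image)` would be needed instead.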

source: community/modules/scheduler/schedmd-slurm-gcp-v6-login
use: [network]
settings:
machine_type: e2-standard-2
Collaborator

Remove and default to c2-standard-4

Suggested change
- machine_type: e2-standard-2
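
For reference, a sketch of the login module after applying this suggestion. The module ID `slurm_login` is taken from the controller's `use` list in this PR; dropping the `settings` key entirely assumes `machine_type` was its only entry, which the visible diff context does not confirm.

```yaml
# Sketch only: assumes machine_type was the sole entry under settings.
- id: slurm_login
  source: community/modules/scheduler/schedmd-slurm-gcp-v6-login
  use: [network]
  # machine_type is omitted so that the module default applies
  # (c2-standard-4, per the review comment).
```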
