Official AWS Batch step operator #3919

@SebastianScherer88

Contact Details [Optional]

[email protected]

Feature Description

It would be great to get an official version of this interesting AWS Batch step operator into the main ZenML library.

Problem or Use Case

I think it's quite common for people to have

  • AWS infra and
  • heterogeneous compute requirements in their pipeline steps - not everything needs to run on SageMaker

Running a local (Docker) orchestrator that can push individual components to powerful remote execution engines like AWS Batch sounds super useful to me. It's definitely something I would be interested in (I work in ML as an ops engineer).

Happy to contribute based on the linked reference plugin implementation provided by you guys.

Proposed Solution

A hardened, more configurable version of the linked plugin implementation that

  • allows step resource configuration to be mapped canonically (where possible) onto AWS Batch resource specs
  • defaults to AWS infra settings that are compatible with the current Terraform setup utils (where possible)
  • integrates with every orchestrator that honours the canonical step launcher approach (i.e. not the LocalDockerOrchestrator)
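
To make the first bullet concrete, here is a rough sketch of what the resource mapping could look like. It is not taken from the linked plugin: `StepResources` and `to_batch_resource_requirements` are hypothetical names, and the sketch assumes an EC2-backed Batch compute environment (whole vCPUs only) with memory already expressed in MiB. The output shape follows the `resourceRequirements` list that AWS Batch accepts in a job's container overrides, where each entry has a `type` of `VCPU`, `MEMORY` or `GPU` and a string `value`.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StepResources:
    """Hypothetical stand-in for a step's resource configuration."""

    cpu_count: Optional[float] = None
    gpu_count: Optional[int] = None
    memory_mib: Optional[int] = None


def to_batch_resource_requirements(
    resources: StepResources,
) -> List[Dict[str, str]]:
    """Map step resources onto AWS Batch `resourceRequirements` entries.

    Unset fields are skipped so that the job definition defaults apply.
    """
    requirements: List[Dict[str, str]] = []

    if resources.cpu_count is not None:
        # Assumption: EC2-backed compute, so round fractional CPUs up
        # to a whole number of vCPUs.
        vcpus = max(1, math.ceil(resources.cpu_count))
        requirements.append({"type": "VCPU", "value": str(vcpus)})

    if resources.memory_mib is not None:
        # AWS Batch expects memory in MiB, passed as a string.
        requirements.append(
            {"type": "MEMORY", "value": str(resources.memory_mib)}
        )

    if resources.gpu_count:
        # Only request GPUs when a positive count is configured.
        requirements.append({"type": "GPU", "value": str(resources.gpu_count)})

    return requirements


if __name__ == "__main__":
    print(
        to_batch_resource_requirements(
            StepResources(cpu_count=1.5, gpu_count=1, memory_mib=4096)
        )
    )
```

The resulting list could then be passed as `containerOverrides={"resourceRequirements": ...}` when submitting the job. Anything that does not map cleanly (e.g. fractional CPUs on EC2) would need an explicit rule like the rounding above.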

Alternatives Considered

The official SageMaker step operator. AWS Batch would be a cheaper (no price markup for SageMaker's ml.... instance types) and more flexible way of launching scalable custom compute jobs.

Additional Context

Implementation draft (unofficial AWS Batch step operator plugin)

Priority

Low - Nice to have

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Labels

  • contribution: Labels for externally contributed implementations
  • core-team: Issues that are being handled by the core team

Projects

Status

In Progress
