
Releases: meta-pytorch/monarch

0.1.0

22 Oct 05:02


🦋 Monarch v0.1.0 — Initial Release
We’re excited to announce the first public release of Monarch, a distributed programming framework for PyTorch built around scalable actor messaging and direct memory access.
Monarch brings together ideas from actor-based concurrency, fault-tolerant supervision, and high-performance tensor communication to make distributed training simpler, more explicit, and faster.

🚀 Highlights

  1. Actor-Based Programming for PyTorch
    Define Python classes that run remotely as actors, send them messages, and coordinate distributed work using a clean, imperative API.
from monarch.actor import Actor, endpoint, this_host

# Spawn a mesh of 8 processes (a "gpus" dimension of size 8) on the local host.
training_procs = this_host().spawn_procs({"gpus": 8})

class Trainer(Actor):
    @endpoint
    def train(self, step: int): ...

# Spawn one Trainer actor in each process, then message every actor in the
# mesh and wait for all replies.
trainers = training_procs.spawn("trainers", Trainer)
trainers.train.call(step=0).get()
  2. Scalable Messaging and Meshes
    Actors are organized into meshes — collections that support broadcast, gather, and other scalable communication primitives (see the messaging sketch after this list).
  3. Supervision and Fault Tolerance
    Monarch adopts supervision trees for error handling and recovery. Failures propagate predictably, allowing fine-grained restarts and robust distributed workflows (see the failure-handling sketch after this list).
  4. High-Performance RDMA Transfers
    Full RDMA integration for CPU and GPU memory via libibverbs, providing zero-copy, one-sided tensor communication across processes and hosts (a hedged sketch also follows the list).
  5. Distributed Tensors
    Native support for tensors sharded across processes — enabling distributed computation without custom data-movement code.

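For RDMA transfers, the idea is that a tensor's memory is registered once and a small handle to it travels in messages, so a peer can read or write the bytes directly (one-sided, zero-copy). The sketch below assumes an RDMABuffer type in monarch.rdma with a read_into-style method; treat the module path, names, and signatures as assumptions rather than the confirmed API.

import torch
from monarch.actor import Actor, endpoint
from monarch.rdma import RDMABuffer  # assumed module path and type

class ParameterServer(Actor):
    def __init__(self):
        self.weights = torch.zeros(1024, dtype=torch.uint8)

    @endpoint
    def weight_handle(self) -> RDMABuffer:
        # Register the tensor's memory and hand back a lightweight
        # handle; no tensor data is copied into the message itself.
        return RDMABuffer(self.weights)

class Worker(Actor):
    @endpoint
    async def pull_weights(self, handle: RDMABuffer) -> None:
        local = torch.empty(1024, dtype=torch.uint8)
        # One-sided read: pull the server's bytes straight into local
        # memory (`read_into` is an assumed method name).
        await handle.read_into(local)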
⚠️ Early Development Notice
Monarch is experimental and under active development.
Expect incomplete APIs, rapid iteration, and evolving interfaces.
We welcome contributions — please discuss significant changes or ideas via issues before submitting PRs.

v0.0.0

03 Sep 17:15


v0.0.0 Pre-release