The development of world models in robotics has long been a cornerstone of advanced research, with most approaches relying heavily on vast, platform-specific datasets. These datasets, while valuable, often limit scalability and generalization to different robotic platforms, restricting their broader applicability.
In contrast, CYBER approaches world modeling from a "first principles" perspective, drawing inspiration from how humans naturally acquire skills through experience and interaction with their environment. CYBER is the first general robotic operational system designed to adapt to both teleoperated manipulation and human operation data, enabling robots to learn and predict across a wide range of tasks and environments. It is built from a Physical World Model, a cross-embodiment Vision-Language-Action (VLA) Model, a Perception Model, a Memory Model, and a Control Model, which together help robots learn, predict, and remember across diverse tasks and embodiments.
At the same time, CYBER provides millions of human operation demonstrations and baseline models on Hugging Face 🤗 to enhance embodied learning, along with an experimental evaluation toolbox that helps researchers test and evaluate their models in both simulation and the real world.
- Modular: Built with a modular architecture, allowing flexibility in various environments.
- Data-Driven: Leverages millions of human operation demonstrations to enhance embodied learning.
- Scalable: Scales across different robotic platforms, adapting to new environments and tasks.
- Customizable: Allows for customization and fine-tuning to meet specific requirements.
- Extensible: Supports the addition of new modules and functionalities, enhancing capabilities.
- Open Source: Open-source and freely available, fostering collaboration and innovation.
- Experimental: Supports experimentation and testing, enabling continuous improvement.
CYBER is built with a modular architecture, allowing for flexibility and customization. Here are the key components:
- World Model: Learns from physical interactions to understand and predict the environment.
- Action Model: Learns from actions and interactions to perform tasks and navigate.
- Perception Model: Processes sensory inputs to perceive and interpret surroundings.
- Memory Model: Utilizes past experiences to inform current decisions.
- Control Model: Manages control inputs for movement and interaction.

The World Model is now available. Additional models will be released soon.
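The interplay of these components can be sketched as a simple perceive-plan-predict loop. This is purely illustrative: the class and method names below are hypothetical and are not CYBER's actual API.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    pixels: list  # raw sensory input, e.g. flattened camera intensities


class PerceptionModel:
    def encode(self, obs: Observation) -> float:
        # Compress raw input into a compact state (here: a toy scalar summary)
        return sum(obs.pixels) / max(len(obs.pixels), 1)


class ActionModel:
    def plan(self, state: float, goal: float) -> float:
        # Choose an action that moves the state toward the goal
        return 1.0 if state < goal else -1.0


class WorldModel:
    def predict(self, state: float, action: float) -> float:
        # Predict the next state given the current state and an action
        return state + action


class MemoryModel:
    def __init__(self):
        self.episodes = []

    def store(self, state: float) -> None:
        # Retain predicted states so past experience can inform later decisions
        self.episodes.append(state)


class ControlModel:
    def execute(self, action: float) -> dict:
        # Translate the planned action into a low-level control command
        return {"velocity": action}


def step(obs, goal, perception, world, memory, action_model, control):
    state = perception.encode(obs)
    action = action_model.plan(state, goal)
    memory.store(world.predict(state, action))
    return control.execute(action)


obs = Observation(pixels=[1.0, 2.0, 3.0])
memory = MemoryModel()
command = step(obs, 5.0, PerceptionModel(), WorldModel(), memory,
               ActionModel(), ControlModel())
# command -> {"velocity": 1.0}; memory now holds the predicted next state
```

The point of the sketch is the data flow, not the models themselves: perception feeds the action and world models, the world model's prediction feeds memory, and control closes the loop.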
You will need Anaconda installed on your machine. If you don't have it installed, you can follow the installation instructions here.
You can run the following command to install CYBER:

```bash
bash scripts/build.sh
```

Alternatively, you can install it manually by following the steps below:

1. Create a clean conda environment:

   ```bash
   conda create -n cyber python=3.10 && conda activate cyber
   ```

2. Install PyTorch and torchvision:

   ```bash
   conda install pytorch==2.3.0 torchvision==0.18.0 pytorch-cuda=11.8 -c pytorch -c nvidia
   ```

3. Install the CYBER package:

   ```bash
   pip install -e .
   ```
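A quick way to confirm the environment matches the pinned Python version before installing is a small check like the one below. This is an illustrative helper, not part of CYBER; the package itself may accept other versions.

```python
import sys


def matches_pinned_python(version_info, pinned=(3, 10)):
    """Return True if the interpreter's major.minor equals the pinned version."""
    return tuple(version_info[:2]) == pinned


# Check the current interpreter against the version pinned above (3.10)
print(matches_pinned_python(sys.version_info))
```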
CYBER leverages the power of Hugging Face for model sharing and collaboration. You can easily access and use our models through the Hugging Face platform.
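Access from Python typically goes through the `huggingface_hub` client. A minimal sketch follows; the organization and repository names are placeholders, not confirmed repo ids.

```python
def hf_repo_id(org: str, name: str) -> str:
    """Build a fully qualified Hugging Face repo id of the form "org/name"."""
    return f"{org}/{name}"


repo = hf_repo_id("CyberOrigin", "Pipette")  # hypothetical repo id

# With huggingface_hub installed, a dataset snapshot could then be fetched via:
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id=repo, repo_type="dataset", local_dir="data/pipette")
```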
Currently, four tasks are available for download:

- 🤗 Pipette: Bimanual human demonstration dataset of precision pipetting tasks for laboratory manipulation.
- 🤗 Take Item: Single-arm manipulation demonstrations of object pick-and-place tasks.
- 🤗 Twist Tube: Bimanual demonstration dataset of coordinated tube manipulation sequences.
- 🤗 Fold Towels: Bimanual manipulation demonstrations of deformable object folding procedures.
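These demonstration datasets are episodic: each recording is a sequence of timestamped frames with per-arm state. The sketch below shows one hypothetical way such episodes might be represented; the field names are illustrative and do not reflect CYBER's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    timestamp: float      # seconds since episode start
    image_path: str       # path to the RGB frame on disk
    left_arm: List[float]            # e.g. joint positions for the left arm
    right_arm: List[float] = field(  # empty for single-arm tasks
        default_factory=list
    )


@dataclass
class Episode:
    task: str
    frames: List[Frame]

    def is_bimanual(self) -> bool:
        # A recording counts as bimanual if any frame carries right-arm state
        return any(f.right_arm for f in self.frames)


twist = Episode("twist_tube",
                [Frame(0.0, "frame_0000.png", [0.1, 0.2], [0.3, 0.4])])
take = Episode("take_item",
               [Frame(0.0, "frame_0000.png", [0.1, 0.2])])
# twist.is_bimanual() -> True; take.is_bimanual() -> False
```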
Our pretrained models will be released on Hugging Face soon:
- Cyber-World-Large (Coming Soon)
- Cyber-World-Small (Coming Soon)
For more details, please refer to the Hugging Face documentation.
Please refer to the experiments for more details on data downloading and model training.
```
├── ...
├── docs            # documentation files and figures
├── docker          # docker files for containerization
├── examples        # example code snippets
├── tests           # test cases and scripts
├── scripts         # scripts for setup and utilities
├── experiments     # model implementation and details
│   ├── configs     # model configurations
│   ├── models      # model training and evaluation scripts
│   ├── notebooks   # sample notebooks
│   └── ...
├── cyber           # compression, model training, and dataset source code
│   ├── dataset     # dataset processing and loading
│   ├── utils       # utility functions
│   ├── models      # model definitions and architectures
│   │   ├── action      # visual language action model
│   │   ├── control     # robot platform control model
│   │   ├── memory      # lifelong memory model
│   │   ├── perception  # perception and scene understanding model
│   │   ├── world       # physical world model
│   │   └── ...
│   └── ...
└── ...
```
The MAGVIT2 and GENIE implementations are adapted from the 1X World Model Challenge: 1X Technologies (2024). 1X World Model Challenge (Version 1.1) [Data set].
```bibtex
@inproceedings{wang2024hpt,
  author    = {Lirui Wang and Xinlei Chen and Jialiang Zhao and Kaiming He},
  title     = {Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers},
  booktitle = {NeurIPS},
  year      = {2024}
}

@article{luo2024open,
  title   = {Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation},
  author  = {Luo, Zhuoyan and Shi, Fengyuan and Ge, Yixiao and Yang, Yujiu and Wang, Limin and Shan, Ying},
  journal = {arXiv preprint arXiv:2409.04410},
  year    = {2024}
}
```
| property | value |
|---|---|
| name | CyberOrigin Dataset |
| url | https://github.com/CyberOrigin2077/Cyber |
| description | Cyber represents a model implementation that seamlessly integrates state-of-the-art (SOTA) world models with the proposed CyberOrigin Dataset, pushing the boundaries of artificial intelligence and machine learning. |
| provider | |
| license | |
If you have technical questions, please open a GitHub issue. For business development or other collaboration inquiries, feel free to contact us by email ([email protected]). Enjoy!