AMD PACE - AMD Platform Aware Compute Engine

To meet the demands of rapidly advancing research, we’re introducing AMD PACE: an inference-serving solution for high-performance LLMs on AMD platforms. AMD PACE makes it fast and easy to integrate research ideas and to accelerate real-time deployment.

Engineered for AMD CPUs with AVX512 support, AMD PACE is a PyTorch extension that provides a framework for developing and testing novel kernel implementations and graph-level optimizations.

NOTE: AMD PACE is designed and tested for systems with AVX512 or higher support. On systems lacking AVX512, performance may degrade significantly due to fallback to slower reference implementations, or the library might not function as intended.
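To check whether a Linux machine meets this requirement, you can look for the AVX-512 feature flags that the kernel reports in /proc/cpuinfo. The following is a minimal sketch that assumes a Linux system. The function names are hypothetical and not part of AMD PACE. The sketch treats avx512f (AVX-512 Foundation) as the baseline flag; which additional AVX-512 subsets AMD PACE actually uses is not specified here.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> set[str]:
    """Return the AVX-512 feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a line like "flags : fpu vme ... avx512f ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(f for f in value.split() if f.startswith("avx512"))
    return flags


def has_avx512(cpuinfo_text: str) -> bool:
    """True if the baseline AVX-512 Foundation flag (avx512f) is reported."""
    return "avx512f" in avx512_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        text = cpuinfo.read_text()
        print("AVX-512 supported:", has_avx512(text))
        print("AVX-512 flags:", " ".join(sorted(avx512_flags(text))))
    else:
        print("/proc/cpuinfo not found (non-Linux system?)")
```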

About

CPU-Optimized Inference: Engineered for CPU LLM workloads, AMD PACE delivers measurable performance gains over existing inference-serving solutions such as vLLM through CPU-friendly cache and kernel optimizations.
Speculative Decoding: AMD PACE includes a built-in implementation of PARallel Draft Model Adaptation (PARD), a speculative decoding technique that can deliver up to 5× higher throughput than a standard autoregressive baseline. See SpeculativeDecoding.md for details. To enable PARD, provide a PardSpecDecodeConfig when initializing the LLMModel, as shown in the example.
Data Parallelism: Integrated data-parallel support significantly accelerates both speculative decoding and standard inference, enabling end-to-end speedups of up to 25×. AMD PACE achieves this by serving multiple requests concurrently across multiple model instances. The Multi-Instance and Speculative Decoding Multi-Instance examples demonstrate this architecture: a pool of worker processes handles incoming requests in parallel, maximizing hardware utilization and overall throughput.
Ongoing Improvements: AMD PACE is designed to evolve with research needs and emerging production workloads. Its core mission is to serve as a research vehicle, providing a flexible and extensible framework for exploring forward-looking hardware optimizations and integrating new ideas from the fast-moving field of AI.
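The core idea behind speculative decoding can be sketched in a few lines. The following is a minimal greedy (argmax) version of generic draft-and-verify speculative decoding. It is not PARD itself and not AMD PACE's implementation.

A cheap draft model proposes k tokens. The target model then verifies them, and the longest matching prefix is kept. The "models" here are hypothetical toy functions that map a token list to the next token id.

```python
from typing import Callable, List

# A toy greedy "model": given the token context, return the next token id.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify decoding; output equals greedy decoding with `target`."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target model verifies the proposal. A real system scores all
        #    k positions in one batched forward pass; here we loop for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target yields one bonus token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    # The last round may overshoot; trim to the requested length.
    return tokens[: len(prompt) + max_new_tokens]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain autoregressive baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


if __name__ == "__main__":
    target = lambda ctx: (ctx[-1] + 1) % 10  # toy target: counts upward mod 10
    draft = lambda ctx: 0 if len(ctx) % 5 == 0 else (ctx[-1] + 1) % 10  # sometimes wrong
    print(speculative_decode(target, draft, [1], 12))
    print(greedy_decode(target, [1], 12))
```

Every kept token is one the target would have produced anyway, so the output matches plain greedy decoding. The speedup comes from verifying several tokens per target pass when the draft model agrees with the target often.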

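The multi-instance pattern described under Data Parallelism can be illustrated with Python's standard concurrent.futures module. The following is a minimal sketch, not AMD PACE's implementation. Each worker process loads its own model instance once, via the pool initializer, and requests are distributed across the pool.

The "model" is a hypothetical stand-in that simply reverses the prompt. The sketch also assumes Linux, where the "fork" start method is available.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Optional

# Per-process model instance; each worker sets its own copy in the initializer.
_model: Optional[Callable[[str], str]] = None


def _load_model(name: str) -> None:
    """Runs once in every worker process. Hypothetical stand-in for model loading."""
    global _model
    _model = lambda prompt: f"{name}: {prompt[::-1]}"


def _handle_request(prompt: str) -> str:
    """Runs inside a worker process, using that worker's own model instance."""
    assert _model is not None, "worker was not initialized"
    return _model(prompt)


def serve_requests(prompts: List[str], num_instances: int = 2) -> List[str]:
    """Serve prompts concurrently across `num_instances` worker processes."""
    ctx = multiprocessing.get_context("fork")  # Linux-only start method
    with ProcessPoolExecutor(max_workers=num_instances, mp_context=ctx,
                             initializer=_load_model, initargs=("toy-model",)) as pool:
        # Workers pull requests as they become free; map() returns results in input order.
        return list(pool.map(_handle_request, prompts))


if __name__ == "__main__":
    print(serve_requests(["hello", "world", "pace"], num_instances=2))
```

Loading the model in the initializer rather than per request means each instance pays the loading cost only once.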
Installation

To install AMD PACE, follow the instructions below:

NOTE: Building AMD PACE requires gcc>=12, make, and ccache (for the ZenDNN build).

On Ubuntu, these can be installed with `sudo apt install build-essential gcc-12 g++-12 ccache`.
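Before building, you can confirm that these tools are available. The helper below is a hypothetical convenience script, not part of AMD PACE. It checks that gcc, make, and ccache are on PATH and that gcc reports a major version of at least 12.

Installing gcc-12 on Ubuntu does not necessarily make it the default gcc. To check that binary instead, pass "gcc-12" as the compiler name.

```python
import shutil
import subprocess


def gcc_major(dumpversion_output: str) -> int:
    """Parse `gcc -dumpversion` output such as "12" or "12.3.0" into the major number."""
    return int(dumpversion_output.strip().split(".")[0])


def check_build_tools(compiler: str = "gcc") -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for tool in (compiler, "make", "ccache"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if shutil.which(compiler) is not None:
        out = subprocess.run([compiler, "-dumpversion"], capture_output=True,
                             text=True, check=True).stdout
        if gcc_major(out) < 12:
            problems.append(f"{compiler} {out.strip()} is older than 12")
    return problems


if __name__ == "__main__":
    issues = check_build_tools()
    print("Build tools OK" if not issues else "\n".join(issues))
```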

We recommend using a Miniforge environment to install AMD PACE. Install Miniforge by following its official installation instructions, then create an environment with Python 3.12 as follows:
```
conda create -n pace-env-py3.12 python=3.12 -y
conda activate pace-env-py3.12
```
NOTE: AMD PACE is tested to work with Python 3.9 and above. Python 3.12 is recommended for the best compatibility with dependencies.
Install the required dependencies for AMD PACE as follows:
```
pip install -r requirements.txt
```
Build AMD PACE from source as follows:
```
pip install -r build_requirements.txt [-v] .
```
This will build AMD PACE and install it in the current environment. The -v option is optional and can be used to enable verbose output during the build process.
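After installation, a quick sanity check can confirm that the environment sees the expected Python version and packages. This is a minimal sketch. It assumes AMD PACE's import name is pace, which matches the name of the package directory in the repository but is not stated in this README.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # AMD PACE is tested with Python 3.9 and above; 3.12 is recommended.
    version = ".".join(map(str, sys.version_info[:3]))
    print("Python", version, "OK" if sys.version_info >= (3, 9) else "(older than 3.9)")
    # Assumption: "pace" is AMD PACE's import name.
    for name in ("torch", "pace"):
        print(f"{name}: {'found' if is_importable(name) else 'NOT found'}")
```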

NOTE: The build uses pip's standards-based build interface; see PEP 517 for details. Pass build_requirements.txt during installation so that the build environment is set up correctly; see PEP 518 for details.

For developers who rebuild AMD PACE frequently, we recommend running pip with --no-build-isolation. This avoids the overhead of creating a fresh isolated build environment for every build and speeds up the build significantly. Make sure all required build dependencies are already installed in your environment before using this option.
```
pip install --no-build-isolation [-v] .
```
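With --no-build-isolation, pip does not install the build dependencies for you, so it can help to verify them first. The following is a minimal sketch that assumes build_requirements.txt uses simple `name<specifier>` lines. It lists the requirement names that have no installed distribution. It checks only presence, not version constraints, and does not handle URL or editable requirements.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text: str) -> list[str]:
    """Extract package names from requirements-file text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_requirements(text: str) -> list[str]:
    """Names from `text` with no installed distribution in this environment."""
    missing = []
    for name in requirement_names(text):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    reqs = Path("build_requirements.txt")
    if not reqs.exists():
        print("build_requirements.txt not found; run this from the repository root")
    else:
        missing = missing_requirements(reqs.read_text())
        print("All build requirements present" if not missing else f"Missing: {missing}")
```

If anything is reported missing, install it with `pip install -r build_requirements.txt` before running the no-isolation build.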
NOTE: Building AMD PACE, especially the oneDNN component, can require significant memory. If your system does not have enough RAM, the build process may fail or your machine may run out of memory.

Models Supported

The following models are supported by AMD PACE:

Benchmarks

Benchmarks for AMD PACE are available in the benchmarks directory. The benchmarks include:

Verbose

To enable verbose mode, set the environment variable PACE_LOG_LEVEL. The following levels are supported:

Level	Command
Debug	`export PACE_LOG_LEVEL=debug`
Profile	`export PACE_LOG_LEVEL=profile`
Info	`export PACE_LOG_LEVEL=info`
Warning	`export PACE_LOG_LEVEL=warning`
Error	`export PACE_LOG_LEVEL=error`

NOTE: By default, the log level is set to info.
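When launching from Python rather than a shell, you can set the variable in the process environment. The following is a minimal sketch with hypothetical helper names. It validates the level against the levels in the table above and then sets PACE_LOG_LEVEL.

The sketch assumes AMD PACE reads the variable when it initializes, so set it before importing the package. The import name pace is likewise an assumption.

```python
import os
from typing import MutableMapping

# Levels listed in the Verbose table; "info" is the documented default.
PACE_LOG_LEVELS = ("debug", "profile", "info", "warning", "error")
DEFAULT_LEVEL = "info"


def set_pace_log_level(level: str, env: MutableMapping[str, str] = os.environ) -> str:
    """Validate `level` and store it in PACE_LOG_LEVEL (hypothetical helper)."""
    normalized = level.strip().lower()
    if normalized not in PACE_LOG_LEVELS:
        raise ValueError(f"unknown level {level!r}; expected one of {PACE_LOG_LEVELS}")
    env["PACE_LOG_LEVEL"] = normalized
    return normalized


def effective_pace_log_level(env: MutableMapping[str, str] = os.environ) -> str:
    """Current PACE_LOG_LEVEL, falling back to the documented default."""
    return env.get("PACE_LOG_LEVEL", DEFAULT_LEVEL)


if __name__ == "__main__":
    set_pace_log_level("debug")  # set before importing AMD PACE (assumed to read it at init)
    # import pace  # assumption: AMD PACE's import name
    print("PACE_LOG_LEVEL =", effective_pace_log_level())
```

From a shell, you can also scope the variable to a single run, for example `PACE_LOG_LEVEL=debug python your_script.py`, where your_script.py is a placeholder.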

External Dependencies

AMD PACE depends on the following libraries:

Library	Version
PyTorch	v2.7.0
oneDNN	v3.8
FBGEMM	v1.2.0
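To confirm that the environment picked up the listed PyTorch release, you can compare the installed version against the table. The following is a minimal sketch that checks only PyTorch, through its installed package metadata; the oneDNN and FBGEMM versions are not checked here. It treats any other release as a mismatch, which may be stricter than necessary.

```python
import re
from importlib import metadata
from typing import Tuple

EXPECTED_TORCH = "2.7.0"  # from the External Dependencies table


def release_tuple(version: str) -> Tuple[int, ...]:
    """Numeric release part of a version string, e.g. '2.7.0+cpu' -> (2, 7, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def torch_matches(installed: str, expected: str = EXPECTED_TORCH) -> bool:
    """Strict check on release numbers; suffixes such as +cpu are ignored."""
    return release_tuple(installed) == release_tuple(expected)


if __name__ == "__main__":
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("PyTorch is not installed")
    else:
        status = "matches" if torch_matches(installed) else "differs from"
        print(f"torch {installed} {status} the listed v{EXPECTED_TORCH}")
```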
