Skip to content

Conversation

@xingyaoww
Copy link
Collaborator

@xingyaoww xingyaoww commented Nov 1, 2025

Follow the example here (https://docs.blacksmith.sh/blacksmith-caching/docker-builds) to use blacksmith's official Docker action for more efficient Docker builds


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:121be05-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-121be05-python \
  ghcr.io/openhands/agent-server:121be05-python

All tags pushed for this build

ghcr.io/openhands/agent-server:121be05-golang-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary-amd64
ghcr.io/openhands/agent-server:121be05-golang-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary-arm64
ghcr.io/openhands/agent-server:121be05-java-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary-amd64
ghcr.io/openhands/agent-server:121be05-java-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary-arm64
ghcr.io/openhands/agent-server:121be05-python-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary-amd64
ghcr.io/openhands/agent-server:121be05-python-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary-arm64
ghcr.io/openhands/agent-server:121be05-golang
ghcr.io/openhands/agent-server:121be05-java
ghcr.io/openhands/agent-server:121be05-python

About Multi-Architecture Support

  • Each variant tag (e.g., 121be05-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 121be05-python-amd64) are also available if needed

xingyaoww and others added 12 commits November 1, 2025 16:10
…orm builds

- Replace custom build script with useblacksmith/build-push-action@v2
- Implement true multi-platform builds using matrix strategy
- Build each architecture (amd64/arm64) on native hardware (no QEMU)
- Add manifest merge job to create multi-arch Docker manifests
- Enable automatic Docker layer caching via Blacksmith
- Update workflow to build 6 jobs in parallel (3 variants × 2 archs)
- Add comprehensive migration documentation

Expected benefits:
- 2-40x faster builds with layer caching
- 5-10x faster ARM builds (native vs QEMU emulation)
- Automatic cache management (no manual cache-from/cache-to)
- Better observability via Blacksmith dashboard

Co-authored-by: openhands <[email protected]>
- Filter out uv warnings that pollute BUILD_CTX output
- Use tail -n 1 to ensure clean path extraction
- Fixes 'Invalid format' error in GitHub Actions output

Co-authored-by: openhands <[email protected]>
- Filter out DeprecationWarning from litellm (asyncio.get_event_loop)
- Filter out JSON log lines starting with '{'
- Only capture lines starting with '/tmp/' (actual path)
- Add validation to ensure BUILD_CTX is non-empty and valid directory
- Fixes: 'path "loop = asyncio.get_event_loop()" not found' error

Co-authored-by: openhands <[email protected]>
- Set dockerfile output to BUILD_CTX/Dockerfile
- Fixes: 'failed to read dockerfile: open Dockerfile: no such file or directory'
- The build-push-action needs explicit file parameter

Co-authored-by: openhands <[email protected]>
Changes to build.py:
- Add json import for structured output
- In build-ctx-only mode, create BuildOptions to generate tags
- Output JSON with build_context, dockerfile, tags, versioned_tag, base_image_slug
- Tags are computed using the same logic as regular builds

Changes to server.yml:
- Pass CUSTOM_TAGS, BASE_IMAGE, TARGET, IMAGE env vars to build.py
- Parse JSON output from build.py using jq
- Extract build_context, dockerfile, and base tags from JSON
- Append architecture suffix to each tag for per-arch builds
- Simplifies workflow by reusing build.py's tag generation logic

Benefits:
- Single source of truth for tag generation (build.py)
- Consistent tagging between direct builds and CI builds
- Eliminates duplicate tag computation logic in workflow

Co-authored-by: openhands <[email protected]>
Changes:
- build.py now writes directly to $GITHUB_OUTPUT when in CI
- Outputs: build_context, dockerfile, tags_csv, versioned_tag, base_image_slug
- Still prints build context path to stdout for backwards compatibility
- server.yml no longer needs jq to parse JSON output
- Reads tags_csv from GITHUB_OUTPUT file using grep
- More reliable than JSON parsing with filtering

Benefits:
- Eliminates jq dependency and parsing failures
- Direct file writes are more reliable in CI environments
- Simpler workflow logic

Co-authored-by: openhands <[email protected]>
Changes:
- Capture full output from build.py including stderr
- Check build.py exit code explicitly before proceeding
- Add validation that BASE_TAGS is not empty
- Add validation that ARCH_TAGS is not empty
- Show GITHUB_OUTPUT contents for debugging
- Use set -o pipefail for proper error propagation

This will help diagnose why the prep step is failing.

Co-authored-by: openhands <[email protected]>
…nifest

Changes:
- Replace 'docker manifest create/annotate/push' with 'docker buildx imagetools create'
- Apply to both SHA-based and latest tag manifest creation
- Add 'docker buildx imagetools inspect' for verification

Why:
- Blacksmith builds create manifest lists even for single-platform builds
- 'docker manifest create' fails with 'is a manifest list' error
- 'docker buildx imagetools create' properly handles manifest lists
- Modern Docker buildx approach is more robust for multi-arch workflows

This fixes the error:
  Error: ghcr.io/openhands/agent-server:0a1bb14-java-arm64 is a manifest list

Co-authored-by: openhands <[email protected]>
@xingyaoww
Copy link
Collaborator Author

@OpenHands /codereview-roasted

Also, help me understand can we get rid of the -amd64 and -arm64 suffix here?
I want to be able just pull tags, and it will automatically figure out which version to pull depending on my machine arch

All tags pushed for this build

ghcr.io/openhands/agent-server:83cd67b-golang-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary-amd64
ghcr.io/openhands/agent-server:83cd67b-golang-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary-arm64
ghcr.io/openhands/agent-server:83cd67b-java-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary-amd64
ghcr.io/openhands/agent-server:83cd67b-java-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary-arm64
ghcr.io/openhands/agent-server:83cd67b-python-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary-amd64
ghcr.io/openhands/agent-server:83cd67b-python-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary-arm64
ghcr.io/openhands/agent-server:-java
ghcr.io/openhands/agent-server:-python
ghcr.io/openhands/agent-server:-golang

About Multi-Architecture Support

  • Each variant tag (e.g., -python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., -python-amd64) are also available if needed
  • Built on native hardware (no emulation) for maximum performance

@openhands-ai
Copy link

openhands-ai bot commented Nov 3, 2025

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai
Copy link

openhands-ai bot commented Nov 3, 2025

Summary

I've completed a comprehensive code review of PR #990 and answered your question about multi-architecture Docker image tags.


✅ Answer to Your Question

You CAN already pull images without the -amd64/-arm64 suffix!

The merge-manifests job in your workflow creates multi-arch manifest tags like:

  • ghcr.io/openhands/agent-server:83cd67b-python
  • ghcr.io/openhands/agent-server:83cd67b-java
  • ghcr.io/openhands/agent-server:83cd67b-golang

These automatically select the correct architecture for your machine. Docker handles this transparently - when you pull :83cd67b-python, it detects whether you're on amd64 or arm64 and pulls the right image.

The Issue: Your PR description is showing broken tags like :-python instead of :83cd67b-python because the consolidate-build-info job doesn't track the multi-arch manifest tags created by the merge-manifests job.


🔥 Code Review Highlights (Linus-Style)

Taste Rating: 🟡 Acceptable

The migration works but has design issues that violate simplicity principles.

Critical Issues Found:

  1. Multi-arch manifest tags aren't tracked - The merge-manifests job creates the tags you want but doesn't output them for the consolidation job to display in PR descriptions.

  2. Over-complex matrix configuration - Using matrix include to cross-merge 3 dimensions (variant × arch × runner mappings) makes the workflow hard to read and maintain.

  3. Bash doing Python's job - 60+ lines of bash string manipulation to add -amd64/-arm64 suffixes should be handled in build.py with an --arch flag.

  4. Questionable complexity trade-off - You're building a complex 6-job workflow with manual manifest merging to avoid QEMU emulation. Docker buildx already supports native multi-platform builds with remote builders.

Key Recommendations:

  • Fix the PR description bug by having merge-manifests output artifact files that consolidate-build-info can consume
  • Simplify the matrix to explicit entries rather than clever cross-products
  • Move arch suffix logic into build.py instead of bash string manipulation
  • Consider Docker buildx remote builders as a simpler alternative to manual manifest management

📝 No Code Changes Made

Per your instructions for /codereview-roasted, I provided critical feedback only without modifying any code. The PR works functionally but has architectural issues around observability (broken PR descriptions) and unnecessary complexity.

If you'd like me to implement fixes for any of the issues identified (especially the PR description bug), let me know!

View full conversation

- Simplify matrix configuration to explicit 6 job entries (3 variants × 2 architectures)
  - Each job now clearly shows what it builds and where it runs
  - Removes complex matrix include merging that was hard to understand

- Add --arch flag to build.py for architecture-specific tagging
  - Move bash string manipulation logic into Python BuildOptions
  - Tags are now generated with architecture suffix directly by build.py
  - Reduces workflow complexity from ~70 lines to ~20 lines in prep step

- Fix multi-arch manifest tag tracking
  - merge-manifests job now creates artifacts with manifest tag info
  - consolidate-build-info processes manifest artifacts
  - PR descriptions will now correctly show multi-arch tags like :sha-python

Co-authored-by: openhands <[email protected]>
- Add --arch flag to build.py invocation in prep step
  - This ensures tags include the correct architecture suffix

- Fix bash syntax error in consolidate-build-info
  - Can't use 2>/dev/null in glob pattern, wrap in if statement instead

- Fix SHORT_SHA mismatch between build and merge-manifests jobs
  - merge-manifests was using ${{github.sha}} which doesn't match build.py's git calculation
  - Now downloads build-info artifacts and extracts the actual SHORT_SHA used during build
  - This ensures manifest creation uses the correct image tags

Co-authored-by: openhands <[email protected]>
@xingyaoww
Copy link
Collaborator Author

@OpenHands please fix the failing actions on PR #990 at branch docker-build-fix

@openhands-ai
Copy link

openhands-ai bot commented Nov 3, 2025

I'm on it! xingyaoww can track my progress at all-hands.dev

The merge-manifests job was failing because it was trying to use
'docker buildx imagetools create' without having Docker Buildx
properly set up. This adds the useblacksmith/setup-docker-builder@v1
action to configure Docker Buildx before attempting to create
multi-arch manifests.

Co-authored-by: openhands <[email protected]>
@openhands-ai
Copy link

openhands-ai bot commented Nov 3, 2025

Summary

I've successfully fixed the failing "Agent Server" workflow on PR #990 (docker-build-fix branch).

Problem Identified

All three "Merge Multi-Arch Manifests" jobs (python, java, golang) were failing because they attempted to use docker buildx imagetools create without having Docker Buildx properly configured.

Fix Applied

Added the Docker Buildx setup step to the merge-manifests job in .github/workflows/server.yml:

- name: Set up Docker Buildx with Blacksmith
  uses: useblacksmith/setup-docker-builder@v1

This step was already present in the build-and-push-image job but was missing from the merge-manifests job, which caused the manifest creation commands to fail.

Changes Made

  • File Modified: .github/workflows/server.yml
  • Change: Added 3 lines to set up Docker Buildx before attempting to create multi-arch manifests
  • Commit: 689a419c - "Fix merge-manifests job by adding Docker Buildx setup"
  • Status: Changes have been committed and pushed to the docker-build-fix branch

Verification

The GitHub Actions workflow has been triggered and is currently running at:
https://github.com/OpenHands/software-agent-sdk/actions/runs/19048351287

The fix is minimal, focused, and addresses the root cause of the failure without introducing any extraneous changes.

View full conversation

@openhands-ai
Copy link

openhands-ai bot commented Nov 3, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #990 at branch `docker-build-fix`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

openhands-agent and others added 2 commits November 3, 2025 20:34
The build.py script expects CUSTOM_TAGS environment variable to generate
proper image tags with variant names (python, java, golang). Without this,
the tags were being generated without the variant name, causing the
merge-manifests job to fail because it couldn't find the expected
{SHORT_SHA}-{VARIANT}-{ARCH} formatted tags.

Co-authored-by: openhands <[email protected]>
xingyaoww pushed a commit to OpenHands/benchmarks that referenced this pull request Nov 3, 2025
…caching

Following the pattern from OpenHands/software-agent-sdk#990
and Blacksmith's official documentation (https://docs.blacksmith.sh/blacksmith-caching/docker-builds),
this change replaces the standard docker/setup-buildx-action with useblacksmith/setup-docker-builder@v1.

Key improvements:
- Replaces docker/setup-buildx-action@v3 with useblacksmith/setup-docker-builder@v1
- Removes manual cache configuration (useblacksmith/cache@v6)
- Blacksmith's Docker builder automatically manages Docker layer caching via NVMe-backed sticky disks
- Provides 2x to 40x improvements in build times according to Blacksmith's customers
- Since we only build amd64 images, we don't need the complex multi-platform matrix strategy

This approach is recommended for workflows that use Docker commands directly
(as opposed to using docker/build-push-action).

Co-authored-by: openhands <[email protected]>
@xingyaoww xingyaoww merged commit 7cdc067 into main Nov 4, 2025
20 checks passed
@xingyaoww xingyaoww deleted the docker-build-fix branch November 4, 2025 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants