54 commits
c9b86c1
Add GitHub workflow for building SWE-Bench images with Blacksmith cac…
openhands-agent Oct 27, 2025
5752043
Use Blacksmith's setup-docker-builder action for faster Docker layer …
openhands-agent Nov 3, 2025
282f863
Merge commit 'bb150852c64a555806cfa939f31e8f9abd7b3791' into openhand…
xingyaoww Nov 4, 2025
8508006
revert unneed stuff
xingyaoww Nov 4, 2025
a565e77
simplify setup dependency
xingyaoww Nov 4, 2025
9bbd7fb
set eval-agent-server
xingyaoww Nov 4, 2025
c661b2c
fix line break
xingyaoww Nov 4, 2025
632432e
default to 10 for testing
xingyaoww Nov 4, 2025
c536903
run on all prs for debugging
xingyaoww Nov 4, 2025
efb731f
Fix pyarrow build issue by forcing binary wheel installation
openhands-agent Nov 4, 2025
29084f2
Pin Python version to 3.12 to fix pyarrow compatibility
openhands-agent Nov 4, 2025
551405b
Fix artifact upload naming to avoid invalid characters
openhands-agent Nov 4, 2025
90b6ed6
Fix artifact upload by archiving logs to avoid invalid filename chara…
openhands-agent Nov 4, 2025
3ba1e46
Fix Docker cache tag length exceeding 128 character limit
openhands-agent Nov 4, 2025
21bb226
Update patch with pre-commit formatting fixes
openhands-agent Nov 4, 2025
2f89775
checkout to v1.0.0 of sdk
xingyaoww Nov 6, 2025
dfb966b
update uv.lock
xingyaoww Nov 6, 2025
d04de8a
Merge commit 'dfb966bd2d3e4d2086223cf4ff85d998d15354d4' into openhand…
xingyaoww Nov 6, 2025
cdd7200
Revert "Fix Docker cache tag length exceeding 128 character limit"
xingyaoww Nov 6, 2025
001bcee
Fix log file mixing issue by using ProcessPoolExecutor
openhands-agent Nov 6, 2025
271b527
Improve Docker image tagging for reproducibility
openhands-agent Nov 6, 2025
92f04c1
refactor: omit target suffix for binary builds (default case)
openhands-agent Nov 6, 2025
49d9667
fix: update SDK to use SDK_VERSION for commit tags
openhands-agent Nov 6, 2025
c2711a3
refactor: remove SDK_VERSION_OVERRIDE logic
openhands-agent Nov 6, 2025
6d6845e
chore: update SDK to commit 85e436df
openhands-agent Nov 6, 2025
8d8ed8c
update agent-sdk version
xingyaoww Nov 7, 2025
8763fad
improve custom tags for swebench image
xingyaoww Nov 7, 2025
99927f8
Revert "update agent-sdk version"
xingyaoww Nov 7, 2025
8ed14f3
Merge commit '2ca8a917036ddb6ac069b3ecbb0f14ec616a4883' into openhand…
xingyaoww Nov 7, 2025
7e3c50e
update sha
xingyaoww Nov 7, 2025
c118297
fix: update run_infer.py to use new SDK tag format
openhands-agent Nov 7, 2025
4f3f9b1
refactor: deduplicate extract_custom_tag by importing from run_infer
openhands-agent Nov 7, 2025
26c3f02
docs: clarify SHORT_SHA source in run_infer.py
openhands-agent Nov 7, 2025
89e4cda
update sdk
xingyaoww Nov 7, 2025
eacfe0b
refactor
xingyaoww Nov 7, 2025
3a2c009
remove tagging changes
xingyaoww Nov 7, 2025
84c8876
bump commit
xingyaoww Nov 7, 2025
de46db7
simplify build script
xingyaoww Nov 7, 2025
bcbd455
bump version
xingyaoww Nov 7, 2025
96f2da6
bump
xingyaoww Nov 7, 2025
aad870b
bump
xingyaoww Nov 7, 2025
acee9cb
refactor build util into shared file
xingyaoww Nov 7, 2025
a4bf9e4
simplify build on the fly logic
xingyaoww Nov 7, 2025
9ef0d48
remove targets and platform
xingyaoww Nov 7, 2025
06e994a
Add automatic comment to issue #81 on successful build
openhands-agent Nov 7, 2025
fba2a55
Fix SDK URL and add workflow trigger information
openhands-agent Nov 7, 2025
0ab219f
Update .gitignore to properly allow .openhands/microagents/
openhands-agent Nov 7, 2025
aa8b452
Add error handling to skip comment when no images are built
openhands-agent Nov 7, 2025
a95969e
Fix manifest file path detection using find command
openhands-agent Nov 7, 2025
46b5266
bump sdk
xingyaoww Nov 7, 2025
16526b3
increase n work and n limit
xingyaoww Nov 7, 2025
90ee94e
Show only one tag per image in issue comment
openhands-agent Nov 7, 2025
2d10954
bump sdk commit
xingyaoww Nov 8, 2025
178123e
increase to 500 limit and 32 concurrency
xingyaoww Nov 8, 2025
216 changes: 216 additions & 0 deletions .github/workflows/build-swe-bench-images.yml
@@ -0,0 +1,216 @@
name: Build SWE-Bench Images

on:
  pull_request: # for debugging
  workflow_dispatch:
    inputs:
      dataset:
        description: 'Dataset name (e.g., princeton-nlp/SWE-bench_Verified)'
        required: true
        default: 'princeton-nlp/SWE-bench_Verified'
        type: string
      split:
        description: 'Dataset split (e.g., test, dev)'
        required: true
        default: 'test'
        type: string
      max-workers:
        description: 'Number of concurrent builds'
        required: false
        default: '2'
        type: string
      n-limit:
        description: 'Limit number of images to build (for testing). Leave blank for no limit.'
        required: false
        default: '10'
        type: string

# Reasonable defaults for automatic (push) runs; workflow_dispatch can override these.
env:
  DATASET: princeton-nlp/SWE-bench_Verified
  SPLIT: test
  MAX_WORKERS: '32'
  N_LIMIT: '500'

concurrency:
  group: build-swe-bench-${{ github.ref }}
  cancel-in-progress: false

jobs:
  build-and-push:
    runs-on:
      labels: blacksmith-32vcpu-ubuntu-2204

    # Allow pushing to GHCR and commenting on issues
    permissions:
      contents: read
      packages: write
      issues: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          submodules: recursive

      # If this was a manual dispatch, override defaults with provided inputs.
      - name: Apply workflow_dispatch overrides (if any)
        if: ${{ github.event_name == 'workflow_dispatch' }}
        run: |
          if [ -n "${{ inputs.dataset }}" ]; then echo "DATASET=${{ inputs.dataset }}" >> "$GITHUB_ENV"; fi
          if [ -n "${{ inputs.split }}" ]; then echo "SPLIT=${{ inputs.split }}" >> "$GITHUB_ENV"; fi
          if [ -n "${{ inputs.max-workers }}" ]; then echo "MAX_WORKERS=${{ inputs.max-workers }}" >> "$GITHUB_ENV"; fi
          # Empty string means "no limit"
          if [ -n "${{ inputs.n-limit }}" ]; then echo "N_LIMIT=${{ inputs.n-limit }}" >> "$GITHUB_ENV"; else echo "N_LIMIT=" >> "$GITHUB_ENV"; fi

      - name: Set up Docker Buildx with Blacksmith
        uses: useblacksmith/setup-docker-builder@v1

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
          enable-cache: true

      - name: Install dependencies
        run: |
          make build

      - name: Build and push SWE-Bench images
        run: |
          set -euo pipefail

          CMD="uv run benchmarks/swe_bench/build_images.py \
            --dataset '${DATASET}' \
            --split '${SPLIT}' \
            --image ghcr.io/openhands/eval-agent-server \
            --push \
            --max-workers '${MAX_WORKERS}'"

          # Only include --n-limit if provided (non-empty)
          if [ -n "${N_LIMIT}" ]; then
            CMD="$CMD --n-limit '${N_LIMIT}'"
          fi

          echo "Running: $CMD"
          eval "$CMD"
        env:
          DOCKER_BUILDKIT: 1
          BUILDKIT_PROGRESS: plain

      - name: Upload build manifest
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: build-manifest-${{ github.run_id }}
          path: |
            builds/**/manifest.jsonl
            builds/**/summary.json
          retention-days: 30

      - name: Archive build logs
        if: always()
        run: |
          if [ -d builds ]; then
            # Create tar archive to avoid filename restrictions (colons, etc.)
            tar -czf build-logs.tar.gz builds/
            echo "Build logs archived successfully"
          else
            echo "No builds directory found"
          fi

      - name: Upload build logs
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: build-logs-${{ github.run_id }}
          path: build-logs.tar.gz
          retention-days: 7
          if-no-files-found: warn

      - name: Display build summary
        if: always()
        run: |
          if ls builds/*/summary.json >/dev/null 2>&1; then
            echo "## Build Summary" >> "$GITHUB_STEP_SUMMARY"
            cat builds/*/summary.json | python -m json.tool >> "$GITHUB_STEP_SUMMARY"
          fi

      - name: Comment on tracker issue
        if: success()
        run: |
          # Get SDK version from submodule
          SDK_SHA=$(git submodule status vendor/software-agent-sdk | awk '{print $1}' | sed 's/^[+-]//')

          # Find all manifest.jsonl files
          MANIFEST_FILES=$(find builds -name "manifest.jsonl" -type f 2>/dev/null)

          if [ -z "$MANIFEST_FILES" ]; then
            echo "No manifest.jsonl files found in builds directory"
            echo "Build may have completed but produced no images"
            exit 0
          fi

          # Count total images built
          TOTAL_IMAGES=$(cat $MANIFEST_FILES 2>/dev/null | wc -l)

          if [ "$TOTAL_IMAGES" -eq 0 ]; then
            echo "No images found in manifest files"
            echo "Skipping comment as there are no built images to report"
            exit 0
          fi

          # Extract all tags and format them as a markdown list (one tag per image)
          TAGS=$(cat $MANIFEST_FILES | python -c "
          import sys
          import json
          for line in sys.stdin:
              data = json.loads(line.strip())
              if data.get('tags') and len(data['tags']) > 0:
                  # Only show the first tag per image to reduce clutter
                  print(f'- \`{data[\"tags\"][0]}\`')
          ")

          # Determine how the workflow was triggered
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            TRIGGER="Manual trigger (workflow_dispatch)"
          elif [ "${{ github.event_name }}" = "pull_request" ]; then
            TRIGGER="Pull request [#${{ github.event.pull_request.number }}](${{ github.event.pull_request.html_url }})"
          else
            TRIGGER="${{ github.event_name }}"
          fi

          # Create the comment body
          COMMENT_BODY=$(cat <<EOF
          ## Build Complete ✅

          **Dataset:** \`${DATASET}\`
          **Split:** \`${SPLIT}\`
          **SDK Version:** [\`${SDK_SHA:0:7}\`](https://github.com/OpenHands/software-agent-sdk/commit/${SDK_SHA})
          **Workflow Run:** [#${{ github.run_id }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})
          **Triggered by:** ${TRIGGER}

          <details>
          <summary>Built Tags (${TOTAL_IMAGES} images)</summary>

          ${TAGS}

          </details>
          EOF
          )

          # Post comment to issue #81
          curl -L -X POST \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
            -H "X-GitHub-Api-Version: 2022-11-28" \
            "${{ github.api_url }}/repos/${{ github.repository }}/issues/81/comments" \
            -d "$(jq -n --arg body "$COMMENT_BODY" '{body: $body}')"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
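Review note: the inline `python -c` snippet in the "Comment on tracker issue" step is hard to test where it sits. Below is a minimal standalone sketch of the same tag-extraction logic, assuming only what the workflow itself shows: each `manifest.jsonl` line is a JSON object that may carry a `tags` list. The function names `first_tags` and `as_markdown_list` and the sample image names are hypothetical, not part of this PR. Unlike the inline version, the sketch skips blank lines, on the assumption that they can appear when several manifest files are concatenated.

```python
import json
from typing import Iterable, List


def first_tags(lines: Iterable[str]) -> List[str]:
    """Return the first tag of every manifest record that has at least one tag."""
    tags: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Assumption: blank lines may occur between concatenated files.
            continue
        record = json.loads(line)
        record_tags = record.get("tags") or []
        if record_tags:
            # Only the first tag per image is reported, as in the workflow.
            tags.append(record_tags[0])
    return tags


def as_markdown_list(tags: Iterable[str]) -> str:
    """Format tags as a markdown bullet list with each tag in backticks."""
    return "\n".join(f"- `{tag}`" for tag in tags)


if __name__ == "__main__":
    # Hypothetical sample records; real manifests may contain more fields.
    sample = [
        '{"tags": ["ghcr.io/example/image:a", "ghcr.io/example/image:b"]}',
        "",
        '{"tags": []}',
        '{"base": "record without a tags key"}',
    ]
    print(as_markdown_list(first_tags(sample)))
```

One thing this makes visible: `TOTAL_IMAGES` in the workflow counts manifest lines with `wc -l`, while the tag list only includes records that have at least one tag, so the count in the comment summary may exceed the number of tags listed.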
6 changes: 3 additions & 3 deletions .gitignore
@@ -205,7 +205,7 @@ cython_debug/
 workspace/

 # IDE and editor directories
-.openhands/
+.openhands/*
 !.openhands/setup.sh
 !.openhands/microagents/
 .vscode/
@@ -215,5 +215,5 @@ workspace/
 !.llm_config/example.json

 # Evaluation outputs
-./eval_outputs
-./builds
+eval_outputs/
+builds/
2 changes: 1 addition & 1 deletion .openhands/microagents/repo.md
@@ -84,7 +84,7 @@ make build # Rebuild environment
 5. Update README.md with usage instructions

 # LLM Configuration
-LLM configs use JSON matching the [LLM class schema](https://github.com/All-Hands-AI/agent-sdk/blob/main/openhands/sdk/llm/llm.py#L93):
+LLM configs use JSON matching the [LLM class schema](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands/sdk/llm/llm.py#L93):
 ```json
 {
 "model": "litellm_proxy/anthropic/claude-sonnet-4-20250514",