Skip to content

Conversation

@achilleas-k
Copy link
Member

@achilleas-k achilleas-k commented Oct 22, 2025

Add new job type, for generating manifests using image-builder-cli. Support enabling the functionality using the IMAGE_BUILDER_EXPERIMENTAL environment variable.

Currently this doesn't support everything we need. Resolver options that require credentials and certificates aren't supported because we need a way to pass them to image-builder-cli. Also, long term, this feature is meant to support building images from bootc containers, which will require pulling the container to inspect it. We'll have to find a way to share that container with the build job to avoid pulling twice.

DRAFT: I should also add an integration test to build an actual image with this option enabled.

@achilleas-k achilleas-k force-pushed the newjob/ib-cli-manifest branch from 8bc3efd to 30a026e Compare October 28, 2025 14:05
@achilleas-k achilleas-k marked this pull request as ready for review October 28, 2025 14:25
@achilleas-k achilleas-k requested review from a team and thozza as code owners October 28, 2025 14:25
@achilleas-k achilleas-k requested review from croissanne and mvo5 and removed request for a team October 28, 2025 14:25
@achilleas-k
Copy link
Member Author

achilleas-k commented Oct 28, 2025

Integration test added. CI runners are having issues though (see #4909).

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Some quick drive-by musing, very excited about this feature!

"github.com/stretchr/testify/assert"
)

func TestParseManifestPipelines(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

// read XDG_CACHE_HOME, which fails when running as _osbuild-composer
// TODO: make sure we use the same rpmmd cache that's used by the depsolve
// job for consistency
ExtraEnv: []string{"XDG_CACHE_HOME=/var/cache/osbuild-composer/rpmmd"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this comment, lets make sure that we have a --cachdir in ibcli for you :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this osbuild/image-builder-cli#358 help you here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth a small comment now that we have osbuild/image-builder-cli#358 open that this can be replaced with --rpmmd-cache ? Unless we want to use it this way of course :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we should definitely use the option when we get it. I think we still haven't decided exactly how we're going to get ib-cli on the workers yet, so we'll need to see when that new option becomes available. But I'll definitely add the note.

if [[ "${IMAGE_BUILDER_EXPERIMENTAL}" != "" ]]; then
if echo "${IMAGE_BUILDER_EXPERIMENTAL}" | grep -q "image-builder-manifest-generation=1"; then
greenprint "Verifying usage of experimental job type"
# verify using log message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart!

@achilleas-k achilleas-k force-pushed the newjob/ib-cli-manifest branch 2 times, most recently from 6cba2df to a067379 Compare October 29, 2025 11:06
mvo5 added a commit to mvo5/image-builder-cli that referenced this pull request Oct 29, 2025
This commit adds a new option `--rpmmd-cache` that is similar
to the option in bootc-image-builder. It is also useful for
osbuild/osbuild-composer#4904
mvo5 added a commit to mvo5/image-builder-cli that referenced this pull request Oct 29, 2025
This commit adds a new option `--rpmmd-cache` that is similar
to the option in bootc-image-builder. It is also useful for
osbuild/osbuild-composer#4904
mvo5 added a commit to mvo5/image-builder-cli that referenced this pull request Oct 29, 2025
This commit adds a new option `--rpmmd-cache` that is similar
to the option in bootc-image-builder. It is also useful for
osbuild/osbuild-composer#4904
@achilleas-k achilleas-k force-pushed the newjob/ib-cli-manifest branch from a067379 to bba18ed Compare October 30, 2025 18:27
@achilleas-k
Copy link
Member Author

Rebased on #4909 to get the new terraform configs and get this thing tested.

@achilleas-k
Copy link
Member Author

I just realised while waiting for the tests to run that I also need to actually install ib-cli on the test runner when using it 😆

mvo5
mvo5 previously approved these changes Oct 31, 2025
Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Some small suggestions inline but all could be followups (or ignored, some are really just subjective) I think.

func (s *Server) enqueueComposeIBCLI(irs []imageRequest, channel string) (uuid.UUID, error) {
var osbuildJobID uuid.UUID
if len(irs) != 1 {
return osbuildJobID, HTTPError(ErrorInvalidNumberOfImageBuilds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(super nitpick) might be nice to provide a bit more information here, something like:

diff --git a/internal/cloudapi/v2/server.go b/internal/cloudapi/v2/server.go
index 7fd361516..080bfdf48 100644
--- a/internal/cloudapi/v2/server.go
+++ b/internal/cloudapi/v2/server.go
@@ -325,7 +325,7 @@ func (s *Server) enqueueComposeIBCLI(irs []imageRequest, channel string) (uuid.U
        logrus.Warnf("using experimental job type: %s", worker.JobTypeImageBuilderManifest)
        var osbuildJobID uuid.UUID
        if len(irs) != 1 {
-               return osbuildJobID, HTTPError(ErrorInvalidNumberOfImageBuilds)
+               return osbuildJobID, HTTPErrorWithInternal(ErrorInvalidNumberOfImageBuilds, fmt.Errorf("expected 1 image request, got %d", len(irs)))
        }
        ir := irs[0]
 

but it seems enqueCompse() has the same pattern so feel free to ignore.

// read XDG_CACHE_HOME, which fails when running as _osbuild-composer
// TODO: make sure we use the same rpmmd cache that's used by the depsolve
// job for consistency
ExtraEnv: []string{"XDG_CACHE_HOME=/var/cache/osbuild-composer/rpmmd"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth a small comment now that we have osbuild/image-builder-cli#358 open that this can be replaced with --rpmmd-cache ? Unless we want to use it this way of course :)

// TODO: extend to include all options
// - ostree
// - registration
// - bootc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this landed (and we added a way to pass a bootc-ref) we would be able to build bootc images in the service, right? That is exciting :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's the plan. The thing I want to work on after this is bootc manifest generation and the "passing of the container" from the depsolve worker to the build worker. Which doesn't seem trivial.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking into how bootc support could be added to composer and I have a question.

Can I assume that generation of manifests via this new feature would also download containers? Because then the building job would use this one as a dependency and once it is done, containers could be archived and sent over to the secured instance.


// handle experimental image-builder manifest generation option using the
// experimentalflags pkg from osbuild/images.
if experimentalflags.Bool("image-builder-manifest-generation") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(super nitpick) I personally would have combined this with the previous commit that added ImageBuilderManifestGeneration to the config but up to you how granular you want this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- aws/rhel-10.0-ga-aarch64
INTERNAL_NETWORK: ["true"]

# single test config for experimental feature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(super nitpick) I wonder if this and the previous one (or even also the previous-previous one) commits could be combined but very subjective and fine as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm going to add repository handling here and I need to fix up the test to install ib-cli, so I'll do some history rewriting to reduce the commit count, sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@achilleas-k achilleas-k marked this pull request as draft October 31, 2025 11:59
@achilleas-k achilleas-k force-pushed the newjob/ib-cli-manifest branch 4 times, most recently from 9e97ce6 to 0639ba7 Compare October 31, 2025 17:19
@achilleas-k achilleas-k marked this pull request as ready for review October 31, 2025 17:19
mvo5 added a commit to mvo5/image-builder-cli that referenced this pull request Nov 3, 2025
This commit adds a new option `--rpmmd-cache` that is similar
to the option in bootc-image-builder. It is also useful for
osbuild/osbuild-composer#4904
lzap
lzap previously approved these changes Nov 3, 2025
// result itself should have the pipeline names (under ManifestInfo)
// parsed from the manifest itself.
osbuildJobResult.PipelineNames = manifestInfo.PipelineNames
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is there a reason to assign first and then check for nil versus checking the jobArgs.PipelineNames instead?

Copy link
Member Author

@achilleas-k achilleas-k Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason. Just felt natural at the time.
Do you find it's less readable this way? I can change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is okay. It is one way of doing it for sure :-)

@achilleas-k achilleas-k force-pushed the newjob/ib-cli-manifest branch from 0639ba7 to 8d42e60 Compare November 3, 2025 12:15
@achilleas-k achilleas-k changed the title Add new job type for generating manifests using image-builder-cli Add new job type for generating manifests using image-builder-cli [HMS-9049] Nov 3, 2025
@achilleas-k
Copy link
Member Author

I switched the test from Fedora 42 to RHEL 10.2-nightly. The previous one was building an AMI for Fedora, which fails because our test setup always adds the RHSM customization block to AMI builds and isn't supported on Fedora.

github-merge-queue bot pushed a commit to osbuild/image-builder-cli that referenced this pull request Nov 3, 2025
This commit adds a new option `--rpmmd-cache` that is similar
to the option in bootc-image-builder. It is also useful for
osbuild/osbuild-composer#4904
@achilleas-k achilleas-k force-pushed the newjob/ib-cli-manifest branch from 8d42e60 to 83a4df7 Compare November 3, 2025 13:31
Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must have been a lot of work 👏

Besides my inline comments, I think that it would be good to enable ibcli manifest generation on most of the CloudAPI unit tests. The reason is that at least two of them will break with this change when using ibcli to generate manifests.

Comment on lines 52 to 57
if err := bpFile.Close(); err != nil {
return nil, fmt.Errorf("image-builder manifest: failed to close blueprint file: %w", err)
}

// TODO: catch and log remove errors
defer os.Remove(bpFile.Name())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should ideally defer Close() and Remove() as a common cleanup action right after calling CreateTemp(). Right now, the opened file handle can leak i.e. if json.Marshal() returns an error. In such a case, the file handle would not be closed, and the file would not get removed.


// MockExecCommand replaces the exec.Command() wrapper and returns a function
// that can be called to restore the original.
func MockExecCommand(mock func(name string, arg ...string) *exec.Cmd) func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: we tend to name the restore function value, to make the purpose of the function self-explanatory (i.e.

func MockSerializeManifestFunc(f func(ctx context.Context, manifestSource *manifest.Manifest, workers *worker.Server, dependencies manifestJobDependencies, manifestJobID uuid.UUID, seed int64)) (restore func()) {
).

Comment on lines 29 to 31
var execCommand = func(name string, arg ...string) *exec.Cmd {
return exec.Command(name, arg...)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: You could simply assign exec.Command to execCommand, without wrapping it in an additional function, no?


var actualCall []string
var cmd *exec.Cmd
worker.MockExecCommand(func(name string, arg ...string) *exec.Cmd {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You never defer the restore function, which is also thrown away on each call.

// image-builder-cli. Includes resolving all content types.
type ImageBuilderManifestJob struct {
// Arguments to the image-builder command line
Args ImageBuilderArgs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure that you are aware of this, but with this change, the blueprint.Blueprint becomes part of the Worker API, so any changes to it will affect the worker. It shouldn't be a problem, since the package is intended for user-facing serialization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and we need to think about this more. I think we're fine in general, we've been very good about keeping the blueprint stable, but I think this goes hand-in-hand with making ib-cli a versioned dependency.

Maybe it's better if the blueprint is serialised in composer instead and passed on as raw json instead. WDYT?

Comment on lines +58 to +79
// Parse the raw manifest into an incomplete representation of the manifest
// structure in order to retrieve the pipeline names in order. Any pipeline
// names that appear under the 'build' property of another pipeline are added
// to the build pipelines list. The rest are added to the payload pipelines.
func parseManifestPipelines(rawManifest []byte) (*worker.PipelineNames, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to dump this additional metadata in some form, in addition to the manifest? Maybe together with some of the info that we are then missing when uploading a built image?

Or we could simply add this information to the manifest itself under the metadata field -> https://github.com/osbuild/osbuild/blob/64f87a5fd0f7e3c86c78ef2a5cea30bd06dce209/schemas/osbuild2.json#L89-L90

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think both could be valid. This was a big of a workaround to get the PoC up and running, but we shouldn't be doing guesswork in the long run.

Comment on lines +644 to +648
// Ordered list of pipeline names in the manifest, separated into build and
// payload pipelines. These are parsed from the manifest itself and copied
// to the osbuild job result so it can properly order the osbuild job log
// by pipeline execution order.
PipelineNames *PipelineNames `json:"pipeline_names,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to simplify the flow and use this field even for manifests generated on the server side by composer. Something to consider for the future...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the current state where images defines them itself while it's generating the manifest, but that doesn't carry over to osbuild-composer when using ib-cli. So maybe we could start using the manifest metadata. I know @mvo5 and @supakeen were fond of this idea too and I resisted a bit, but now that we're here, I think it's necessary.

Comment on lines 97 to 99
if err := reposFile.Close(); err != nil {
return nil, fmt.Errorf("image-builder manifest: failed to close blueprint file: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: should be ideally deferred to close the file in case of errors.

Comment on lines +153 to +156
# TODO: configure dependency pinning for image-builder like we do for osbuild
# https://issues.redhat.com/browse/HMS-9647
greenprint "Installing image-builder for experimental manifest generation"
sudo dnf -y install image-builder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also figure out a way to set the minimum required image-builder version in the SPEC file. The feature obviously relies on specific features in ibcli, which were added fairly recently (like the search for repos under repository/ in the --data-dir).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely. I held off making any such changes since it's "experimental" and service-only, but we'll need to at least make it an optional dependency at first with a minimum version.

if job.NDynamicArgs() > 0 {
var manifestJR worker.ManifestJobByIDResult
if job.NDynamicArgs() == 1 {
// Classic case of a compose request with the ManifestJobByID job as the single dependency
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should amend the comment to note the IBCLI manifest job

@achilleas-k
Copy link
Member Author

This must have been a lot of work 👏

Besides my inline comments, I think that it would be good to enable ibcli manifest generation on most of the CloudAPI unit tests. The reason is that at least two of them will break with this change when using ibcli to generate manifests.

Thank you! I didn't even consider looking into those.

achilleas-k and others added 15 commits November 3, 2025 18:17
Worker function that shells out to image-builder to generate a manifest.
Mock the call to exec.Command() to capture the command and arguments in
order to test the call.
Read the Blueprint from disk and compare it to the input Blueprint to
verify that it was written correctly for each test.
The loading needs to happen in the mocked exec command since the
Blueprint file gets deleted when the RunImageBuilderManifest() command
returns.
Add new job type, associated enqueueing method, const string and result
type for generating manifests using image-builder-cli.

The result type is an alias to ManifestJobByIDRestult.  This makes it
simpler to use the result of the new job type as an argument for the
osbuild job.
Add the JobInfo handler for the new job type and call it from the finish
job and error functions.
Add ImageBuilderManifestJobImpl and implement its Run() method.

The job implementation struct is currently empty, but it will require
access to the same secrets as all the content resolver jobs.
When using the images library directly to generate a manifest, we have
access to the manifest source (pre-serialized manifest).  This object
provides a list of pipeline names, separated into build pipelines
(manifestSource.BuildPipelines()) and payload pipelines
(manifestSource.PayloadPipelines()).  The pipeline names are used in two
jobs:
1. In the osbuild job, the log in the result is unordered because
   pipeline results are returned in a map.  The list of pipeline names
   is used in this case to sort the job log in the order each pipeline
   was executed.
2. The koji finalize job uses the two pipeline name lists to separate
   the metadata from org.osbuild.rpm stages between build and payload,
   so that it can store package lists alongside the compose result
   separated into build packages and payload packages.

When using image-builder-cli, we don't have access to these pipeline
name lists directly.  To get them, let's parse the manifest into a
partial structure, identify all the pipeline names, and infer the build
pipeline names from the build property of the rest of the pipelines.

This approach violates one of our (soft) rules about job arguments and
job results: That they should always be considered opaque json blobs.
However, in the case of the manifest, I think it's acceptable,
especially since it only relies on the top-level structure of the
pipelines property, which we guarantee to be stable in osbuild.
Add an option to the cloudapi server config that enables generating
manifests with the ImageBuilderManifest job type.  Also will make it
possible to enable this using an environment variable.

Use the experimentalflags package from osbuild/images to read the
IMAGE_BUILDER_EXPERIMENTAL environment variable and check that
'image-builder-manifest-generation' is defined.  If it is, enable the
ImageBuilderManifestGeneration option in the composer server config.
Support enabling the option in tests and check that the correct job
types appear in the queue.
To reduce the number of args and make the function calls a bit more
readable, bundle the boolean options in the newV2Server() function into
a v2ServerOpts struct.

Co-authored-by: Michael Vogt <[email protected]>
When calling image-builder manifest, we need to set repositories
configured in osbuild-composer for distros.  Without this, the
repository configs embedded in image-builder are used, but in the
service, we don't want to rely on those.  Instead, we need to use the
repositories configured in the service itself.

Write any configured repository configurations to a file in a temporary
directory under the repositories/ subdirectory.  This works with
image-builder's --data-dir option, which looks for a repository file
matching the target distribution name under the repositories/
subdirectory.
Support setting repository MTLS configs in the new
image-builder-manifest job type.  This is equivalent to the same feature
in the traditional depsolve job.  MTLS configs for repositories are
defined in the worker by mapping a repository URL to the appropriate
cert values (CA, key, and cert).  The values are set on the repository
configurations matching the repo URL and passed to the depsolver.
In the deploy script, after installing all the test RPMs, add an
override to the unit file that sets the IMAGE_BUILDER_EXPERIMENTAL
environment variable for the osbuild-composer service with any values
from the global runner environment.

If image-builder-manifest-generation is enabled, install image-builder
from the repos.

Add a single API test job (api.sh) to the gitlab CI config that uses the
new experimental job type.

In the api.sh script, when image-builder-manifest-generation is enabled
in the test, verify it was actually used by searching for the warning
that's logged when the job is enqueued.
@achilleas-k achilleas-k force-pushed the newjob/ib-cli-manifest branch from 83a4df7 to 04b6b7f Compare November 3, 2025 20:51
@achilleas-k
Copy link
Member Author

This must have been a lot of work 👏

Besides my inline comments, I think that it would be good to enable ibcli manifest generation on most of the CloudAPI unit tests. The reason is that at least two of them will break with this change when using ibcli to generate manifests.

Marking as draft because I haven't addressed these issues. I want to find a way to test them which would fail with the current state.

@achilleas-k achilleas-k marked this pull request as draft November 3, 2025 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants