bootc

Transactional, in-place operating system updates using OCI/Docker container images. bootc is the key component in a broader mission of bootable containers.

The original Docker container model of using "layers" to model applications has been extremely successful. This project aims to apply the same technique for bootable host systems - using standard OCI/Docker containers as a transport and delivery format for base operating system updates.

The container image includes a Linux kernel (in e.g. /usr/lib/modules), which is used to boot. At runtime on a target system, the base userspace is not itself running in a container by default. For example, assuming systemd is in use, systemd acts as pid1 as usual - there's no "outer" process.

Status

At the current time, bootc has not reached 1.0, and it is possible that some APIs and CLIs may change. For more information, see the 1.0 milestone.

However, the core underlying code uses the ostree project which has been powering stable operating system updates for many years. The stability here generally refers to the surface APIs, not the underlying logic.

Base images

Many users will be more interested in base (container) images.

Fedora/CentOS

Currently, the Fedora/CentOS bootc project is the most closely aligned upstream project.

For pre-built base images: any Fedora derivative already using ostree can be seamlessly converted to use bootc. For example, Fedora CoreOS can be used as a base image; currently you will also want to rpm-ostree install bootc in your image builds. There is, however, some overlap between bootc and ignition and zincati; see this pull request for more information.

For other derivatives such as the "Atomic desktops", see discussion of relationships which particularly covers interactions with rpm-ostree.

Other

However, bootc itself is not tied to Fedora derivatives; this issue tracks the main blocker for other distributions.

Generic guidance for building images

The bootc project intends to be as operating system and distribution independent as possible, similar to related projects such as podman and systemd.

The recommendations for creating bootc-compatible images will in general need to be owned by the OS/distribution - in particular the ones who create the default bootc base image(s). However, some guidance is very generic to most Linux systems (and bootc only supports Linux).

Let's however restate a base goal of this project:

The original Docker container model of using "layers" to model applications has been extremely successful. This project aims to apply the same technique for bootable host systems - using standard OCI/Docker containers as a transport and delivery format for base operating system updates.

Every tool and technique for creating application base images should apply to the host Linux OS as much as possible.

Understanding mutability

When run as a container (particularly as part of a build), bootc-compatible images have all parts of the filesystem (e.g. /usr in particular) as fully mutable state, and writing there is encouraged (see below).

When "deployed" to a physical or virtual machine, the container image files are read-only by default; for more, see filesystem.

Installing software

For package management tools like apt, dnf, zypper etc. (generically, $pkgsystem) it is very much expected that the pattern of

RUN $pkgsystem install somepackage && $pkgsystem clean all

type flow Just Works here - the same way as it does in "application" container images. This pattern is really how Docker got started.

There's not much special to this that doesn't also apply to application containers; but see below.

Nesting OCI containers in bootc containers

The OCI format uses "whiteouts" represented in the tar stream as special .wh files, and typically consumed by the Linux kernel overlayfs driver as special 0:0 character devices. Without special work, whiteouts cannot be nested.

Hence, an invocation like

RUN podman pull quay.io/exampleimage/someimage

will create problems, as the podman runtime will create whiteout files inside the container image filesystem itself.

Special care and code changes will need to be made to container runtimes to support such nesting. Some more discussion in this tracker issue.

systemd units

The model that is most popular with the Docker/OCI world is "microservice" style containers with the application as pid 1, isolating the applications from each other and from the host system - as opposed to "system containers" which run an init system like systemd, typically also SSH and often multiple logical "application" components as part of the same container.

The bootc project generally expects systemd as pid 1, and if you embed software in your derived image, the default would then be that that software is initially launched via a systemd unit.

RUN dnf -y install postgresql && dnf clean all

A package like this would typically also carry a systemd unit, and that service will be launched the same way as it would on a package-based system.

Users and groups

Note that the above postgresql today will allocate a user; this leads to the topic of users, groups and SSH keys.

Configuration

A key aspect of choosing a bootc-based operating system model is that code and configuration can be strictly "lifecycle bound" together in exactly the same way.

(Today, that's by including the configuration into the base container image; however a future enhancement for bootc will also support dynamically-injected ConfigMaps, similar to kubelet)

You can add configuration files to the same places they're expected by typical package systems on Debian, Fedora, Arch and others - in /usr (preferred where possible) or /etc. systemd has long advocated and supported a model where /usr (e.g. /usr/lib/systemd/system) contains content owned by the operating system image.

/etc is machine-local state. However, per filesystem.md it's important to note that the underlying OSTree system performs a 3-way merge of /etc, so changes you make in the container image to e.g. /etc/postgresql.conf will be applied on update, assuming it is not modified locally.
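
As a rough illustration of that behavior, the sketch below models the merge decision for a single file in a scratch directory. It is a simplified stand-in, not ostree's implementation: the merge_etc_file function and the file names are hypothetical, and it only covers the two cases described above (a file left untouched locally versus one edited locally).

```shell
# Simplified model of the per-file /etc merge; an illustration, not ostree's code.
# Arguments: old image default, new image default, current local file.
merge_etc_file() {
    old_default=$1
    new_default=$2
    local_file=$3
    if cmp -s "$old_default" "$local_file"; then
        # Not modified locally: the new image version is applied.
        cp "$new_default" "$local_file"
        echo "updated"
    else
        # Modified locally: the local version is kept.
        echo "kept-local"
    fi
}

workdir=$(mktemp -d)
echo "max_connections = 100" > "$workdir/old"    # default shipped by the old image
echo "max_connections = 200" > "$workdir/new"    # default shipped by the new image

cp "$workdir/old" "$workdir/pristine"            # machine A never touched the file
echo "max_connections = 50" > "$workdir/edited"  # machine B edited it locally

result_pristine=$(merge_etc_file "$workdir/old" "$workdir/new" "$workdir/pristine")
result_edited=$(merge_etc_file "$workdir/old" "$workdir/new" "$workdir/edited")
echo "pristine: $result_pristine, edited: $result_edited"
```

The second machine keeps its local edit and silently stops receiving image changes to that file, which is the state drift discussed in the following section.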

Prefer using drop-in directories

These "locally modified" files can be a source of state drift. The best pattern to use is "drop-in" directories that are merged dynamically by the relevant software. systemd supports this comprehensively; see drop-ins for example in units.

For example, instead of modifying /etc/sudoers, it's best practice to add a file into /etc/sudoers.d.

Not all software supports this, however; and this is why there is generic support for /etc.
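
A minimal sketch of the drop-in pattern for a systemd unit follows. The unit name exampleapp.service, the drop-in file name and the chosen setting are assumptions for illustration; the script stages the file under a scratch directory standing in for the image root, whereas in a real container build you would typically COPY the file into place.

```shell
# Stage a systemd drop-in instead of editing the packaged unit file.
# "exampleapp.service" and "10-limits.conf" are hypothetical names.
root=$(mktemp -d)    # stand-in for the image root filesystem
dropin_dir="$root/usr/lib/systemd/system/exampleapp.service.d"
mkdir -p "$dropin_dir"

# Only the overridden setting goes into the drop-in; the packaged
# unit file itself stays byte-for-byte unmodified.
cat > "$dropin_dir/10-limits.conf" <<'EOF'
[Service]
LimitNOFILE=65536
EOF

cat "$dropin_dir/10-limits.conf"
```

Because the packaged unit is never edited, later image updates can change it without conflicting with the override.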

Configuration in /usr vs /etc

Some software supports generic configuration in both /usr and /etc - systemd, among others. Because bootc supports derivation (the way OCI containers work), it is supported and encouraged to put configuration files in /usr (instead of /etc) where possible, because then the state is consistently immutable.

One pattern is to replace a configuration file like /etc/postgresql.conf with a symlink to e.g. /usr/postgres/etc/postgresql.conf, although this can run afoul of SELinux labeling.

Secrets

There is a dedicated document for secrets, which is a special case of configuration.

Handling read-only vs writable locations

The high level pattern for bootc systems is summarized again this way:

  • Put read-only data and executables in /usr
  • Put configuration files in /usr (if they're static), or /etc if they need to be machine-local
  • Put "data" (log files, databases, etc.) underneath /var

However, some software installs to /opt/examplepkg or another location outside of /usr, and may include all three types of data underneath its single toplevel directory. For example, it may write log files to /opt/examplepkg/logs. A simple way to handle this is to change the directories that need to be writable to symbolic links to /var:

RUN apt|dnf install examplepkg && \
    mv /opt/examplepkg/logs /var/log/examplepkg && \
    ln -sr /var/log/examplepkg /opt/examplepkg/logs

The Fedora/CentOS bootc puppet example is one instance of this.
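
The move-and-symlink pattern can be tried out safely in a scratch directory before putting it into a container build. This is only a sketch: the scratch root stands in for the image filesystem, examplepkg is the hypothetical package from the text, and the relative link target is written out by hand rather than computed by the tool.

```shell
root=$(mktemp -d)    # stand-in for the image root filesystem
mkdir -p "$root/opt/examplepkg/logs" "$root/var/log"
echo "started" > "$root/opt/examplepkg/logs/app.log"

# Move the directory that must be writable underneath /var ...
mv "$root/opt/examplepkg/logs" "$root/var/log/examplepkg"
# ... and leave a relative symlink behind at the original location.
ln -s ../../var/log/examplepkg "$root/opt/examplepkg/logs"

# The software still finds its files under the old path.
ls -l "$root/opt/examplepkg"
cat "$root/opt/examplepkg/logs/app.log"
```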

Another option is to configure the systemd unit launching the service to do these mounts dynamically via e.g.

BindPaths=/var/log/exampleapp:/opt/exampleapp/logs

Container runtime vs "bootc runtime"

Fundamentally, bootc reuses the OCI image format as a way to transport serialized filesystem trees with included metadata such as a version label, etc.

However, bootc generally ignores the Container configuration section at runtime today.

Container runtimes like podman and docker of course will interpret this metadata when running a bootc container image as a container.

Labels

A key aspect of OCI is the ability to use standardized (or semi-standardized) labels. They are stored and rendered by bootc; especially the org.opencontainers.image.version label.

Example ignored runtime metadata, and recommendations

ENTRYPOINT and CMD (OCI: Entrypoint/Cmd)

Ignored by bootc.

It's recommended for bootc containers to set CMD /sbin/init; but this is not required.

The booted host system will launch from the bootloader, to the kernel+initramfs and real root however it is "physically" configured inside the image. Typically today this is using systemd in both the initramfs and at runtime; but this is up to how you build the image.

ENV (OCI: Env)

Ignored by bootc; to configure the global system environment you can change the systemd configuration. (Though this is generally not a good idea; instead it's usually better to change the environment of individual services)

EXPOSE (OCI: exposedPorts)

Ignored by bootc; it is agnostic to how the system firewall and network function at runtime.

USER (OCI: User)

Ignored by bootc; typically you should configure individual services inside the bootc container to run as unprivileged users instead.

HEALTHCHECK (OCI: no equivalent)

This is currently Docker-specific metadata, and did not make it into the OCI standards. (Note podman healthchecks)

It is important to understand, again, that there is no "outer container runtime" when a bootc container is deployed on a host. The system must perform health checking on itself (or have an external system do it).

Kernel

When run as a container, the Linux kernel binary in /usr/lib/modules/$kver/vmlinuz is ignored. It is only used when a bootc container is deployed to a physical or virtual machine.

Security properties

When run as a container, the container runtime will by default apply various Linux kernel features such as namespacing to isolate the container processes from other system processes.

None of these isolation properties apply when a bootc system is deployed.

SELinux

For more on the intersection of SELinux and current bootc (OSTree container) images, see bootc images - SELinux.

Users and groups

This is one of the more complex topics. Generally speaking, bootc has nothing to do directly with configuring users or groups; it is a generic OS update/configuration mechanism. (There is currently just one small exception in that bootc install has a special case --root-ssh-authorized-keys argument, but it's very much optional).

Generic base images

Commonly OS/distribution base images will be generic, i.e. without any configuration. It is very strongly recommended to avoid hardcoded passwords and ssh keys with publicly-available private keys (as Vagrant does) in generic images.

Injecting SSH keys via systemd credentials

The systemd project has documentation for credentials which can be used in some environments to inject a root password or SSH authorized_keys. For many cases, this is a best practice.

At the time of this writing this relies on SMBIOS, which is mainly configurable in local virtualization environments (qemu).

Injecting users and SSH keys via cloud-init, etc.

Many IaaS and virtualization systems are oriented towards a "metadata server" (see e.g. AWS instance metadata) whose data is commonly processed by software such as cloud-init, Ignition or equivalent.

The base image you're using may include such software, or you can install it in your own derived images.

In this model, SSH configuration is managed outside of the bootable image. See e.g. GCP oslogin for an example of this where operating system identities are linked to the underlying Google accounts.

Adding users and credentials via custom logic (container or unit)

Of course, systems like cloud-init are not privileged; you can inject any logic you want to manage credentials via e.g. a systemd unit (which may launch a container image) that manages things however you prefer. Commonly, this would be a custom network-hosted source. For example, FreeIPA.

Another example in a Kubernetes-oriented infrastructure would be a container image that fetches desired authentication credentials from a CRD hosted in the API server. (To do things like this it's suggested to reuse the kubelet credentials)

Adding users and credentials statically in the container build

Relative to package-oriented systems, a new ability is to inject users and credentials as part of a derived build:

RUN useradd someuser

However, it is important to understand two very important issues with this as it exists today, with the shadow-utils implementation of useradd and the default glibc files backend for the traditional /etc/passwd and /etc/shadow files.

It is common for user/group IDs to be allocated dynamically, and this can result in "drift" (see below).

Further, if /etc/passwd is modified locally (because there is a machine-local user), then any added users injected via useradd will not appear on subsequent updates by default (they will be in /usr/etc/passwd instead - the default image version).

The same applies to "system users" created by packaging tools that invoke useradd (e.g. apt|dnf install httpd) and that do not also install a sysusers.d file. Currently for example, this is the case with the CentOS Stream 9 httpd package. Per below, the general solution to this is to avoid invoking useradd in container builds, and prefer one of the below solutions.

User and group home directories and /var

For systems configured with persistent /home → /var/home, any changes to /var made in the container image after initial installation will not be applied on subsequent updates. If for example you inject /var/home/someuser/.ssh/authorized_keys into a container build, existing systems will not get the updated authorized keys file.

Using DynamicUser=yes for systemd units

For "system" users it's strongly recommended to use systemd DynamicUser=yes where possible.

This is significantly better than the pattern of allocating users/groups at "package install time" (e.g. Fedora package user/group guidelines) because it avoids potential UID/GID drift (see below).

Using systemd-sysusers

See systemd-sysusers. For example in your derived build:

COPY mycustom-user.conf /usr/lib/sysusers.d

A key aspect of how this works is that sysusers will make changes to the traditional /etc/passwd file as necessary on boot. If /etc is persistent, this can avoid uid/gid drift (but in the general case it does mean that uid/gid allocation can depend on how a specific machine was upgraded over time).
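
For illustration, a file like mycustom-user.conf could look like the one staged below. The user name and description are assumptions; the "-" in the ID column leaves the numeric ID to be picked automatically at boot, which is the machine-dependent allocation described above. The script writes into a scratch directory standing in for the image root.

```shell
root=$(mktemp -d)    # stand-in for the image root filesystem
mkdir -p "$root/usr/lib/sysusers.d"

# One "u" line declares a system user (and a matching group).
# "someuser" and the description are hypothetical.
cat > "$root/usr/lib/sysusers.d/mycustom-user.conf" <<'EOF'
# Type  Name      ID  GECOS
u       someuser  -   "Example service account"
EOF

cat "$root/usr/lib/sysusers.d/mycustom-user.conf"
```

Replacing the "-" with a fixed number pins the ID at the cost of having to coordinate the number space yourself.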

Using systemd JSON user records

See JSON user records. Unlike sysusers, the canonical state for these lives in /usr - if a subsequent image drops a user record, then it will also vanish from the system - unlike sysusers.d.

nss-altfiles

The nss-altfiles project (long) predates systemd JSON user records. It aims to help split "system" users into /usr/lib/passwd and /usr/lib/group. It's very important to understand that this aligns with the way the OSTree project handles the "3 way merge" for /etc as it relates to /etc/passwd. Currently, if the /etc/passwd file is modified in any way on the local system, then subsequent changes to /etc/passwd in the container image will not be applied.

Some base images may have nss-altfiles enabled by default; this is currently the case for base images built by rpm-ostree.

Commonly, base images will have some "system" users pre-allocated and managed via this file again to avoid uid/gid drift.

In a derived container build, you can also append users to /usr/lib/passwd for example. (At the time of this writing there is no command line to do so though).

Typically it is preferable to use sysusers.d or DynamicUser=yes.

Machine-local state for users

At this point, it is important to understand the filesystem layout - the default is up to the base image.

The default Linux concept of a user has data stored in both /etc (/etc/passwd, /etc/shadow and groups) and /home. The choice for how these work is up to the base image, but a common default for generic base images is to have both be machine-local persistent state. In this model /home would be a symlink to /var/home.

Injecting users and SSH keys at system provisioning time

For base images where /etc and /var are configured to persist by default, it will then be generally supported to inject users via "installers" such as Anaconda (interactively or via kickstart) or any others.

Typically generic installers such as this are designed for "one time bootstrap" and again then the configuration becomes mutable machine-local state that can be changed "day 2" via some other mechanism.

The simple case is a user with a password - typically the installer helps set the initial password, but to change it there is a different in-system tool (such as passwd or a GUI as part of Cockpit, GNOME/KDE/etc).

It is intended that these flows work equivalently in a bootc-compatible system, to support users directly installing "generic" base images, without requiring changes to the tools above.

Transient home directories

Many operating system deployments will want to minimize persistent, mutable and executable state - and user home directories are exactly that.

It is therefore also valid to default to having e.g. /home be a tmpfs to ensure user data is cleaned up across reboots (and this pairs particularly well with a transient /etc as well).

In order to set up the user's home directory to e.g. inject SSH authorized_keys or other files, a good approach is to use systemd tmpfiles.d snippets:

f~ /home/someuser/.ssh/authorized_keys 600 someuser someuser - <base64 encoded data>

which can be embedded in the image as /usr/lib/tmpfiles.d/someuser-keys.conf.

Or a service embedded in the image can fetch keys from the network and write them; this is the pattern used by cloud-init and afterburn.
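
The tmpfiles.d line shown above can be generated from a public key at build time. The following is a sketch under assumptions: the key is a made-up placeholder, the user is the someuser from the example, and the output goes to a scratch directory standing in for the image root.

```shell
# Hypothetical public key; substitute a real one.
pubkey='ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIexample someuser@example'

# Encode the key (with a trailing newline) as a single base64 line.
encoded=$(printf '%s\n' "$pubkey" | base64 | tr -d '\n')

# Assemble the line in the format used above.
line="f~ /home/someuser/.ssh/authorized_keys 600 someuser someuser - $encoded"

root=$(mktemp -d)    # stand-in for the image root filesystem
mkdir -p "$root/usr/lib/tmpfiles.d"
printf '%s\n' "$line" > "$root/usr/lib/tmpfiles.d/someuser-keys.conf"
cat "$root/usr/lib/tmpfiles.d/someuser-keys.conf"
```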

UID/GID drift

Ultimately the /etc/passwd and similar files are a mapping between names and numeric identifiers. A problem arises when this mapping is dynamic and mixed with "stateless" container image builds.

For example today the CentOS Stream 9 postgresql package allocates a static uid of 26.

This means that

RUN dnf -y install postgresql

will always result in a change to /etc/passwd that allocates uid 26 and data in /var/lib/postgres will always be owned by that UID.

However in contrast, the cockpit project allocates a floating cockpit-ws user.

This means that each container image build (without additional work) may (due to RPM installation ordering or other reasons) result in the uid changing.

This can be a problem if that user maintains persistent state. Such cases are best handled by being converted to use sysusers.d (see Fedora change) - or again even better, using DynamicUser=yes (see above).
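
To make the drift concrete, here is a toy model of floating allocation. It does not call useradd; the allocate_uids function, the starting value 999 and the package name otherpkg are assumptions chosen only to show how installation order alone can change the resulting UID.

```shell
# Toy allocator (not useradd): hands out UIDs counting down from 999
# in the order the "packages" are installed.
allocate_uids() {
    next_uid=999
    for pkg in "$@"; do
        echo "$pkg=$next_uid"
        next_uid=$((next_uid - 1))
    done
}

# Two builds of the same image installing the same packages in a different order.
build_a=$(allocate_uids cockpit-ws otherpkg | grep '^cockpit-ws=')
build_b=$(allocate_uids otherpkg cockpit-ws | grep '^cockpit-ws=')

echo "build A: $build_a"    # build A: cockpit-ws=999
echo "build B: $build_b"    # build B: cockpit-ws=998
```

Any files in /var owned by the UID from build A would be owned by a different (or nonexistent) user after updating to build B.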

Secrets (e.g. container pull secrets)

To have bootc fetch updates from a registry that requires authentication, you must include a pull secret in /etc/ostree/auth.json.
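
As a sketch of what such a pull secret can look like, the script below writes a file in the containers auth.json format (an auths map keyed by registry, with a base64-encoded user:password token). The registry name and credentials are placeholders, and the file is written under a scratch directory rather than the real /etc.

```shell
registry='registry.example.com'    # hypothetical registry
username='puller'                  # hypothetical account
password='s3cret'                  # placeholder, never commit real secrets

# The token is base64("username:password").
token=$(printf '%s:%s' "$username" "$password" | base64 | tr -d '\n')

root=$(mktemp -d)    # stand-in for the image root filesystem
mkdir -p "$root/etc/ostree"
cat > "$root/etc/ostree/auth.json" <<EOF
{
  "auths": {
    "$registry": {
      "auth": "$token"
    }
  }
}
EOF
chmod 600 "$root/etc/ostree/auth.json"
```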

Another common case is to also fetch container images via podman or equivalent. There is a pull request to add /etc/containers/auth.json which would be shared by the two stacks by default.

Regardless, injecting this data is a good example of a generic "secret". The bootc project does not currently include one single opinionated mechanism for secrets.

Embedding in container build

This was mentioned above; you can include secrets in the container image if the registry server is suitably protected.

In some cases, embedding only "bootstrap" secrets into the container image is a viable pattern, especially alongside a mechanism for having a machine authenticate to a cluster. In this pattern, a provisioning tool (whether run as part of the host system or a container image) uses the bootstrap secret to lay down and keep updated other secrets (for example, SSH keys, certificates).

Via cloud metadata

Most production IaaS systems support a "metadata server" or equivalent which can securely host secrets - particularly "bootstrap secrets". Your container image can include tooling such as cloud-init or ignition which fetches these secrets.

Embedded in disk images

Another pattern is to embed bootstrap secrets only in disk images. For example, when generating a cloud disk image (AMI, OpenStack glance image, etc.) from an input container image, the disk image can contain secrets that are effectively machine-local state. Rotating them would require an additional management tool, or refreshing disk images.

Injected via baremetal installers

It is common for installer tools to support injecting configuration which can commonly cover secrets like this.

Injecting secrets via systemd credentials

The systemd project has documentation for credentials which applies in some deployment methodologies.

Management services

When running a fleet of systems, it is common to use a central management service. Commonly, these services provide a client to be installed on each system which connects to the central service. Often, the management service requires the client to perform a one time registration.

The following example shows how to install the client into a bootc image and run it at startup to register the system. This example assumes the management-client handles future connections to the server, e.g. via a cron job or a separate systemd service. This example could be modified to create a persistent systemd service if that is required. The Containerfile is not optimized, in order to more clearly explain each step; e.g. it's generally better to invoke RUN a single time to avoid creating multiple layers in the image.

FROM <bootc base image>

# Typically when using a management service, it will determine when to upgrade the system.
# So, disable bootc-fetch-apply-updates.timer if it is included in the base image.
RUN systemctl disable bootc-fetch-apply-updates.timer

# Install the client from dnf, or some other method that applies for your client
RUN dnf install management-client -y && dnf clean all

# Bake the credentials for the management service into the image
ARG activation_key=

# The existence of .run_next_boot acts as a flag to determine if the
# registration is required to run when booting
RUN touch /etc/management-client/.run_next_boot

COPY <<"EOT" /usr/lib/systemd/system/management-client.service
[Unit]
Description=Run management client at boot
After=network-online.target
ConditionPathExists=/etc/management-client/.run_next_boot

[Service]
Type=oneshot
EnvironmentFile=/etc/management-client/.credentials
ExecStart=/usr/bin/management-client register --activation-key ${CLIENT_ACTIVATION_KEY}
ExecStartPre=/bin/rm -f /etc/management-client/.run_next_boot
ExecStop=/bin/rm -f /etc/management-client/.credentials

[Install]
WantedBy=multi-user.target
EOT

# Link the service to run at startup
RUN ln -s /usr/lib/systemd/system/management-client.service /usr/lib/systemd/system/multi-user.target.wants/management-client.service

# Store the credentials in a file to be used by the systemd service
RUN echo -e "CLIENT_ACTIVATION_KEY=${activation_key}" > /etc/management-client/.credentials

# Set the flag to enable the service to run one time
# The systemd service removes this file when it runs, so registration is attempted only once
RUN touch /etc/management-client/.run_next_boot

Managing upgrades

Right now, bootc is a quite simple tool that is designed to do just a few things well. One of those is transactionally fetching new operating system updates from a registry and booting into them, while supporting rollback.

The bootc upgrade verb

This will query the registry and queue an updated container image for the next boot.

This is backed today by ostree, implementing an A/B style upgrade system. Changes to the base image are staged, and the running system is not changed by default.

Use bootc upgrade --apply to auto-apply if there are queued changes.

There is also an opinionated bootc-fetch-apply-updates.timer and corresponding service available in upstream for operating systems and distributions to enable.

Man page: bootc-upgrade.

Changing the container image source

Another useful pattern to implement can be to use a management agent to invoke bootc switch (or declaratively via bootc edit) to implement e.g. blue/green deployments, where some hosts are rolled onto a new image independently of others.

bootc switch quay.io/examplecorp/os-prod-blue:latest

bootc switch has the same effect as bootc upgrade; there is no semantic difference between the two other than changing the container image being tracked.

This will preserve existing state in /etc and /var - for example, host SSH keys and home directories.

Man page: bootc-switch.

Rollback

There is a bootc rollback verb, and associated declarative interface accessible to tools via bootc edit. This will swap the bootloader ordering to the previous boot entry.

Man page: bootc-rollback.

Mirrored/disconnected upgrades

It is common (a best practice even) to maintain systems which default to being disconnected from the public Internet.

Pulling updates from a local mirror

The bootc project reuses the same container libraries that are in use by podman; this means that configuring containers-registries.conf allows bootc upgrade to fetch from local mirror registries.
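
A mirror entry in the containers-registries.conf format could be staged as sketched below. The upstream prefix, the mirror host name and the use of a registries.conf.d drop-in file are assumptions for illustration; the file is written under a scratch directory standing in for the image root.

```shell
root=$(mktemp -d)    # stand-in for the image root filesystem
mkdir -p "$root/etc/containers/registries.conf.d"

# Pulls for images under the prefix are tried against the mirror first.
# Both registry names are hypothetical.
cat > "$root/etc/containers/registries.conf.d/50-local-mirror.conf" <<'EOF'
[[registry]]
prefix = "quay.io/examplecorp"
location = "quay.io/examplecorp"

[[registry.mirror]]
location = "mirror.internal.example:5000/examplecorp"
EOF

cat "$root/etc/containers/registries.conf.d/50-local-mirror.conf"
```

With a configuration like this, the image reference tracked by the host stays unchanged while the bytes come from the local mirror.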

Performing offline updates via USB

In a usage scenario where the operating system update is in a fully disconnected environment and you want to perform updates via e.g. inserting a USB drive, one can do this by copying the desired OS container image to e.g. an oci directory:

skopeo copy docker://quay.io/exampleos/myos:latest oci:/path/to/filesystem/myos.oci

Then once the USB device containing the myos.oci OCI directory is mounted on the target, use

bootc switch --transport oci /var/mnt/usb/myos.oci

The above command is only necessary once, and thereafter will be idempotent. Then, use bootc upgrade --apply to fetch and apply the update from the USB device.

This process can all be automated by creating systemd units that look for a USB device with a specific label, mount (optionally with LUKS for example), and then trigger the bootc upgrade.

Booting local builds

In some scenarios, you may want to boot a locally built container image, in order to apply a persistent hotfix to a specific server, or as part of a development/testing scenario.

Building a new local image

At the current time, the bootc host container storage is distinct from that of the podman container runtime storage (default configuration in /var/lib/containers).

It is not currently streamlined to export the booted host container storage into the podman storage.

Hence today, to replicate the exact container image the host has booted, take the container image referenced in bootc status and turn it into a podman pull invocation.

Next, craft a container build file with your desired changes:

FROM <image>
RUN apt|dnf upgrade https://example.com/systemd-hotfix.package

Copying an updated image into the bootc storage

This command is straightforward; we just need to tell bootc to fetch updates from containers-storage, which is the local "application" container runtime (podman) storage:

$ bootc switch --transport containers-storage quay.io/fedora/fedora-bootc:40

From there, the new image will be queued for the next boot and a reboot will apply it.

For more on valid transports, see containers-transports.

NAME

bootc - Deploy and transactionally update in place with bootable container images

SYNOPSIS

bootc [-h|--help] [-V|--version] <subcommands>

DESCRIPTION

Deploy and transactionally update in place with bootable container images.

The `bootc` project currently uses ostree-containers as a backend to support a model of bootable container images. Once installed, whether directly via `bootc install` (executed as part of a container) or via another mechanism such as an OS installer tool, further updates can be pulled via `bootc upgrade`.

OPTIONS

-h, --help

: Print help (see a summary with -h)

-V, --version

: Print version

SUBCOMMANDS

bootc-upgrade(8)

: Download and queue an updated container image to apply

bootc-switch(8)

: Target a new container image reference to boot

bootc-rollback(8)

: Change the bootloader entry ordering; the deployment under `rollback` will be queued for the next boot, and the current will become rollback. If there is a `staged` entry (an unapplied, queued upgrade) then it will be discarded

bootc-edit(8)

: Apply full changes to the host specification

bootc-status(8)

: Display status

bootc-usr-overlay(8)

: Adds a transient writable overlayfs on `/usr` that will be discarded on reboot

bootc-install(8)

: Install the running container to a target

bootc-help(8)

: Print this message or the help of the given subcommand(s)

VERSION

v0.1.11

NAME

bootc-status - Display status

SYNOPSIS

bootc status [--json] [--booted] [-h|--help]

DESCRIPTION

Display status

This will output a YAML-formatted object using a schema intended to match a Kubernetes resource that describes the state of the booted system.

The exact API format is not currently declared stable.

OPTIONS

--json

: Output in JSON format

--booted

: Only display status for the booted deployment

-h, --help

: Print help (see a summary with -h)

VERSION

v0.1.11

NAME

bootc-upgrade - Download and queue an updated container image to apply

SYNOPSIS

bootc upgrade [--quiet] [--check] [--apply] [-h|--help]

DESCRIPTION

Download and queue an updated container image to apply.

This does not affect the running system; updates operate in an "A/B" style by default.

A queued update is visible as `staged` in `bootc status`.

Currently by default, the update will be applied at shutdown time via `ostree-finalize-staged.service`. There is also an explicit `bootc upgrade --apply` verb which will automatically take action (rebooting) if the system has changed.

However, in the future this is likely to change such that reboots outside of a `bootc upgrade --apply` do *not* automatically apply the update in addition.

OPTIONS

--quiet

: Don't display progress

--check

: Check if an update is available without applying it.

This only downloads an updated manifest and image configuration (i.e. typically kilobyte-sized metadata) as opposed to the image layers.

--apply

: Restart or reboot into the new target image.

Currently, this option always reboots. In the future this command will detect the case where no kernel changes are queued, and perform a userspace-only restart.

-h, --help

: Print help (see a summary with -h)

VERSION

v0.1.11

NAME

bootc-switch - Target a new container image reference to boot

SYNOPSIS

bootc switch [--quiet] [--transport] [--enforce-container-sigpolicy] [--ostree-remote] [--retain] [-h|--help] <TARGET>

DESCRIPTION

Target a new container image reference to boot.

This is almost exactly the same operation as `upgrade`, but additionally changes the container image reference.

## Usage

A common pattern is to have a management agent control operating system updates via container image tags; for example, `quay.io/exampleos/someuser:v1.0` and `quay.io/exampleos/someuser:v1.1` where some machines are tracking `:v1.0`, and as a rollout progresses, machines can be switched to `:v1.1`.

OPTIONS

--quiet

: Don't display progress

--transport=TRANSPORT [default: registry]

: The transport; e.g. oci, oci-archive. Defaults to `registry`

--enforce-container-sigpolicy

: This is the inverse of the previous `--target-no-signature-verification` (which is now a no-op).

Enabling this option enforces that `/etc/containers/policy.json` includes a default policy which requires signatures.

--ostree-remote=OSTREE_REMOTE

: Enable verification via an ostree remote

--retain

: Retain reference to currently booted image

-h, --help

: Print help (see a summary with -h)

<TARGET>

: Target image to use for the next boot

VERSION

v0.1.11

NAME

bootc-rollback - Change the bootloader entry ordering; the deployment under `rollback` will be queued for the next boot, and the current will become rollback. If there is a `staged` entry (an unapplied, queued upgrade) then it will be discarded

SYNOPSIS

bootc rollback [-h|--help]

DESCRIPTION

Change the bootloader entry ordering; the deployment under `rollback` will be queued for the next boot, and the current will become rollback. If there is a `staged` entry (an unapplied, queued upgrade) then it will be discarded.

Note that absent any additional control logic, if there is an active agent doing automated upgrades (such as the default `bootc-fetch-apply-updates.timer` and associated `.service`) the change here may be reverted. It's recommended to only use this in concert with an agent that is in active control.

A systemd journal message will be logged with `MESSAGE_ID=26f3b1eb24464d12aa5e7b544a6b5468` in order to detect a rollback invocation.
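
One way to look for that message is a field match in `journalctl`. This is a sketch only; the function name `rollback_events` is hypothetical, and any extra arguments are passed through to `journalctl` unchanged.

```shell
# Hypothetical helper: list journal entries logged by rollback invocations.
rollback_events() {
    journalctl --no-pager "$@" MESSAGE_ID=26f3b1eb24464d12aa5e7b544a6b5468
}
```

For example, `rollback_events -b` restricts the search to the current boot.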

OPTIONS

-h, --help

: Print help (see a summary with -h)

VERSION

v0.1.11

NAME

bootc-usr-overlay - Adds a transient writable overlayfs on `/usr` that will be discarded on reboot

SYNOPSIS

bootc usr-overlay [-h|--help]

DESCRIPTION

Adds a transient writable overlayfs on `/usr` that will be discarded on reboot.

## Use cases

A common pattern is wanting to use tracing/debugging tools, such as `strace`, that may not be in the base image. A system package manager such as `apt` or `dnf` can apply changes into this transient overlay that will be discarded on reboot.

## /etc and /var

However, this command has no effect on `/etc` and `/var` - changes written there will persist. It is common for package installations to modify these directories.

## Unmounting

Almost always, a system process will hold a reference to the open mount point. You can however invoke `umount -l /usr` to perform a "lazy unmount".
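
As a sketch of the use case above, the following function creates the overlay and then installs a package into it. The name `debug_with` is hypothetical, and the example assumes a `dnf`-based system; substitute `apt` or another package manager as appropriate.

```shell
# Hypothetical helper: install a debugging tool into a transient /usr overlay.
debug_with() {
    pkg="$1"
    bootc usr-overlay || return 1
    # Files under /usr land in the overlay; writes to /etc and /var persist.
    dnf -y install "$pkg"
}
```

For example, `debug_with strace` makes `strace` available until the next reboot.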

OPTIONS

-h, --help

: Print help (see a summary with -h)

VERSION

v0.1.11

man bootc-fetch-apply-updates.service

This systemd service and associated .timer unit simply invoke bootc upgrade --apply. It is a minimal demonstration of an "upgrade agent".

More information: bootc-upgrade.

The systemd unit is not enabled by default upstream, but it may be enabled in some operating systems.
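
On a system where the timer is not enabled, it could be turned on as in the sketch below. The function name `enable_auto_updates` is hypothetical. Keep in mind that the service runs `bootc upgrade --apply`, so it reboots the machine when an update is found.

```shell
# Hypothetical helper: enable the demonstration upgrade agent.
enable_auto_updates() {
    systemctl enable --now bootc-fetch-apply-updates.timer || return 1
    # Show when the timer will next fire.
    systemctl list-timers bootc-fetch-apply-updates.timer
}
```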

Installing "bootc compatible" images

A key goal of the bootc project is to think of bootable operating systems as container images. Docker/OCI container images are just tarballs wrapped with some JSON. But in order to boot a system (whether on bare metal or virtualized), one needs a few key components:

  • bootloader
  • kernel (and optionally initramfs)
  • root filesystem (xfs/ext4/btrfs etc.)

The bootloader state is managed by the external bootupd project which abstracts over bootloader installs and upgrades. The invocation of bootc install will always run bootupd to handle bootloader installation to the target disk. The default expectation is that bootloader contents and install logic come from the container image in a bootc based system.

The Linux kernel (and optionally initramfs) is embedded in the container image; the canonical location is /usr/lib/modules/$kver/vmlinuz, and the initramfs should be in initramfs.img in that directory.

The bootc install command bridges the two worlds of a standard, runnable OCI image and a bootable system by running tooling logic embedded in the container image to create the filesystem and bootloader setup dynamically. This requires running the container with --privileged; it uses the running Linux kernel on the host, not the kernel inside the container, to write the file content from the running container image.

There are two sub-commands: bootc install to-disk and bootc install to-filesystem.

However, nothing else (external) is required to perform a basic installation to disk - the container image itself comes with a baseline self-sufficient installer that sets things up ready to boot.

Internal vs external installers

The bootc install to-disk process only sets up a very simple filesystem layout, using the default filesystem type defined in the container image, plus hardcoded requisite platform-specific partitions such as the ESP.

In general, the to-disk flow should be considered mainly a "demo" for the bootc install to-filesystem flow, which can be used by "external" installers today. For example, in the Fedora/CentOS bootc project, there are two "external" installers in Anaconda and bootc-image-builder.

More on this below.

Executing bootc install

The two installation commands allow you to install the container image either directly to a block device (bootc install to-disk) or to an existing filesystem (bootc install to-filesystem).

The installation commands MUST be run from the container image that will be installed, using --privileged and a few other options. This means you are (currently) not able to install bootc to an existing system and install your container image. Failure to run bootc from a container image will result in an error.

Here's an example of using bootc install (root/elevated permission required):

podman run --rm --privileged --pid=host -v /var/lib/containers:/var/lib/containers -v /dev:/dev --security-opt label=type:unconfined_t <image> bootc install to-disk /path/to/disk

Note that while --privileged is used, this command will not perform any destructive action on the host system. Among other things, --privileged makes sure that all host devices are mounted into the container. /path/to/disk is the host's block device on which <image> will be installed.

The --pid=host and --security-opt label=type:unconfined_t options currently make it more convenient for bootc to perform some privileged operations; in the future these requirements may be dropped.

The -v /var/lib/containers:/var/lib/containers option is required in order for the container to access its own underlying image, which is used by the installation process.

Jump to the section for install to-filesystem later in this document for additional information about that method.

"day 2" updates, security and fetch configuration

By default the bootc install path will find the pull specification used for the podman run invocation and use it to set up "day 2" OS updates that bootc upgrade will use.

For example, if you invoke podman run --privileged ... quay.io/examplecorp/exampleos:latest bootc install ... then the installed operating system will fetch updates from quay.io/examplecorp/exampleos:latest. This can be overridden via --target-imgref; this is handy in cases like performing installation in a manufacturing environment from a mirrored registry.

By default, the installation process will verify that the container (representing the target OS) can fetch its own updates.

Additionally note that to perform an install with a target image reference set to an authenticated registry, you must provide a pull secret. One path is to embed the pull secret into the image in /etc/ostree/auth.json. Alternatively, the secret can be added after an installation process completes and managed separately; in that case you will need to specify --skip-fetch-check.

Configuring the default root filesystem type

To use the to-disk installation flow, the container should include a root filesystem type. If it does not, then each user will need to specify install to-disk --filesystem.

To set a default filesystem type for bootc install to-disk as part of your OS/distribution base image, create a file named /usr/lib/bootc/install/00-<osname>.toml with the contents of the form:

[install.filesystem.root]
type = "xfs"

Configuration files found in this directory will be merged, with higher alphanumeric values taking precedence. If for example you are building a derived container image from the above OS, you could create a 50-myos.toml that sets type = "btrfs" which will override the prior setting.
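
The precedence rule can be illustrated without invoking bootc. The sketch below writes the two drop-ins into a temporary directory standing in for /usr/lib/bootc/install and shows which file sorts last. The file name 00-exampleos.toml is hypothetical, and using `LC_ALL=C sort` as a stand-in for bootc's own ordering is an assumption.

```shell
# Stand-in for /usr/lib/bootc/install inside an image build.
confdir="$(mktemp -d)"

# Base image default (file name is hypothetical).
cat > "$confdir/00-exampleos.toml" <<'EOF'
[install.filesystem.root]
type = "xfs"
EOF

# Derived image override.
cat > "$confdir/50-myos.toml" <<'EOF'
[install.filesystem.root]
type = "btrfs"
EOF

# Files are merged in sorted order, so the last one listed takes precedence.
winner="$(ls "$confdir" | LC_ALL=C sort | tail -n 1)"
echo "effective root filesystem type comes from: $winner"
```

In a real image build, the same files would be copied into /usr/lib/bootc/install by the container build.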

For other available options, see bootc-install-config.

Installing an "unconfigured" image

The bootc project aims to support generic/general-purpose operating systems and distributions that will ship unconfigured images. An unconfigured image does not have a default password or SSH key, etc.

For more information, see Image building and configuration guidance.

More advanced installation with to-filesystem

The basic bootc install to-disk logic is really a pretty small (but opinionated) wrapper for a set of lower level tools that can also be invoked independently.

The bootc install to-disk command is effectively:

  • mkfs.$fs /dev/disk
  • mount /dev/disk /mnt
  • bootc install to-filesystem --karg=root=UUID=<uuid of /mnt> --imgref $self /mnt

There may be a bit more involved here; for example configuring --block-setup tpm2-luks will configure the root filesystem with LUKS bound to the TPM2 chip, currently via systemd-cryptenroll.

Some OS/distributions may not want to enable it at all; it can be configured off at build time via Cargo features.

Using bootc install to-filesystem

The usual expected way for an external storage system to work is to provide root=<UUID> type kernel arguments. At the current time a separate /boot filesystem is also required (mainly to enable LUKS) so you will also need to provide e.g. --boot-mount-spec UUID=....

The bootc install to-filesystem command allows an operating system or distribution to ship a separate installer that creates more complex block storage or filesystem setups, but reuses the "top half" of the logic. For example, a goal is to change Anaconda to use this.

Using bootc install to-disk --via-loopback

Because every bootc system comes with an opinionated default installation process, you can create a raw disk image (that can e.g. be booted via virtualization) via e.g.:

truncate -s 10G myimage.raw
podman run --rm --privileged --pid=host --security-opt label=type:unconfined_t  -v /var/lib/containers:/var/lib/containers -v .:/output <yourimage> bootc install to-disk --generic-image --via-loopback /output/myimage.raw

Notice that we use --generic-image for this use case.

Set the environment variable BOOTC_DIRECT_IO=on to create the loopback device with direct-io enabled.

Using bootc install to-existing-root

This is a variant of install to-filesystem, which maximizes convenience for using an existing Linux system, converting it into the target container image. Note that the /boot (and /boot/efi) partitions will be reinitialized - so this is a somewhat destructive operation for the existing Linux installation.

Also, because the filesystem is reused, it's required that the target system kernel support the root storage setup already initialized.

The core command should look like this (root/elevated permission required):

podman run --rm --privileged -v /dev:/dev -v /var/lib/containers:/var/lib/containers -v /:/target \
             --pid=host --security-opt label=type:unconfined_t \
             <image> \
             bootc install to-existing-root

It is assumed in this command that the target rootfs is passed via -v /:/target at this time.

As noted above, the data in /boot will be wiped, but everything else in the existing operating system's / is NOT automatically cleaned up. This can be useful, because it allows the new image to automatically import data from the previous host system! For example, container images, database, user home directory data, config files in /etc are all available after the subsequent reboot in /sysroot (which is the "physical root").

A special case of this trick is using the --root-ssh-authorized-keys flag to inherit root's SSH keys (which may have been injected from e.g. cloud instance userdata via a tool like cloud-init). To do this, just add --root-ssh-authorized-keys /target/root/.ssh/authorized_keys to the above.
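
Putting the two pieces together, the core command with the extra flag appended might be wrapped as follows. The function name `install_to_existing_root` is hypothetical; the image reference is passed as the first argument.

```shell
# Hypothetical wrapper around the command above; pass the image as $1.
install_to_existing_root() {
    image="$1"
    podman run --rm --privileged -v /dev:/dev \
        -v /var/lib/containers:/var/lib/containers -v /:/target \
        --pid=host --security-opt label=type:unconfined_t \
        "$image" \
        bootc install to-existing-root \
        --root-ssh-authorized-keys /target/root/.ssh/authorized_keys
}
```

Because / is mounted at /target inside the container, the path /target/root/.ssh/authorized_keys refers to the host's existing root account keys.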

Using bootc install to-filesystem --source-imgref <imgref>

By default, bootc install has to be run inside a podman container. With this assumption, it can escape the container, find the source container image (including its layers) in podman's container storage and use it to create the image.

When --source-imgref <imgref> is given, bootc no longer assumes that it runs inside podman. Instead, the given container image reference (see containers-transports(5) for accepted formats) is used to fetch the image. Note that bootc install still has to be run inside a chroot created from the container image. However, this allows users to use a different sandboxing tool (e.g. bubblewrap).

This argument is mainly useful for 3rd-party tooling for building disk images from bootable containers (e.g. based on osbuild).

Configuring machine-local state

Per the filesystem section, /etc and /var are machine-local state by default. If you want to inject additional content after the installation process, at the current time this can be done by manually finding the target "deployment root" which will be underneath /ostree/deploy/<stateroot>/deploy/.

Installation software such as Anaconda does this today to implement generic %post scripts and the like.

However, it is very likely that a generic bootc API to do this will be added.
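
Until such an API exists, the manual lookup could be sketched as below. The function name `deployment_roots` is hypothetical. The sketch assumes that each deployment root is a directory and that ostree keeps non-directory files (such as `.origin` files) alongside it, which are skipped.

```shell
# Hypothetical helper: print deployment root directories under a physical root.
# $1 is where the installed filesystem is mounted (e.g. /mnt); defaults to /.
deployment_roots() {
    sysroot="${1:-/}"
    for d in "${sysroot%/}"/ostree/deploy/*/deploy/*; do
        # Keep directories only; an unmatched glob is also filtered out here.
        [ -d "$d" ] && printf '%s\n' "$d"
    done
    return 0
}
```

For example, after an installation to a filesystem mounted at /mnt, `deployment_roots /mnt` prints the directory whose /etc can be modified before first boot.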

NAME

bootc-install - Install the running container to a target

SYNOPSIS

bootc install [-h|--help] <subcommands>

DESCRIPTION

Install the running container to a target.

## Understanding installations

OCI containers are effectively layers of tarballs with JSON for metadata; they cannot be booted directly. The `bootc install` flow is a highly opinionated method to take the contents of the container image and install it to a target block device (or an existing filesystem) in such a way that it can be booted.

For example, a Linux partition table and filesystem is used, and the bootloader and kernel embedded in the container image are also prepared.

A bootc installed container currently uses OSTree as a backend, and this sets it up such that a subsequent `bootc upgrade` can perform in-place updates.

An installation is not simply a copy of the container filesystem, but includes other setup and metadata.

OPTIONS

-h, --help

: Print help (see a summary with -h)

SUBCOMMANDS

bootc-install-to-disk(8)

: Install to the target block device

bootc-install-to-filesystem(8)

: Install to the target filesystem

bootc-install-to-existing-root(8)

: Perform an installation to the host root filesystem

bootc-install-print-configuration(8)

: Output JSON to stdout that contains the merged installation configuration as it may be relevant to calling processes using `install to-filesystem` that want to honor e.g. `root-fs-type`

bootc-install-help(8)

: Print this message or the help of the given subcommand(s)

VERSION

v0.1.11

% bootc-install-config(5)

NAME

bootc-install-config.toml

DESCRIPTION

The bootc install process supports some basic customization. This configuration file is in TOML format, and will be discovered by the installation process via "drop-in" files in /usr/lib/bootc/install that are processed in alphanumerical order.

The individual files are merged into a single final installation config, so it is supported for e.g. a container base image to provide a default root filesystem type, that can be overridden in a derived container image.

install

This is the only defined toplevel table.

The install section supports three subfields:

  • block: An array of supported to-disk backends enabled by this base container image; if not specified, this will just be direct. The only other supported value is tpm2-luks. The first value specified will be the default. To enable both, use block = ["direct", "tpm2-luks"].
  • filesystem: See below.
  • kargs: An array of strings; this will be appended to the set of kernel arguments.

filesystem

There is one valid field:

  • root: An instance of "filesystem-root"; see below

filesystem-root

There is one valid field:

  • type: This can be any basic Linux filesystem with a mkfs.$fstype. For example, ext4, xfs, etc.

Examples

[install.filesystem.root]
type = "xfs"
[install]
kargs = ["nosmt", "console=tty0"]

SEE ALSO

bootc(1)

NAME

bootc-install-to-disk - Install to the target block device

SYNOPSIS

bootc install to-disk [--wipe] [--block-setup] [--filesystem] [--root-size] [--source-imgref] [--target-transport] [--target-imgref] [--enforce-container-sigpolicy] [--target-ostree-remote] [--skip-fetch-check] [--disable-selinux] [--karg] [--root-ssh-authorized-keys] [--generic-image] [--via-loopback] [-h|--help] <DEVICE>

DESCRIPTION

Install to the target block device

OPTIONS

--wipe

: Automatically wipe all existing data on device

--block-setup=BLOCK_SETUP

: Target root block device setup.

Possible values:

  • direct: Filesystem written directly to block device

  • tpm2-luks: Bind unlock of filesystem to presence of the default tpm2 device

--filesystem=FILESYSTEM

: Target root filesystem type

[possible values: xfs, ext4, btrfs]

--root-size=ROOT_SIZE

: Size of the root partition (default specifier: M). Allowed specifiers: M (mebibytes), G (gibibytes), T (tebibytes).

By default, all remaining space on the disk will be used.

--source-imgref=SOURCE_IMGREF

: Install the system from an explicitly given source.

By default, bootc install and install-to-filesystem assume that they run in a podman container, and take the container image to install from podman's container storage. If --source-imgref is given, bootc uses it as the installation source instead. See skopeo(1) for accepted formats.

--target-transport=TARGET_TRANSPORT [default: registry]

: The transport; e.g. oci, oci-archive. Defaults to `registry`

--target-imgref=TARGET_IMGREF

: Specify the image to fetch for subsequent updates

--enforce-container-sigpolicy

: This is the inverse of the previous `--target-no-signature-verification` (which is now a no-op). Enabling this option enforces that `/etc/containers/policy.json` includes a default policy which requires signatures

--target-ostree-remote=TARGET_OSTREE_REMOTE

: Enable verification via an ostree remote

--skip-fetch-check

: By default, the accessibility of the target image will be verified (just the manifest will be fetched). Specifying this option suppresses the check; use this when you know the issues it might find are addressed.

A common reason this may fail is when one is using an image which requires registry authentication, but not embedding the pull secret in the image so that updates can be fetched by the installed OS "day 2".

--disable-selinux

: Disable SELinux in the target (installed) system.

This is currently necessary to install *from* a system with SELinux disabled but where the target does have SELinux enabled.

--karg=KARG

: Add a kernel argument. This option can be provided multiple times.

Example: --karg=nosmt --karg=console=ttyS0,115200n8

--root-ssh-authorized-keys=ROOT_SSH_AUTHORIZED_KEYS

: The path to an `authorized_keys` that will be injected into the `root` account.

The implementation of this uses systemd `tmpfiles.d`, writing to a file named `/etc/tmpfiles.d/bootc-root-ssh.conf`. This will have the effect that by default, the SSH credentials will be set if not present. The intention behind this is to allow mounting the whole `/root` home directory as a `tmpfs`, while still getting the SSH key replaced on boot.

--generic-image

: Perform configuration changes suitable for a "generic" disk image. At the moment:

- All bootloader types will be installed
- Changes to the system firmware will be skipped

--via-loopback

: Instead of targeting a block device, write to a file via loopback

-h, --help

: Print help (see a summary with -h)

<DEVICE>

: Target block device for installation. The entire device will be wiped

VERSION

v0.1.11

NAME

bootc-install-to-filesystem - Install to the target filesystem

SYNOPSIS

bootc install to-filesystem [--root-mount-spec] [--boot-mount-spec] [--replace] [--acknowledge-destructive] [--skip-finalize] [--source-imgref] [--target-transport] [--target-imgref] [--enforce-container-sigpolicy] [--target-ostree-remote] [--skip-fetch-check] [--disable-selinux] [--karg] [--root-ssh-authorized-keys] [--generic-image] [-h|--help] <ROOT_PATH>

DESCRIPTION

Install to the target filesystem

OPTIONS

--root-mount-spec=ROOT_MOUNT_SPEC

: Source device specification for the root filesystem. For example, UUID=2e9f4241-229b-4202-8429-62d2302382e1

--boot-mount-spec=BOOT_MOUNT_SPEC

: Mount specification for the /boot filesystem.

At the current time, a separate /boot is required. This restriction will be lifted in future versions. If not specified, the filesystem UUID will be used.

--replace=REPLACE

: Initialize the system in-place; at the moment, only one mode for this is implemented. In the future, it may also be supported to set up an explicit "dual boot" system

Possible values:

  • wipe: Completely wipe the contents of the target filesystem. This cannot be done if the target filesystem is the one the system is booted from

  • alongside: This is a destructive operation in the sense that the bootloader state will have its contents wiped and replaced. However, the running system (and all files) will remain in place until reboot

--acknowledge-destructive

: If the target is the running system's root filesystem, this will skip any warnings

--skip-finalize

: The default mode is to "finalize" the target filesystem by invoking `fstrim` and similar operations, and finally mounting it readonly. This option skips those operations. It is then the responsibility of the invoking code to perform those operations

--source-imgref=SOURCE_IMGREF

: Install the system from an explicitly given source.

By default, bootc install and install-to-filesystem assume that they run in a podman container, and take the container image to install from podman's container storage. If --source-imgref is given, bootc uses it as the installation source instead. See skopeo(1) for accepted formats.

--target-transport=TARGET_TRANSPORT [default: registry]

: The transport; e.g. oci, oci-archive. Defaults to `registry`

--target-imgref=TARGET_IMGREF

: Specify the image to fetch for subsequent updates

--enforce-container-sigpolicy

: This is the inverse of the previous `--target-no-signature-verification` (which is now a no-op). Enabling this option enforces that `/etc/containers/policy.json` includes a default policy which requires signatures

--target-ostree-remote=TARGET_OSTREE_REMOTE

: Enable verification via an ostree remote

--skip-fetch-check

: By default, the accessibility of the target image will be verified (just the manifest will be fetched). Specifying this option suppresses the check; use this when you know the issues it might find are addressed.

A common reason this may fail is when one is using an image which requires registry authentication, but not embedding the pull secret in the image so that updates can be fetched by the installed OS "day 2".

--disable-selinux

: Disable SELinux in the target (installed) system.

This is currently necessary to install *from* a system with SELinux disabled but where the target does have SELinux enabled.

--karg=KARG

: Add a kernel argument. This option can be provided multiple times.

Example: --karg=nosmt --karg=console=ttyS0,115200n8

--root-ssh-authorized-keys=ROOT_SSH_AUTHORIZED_KEYS

: The path to an `authorized_keys` that will be injected into the `root` account.

The implementation of this uses systemd `tmpfiles.d`, writing to a file named `/etc/tmpfiles.d/bootc-root-ssh.conf`. This will have the effect that by default, the SSH credentials will be set if not present. The intention behind this is to allow mounting the whole `/root` home directory as a `tmpfs`, while still getting the SSH key replaced on boot.

--generic-image

: Perform configuration changes suitable for a "generic" disk image. At the moment:

- All bootloader types will be installed
- Changes to the system firmware will be skipped

-h, --help

: Print help (see a summary with -h)

<ROOT_PATH>

: Path to the mounted root filesystem.

By default, the filesystem UUID will be discovered and used for mounting. To override this, use `--root-mount-spec`.

VERSION

v0.1.11

NAME

bootc-install-to-existing-root - Perform an installation to the host root filesystem

SYNOPSIS

bootc install to-existing-root [--replace] [--source-imgref] [--target-transport] [--target-imgref] [--enforce-container-sigpolicy] [--target-ostree-remote] [--skip-fetch-check] [--disable-selinux] [--karg] [--root-ssh-authorized-keys] [--generic-image] [--acknowledge-destructive] [-h|--help] [ROOT_PATH]

DESCRIPTION

Perform an installation to the host root filesystem

OPTIONS

--replace=REPLACE [default: alongside]

: Configure how existing data is treated

Possible values:

  • wipe: Completely wipe the contents of the target filesystem. This cannot be done if the target filesystem is the one the system is booted from

  • alongside: This is a destructive operation in the sense that the bootloader state will have its contents wiped and replaced. However, the running system (and all files) will remain in place until reboot

--source-imgref=SOURCE_IMGREF

: Install the system from an explicitly given source.

By default, bootc install and install-to-filesystem assume that they run in a podman container, and take the container image to install from podman's container storage. If --source-imgref is given, bootc uses it as the installation source instead. See skopeo(1) for accepted formats.

--target-transport=TARGET_TRANSPORT [default: registry]

: The transport; e.g. oci, oci-archive. Defaults to `registry`

--target-imgref=TARGET_IMGREF

: Specify the image to fetch for subsequent updates

--enforce-container-sigpolicy

: This is the inverse of the previous `--target-no-signature-verification` (which is now a no-op). Enabling this option enforces that `/etc/containers/policy.json` includes a default policy which requires signatures

--target-ostree-remote=TARGET_OSTREE_REMOTE

: Enable verification via an ostree remote

--skip-fetch-check

: By default, the accessibility of the target image will be verified (just the manifest will be fetched). Specifying this option suppresses the check; use this when you know the issues it might find are addressed.

A common reason this may fail is when one is using an image which requires registry authentication, but not embedding the pull secret in the image so that updates can be fetched by the installed OS "day 2".

--disable-selinux

: Disable SELinux in the target (installed) system.

This is currently necessary to install *from* a system with SELinux disabled but where the target does have SELinux enabled.

--karg=KARG

: Add a kernel argument. This option can be provided multiple times.

Example: --karg=nosmt --karg=console=ttyS0,115200n8

--root-ssh-authorized-keys=ROOT_SSH_AUTHORIZED_KEYS

: The path to an `authorized_keys` that will be injected into the `root` account.

The implementation of this uses systemd `tmpfiles.d`, writing to a file named `/etc/tmpfiles.d/bootc-root-ssh.conf`. This will have the effect that by default, the SSH credentials will be set if not present. The intention behind this is to allow mounting the whole `/root` home directory as a `tmpfs`, while still getting the SSH key replaced on boot.

--generic-image

: Perform configuration changes suitable for a "generic" disk image. At the moment:

- All bootloader types will be installed
- Changes to the system firmware will be skipped

--acknowledge-destructive

: Accept that this is a destructive action and skip a warning timer

-h, --help

: Print help (see a summary with -h)

[ROOT_PATH] [default: /target]

: Path to the mounted root; it's expected to invoke podman with `-v /:/target`, in which case supplying this argument is unnecessary

VERSION

v0.1.11

"bootc compatible" images

At the current time, it does not work to just do:

FROM fedora
RUN dnf -y install kernel

or

FROM debian
RUN apt install linux

And get an image compatible with bootc. Supporting any base image is an eventual goal; however, there are a few reasons why this doesn't yet work. The biggest reason is SELinux labeling support; the underlying ostree stack currently handles this and requires that the "base image" have a pre-computed set of labels that can be used for any derived layers.

Building bootc compatible base images

As a corollary to base-image limitations, the build process for generating base images currently requires running through ostree tooling to generate an "ostree commit" which has some special formatting in the base image.

The two most common ways to do this are to either:

  1. compose a compatible OCI image directly via rpm-ostree compose image
  2. encapsulate an ostree commit using rpm-ostree compose container-encapsulate

The first method is most direct, as it streamlines the process of creating a base image and writing to a registry. The second method may be preferable if you already have a build process that produces ostree commits as an output (e.g. using osbuild to produce ostree commit artifacts.)

The requirement for both methods is that your initial treefile/manifest MUST include the bootc package in the list of packages included in your compose.

However, the ostree usage is an implementation detail and the requirement on this will be lifted in the future.

Standard metadata for bootc compatible images

It is strongly recommended to do:

LABEL containers.bootc 1

This will signal that this image is intended to be usable with bootc.
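
To check whether a locally available image carries this label, something like the following could be used. This is a sketch; the function name `bootc_label` is hypothetical, and it relies on podman's Go-template `--format` option for `image inspect`.

```shell
# Hypothetical helper: print the containers.bootc label of a local image.
bootc_label() {
    podman image inspect \
        --format '{{ index .Config.Labels "containers.bootc" }}' "$1"
}
```

The `index` function is needed because the label name contains a dot.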

Deriving from existing base images

It's important to emphasize that from one of these specially-formatted base images, every tool and technique for container building applies! In other words it will Just Work to do

FROM <bootc base image>
RUN dnf -y install foo && dnf clean all

You can then use podman build, buildah, docker build, or any other container build tool to produce your customized image. The only requirement is that the container build tool supports producing OCI container images.

The ostree container commit command

You may find some references to this; it is no longer very useful and is not recommended.

The bootloader setup

At the current time bootc relies on the bootupd project which handles bootloader installs and upgrades. The invocation of bootc install will always run bootupd to perform installations. Additionally, bootc upgrade will currently not upgrade the bootloader; you must invoke bootupctl update.

SELinux

Container runtimes such as podman and docker commonly apply a "coarse" SELinux policy to running containers. See container-selinux. It is very important to understand that non-bootc base images do not (usually) have any embedded security.selinux metadata at all; all labels on the toplevel container image are dynamically generated per container invocation, and there are no individually distinct e.g. etc_t and usr_t types.

In contrast, with the current OSTree backend for bootc, when the base image is built, label metadata is included in special metadata files in /sysroot/ostree that correspond to components of the base image.

When a bootc container is deployed, the system will use these default SELinux labels. Further non-OSTree layers will be dynamically labeled using the base policy.

Hence, at the current time it will not work to override the labels for files in derived layers by using e.g.

RUN semanage fcontext -a -t httpd_sys_content_t "/web(/.*)?"

(This command will write to /etc/selinux/$policy/)

It will never work to do e.g.:

RUN chcon -t foo_t /usr/bin/foo

This is because the container runtime state will deny the attempt to "physically" set the security.selinux extended attribute. In contrast, per above, future support for custom labeling will by default be done by customizing the policy file_contexts.

Toplevel directories

In particular, a common problem is that inside a container image it's easy to create arbitrary toplevel directories such as /app or /aimodel. But in some SELinux policies, such as those of Fedora derivatives, these will be labeled as default_t, which few domains can access.

Filesystem

As noted in other chapters, the bootc project inherits a lot of code from the ostree project.

However, bootc intends to be a "fresh, new container-native interface".

First, it is strongly recommended that bootc consumers use the ostree composefs backend; to do this, ensure that you have a /usr/lib/ostree/prepare-root.conf that contains at least

[composefs]
enabled = true

This will ensure that the entire / is a read-only filesystem.
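As a rough way to check this setting while building an image, the following Python sketch parses the text of a prepare-root.conf and reports whether composefs is enabled. It is an illustration only: the function name composefs_enabled is hypothetical, Python's configparser merely approximates the key-file parser that ostree itself uses, and only the literal value true shown above is recognized.

```python
import configparser


def composefs_enabled(conf_text: str) -> bool:
    """Return True if "[composefs] enabled = true" appears in the given text.

    Assumption: only the literal value "true" (case-insensitive) counts;
    any other value is reported as not enabled by this sketch.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # fallback="" covers both a missing section and a missing key.
    value = parser.get("composefs", "enabled", fallback="")
    return value.strip().lower() == "true"


if __name__ == "__main__":
    sample = "[composefs]\nenabled = true\n"
    print(composefs_enabled(sample))
```

A check like this could run as a late step in a container build to catch a missing or mistyped configuration file before the image is deployed.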

Understanding container build/runtime vs deployment

When run as a container (e.g. as part of a container build), the filesystem is fully mutable in order to allow derivation to work. For more on container builds, see build guidance.

The rest of this document describes the state of the system when "deployed" to a physical or virtual machine, and managed by bootc.

Understanding physical vs logical root with /sysroot

When the system is fully booted, it is running in the equivalent of a chroot. The "physical" host root filesystem will be mounted at /sysroot. For more on this, see filesystem: sysroot.

This chroot filesystem is called a "deployment root". All the remaining filesystem paths below are part of a deployment root which is used as a final target for the system boot. The target deployment is determined via the ostree= kernel commandline argument.
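To make the role of the ostree= argument concrete, here is a minimal Python sketch that extracts its value from a kernel command line such as the contents of /proc/cmdline. The function name ostree_karg and the sample path are illustrative assumptions, not part of bootc or ostree.

```python
from typing import Optional


def ostree_karg(cmdline: str) -> Optional[str]:
    """Return the value of the ostree= argument, or None if it is absent."""
    prefix = "ostree="
    for arg in cmdline.split():
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


if __name__ == "__main__":
    # Hypothetical command line; the deployment path is a placeholder.
    sample = "root=UUID=abc rw ostree=/ostree/boot.1/default/abc123/0 quiet"
    print(ostree_karg(sample))
```

On a deployed system one would pass the text read from /proc/cmdline; a result of None suggests the system was not booted into an ostree deployment.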

/usr

The overall recommendation is to keep all operating system content in /usr, with directories such as /bin being symbolic links to /usr/bin, etc. See UsrMove for example.

However, with composefs enabled /usr is not different from /; they are part of the same immutable image. So there is not a fundamental need to do a full "UsrMove" with a bootc system.

/usr/local

The OSTree upstream recommendation suggests making /usr/local a symbolic link to /var/usrlocal. But because the emphasis of a bootc-oriented system is on users deriving custom container images as the default entrypoint, it is recommended here that base images configure /usr/local to be a regular directory (i.e. the default).

Projects that want to produce "final" images that are themselves not intended to be derived from in general can enable that symbolic link in derived builds.

/etc

The /etc directory contains mutable persistent state by default; however, it is supported to enable the etc.transient config option.

When in persistent mode, it inherits the OSTree semantics of performing a 3-way merge across upgrades. In a nutshell:

  • The new default /etc is used as a base
  • The diff between current and previous /etc is applied to the new /etc
  • Locally modified files in /etc different from the default /usr/etc (of the same deployment) will be retained
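The merge described by these bullets can be modeled with plain dictionaries that map a path to its file content. This is a simplified sketch under stated assumptions, not the ostree implementation: the function name merge_etc is hypothetical, and permissions, symlinks and directories are ignored.

```python
from typing import Dict


def merge_etc(
    old_default: Dict[str, str],
    current: Dict[str, str],
    new_default: Dict[str, str],
) -> Dict[str, str]:
    """Model a 3-way merge of /etc across an upgrade.

    old_default: the default /usr/etc of the currently booted deployment
    current:     the live /etc, including local changes
    new_default: the default /etc shipped by the new deployment
    """
    # Start from the new defaults.
    merged = dict(new_default)
    # Re-apply files that were locally added or modified.
    for path, content in current.items():
        if old_default.get(path) != content:
            merged[path] = content
    # Re-apply local deletions of files that the old default shipped.
    for path in old_default:
        if path not in current:
            merged.pop(path, None)
    return merged


if __name__ == "__main__":
    old = {"a.conf": "v1", "b.conf": "v1"}
    live = {"a.conf": "v1", "b.conf": "local edit"}
    new = {"a.conf": "v2", "b.conf": "v2"}
    print(merge_etc(old, live, new))
```

In this model an untouched file picks up the new default, while a locally edited file keeps the local version across the upgrade.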

The implementation of this defaults to being executed by ostree-finalize-staged.service at shutdown time, before the new bootloader entry is created.

The rationale for this design is that in practice today, many components of a Linux system end up shipping default configuration files in /etc. And even if the default package doesn't, often the software only looks for config files there by default.

Some other image-based update systems do not have distinct "versions" of /etc, and it may be populated only at install time and left untouched thereafter. But that creates "hysteresis" where the state of the system's /etc is strongly influenced by the initial image version. This can lead to problems where e.g. a change to /etc/sudoers.conf (to give one simple example) would require external intervention to apply.

For more on configuration file best practices, see Building.

/var

Content in /var persists by default; it is however supported to make it or subdirectories mount points (whether network or tmpfs). There is exactly one /var. If it is not a distinct partition, then "physically" currently it is a bind mount into /ostree/deploy/$stateroot/var and shared across "deployments" (bootloader entries).

As of OSTree v2024.3, by default content in /var acts like a Docker VOLUME /var.

This means that the content from the container image is copied at initial installation time, and not updated thereafter.
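The copy-once behavior can be contrasted with the /etc merge using a small Python model. It is only a sketch of the semantics described here: the function name deploy_var is hypothetical, and directory trees are reduced to dictionaries mapping a path to its content.

```python
from typing import Dict, Optional


def deploy_var(
    existing_var: Optional[Dict[str, str]],
    image_var: Dict[str, str],
) -> Dict[str, str]:
    """Model how /var is treated when a deployment is created.

    existing_var is None on initial installation; otherwise it is the
    /var content already present on the machine.
    """
    if existing_var is None:
        # Initial installation: seed /var from the container image.
        return dict(image_var)
    # Later upgrades: /var is left alone, whatever the new image ships.
    return existing_var


if __name__ == "__main__":
    first = deploy_var(None, {"lib/app/seed": "v1"})
    later = deploy_var(first, {"lib/app/seed": "v2", "lib/app/extra": "v2"})
    print(later)
```

In the example, the second image ships a changed file and a new file under /var, but neither reaches the already installed system.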

Note this is very different from the handling of /etc. The rationale for this is that /etc holds relatively small configuration files, and the expected configuration files are often bound to the operating system binaries in /usr.

But /var has arbitrarily large data (system logs, databases, etc.). It would also not be expected to be rolled back if the operating system state is rolled back. A simple example is that an apt|dnf downgrade postgresql should not affect the physical database in general in /var/lib/postgres. Similarly, a bootc update or rollback should not affect this application data.

Having /var separate also makes it work cleanly to "stage" new operating system updates before applying them (they're downloaded and ready, but only take effect on reboot).

In general, this is the same rationale for Docker VOLUME: decouple the application code from its data.

A common case is for applications to want some directory structure (e.g. /var/lib/postgresql) to be pre-created. It's recommended to use systemd tmpfiles.d for this. An even better approach where applicable is StateDirectory= in units.

Other directories

It is not supported to ship content in /run or /proc or other API Filesystems in container images.

Besides those, other toplevel directories such as /usr and /opt will be lifecycled with the container image.

/opt

In the default suggested model of using composefs (per above) the /opt directory will be read-only, alongside other toplevels such as /usr.

Some software (especially "3rd party" deb/rpm packages) expects to be able to write to a subdirectory of /opt such as /opt/examplepkg.

See building images for recommendations on how to build container images and adjust the filesystem for cases like this.

Enabling transient root

However, some use cases may find it easier to enable a fully transient writable rootfs by default. To do this, set the

[root]
transient = true

option in prepare-root.conf. In particular this will allow software to write (transiently) to /opt, with symlinks to /var for content that should persist.

Filesystem: Physical /sysroot

The bootc project uses ostree as a backend, and maps fetched container images to a deployment.

stateroot

The underlying ostree CLI and API tooling expose a concept of stateroot, which is not yet exposed via bootc. The stateroot used by bootc install is just named default.

The stateroot concept allows having fully separate parallel operating system installations with fully separate /etc and /var, while still sharing an underlying root filesystem.

In the future, this functionality will be exposed and used by bootc.

/sysroot mount

When booted, the physical root will be available at /sysroot as a read-only mount point. This is a key aspect of how bootc upgrade operates: it fetches the updated container image and writes new files to /sysroot/ostree.

Beyond that and debugging/introspection, there are few use cases for tooling to operate on the physical root.

Expanding the root filesystem

One notable use case that does need to operate on /sysroot is expanding the root filesystem.

Some higher level tools such as e.g. cloud-init may (reasonably) expect the / mount point to be the physical root. Tools like this will need to be adjusted to instead detect this and operate on /sysroot.

Growing the block device

Fundamentally bootc is agnostic to the underlying block device setup. How to grow the root block device depends on the underlying storage stack, from basic partitions to LVM. However, a common tool is the growpart utility from cloud-init.

Growing the filesystem

The systemd project ships a systemd-growfs tool and corresponding systemd-growfs@ services. This is a relatively thin abstraction over detecting the target root filesystem type and running the underlying tool such as xfs_growfs.

At the current time, most Linux filesystems require the target to be mounted writable in order to grow. Hence, an invocation of systemd-growfs /sysroot or xfs_growfs /sysroot will need to be further wrapped in a temporary mount namespace.

Using a MountFlags=slave drop-in stanza for systemd-growfs@sysroot.service is recommended, along with an ExecStartPre=mount -o remount,rw /sysroot.

Detecting bootc/ostree systems

For tools like cloud-init that want to operate generically, conditionally detecting this situation can be done via e.g.:

  • Checking for / being an overlay mount point
  • Checking for /sysroot/ostree
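The two checks above could be combined along these lines. This is a hedged Python sketch, not code from cloud-init or bootc: the function names are hypothetical, and the mount table is passed in as text (for example the contents of /proc/self/mounts) so the logic can be exercised without a real ostree host.

```python
import os
from typing import Optional


def root_fstype(mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount at "/" from a mount table."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: source, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[1] == "/":
            fstype = fields[2]  # a later entry shadows an earlier one
    return fstype


def looks_like_ostree_host(mounts_text: str, sysroot: str = "/sysroot") -> bool:
    """Apply both heuristics: an overlay root, or an ostree directory in sysroot."""
    has_overlay_root = root_fstype(mounts_text) == "overlay"
    has_ostree_dir = os.path.isdir(os.path.join(sysroot, "ostree"))
    return has_overlay_root or has_ostree_dir


if __name__ == "__main__":
    with open("/proc/self/mounts") as f:
        print(looks_like_ostree_host(f.read()))
```

Note that an overlay root is also typical inside application containers, so a stricter caller may prefer to require the /sysroot/ostree check rather than accept either signal.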

Container storage

The bootc project uses ostree and specifically the ostree-rs-ext Rust library which handles storage of container images on top of an ostree-based system.

Architecture

flowchart TD
    bootc --- ostree-rs-ext --- ostree-rs --- ostree
    ostree-rs-ext --- containers-image-proxy-rs --- skopeo --- containers/image

There were two high level goals that drove the design of the current system architecture:

  • Support seamless in-place migrations from existing ostree systems
  • Avoid requiring deep changes to the podman stack

A simple way to explain the current architecture is that podman uses two Go libraries:

  • containers/image
  • containers/storage

Whereas ostree uses a custom container storage, not containers/storage.

Mapping container images to ostree

OCI images are effectively just a standardized format of tarballs wrapped with JSON - specifically "layers" of tarballs.
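To illustrate the "tarballs wrapped with JSON" description, the following Python sketch lists the layers named in an OCI image manifest. The function name list_layers is hypothetical and the digests in the sample are shortened placeholders; the layers, digest and size fields come from the OCI image manifest format.

```python
import json
from typing import List, Tuple


def list_layers(manifest_json: str) -> List[Tuple[str, int]]:
    """Return (digest, size) for each layer in an OCI image manifest."""
    manifest = json.loads(manifest_json)
    return [(layer["digest"], layer["size"]) for layer in manifest.get("layers", [])]


if __name__ == "__main__":
    # Placeholder manifest; real digests are full sha256 hex strings.
    sample = json.dumps({
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": {
            "mediaType": "application/vnd.oci.image.config.v1+json",
            "digest": "sha256:cfg",
            "size": 10,
        },
        "layers": [
            {
                "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
                "digest": "sha256:aaa",
                "size": 100,
            },
            {
                "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
                "digest": "sha256:bbb",
                "size": 200,
            },
        ],
    })
    for digest, size in list_layers(sample):
        print(digest, size)
```

Each entry in the list corresponds to one tarball layer, identified by the digest of its blob.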

The ostree-rs-ext project maps layers to OSTree commits. Each layer is stored separately, under an ostree "ref" (like a git branch) under the ostree/container/ namespace:

$ ostree refs ostree/container

Layers

The ostree/container/blob namespace tracks storage of a container layer identified by its blob ID (sha256 digest).

Images

At the current time, ostree always boots into a "flattened" filesystem tree. This is generated as both a hardlinked checkout as well as a composefs image.

The flattened tree is constructed and committed into the ostree/container/image namespace. The commit metadata also includes the OCI manifest and config objects.

This is implemented in the ostree-rs-ext/container module.

SELinux labeling

A major wrinkle is supporting SELinux labeling. The labeling configuration is defined as regular expressions included in /etc/selinux/$policy/contexts/.

The current implementation relies on the fact that SELinux labels for base images were pre-computed. The first step is to check out the "ostree base" layers for the base image.

All derived layers have labels computed from the base image policy. This causes a known bug where derived layers can't include custom policy: https://github.com/ostreedev/ostree-rs-ext/issues/510

Origin files

ostree has the concept of an origin file which defines the source of truth for upgrades. The container image reference for each deployment is included in its origin.

Booting

A core aspect of this entire design is that once a container image is fetched into the ostree storage, from there on it just appears as an "ostree commit", and so all code built on top can work with it.

For example, the ostree-prepare-root.service which runs in the initramfs is currently agnostic to whether the filesystem tree originated from an OCI image or some other mechanism; it just targets a prepared flattened filesystem tree.

This is what is referenced by the ostree= kernel commandline.

Relationship with other projects

bootc is the key component in a broader mission of bootable containers. Here's its relationship to other moving parts.

Relationship with podman

It gets a bit confusing to talk about shipping bootable operating systems in container images. Again, to be clear: we are reusing container images as:

  • A build mechanism (including running as a standard OCI container image)
  • A transport mechanism

But, actually when a bootc container is booted, podman (or docker, etc.) is not involved. The storage used for the operating system content is distinct from /var/lib/containers. podman image prune --all will not delete your operating system.

That said, a toplevel goal of bootc is alignment with the https://github.com/containers ecosystem, which includes podman. But more specifically at a technical level, today bootc uses skopeo and hence indirectly containers/image as a way to fetch container images.

This means that bootc automatically also honors many of the knobs available in /etc/containers - specifically things like containers-registries.conf.

In other words, if you configure podman to pull images from your local mirror registry, then bootc will automatically honor that as well.

The simple way to say it is: A goal of bootc is to be the bootable-container analogue for podman, which runs application containers. Everywhere one might run podman, one could also consider using bootc.

Relationship with Image Builder (osbuild)

There is a new bootc-image-builder project that is dedicated to the intersection of these two!

Relationship with Kubernetes

Just as podman does not depend on a Kubernetes API server, bootc will also not depend on one.

However, there are plans for bootc to also understand Kubernetes API types. See configmap/secret support for example.

Perhaps in the future we may actually support some kind of Pod analogue for representing the host state. Or we may define a CRD which can be used inside and outside of Kubernetes.

Relationship with rpm-ostree

Today both bootc and rpm-ostree use the ostree project as a backing model. Hence, when using a container source, rpm-ostree upgrade and bootc upgrade are effectively equivalent; you can use either command.

However with rpm-ostree (or, perhaps re-framed as "dnf image"), it will continue to work to e.g. dnf install (i.e. rpm-ostree install) on the client side system. In addition there are other client-side mutation commands such as rpm-ostree initramfs --enable.

However, as soon as you mutate the system in this way, bootc upgrade will error out as it will not understand how to upgrade the system. The bootc project currently takes a relatively hard stance that system state should come from a container image.

The way kernel arguments work also uses ostree on the backend in both cases, so using e.g. rpm-ostree kargs will also work on a system updating via bootc.

Overall, rpm-ostree is used in several important projects and will continue to be maintained for many years to come.

However, for use cases which want a "pure" image based model, using bootc will be more appealing. bootc also does not e.g. drag in dependencies on libdnf and the RPM stack.

bootc also has the benefit of starting as a pure Rust project; and while it doesn't have an IPC mechanism today, the surface of such an API will be significantly smaller.

Further, bootc does aim to include some of the functionality of zincati.

But all this said: It will be supported to use both bootc and rpm-ostree together; they are not exclusive. For example, bootc status at least will still function even if packages are layered.

Future bootc <-> podman binding

All the above said, it is likely that at some point bootc will switch to hard binding with podman. This will reduce the role of ostree, and hence break compatibility with rpm-ostree. When such work lands, we will still support at least a "one way" transition from an ostree backend. But once this happens there are no plans to teach rpm-ostree to use podman too.

Relationship with Fedora CoreOS (and Silverblue, etc.)

Per above, it is a toplevel goal to support a seamless, transactional update from existing OSTree based systems, which includes these Fedora derivatives.

For Fedora CoreOS specifically, see this tracker issue.

See also OstreeNativeContainerStable.

How does the use of OCI artifacts intersect with this effort?

The "bootc compatible" images are OCI container images; they do not rely on the OCI artifact specification or OCI referrers API.

It is foreseeable that users will need to produce "traditional" disk images (i.e. raw disk images, qcow2 disk images, Amazon AMIs, etc.) from the "bootc compatible" container images using additional tools. Therefore, it is reasonable that some users may want to encapsulate those disk images as an OCI artifact for storage and distribution. However, it is not a goal to use bootc to produce these "traditional" disk images nor to facilitate the encapsulation of those disk images as OCI artifacts.

Relationship with systemd "particles"

There is an excellent vision blog entry that puts together a coherent picture for how a systemd (and uapi-group.org) oriented Linux based operating system can be put together, and the rationale for doing so.

The "bootc vision" aligns with parts of this, but differs in emphasis and also some important technical details - and some of the emphasis and details have high level ramifications. Simply stated: related but different.

System emphasis

The "particle" proposal mentions that the desktop case is most interesting; the bootc belief is that servers are equally important and interesting. In practice, this is not a real point of differentiation, because the systemd project has done an excellent job in catering to all use cases (desktop, embedded, server) etc.

An important aspect related to this is that the bootc project exists and must interact with many ecosystems, from "systemd-oriented Linux" to Android and Kubernetes. Hence, we would not explicitly compare with just ChromeOS, but also with e.g. Kairos and many others.

Design goals

Many of the toplevel design goals do overall align. It is clear that e.g. Discoverable Disk Images and OCI images align on managing systems in an image-oriented fashion.

A difference on goal 11

Goal 11 states:

Things should not require explicit installation. i.e. every image should be a live image. For installation it should be sufficient to dd an OS image onto disk.

The bootc install approach is explicitly intending to support things such as e.g. static IP addresses provisioned via kernel arguments at install time; it is not a goal for installations to be equivalent to dd. The bootc creator has experience with systems that install this way, and it creates practical problems in nontrivial scenarios such as "Advanced Format" disk drives, etc.

New Goal: An explicit alignment with cloud-native

The bootc project has an explicit goal to take formats, cues and inspiration from the container and cloud-native ecosystem. More on this in several sections below.

New Goal: Continued explicit support for "unlocked" systems

A strong emphasis of the particle approach is "sealed" systems that chain from Secure Boot. bootc aims to support the same. And in practice, nothing in "particles" strictly requires Secure Boot etc.

However, bootc has a stronger emphasis on continuing to support "unlocked" systems into the foreseeable future, in which key (even root level) operating system changes can be made outside of an explicit signed state and feel equally first class, not just "developer system extensions".

Or stated more simply, it will be explicitly supported to create bootc-based operating systems that boot as e.g. a cloud instance or as a desktop machine that defaults to an unlocked state and provides good ergonomics in this scenario for managing user owned state across operating system upgrades too.

Hermetic /usr

One of the biggest differences starts with this. The idea of having the entire operating system self-contained in /usr is a good one. However, there is an immense amount of prior history and details that make this hard to support in many generalized cases.

This tracking issue is a good starting point - it's mostly about /etc (see below).

bootc design: Carve out sub mounts

Instead, the bootc model allows arbitrary directory roots starting from / to be included in the base operating system image.

This first notable difference is rooted in bootc taking a stronger cue from the opencontainers ecosystem (including docker/podman/Kubernetes). There are no restrictions on application container filesystem layout (everything is ephemeral by default, and persistence must be explicit); bootc aims to be closer to this.

There is still alignment: bootc design does strongly encourage operating system state to live underneath /usr - it should be the default place for all operating system executable binaries and default configuration. It should be read-only by default.

/etc

Today, the bootc project uses ostree as a backend, and a key semantic ostree provides for /etc is a "3 way merge".

This has several important differences. First, it means that /etc does get updated by default for unchanged configuration files.

The default proposal for "particle" OSes to deal with "legacy" config files in /etc is to copy them on first OS install (e.g. /usr/share/factory).

This creates serious problems for all the software (for example, OpenSSH) that puts config files there: having the default configuration updated (e.g. for a security issue) by a package manager but not by an image-based update is not viable.

However a key point of alignment between the two is that we still aim to have /etc exist and be useful! Writing files there, whether from vi or config management tooling must continue to work. Both bootc and systemd "particle" systems should still Feel Like Unix - in contrast to e.g. Android.

At the current time, this is implemented in ostree; as bootc moves towards stronger integration with podman, it is likely that this logic will simply be moved into bootc instead on top of podman. Alternatively perhaps, podman itself may grow some support for specifying this merge semantic for containers.

Other persistent state: /var

Supporting arbitrary toplevel files in / on operating system updates conflicts with a desire to have e.g. /home be persistent by default.

Hence, bootc emphasizes having e.g. /home -> /var/home as a default symlink in base images.

Aside from /home and /etc, it is common on most Linux systems to have most persistent state under /var, so this is not a major point of difference otherwise.

Other toplevel files/directories

Even operating systems that have completed "UsrMerge" still have legacy compatibility symlinks required in /, e.g. /bin -> /usr/bin. We still need to support shipping these for many cases, and they are an important part of operating system state. Having them not be explicitly managed by OS updates is hence suboptimal.

Related to this, bootc will continue to support operating systems that have not completed UsrMerge.

Discoverable Disk images and booting

The bootc project will not use Discoverable Disk Images. Instead, we orient strongly around opencontainers/image-spec i.e. OCI/Docker images.

This is the biggest technical difference that strongly influences many other aspects of operating system design and experience.

It is an explicit goal of the bootc project that it should feel as natural as possible for someone familiar with "application containers" from podman/Docker/Kubernetes to take their tools and knowledge and apply that to the base operating system too.

Technical heart: composefs

There is a very strong security rationale behind much of the design proposal of "particles" and DDIs. It is absolutely true today, quoting the blog:

That said, I think [OCI has] relatively weak properties, in particular when it comes to security, since immutability/measurements and similar are not provided. This means, unlike for system extensions and portable services a complete trust chain with attestation and per-app cryptographically protected data is much harder to implement sanely.

The composefs project aims to close this gap, and the bootc project will use it, and has an explicit goal to align with e.g. podman in using it too.

Effectively, everywhere one might use a DDI, bootc will usually support a container image. (However for some things like system configuration files, bootc may aim to instead support e.g. plain ConfigMap files which are signed for example).

System booting

The bootloader

The strong emphasis of the UAPI-group is on UEFI. However, the world is a bit broader than that; the bootc project also will explicitly continue to support:

  • GNU Grub for multiple reasons; among them that unfortunately x86 BIOS systems will not disappear entirely in the next 10 years even.
  • Android Boot - because some hardware manufacturers ship it, and we want to support operating systems that must work on this hardware.
  • zipl because it's how things work on s390x, and there is significant alignment in terms of emphasizing a "unified kernel" style flow.

Boot loader configs

bootc aims to align with the idea of generic bootloader-independent config files where possible; today it uses ostree. For more on this, see ostree and bootloaders.

The kernel and initramfs

There is agreement that in order to achieve integrity, there must be a strong link between the kernel and the first userspace code that executes in the initial RAM disk.

Building on the bootloader statement above: bootc will support UKI, but not require it.

The root filesystem

In the bootc model, the root filesystem defaults to a single physical Linux filesystem (e.g. xfs, ext4, btrfs etc.). It is of course supported to mount other partitions and filesystems; doing so is encouraged even for /var. This contrasts with layouts where one ends up with some space constraints around the OS /usr partition due to dm-verity.

This is a rather large difference already from particles; the root filesystem contains the operating system too; it is not a separate partition. One thing this helps significantly with is dealing with the "space management" problems that dm-verity introduces (need for a partition to have unused empty space to grow, and also a fixed-size ultimate capacity limit).

Locating the root

bootc does not mandate or emphasize any particular way to locate the root filesystem; parts of the discoverable partitions specification, specifically the "root partition", may be used. Or, the root filesystem can be found the traditional way, via a local root= kernel argument.

Another point of contrast from the particle emphasis is that while we encourage encrypting the root filesystem, it is not required. Particularly some use cases in cloud environments perform encryption at the hypervisor level and do not want additional overhead of doing so per virtual machine.

Locating the base container image

Until this point, we have been operating under external constraints; no one is creating a bootloader that directly understands how to start a container image, for example. We've gotten as far as running a Linux userspace in the initial RAM disk, and the physical root filesystem is mounted.

Here, we circle back to composefs. One can think of composefs as effectively a way to manage something like dm-verity, but using files.

What bootc builds on top of that is to target a specific container image rootfs that is part of the "physical" root. Today, this is implemented again using ostree, via the ostree= kernel commandline argument. In the future, it is likely to be a bootc.image argument. However, integration with other bootloaders (such as Android Boot) requires us to interact with externally-specified fixed kernel arguments.

Ultimately, the initramfs will contain logic to find the desired root container, which again is just a set of files stored in the "physical" root filesystem.

Chaining integrity from the initramfs

One can think of composefs as effectively a way to manage something like dm-verity, but supporting multiple ones stored inside a standard Linux filesystem.

For "sealed" systems, the bootc project suggests a default model where there is an "ephemeral key" that binds the UKI (or equivalent) and the real root. For a bit more on this, see ostree and composefs. Effectively, at image build time an "ephemeral" key is generated which signs the composefs digest of the container image. The public half of this key is injected into the UKI, which is itself signed e.g. for Secure Boot.

At boot time, the initramfs will use its embedded public key to verify the composefs digest of the target root - and from there, overlayfs in the Linux kernel combined with fs-verity will continually verify the integrity of all operating system root files we use.

At the current time, there is not one single standardized approach for signing composefs images. Ultimately, a composefs image has a digest, and signing and verification of that digest can be done via any signing tool. For more on this, see this issue.

bootc itself will not mandate one mechanism currently. However, it is very likely that we will ship an optionally-enabled opinionated mechanism that uses basic ed25519 signatures for example.

This is effectively equivalent to the particle approach of embedding a verity root hash into the kernel commandline - it means that the booted Linux kernel will only be capable of mounting that one specific root filesystem. Note that this model is effectively the same as e.g. Fedora uses to sign kernel modules.

However, an "ephemeral key" is not the only valid way to do things; for some operating system creators it may be very desirable to continue to be able to make root OS image changes without changing the UKI (and hence re-signing it). Instead, another valid approach is to simply maintain a persistent public/private keypair. This allows disconnecting the build of userspace and kernel, but also means that there is less strict verification between kernel and userspace (e.g. downgrade attacks become possible).

Chaining integrity to configuration and application containers

composefs is explicitly designed to be useful as a backend for "application" containers (e.g. podman). There is again not one single mechanism for signing and verification; in some use cases, it may be enough to boot the operating system enough to implement "network as source of truth" - for example, the public keys for verification of application containers might be fetched from a remote server. Then before any application containers are run, we dynamically fetch the relevant keys from a server which was trusted.

The bootc project will align with podman in general, and make it easy to implement a mechanism that chains keys stored alongside the operating system into composefs-signed application containers.

Configuration (effectively starting from /etc and the kernel commandline) in a "sealed" system is a complex topic. Many operating system builds will want to disable the default "etc merge" and make /etc always lifecycle bound with the OS: commonly writable but ephemeral.

This topic is covered more in the next section.

Modularity

A goal of "particles" is to add integrity into "general purpose" Linux OSes and distributions - supporting a world where there are a lot of users that simply directly install an OS from an upstream OS such as Debian or Fedora. This has a lot of implications; among them that e.g. the Secure Boot signatures etc. are made by the OS creator, not the user.

A big emphasis for the bootc project, in contrast, is a design where it is normal and expected for many users to derive (via standard container build technology) from the base image produced by the OS upstream.

This is just a difference in emphasis: "particles" can clearly be built fully customized by the end customer, and bootc fully supports booting "stock" images.

But still: the bootc project will again much more strongly push any scenario that desires truly strong integrity towards making and managing custom derived builds.

Extensions and security

In "unlocked" scenarios, the bootc project will continue to support a "traditional Unix" feeling where persistent changes to /etc can be written and maintained. Similarly, it will continue to be supported to have machine-local kernel arguments. There is significant value in migrating "package based" systems to "image based" systems, even if they are still "unsigned" or "unlocked".

The particle model calls for tools like confext that use DDIs. The "backend" of this (managing merged dynamic filesystem trees with overlayfs) and its relationship with systemd units is still relevant, but the bootc approach will again not expose DDIs to the user. Instead, our approach will take cues from the cloud-native world and use e.g. Kubernetes ConfigMap and support signatures on these.

More Modularity: Secondary OS installs

This uses OCI containers, which will work the same as the host.

Developer Mode

This topic heavily diverges between the "unlocked" and "sealed" cases. In the unlocked case, the bootc project aims to continue to make it feel very "first class" to perform arbitrary machine-local mutations. Instead of managing overlay DDIs, bootc will make it trivial and obvious to use local container builds using any standard container build tooling.

Package managers

In order to ease the transition for users coming from package systems, the bootc project suggests that package managers like apt and dnf etc. learn how to become a frontend for "local" container builds too. In other words, apt|dnf install foo would become shorthand for a container build like:

FROM <localhost>
RUN apt|dnf install foo
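
As a rough illustration of the idea above, the following sketch shows how a package manager frontend might translate an install request into a local container build. The function name `containerfile_for_install` is hypothetical and is not part of apt, dnf or bootc. The `-y` flag is added on the assumption that the build runs non-interactively, and `<localhost>` is the same placeholder used above for the currently booted image.

```python
import shlex

# Package managers this sketch knows how to wrap (an assumption for illustration).
SUPPORTED_MANAGERS = ("apt", "dnf")


def containerfile_for_install(manager, packages, base="<localhost>"):
    """Render a Containerfile equivalent to `<manager> install <packages>`.

    Hypothetical helper: it only produces text, it does not run a build.
    """
    if manager not in SUPPORTED_MANAGERS:
        raise ValueError(f"unsupported package manager: {manager}")
    if not packages:
        raise ValueError("at least one package is required")
    # Quote each package name so it is safe inside a shell-form RUN line.
    quoted = " ".join(shlex.quote(p) for p in packages)
    return f"FROM {base}\nRUN {manager} install -y {quoted}\n"


if __name__ == "__main__":
    print(containerfile_for_install("dnf", ["foo"]), end="")
```

The resulting text could then be handed to any standard container build tool. How the built image is handed back to bootc is deliberately left out of this sketch.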

Transitioning from unlocked, mutable local state to server-built images

Building on the above, a key point of bootc is to make it easy and obvious how to go from an "unlocked" system with potential unmanaged state towards a system built and managed using standard OCI container image build systems and tooling. For example, there should be a command like apt|dnf print-containerfile. (The problem is more complex than this of course, as we would likely want to capture some changes from /etc, but some of those changes may include secrets, which are their own sub-topic.)

Democratizing Code Signing

Strong alignment here.

Running the OS itself in a container

This is equally obvious to do when the host and the linked container runtime (e.g. podman) again use the same tools.

Parameterizing Kernels

In "unlocked" scenarios (per above) we will continue to use bootloader configuration that is unsigned.

We will not (in contrast to particles) try to strongly support a "partially sealed, general purpose" model. More on this below.

Most cases for "sealed" systems will want to entirely lock the kernel commandline, not even using a bootloader, and hence there is no mechanism to configure it locally at all. However, as discussed in various venues around UKI, "sealed" systems can become complex to deploy where there is a need for machine (or machine-type) specific kernel arguments.

The bootc project's default approach for this is to lean into the container-native world, using derivation to create a machine-independent "base image", then create derived, machine (or machine-class) specific images that are in turn signed.
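
A minimal sketch of that derivation step, under stated assumptions: each machine class gets its own derived image that adds a kernel-argument drop-in on top of a shared base. The helper `derived_image_files`, the image name and the machine class are hypothetical. The drop-in location `/usr/lib/bootc/kargs.d/` and its `kargs = [...]` TOML format are assumed here for illustration. How the arguments finally reach the kernel in a sealed setup (for example by being baked into a UKI), and the signing of the derived image, are outside the scope of this sketch.

```python
import json


def derived_image_files(base_image, machine_class, kargs):
    """Return the build context (file name -> content) for one derived image.

    Hypothetical helper: it renders text only and does not build or sign.
    """
    if not kargs:
        raise ValueError("at least one kernel argument is required")
    toml_name = f"10-{machine_class}.toml"
    # A JSON array of plain strings is also a valid TOML array of strings.
    toml_body = "kargs = " + json.dumps(list(kargs)) + "\n"
    containerfile = (
        f"FROM {base_image}\n"
        # Assumed drop-in directory for image-provided kernel arguments.
        f"COPY {toml_name} /usr/lib/bootc/kargs.d/{toml_name}\n"
    )
    return {"Containerfile": containerfile, toml_name: toml_body}


if __name__ == "__main__":
    files = derived_image_files(
        "quay.io/example/base:latest", "edge-gateway", ["console=ttyS0"]
    )
    for name, content in files.items():
        print(f"--- {name}\n{content}", end="")
```

The point of the design is that the machine-independent base stays identical for every class, and only the small derived layer differs and needs its own signature.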

Updating Images

A big differentiation here is that bootc will reuse container technology for fetching updates. The operating system and application containers will be signed with e.g. sigstore or similar for network fetching. The signature will cover the composefs digest, which enables continuous verification.

Managing storage of container images using composefs is more complex than systemd-sysupdate writing to a partition, but significantly more flexible. For more on this, see upstream composefs.

Kernel in images

The bootc and particle approaches are aligned on storing the kernel binary in /usr/lib/modules/$kver. On the bootc side, a key bit here is that bootc will extract the kernel and initramfs (or just UKI) and put it in the appropriate place - this is implemented as a transactional operation. There are significant details that can vary for how this works (because unlike particles, bootc aims to support non-EFI setups as well), but the high level idea is similar.

Boot Counting + Assessment

This topic relates to the previous one; because of multiple bootloaders, there is not one single approach. The systemd automatic boot assessment is good where it can be used, but we also will support e.g. Android bootloaders.

Picking the Newest Version

Because the storage of images is not just files or partitions, bootc will not expose strvercmp or package-manager oriented versioning semantics to the user/administrator. Instead, "latest" will be implemented in a more Kubernetes-oriented fashion of having "local" API objects with spec and status. This makes it easy and obvious for higher level management (e.g. cluster) tooling to orchestrate updates in a Kubernetes-style fashion.
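
To make the spec/status idea concrete, here is a small sketch of such a "local" API object and a reconcile step. All class, field and function names (`Host`, `HostSpec`, `HostStatus`, `reconcile`, `resolve_digest`) are hypothetical and do not describe bootc's actual schema. The point is only that no version strings are compared: "latest" is whatever digest the desired image reference currently resolves to.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class HostSpec:
    image: str  # desired image reference, set by the admin or cluster tooling


@dataclass
class HostStatus:
    booted: Optional[str] = None  # digest of the image currently running
    staged: Optional[str] = None  # digest queued for the next boot, if any


@dataclass
class Host:
    spec: HostSpec
    status: HostStatus = field(default_factory=HostStatus)


def reconcile(host: Host, resolve_digest: Callable[[str], str]) -> bool:
    """Stage the digest that the spec resolves to; return True if it changed.

    `resolve_digest` stands in for a registry lookup and is an assumption.
    """
    target = resolve_digest(host.spec.image)
    if target in (host.status.booted, host.status.staged):
        return False  # already running or already queued, nothing to do
    host.status.staged = target
    return True


if __name__ == "__main__":
    host = Host(HostSpec("quay.io/example/os:stable"), HostStatus(booted="sha256:aaa"))
    print(reconcile(host, lambda ref: "sha256:bbb"), host.status)
```

Higher level tooling would then only need to edit the spec and watch the status, in the same way it would for a Kubernetes object.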

Home Directory Management

The bootc project will not do anything with this. We will support systemd-homed where users want it, but in many dedicated servers and managed devices the idea of persistent user "home directories" is more of an anti-pattern.

Partition Setup

The biggest difference again here is that bootc is oriented closer to a single root partition by default that includes the OS, system/app containers and persistent local state all as one unit.

Trust chain

In contrast to particles, the bootc project does not aim to emphasize by default a model of using sysexts from the initramfs, because the primary use case for that model is a "partially sealed" system. And per above (re kernels), it is insufficient for other cases.

Without this in the mix then, the trust chain is simple to describe: the kernel+initramfs are verified by the bootloader, the initramfs contains the key and logic necessary to verify the composefs digest of the root, and the root starts to verify everything else.
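
The chain just described can be sketched as a toy model. This is not how bootc or composefs implement verification: real systems use signatures and fs-verity digests, while this sketch uses plain SHA-256 comparisons as a stand-in. The `composefs=` marker inside the boot payload and the function name `verify_chain` are assumptions made purely for illustration.

```python
import hashlib

# Hypothetical marker after which the initramfs carries the expected root digest.
ROOT_MARKER = b"composefs="


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_chain(bootloader_trusted: str, boot_payload: bytes, root_metadata: bytes) -> bool:
    """Return True only if every link of the toy trust chain holds."""
    # Link 1: the bootloader verifies kernel+initramfs (here: one payload blob).
    if digest(boot_payload) != bootloader_trusted:
        return False
    # Link 2: the verified initramfs carries the expected digest of the root.
    _, marker, expected_root = boot_payload.rpartition(ROOT_MARKER)
    if not marker:
        return False  # payload does not pin a root digest at all
    # Link 3: the root is accepted only if it matches the pinned digest.
    return digest(root_metadata).encode() == expected_root


if __name__ == "__main__":
    root = b"composefs image metadata"
    payload = b"kernel+initramfs bytes;" + ROOT_MARKER + digest(root).encode()
    print(verify_chain(digest(payload), payload, root))
```

Because the root digest is part of the payload that the bootloader verifies, changing either the root or the pinned digest breaks the chain at an earlier link.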

File System Choice

As mentioned above, any Linux filesystem is valid for the root. For "sealed" systems, using composefs will cover integrity, and there is no distinct need for dm-integrity.

OS Installation vs. OS Instantiation

The bootc project is simply less partition-oriented and more oriented towards multiple composefs images in one root. However, the high level goal of making it easy to "re-provision" and keeping the install-time flow as close as possible is shared.

Building Images According to this Model

This is a key point of bootc: we aim for operating systems and distributions to ship their own bootc-compatible base images that can be used as a default derivation source. These images are just OCI images that follow simple rules for the extra state needed to boot (for example, as mentioned above, the kernel is found in /usr/lib/modules/$kver/vmlinuz).
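
As a small illustration of the kernel location rule, the following sketch scans an unpacked image root for kernels at /usr/lib/modules/$kver/vmlinuz. The function name `find_kernels` is hypothetical, and the sketch assumes the image has already been unpacked to a local directory. It says nothing about how bootc itself performs this lookup.

```python
from pathlib import Path


def find_kernels(rootfs: Path) -> dict[str, Path]:
    """Map each kernel version under rootfs to its vmlinuz path.

    Hypothetical helper: directories without a vmlinuz file are skipped.
    """
    modules = rootfs / "usr" / "lib" / "modules"
    found: dict[str, Path] = {}
    if not modules.is_dir():
        return found
    for kver_dir in sorted(modules.iterdir()):
        vmlinuz = kver_dir / "vmlinuz"
        if vmlinuz.is_file():
            found[kver_dir.name] = vmlinuz
    return found


if __name__ == "__main__":
    import sys

    for kver, path in find_kernels(Path(sys.argv[1] if len(sys.argv) > 1 else "/")).items():
        print(kver, path)
```

A base image check built on something like this could, for example, reject images that ship no kernel at all.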

However, in order to enable "sealed" systems (using signed composefs digests), the container build system will need support for this. It is a goal to standardize the composefs metadata needed alongside the OCI image, and to support this in the broader container ecosystem of tools (e.g. docker, podman) as well as bootc.

Final words

This document is obviously very heavily inspired by the original blog.

A point of divergence is that a goal of the bootc project is to strongly influence the existing operating systems and distributions and help them migrate their customers into an image-based world - and to make practical compromises in order to aid that goal.

But, the bootc project strongly agrees with the idea of finding common ground (the "50% shared" case). At a practical level, this project will take a hard dependency on systemd and on the container ecosystem, extending bridges where they exist, working on shared standards and approaches between the two.