bootc
Transactional, in-place operating system updates using OCI/Docker container images. bootc is the key component in a broader mission of bootable containers.
The original Docker container model of using "layers" to model applications has been extremely successful. This project aims to apply the same technique for bootable host systems - using standard OCI/Docker containers as a transport and delivery format for base operating system updates.
The container image includes a Linux kernel (in e.g. /usr/lib/modules), which is used to boot. At runtime on a target system, the base userspace is not itself running in a container by default. For example, assuming systemd is in use, systemd acts as pid1 as usual - there's no "outer" process.
Status
At the current time, bootc has not reached 1.0, and it is possible that some APIs and CLIs may change. For more information, see the 1.0 milestone.
However, the core underlying code uses the ostree project which has been powering stable operating system updates for many years. The stability here generally refers to the surface APIs, not the underlying logic.
Base images
Many users will be more interested in base (container) images.
Fedora/CentOS
Currently, the Fedora/CentOS bootc project is the most closely aligned upstream project.
For pre-built base images: any Fedora derivative already using ostree can be seamlessly converted to use bootc. For example, Fedora CoreOS can be used as a base image, although you will currently also want to rpm-ostree install bootc in your image builds. There is some overlap between bootc, ignition and zincati, however; see this pull request for more information.
For other derivatives such as the "Atomic desktops", see discussion of relationships which particularly covers interactions with rpm-ostree.
Other
However, bootc itself is not tied to Fedora derivatives; this issue tracks the main blocker for other distributions.
Generic guidance for building images
The bootc project intends to be as operating system and distribution independent as possible, similar to its related projects podman and systemd.
The recommendations for creating bootc-compatible images will in general need to be owned by the OS/distribution - in particular the ones who create the default bootc base image(s). However, some guidance is very generic to most Linux systems (and bootc only supports Linux).
Let's however restate a base goal of this project:
The original Docker container model of using "layers" to model applications has been extremely successful. This project aims to apply the same technique for bootable host systems - using standard OCI/Docker containers as a transport and delivery format for base operating system updates.
Every tool and technique for creating application base images should apply to the host Linux OS as much as possible.
Understanding mutability
When run as a container (particularly as part of a build), bootc-compatible images have all parts of the filesystem (e.g. /usr in particular) as fully mutable state, and writing there is encouraged (see below).
When "deployed" to a physical or virtual machine, the container image files are read-only by default; for more, see filesystem.
Installing software
For package management tools like apt, dnf, zypper etc. (generically, $pkgsystem) it is very much expected that a flow following the pattern of
RUN $pkgsystem install somepackage && $pkgsystem clean all
Just Works here, the same way as it does for "application" container images. This pattern is really how Docker got started.
There's not much special to this that doesn't also apply to application containers; but see below.
Nesting OCI containers in bootc containers
The OCI format uses "whiteouts" represented in the tar stream as special .wh files, and typically consumed by the Linux kernel overlayfs driver as special 0:0 character devices. Without special work, whiteouts cannot be nested.
Hence, an invocation like
RUN podman pull quay.io/exampleimage/someimage
will create problems, as the podman runtime will create whiteout files inside the container image filesystem itself.
Special care and code changes will need to be made to container runtimes to support such nesting. Some more discussion in this tracker issue.
systemd units
The model that is most popular with the Docker/OCI world is "microservice" style containers with the application as pid 1, isolating the applications from each other and from the host system - as opposed to "system containers" which run an init system like systemd, typically also SSH and often multiple logical "application" components as part of the same container.
The bootc project generally expects systemd as pid 1, and if you embed software in your derived image, the default would then be that that software is initially launched via a systemd unit.
RUN dnf -y install postgresql && dnf clean all
A package installed this way would typically also carry a systemd unit, and that service will be launched the same way as it would on a package-based system.
Users and groups
Note that the postgresql package above will currently allocate a user; this leads to the topic of users, groups and SSH keys.
Configuration
A key aspect of choosing a bootc-based operating system model is that code and configuration can be strictly "lifecycle bound" together in exactly the same way.
(Today, that's by including the configuration into the base container image; however a future enhancement for bootc will also support dynamically-injected ConfigMaps, similar to kubelet)
You can add configuration files to the same places they're expected by typical package systems on Debian/Fedora/Arch and others - in /usr (preferred where possible) or /etc. systemd has long advocated and supported a model where /usr (e.g. /usr/lib/systemd/system) contains content owned by the operating system image.
/etc is machine-local state. However, per filesystem.md it's important to note that the underlying OSTree system performs a 3-way merge of /etc, so changes you make in the container image to e.g. /etc/postgresql.conf will be applied on update, assuming it is not modified locally.
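The merge rule described above can be sketched as a small shell function. This is a simplified per-file model and not OSTree's actual implementation; the function name merge_etc_file and the demo file names are hypothetical.

```shell
#!/bin/sh
# Simplified per-file model of the /etc 3-way merge (illustrative only).
# merge_etc_file OLD_DEFAULT NEW_DEFAULT LOCAL
#   OLD_DEFAULT: file as shipped by the previously deployed image
#   NEW_DEFAULT: file as shipped by the new image
#   LOCAL:       file currently present in the machine's /etc
# Prints the path whose content the new deployment would end up with.
merge_etc_file() {
    old_default=$1 new_default=$2 local_file=$3
    if cmp -s "$old_default" "$local_file"; then
        # Not modified locally: the new image version is applied.
        printf '%s\n' "$new_default"
    else
        # Modified locally: the local version wins (possible state drift).
        printf '%s\n' "$local_file"
    fi
}

# Demo in a scratch directory.
demo=$(mktemp -d)
printf 'max_connections = 100\n' > "$demo/old.conf"
printf 'max_connections = 200\n' > "$demo/new.conf"
cp "$demo/old.conf" "$demo/untouched.conf"
printf 'max_connections = 50\n' > "$demo/edited.conf"

merge_etc_file "$demo/old.conf" "$demo/new.conf" "$demo/untouched.conf"
merge_etc_file "$demo/old.conf" "$demo/new.conf" "$demo/edited.conf"
```

The first call prints the path of the new image default, the second prints the locally edited file, which is the case that can lead to drift.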
Prefer using drop-in directories
These "locally modified" files can be a source of state drift. The best pattern to use is "drop-in" directories that are merged dynamically by the relevant software. systemd supports this comprehensively; see drop-ins for example in units.
For example, instead of modifying /etc/sudoers, it's best practice to add a file into /etc/sudoers.d.
Not all software supports this, however, and this is why there is generic support for /etc.
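As a minimal sketch of the drop-in pattern, the following stages a systemd unit drop-in and a sudoers.d snippet under a scratch root, much as a container build would place them in the image. The names example.service, 10-restart.conf and 10-wheel are hypothetical.

```shell
#!/bin/sh
# Stage drop-in files under a scratch root instead of editing main config files.
# "example.service", "10-restart.conf" and "10-wheel" are hypothetical names.
root=$(mktemp -d)

# systemd merges *.conf files from <unit>.d/ into the unit when it is loaded.
dropin_dir="$root/usr/lib/systemd/system/example.service.d"
mkdir -p "$dropin_dir"
cat > "$dropin_dir/10-restart.conf" <<'EOF'
[Service]
Restart=on-failure
EOF

# sudo reads extra rules from /etc/sudoers.d; keep the permissions restrictive.
mkdir -p "$root/etc/sudoers.d"
printf '%%wheel ALL=(ALL) ALL\n' > "$root/etc/sudoers.d/10-wheel"
chmod 0440 "$root/etc/sudoers.d/10-wheel"

find "$root" -type f | sort
```

Because each change lives in its own file, a later image update can replace or remove it without touching a file that may have been edited locally.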
Configuration in /usr vs /etc
Some software supports generic configuration in both /usr and /etc - systemd, among others. Because bootc supports derivation (the way OCI containers work), it is supported and encouraged to put configuration files in /usr (instead of /etc) where possible, because then the state is consistently immutable.
One pattern is to replace a configuration file like /etc/postgresql.conf with a symlink to e.g. /usr/postgres/etc/postgresql.conf, although this can run afoul of SELinux labeling.
Secrets
There is a dedicated document for secrets, which is a special case of configuration.
Handling read-only vs writable locations
The high level pattern for bootc systems is summarized again this way:
- Put read-only data and executables in /usr
- Put configuration files in /usr (if they're static), or /etc if they need to be machine-local
- Put "data" (log files, databases, etc.) underneath /var
However, some software installs to /opt/examplepkg or another location outside of /usr, and may include all three types of data underneath its single toplevel directory. For example, it may write log files to /opt/examplepkg/logs. A simple way to handle this is to change the directories that need to be writable to symbolic links to /var:
RUN apt|dnf install examplepkg && \
    mv /opt/examplepkg/logs /var/log/examplepkg && \
    ln -sr /var/log/examplepkg /opt/examplepkg/logs
The Fedora/CentOS bootc puppet example is one instance of this.
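The move-and-symlink pattern can be tried out in a scratch directory before putting it in a container build. This is a sketch that assumes GNU coreutils (for ln -r); examplepkg is the same placeholder name used above.

```shell
#!/bin/sh
# Reproduce the move-and-symlink pattern inside a scratch root.
# "examplepkg" is a placeholder package name.
root=$(mktemp -d)
mkdir -p "$root/opt/examplepkg/logs" "$root/var/log"
echo "first line" > "$root/opt/examplepkg/logs/app.log"

# Move the writable directory under /var ...
mv "$root/opt/examplepkg/logs" "$root/var/log/examplepkg"
# ... and leave a relative symlink at the old location
# (ln takes the link target first and the link name second).
ln -sr "$root/var/log/examplepkg" "$root/opt/examplepkg/logs"

readlink "$root/opt/examplepkg/logs"
cat "$root/opt/examplepkg/logs/app.log"
```

The software keeps writing to its usual path under /opt, while the data actually lands under /var.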
Another option is to configure the systemd unit launching the service to do these mounts dynamically via e.g.
BindPaths=/var/log/exampleapp:/opt/exampleapp/logs
Container runtime vs "bootc runtime"
Fundamentally, bootc reuses the OCI image format as a way to transport serialized filesystem trees with included metadata such as a version label, etc.
A bootc container operates in two basic modes. First, when invoked by a container runtime such as podman or docker (typically as part of a build process), the bootc container behaves exactly the same as any other container. For example, although there is a kernel embedded in the container image, it is not executed - the host kernel is used. There are no additional mount namespaces, etc. Ultimately, the container runtime is in full control here.
The second, and most important mode of operation is when a bootc container is installed to a physical or virtual machine. Here, bootc is in control; the container runtime used to build is no longer relevant. However, it's very important to understand that bootc's role is quite limited:
- On boot, there is code in the initramfs to do a "chroot" equivalent into the target filesystem root
- On upgrade, bootc will fetch new content, but this will not affect the running root
Crucially, besides setting up some mounts, bootc itself does not act as any kind of "container runtime". It does not set up pid or other namespaces, does not change cgroups, etc. That remains the role of other code (typically systemd). bootc is not a persistent daemon by default; it does not impose any runtime overhead.
Another example of this: While one can add Container configuration metadata, bootc generally ignores that at runtime today.
Labels
A key aspect of OCI is the ability to use standardized (or semi-standardized) labels. They are stored and rendered by bootc; especially the org.opencontainers.image.version label.
Example ignored runtime metadata, and recommendations
ENTRYPOINT and CMD (OCI: Entrypoint/Cmd)
Ignored by bootc.
It's recommended for bootc containers to set CMD /sbin/init; but this is not required.
The booted host system will launch from the bootloader, to the kernel+initramfs and real root however it is "physically" configured inside the image. Typically today this is using systemd in both the initramfs and at runtime; but this is up to how you build the image.
ENV (OCI: Env)
Ignored by bootc; to configure the global system environment you can change the systemd configuration. (Though this is generally not a good idea; instead it's usually better to change the environment of individual services)
EXPOSE (OCI: exposedPorts)
Ignored by bootc; it is agnostic to how the system firewall and network function at runtime.
USER (OCI: User)
Ignored by bootc; typically you should configure individual services inside the bootc container to run as unprivileged users instead.
HEALTHCHECK (OCI: no equivalent)
This is currently Docker-specific metadata, and did not make it into the OCI standards. (Note podman healthchecks)
It is important to understand, again, that there is no "outer container runtime" when a bootc container is deployed on a host. The system must perform health checking on itself (or have an external system do it).
Relevant links:
- bootc rollback
- CentOS Automotive SIG unattended updates (note that as of right now, greenboot does not yet integrate with bootc)
- https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/
Kernel
When run as a container, the Linux kernel binary in /usr/lib/modules/$kver/vmlinuz is ignored. It is only used when a bootc container is deployed to a physical or virtual machine.
Security properties
When run as a container, the container runtime will by default apply various Linux kernel features such as namespacing to isolate the container processes from other system processes.
None of these isolation properties apply when a bootc system is deployed.
SELinux
For more on the intersection of SELinux and current bootc (OSTree container) images, see bootc images - SELinux.
Users and groups
This is one of the more complex topics. Generally speaking, bootc has nothing to do directly with configuring users or groups; it is a generic OS update/configuration mechanism. (There is currently just one small exception in that bootc install has a special case --root-ssh-authorized-keys argument, but it's very much optional).
Generic base images
Commonly OS/distribution base images will be generic, i.e. without any configuration. It is very strongly recommended to avoid hardcoded passwords and ssh keys with publicly-available private keys (as Vagrant does) in generic images.
Injecting SSH keys via systemd credentials
The systemd project has documentation for credentials which can be used in some environments to inject a root password or SSH authorized_keys. For many cases, this is a best practice.
At the time of this writing this relies on SMBIOS, which is mainly configurable in local virtualization environments such as qemu.
Injecting users and SSH keys via cloud-init, etc.
Many IaaS and virtualization systems are oriented towards a "metadata server" (see e.g. AWS instance metadata) whose contents are commonly processed by software such as cloud-init or Ignition or equivalent.
The base image you're using may include such software, or you can install it in your own derived images.
In this model, SSH configuration is managed outside of the bootable image. See e.g. GCP oslogin for an example of this where operating system identities are linked to the underlying Google accounts.
Adding users and credentials via custom logic (container or unit)
Of course, systems like cloud-init are not privileged; you can inject any logic you want to manage credentials via e.g. a systemd unit (which may launch a container image) that manages things however you prefer. Commonly, this would be a custom network-hosted source, for example FreeIPA.
Another example in a Kubernetes-oriented infrastructure would be a container image that fetches desired authentication credentials from a CRD hosted in the API server. (To do things like this it's suggested to reuse the kubelet credentials)
Adding users and credentials statically in the container build
Relative to package-oriented systems, a new ability is to inject users and credentials as part of a derived build:
RUN useradd someuser
However, it is important to understand two very important issues with this as it exists today (the shadow-utils implementation of useradd) and the default glibc files backend for the traditional /etc/passwd and /etc/shadow files.
It is common for user/group IDs to be allocated dynamically, and this can result in "drift" (see below).
Further, if /etc/passwd is modified locally (because there is a machine-local user), then any added users injected via useradd will not appear on subsequent updates by default (they will be in /usr/etc/passwd instead - the default image version).
The same applies to "system users" created by packaging tools that invoke useradd (e.g. apt|dnf install httpd) but do not also install a sysusers.d file. Currently for example, this is the case with the CentOS Stream 9 httpd package. Per below, the general solution to this is to avoid invoking useradd in container builds, and prefer one of the below solutions.
User and group home directories and /var
For systems configured with persistent /home → /var/home, any changes to /var made in the container image after initial installation will not be applied on subsequent updates. If for example you inject /var/home/someuser/.ssh/authorized_keys into a container build, existing systems will not get the updated authorized keys file.
Using DynamicUser=yes for systemd units
For "system" users it's strongly recommended to use systemd DynamicUser=yes where possible.
This is significantly better than the pattern of allocating users/groups at "package install time" (e.g. Fedora package user/group guidelines) because it avoids potential UID/GID drift (see below).
Using systemd-sysusers
See systemd-sysusers. For example in your derived build:
COPY mycustom-user.conf /usr/lib/sysusers.d
A key aspect of how this works is that sysusers will make changes to the traditional /etc/passwd file as necessary on boot. If /etc is persistent, this can avoid uid/gid drift (but in the general case it does mean that uid/gid allocation can depend on how a specific machine was upgraded over time).
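For illustration, a sysusers.d fragment like the mycustom-user.conf referenced above could be generated as follows. This is a minimal sketch; the user name myapp, its description and its home directory are hypothetical.

```shell
#!/bin/sh
# Write a sysusers.d fragment that could be copied into /usr/lib/sysusers.d.
# The user name "myapp" and its home directory are hypothetical.
out=$(mktemp -d)
cat > "$out/mycustom-user.conf" <<'EOF'
# Type  Name   ID  GECOS                   Home directory
u       myapp  -   "My application user"   /var/lib/myapp
EOF

# Show the entry that systemd-sysusers would parse (comment lines skipped).
grep -v '^#' "$out/mycustom-user.conf"
```

An ID of "-" asks systemd-sysusers to pick a free UID at boot, which is where the machine-dependent allocation mentioned above comes from; a fixed numeric ID can be written in that column instead.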
Using systemd JSON user records
See JSON user records. Unlike sysusers, the canonical state for these lives in /usr - if a subsequent image drops a user record, then it will also vanish from the system - unlike sysusers.d.
nss-altfiles
The nss-altfiles project (long) predates systemd JSON user records. It aims to help split "system" users into /usr/lib/passwd and /usr/lib/group. It's very important to understand that this aligns with the way the OSTree project handles the "3 way merge" for /etc as it relates to /etc/passwd. Currently, if the /etc/passwd file is modified in any way on the local system, then subsequent changes to /etc/passwd in the container image will not be applied.
Some base images may have nss-altfiles enabled by default; this is currently the case for base images built by rpm-ostree.
Commonly, base images will have some "system" users pre-allocated and managed via this file again to avoid uid/gid drift.
In a derived container build, you can also append users to /usr/lib/passwd for example. (At the time of this writing there is no command line to do so though).
Typically it is preferable to use sysusers.d or DynamicUser=yes.
Machine-local state for users
At this point, it is important to understand the filesystem layout - the default is up to the base image.
The default Linux concept of a user has data stored in both /etc (/etc/passwd, /etc/shadow and groups) and /home. The choice for how these work is up to the base image, but a common default for generic base images is to have both be machine-local persistent state. In this model /home would be a symlink to /var/home, so that a user's home directory is e.g. /var/home/someuser.
Injecting users and SSH keys at system provisioning time
For base images where /etc and /var are configured to persist by default, it will then be generally supported to inject users via "installers" such as Anaconda (interactively or via kickstart) or any others.
Typically generic installers such as this are designed for "one time bootstrap" and again then the configuration becomes mutable machine-local state that can be changed "day 2" via some other mechanism.
The simple case is a user with a password - typically the installer helps set the initial password, but to change it there is a different in-system tool (such as passwd or a GUI as part of Cockpit, GNOME/KDE/etc).
It is intended that these flows work equivalently in a bootc-compatible system, to support users directly installing "generic" base images, without requiring changes to the tools above.
Transient home directories
Many operating system deployments will want to minimize persistent, mutable and executable state - and user home directories are exactly that.
It is also valid to default to having e.g. /home be a tmpfs to ensure user data is cleaned up across reboots (and this pairs particularly well with a transient /etc as well).
In order to set up the user's home directory to e.g. inject SSH authorized_keys or other files, a good approach is to use systemd tmpfiles.d snippets:
f~ /home/someuser/.ssh/authorized_keys 600 someuser someuser - <base64 encoded data>
which can be embedded in the image as /usr/lib/tmpfiles.d/someuser-keys.conf.
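A tmpfiles.d line of the form shown above can be generated from an existing authorized_keys file. This is a sketch assuming GNU coreutils base64 (for the -w0 option); the key material is a dummy placeholder, not a real public key.

```shell
#!/bin/sh
# Build a tmpfiles.d line that recreates authorized_keys from base64 data.
# The key below is a dummy placeholder, not a real public key.
work=$(mktemp -d)
printf 'ssh-ed25519 AAAAEXAMPLEKEY someuser@example\n' > "$work/authorized_keys"

# base64 -w0 (GNU coreutils) emits the encoding on a single line.
encoded=$(base64 -w0 < "$work/authorized_keys")
printf 'f~ /home/someuser/.ssh/authorized_keys 600 someuser someuser - %s\n' "$encoded" \
    > "$work/someuser-keys.conf"

cat "$work/someuser-keys.conf"
```

The resulting someuser-keys.conf is the file that would be copied to /usr/lib/tmpfiles.d in the image.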
Or a service embedded in the image can fetch keys from the network and write them; this is the pattern used by cloud-init and afterburn.
UID/GID drift
Ultimately the /etc/passwd and similar files are a mapping between names and numeric identifiers. A problem then arises when this mapping is dynamic and mixed with "stateless" container image builds.
For example today the CentOS Stream 9 postgresql package allocates a static uid of 26.
This means that
RUN dnf -y install postgresql
will always result in a change to /etc/passwd that allocates uid 26 and data in /var/lib/postgres will always be owned by that UID.
However in contrast, the cockpit project allocates a floating cockpit-ws user.
This means that each container image build (without additional work) may (due to RPM installation ordering or other reasons) result in the uid changing.
This can be a problem if that user maintains persistent state.
Such cases are best handled by being converted to use sysusers.d (see Fedora change) - or again even better, using DynamicUser=yes (see above).
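One way to notice drift is to compare the passwd files of two image builds. The sketch below does this with awk; the function name uid_drift is hypothetical and the numeric IDs in the sample data are made up for illustration.

```shell
#!/bin/sh
# Report users whose numeric UID differs between two passwd-format files,
# e.g. extracted from two builds of the same image.
# uid_drift OLD_PASSWD NEW_PASSWD -> prints "name old_uid new_uid" per changed user
uid_drift() {
    awk -F: '
        NR == FNR { old[$1] = $3; next }    # first file: remember name -> uid
        ($1 in old) && old[$1] != $3 { print $1, old[$1], $3 }
    ' "$1" "$2"
}

demo=$(mktemp -d)
cat > "$demo/passwd.build1" <<'EOF'
root:x:0:0:root:/root:/bin/bash
postgres:x:26:26:PostgreSQL Server:/var/lib/postgres:/bin/bash
cockpit-ws:x:990:990:User for cockpit-ws:/nonexisting:/sbin/nologin
EOF
cat > "$demo/passwd.build2" <<'EOF'
root:x:0:0:root:/root:/bin/bash
postgres:x:26:26:PostgreSQL Server:/var/lib/postgres:/bin/bash
cockpit-ws:x:988:988:User for cockpit-ws:/nonexisting:/sbin/nologin
EOF

uid_drift "$demo/passwd.build1" "$demo/passwd.build2"
```

With this sample data only the floating cockpit-ws entry is reported; the statically allocated postgres user stays stable across builds.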
Kernel arguments
The default bootc model uses "type 1" bootloader config files stored in /boot/loader/entries, which define arguments provided to the Linux kernel.
The set of kernel arguments can be machine-specific state, but can also be managed via container updates.
The bootloader entries are currently written by the OSTree backend.
More on Linux kernel arguments: https://docs.kernel.org/admin-guide/kernel-parameters.html
/usr/lib/bootc/kargs.d
Many bootc use cases will use generic "OS/distribution" kernels.
In order to support injecting kernel arguments, bootc supports a small custom config file format in /usr/lib/bootc/kargs.d in TOML format, with the following form:
# /usr/lib/bootc/kargs.d/10-example.toml
kargs = ["mitigations=auto,nosmt"]
There is also support for making these kernel arguments architecture specific via the match-architectures key:
# /usr/lib/bootc/kargs.d/00-console.toml
kargs = ["console=ttyS0,115200n8"]
match-architectures = ["x86_64"]
NOTE: The architecture matching here accepts values defined by the Rust standard library (using the architecture of the bootc binary itself). In some cases for Linux, this matches the value of uname -m, but definitely not all. For example, on Fedora derivatives there is ppc64le, but in Rust only powerpc64. A common discrepancy is that Debian derivatives use amd64, whereas Rust (and Fedora derivatives) use x86_64.
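To help pick a value for match-architectures, a small helper can translate uname -m output into the Rust architecture name. This is a sketch covering only a handful of common machines; the function name rust_arch is hypothetical and unknown values are passed through unchanged.

```shell
#!/bin/sh
# Map `uname -m` output to the architecture names used by the Rust standard
# library (std::env::consts::ARCH). Only a few common machines are covered;
# unknown values are passed through unchanged.
rust_arch() {
    case "$1" in
        x86_64)          echo x86_64 ;;
        aarch64|arm64)   echo aarch64 ;;
        ppc64|ppc64le)   echo powerpc64 ;;
        s390x)           echo s390x ;;
        riscv64)         echo riscv64 ;;
        i?86)            echo x86 ;;
        *)               echo "$1" ;;
    esac
}

echo "match-architectures value for this host: $(rust_arch "$(uname -m)")"
```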
Changing kernel arguments post-install via kargs.d
Changes to kargs.d files included in a container build are honored post-install; the difference between the set of kernel arguments is applied to the current bootloader configuration. This will preserve any machine-local kernel arguments.
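The effect of applying only the difference can be modelled in a few lines of shell. This is a simplified illustration and not bootc's actual code; the function names are hypothetical, and arguments are treated as plain whitespace-separated words.

```shell
#!/bin/sh
# Simplified model of applying a kargs.d change (not bootc's actual code).
# apply_karg_diff "OLD_IMAGE_KARGS" "NEW_IMAGE_KARGS" "CURRENT_CMDLINE"
# Arguments dropped by the new image are removed, arguments it adds are
# appended, and everything else on the current command line is preserved.
contains() {  # contains WORD LIST...
    needle=$1; shift
    for w in "$@"; do [ "$w" = "$needle" ] && return 0; done
    return 1
}

apply_karg_diff() {
    old=$1 new=$2 current=$3 result=""
    for arg in $current; do
        # Drop arguments that came from the old image but are gone in the new one.
        if contains "$arg" $old && ! contains "$arg" $new; then continue; fi
        result="$result $arg"
    done
    for arg in $new; do
        # Append arguments that the new image introduces.
        if ! contains "$arg" $old && ! contains "$arg" $result; then
            result="$result $arg"
        fi
    done
    echo "${result# }"
}

apply_karg_diff "console=ttyS0,115200n8 quiet" \
                "console=ttyS0,115200n8 mitigations=auto,nosmt" \
                "root=UUID=abcd rw console=ttyS0,115200n8 quiet nosplash"
```

In the demo call, quiet is removed because the new image no longer ships it, mitigations=auto,nosmt is appended, and the machine-local nosplash and root= arguments are left alone.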
Kernel arguments injected at installation time
The bootc install flow supports a --karg option to provide install-time kernel arguments. These become machine-local state.
Higher level install tools (ideally at least using bootc install to-filesystem) can inject kernel arguments this way too; for example, the Anaconda installer has a bootloader verb which ultimately uses an API similar to this.
Post-install, it is supported for any tool to edit the /boot/loader/entries files, which are in a standardized format.
Typically, /boot is mounted read-only to limit the set of tools which write to this filesystem.
At the current time, bootc does not itself offer an API to manipulate kernel arguments maintained per-machine.
Other projects such as rpm-ostree do, via e.g. rpm-ostree kargs.
Injecting default arguments into custom kernels
The Linux kernel supports building arguments into the kernel binary, at the time of this writing via the config CMDLINE build option. If you are building a custom kernel, then it often makes sense to use this instead of /usr/lib/bootc/kargs.d for example.
Secrets (e.g. container pull secrets)
To have bootc fetch updates from a registry which requires authentication, you must include a pull secret in /etc/ostree/auth.json (or as of recent versions in /usr/lib/ostree/auth.json).
Another common case is to also fetch container images via podman or equivalent. There is a pull request to add /etc/containers/auth.json which would be shared by the two stacks by default.
Regardless, injecting this data is a good example of a generic "secret". The bootc project does not currently include one single opinionated mechanism for secrets.
Using a credential helper
In order to use a credential helper as configured in registries.conf such as credential-helpers = ["ecr-login"], you must currently also write a "no-op" authentication file with the contents {} (i.e. an empty JSON object, not an empty file) into the pull secret location.
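As a small sketch, the no-op authentication file can be written like this. A scratch root stands in for the image filesystem, and /etc/ostree/auth.json is used as the pull secret location.

```shell
#!/bin/sh
# Write the "no-op" pull secret: an empty JSON object, not an empty file.
# A scratch root stands in for the image filesystem here.
root=$(mktemp -d)
mkdir -p "$root/etc/ostree"
printf '{}\n' > "$root/etc/ostree/auth.json"

cat "$root/etc/ostree/auth.json"
```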
Embedding in container build
This was mentioned above; you can include secrets in the container image if the registry server is suitably protected.
In some cases, embedding only "bootstrap" secrets into the container image is a viable pattern, especially alongside a mechanism for having a machine authenticate to a cluster. In this pattern, a provisioning tool (whether run as part of the host system or a container image) uses the bootstrap secret to lay down and keep updated other secrets (for example, SSH keys, certificates).
Via cloud metadata
Most production IaaS systems support a "metadata server" or equivalent which can securely host secrets - particularly "bootstrap secrets". Your container image can include tooling such as cloud-init or ignition which fetches these secrets.
Embedded in disk images
Another pattern is to embed bootstrap secrets only in disk images. For example, when generating a cloud disk image (AMI, OpenStack glance image, etc.) from an input container image, the disk image can contain secrets that are effectively machine-local state. Rotating them would require an additional management tool, or refreshing disk images.
Injected via baremetal installers
It is common for installer tools to support injecting configuration which can commonly cover secrets like this.
Injecting secrets via systemd credentials
The systemd project has documentation for credentials which applies in some deployment methodologies.
Management services
When running a fleet of systems, it is common to use a central management service. Commonly, these services provide a client to be installed on each system which connects to the central service. Often, the management service requires the client to perform a one time registration.
The following example shows how to install the client into a bootc image and run it at startup to register the system. This example assumes the management-client handles future connections to the server, e.g. via a cron job or a separate systemd service. This example could be modified to create a persistent systemd service if that is required. The Containerfile is not optimized, in order to explain each step more clearly; for example, it's generally better to invoke RUN a single time to avoid creating multiple layers in the image.
FROM <bootc base image>
# Typically when using a management service, it will determine when to upgrade the system.
# So, disable bootc-fetch-apply-updates.timer if it is included in the base image.
RUN systemctl disable bootc-fetch-apply-updates.timer
# Install the client from dnf, or some other method that applies for your client
RUN dnf install management-client -y && dnf clean all
# Bake the credentials for the management service into the image
ARG activation_key=
# The existence of .run_next_boot acts as a flag to determine if the
# registration is required to run when booting
RUN touch /etc/management-client/.run_next_boot
COPY <<"EOT" /usr/lib/systemd/system/management-client.service
[Unit]
Description=Run management client at boot
After=network-online.target
ConditionPathExists=/etc/management-client/.run_next_boot
[Service]
Type=oneshot
EnvironmentFile=/etc/management-client/.credentials
ExecStart=/usr/bin/management-client register --activation-key ${CLIENT_ACTIVATION_KEY}
ExecStartPre=/bin/rm -f /etc/management-client/.run_next_boot
ExecStop=/bin/rm -f /etc/management-client/.credentials
[Install]
WantedBy=multi-user.target
EOT
# Link the service to run at startup
RUN ln -s /usr/lib/systemd/system/management-client.service /usr/lib/systemd/system/multi-user.target.wants/management-client.service
# Store the credentials in a file to be used by the systemd service
RUN echo -e "CLIENT_ACTIVATION_KEY=${activation_key}" > /etc/management-client/.credentials
# Set the flag to enable the service to run one time
# The systemd service will remove this file after the registration completes the first time
RUN touch /etc/management-client/.run_next_boot
Managing upgrades
Right now, bootc is a quite simple tool that is designed to do just a few things well. One of those is transactionally fetching new operating system updates from a registry and booting into them, while supporting rollback.
The bootc upgrade verb
This will query the registry and queue an updated container image for the next boot.
This is backed today by ostree, implementing an A/B style upgrade system. Changes to the base image are staged, and the running system is not changed by default.
Use bootc upgrade --apply to auto-apply if there are queued changes.
There is also an opinionated bootc-fetch-apply-updates.timer and corresponding service available in upstream for operating systems and distributions to enable.
Man page: bootc-upgrade.
Changing the container image source
Another useful pattern to implement can be to use a management agent to invoke bootc switch (or declaratively via bootc edit) to implement e.g. blue/green deployments, where some hosts are rolled onto a new image independently of others.
bootc switch quay.io/examplecorp/os-prod-blue:latest
bootc switch has the same effect as bootc upgrade; there is no semantic difference between the two other than changing the container image being tracked.
This will preserve existing state in /etc and /var - for example, host SSH keys and home directories.
Man page: bootc-switch.
Rollback
There is a bootc rollback verb, and associated declarative interface accessible to tools via bootc edit. This will swap the bootloader ordering to the previous boot entry.
Man page: bootc-rollback.
Accessing registries and disconnected updates
The bootc project uses the containers/image library to fetch container images (the same used by podman) which means it honors almost all the same configuration options in /etc/containers.
Insecure registries
Container clients such as podman pull and docker pull have a --tls-verify=false flag which disables TLS verification when accessing the registry. bootc has no such option. Instead, you can globally configure the option to disable TLS verification when accessing a specific registry via the /etc/containers/registries.conf.d configuration mechanism, for example:
# /etc/containers/registries.conf.d/local-registry.conf
[[registry]]
location="localhost:5000"
insecure=true
For more, see containers-registries.conf.
Disconnected and offline updates
It is common (a best practice even) to maintain systems which default to being disconnected from the public Internet.
Pulling updates from a local mirror
Everything in the section remapping and mirroring images applies to bootc as well.
Performing offline updates via USB
In a usage scenario where the operating system update is in a fully disconnected environment and you want to perform updates via e.g. inserting a USB drive, one can do this by copying the desired OS container image to e.g. an oci directory:
skopeo copy docker://quay.io/exampleos/myos:latest oci:/path/to/filesystem/myos.oci
Then once the USB device containing the myos.oci OCI directory is mounted on the target, use
bootc switch --transport oci /var/mnt/usb/myos.oci
The above command is only necessary once, and thereafter will be idempotent.
Then, use bootc upgrade --apply to fetch and apply the update from the USB device.
This process can all be automated by creating systemd units that look for a USB device with a specific label, mount (optionally with LUKS for example), and then trigger the bootc upgrade.
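A script launched by such a systemd unit might look roughly like the following. This is a dry-run sketch: the function name usb_update is hypothetical, the commands are only printed rather than executed, and the check relies on the oci-layout marker file that an OCI image layout directory contains.

```shell
#!/bin/sh
# Dry-run sketch of an offline update from a mounted USB device.
# usb_update MOUNTPOINT
# Prints the bootc commands that would be run if an OCI directory named
# myos.oci is found under MOUNTPOINT.
usb_update() {
    mountpoint=$1
    image_dir="$mountpoint/myos.oci"
    # An OCI image layout directory contains an "oci-layout" marker file.
    if [ ! -f "$image_dir/oci-layout" ]; then
        echo "no OCI image found under $mountpoint" >&2
        return 1
    fi
    # Printed rather than executed; drop the "echo" to run them for real.
    echo bootc switch --transport oci "$image_dir"
    echo bootc upgrade --apply
}

# Demo with a fake mount point.
demo=$(mktemp -d)
mkdir -p "$demo/myos.oci"
printf '{"imageLayoutVersion": "1.0.0"}\n' > "$demo/myos.oci/oci-layout"
usb_update "$demo"
```

In a real deployment the mount point would be wherever the unit mounts the labeled device, for example /var/mnt/usb as used above.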
Logically Bound Images
About logically bound images
This feature enables an association of container "app" images to a base bootc system image. Use cases for this include:
- Logging (e.g. journald->remote log forwarder container)
- Monitoring (e.g. Prometheus node_exporter)
- Configuration management agents
- Security agents
These types of things are commonly not updated outside of the host, and there's a secondary important property: we always want them present and available on the host, possibly from very early on in the boot. In contrast, with the default usage of tools like `podman` or `docker`, images may be pulled dynamically after the boot starts, which requires functioning networking, etc. For example, if the remote registry is temporarily unavailable, the host system may run for a longer period of time without log forwarding or monitoring, which can be very undesirable.
Another simple way to say this is that logically bound images allow you to reference container images with the same confidence you can with ExecStart=
in a systemd unit.
The term "logically bound" was created to contrast with physically bound images. There are some trade-offs between the two approaches. Some benefits of logically bound images are:
- The bootc system image can be updated without re-downloading the app image bits.
- The app images can be updated without modifying the bootc system image; this is especially useful for development work.
Using logically bound images
Each image is defined in a Podman Quadlet .image
or .container
file. An image is selected to be bound by creating a symlink in the /usr/lib/bootc/bound-images.d
directory pointing to a .image
or .container
file.
With these defined, during a bootc upgrade
or bootc switch
the bound images defined in the new bootc image will be automatically pulled into the bootc image storage, and are available to container runtimes such as podman by explicitly configuring them to point to the bootc storage as an "additional image store", via e.g.:
podman --storage-opt=additionalimagestore=/usr/lib/bootc/storage run <image> ...
An example Containerfile
FROM quay.io/myorg/myimage:latest
COPY ./my-app.image /usr/share/containers/systemd/my-app.image
COPY ./another-app.container /usr/share/containers/systemd/another-app.container
RUN ln -s /usr/share/containers/systemd/my-app.image /usr/lib/bootc/bound-images.d/my-app.image && \
ln -s /usr/share/containers/systemd/another-app.container /usr/lib/bootc/bound-images.d/another-app.container
In the .container
definition, you should use:
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
Pull secret
Images are fetched using the global bootc pull secret by default (/etc/ostree/auth.json
). It is not yet supported to configure PullSecret
in these image definitions.
Garbage collection
The bootc image store is owned by bootc; images will be garbage collected when they are no longer referenced
by a file in /usr/lib/bootc/bound-images.d
.
Installation
Logically bound images must be present in the default container store (/var/lib/containers
) when invoking
bootc install; the images will be copied into the target system and present
directly at boot, alongside the bootc base image.
Limitations
The only field parsed and honored by bootc currently is the Image
field of a .image
or .container
file.
Other pull-relevant flags such as PullSecret=
for example are not supported (see above).
Another example unsupported flag is Arch
(the default host architecture is always used).
There is no mechanism to inject arbitrary arguments to the podman pull
(or equivalent)
invocation used by bootc. However, many properties used for container registry interaction
can be configured via containers-registries.conf
and apply to all commands operating on that image.
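To illustrate the first limitation, here is a rough sketch of reading only the `Image` key from a Quadlet file, which is the one field bootc is described as honoring. It is not bootc's actual parser. It assumes the key lives in an `[Image]` section for `.image` files and in a `[Container]` section for `.container` files, following the Quadlet format; the function name and image references are placeholders.

```python
import configparser
from typing import Optional


def bound_image_reference(quadlet_text: str) -> Optional[str]:
    """Return the value of Image= from a Quadlet .image or .container file."""
    parser = configparser.ConfigParser(
        interpolation=None,  # unit files may contain literal '%' specifiers
        strict=False,        # unit files may repeat keys
        delimiters=("=",),
    )
    parser.optionxform = str  # keep the key case as written ("Image")
    parser.read_string(quadlet_text)
    for section in ("Image", "Container"):
        if parser.has_option(section, "Image"):
            return parser.get(section, "Image")
    return None


EXAMPLE_CONTAINER = """\
[Container]
Image=quay.io/myorg/another-app:latest
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
"""

print(bound_image_reference(EXAMPLE_CONTAINER))
```

Every other key in the file, such as `PullSecret=` or `Arch=`, is ignored by this sketch, mirroring the limitation described above.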
It is not currently supported to launch "rootless" containers from system-owned image stores, whether from `/var/lib/containers` or `/usr/lib/bootc/storage`.
There is no integration between bootc and "rootless" storage today, and none is planned.
Instead, it's recommended to ensure that your "system" or "rootful" containers drop
privileges. More in e.g. https://github.com/containers/podman/discussions/13728.
Distro/OS installer support
At the current time, logically bound images are not supported by Anaconda.
Comparison with default podman systemd units
In the comparison below, the term "floating" will be used for non-logically bound images. These images are often fetched by e.g. podman-systemd and may be upgraded, added or removed independently of the host upgrade lifecycle.
Lifecycle
- Floating image: The images are downloaded by the machine the first time it starts (typically requiring networking). Tools such as `podman auto-update` can be used to upgrade them independently of the host.
- Logically bound image: The images are referenced by the bootable container and are ensured to be available when the (bootc based) server starts. The image is always upgraded via `bootc upgrade` and appears read-only to other processes (e.g. `podman`).
Upgrades, rollbacks and garbage collection
- Floating image: Managed by the user (`podman auto-update`, `podman image prune`). This can be triggered at any time, independent of the host upgrades or rollbacks, and host upgrades/rollbacks do not affect the set of images.
- Logically bound image: Managed exclusively by `bootc` during upgrades. The logically bound images corresponding to rollback deployments will also be retained. `bootc` performs garbage collection of unused images.
"rootless" container image
- Floating image: Supported.
- Logically bound image: Not supported (`bootc` cannot be invoked as non-root). Instead, it's recommended to just drop most privileges for launched logically bound containers.
Booting local builds
In some scenarios, you may want to boot a locally built container image, in order to apply a persistent hotfix to a specific server, or as part of a development/testing scenario.
Building a new local image
At the current time, the bootc host container storage is distinct
from that of the podman
container runtime storage (default
configuration in /var/lib/containers
).
It is not currently streamlined to export the booted host container storage into the podman storage.
Hence today, to replicate the exact container image the
host has booted, take the container image referenced
in bootc status
and turn it into a podman pull
invocation.
Next, craft a container build file with your desired changes:
FROM <image>
RUN apt|dnf upgrade https://example.com/systemd-hotfix.package
Copying an updated image into the bootc storage
This command is straightforward; we just need to tell bootc
to fetch updates from containers-storage
, which is the
local "application" container runtime (podman) storage:
$ bootc switch --transport containers-storage quay.io/fedora/fedora-bootc:40
From there, the new image will be queued for the next boot
and a reboot
will apply it.
For more on valid transports, see containers-transports.
NAME
bootc - Deploy and transactionally update in place with bootable container images
SYNOPSIS
bootc [-h|--help] [-V|--version] <subcommands>
DESCRIPTION
Deploy and transactionally update in place with bootable container images.
The `bootc` project currently uses ostree-containers as a backend to support a model of bootable container images. Once installed, whether directly via `bootc install` (executed as part of a container) or via another mechanism such as an OS installer tool, further updates can be pulled via `bootc upgrade`.
OPTIONS
-h, --help
: Print help (see a summary with -h)
-V, --version
: Print version
SUBCOMMANDS
bootc-upgrade(8)
: Download and queue an updated container image to apply
bootc-switch(8)
: Target a new container image reference to boot
bootc-rollback(8)
: Change the bootloader entry ordering; the deployment under `rollback` will be queued for the next boot, and the current will become rollback. If there is a `staged` entry (an unapplied, queued upgrade) then it will be discarded
bootc-edit(8)
: Apply full changes to the host specification
bootc-status(8)
: Display status
bootc-usr-overlay(8)
: Adds a transient writable overlayfs on `/usr` that will be discarded on reboot
bootc-install(8)
: Install the running container to a target
bootc-container(8)
: Operations which can be executed as part of a container build
bootc-help(8)
: Print this message or the help of the given subcommand(s)
VERSION
v1.1.4
NAME
bootc-status - Display status
SYNOPSIS
bootc status [--format] [--format-version] [--booted] [-h|--help]
DESCRIPTION
Display status
If standard output is a terminal, this will output a description of the bootc system state. If standard output is not a terminal, output a YAML-formatted object using a schema intended to match a Kubernetes resource that describes the state of the booted system.
## Parsing output via programs
Either the default YAML format or `--format=json` can be used. Do not attempt to explicitly parse the output of `--format=humanreadable` as it will very likely change over time.
## Programmatically detecting whether the system is deployed via bootc
Invoke e.g. `bootc status --json`, and check if `status.booted` is not `null`.
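A minimal sketch of that check, assuming the JSON text has already been captured from `bootc status`. The function name is hypothetical and the sample documents are heavily trimmed stand-ins for real output.

```python
import json


def is_bootc_system(status_json: str) -> bool:
    """Return True when the status document reports a booted bootc deployment."""
    document = json.loads(status_json)
    # `status.booted` is null when the host was not deployed via bootc.
    status = document.get("status") or {}
    return status.get("booted") is not None


# Trimmed, hypothetical documents; real output carries many more fields.
print(is_bootc_system('{"status": {"booted": null}}'))
print(is_bootc_system('{"status": {"booted": {"image": null}}}'))
```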
OPTIONS
--format=FORMAT
: The output format\
\
*Possible values:*
- humanreadable: Output in Human Readable format
- yaml: Output in YAML format
- json: Output in JSON format
--format-version=FORMAT_VERSION
: The desired format version. There is currently one supported version, which is exposed as both `0` and `1`. Pass this option to explicitly request it; a version 2 or newer may be supported in the future
--booted
: Only display status for the booted deployment
-h, --help
: Print help (see a summary with -h)
VERSION
v1.1.4
NAME
bootc-upgrade - Download and queue an updated container image to apply
SYNOPSIS
bootc upgrade [--quiet] [--check] [--apply] [-h|--help]
DESCRIPTION
Download and queue an updated container image to apply.
This does not affect the running system; updates operate in an "A/B" style by default.
A queued update is visible as `staged` in `bootc status`.
Currently by default, the update will be applied at shutdown time via `ostree-finalize-staged.service`. There is also an explicit `bootc upgrade --apply` verb which will automatically take action (rebooting) if the system has changed.
However, in the future this is likely to change such that reboots outside of a `bootc upgrade --apply` do *not* automatically apply the update.
OPTIONS
--quiet
: Don't display progress
--check
: Check if an update is available without applying it.
This only downloads an updated manifest and image configuration
(i.e. typically kilobyte-sized metadata) as opposed to the image
layers.
--apply
: Restart or reboot into the new target image.
Currently, this option always reboots. In the future this command
will detect the case where no kernel changes are queued, and perform
a userspace-only restart.
-h, --help
: Print help (see a summary with -h)
VERSION
v1.1.4
NAME
bootc-switch - Target a new container image reference to boot
SYNOPSIS
bootc switch [--quiet] [--apply] [--transport] [--enforce-container-sigpolicy] [--ostree-remote] [--retain] [-h|--help] <TARGET>
DESCRIPTION
Target a new container image reference to boot.
This is almost exactly the same operation as `upgrade`, but additionally changes the container image reference.
## Usage
A common pattern is to have a management agent control operating system updates via container image tags; for example, `quay.io/exampleos/someuser:v1.0` and `quay.io/exampleos/someuser:v1.1` where some machines are tracking `:v1.0`, and as a rollout progresses, machines can be switched to `:v1.1`.
OPTIONS
--quiet
: Don't display progress
--apply
: Restart or reboot into the new target image.
Currently, this option always reboots. In the future this command
will detect the case where no kernel changes are queued, and perform
a userspace-only restart.
--transport=TRANSPORT [default: registry]
: The transport; e.g. oci, oci-archive, containers-storage. Defaults to `registry`
--enforce-container-sigpolicy
: This is the inverse of the previous `--target-no-signature-verification` (which is now a no-op).
Enabling this option enforces that `/etc/containers/policy.json`
includes a default policy which requires signatures.
--ostree-remote=OSTREE_REMOTE
: Enable verification via an ostree remote
--retain
: Retain reference to currently booted image
-h, --help
: Print help (see a summary with -h)
<TARGET>
: Target image to use for the next boot
VERSION
v1.1.4
NAME
bootc-rollback - Change the bootloader entry ordering; the deployment under `rollback` will be queued for the next boot, and the current will become rollback. If there is a `staged` entry (an unapplied, queued upgrade) then it will be discarded
SYNOPSIS
bootc rollback [-h|--help]
DESCRIPTION
Change the bootloader entry ordering; the deployment under `rollback` will be queued for the next boot, and the current will become rollback. If there is a `staged` entry (an unapplied, queued upgrade) then it will be discarded.
Note that absent any additional control logic, if there is an active agent doing automated upgrades (such as the default `bootc-fetch-apply-updates.timer` and associated `.service`) the change here may be reverted. It's recommended to only use this in concert with an agent that is in active control.
A systemd journal message will be logged with `MESSAGE_ID=26f3b1eb24464d12aa5e7b544a6b5468` in order to detect a rollback invocation.
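An agent could look for that message ID in the journal. The sketch below is one possible approach, not something bootc ships: it builds a `journalctl` query using a field match with JSON output, and parses the resulting lines. It assumes the usual journal JSON export, where each line is one object and `__REALTIME_TIMESTAMP` is a string of microseconds. The function names are hypothetical.

```python
import json
from typing import Iterable, List

ROLLBACK_MESSAGE_ID = "26f3b1eb24464d12aa5e7b544a6b5468"


def journalctl_rollback_argv() -> List[str]:
    """Build a journalctl query that matches only bootc rollback messages."""
    return ["journalctl", "--output=json", f"MESSAGE_ID={ROLLBACK_MESSAGE_ID}"]


def rollback_timestamps(lines: Iterable[str]) -> List[int]:
    """Return the realtime timestamps (microseconds) of rollback entries."""
    stamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Re-check the ID so the parser also works on unfiltered journal output.
        if entry.get("MESSAGE_ID") == ROLLBACK_MESSAGE_ID:
            stamps.append(int(entry["__REALTIME_TIMESTAMP"]))
    return stamps
```

Feeding the standard output of the query into `rollback_timestamps` yields one timestamp per detected rollback invocation.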
OPTIONS
-h, --help
: Print help (see a summary with -h)
VERSION
v1.1.4
NAME
bootc-usr-overlay - Adds a transient writable overlayfs on `/usr` that will be discarded on reboot
SYNOPSIS
bootc usr-overlay [-h|--help]
DESCRIPTION
Adds a transient writable overlayfs on `/usr` that will be discarded on reboot.
## Use cases
A common pattern is wanting to use tracing/debugging tools, such as `strace` that may not be in the base image. A system package manager such as `apt` or `dnf` can apply changes into this transient overlay that will be discarded on reboot.
## /etc and /var
However, this command has no effect on `/etc` and `/var` - changes written there will persist. It is common for package installations to modify these directories.
## Unmounting
Almost always, a system process will hold a reference to the open mount point. You can however invoke `umount -l /usr` to perform a "lazy unmount".
OPTIONS
-h, --help
: Print help (see a summary with -h)
VERSION
v1.1.4
% bootc-fetch-apply-updates(5)
NAME
bootc-fetch-apply-updates.service
DESCRIPTION
This service causes bootc
to perform the following steps:
- Check the source registry for an updated container image
- If one is found, download it
- Reboot
This service also comes with a companion bootc-fetch-apply-updates.timer
systemd unit. The current default systemd timer shipped in the upstream
project is enabled for daily updates.
However, it is fully expected that different operating systems and distributions choose different defaults.
CUSTOMIZING UPDATES
Note that all three of these steps can be decoupled; they are:
bootc upgrade --check
bootc upgrade
bootc upgrade --apply
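A custom agent might drive the three steps separately, for example to download during the day but reboot only in a maintenance window. The following is a sketch under that assumption; the `run_update` function and its `run` callback are hypothetical, and only the three command lines come from this page. How `--check` reports availability is not described here, so the sketch does not branch on it.

```python
from typing import Callable, List, Sequence

CHECK = ["bootc", "upgrade", "--check"]
DOWNLOAD = ["bootc", "upgrade"]
APPLY = ["bootc", "upgrade", "--apply"]


def run_update(run: Callable[[Sequence[str]], int], allow_reboot: bool) -> List[List[str]]:
    """Run the decoupled update steps in order and return the ones attempted.

    `run` executes one command and returns its exit status; the sequence
    stops at the first non-zero status.
    """
    steps = [CHECK, DOWNLOAD]
    if allow_reboot:
        steps.append(APPLY)
    attempted = []
    for argv in steps:
        attempted.append(list(argv))
        if run(argv) != 0:
            break
    return attempted
```

In production the `run` callback could wrap `subprocess.call`; in tests it can be a stub that records its arguments.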
SEE ALSO
bootc(1)
% bootc-status-updated.path(8)
NAME
bootc-status-updated.path
DESCRIPTION
This unit watches the bootc
root directory (/ostree/bootc) for
modification, and triggers the companion bootc-status-updated.target
systemd unit.
The bootc
program updates the mtime on its root directory when the
contents of bootc status
changes as a result of an
update/upgrade/edit/switch/rollback operation.
SEE ALSO
bootc(1), bootc-status-updated.target(8)
% bootc-status-updated.target(8)
NAME
bootc-status-updated.target
DESCRIPTION
This unit is triggered by the companion bootc-status-updated.path
systemd unit. This target is intended to enable users to add custom
services to trigger as a result of bootc status
changing.
Add the following to your unit configuration to activate it when `bootc status` changes:
[Install]
WantedBy=bootc-status-updated.target
SEE ALSO
bootc(1), bootc-status-updated.path(8)
Using bootc via API
At the current time, bootc is primarily intended to be driven via a fork/exec model. The core CLI verbs are stable and will not change.
Using bootc edit
and bootc status --json
While bootc does not depend on Kubernetes, it does currently also offer a Kubernetes style API, especially oriented towards the spec and status and other conventions.
In general, most use cases of driving bootc via API are probably
most easily done by forking off bootc upgrade
when desired,
and viewing bootc status --json --format-version=1
.
JSON Schema
The current API org.containers.bootc/v1
is stable.
In order to support the future introduction of a v2
or newer format, please change your code now to explicitly
request --format-version=1
as referenced above. (Available
since bootc 0.1.15, --format-version=0
in bootc 0.1.14).
There is a JSON schema generated from the Rust source code available here: host-v1.schema.json.
A common way to use this is to run a code generator such as go-jsonschema on the input schema.
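For callers that do not use a code generator, the fork/exec model can be sketched as below. This is an illustration rather than an official client: the function names are hypothetical, and the top-level `apiVersion` key is an assumption based on the Kubernetes-style conventions mentioned above. The command line uses the `--format` and `--format-version` options documented for `bootc status`.

```python
import json
import subprocess
from typing import Callable, Sequence

STATUS_ARGV = ["bootc", "status", "--format=json", "--format-version=1"]
EXPECTED_API = "org.containers.bootc/v1"


def _run(argv: Sequence[str]) -> str:
    """Fork/exec the command and return its standard output."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def load_status(run: Callable[[Sequence[str]], str] = _run) -> dict:
    """Fetch the host status and refuse documents from an unexpected API version."""
    document = json.loads(run(STATUS_ARGV))
    api = document.get("apiVersion")  # assumed key name, Kubernetes style
    if api != EXPECTED_API:
        raise ValueError(f"unexpected API version: {api!r}")
    return document
```

Passing a custom `run` callable makes the function testable on machines where bootc is not installed.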
Installing "bootc compatible" images
A key goal of the bootc project is to think of bootable operating systems as container images. Docker/OCI container images are just tarballs wrapped with some JSON. But in order to boot a system (whether on bare metal or virtualized), one needs a few key components:
- bootloader
- kernel (and optionally initramfs)
- root filesystem (xfs/ext4/btrfs etc.)
The bootloader state is managed by the external bootupd
project which abstracts over bootloader installs and upgrades. The invocation of
bootc install
will always run bootupd
to handle bootloader installation
to the target disk. The default expectation is that bootloader contents and install logic
come from the container image in a bootc
based system.
The Linux kernel (and optionally initramfs) is embedded in the container image; the canonical location
is /usr/lib/modules/$kver/vmlinuz
, and the initramfs should be in initramfs.img
in that directory.
The bootc install
command bridges the two worlds of a standard, runnable OCI image
and a bootable system by running tooling logic embedded
in the container image to create the filesystem and bootloader setup dynamically.
This requires running the container via --privileged
; it uses the running Linux kernel
on the host to write the file content from the running container image; not the kernel
inside the container.
There are two sub-commands: `bootc install to-disk` and `bootc install to-filesystem`.
However, nothing else (external) is required to perform a basic installation to disk - the container image itself comes with a baseline self-sufficient installer that sets things up ready to boot.
Internal vs external installers
The bootc install to-disk
process only sets up a very simple
filesystem layout, using the default filesystem type defined in the container image,
plus hardcoded requisite platform-specific partitions such as the ESP.
In general, the to-disk
flow should be considered mainly a "demo" for
the bootc install to-filesystem
flow, which can be used by "external" installers
today. For example, in the Fedora/CentOS bootc project,
there are two "external" installers: Anaconda and `bootc-image-builder`.
Executing bootc install
The two installation commands allow you to install the container image
either directly to a block device (bootc install to-disk
) or to an existing
filesystem (bootc install to-filesystem
).
The installation commands MUST be run from the container image
that will be installed, using --privileged
and a few
other options. This means you are (currently) not able to install bootc
to an existing system and install your container image. Failure to run
bootc
from a container image will result in an error.
Here's an example of using bootc install
(root/elevated permission required):
podman run --rm --privileged --pid=host -v /var/lib/containers:/var/lib/containers -v /dev:/dev --security-opt label=type:unconfined_t <image> bootc install to-disk /path/to/disk
Note that while `--privileged` is used, this command will not perform any
destructive action on the host system. Among other things, `--privileged`
makes sure that all host devices are mounted into the container. `/path/to/disk` is
the host's block device onto which `<image>` will be installed.
The `--pid=host --security-opt label=type:unconfined_t` options today
make it more convenient for bootc to perform some privileged
operations; in the future these requirements may be dropped.
The -v /var/lib/containers:/var/lib/containers
option is required in order
for the container to access its own underlying image, which is used by
the installation process.
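For tooling that needs to assemble this invocation, the sketch below rebuilds the example command and annotates each option with the reason given above. The function name is hypothetical, and the image and disk values in the usage line are placeholders, not recommendations.

```python
import shlex
from typing import List


def install_to_disk_argv(image: str, disk: str) -> List[str]:
    """Assemble the podman invocation for `bootc install to-disk`."""
    return [
        "podman", "run", "--rm",
        # Makes all host devices visible inside the container.
        "--privileged",
        # Currently convenient for some privileged operations; may be dropped later.
        "--pid=host",
        # Lets the container access its own underlying image.
        "-v", "/var/lib/containers:/var/lib/containers",
        "-v", "/dev:/dev",
        "--security-opt", "label=type:unconfined_t",
        image,
        "bootc", "install", "to-disk", disk,
    ]


# Placeholder image and block device; substitute your own before running anything.
print(shlex.join(install_to_disk_argv("quay.io/examplecorp/exampleos:latest", "/dev/vda")))
```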
Jump to the section for install to-filesystem
later
in this document for additional information about that method.
"day 2" updates, security and fetch configuration
By default the bootc install
path will find the pull specification used
for the podman run
invocation and use it to set up "day 2" OS updates that bootc upgrade
will use.
For example, if you invoke podman run --privileged ... quay.io/examplecorp/exampleos:latest bootc install ...
then the installed operating system will fetch updates from quay.io/examplecorp/exampleos:latest
.
This can be overridden via --target-imgref
; this is handy in cases like performing
installation in a manufacturing environment from a mirrored registry.
By default, the installation process will verify that the container (representing the target OS) can fetch its own updates.
Additionally note that to perform an install with a target image reference set to an
authenticated registry, you must provide a pull secret. One path is to embed the pull secret into
the image in /etc/ostree/auth.json
.
Alternatively, the secret can be added after an installation process completes and managed separately;
in that case you will need to specify --skip-fetch-check
.
Configuring the default root filesystem type
To use the to-disk
installation flow, the container should include a root filesystem
type. If it does not, then each user will need to specify install to-disk --filesystem
.
To set a default filesystem type for bootc install to-disk
as part of your OS/distribution base image,
create a file named /usr/lib/bootc/install/00-<osname>.toml
with the contents of the form:
[install.filesystem.root]
type = "xfs"
Configuration files found in this directory will be merged, with higher alphanumeric values
taking precedence. If for example you are building a derived container image from the above OS,
you could create a 50-myos.toml
that sets type = "btrfs"
which will override the
prior setting.
For other available options, see bootc-install-config.
Installing an "unconfigured" image
The bootc project aims to support generic/general-purpose operating systems and distributions that will ship unconfigured images. An unconfigured image does not have a default password or SSH key, etc.
For more information, see Image building and configuration guidance.
More advanced installation with to-filesystem
The basic bootc install to-disk
logic is really a pretty small (but opinionated) wrapper
for a set of lower level tools that can also be invoked independently.
The bootc install to-disk
command is effectively:
mkfs.$fs /dev/disk
mount /dev/disk /mnt
bootc install to-filesystem --karg=root=UUID=<uuid of /mnt> --imgref $self /mnt
There may be a bit more involved here; for example configuring
--block-setup tpm2-luks
will configure the root filesystem
with LUKS bound to the TPM2 chip, currently via systemd-cryptenroll.
Some OS/distributions may not want to enable it at all; it can be configured off at build time via Cargo features.
Using bootc install to-filesystem
The usual expected way for an external storage system to work
is to provide root=<UUID>
and rootflags
kernel arguments
to describe to the initial RAM disk how to find and mount the
root partition. For more on this, see the below section
discussing mounting the root filesystem.
Note that if a separate /boot
is needed (e.g. for LUKS) you will also need to provide --boot-mount-spec UUID=...
.
The bootc install to-filesystem
command allows an operating
system or distribution to ship a separate installer that creates more complex block
storage or filesystem setups, but reuses the "top half" of the logic.
For example, a goal is to change Anaconda
to use this.
Using bootc install to-disk --via-loopback
Because every bootc
system comes with an opinionated default installation
process, you can create a raw disk image (that can e.g. be booted via virtualization)
via e.g.:
truncate -s 10G myimage.raw
podman run --rm --privileged --pid=host --security-opt label=type:unconfined_t -v /dev:/dev -v /var/lib/containers:/var/lib/containers -v .:/output <yourimage> bootc install to-disk --generic-image --via-loopback /output/myimage.raw
Notice that we use --generic-image
for this use case.
Set the environment variable BOOTC_DIRECT_IO=on
to create the loopback device with direct-io enabled.
Using bootc install to-existing-root
This is a variant of install to-filesystem
, which maximizes convenience for using
an existing Linux system, converting it into the target container image. Note that
the /boot
(and /boot/efi
) partitions will be reinitialized - so this is a
somewhat destructive operation for the existing Linux installation.
Also, because the filesystem is reused, it's required that the target system kernel support the root storage setup already initialized.
The core command should look like this (root/elevated permission required):
podman run --rm --privileged -v /dev:/dev -v /var/lib/containers:/var/lib/containers -v /:/target \
--pid=host --security-opt label=type:unconfined_t \
<image> \
bootc install to-existing-root
It is assumed in this command that the target rootfs is passed via -v /:/target
at this time.
As noted above, the data in /boot
will be wiped, but everything else in the existing
operating system's /
is NOT automatically cleaned up. This can
be useful, because it allows the new image to automatically import data from the previous
host system! For example, container images, database, user home directory data, config
files in /etc
are all available after the subsequent reboot in /sysroot
(which
is the "physical root").
A special case of this trick is using the --root-ssh-authorized-keys
flag to inherit
root's SSH keys (which may have been injected from e.g. cloud instance userdata
via a tool like cloud-init
). To do this, just add
--root-ssh-authorized-keys /target/root/.ssh/authorized_keys
to the above.
Using bootc install to-filesystem --source-imgref <imgref>
By default, bootc install
has to be run inside a podman container. With this assumption,
it can escape the container, find the source container image (including its layers) in
podman's container storage and use it to create the image.
When --source-imgref <imgref>
is given, bootc
no longer assumes that it runs inside podman.
Instead, the given container image reference (see containers-transports(5)
for accepted formats) is used to fetch the image. Note that bootc install
still has to be
run inside a chroot created from the container image. However, this allows users to use
a different sandboxing tool (e.g. bubblewrap).
This argument is mainly useful for 3rd-party tooling for building disk images from bootable containers (e.g. based on osbuild).
Finding and configuring the physical root filesystem
On a bootc system, the "physical root" is different from the "logical root" of the booted container. For more on that, see filesystem. This section is about how the physical root filesystem is discovered.
Systems using systemd will often default to using systemd-fstab-generator and/or systemd-gpt-auto-generator. Support for the latter though for the root filesystem is conditional on EFI and a bootloader implementing the bootloader interface.
Outside of the discoverable partition model, a common baseline default for installers is to set root=UUID=
(and optionally rootflags=
) kernel arguments as machine specific state.
When using install to-filesystem
, you should provide these as explicit
kernel arguments.
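An external installer might turn the discovered filesystem UUID into `--karg` options along these lines. This is a sketch: the function name is hypothetical, the UUID and mount options are placeholders, and the `--karg=` spelling follows the `to-filesystem` example earlier in this document.

```python
from typing import List, Optional


def root_karg_options(root_uuid: str, rootflags: Optional[str] = None) -> List[str]:
    """Build --karg options describing how the initrd finds the root filesystem."""
    options = [f"--karg=root=UUID={root_uuid}"]
    if rootflags:
        # Mount options belong here rather than in an /etc/fstab entry for "/".
        options.append(f"--karg=rootflags={rootflags}")
    return options


# Placeholder UUID and mount options for illustration only.
print(root_karg_options("0f3c1c5e-0000-4000-8000-000000000000", "noatime"))
```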
Some installation tools may want to generate an /etc/fstab
. An important
consideration is that when composefs is on by default (as it is expected
to be) it will no longer work to have an entry for /
in /etc/fstab
(or a systemd .mount
unit) that handles remounting the rootfs with
updated options after exiting the initrd.
In general, prefer using the rootflags
kernel argument for that
use case; it ensures that the filesystem is mounted with the
correct options to start, and avoid having an entry for /
in /etc/fstab
.
The physical root is mounted at /sysroot
. It is an option
for legacy /etc/fstab
references for /
to use
/sysroot
by default, but rootflags
is preferred.
Configuring machine-local state
Per the filesystem section, /etc
and /var
are machine-local
state by default. If you want to inject additional content after the installation
process, at the current time this can be done by manually finding the
target "deployment root" which will be underneath /ostree/deploy/<stateroot>/deploy/
.
Installation software such as Anaconda
does this today to implement generic %post
scripts and the like.
However, it is very likely that a generic bootc API to do this will be added.
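Until such an API exists, a tool could locate candidate deployment roots by listing that directory. The sketch below assumes only what is stated above, namely that deployment roots are directories underneath `/ostree/deploy/<stateroot>/deploy/` on the target; the function name and the stateroot name in the usage line are hypothetical.

```python
from pathlib import Path
from typing import List


def deployment_roots(sysroot: Path, stateroot: str) -> List[Path]:
    """List the directories found under <sysroot>/ostree/deploy/<stateroot>/deploy/."""
    deploy_dir = sysroot / "ostree" / "deploy" / stateroot / "deploy"
    if not deploy_dir.is_dir():
        return []
    # Only directories are deployment roots; skip any sibling metadata files.
    return sorted(path for path in deploy_dir.iterdir() if path.is_dir())


# Hypothetical mount point and stateroot name for a freshly installed target.
print(deployment_roots(Path("/mnt"), "default"))
```

Content written under the returned directory's `etc` would then become part of the installed system's machine-local state.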
NAME
bootc-install - Install the running container to a target
SYNOPSIS
bootc install [-h|--help] <subcommands>
DESCRIPTION
Install the running container to a target.
## Understanding installations
OCI containers are effectively layers of tarballs with JSON for metadata; they cannot be booted directly. The `bootc install` flow is a highly opinionated method to take the contents of the container image and install it to a target block device (or an existing filesystem) in such a way that it can be booted.
For example, a Linux partition table and filesystem are used, and the bootloader and kernel embedded in the container image are also prepared.
A bootc installed container currently uses OSTree as a backend, and this sets it up such that a subsequent `bootc upgrade` can perform in-place updates.
An installation is not simply a copy of the container filesystem, but includes other setup and metadata.
OPTIONS
-h, --help
: Print help (see a summary with -h)
SUBCOMMANDS
bootc-install-to-disk(8)
: Install to the target block device
bootc-install-to-filesystem(8)
: Install to an externally created filesystem structure
bootc-install-to-existing-root(8)
: Install to the host root filesystem
bootc-install-ensure-completion(8)
: Intended for use in environments that are performing an ostree-based installation, not bootc
bootc-install-print-configuration(8)
: Output JSON to stdout that contains the merged installation configuration as it may be relevant to calling processes using `install to-filesystem` that in particular want to discover the desired root filesystem type from the container image
bootc-install-help(8)
: Print this message or the help of the given subcommand(s)
VERSION
v1.1.4
% bootc-install-config(5)
NAME
bootc-install-config.toml
DESCRIPTION
The bootc install
process supports some basic customization. This configuration file
is in TOML format, and will be discovered by the installation process via "drop-in"
files in /usr/lib/bootc/install
that are processed in alphanumerical order.
The individual files are merged into a single final installation config, so it is supported for e.g. a container base image to provide a default root filesystem type, that can be overridden in a derived container image.
install
This is the only defined toplevel table.
The `install` section supports the following subfields:
- `block`: An array of supported `to-disk` backends enabled by this base container image; if not specified, this will just be `direct`. The only other supported value is `tpm2-luks`. The first value specified will be the default. To enable both, use `block = ["direct", "tpm2-luks"]`.
- `filesystem`: See below.
- `kargs`: An array of strings; this will be appended to the set of kernel arguments.
- `match_architectures`: An array of strings; this filters the install config.
filesystem
There is one valid field:
root
: An instance of "filesystem-root"; see below
filesystem-root
There is one valid field:
type
: This can be any basic Linux filesystem with a mkfs.$fstype. For example, ext4, xfs, etc.
Examples
[install.filesystem.root]
type = "xfs"
[install]
kargs = ["nosmt", "console=tty0"]
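As a sketch of how drop-ins layer, the following shell snippet stages two files under a scratch directory that stands in for the image root. The file names 00-base.toml and 50-derived.toml and the STAGE variable are hypothetical; only the /usr/lib/bootc/install directory and the keys come from the description above. Because files are processed in alphanumerical order, the later file is expected to override the root filesystem type set by the earlier one.

```shell
# Scratch directory standing in for the container image root (assumption).
STAGE="${STAGE:-$(mktemp -d)}"
DROPIN_DIR="$STAGE/usr/lib/bootc/install"
mkdir -p "$DROPIN_DIR"

# Hypothetical base image defaults: one kernel argument and an xfs root.
cat > "$DROPIN_DIR/00-base.toml" <<'EOF'
[install]
kargs = ["console=tty0"]

[install.filesystem.root]
type = "xfs"
EOF

# Hypothetical derived image override: switch the root filesystem to ext4.
cat > "$DROPIN_DIR/50-derived.toml" <<'EOF'
[install.filesystem.root]
type = "ext4"
EOF

# List the drop-ins in alphanumerical order.
dropin_order() {
  (cd "$DROPIN_DIR" && LC_ALL=C ls -1 -- *.toml)
}
dropin_order
```

In a real image build the files would be written directly to /usr/lib/bootc/install. The merged result can then be inspected with the print-configuration subcommand listed above.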
SEE ALSO
bootc(1)
NAME
bootc-install-to-disk - Install to the target block device
SYNOPSIS
bootc install to-disk [--wipe] [--block-setup] [--filesystem] [--root-size] [--source-imgref] [--target-transport] [--target-imgref] [--enforce-container-sigpolicy] [--target-ostree-remote] [--skip-fetch-check] [--disable-selinux] [--karg] [--root-ssh-authorized-keys] [--generic-image] [--bound-images] [--stateroot] [--via-loopback] [-h|--help] <DEVICE>
DESCRIPTION
Install to the target block device.
This command must be invoked inside the container that is to be installed. The container must be run in `--privileged` mode, and hence will be able to see all block devices on the system.
The default storage layout uses the root filesystem type configured in the container image, alongside any required system partitions such as the EFI system partition. Use `install to-filesystem` for anything more complex such as RAID, LVM, LUKS etc.
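A minimal sketch of such an invocation follows. It only prints the command line so that it can be reviewed first. The image name and device passed in are placeholders, and every podman option other than `--privileged` (the host PID namespace, the /dev and /var/lib/containers bind mounts, the SELinux label override) is an assumption based on common usage rather than something this page specifies.

```shell
# Print (rather than run) a podman command line for `bootc install to-disk`.
# $1 is the container image to install, $2 the target block device.
to_disk_command() {
  image="$1"
  disk="$2"
  # Options other than --privileged are assumptions; adjust for your host.
  echo podman run --rm --privileged --pid=host \
    -v /dev:/dev -v /var/lib/containers:/var/lib/containers \
    --security-opt label=type:unconfined_t \
    "$image" bootc install to-disk --wipe "$disk"
}

# Placeholder image and device, for illustration only.
to_disk_command quay.io/example/my-bootc-os:latest /dev/vdb
```

The printed line can be reviewed and then run as root. Note that `--wipe` destroys all existing data on the device.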
OPTIONS
--wipe
: Automatically wipe all existing data on device
--block-setup=BLOCK_SETUP
: Target root block device setup.
direct: Filesystem written directly to block device
tpm2-luks: Bind unlock of filesystem to presence of the default tpm2 device.
[possible values: direct, tpm2-luks]
--filesystem=FILESYSTEM
: Target root filesystem type
[possible values: xfs, ext4, btrfs]
--root-size=ROOT_SIZE
: Size of the root partition (default specifier: M). Allowed specifiers: M (mebibytes), G (gibibytes), T (tebibytes).
By default, all remaining space on the disk will be used.
--source-imgref=SOURCE_IMGREF
: Install the system from an explicitly given source.
By default, bootc install and install-to-filesystem assume that they run in a podman container, and take the container image to install from podman's container storage. If --source-imgref is given, bootc uses it as the installation source instead. See skopeo(1) for accepted formats.
--target-transport=TARGET_TRANSPORT [default: registry]
: The transport; e.g. oci, oci-archive, containers-storage. Defaults to `registry`
--target-imgref=TARGET_IMGREF
: Specify the image to fetch for subsequent updates
--enforce-container-sigpolicy
: This is the inverse of the previous `--target-no-signature-verification` (which is now a no-op). Enabling this option enforces that `/etc/containers/policy.json` includes a default policy which requires signatures
--target-ostree-remote=TARGET_OSTREE_REMOTE
: Enable verification via an ostree remote
--skip-fetch-check
: By default, the accessibility of the target image will be verified (just the manifest will be fetched). Specifying this option suppresses the check; use this when you know the issues it might find are addressed.
A common reason this may fail is when one is using an image which requires registry authentication, but not embedding the pull secret in the image so that updates can be fetched by the installed OS "day 2".
--disable-selinux
: Disable SELinux in the target (installed) system.
This is currently necessary to install *from* a system with SELinux disabled but where the target does have SELinux enabled.
--karg=KARG
: Add a kernel argument. This option can be provided multiple times.
Example: --karg=nosmt --karg=console=ttyS0,115200n8
--root-ssh-authorized-keys=ROOT_SSH_AUTHORIZED_KEYS
: The path to an `authorized_keys` that will be injected into the `root` account.
The implementation of this uses systemd `tmpfiles.d`, writing to a file named `/etc/tmpfiles.d/bootc-root-ssh.conf`. This will have the effect that by default, the SSH credentials will be set if not present. The intention behind this is to allow mounting the whole `/root` home directory as a `tmpfs`, while still getting the SSH key replaced on boot.
--generic-image
: Perform configuration changes suitable for a "generic" disk image. At the moment:
- All bootloader types will be installed
- Changes to the system firmware will be skipped
--bound-images=BOUND_IMAGES [default: stored]
: How should logically bound images be retrieved
*Possible values:*
- stored: Bound images must exist in the source's root container storage (default)
- pull: Bound images will be pulled and stored directly in the target's bootc container storage
--stateroot=STATEROOT
: The stateroot name to use. Defaults to `default`
--via-loopback
: Instead of targeting a block device, write to a file via loopback
-h, --help
: Print help (see a summary with -h)
<DEVICE>
: Target block device for installation. The entire device will be wiped
VERSION
v1.1.4
NAME
bootc-install-to-filesystem - Install to an externally created filesystem structure
SYNOPSIS
bootc install to-filesystem [--root-mount-spec] [--boot-mount-spec] [--replace] [--acknowledge-destructive] [--skip-finalize] [--source-imgref] [--target-transport] [--target-imgref] [--enforce-container-sigpolicy] [--target-ostree-remote] [--skip-fetch-check] [--disable-selinux] [--karg] [--root-ssh-authorized-keys] [--generic-image] [--bound-images] [--stateroot] [-h|--help] <ROOT_PATH>
DESCRIPTION
Install to an externally created filesystem structure.
In this variant of installation, the root filesystem alongside any necessary platform partitions (such as the EFI system partition) are prepared and mounted by an external tool or script. The root filesystem is currently expected to be empty by default.
OPTIONS
--root-mount-spec=ROOT_MOUNT_SPEC
: Source device specification for the root filesystem. For example, UUID=2e9f4241-229b-4202-8429-62d2302382e1
If not provided, the UUID of the target filesystem will be used.
--boot-mount-spec=BOOT_MOUNT_SPEC
: Mount specification for the /boot filesystem.
This is optional. If `/boot` is detected as a mounted partition, then its UUID will be used.
--replace=REPLACE
: Initialize the system in-place; at the moment, only one mode for this is implemented. In the future, it may also be supported to set up an explicit "dual boot" system
*Possible values:*
- wipe: Completely wipe the contents of the target filesystem. This cannot be done if the target filesystem is the one the system is booted from
- alongside: This is a destructive operation in the sense that the bootloader state will have its contents wiped and replaced. However, the running system (and all files) will remain in place until reboot
--acknowledge-destructive
: If the target is the running system's root filesystem, this will skip any warnings
--skip-finalize
: The default mode is to "finalize" the target filesystem by invoking `fstrim` and similar operations, and finally mounting it readonly. This option skips those operations. It is then the responsibility of the invoking code to perform those operations
--source-imgref=SOURCE_IMGREF
: Install the system from an explicitly given source.
By default, bootc install and install-to-filesystem assume that they run in a podman container, and take the container image to install from podman's container storage. If --source-imgref is given, bootc uses it as the installation source instead. See skopeo(1) for accepted formats.
--target-transport=TARGET_TRANSPORT [default: registry]
: The transport; e.g. oci, oci-archive, containers-storage. Defaults to `registry`
--target-imgref=TARGET_IMGREF
: Specify the image to fetch for subsequent updates
--enforce-container-sigpolicy
: This is the inverse of the previous `--target-no-signature-verification` (which is now a no-op). Enabling this option enforces that `/etc/containers/policy.json` includes a default policy which requires signatures
--target-ostree-remote=TARGET_OSTREE_REMOTE
: Enable verification via an ostree remote
--skip-fetch-check
: By default, the accessibility of the target image will be verified (just the manifest will be fetched). Specifying this option suppresses the check; use this when you know the issues it might find are addressed.
A common reason this may fail is when one is using an image which requires registry authentication, but not embedding the pull secret in the image so that updates can be fetched by the installed OS "day 2".
--disable-selinux
: Disable SELinux in the target (installed) system.
This is currently necessary to install *from* a system with SELinux disabled but where the target does have SELinux enabled.
--karg=KARG
: Add a kernel argument. This option can be provided multiple times.
Example: --karg=nosmt --karg=console=ttyS0,115200n8
--root-ssh-authorized-keys=ROOT_SSH_AUTHORIZED_KEYS
: The path to an `authorized_keys` that will be injected into the `root` account.
The implementation of this uses systemd `tmpfiles.d`, writing to a file named `/etc/tmpfiles.d/bootc-root-ssh.conf`. This will have the effect that by default, the SSH credentials will be set if not present. The intention behind this is to allow mounting the whole `/root` home directory as a `tmpfs`, while still getting the SSH key replaced on boot.
--generic-image
: Perform configuration changes suitable for a "generic" disk image. At the moment:
- All bootloader types will be installed
- Changes to the system firmware will be skipped
--bound-images=BOUND_IMAGES [default: stored]
: How should logically bound images be retrieved
*Possible values:*
- stored: Bound images must exist in the source's root container storage (default)
- pull: Bound images will be pulled and stored directly in the target's bootc container storage
--stateroot=STATEROOT
: The stateroot name to use. Defaults to `default`
-h, --help
: Print help (see a summary with -h)
<ROOT_PATH>
: Path to the mounted root filesystem.
By default, the filesystem UUID will be discovered and used for mounting. To override this, use `--root-mount-spec`.
VERSION
v1.1.4
NAME
bootc-install-to-existing-root - Install to the host root filesystem
SYNOPSIS
bootc install to-existing-root [--replace] [--source-imgref] [--target-transport] [--target-imgref] [--enforce-container-sigpolicy] [--target-ostree-remote] [--skip-fetch-check] [--disable-selinux] [--karg] [--root-ssh-authorized-keys] [--generic-image] [--bound-images] [--stateroot] [--acknowledge-destructive] [-h|--help] [ROOT_PATH]
DESCRIPTION
Install to the host root filesystem.
This is a variant of `install to-filesystem` that is designed to install "alongside" the running host root filesystem. Currently, the host root filesystem's `/boot` partition will be wiped, but the content of the existing root will otherwise be retained, and will need to be cleaned up if desired after rebooting into the new root.
OPTIONS
--replace=REPLACE [default: alongside]
: Configure how existing data is treated
*Possible values:*
- wipe: Completely wipe the contents of the target filesystem. This cannot be done if the target filesystem is the one the system is booted from
- alongside: This is a destructive operation in the sense that the bootloader state will have its contents wiped and replaced. However, the running system (and all files) will remain in place until reboot
--source-imgref=SOURCE_IMGREF
: Install the system from an explicitly given source.
By default, bootc install and install-to-filesystem assume that they run in a podman container, and take the container image to install from podman's container storage. If --source-imgref is given, bootc uses it as the installation source instead. See skopeo(1) for accepted formats.
--target-transport=TARGET_TRANSPORT [default: registry]
: The transport; e.g. oci, oci-archive, containers-storage. Defaults to `registry`
--target-imgref=TARGET_IMGREF
: Specify the image to fetch for subsequent updates
--enforce-container-sigpolicy
: This is the inverse of the previous `--target-no-signature-verification` (which is now a no-op). Enabling this option enforces that `/etc/containers/policy.json` includes a default policy which requires signatures
--target-ostree-remote=TARGET_OSTREE_REMOTE
: Enable verification via an ostree remote
--skip-fetch-check
: By default, the accessibility of the target image will be verified (just the manifest will be fetched). Specifying this option suppresses the check; use this when you know the issues it might find are addressed.
A common reason this may fail is when one is using an image which requires registry authentication, but not embedding the pull secret in the image so that updates can be fetched by the installed OS "day 2".
--disable-selinux
: Disable SELinux in the target (installed) system.
This is currently necessary to install *from* a system with SELinux disabled but where the target does have SELinux enabled.
--karg=KARG
: Add a kernel argument. This option can be provided multiple times.
Example: --karg=nosmt --karg=console=ttyS0,115200n8
--root-ssh-authorized-keys=ROOT_SSH_AUTHORIZED_KEYS
: The path to an `authorized_keys` that will be injected into the `root` account.
The implementation of this uses systemd `tmpfiles.d`, writing to a file named `/etc/tmpfiles.d/bootc-root-ssh.conf`. This will have the effect that by default, the SSH credentials will be set if not present. The intention behind this is to allow mounting the whole `/root` home directory as a `tmpfs`, while still getting the SSH key replaced on boot.
--generic-image
: Perform configuration changes suitable for a "generic" disk image. At the moment:
- All bootloader types will be installed
- Changes to the system firmware will be skipped
--bound-images=BOUND_IMAGES [default: stored]
: How should logically bound images be retrieved
*Possible values:*
- stored: Bound images must exist in the source's root container storage (default)
- pull: Bound images will be pulled and stored directly in the target's bootc container storage
--stateroot=STATEROOT
: The stateroot name to use. Defaults to `default`
--acknowledge-destructive
: Accept that this is a destructive action and skip a warning timer
-h, --help
: Print help (see a summary with -h)
[ROOT_PATH] [default: /target]
: Path to the mounted root; providing this is no longer necessary. Historically it was necessary to ensure the host rootfs was mounted here via e.g. `-v /:/target`
VERSION
v1.1.4
NAME
bootc-container-lint - Perform relatively inexpensive static analysis checks as part of a container build
SYNOPSIS
bootc container lint [--rootfs] [-h|--help]
DESCRIPTION
Perform relatively inexpensive static analysis checks as part of a container build.
This is intended to be invoked via e.g. `RUN bootc container lint` as part of a build process; it will error if any problems are detected.
OPTIONS
--rootfs=ROOTFS [default: /]
: Operate on the provided rootfs
-h, --help
: Print help (see a summary with -h)
VERSION
v1.1.4
"bootc compatible" images
It is a toplevel goal of this project to tightly integrate with the OCI ecosystem and make booting containers a normal activity.
However, there are a number of basic requirements and integration points, some of which have distribution-specific variants.
Further at the current time, the bootc project makes a lot of use of ostree, and this can appear in the base image requirements.
ostree-in-container
With bootc 1.1.3 or later, it is no longer required to have a /ostree directory present in the base image.
To generate container images which do include /ostree from scratch, the underlying ostree container tooling is designed to operate on an existing ostree commit, and the ostree container encapsulate command can turn the commit into an OCI image. If you already have a pipeline which produces ostree commits as an output (e.g. using osbuild to produce ostree commit artifacts), then this allows a seamless transition to a bootc/OCI compatible ecosystem.
Higher level base image build tooling
A well tested tool to produce compatible base images is rpm-ostree compose image, which is used by the Fedora base image.
Standard image content
The bootc project provides a baseimage reference set of configuration files for base images. In particular at the current time the content defined by base must be used (or recreated). There is also suggested integration there with e.g. dracut to ensure the initramfs is set up, etc.
Standard metadata for bootc compatible images
It is strongly recommended to do:
LABEL containers.bootc 1
This will signal that this image is intended to be usable with bootc.
Deriving from existing base images
It's important to emphasize that from one of these specially-formatted base images, every tool and technique for container building applies! In other words it will Just Work to do
FROM <bootc base image>
RUN dnf -y install foo && dnf clean all
You can then use podman build, buildah, docker build, or any other container build tool to produce your customized image. The only requirement is that the container build tool supports producing OCI container images.
Kernel
The Linux kernel (and optionally initramfs) is embedded in the container image; the canonical location is /usr/lib/modules/$kver/vmlinuz, and the initramfs should be in initramfs.img in that directory. You should not include any content in /boot in your container image. Bootc will take care of copying the kernel/initramfs as needed from the container image to /boot.
Future work for supporting UKIs will follow the recommendations of the uapi-group in Locations for Distribution-built UKIs Installed by Package Managers.
The bootc container lint command will check this.
The ostree container commit command
You may find some references to this; it is no longer very useful and is not recommended.
The bootloader setup
At the current time bootc relies on the bootupd project which handles bootloader installs and upgrades. The invocation of bootc install will always run bootupd to perform installations.
Additionally, bootc upgrade will currently not upgrade the bootloader; you must invoke bootupctl update.
SELinux
Container runtimes such as podman and docker commonly apply a "coarse" SELinux policy to running containers. See container-selinux.
It is very important to understand that non-bootc base images do not (usually) have any embedded security.selinux metadata at all; all labels on the toplevel container image are dynamically generated per container invocation, and there are no individually distinct e.g. etc_t and usr_t types.
In contrast, with the current OSTree backend for bootc, it is possible to include label metadata (and precomputed ostree checksums) in special metadata files in /sysroot/ostree that correspond to components of the base image. This is optional as of bootc v1.1.3.
File content in derived layers will be labeled using the default file contexts (from /etc/selinux). For example, you can do this (as of bootc 1.1.0):
RUN semanage fcontext -a -t httpd_sys_content_t "/web(/.*)?"
(This command will write to /etc/selinux/$policy/policy/.)
It will currently not work to do e.g.:
RUN chcon -t foo_t /usr/bin/foo
This fails because the container runtime will deny the attempt to "physically" set the security.selinux extended attribute.
In the future, it is likely that we will add support for handling the security.selinux extended attribute in tar streams; currently, however, this can only be done with a custom build process.
Toplevel directories
In particular, a common problem is that inside a container image, it's easy to create arbitrary toplevel directories such as e.g. /app or /aimodel etc. But in some SELinux policies such as Fedora derivatives, these will be labeled as default_t which few domains can access.
References:
composefs
It is strongly recommended to enable the ostree composefs backend (but not strictly required) for bootc.
A reference enablement file to do so is in the base image content referenced above.
More in ostree-prepare-root.
Filesystem
As noted in other chapters, the bootc project currently depends on the ostree project for storing the base container image. Additionally there is a containers/storage instance for logically bound images.
However, bootc is intending to be a "fresh, new container-native interface", and ostree is an implementation detail.
First, it is strongly recommended that bootc consumers use the ostree composefs backend; to do this, ensure that you have a /usr/lib/ostree/prepare-root.conf that contains at least
[composefs]
enabled = true
This will ensure that the entire / is a read-only filesystem which is very important for achieving correct semantics.
Understanding container build/runtime vs deployment
When run as a container (e.g. as part of a container build), the filesystem is fully mutable in order to allow derivation to work. For more on container builds, see build guidance.
The rest of this document describes the state of the system when "deployed" to a physical or virtual machine, and managed by bootc.
Timestamps
bootc uses ostree, which currently squashes all timestamps to zero. This is now viewed as an implementation bug and will be changed in the future. For more information, see this tracker issue.
Understanding physical vs logical root with /sysroot
When the system is fully booted, it is running in the equivalent of a chroot. The "physical" host root filesystem will be mounted at /sysroot. For more on this, see filesystem: sysroot.
This chroot filesystem is called a "deployment root". All the remaining filesystem paths below are part of a deployment root which is used as a final target for the system boot. The target deployment is determined via the ostree= kernel commandline argument.
/usr
The overall recommendation is to keep all operating system content in /usr, with directories such as /bin being symbolic links to /usr/bin, etc. See UsrMove for example.
However, with composefs enabled /usr is not different from /; they are part of the same immutable image. So there is not a fundamental need to do a full "UsrMove" with a bootc system.
/usr/local
The OSTree upstream recommendation suggests making /usr/local a symbolic link to /var/usrlocal. But because the emphasis of a bootc-oriented system is on users deriving custom container images as the default entrypoint, it is recommended here that base images configure /usr/local to be a regular directory (i.e. the default).
Projects that want to produce "final" images that are themselves not intended to be derived from in general can enable that symbolic link in derived builds.
/etc
The /etc directory contains mutable persistent state by default; however, it is supported to enable the etc.transient config option, see below as well.
When in persistent mode, it inherits the OSTree semantics of performing a 3-way merge across upgrades. In a nutshell:
- The new default /etc is used as a base
- The diff between current and previous /etc is applied to the new /etc
- Locally modified files in /etc different from the default /usr/etc (of the same deployment) will be retained
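The following is a deliberately simplified shell illustration of that merge, not the ostree implementation. The function name merge_etc and its four directory arguments are hypothetical. It only covers files that were locally added or modified, and it does not model deletions, symlinks, or permission changes.

```shell
# merge_etc OLD_DEFAULT CURRENT NEW_DEFAULT OUT
#   OLD_DEFAULT: the default /etc of the currently booted deployment
#   CURRENT:     the live /etc, possibly with local modifications
#   NEW_DEFAULT: the default /etc shipped by the new image
#   OUT:         directory that receives the merged result
merge_etc() {
  old_default="$1"; current="$2"; new_default="$3"; out="$4"
  mkdir -p "$out"
  # Step 1: start from the new default /etc.
  cp -R "$new_default/." "$out/"
  # Step 2: re-apply every file that differs from (or is absent in) the
  # old default, i.e. the local changes.
  (cd "$current" && find . -type f) | while IFS= read -r f; do
    if ! cmp -s "$current/$f" "$old_default/$f" 2>/dev/null; then
      mkdir -p "$out/$(dirname "$f")"
      cp "$current/$f" "$out/$f"
    fi
  done
}
```

With this sketch, a file left untouched locally picks up the new default, while a locally edited file keeps its local content.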
The implementation of this defaults to being executed by ostree-finalize-staged.service at shutdown time, before the new bootloader entry is created.
The rationale for this design is that in practice today, many components of a Linux system end up shipping default configuration files in /etc. And even if the default package doesn't, often the software only looks for config files there by default.
Some other image-based update systems do not have distinct "versions" of /etc, and it may be populated only at install time and left untouched thereafter. But that creates "hysteresis" where the state of the system's /etc is strongly influenced by the initial image version. This can lead to problems where e.g. a change to /etc/sudoers (to give one simple example) would require external intervention to apply.
For more on configuration file best practices, see Building.
/usr/etc
The /usr/etc tree is generated client side and contains the default container image's view of /etc. This should generally be considered an internal implementation detail of bootc/ostree. Do not explicitly put files into this location; doing so can create undefined behavior. There is a check for this in bootc container lint.
/var
Content in /var persists by default; it is however supported to make it or subdirectories mount points (whether network or tmpfs). There is exactly one /var. If it is not a distinct partition, then "physically" currently it is a bind mount into /ostree/deploy/$stateroot/var and shared across "deployments" (bootloader entries).
As of OSTree v2024.3, by default content in /var acts like a Docker VOLUME /var.
This means that the content from the container image is copied at initial installation time, and not updated thereafter.
Note this is very different from the handling of /etc. The rationale for this is that /etc is relatively small configuration files, and the expected configuration files are often bound to the operating system binaries in /usr.
But /var has arbitrarily large data (system logs, databases, etc.). It would also not be expected to be rolled back if the operating system state is rolled back. A simple example is that an apt|dnf downgrade postgresql should not affect the physical database in general in /var/lib/postgres. Similarly, a bootc update or rollback should not affect this application data.
Having /var separate also makes it work cleanly to "stage" new operating system updates before applying them (they're downloaded and ready, but only take effect on reboot).
In general, this is the same rationale for Docker VOLUME: decouple the application code from its data.
A common case is for applications to want some directory structure (e.g. /var/lib/postgresql) to be pre-created. It's recommended to use systemd tmpfiles.d for this. An even better approach where applicable is StateDirectory= in units.
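As a minimal sketch of the tmpfiles.d approach, the snippet below stages a configuration file that asks systemd-tmpfiles to create a directory under /var at boot. The package name examplepkg, the file name, the mode and ownership, and the STAGE scratch directory are all hypothetical; in a real image build the file would be written directly to /usr/lib/tmpfiles.d.

```shell
# Scratch directory standing in for the container image root (assumption).
STAGE="${STAGE:-$(mktemp -d)}"
TMPFILES_DIR="$STAGE/usr/lib/tmpfiles.d"
mkdir -p "$TMPFILES_DIR"

# A "d" line creates the directory if it does not exist yet.
cat > "$TMPFILES_DIR/examplepkg.conf" <<'EOF'
# Type Path                Mode User Group Age Argument
d      /var/lib/examplepkg 0750 root root  -   -
EOF

cat "$TMPFILES_DIR/examplepkg.conf"
```

For a service that owns its data directory, setting StateDirectory=examplepkg in the unit achieves a similar result for /var/lib/examplepkg without a separate tmpfiles.d entry.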
Other directories
It is not supported to ship content in /run or /proc or other API Filesystems in container images.
Besides those, other toplevel directories such as /usr and /opt will be lifecycled with the container image.
/opt
In the default suggested model of using composefs (per above) the /opt directory will be read-only, alongside other toplevels such as /usr.
Some software (especially "3rd party" deb/rpm packages) expect to be able to write to a subdirectory of /opt such as /opt/examplepkg.
See building images for recommendations on how to build container images and adjust the filesystem for cases like this.
However, for some use cases, it may be easier to allow some level of mutability. There are two options for this, each with separate trade-offs: transient roots and state overlays.
Other toplevel directories
Creating other toplevel directories and content (e.g. /afs, /arbitrarymountpoint) or in general further nested data is supported - just create the directory as part of your container image build process (e.g. RUN mkdir /arbitrarymountpoint). These directories will be lifecycled with the container image state, and appear immutable by default, the same as all other directories such as /usr and /opt.
Mounting separate filesystems there can be done by the usual mechanisms of /etc/fstab, systemd .mount units, etc.
SELinux for arbitrary toplevels
Note that operating systems using SELinux may use a label such as default_t for unknown toplevel directories, which may not be accessible by some processes. In this situation you currently may need to also ensure a label is defined for them in the file contexts.
Enabling transient root
This feature enables a fully transient writable rootfs by default. To do this, set the
[root]
transient = true
option in /usr/lib/ostree/prepare-root.conf. In particular this will allow software to write (transiently, i.e. until the next reboot) to all top-level directories, including /usr and /opt, with symlinks to /var for content that should persist.
This can be combined with etc.transient as well (below).
More on prepare-root: https://ostreedev.github.io/ostree/man/ostree-prepare-root.html
Enabling transient etc
The default (per above) is to have /etc persist. If however you do not need to use it for any per-machine state, then enabling a transient /etc is a great way to reduce the amount of possible state drift. Set the
[etc]
transient = true
option in /usr/lib/ostree/prepare-root.conf.
This can be combined with root.transient as well (above).
More on prepare-root: https://ostreedev.github.io/ostree/man/ostree-prepare-root.html
Enabling state overlays
This feature enables a writable overlay on top of /opt (or really, any toplevel or subdirectory baked into the image that is normally read-only). Changes persist across reboots but during updates, new files from the container image override any locally modified version. All other files persist.
To enable this feature, simply instantiate the ostree-state-overlay@.service unit template on the target path. For example, for /opt:
RUN systemctl enable ostree-state-overlay@opt.service
Filesystem: Physical /sysroot
The bootc project uses ostree as a backend, and maps fetched container images to a deployment.
stateroot
The underlying ostree CLI and API tooling expose a concept of stateroot, which is not yet exposed via bootc. The stateroot used by bootc install is just named default.
The stateroot concept allows having fully separate parallel operating system installations with fully separate /etc and /var, while still sharing an underlying root filesystem.
In the future, this functionality will be exposed and used by bootc.
/sysroot mount
When booted, the physical root will be available at /sysroot as a read-only mount point and the logical root / will be a bind mount pointing to a deployment directory under /sysroot/ostree. This is a key aspect of how bootc upgrade operates: it fetches the updated container image and writes the base image files (using OSTree storage to /sysroot/ostree/repo).
Beyond that and debugging/introspection, there are few use cases for tooling to operate on the physical root.
bootc-owned container storage
For logically bound images, bootc maintains a dedicated containers/storage instance using the overlay backend (the same type of thing that backs /var/lib/containers).
This storage is accessible via a /usr/lib/bootc/storage symbolic link which points into /sysroot. (Avoid directly referencing the /sysroot target.)
At the current time, this storage is not used for the base bootable image. This unified storage issue tracks unification.
Expanding the root filesystem
One notable use case that does need to operate on /sysroot is expanding the root filesystem.
Some higher level tools such as e.g. cloud-init may (reasonably) expect the / mount point to be the physical root. Tools like this will need to be adjusted to instead detect this and operate on /sysroot.
Growing the block device
Fundamentally bootc is agnostic to the underlying block device setup. How to grow the root block device depends on the underlying storage stack, from basic partitions to LVM. However, a common tool is the growpart utility from cloud-init.
Growing the filesystem
The systemd project ships a systemd-growfs tool and corresponding systemd-growfs@ services. This is a relatively thin abstraction over detecting the target root filesystem type and running the underlying tool such as xfs_growfs.
At the current time, most Linux filesystems require the target to be mounted writable in order to grow. Hence, an invocation of systemd-growfs /sysroot or xfs_growfs /sysroot will need to be further wrapped in a temporary mount namespace.
Using a MountFlags=slave drop-in stanza for systemd-growfs@sysroot.service is recommended, along with an ExecStartPre=mount -o remount,rw /sysroot.
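A sketch of such a drop-in, staged under a scratch directory, might look as follows. The drop-in file name 30-bootc-remount.conf and the STAGE variable are hypothetical; the two settings are the ones named above. In a real image build the file would go under /usr/lib/systemd/system/systemd-growfs@sysroot.service.d/.

```shell
# Scratch directory standing in for the container image root (assumption).
STAGE="${STAGE:-$(mktemp -d)}"
DROPIN_DIR="$STAGE/usr/lib/systemd/system/systemd-growfs@sysroot.service.d"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/30-bootc-remount.conf" <<'EOF'
[Service]
# Run the service in its own mount namespace with slave propagation.
MountFlags=slave
# Remount the physical root writable before systemd-growfs runs.
ExecStartPre=mount -o remount,rw /sysroot
EOF

cat "$DROPIN_DIR/30-bootc-remount.conf"
```

This only stages the drop-in. Whether and when the systemd-growfs@sysroot.service instance is started depends on the rest of the image configuration.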
Detecting bootc/ostree systems
For tools like cloud-init
that want to operate generically,
conditionally detecting this situation can be done via e.g.:
- Checking for / being an overlay mount point
- Checking for /sysroot/ostree
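The two checks can be sketched in Python (the language cloud-init itself is written in). The function names are illustrative, not an existing API, and the paths are parameters so the logic can be exercised outside a real host. Because an overlay `/` also occurs in ordinary containers, this sketch treats the `/sysroot/ostree` check as the decisive one; that is a design choice of the example, not a rule from bootc.

```python
import os

def root_fs_type(mountinfo_path="/proc/self/mountinfo"):
    """Return the filesystem type mounted at "/", or None if not found."""
    fstype = None
    with open(mountinfo_path, encoding="utf-8") as f:
        for line in f:
            # Each line looks like:
            #   <id> <parent> <maj:min> <root> <mountpoint> <opts> [...] - <fstype> <source> <superopts>
            pre, sep, post = line.partition(" - ")
            if not sep:
                continue
            fields = pre.split()
            if len(fields) >= 5 and fields[4] == "/":
                fstype = post.split()[0]  # a later mount on "/" shadows earlier ones
    return fstype

def physical_root(sysroot="/sysroot"):
    """Return the directory a tool should treat as the physical root."""
    if os.path.isdir(os.path.join(sysroot, "ostree")):
        return sysroot
    return "/"
```

A tool that grows the root filesystem would then operate on `physical_root()` rather than assuming `/`, and could use `root_fs_type() == "overlay"` as an additional hint.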
Container storage
The bootc project uses ostree, and specifically the ostree-rs-ext Rust library, which handles storage of container images on top of an ostree-based system for the booted host. Additionally, there is a containers/storage instance for logically bound images.
Architecture
flowchart TD
    bootc --- ostree-rs-ext --- ostree-rs --- ostree
    ostree-rs-ext --- containers-image-proxy-rs --- skopeo --- containers/image
    bootc --- podman --- image-storage["containers/{image,storage}"]
There were two high level goals that drove the design of the current system architecture:
- Support seamless in-place migrations from existing ostree systems
- Avoid requiring deep changes to the podman stack
A simple way to explain the current architecture is that podman uses two Go libraries, containers/image and containers/storage, whereas ostree uses a custom container storage, not containers/storage
.
Mapping container images to ostree
OCI images are effectively just a standardized format of tarballs wrapped with JSON - specifically "layers" of tarballs.
The ostree-rs-ext project maps layers to OSTree commits. Each layer
is stored separately, under an ostree "ref" (like a git branch)
under the ostree/container/
namespace:
$ ostree refs ostree/container
Layers
The ostree/container/blob
namespace tracks storage of a container layer
identified by its blob ID (sha256 digest).
Images
At the current time, ostree always boots into a "flattened" filesystem tree. This is generated as both a hardlinked checkout as well as a composefs image.
The flattened tree is constructed and committed into the
ostree/container/image
namespace. The commit metadata also includes
the OCI manifest and config objects.
This is implemented in the ostree-rs-ext/container module.
SELinux labeling
A major wrinkle is supporting SELinux labeling. The labeling configuration
is defined as regular expressions included in /etc/selinux/$policy/contexts/
.
The current implementation relies on the fact that SELinux labels for base images were pre-computed. The first step is to check out the "ostree base" layers for the base image.
All derived layers have labels computed from the base image policy. This causes a known bug where derived layers can't include custom policy: https://github.com/ostreedev/ostree-rs-ext/issues/510
Origin files
ostree has the concept of an origin
file which defines the source
of truth for upgrades. The container image reference for each deployment
is included in its origin.
Booting
A core aspect of this entire design is that once a container image is fetched into the ostree storage, from there on it just appears as an "ostree commit", and so all code built on top can work with it.
For example, the ostree-prepare-root.service
which runs in
the initramfs is currently agnostic to whether the filesystem tree originated
from an OCI image or some other mechanism; it just targets a
prepared flattened filesystem tree.
This is what is referenced by the ostree=
kernel commandline.
Logically bound images
In addition to the base image, bootc supports logically bound images.
bootc image
Experimental features are subject to change or removal. Please do provide feedback on them.
Tracking issue: https://github.com/containers/bootc/issues/690
Using bootc image copy-to-storage
This experimental command is intended to aid in booting local builds.
Invoking this command will default to copying the booted container image into the containers-storage:
area as used by e.g. podman
, under the image tag localhost/bootc
by default. It can
then be managed independently; used as a base image, pushed to a registry, etc.
Run bootc image copy-to-storage --help
for more options.
Example workflow:
$ bootc image copy-to-storage
$ cat Containerfile
FROM localhost/bootc
...
$ podman build -t localhost/bootc-custom .
$ bootc switch --transport containers-storage localhost/bootc-custom
Interactive progress with --progress-fd
This is an experimental feature; tracking issue: https://github.com/containers/bootc/issues/1016
While the bootc status
tooling allows a client to discover the state
of the system, during interactive changes such as bootc upgrade
or bootc switch
it is possible to monitor the status of downloads
or other operations at a fine-grained level with --progress-fd
.
The format of data output over --progress-fd
is JSON Lines
which is a series of JSON objects separated by newlines (the intermediate
JSON content is guaranteed not to contain a literal newline).
You can find the JSON schema describing this version here: progress-v0.schema.json.
Deploying a new image with either switch or upgrade consists
of three stages: pulling
, importing
, and staging
. The pulling
step
downloads the image from the registry, offering per-layer progress in
each message. The importing
step imports the image into storage and consists
of a single step. Finally, staging
runs a variety of staging
tasks. Currently, they are staging the image to disk, pulling bound images,
and removing old images.
Note that new stages or fields may be added at any time.
Importing and staging are affected by disk speed and the total image size. Pulling is affected by network speed and how many layers invalidate between pulls. Therefore, a large image with a good caching strategy will have longer importing and staging times, and a small bespoke container image will have negligible importing and staging times.
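A client consuming this stream might look roughly like the sketch below. It assumes that `--progress-fd` takes a numeric file descriptor and that events carry `type` and `task` fields; both should be checked against progress-v0.schema.json. The helper names are invented for the example. Since new stages or fields may appear at any time, the reader only uses `.get()` and ignores anything it does not recognize.

```python
import json
import os
import subprocess

def read_progress(fd):
    """Yield one decoded JSON object per non-empty line read from fd."""
    with os.fdopen(fd, "r", encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if line:
                yield json.loads(line)

def describe(event):
    """Render an event as text, tolerating unknown or missing fields."""
    kind = event.get("type", "unknown")  # assumed field name
    task = event.get("task", "?")        # assumed field name
    return f"{kind}: {task}"

def run_with_progress(argv):
    """Run e.g. ["bootc", "upgrade"] and print each progress event."""
    r, w = os.pipe()
    # Assumption: the flag accepts the descriptor number inherited by the child.
    proc = subprocess.Popen([*argv, "--progress-fd", str(w)], pass_fds=[w])
    os.close(w)  # keep only the child's copy of the write end open
    for event in read_progress(r):
        print(describe(event))
    return proc.wait()
```

Closing the parent's copy of the write end matters: without it the read loop would never see end-of-file after the child exits.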
Package manager integration
A toplevel goal of bootc is to encourage a default model
where Linux systems are built and delivered as (container) images.
In this model, the default usage of package managers such as apt
and dnf
will be at container build time.
However, one may end up shipping the package manager tooling onto
the end system. In some cases this may be desirable even, to allow
workflows with transient overlays using e.g. bootc usroverlay
.
Detecting image-based systems
bootc is not the only image based system; there are many. A common
emphasis is on having the operating system content in /usr
,
and for that filesystem to be mounted read-only at runtime.
A first recommendation here is that package managers should
detect if /usr
is read-only, and provide a useful error
message referring users to documentation guidance.
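One way a package manager could implement this check is sketched below, using the mount flags reported by `statvfs`. The function names and the message wording are illustrative only; the `statvfs` parameter exists so the logic can be tested without a read-only mount.

```python
import os

def is_read_only(path="/usr", statvfs=os.statvfs):
    """True if the filesystem containing path is mounted read-only."""
    return bool(statvfs(path).f_flag & os.ST_RDONLY)

def image_based_error(path="/usr", statvfs=os.statvfs):
    """Return an error message if path is read-only, else None."""
    if is_read_only(path, statvfs):
        # Hypothetical wording; a real tool would point at its own docs.
        return (f"error: read-only {path} detected, refusing to operate. "
                "See the image-based systems documentation.")
    return None
```

A tool would call `image_based_error()` before starting a transaction and exit with a non-zero status if it returns a message, rather than failing partway through an install.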
An example of a non-bootc case is "Live CD" environments,
where the physical media is readonly. Some Live operating system environments end
up mounting a transient writable overlay (whether via e.g. devicemapper or overlayfs)
that makes the system appear writable, but it's arguably clearer not to do so by
default. Detecting /usr
as read-only here and providing the same information
would make sense.
Running a read-only system via podman/docker
The historical default for docker (inherited into podman) is that
the /
is a writable (but transient) overlayfs. However, e.g. podman
supports a --read-only
flag, and Kubernetes pods offer a
securityContext.readOnlyRootFilesystem
flag.
Running containers in production in this way is a good idea, for exactly the same reasons that bootc defaults to mounting the system read-only.
Ensure that your package manager offers a useful error message in this mode. Today for example:
$ podman run --read-only --rm -ti debian apt update
Reading package lists... Done
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (30: Read-only file system)
$ podman run --read-only --rm -ti quay.io/fedora/fedora:40 dnf -y install strace
Config error: [Errno 30] Read-only file system: '/var/log/dnf.log': '/var/log/dnf.log'
However note that both of these fail on /var
being read-only; in a default bootc
model, it won't be. A more accurate check is thus closer to:
$ podman run --read-only --rm -ti --tmpfs /var quay.io/fedora/fedora:40 dnf -y install strace
...
Error: Transaction test error:
installing package strace-6.9-1.fc40.x86_64 needs 2MB more space on the / filesystem
$ podman run --read-only --rm --tmpfs /var -ti debian /bin/sh -c 'apt update && apt -y install strace'
...
dpkg: error processing archive /var/cache/apt/archives/libunwind8_1.6.2-3_amd64.deb (--unpack):
unable to clean up mess surrounding './usr/lib/x86_64-linux-gnu/libunwind-coredump.so.0.0.0' before installing another version: Read-only file system
These error messages are misleading and confusing for the user. A more useful error may look like e.g.:
$ podman run --read-only --rm --tmpfs /var -ti debian /bin/sh -c 'apt update && apt -y install strace'
error: read-only /usr detected, refusing to operate. See `man apt-image-based` for more information.
Detecting bootc specifically
You may also reasonably want to detect that the operating system is specifically
using bootc
. This can be done via e.g.:
bootc status --format=json | jq -r .spec.image
If the output of that field is non-null
, then the system is a bootc system
tracking the specified image.
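The same check can be done without `jq`. The sketch below is a minimal example with invented helper names; it makes no assumption about the shape of the `image` value beyond it being null or non-null, and it treats a missing or failing `bootc` binary as "not a bootc system".

```python
import json
import subprocess

def tracked_image(status_json):
    """Return the .spec.image value from bootc status JSON, or None."""
    status = json.loads(status_json)
    spec = status.get("spec") or {}
    return spec.get("image")

def detect_bootc():
    """Return the tracked image if this looks like a bootc system, else None."""
    try:
        out = subprocess.run(
            ["bootc", "status", "--format=json"],
            check=True, capture_output=True, text=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None  # bootc missing or failed
    return tracked_image(out)
```

Keeping the parsing in `tracked_image` separate from the subprocess call makes it possible to test the logic against saved status output.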
Transient overlays
Today there is a simple bootc usroverlay
command that adds a transient writable overlayfs for /usr
.
This makes many package manager operations work; conceptually it is similar
to the writable overlay that many "Live CDs" use. However, one cannot change the kernel
this way for example.
An optional integration that package managers can do is to detect this transient overlay situation and inform the user that the changes will be ephemeral.
Persistent changes
A bootc system by default does have a writable, persistent data store that holds multiple container image versions (more in filesystem).
Systems such as rpm-ostree implement a "hybrid" mechanism where packages can be persistently layered and re-applied; the system effectively does a "local build", unioning the intermediate filesystems.
One aspect of how rpm-ostree implements this is by caching individual unpacked RPMs as ostree commits in the ostree repo.
This section will be expanded later; you may also be able to find more information in booting local builds.
Relationship with other projects
bootc is the key component in a broader mission of bootable containers. Here's its relationship to other moving parts.
Relationship with podman
It gets a bit confusing to talk about shipping bootable operating systems in container images. Again, to be clear: we are reusing container images as:
- A build mechanism (including running as a standard OCI container image)
- A transport mechanism
But, actually when a bootc container is booted, podman (or docker, etc.) is not involved.
The storage used for the operating system content is distinct from /var/lib/containers
.
podman image prune --all
will not delete your operating system.
That said, a toplevel goal of bootc is alignment with the https://github.com/containers ecosystem, which includes podman. But more specifically at a technical level, today bootc uses skopeo and hence indirectly containers/image as a way to fetch container images.
This means that bootc automatically also honors many of the knobs available in /etc/containers
- specifically
things like containers-registries.conf.
In other words, if you configure podman
to pull images from your local mirror registry, then bootc
will automatically honor that as well.
The simple way to say it is: A goal of bootc
is to be the bootable-container analogue for podman
, which runs application containers. Everywhere one might run podman
, one could also consider using bootc
.
Relationship with Image Builder (osbuild)
There is a new bootc-image-builder project that is dedicated to the intersection of these two!
Relationship with Kubernetes
Just as podman
does not depend on a Kubernetes API server, bootc
will also not depend on one.
However, there are plans for bootc
to also understand Kubernetes API types. See configmap/secret support for example.
Perhaps in the future we may actually support some kind of Pod
analogue for representing the host state. Or we may define a CRD which can be used inside and outside of Kubernetes.
Relationship with ostree
OSTree provides many things:
- a git-like repo for OS data from which you can check out an entire rootfs
- a bootloader integration layer
- a transport layer for pulling content over HTTP
With bootc, the OSTree transport layer is not used. Instead, content is pulled
as OCI containers using skopeo
as mentioned above. However, this content
is then imported into the local OSTree repo to perform a deployment checkout.
The role of OSTree may further shrink in the future, especially as tighter
integration with podman and composefs occurs, but it will remain an important
part of the bootc stack (in particular the bootloader integration layer and
management of deployment roots).
Relationship with rpm-ostree
As mentioned above, bootc uses OSTree as a backing model, and so does
rpm-ostree. Hence, when using a container source, rpm-ostree upgrade
and
bootc upgrade
are effectively equivalent; you can use either command.
Differences from rpm-ostree
- The ostree project never tried to have an opinionated "install" mechanism,
but bootc does with
bootc install to-filesystem
- Bootc has additional features such as
/usr/lib/bootc/kargs.d
and logically bound images.
Client side changes
Currently all functionality for client-side changes
such as rpm-ostree install
or rpm-ostree initramfs --enable
continue to work, because of the shared base.
However, as soon as you mutate the system in this way, bootc upgrade
will error out as it will not understand how to upgrade
the system. The bootc project currently takes a relatively
hard stance that system state should come from a container image.
The way kernel arguments work also uses ostree on the backend
in both cases, so using e.g. rpm-ostree kargs
will also work
on a system updating via bootc.
Overall, rpm-ostree is used in several important projects and will continue to be maintained for many years to come.
However, for use cases which want a "pure" image based model,
using bootc
will be more appealing. bootc also does not
e.g. drag in dependencies on libdnf
and the RPM stack.
bootc also has the benefit of starting as a pure Rust project; and while it doesn't have an IPC mechanism today, the surface of such an API will be significantly smaller.
Further, bootc does aim to include some of the functionality of zincati.
But all this said: It will be supported to use both bootc and rpm-ostree together; they are not exclusive.
For example, bootc status
at least will still function even if packages are layered.
Future bootc <-> podman binding
All the above said, it is likely that at some point bootc will switch to hard binding with podman. This will reduce the role of ostree, and hence break compatibility with rpm-ostree. When such work lands, we will still support at least a "one way" transition from an ostree backend. But once this happens there are no plans to teach rpm-ostree to use podman too.
Relationship with Fedora CoreOS (and Silverblue, etc.)
Per above, it is a toplevel goal to support a seamless, transactional update from existing OSTree based systems, which includes these Fedora derivatives.
For Fedora CoreOS specifically, see this tracker issue.
See also OstreeNativeContainerStable.
How does the use of OCI artifacts intersect with this effort?
The "bootc compatible" images are OCI container images; they do not rely on the OCI artifact specification or OCI referrers API.
It is foreseeable that users will need to produce "traditional" disk images (i.e. raw disk images, qcow2 disk images, Amazon AMIs, etc.) from the "bootc compatible" container images using additional tools. Therefore, it is reasonable that some users may want to encapsulate those disk images as an OCI artifact for storage and distribution. However, it is not a goal to use bootc
to produce these "traditional" disk images nor to facilitate the encapsulation of those disk images as OCI artifacts.
Relationship with systemd "particles"
There is an excellent vision blog entry that puts together a coherent picture for how a systemd (and uapi-group.org) oriented Linux based operating system can be put together, and the rationale for doing so.
The "bootc vision" aligns with parts of this, but differs in emphasis and also some important technical details - and some of the emphasis and details have high level ramifications. Simply stated: related but different.
System emphasis
The "particle" proposal mentions that the desktop case is most interesting; the bootc belief is that servers are equally important and interesting. In practice, this is not a real point of differentiation, because the systemd project has done an excellent job in catering to all use cases (desktop, embedded, server) etc.
An important aspect related to this is that the bootc project exists and must interact with many ecosystems, from "systemd-oriented Linux" to Android and Kubernetes. Hence, we would not explicitly compare with just ChromeOS, but also with e.g. Kairos and many others.
Design goals
Many of the toplevel design goals do overall align. It is clear that e.g. Discoverable Disk Images and OCI images align on managing systems in an image-oriented fashion.
A difference on goal 11
Goal 11 states:
Things should not require explicit installation. i.e. every image should be a live image. For installation it should be sufficient to dd an OS image onto disk.
The bootc install
approach is explicitly intending to support things such
as e.g. static IP addresses provisioned via kernel arguments at install time;
it is not a goal for installations to be equivalent to dd
. The bootc creator has experience with systems that install this way, and it creates practical problems in nontrivial scenarios such as "Advanced Format" disk drives, etc.
New Goal: An explicit alignment with cloud-native
The bootc project has an explicit goal to take formats, cues and inspiration from the container and cloud-native ecosystem. More on this in several sections below.
New Goal: Continued explicit support for "unlocked" systems
A strong emphasis of the particle approach is "sealed" systems that chain from Secure Boot. bootc aims to support the same. And in practice, nothing in "particles" strictly requires Secure Boot etc.
However, bootc has a stronger emphasis on continuing to support "unlocked" systems into the foreseeable future, in which key (even root level) operating system changes can be made outside of an explicit signed state and feel equally first class, not just "developer system extensions".
Or stated more simply, it will be explicitly supported to create bootc-based operating systems that boot as e.g. a cloud instance or as a desktop machine that defaults to an unlocked state and provides good ergonomics in this scenario for managing user owned state across operating system upgrades too.
Hermetic /usr
One of the biggest differences starts with this. The idea of having
the entire operating system self-contained in /usr
is a good one. However,
there is an immense amount of prior history and details that make this
hard to support in many generalized cases.
This tracking issue is a good starting point - it's mostly about /etc
(see below).
bootc design: Carve out sub mounts
Instead, the bootc model allows arbitrary directory roots starting from /
to be included in the base operating system image.
This first notable difference is rooted in bootc taking a stronger cue from the opencontainers ecosystem (including docker/podman/Kubernetes). There are no restrictions on application container filesystem layout (everything is ephemeral by default, and persistence must be explicit); bootc aims to be closer to this.
There is still alignment: bootc design does strongly encourage operating
system state to live underneath /usr
- it should be the default place for all
operating system executable binaries and default configuration. It should be
read-only by default.
/etc
Today, the bootc project uses ostree as a backend,
and a key semantic ostree provides for /etc
is a "3 way merge".
This has several important differences. First, it means that /etc
does get updated
by default for unchanged configuration files.
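The merge semantic can be modeled in a few lines. This is a simplified illustration over in-memory `{path: content}` mappings, not ostree's actual implementation: files untouched locally follow the new image default, while local additions, modifications and deletions are carried forward.

```python
def merge_etc(old_default, new_default, current):
    """Simplified 3-way merge of /etc.

    old_default: defaults shipped by the currently booted image
    new_default: defaults shipped by the new image
    current:     the live /etc, including local changes
    """
    merged = dict(new_default)
    for path in set(old_default) | set(current):
        before = old_default.get(path)
        now = current.get(path)
        if now == before:
            continue                  # untouched locally: take the new default
        if now is None:
            merged.pop(path, None)    # deleted locally: stays deleted
        else:
            merged[path] = now        # added or modified locally: local wins
    return merged
```

For example, an unmodified `ssh/sshd_config` picks up the new image's version, while a locally edited `motd` keeps its local content across the update.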
The default proposal for "particle" OSes to deal with "legacy" config files in
/etc
is to copy them on first OS install (e.g. /usr/share/factory
).
This creates serious problems for all the software (for example, OpenSSH) that puts config files there: having the default configuration updated (e.g. for a security issue) under a package manager but not under an image-based update is not viable.
However a key point of alignment between the two is that we still aim to
have /etc
exist and be useful! Writing files there, whether from vi
or config management tooling must continue to work. Both bootc and systemd "particle"
systems should still Feel Like Unix - in contrast to e.g. Android.
At the current time, this is implemented in ostree; as bootc moves towards stronger integration with podman, it is likely that this logic will simply be moved into bootc instead on top of podman. Alternatively perhaps, podman itself may grow some support for specifying this merge semantic for containers.
Other persistent state: /var
Supporting arbitrary toplevel files in /
on operating system
updates conflicts with a desire to have e.g. /home
be persistent by default.
Hence, bootc emphasizes having e.g. /home
→ /var/home
as
a default symlink in base images.
Aside from /home
and /etc
, it is common on most Linux systems to
have most persistent state under /var
, so this is not a major point
of difference otherwise.
Other toplevel files/directories
Even operating systems that have completed "UsrMerge" still have
legacy compatibility symlinks required in /
, e.g. /bin
→ /usr/bin
.
We still need to support shipping these for many cases, and they
are an important part of operating system state. Having them
not be explicitly managed by OS updates is hence suboptimal.
Related to this, bootc will continue to support operating systems that have not completed UsrMerge.
Discoverable Disk images and booting
The bootc project will not use Discoverable Disk Images. Instead, we orient strongly around opencontainers/image-spec i.e. OCI/Docker images.
This is the biggest technical difference that strongly influences many other aspects of operating system design and experience.
It is an explicit goal of the bootc project that it should feel as natural as possible for someone familiar with "application containers" from podman/Docker/Kubernetes to take their tools and knowledge and apply that to the base operating system too.
Technical heart: composefs
There is a very strong security rationale behind much of the design proposal of "particles" and DDIs. It is absolutely true today, quoting the blog:
That said, I think [OCI has] relatively weak properties, in particular when it comes to security, since immutability/measurements and similar are not provided. This means, unlike for system extensions and portable services a complete trust chain with attestation and per-app cryptographically protected data is much harder to implement sanely.
The composefs project aims to close this gap, and the bootc project will use it, and has an explicit goal to align with e.g. podman in using it too.
Effectively, everywhere one might use a DDI, bootc will usually support a container image. (However for some things like system configuration files, bootc may aim to instead support e.g. plain ConfigMap files which are signed for example).
System booting
The bootloader
The strong emphasis of the UAPI-group is on UEFI. However, the world is a bit broader than that; the bootc project also will explicitly continue to support:
- GNU Grub for multiple reasons; among them that unfortunately x86 BIOS systems will not disappear entirely in the next 10 years even.
- Android Boot - because some hardware manufacturers ship it, and we want to support operating systems that must work on this hardware.
- zipl because it's how things work on s390x, and there is significant alignment in terms of emphasizing a "unified kernel" style flow.
Boot loader configs
bootc aims to align with the idea of generic bootloader-independent config files where possible; today it uses ostree. For more on this, see ostree and bootloaders.
The kernel and initramfs
There is agreement that in order to achieve integrity, there must be a strong link between the kernel and the first userspace code that executes in the initial RAM disk.
Building on the bootloader statement above: bootc will support UKI, but not require it.
The root filesystem
In the bootc model, the root filesystem defaults to a single physical Linux filesystem (e.g. xfs
, ext4
, btrfs
etc.). It is of course supported to mount other partitions and filesystems; doing so is encouraged even for /var
. This contrasts with the particle model, where one ends up with some space constraints around the OS /usr
partition due to dm-verity.
This is a rather large difference already from particles; the root filesystem contains the operating system too; it is not a separate partition. One thing this helps significantly with is dealing with the "space management" problems that dm-verity introduces (need for a partition to have unused empty space to grow, and also a fixed-size ultimate capacity limit).
Locating the root
bootc does not mandate or emphasize any particular way to locate the root filesystem;
parts of the discoverable partitions specification, specifically the "root partition", may be
used. Or, the root filesystem can be found the traditional way, via a local root=
kernel argument.
Another point of contrast from the particle emphasis is that while we encourage encrypting the root filesystem, it is not required. Particularly some use cases in cloud environments perform encryption at the hypervisor level and do not want additional overhead of doing so per virtual machine.
Locating the base container image
Until this point, we have been operating under external constraints; no one is creating a bootloader that directly understands how to start a container image, for example. We've gotten as far as running a Linux userspace in the initial RAM disk, and the physical root filesystem is mounted.
Here, we circle back to composefs. One can think of composefs as effectively a way to manage something like dm-verity, but using files.
What bootc builds on top of that is to target a specific container image rootfs
that is part of the "physical" root. Today, this is implemented again using ostree, via the ostree=
kernel commandline argument. In the future, it is likely to be a bootc.image
.
However, integration with other bootloaders (such as Android Boot) requires us to interact
with externally-specified fixed kernel arguments.
Ultimately, the initramfs will contain logic to find the desired root container, which again is just a set of files stored in the "physical" root filesystem.
Chaining integrity from the initramfs
One can think of composefs as effectively a way to manage something like dm-verity, but supporting multiple ones stored inside a standard Linux filesystem.
For "sealed" systems, the bootc project suggests a default model where there is an "ephemeral key" that binds the UKI (or equivalent) and the real root. For a bit more on this, see ostree and composefs. Effectively, at image build time an "ephemeral" key is generated which signs the composefs digest of the container image. The public half of this key is injected into the UKI, which is itself signed e.g. for Secure Boot.
At boot time, the initramfs will use its embedded public key to verify the composefs digest of the target root - and from there, overlayfs in the Linux kernel combined with fs-verity will continually verify the integrity of all operating system root files we use.
At the current time, there is not one single standardized approach for signing composefs images. Ultimately, a composefs image has a digest, and signing and verification of that digest can be done via any signing tool. For more on this, see this issue.
bootc itself will not mandate one mechanism currently. However, it is very likely that we will ship an optionally-enabled opinionated mechanism that uses basic ed25519 signatures for example.
This is effectively equivalent to the particle approach of embedding a verity root hash into the kernel commandline - it means that the booted Linux kernel will only be capable of mounting that one specific root filesystem. Note that this model is effectively the same as e.g. Fedora uses to sign kernel modules.
However, an "ephemeral key" is not the only valid way to do things; for some operating system creators it may be very desirable to continue to be able to make root OS image changes without changing the UKI (and hence re-signing it). Instead, another valid approach is to simply maintain a persistent public/private keypair. This allows disconnecting the build of userspace and kernel, but also means that there is less strict verification between kernel and userspace (e.g. downgrade attacks become possible).
Chaining integrity to configuration and application containers
composefs is explicitly designed to be useful as a backend for "application" containers (e.g. podman). There is again not one single mechanism for signing and verification; in some use cases, it may be enough to boot the operating system enough to implement "network as source of truth" - for example, the public keys for verification of application containers might be fetched from a remote server. Then before any application containers are run, we dynamically fetch the relevant keys from a server which was trusted.
The bootc project will align with podman in general, and make it easy to implement a mechanism that chains keys stored alongside the operating system into composefs-signed application containers.
Configuration (effectively starting from /etc
and the kernel commandline) in a "sealed" system is a complex topic. Many operating system builds will want to disable the default "etc merge" and make /etc
always lifecycle bound with the OS: commonly writable but ephemeral.
This topic is covered more in the next section.
Modularity
A goal of "particles" is to add integrity into "general purpose" Linux OSes and distributions - supporting a world where there are a lot of users that simply directly install an OS from an upstream OS such as Debian or Fedora. This has a lot of implications; among them that e.g. the Secure Boot signatures etc. are made by the OS creator, not the user.
A big emphasis for the bootc project, in contrast, is a design where it is normal and expected for many users to derive (via standard container build technology) from the base image produced by the OS upstream.
This is just a difference in emphasis: "particles" can clearly be built fully customized by the end customer, and bootc fully supports booting "stock" images.
But still: the bootc project will again much more strongly push any scenario that desires truly strong integrity towards making and managing custom derived builds.
Extensions and security
In "unlocked" scenarios, the bootc project will continue to support a "traditional Unix" feeling where persistent changes to /etc
can be written and maintained. Similarly, it will continue to be supported to have machine-local kernel arguments.
There is significant value in migrating "package based" systems to "image based" systems, even if they are still "unsigned" or "unlocked".
The particle model calls for tools like confext that use DDIs. The "backend" of this (managing merged dynamic filesystem trees with overlayfs) and its relationship with systemd units is still relevant, but the bootc approach will again not expose DDIs to the user. Instead, our approach will take cues from the cloud-native world and use e.g. Kubernetes ConfigMap and support signatures on these.
More Modularity: Secondary OS installs
This uses OCI containers, which will work the same as the host.
Developer Mode
This topic heavily diverges between the "unlocked" and "sealed" cases. In the unlocked case, the bootc project aims to still continue to make it feel very "first class" to perform arbitrary machine-local mutations. Instead of managing overlay DDIs, bootc
will make it trivial and obvious to use local container builds using any standard container build tooling.
Package managers
In order to ease the transition for users coming from package systems, the bootc project suggests that package managers like apt and dnf learn how to become a frontend for "local" container builds too. In other words, apt|dnf install foo would become shorthand for a container build like:
FROM <localhost>
RUN apt|dnf install foo
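To make the idea concrete, here is a minimal sketch of the translation such a frontend might perform. Nothing like this exists in apt or dnf today as far as this document states; the function name `containerfile_for_install` is hypothetical, and `<localhost>` is kept as the same placeholder for "the currently booted image" used above.

```python
def containerfile_for_install(manager: str, packages: list[str],
                              base: str = "<localhost>") -> str:
    """Render the Containerfile a hypothetical `apt|dnf install` frontend could build.

    `base` stands in for the currently booted image; it is a placeholder,
    not a real image reference.
    """
    if manager not in ("apt", "dnf"):
        raise ValueError(f"unsupported package manager: {manager}")
    if not packages:
        raise ValueError("no packages given")
    # -y because container builds are non-interactive.
    return f"FROM {base}\nRUN {manager} install -y {' '.join(packages)}\n"


if __name__ == "__main__":
    print(containerfile_for_install("dnf", ["foo"]), end="")
```

The frontend would then hand this text to any standard container build tool and switch the host to the resulting image; that step is left out here because it depends on tooling this document does not specify.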
Transitioning from unlocked, mutable local state to server-built images
Building on the above, a key point of bootc is to make it easy and obvious how to go from an "unlocked" system with potentially unmanaged state towards a system built and managed using standard OCI container image build systems and tooling. For example, there should be a command like apt|dnf print-containerfile. (The problem is more complex than this of course, as we would likely want to capture some changes from /etc - but some of those changes may include secrets, which are their own sub-topic.)
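As a rough illustration of what a print-containerfile command could emit, the sketch below collapses a recorded history of local package operations into a single Containerfile. The history format, the function name `print_containerfile`, and the base image name are all assumptions made for the example; it deliberately ignores /etc changes and secrets, and it does not handle removal of packages that ship in the base image.

```python
def print_containerfile(base: str, manager: str,
                        history: list[tuple[str, str]]) -> str:
    """Collapse ("install" | "remove", package) operations into one Containerfile.

    `history` is a hypothetical, ordered log of machine-local package changes.
    """
    installed: dict[str, None] = {}  # dict keeps first-install order
    for action, pkg in history:
        if action == "install":
            installed[pkg] = None
        elif action == "remove":
            # Only undoes local installs; base-image packages are out of scope.
            installed.pop(pkg, None)
        else:
            raise ValueError(f"unknown action: {action}")
    lines = [f"FROM {base}"]
    if installed:
        lines.append(f"RUN {manager} install -y {' '.join(installed)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    log = [("install", "vim"), ("install", "htop"), ("remove", "vim")]
    print(print_containerfile("quay.io/example/base:latest", "dnf", log), end="")
```

The resulting file can be committed to source control and built on a server, which is the transition from local mutable state to server-built images described above.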
Democratizing Code Signing
Strong alignment here.
Running the OS itself in a container
This is equally obvious to do when the host and the linked container runtime (e.g. podman) again use the same tools.
Parameterizing Kernels
In "unlocked" scenarios (per above) we will continue to use bootloader configuration that is unsigned.
We will not (in contrast to particles) try to strongly support a "partially sealed, general purpose" model. More on this below.
Most "sealed" systems will want to lock the kernel command line entirely, not even using a bootloader, and hence there is no mechanism to configure it locally. However, as discussed in various venues around UKI, "sealed" systems can become complex to deploy where there is a need for machine (or machine-type) specific kernel arguments:
- Deploying the RT kernel often wants to use isolcpus=.
- Setting static IP addresses on the kernel command line to enable network bound disk encryption for the rootfs.
The bootc project default approach for this is to lean into the container-native world, using derivation to create a machine-independent "base image", then create derived, machine (or machine-class) specific images that are in turn signed.
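The derivation step can be sketched as generating, per machine class, a small build context on top of the shared base image. The drop-in path and TOML shape below follow bootc's kargs.d convention as best understood here; treat both as assumptions, along with the image name and the `derived_image_files` helper. An image that boots a UKI would instead embed the command line in the UKI, and signing the result is out of scope for this sketch.

```python
import json


def derived_image_files(base: str, machine_class: str,
                        kargs: list[str]) -> dict[str, str]:
    """Return the build-context files for one machine-class specific image."""
    dropin = f"10-{machine_class}.toml"
    # A JSON array of plain strings is also a valid TOML array of strings.
    toml = f"kargs = {json.dumps(kargs)}\n"
    containerfile = (
        f"FROM {base}\n"
        # Assumed location of kernel-argument drop-ins inside the image.
        f"COPY {dropin} /usr/lib/bootc/kargs.d/{dropin}\n"
    )
    return {"Containerfile": containerfile, dropin: toml}


if __name__ == "__main__":
    files = derived_image_files("quay.io/example/base:latest", "rt", ["isolcpus=2-7"])
    for name, content in files.items():
        print(f"--- {name}\n{content}", end="")
```

Each machine class gets its own derived image, built and signed like any other, while the base image stays machine-independent.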
Updating Images
A big differentiation here is that bootc will reuse container technology for fetching updates. The operating system and application containers will be signed with e.g. sigstore or similar for network fetching. The signature will cover the composefs digest, which enables continuous verification.
Managing storage of container images using composefs is more complex than systemd-sysupdate writing to a partition, but significantly more flexible. For more on this, see upstream composefs.
Kernel in images
The bootc and particle approaches are aligned on storing the kernel binary in /usr/lib/modules/$kver. On the bootc side, a key bit here is that bootc will extract the kernel and initramfs (or just UKI) and put it in the appropriate place - this is implemented as a transactional operation. There are significant details that can vary for how this works (because unlike particles, bootc aims to support non-EFI setups as well), but the high level idea is similar.
Boot Counting + Assessment
This topic relates to the previous one; because of multiple bootloaders, there is not one single approach. The systemd automatic boot assessment is good where it can be used, but we also will support e.g. Android bootloaders.
Picking the Newest Version
Because the storage of images is not just files or partitions, bootc will not expose to the user/administrator a semantic of strvercmp or package-manager oriented versioning semantics. Instead, "latest" will be implemented in a more Kubernetes-oriented fashion of having "local" API objects with spec and status. This makes it easy and obvious for higher level management (e.g. cluster) tooling to orchestrate updates in a Kubernetes-style fashion.
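A minimal sketch of the spec/status idea follows. The type and field names (`HostSpec`, `HostStatus`, `ImageRef`, `needs_update`) are invented for illustration and are not bootc's actual schema; the point is only that "is there an update?" becomes a comparison between desired and observed state, keyed on content digests, with no version-string ordering involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageRef:
    image: str   # e.g. "quay.io/example/os:stable" (hypothetical)
    digest: str  # content digest the reference resolved to


@dataclass
class HostSpec:
    image: str   # desired image reference, set by an admin or a controller


@dataclass
class HostStatus:
    booted: ImageRef
    staged: Optional[ImageRef] = None  # queued for the next boot, if any


def needs_update(spec: HostSpec, status: HostStatus, resolved_digest: str) -> bool:
    """True when the desired image is neither staged nor (failing that) booted."""
    current = status.staged or status.booted
    return current.image != spec.image or current.digest != resolved_digest


if __name__ == "__main__":
    status = HostStatus(booted=ImageRef("quay.io/example/os:stable", "sha256:aaa"))
    spec = HostSpec(image="quay.io/example/os:stable")
    print(needs_update(spec, status, "sha256:bbb"))
```

Higher-level tooling can then drive a host by writing the spec and watching the status, the same way a Kubernetes controller treats any other object.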
Home Directory Management
The bootc project will not do anything with this. We will support systemd-homed where users want it, but in many dedicated servers and managed devices the idea of persistent user "home directories" is more of an anti-pattern.
Partition Setup
The biggest difference again here is that bootc is oriented closer to a single root partition by default that includes the OS, system/app containers and persistent local state all as one unit.
Trust chain
In contrast to particles, the bootc project does not aim to emphasize by default a model of using sysexts from the initramfs, because the primary use case for that model is a "partially sealed" system. And per above (re kernels), it is insufficient for other cases.
Without this in the mix then, the trust chain is simple to describe: the kernel+initramfs are verified by the bootloader, the initramfs contains the key and logic necessary to verify the composefs digest of the root, and the root starts to verify everything else.
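The chain can be modelled in a few lines to show why each stage only needs to trust the one before it. This is a toy model under loud assumptions: plain SHA-256 comparisons stand in for the real mechanisms (signature verification by the bootloader or firmware, and fs-verity backed composefs digests), and every name here is hypothetical.

```python
import hashlib
from dataclasses import dataclass


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class BootPayload:
    kernel: bytes
    initramfs: bytes
    root_digest: str  # expected digest of the root, carried inside the initramfs

    def measured(self) -> str:
        # The bootloader's check covers everything, including the root digest.
        return digest(self.kernel + self.initramfs + self.root_digest.encode())


def verify_boot(trusted_payload_digest: str, payload: BootPayload, root: bytes) -> str:
    # Stage 1: the bootloader verifies kernel+initramfs.
    if payload.measured() != trusted_payload_digest:
        return "bootloader rejected kernel+initramfs"
    # Stage 2: the initramfs verifies the root against its embedded digest.
    if digest(root) != payload.root_digest:
        return "initramfs rejected root"
    # Stage 3: the verified root goes on to verify everything else.
    return "ok"


if __name__ == "__main__":
    root = b"root filesystem metadata"
    payload = BootPayload(b"vmlinuz", b"initramfs", digest(root))
    print(verify_boot(payload.measured(), payload, root))
```

Because the expected root digest lives inside the payload the bootloader already verified, swapping either the root or the digest breaks the chain at a well-defined stage.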
File System Choice
As mentioned above, any Linux filesystem is valid for the root. For "sealed" systems, composefs will cover integrity, so there is no distinct need for dm-integrity.
OS Installation vs. OS Instantiation
The bootc project is simply less partition-oriented and more oriented towards multiple composefs instances in a single root. However, the high level goal is shared: making it easy to "re-provision" and keeping the install-time flow as close as possible.
Building Images According to this Model
This is a key point of bootc: we aim for operating systems and distributions to ship their own bootc-compatible base images that can be used as a default derivation source. These images are just OCI images that follow simple rules for the extra state needed to boot (for example, as mentioned above, the kernel is found in /usr/lib/modules/$kver/vmlinuz).
However in order to enable "sealed" systems (using signed composefs digests), the container build system will need support for this. But, it is a goal to standardize the composefs metadata needed alongside the OCI, and to support this in the broader container ecosystem of tools (e.g. docker, podman) as well as bootc.
Final words
This document is obviously very heavily inspired by the original blog.
A point of divergence is that a goal of the bootc project is to strongly influence the existing operating systems and distributions and help them migrate their customers into an image-based world - and to make practical compromises in order to aid that goal.
But, the bootc project strongly agrees with the idea of finding common ground (the "50% shared" case). At a practical level, this project will take a hard dependency on systemd and on the container ecosystem, extending bridges where they exist, working on shared standards and approaches between the two.