Create a container image for remote actions

On EngFlow, most remote actions run inside containers. Containers provide isolation between actions and bundle everything you need to run remote actions in your custom environment.

When using remote execution with a Linux or Windows platform, you'll need to specify a container image to use. You'll usually want to prepare a custom image, but there are some publicly available images you can use, too. If you already have a container image used by your CI runners, make sure it meets the requirements, and use that. Ideally, your local and remote environments are similar so actions may run either locally or remotely.

You don't need a container image for macOS remote builds; macOS does not support containers.

Requirements for a container image

Your container image should have everything your remote actions need to run successfully except for files provided to your actions as inputs.

In most cases, you'll need to install a C/C++ toolchain, but if you use a fully hermetic toolchain in your Bazel workspace, you don't need to also install the toolchain in your container image. Read Tradeoff: hermetic toolchains vs. container toolchains below if you're unsure which approach to take.

Commands must be able to run within your container as a regular user, not as root on Linux or Administrator on Windows.

Your container image needs:

Linux:

A complete shell environment with Bash, coreutils, and other tools. Most Linux base images have this by default.
A C/C++ toolchain. On most Linux distributions, you can install build-essential or clang. Any GCC- or Clang-like compiler compatible with Bazel should work. Even if you aren't building C++ code, many rule sets depend on this.
Any other run-time environments and tools needed by your actions.
HOME must be set to a writable directory. Actions should avoid writing files outside the working directory or /tmp, but many tools assume they can create files in HOME.
Optional: a user named engflow with uid=108 and a group named engflow with gid=114. The EngFlow server runs commands with this uid and gid, and some actions expect them to have names in /etc/passwd and /etc/group.

Windows:

A Windows Server 2022 base image. This also requires a Windows Server 2022 or Windows 11 host. See Windows container version compatibility for details.
msys2 installed in the default location at C:\msys64. C:\msys64\usr\bin must be in PATH. Many rules, wrappers, and genrules require a Bash shell and related tools.
A C/C++ toolchain. You can use one of the msys2 toolchains, Microsoft Visual Studio Build Tools (MSVC), or any other compiler that accepts similar flags. Even if you aren't building C++ code, many rule sets depend on this.
Any other run-time environments and tools needed by your actions.

By default, EngFlow runs Windows containers as the user ContainerUser.
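The Linux requirements above can be checked from inside a candidate image before you push it. The following is a minimal sketch, not an EngFlow tool: the function names (dir_is_writable, has_any_tool, check_remote_env) and the list of compiler names are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sanity checks for the Linux requirements listed above.
# All function names are illustrative; none of them come from EngFlow.

# Succeeds if $1 is an existing directory the current user can write to.
dir_is_writable() {
  [ -n "$1" ] && [ -d "$1" ] && [ -w "$1" ]
}

# Succeeds if at least one of the named commands is on PATH.
has_any_tool() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

# Prints one line per failed requirement; returns non-zero on any failure.
check_remote_env() {
  rc=0
  if [ "$(id -u)" -eq 0 ]; then
    echo "FAIL: running as root; remote actions run as a regular user"
    rc=1
  fi
  if ! dir_is_writable "${HOME:-}"; then
    echo "FAIL: HOME is unset or not a writable directory"
    rc=1
  fi
  if ! has_any_tool bash; then
    echo "FAIL: bash not found on PATH"
    rc=1
  fi
  # Assumed compiler names; adjust to the toolchain you install.
  if ! has_any_tool clang gcc cc; then
    echo "FAIL: no C/C++ compiler found on PATH"
    rc=1
  fi
  return "$rc"
}
```

One way to use it is to append a call to check_remote_env to the script, make the script available inside the image, and run it with docker run --rm --user 108:114 so the checks execute under the uid and gid mentioned above.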

Publicly available images

If your actions don't have specific requirements, the bazel-toolchains project provides pre-configured toolchains and container images based on Ubuntu 16.04.

Building your own image

To build your own container image, you can start with one of the Dockerfiles shown below. To build, run:

Bash
docker build --tag engflow-container-image:latest .

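Linux:

Docker
FROM debian:latest
# Suppress interactive prompts from apt during the build.
ENV DEBIAN_FRONTEND=noninteractive

# Create the optional engflow user and group with the uid and gid
# that the EngFlow server uses to run commands.
RUN groupadd \
  --gid 114 \
  engflow && \
  useradd \
  --home-dir /home/engflow \
  --create-home \
  --uid 108 \
  --gid 114 \
  engflow
# HOME must point to a writable directory.
ENV HOME=/home/engflow

# Install a C/C++ toolchain and common tools, then remove the apt
# package lists to keep the image small.
RUN apt-get update --quiet --quiet --yes && \
  apt-get install --quiet --quiet --fix-broken --yes \
  clang \
  curl \
  python3 \
  python-is-python3 \
  zip && \
  rm -rf /var/lib/apt/lists/*

Windows:

Our Windows Dockerfile is longer and relies on several PowerShell scripts to build and install dependencies, so it is too long to show here. Refer to https://github.com/EngFlow/example/tree/main/platform/windows_x64/docker for a complete example.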

Storing your image

We recommend storing your container image in a Docker registry close to the region where your cluster primarily operates. This minimizes the time a newly started worker instance takes to fetch your image. It also reduces network costs. You have several options:

Create a private AWS Elastic Container Registry (ECR) or a Google Cloud Artifact Registry (GAR) in the same region and availability zone as your cluster. If this is created in your cluster's AWS account or GCP project, your cluster will automatically have pull access.
Create a private registry in another cloud account, preferably in the same region and availability zone. Work with EngFlow customer success engineers to authenticate your cluster with this registry.
Use a public registry, like Docker Hub.

After you've chosen a registry, you can tag and push your image with the commands below, adjusting for your host, registry, and repository names.

Bash
TAG=HOST_NAME/REGISTRY_NAME/REPO_NAME:latest
docker tag engflow-container-image:latest "$TAG"
docker push "$TAG"

If your container image is larger than 4 GiB, please let EngFlow customer success engineers know. They may need to adjust the service configuration to ensure good performance.
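To see whether an image crosses the 4 GiB mark, you can compare its size in bytes against the threshold. The sketch below assumes the size comes from docker image inspect --format '{{.Size}}', which reports the local size in bytes. Whether the 4 GiB guidance refers to that figure or to the compressed size in the registry is not stated here, so treat the result as a rough indicator. The names LIMIT_BYTES and image_exceeds_limit are hypothetical.

```shell
#!/bin/sh
# 4 GiB expressed in bytes (4 * 1024^3).
LIMIT_BYTES=$((4 * 1024 * 1024 * 1024))

# Succeeds if the size in bytes given as $1 is strictly larger than 4 GiB.
# The size is assumed to come from something like:
#   docker image inspect --format '{{.Size}}' engflow-container-image:latest
image_exceeds_limit() {
  [ "$1" -gt "$LIMIT_BYTES" ]
}
```

For example, passing the output of the docker image inspect command above as the argument tells you whether it is worth contacting EngFlow before rolling the image out.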

Configuring Bazel to use your container image

When you configure Bazel remote execution, you'll need to set a container-image execution property in your platform target. Its value is based on the tag you pushed, pinned to the image's SHA-256 digest.

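First, find the image's SHA-256 digest. Run the command below after docker push, adjusting the last argument to match the tag you pushed. Docker fills in RepoDigests only once the image has been pushed to or pulled from a registry, so the command fails on an image that exists only locally.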

Bash
docker inspect --format="docker://{{index .RepoDigests 0}}" "$TAG"

That should print a URL like:

Text Only
docker://HOST_NAME/REGISTRY_NAME/REPO_NAME:latest@sha256:YOUR_IMAGE_SHA256

You can add this to your platform target as below:

Python
platform(
    name = "linux_x64",
    constraint_values = [...],
    exec_properties = {
        "container-image": "docker://HOST_NAME/REGISTRY_NAME/REPO_NAME:latest@sha256:YOUR_IMAGE_SHA256",
    }
)
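Because the property value must be pinned to a digest, it can help to validate the string before pasting it into a BUILD file. The following is a minimal sketch under stated assumptions: container_image_value is a hypothetical helper, and the pattern it checks (a reference ending in @sha256: followed by 64 lowercase hex characters) is inferred from the example URL above rather than taken from an EngFlow specification.

```shell
#!/bin/sh
# Hypothetical helper: turns an image reference into a container-image value.
# Prints docker://REFERENCE on success; fails if the reference is not pinned
# to a sha256 digest.
container_image_value() {
  ref=$1
  # Strip an existing docker:// prefix so the helper can be applied twice.
  ref=${ref#docker://}
  if ! printf '%s\n' "$ref" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'; then
    echo "error: '$ref' is not pinned to a sha256 digest" >&2
    return 1
  fi
  printf 'docker://%s\n' "$ref"
}
```

A reference that carries only a tag, such as HOST_NAME/REGISTRY_NAME/REPO_NAME:latest, is rejected, which catches the case where the digest was left off by mistake.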

Tradeoff: hermetic toolchains vs. container toolchains

There are two ways to install and define a C++ toolchain for remote execution.

A hermetic toolchain's files are part of the build graph and are represented as inputs to every action requiring the toolchain. The toolchain files may be checked into source control or retrieved from an external source using a module or repository rule.
A container toolchain uses files installed in a container image. The build tool is not aware of these files, and they are not part of the build graph.

There is no correctness difference between the two approaches. If a toolchain file changes in either approach, the build tool will rebuild actions that depend on the toolchain.

Container toolchains typically have a performance advantage over hermetic toolchains. A C++ toolchain consists of thousands of files. Requiring the build tool to hash and verify a hermetic toolchain incurs startup delays. Since these files are inputs to every action that depends on them, the build graph takes more space in memory. The remote execution system also spends more time staging and verifying input trees. Container toolchains generally have lower overhead in all these dimensions. However, if your container image is very large, you may see slower remote execution worker start time. This can be a problem if you update your container image frequently.

Container toolchains are usually easier to maintain than hermetic toolchains. A hermetic toolchain generally requires you to build a C++ toolchain from source or to use a publicly available binary toolchain built outside normal distribution channels. With a container toolchain, you typically just pick a Linux distribution base image (like Debian or Ubuntu) and install the packages you want (clang or gcc).

Hermetic toolchains are most often used in large monorepos and in environments where both local and remote builds are important and are expected to produce identical results without requiring containers.