Content-Addressable Storage¶
This document describes how to configure the built-in distributed CAS as well as a backup CAS service for an EngFlow Remote Execution cluster.
Distributed CAS¶
EngFlow Remote Execution comes with a built-in distributed CAS that reuses
worker disks to provide persistent storage of files. It automatically replicates
all stored files to up to three worker instances (see --replica_count),
and makes new copies whenever a worker instance is lost.
CAS Layout¶
The total disk space is subdivided into the core OS, the per-action execution directories (consisting of input and output trees), and the CAS (consisting of replicas and a local cache).
Example
For example, a dual-core worker node with 10 GB disk might be subdivided into 2 GB for the OS, two executors with 1 GB each, leaving 6 GB for the CAS, of which we allocate up to 3 GB for replicas.
| OS | Execution Directories | CAS |
|---|---|---|
| 2 GB | 2 workers × 1 GB | 6 GB (3 GB replicas + 3 GB cache) |
Disk size: 2 GB + 2 * 1 GB + 6 GB = 10 GB
For simplicity, we recommend controlling the CAS size indirectly by setting
--disk_size to the total disk size. The worker then computes the CAS size as follows:

- It reserves 20% of the disk size for the operating system.
- It reserves --max_output_size times the number of local executors (see --worker_config) for the execution directories.
- The remainder is allocated to the CAS, half of which is storage space for replicas.
Note that the input trees are typically hard-linked into the CAS, so it is not necessary to account for the maximum input tree size here. However, it is an error if the CAS cache is smaller than the number of executors times the maximum input tree size.
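The computation above can be sketched as a few lines of arithmetic. This is an illustration of the described sizing scheme, not EngFlow's actual implementation:

```python
def cas_layout(disk_gb, executors, max_output_gb=1.0):
    """Split a worker disk as described above (illustrative arithmetic only)."""
    os_reserve = 0.2 * disk_gb              # 20% reserved for the OS
    exec_dirs = executors * max_output_gb   # --max_output_size per executor
    cas = disk_gb - os_reserve - exec_dirs  # remainder goes to the CAS
    replicas = cas / 2                      # half of the CAS holds replicas
    cache = cas - replicas                  # the rest is the local cache
    return {"os": os_reserve, "exec": exec_dirs,
            "replicas": replicas, "cache": cache}
```

For the 10 GB example above, `cas_layout(10, 2)` yields 2 GB for the OS, 2 GB of execution directories, 3 GB of replicas, and 3 GB of cache.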
You should observe the following rules when adding additional packages to the base image, increasing the number of executors, or increasing the maximum input and output tree sizes:
- When adding packages to the base image: the OS size should stay below 10% of the disk size. If it is larger, you either need to remove packages or increase the disk size.
- When increasing the number of executors: with the default maximum input and output tree sizes of 1 GB each, you will need approximately 4-5 GB of disk per executor.
- When increasing the maximum input or output tree sizes: your disk should be at least 3 times the sum of the maximum input and output tree sizes per executor.
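A quick sanity check for these rules might look like the following sketch. It encodes one reading of the first and third rules; the thresholds and the interpretation of "per executor" are assumptions, so treat it as a starting point rather than an authoritative validator:

```python
def check_sizing(disk_gb, os_gb, executors, max_input_gb, max_output_gb):
    """Flag disk configurations that violate the sizing rules above
    (illustrative interpretation, not an official EngFlow check)."""
    problems = []
    # Rule: the OS image should stay below 10% of the disk size.
    if os_gb > 0.10 * disk_gb:
        problems.append("OS image exceeds 10% of the disk size")
    # Rule: disk should be at least 3x the summed max tree sizes,
    # assumed here to scale with the number of executors.
    if disk_gb < 3 * executors * (max_input_gb + max_output_gb):
        problems.append("disk smaller than 3x the total max tree sizes")
    return problems
```

For instance, a 30 GB disk with a 2 GB OS image and two executors (1 GB trees each) passes both checks, while a 10 GB disk with the same configuration is flagged by both.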
Configuring an External Persistent Storage Service¶
In addition to the built-in distributed CAS, EngFlow Remote Execution supports backing up CAS data to an external persistent storage service. If files or file metadata are lost from the distributed CAS, the cluster automatically falls back to the persistent storage service. This can be used to run with smaller disks, or to support auto-scaling.
The EngFlow Remote Execution software is designed to minimize read and write accesses to the storage service. That is, it will preferentially use the built-in distributed CAS to fetch a file rather than the storage service.
Google Cloud Storage¶
In order to configure GCS as a fallback storage mechanism, you have to create a GCS bucket, ensure that both workers and schedulers have read and write access to the bucket, and then configure the location with the following flags:
If you are running outside of GCP, and you are not using application default
credentials, then you also have to specify the location of a credentials file
using --gcs_credentials.
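Creating the bucket and granting access can be done with the standard Google Cloud tooling. The bucket, region, and service-account names below are placeholders; substitute your own:

```shell
# Create the bucket (hypothetical name and region).
gsutil mb -l us-central1 gs://example-engflow-cas

# Grant the service account used by workers and schedulers
# read/write access to objects in the bucket.
gsutil iam ch \
  "serviceAccount:engflow-cluster@example-project.iam.gserviceaccount.com:roles/storage.objectAdmin" \
  gs://example-engflow-cas
```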
Amazon S3¶
In order to configure Amazon S3 as a fallback storage mechanism, you have to create an S3 bucket, ensure that both workers and schedulers have read and write access to the bucket, and then configure the location with the following flags:
The IAM account needs permissions granting read and write access to the bucket.
If you are running outside of AWS, you have to provide credentials explicitly,
i.e. define the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
environment variables.
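Bucket creation with the AWS CLI looks like the following. The bucket name and region are placeholders; the credential values must come from your own IAM account:

```shell
# Create the bucket (hypothetical name and region).
aws s3 mb s3://example-engflow-cas --region us-east-1

# When running outside of AWS, export the IAM credentials
# before starting the cluster (values elided deliberately).
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
```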
Garbage Collection of Persistent Storage¶
By default, the EngFlow Remote Execution service will not manage the lifecycle of replicas stored in persistent storage, and objects will live forever. However, optional garbage collection support allows removing old, unused objects.
Garbage collection is controlled by one parameter,
--external_storage_gc_window_days. Set this parameter to the minimum number of
days to keep unused CAS objects in persistent storage. Garbage collection works
by ensuring live objects are copied to new paths at least every GC-window days.
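The copy-to-keep-alive scheme amounts to a simple age check: an object stays live only if it has been re-copied within the GC window. This sketch illustrates the idea; the function name and signature are assumptions, not EngFlow's implementation:

```python
from datetime import date, timedelta

def needs_refresh(last_copied: date, today: date, gc_window_days: int) -> bool:
    """True if a live object must be copied to a new path now to avoid
    being eligible for garbage collection (illustrative sketch)."""
    return today - last_copied >= timedelta(days=gc_window_days)
```

With a 14-day window, an object last copied on January 1 needs a refresh by January 15, while one copied on January 10 does not.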
One difficulty with external storage garbage collection is clients that cache the existence of CAS blobs. For safety, clients must not assume a blob is in the server CAS longer than the GC window period. Bazel's builds-without-the-bytes feature caches CAS existence information for the lifetime of the Bazel server. Therefore, Bazel servers using builds-without-the-bytes must not live longer than the GC window period for correctness.