This document describes how to set up EngFlow Remote Execution in a VM-based deployment on Google Compute Engine (GCE), which is part of Google Cloud Platform (GCP). If you want to use Google Kubernetes Engine (GKE), please see the Kubernetes Setup instead.
In addition to the baseline requirements for running the EngFlow Remote Execution Service, you will need administrative access to a GCE project to create VM images and instance templates, to start VMs, and so on.
Automated Setup using Packer and Terraform
We provide a Packer config to create the base image, and a minimal Terraform config to start the cluster. The Terraform config includes a service account, an optional GCS bucket, scheduler and worker pools, an optional auto-scaler, and a TCP load balancer.
Both may need to be adjusted for your desired build environment and production deployment. We recommend first starting a basic cluster and adjusting the configuration only after verifying its operation.
1. Set Up Google Application Credentials
We recommend creating an additional service account to handle base image creation and cluster setup. This service account requires roles to create VM images and configure and start VMs.
You need to provide credentials for this service account to the
terraform tools to set up the cluster. Download a key for the
service account as a JSON file and set the
GOOGLE_APPLICATION_CREDENTIALS environment variable:
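For example, assuming the key was saved to `~/engflow-setup-key.json` (the path is a placeholder; use wherever you downloaded the key):

```shell
# Point the Google Cloud tooling and Terraform at the service account key.
# The path below is a placeholder for your downloaded JSON key file.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/engflow-setup-key.json"
```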
2. Create a Base Image
The included Terraform template uses the same base image for both scheduler and worker instances. If you want to use separate images (for example when you want to install extra tools on workers), then you need to adjust the Terraform config.
To create the base image with Packer, you need to install the packer command line tool (installation instructions).
Before generating the image, you should inspect the Packer configuration in
setup/gcp/base-image.json and modify it to fit your requirements. However,
we recommend first starting a minimal cluster and verifying its operation.
For the base OS, we recommend Debian 10. You can use other versions or other
distributions (e.g. Ubuntu 18.04) as long as the
package is installed.
To generate the image, change into the
setup/gcp directory and run:
packer build base-image.json
The resulting image will be called
3. Start the Cluster
We provide a Terraform config file in
setup/gcp/main.tf, which includes
scheduler and worker templates, instance group managers (configured with a fixed
size), and an internal TCP load balancer. It also embeds the license file as
well as the service config.
At a minimum, you need to configure the following options before starting a cluster:
- project_name - the GCP project name the cluster should run under
- availability_zone - the target zone where the cluster should run
You should edit the
setup/gcp/main.tf file and set these to the desired
values. Additionally, you can configure the number of schedulers and workers in
the cluster as well as configure
Terraform remote state.
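As an illustration, the edited settings in main.tf might look like the following. The values are placeholders, and the exact variable names depend on the version of the config you received, so check setup/gcp/main.tf itself:

```hcl
# Illustrative values only - replace with your own project and zone.
project_name      = "my-gcp-project"   # GCP project to run the cluster in
availability_zone = "us-central1-a"    # zone for all cluster instances
```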
Start the cluster using:
terraform init
terraform apply
Once the cluster is running, Terraform prints the IP address of the load balancer, which is the end point for Bazel to talk to. Note that the default configuration only allows connections from other machines in the same GCE network.
Note that the TCP load balancer does not distribute requests from a single client that arrive over a single connection. GCP’s HTTP/2 load balancers would, but they do not support TLS client authentication (mTLS).
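Once you have the load balancer address, you can point Bazel at it, for example in a .bazelrc. The IP below is a placeholder for the address printed by Terraform, and the exact flags depend on your TLS and authentication setup:

```
# Placeholder address - use the IP printed by Terraform.
build --remote_executor=grpc://10.0.0.100:443
```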
Manual Setup
This section outlines the manual process for setting up a cluster on GCE.
1. Create a Service Account
You have to create a new service account for the Remote Execution cluster. This service account is required on the scheduler and worker instances to perform discovery and to log monitoring data to Google Cloud Operations (formerly StackDriver). It must have at least the following role:
Compute Viewer aka roles/compute.viewer
Used to auto-detect live scheduler and worker instances.
If you enable Google Cloud Operations (formerly StackDriver) monitoring, the service account requires these additional roles:
Monitoring Metric Writer aka roles/monitoring.metricWriter
Necessary to write metrics to GCO.
Cloud Trace Agent aka roles/cloudtrace.agent
Necessary to write performance traces to GCO; only needed if you set
--monitoring_trace_probability to a non-zero value.
If you configure Google Cloud Storage as a backup CAS/Action Cache, then the service account requires these additional roles:
Storage Object Admin aka roles/storage.objectAdmin
Necessary to read, write, and delete objects to and from GCS.
If you use Docker images stored in Google Container Registry (GCR), then the service account may require these additional roles:
Storage Object Viewer aka roles/storage.objectViewer
Necessary to read Docker images from GCR. Note that this is a subset of Storage Object Admin (needed for GCS), so you do not need both.
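The roles above can also be granted from the command line. A sketch using gcloud follows; the account name and project ID are placeholders, and you should add further role bindings for the optional monitoring, tracing, and GCS roles as needed:

```shell
# Create the service account (names are placeholders).
gcloud iam service-accounts create engflow-re \
    --display-name="EngFlow Remote Execution"

# Grant the minimum role needed for instance discovery.
gcloud projects add-iam-policy-binding my-gcp-project \
    --member="serviceAccount:engflow-re@my-gcp-project.iam.gserviceaccount.com" \
    --role="roles/compute.viewer"
```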
2. Create a Base Image
You can use the same base image for both scheduler and worker instances, or you can create separate images (for example when you want to install extra tools on workers).
Start a clean VM
We support the following distributions for the base image:
- Debian 10 (Buster)
- Ubuntu 18.04 (Bionic Beaver)
SSH into the VM
sudo apt update && sudo apt upgrade
sudo apt install ./engflow-re-services.deb
sudo apt install docker.io
Copy your license file to the VM and move it to /etc/engflow/license:
sudo mv license /etc/engflow/license
Copy your configuration file to the VM and move it to /etc/engflow/config:
sudo mv config /etc/engflow/config
You can customize the base image at this point if you need additional software installed. However, we recommend using Docker images for customization rather than running actions directly on the underlying VM.
Pull the Docker image you plan to use, e.g.,
docker pull gcr.io/cloud-marketplace/google/rbe-ubuntu16-04.
Note: the RBE Docker images require authenticating with gcloud first:
gcloud auth configure-docker
Stop the VM
Create an image snapshot of the VM
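The last two steps can also be done with gcloud; the instance name, image name, and zone below are placeholders:

```shell
# Stop the VM so the disk is in a consistent state.
gcloud compute instances stop engflow-base --zone=us-central1-a

# Create an image from the VM's boot disk.
gcloud compute images create engflow-base-image \
    --source-disk=engflow-base \
    --source-disk-zone=us-central1-a
```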
3. Create Instance Templates
You need to create at least two templates - one for the worker instances and one for the scheduler instances. Depending on your intended cluster layout, you may need multiple worker templates. The steps to create instance templates are very similar in all cases.
Create a new Instance Template
Give it a descriptive name, e.g.,
Select a VM configuration, following the baseline requirements:
- Scheduler: quad-core, 4 GB RAM
- Worker: single-core, 1 GB RAM
Select the VM image created previously; set disk size following the baseline requirements
Select the service account created previously
Management -> Labels:
Management -> Startup Script:
- Worker:
  #!/bin/bash
  systemctl start worker
- Scheduler:
  #!/bin/bash
  systemctl start scheduler
Do not enable HTTP or HTTPS in the firewall configuration unless you want to expose the cluster to the public internet.
4. Start the Cluster
You need to start both schedulers and workers using the previously created templates; you can start them in any order.
Create a new Instance Group from one of the templates
Configure auto-scaling or set a fixed number of instances
Configure the Health Check: TCP to the internal port
5. Create a TCP Load Balancer
Create a new TCP Load Balancer
Backend: select the scheduler instance group
Frontend: TCP, port 443
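The load balancer can also be created with gcloud. The following is a sketch for an internal TCP load balancer; all resource names, the region, the zone, and the health-check port are placeholders to adapt to your cluster:

```shell
# Health check against the scheduler port (port is a placeholder).
gcloud compute health-checks create tcp engflow-hc --port=443

# Internal TCP backend service with the scheduler instance group as backend.
gcloud compute backend-services create engflow-lb \
    --load-balancing-scheme=INTERNAL --protocol=TCP \
    --health-checks=engflow-hc --region=us-central1
gcloud compute backend-services add-backend engflow-lb \
    --instance-group=engflow-schedulers \
    --instance-group-zone=us-central1-a --region=us-central1

# Frontend: forwarding rule on port 443.
gcloud compute forwarding-rules create engflow-fr \
    --load-balancing-scheme=INTERNAL --ports=443 \
    --backend-service=engflow-lb --region=us-central1
```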