Generate Config Files
What information does the config generator need?
Version 1.43 of the documentation is no longer actively maintained. The site that you are currently viewing is an archived snapshot. For up-to-date documentation, see the latest version.
This document describes how to setup and use a Kubernetes-based cluster of the EngFlow Remote Execution software.
Before deploying to an actual Kubernetes cluster, you can try the service on a single machine.
The EngFlow Remote Execution service consists of Schedulers and Workers. These are containerized processes, whose every instance runs in its own Kubernetes Pod. From here on, we will mean the same thing by “instance” and “Pod”.
Minimum hardware requirements for each Pod:
Worker: 10 GB SSD, 1 vCPU core, 1 GB RAM plus 1 GB RAM per vCPU core used for execution
Scheduler: 4 vCPU cores, 16 GB RAM
Kubernetes cluster requirements:
Node Pools: since Scheduler and worker Pods have different hardware requirements, they must be deployed to different kinds of Kubernetes Nodes, so you will need two Node Pools in your cluster.
Node Labels: to let Kubernetes deploy Pods to the right kinds of machines,
Nodes in the worker pool and in the scheduler pool must have different Node
Labels, e.g. engflow_nodetype: worker-node
and engflow_nodetype: scheduler-node
.
Hardware: if you configure Kubernetes to deploy more Pods than Nodes available in a pool, you should size these Nodes to whole multiples of the above requirements to avoid starving Pods of resources.
Minimum software requirements:
Docker images:
Base OS: Ubuntu 18.04 or Debian 10
Java: openjdk-11-jdk-headless
or openjdk-8-jdk-headless
Client: Bazel 1.0.0
Security requirements:
Ports on worker nodes must not be reachable from the public internet. Either the cluster must run on a private network, or an appropriately configured firewall must be used.
Only the public port on scheduler nodes may be reachable from the public internet. If a scheduler’s public port is reachable, then both server- and client-authentication must be configured such that it is impossible for an external party to impersonate either server (man-in-the-middle attacks) or client (no access to unauthenticated clients).
You must setup server authentication using TLS. The included demo certificates should be considered public information, and must not be used for publicly accessible schedulers.
You must either setup client authentication or configure a firewall or VPN such that only trusted clients are able to connect to the cluster.
In a Kubernetes-based cluster, actions are only partially sandboxed, and can access and modify critical local resources as well as access any secrets stored on worker nodes. Such clusters must only run trusted code.
First you need to generate configuration files, use them to build container images for the EngFlow services, and finally deploy them to your Kubernetes cluster.
Run our provided config generation script and answer the questions:
python3 ./setup/k8s/gen-k8s-config.py
If you need help with any of the script’s questions, see Generate Config Files.
Adding packages to the base image: in the Dockerfiles, you can change the base OS layer, add more layers, or install more tools in the worker images. (But be careful not to remove or replace software required by the Remote Execution service stack.) If you plan to use TLS with your own certificates, you should copy them into the Scheduler’s Docker image.
Adjusting the pod configuration: in the yaml files, you can adjust Pod instance counts, Docker image names, image pull policy, etc. (see Kubernetes objects).
Disk size and layout: The total disk space is subdivided into the core OS, the per-action execution directories (consisting of input and output trees), and the CAS (consisting of replicas and a local cache).
For example, a dual-core worker node with 10 GB disk might be subdivided into 2 GB for the OS, two executors with 1 GB each, leaving 6 GB for the CAS, of which we allocate up to 3 GB for replicas.
OS | Execution Directories | CAS | |
---|---|---|---|
2 GB | 2 workers * 1 GB | 6 GB | |
3 GB Replicas | 3 GB Cache | ||
Disk size: 2 GB + 2 * 1 GB + 6 GB = 10 GB |
For simplicity, we recommend controlling the CAS size indirectly by
setting --disk_size
to the total disk size. The worker then computes
the CAS size as follows:
--max_output_size
times the number of local executors
(see --worker_config
) for the execution directories.Note that the input trees are typically hard-linked into the CAS, so it is not necessary to account for the maximum input tree size here. It is an error if the number of executors times the maximum input tree size is smaller than the CAS cache.
You should observe the following rules when adding additional packages to the base image, increasing the number of executors, or increasing the maximum input and output tree sizes:
When adding packages to the base image: the OS size should stay below 10% of the disk size. If it is larger, then you either need to remove packages, or increase the disk size.
When increasing the number of executors: with the default maximum input and output tree sizes of 1 GB each, you will need approximately 4-5 GB of disk per executor.
When increasing the maximum input or output tree sizes: your disk should be at least 3 times the sum of the maximum input and output tree sizes per executor.
TLS configuration: in the config
file, you can adjust the EngFlow
service options (see Command line options).
If you plan to use TLS with your own certificates, you should edit the
--tls_certificate
and --tls_key
here, or if you plan not to use
TLS at all, you have to add --insecure=true
here.
Run the docker
and kubectl
commands printed by gen-k8s-config.py
.
These commands will build Docker images, create Kubernetes objects, and
deploy the service to Kubernetes. With OpenShift, you can of course use oc
instead of kubectl
.
Now you should have a running Kubernetes cluster with Scheduler and Worker
Pods, and a Load Balancer Service called scheduler-grpc-service
in front
of the Schedulers.
Find the public IP address and port of the Load Balancer scheduler-grpc-service
.
You can do this from the web-based Kubernetes management console.
If you plan not to use TLS, you can skip this step.
If you plan to use TLS with the certificate we supplied (engflow-ca.crt
), then
add a DNS entry for demo.engflow.com
with the Load Balancer’s IP address.
Alternatively, for testing you can temporarily add an entry to /etc/hosts
.
What information does the config generator need?
What are the relevant Kubernetes objects?
Testing EngFlow Remote Execution on a single machine