AWS Setup
This page is about VM-based clusters on AWS EC2. For Kubernetes clusters on AWS EKS (Amazon Elastic Kubernetes Service), see Kubernetes Setup.
These instructions are currently for running a Linux cluster on AWS. Need support for macOS instances? Contact us.
Summary
- Unpack the deployment kit
- Add your license
- Set up access credentials
- Create an AMI for the service
- Configure the service
- Deploy the cluster
- Verify the cluster
- Configure the client
- Run a build
Requirements
In addition to the baseline requirements for running the EngFlow Remote Execution Service, you will need administrative access to an AWS account to create VM images and instance templates, to start VMs, and so on.
1. Unpack the deployment kit
Unpack engflow-re-<VERSION>.zip. It contains:
- this documentation (./index.html)
- the service package (./setup/engflow-re-services.deb)
- Packer config to build a service AMI (./setup/aws/base-image.json)
- Terraform config to deploy a cluster (./setup/aws/main.tf)
- EngFlow config file (./setup/aws/config)

It does not contain a valid license file: ./setup/license is empty.
We recommend creating a Git repo of the unpacked files so you can save your current configuration and track changes.
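A minimal way to do that, assuming you unpacked into engflow-re-<VERSION>/:

$ cd engflow-re-<VERSION>/
$ git init
$ git add .
$ git commit -m 'Import EngFlow deployment kit'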
2. Add your license
Replace ./setup/license with the license file we sent you.
3. Set up access credentials
Packer and Terraform need credentials with sufficient permissions.
- In AWS IAM, create a user with these permissions:
  - AmazonEC2FullAccess
  - Security Token Service
  - AmazonS3FullAccess
  For finer-grained permissions, see the Packer documentation.
- Set up the environment.
  $ aws configure
  Alternatively, if you have a credentials CSV file:
  $ export AWS_ACCESS_KEY_ID=$(tail -n1 credentials.csv | cut -d, -f3)
  $ export AWS_SECRET_ACCESS_KEY=$(tail -n1 credentials.csv | cut -d, -f4)
4. Create an AMI for the service
All service instances (schedulers and workers) will run from this AMI.
(You can also use separate AMIs, e.g. if you want to install extra tools on the worker image; we won't discuss that here.)
- Customize ./setup/aws/base-image.json
  The default base_image_ami is Debian 10.5 (Buster) from https://wiki.debian.org/Cloud/AmazonEC2Image/Buster in the us-west-1 region. You can also use newer versions; Ubuntu 18.04 or later should also work, as long as the openjdk-11-jdk-headless package is installed.
  Remember that AMIs are region-specific: make sure image_region matches the region of the base_image_ami. If you are using the Debian AMI from a different region, make sure to take the AMI ID for that region from the AMD64 column on https://wiki.debian.org/Cloud/AmazonEC2Image/Buster.
- Build the AMI.
  We recommend using Packer 1.6.0 or later.
  $ cd setup/aws/
  $ packer build -var base_image_ami=<BASE_AMI> -var image_region=<BASE_AMI_REGION> base-image.json
  Example (Debian 10.5 in the eu-central-1 region):
  $ packer build -var base_image_ami=ami-0e2b90ca04cae8da5 -var image_region=eu-central-1 base-image.json
  You can define the variables either in the "variables" map of base-image.json (see the sketch after this list), or on the command line (as shown above).
- Note the resulting AMI.
  When Packer finishes, it prints output similar to this:
  (...)
  Build 'amazon-ebs' finished.
  ==> Builds finished. The artifacts of successful builds are:
  --> amazon-ebs: AMIs were created:
  eu-central-1: ami-0123456789abcdef0
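For reference, a minimal sketch of how the "variables" map in base-image.json could carry those values as defaults; the values shown are the eu-central-1 example from above, and the builder configuration shipped in the kit is omitted:

{
  "variables": {
    "base_image_ami": "ami-0e2b90ca04cae8da5",
    "image_region": "eu-central-1"
  }
}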
5. Configure the service
All service instances (schedulers and workers) will use the same config file.
For a first-time trial setup, we recommend using the default ./setup/aws/config.
Later (and especially before productionizing) you should customize this config. Consider:
- network settings (e.g. --public_port, --private_ip_selector)
- authentication (e.g. --tls_certificate, --client_auth)
- execution strategies (e.g. --allow_docker, --allow_sandbox)
- executor pools
- storage use (--external_storage)
- monitoring
- JVM flags
See the Service Options Reference for more info.
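As an illustrative sketch only, assuming the config file takes one option per line (the values below are placeholders, not recommendations):

--public_port=443
--allow_docker=true
--allow_sandbox=true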
You can add options for schedulers and workers in the same file. If you need separate options for them, add those to the right aws_launch_template.user_data in ./setup/aws/main.tf (see the sketch below).
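A hypothetical sketch of such a per-role override; the resource name and the config path inside the image are assumptions for illustration, so adapt them to the kit's actual main.tf:

resource "aws_launch_template" "worker" {
  # ... the kit's existing worker template settings ...

  # user_data runs at first boot; this appends a worker-only flag to the
  # service config (the path /etc/engflow/config is assumed here).
  user_data = base64encode(<<-EOT
    #!/bin/bash
    echo "--allow_docker=true" >> /etc/engflow/config
    EOT
  )
}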
6. Deploy the cluster
For a first-time trial setup, we recommend deploying a simple cluster, e.g. with 2 schedulers and 3 workers, and with an internal-facing load balancer. The example commands below do that.
- Customize ./setup/aws/main.tf
  Parameters (e.g. worker and scheduler count) are defined at the top; you can also record your values in a terraform.tfvars file (see the sketch after this list).
  This file includes definitions for a VPC, security group, instance profile, scheduler and worker templates, auto-scaling pools (configured with a static size), an Elastic IP (optional), and a Network Load Balancer (internal or internet-facing).
- Set up a key pair for SSH.
  You will need it to connect to schedulers and workers to look at their logs.
  Create a key pair in the EC2 web console. Remember that the key pair is region-specific and must be in the same region as the cluster.
- Deploy the cluster.
  We recommend using Terraform 0.12 or later.
  Use the AMI built by Packer, and a zone in the same region as the AMI.
  $ cd setup/aws/
  $ terraform init
  $ terraform apply -var image_id=<AMI> \
      -var availability_zone=<ZONE> \
      -var ssh_key_name=<KEY_PAIR_NAME>
  Example:
  $ terraform apply -var image_id=ami-0123456789abcdef0 \
      -var availability_zone=eu-central-1a \
      -var ssh_key_name=my-key \
      -var scheduler_count=2 \
      -var worker_count=3 \
      -var public_nlb=false
  If the Terraform call fails, check that you have valid credentials (the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables), or try deleting the ./setup/aws/.terraform directory and running terraform init again.
- Note the Load Balancer's IP.
  When Terraform completes, it prints something like:
  Apply complete! Resources: 20 added, 0 changed, 0 destroyed.
  Outputs:
  service_address = 10.0.1.10:443 (internal)
Note: as of 2020-09-28 the Load Balancer will not answer pings, but will route gRPC traffic on port 443 to schedulers.
- Check the EC2 web console.
  You should see all running instances, the configured load balancer, the load balancer's target group, and the schedulers listed as the target group's targets.
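Instead of passing -var flags on every run, you can record your values in a terraform.tfvars file next to main.tf, which Terraform loads automatically. A sketch matching the example above, using the variable names that main.tf defines:

image_id          = "ami-0123456789abcdef0"
availability_zone = "eu-central-1a"
ssh_key_name      = "my-key"
scheduler_count   = 2
worker_count      = 3
public_nlb        = false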
7. Verify the cluster
Instances discover each other by asking AWS for a list of endpoints in the security group.
As of 2020-08-19, the service does not have a status page yet. You need to connect to an instance using SSH from an EC2 instance in the same VPC or from a machine in a peered network.
Once you have connected to a worker, look at the output of the service:
$ journalctl --unit worker
In the log, you should see that a cluster has formed:
Aug 12 15:33:33 ip-10-0-1-29 scheduler_service[790]: Members {size:5, ver:5} [
Aug 12 15:33:33 ip-10-0-1-29 scheduler_service[790]: Member [10.0.1.29]:10081 - 381a734e-3213-4054-9aa5-e32e159f78e3 this
Aug 12 15:33:33 ip-10-0-1-29 scheduler_service[790]: Member [10.0.1.117]:10081 - 5e178daf-fca1-4709-a408-0261d7c8133e
Aug 12 15:33:33 ip-10-0-1-29 scheduler_service[790]: Member [10.0.1.200]:10081 - 9914770f-d255-424b-9438-fa3d70b2b67d
Aug 12 15:33:33 ip-10-0-1-29 scheduler_service[790]: Member [10.0.1.173]:10081 - 8affa7e5-b4b9-4118-8e88-9591136b407a
Aug 12 15:33:33 ip-10-0-1-29 scheduler_service[790]: Member [10.0.1.239]:10081 - 5e9211fb-438e-4987-8c72-a0d430299adf
Aug 12 15:33:33 ip-10-0-1-29 scheduler_service[790]: ]
On schedulers you should see two clusters: the same one as above, and another one only containing schedulers.
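The same journalctl approach should work there, assuming the scheduler's systemd unit is named scheduler (only the worker unit name is confirmed above):

$ journalctl --unit scheduler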
8. Configure the client
If you ran Terraform with -var public_nlb=false, you will need an EC2 instance in the cluster's VPC or a machine in a peered network.
(If you ran Terraform with -var public_nlb=true, then anyone can connect to the service through the public address. Consider all security risks before choosing this option.)
- Install Docker, git, and Bazel.
  $ sudo apt install -y git docker.io
  $ sudo adduser $(whoami) docker
  $ curl -L -o bazel https://github.com/bazelbuild/bazel/releases/download/3.3.0/bazel-3.3.0-linux-x86_64
  $ git clone https://github.com/EngFlow/example
  $ chmod +x bazel
- Add a mapping from the load balancer IP to demo.engflow.com (a non-interactive alternative is sketched after this list).
  $ sudo vi /etc/hosts
  Add the IP printed by Terraform:
  10.0.1.10 demo.engflow.com
- Log out and log in again.
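If you prefer to skip the editor, the same /etc/hosts entry can be appended with one standard shell command:

$ echo '10.0.1.10 demo.engflow.com' | sudo tee -a /etc/hosts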
9. Run a build
Run in a terminal:
$ cd example
$ ../bazel build --config=engflow //java/...
Note: the first build can take a while, as Bazel first downloads the Docker image locally, and the cluster software then downloads the Docker image on each worker. You will not see a performance improvement for builds of the example project; it is too small to benefit from the remote execution cluster.