Cluster Startup
This document describes how to configure the EngFlow Remote Execution cluster so that the scheduler and worker instances can discover and connect to each other.
Specifying IP and Port
Most importantly, you need to configure the internal and external network connections for each instance. The internal network connection is used for cluster-internal traffic, which is security-sensitive and must be private. The external network connection is used to listen for client requests.
Machines typically have multiple network adapters, so each instance needs to select the correct network on which to listen for internal or external connections.
The internal network is selected with the `--private_ip_selector` flag. You should set this to the same internal CIDR range that is also used by the underlying deployment infrastructure. You can additionally configure the internal port with the `--private_port` flag, although this should typically not be necessary. Note that the instances use additional ports for internal traffic, which are derived from the one configured here.
In addition to the primary private port configured with `--private_port`, scheduler instances communicate over `--private_port` + 1000, and all instances communicate over `--private_port` + 2000. Also see the Network Traffic page for more details.
For example, on GCP, private IPs are typically chosen from the 10.0.0.0/8 private network block, whereas a local Docker network might use the 172.16.0.0/12 block.
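As a concrete sketch, an instance on GCP might be started with flags along these lines; the CIDR range matches the GCP example above, and the port number is an arbitrary illustration rather than a documented default:

```bash
# Illustrative values only: use the CIDR range of your deployment's private
# network, and override the private port only if you need to.
--private_ip_selector=10.0.0.0/8
--private_port=8080   # schedulers additionally use 8080 + 1000, all instances 8080 + 2000
```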
By default, the schedulers listen on all available networks for incoming external client requests, in addition to listening on the private IP for internal traffic. You can alternatively configure the schedulers to listen only on the private IP for external traffic by setting `--public_bind_to_any=false`. The public port is specified with the `--public_port` flag.
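For instance, a scheduler that should only accept client traffic on its private IP could be configured roughly as follows; the port value is an arbitrary illustration, not a documented default:

```bash
# Keep external listening off the public interfaces and set the client-facing port.
--public_bind_to_any=false
--public_port=443   # example value; use whatever port your clients are configured for
```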
Discovery Mechanism
In order for scheduler and worker instances to automatically detect each other, you have to configure an appropriate discovery mechanism using the `--discovery` flag. We recommend using the mechanisms available from the underlying deployment platform, e.g., GCP APIs on GCP, AWS APIs on AWS, and Kubernetes APIs on Kubernetes. For each of these, you may also need to configure appropriate endpoints and scopes, such as the GCP zone or AWS region. See the 'service discovery' section in the options reference for a complete list.
Alternatively, the service also supports multicast and static discovery.
In multicast discovery, the instances use the multicast protocol to discover each other, which requires that they are on the same local network. This mechanism works without prior knowledge of the instance IPs.
In static discovery, you have to provide a static list of instances (at least one), which all other instances connect to on startup. This requires prior knowledge of the respective IP addresses and port numbers.
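In every case the mechanism is selected through the same flag; the value below is deliberately left as a placeholder, since the accepted mechanism names and their parameters are listed in the 'service discovery' section of the options reference:

```bash
# Replace <mechanism> with one of the supported discovery mechanisms
# (platform-specific discovery, multicast, or static discovery).
--discovery=<mechanism>
```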
Verification
Every time a machine joins or leaves the cluster, schedulers and workers log the members they know about.
Each such log entry contains one or more `Members` groups listing the instances that the machine currently knows about.
There are two logical clusters: one for only the schedulers, and one for all instances. So schedulers print two `Members` groups, and workers print just one.
If you connect to any of the machines (e.g. via SSH), you can observe these logs.
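For example, assuming the service runs under systemd and the unit is called `worker` (the unit name is an assumption; substitute whatever your deployment uses), you could follow the membership lines like this:

```bash
# Hypothetical unit name; adjust to your deployment. This follows the service
# log and prints the cluster membership entries as machines join or leave.
sudo journalctl -u worker -f | grep -A 5 "Members"
```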