Service Options Reference¶
Description of all command-line options that configure action execution platforms.
Options common to all instances¶
aws_autoscaling_send_health_frequency¶
--aws_autoscaling_send_health_frequency=0s
(duration)
If set to a non-zero value, this instance will report its health to the AWS auto scaling group at the frequency specified.
aws_cloudformation_send_resource_signal¶
--aws_cloudformation_send_resource_signal=false
(boolean)
Whether to send a resource signal to CloudFormation when the server is running on instance startup.
discovery_port¶
--discovery_port=0
(integer)
Port that schedulers advertise the service discovery service on. If not set, this is inferred from the private port of the local instance.
experimental_filter_known_replicas¶
--experimental_filter_known_replicas=true
(boolean)
Experimental flag to guard a bugfix (#9432), about reading from CAS nodes known to own a replica. When true, we attempt the read only if we believe the CAS node is alive. When false, we don't perform this check.
graceful_shutdown_wait_time¶
--graceful_shutdown_wait_time=2s
(duration)
Time to wait for connections to drain after removing the node from service discovery. Set to 0 to inhibit graceful shutdown.
grpc_max_calls_per_connection¶
--grpc_max_calls_per_connection=0
(integer)
Sets the maximum number of concurrent calls per incoming gRPC connection. The default is 400 on schedulers and 10 times the number of executors on workers.
grpc_max_message_size¶
--grpc_max_message_size=20mib
(capacity)
The max message size for incoming gRPC calls.
healthz_port¶
--healthz_port=0
(integer)
If set to a positive value, enables an HTTP 1.1 server suitable for health checks.
incompatible_reject_instance_name¶
--incompatible_reject_instance_name=false
(boolean)
When true, rejects all incoming calls that set instance_name with INVALID_ARGUMENT.
internal_grpc_keep_alive_time¶
--internal_grpc_keep_alive_time=60s
(duration; previous name: --grpc_keep_alive_time)
The keep-alive time for cluster-internal gRPC connections.
internal_tcp_connect_timeout¶
--internal_tcp_connect_timeout=5s
(duration)
The connection timeout for cluster-internal gRPC connections.
log_file¶
--log_file=
(string)
The location(-pattern) of a log file. Empty means 'Unused'. See also: https://docs.oracle.com/en/java/javase/11/docs/api/java.logging/java/util/logging/FileHandler.html. If the path ends in '.json', single-line JSON output will be written.
log_file_count¶
--log_file_count=20
(integer)
Maximum number of log file rotations before re-using log file names.
log_file_limit¶
--log_file_limit=100mb
(capacity)
Maximum size of a log file before it is rotated.
Set to 0 for unbounded. Though that is accepted, it is NOT recommended: it can fill up the disk.
log_level¶
--log_level=INFO
(string)
The verbosity level of local logging. Valid values are OFF, SEVERE, WARNING, INFO, and CONFIG.
log_to_stderr¶
--log_to_stderr=true
(boolean)
Output logs to standard error. Note systemd expects logs to go to stderr in order to manage them.
private_bind_to_any¶
--private_bind_to_any=false
(boolean; previous name: --bind_to_any)
Whether to configure the internal communication to listen on all local IPs. If your cluster is not connected to the public internet, and the --private_ip_selector mechanism does not work, then this flag might be usable as a workaround. DO NOT enable this for machines that are connected to the public internet.
private_ip¶
--private_ip=null
(string)
IP address to advertise for cluster-internal gRPC calls. This option is only needed when nodes run in an isolated network (e.g. with docker containers using network mode bridge) and can't be reached from other nodes using the discoverable IP address of the current node. Should only be used with --discovery=static. Please also use --private_bind_to_any when using this option.
private_ip_selector¶
--private_ip_selector=192.168.0.0/16
(string; previous name: --local_ip_selector)
A CIDR mask that is used to select a local IPv4 address for each instance. This should match whatever address range the underlying platform uses to generate local IPs. If this does not match any local IP, then the instance will attempt to do a reverse lookup on its own hostname. The instance fails if the hostname resolves to a loopback address. In order to set a fully-specified address, use a /32 selector. The resulting IP is only used for cluster-internal communication. DO NOT use a public address range here.
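For example (address illustrative), a host whose internal interface is 10.0.12.34 could pin that exact address with a /32 selector:
--private_ip_selector=10.0.12.34/32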
private_port¶
--private_port=9321
(integer; previous name: --internal_port)
Port to use for cluster-internal communication. You need to configure your network to allow traffic on this port between machines in the same cluster. In addition, you need to allow traffic on port + 1000 (schedulers only) and port + 2000 (all instances); also see --incompatible_use_low_offsets.
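For example, with the default --private_port=9321, cluster-internal firewall rules need to allow TCP traffic on port 9321 on all instances, port 10321 on schedulers, and port 11321 on all instances (assuming --incompatible_use_low_offsets is left at its default of false).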
run_common_member¶
--run_common_member=true
(boolean)
Run a member of the "common" Hazelcast cluster. Requires that --service_discovery_mode be "builtin". Machines that do not run Hazelcast should not have the engflow_re_cluster_name tag in AWS or the engflow_re_cluster_name label in GCP.
Options to configure scheduler instances¶
action_cache_size¶
--action_cache_size=2gb
(capacity)
The maximum amount of memory that can be used for action cache entries by each scheduler. The resolution is 1 megabyte, the value is rounded down to the nearest whole megabyte as needed. (Example: 1600kb is rounded down to 1mb.) Setting a value smaller than 1mb disables the cache.
alternative_tls_trusted_certificates¶
--alternative_tls_trusted_certificates=[]
(list of strings)
A list of files or secretstore URLs to load alternative trusted certificates from. All certificates provided here can be used to authenticate clients when --client_auth=mtls is set. See --tls_trusted_certificate for more details.
auth_service¶
--auth_service=
(string)
Required when --client_auth=external, ignored otherwise. The auth service endpoint; it must be "grpc://localhost:NNN" where NNN is the port.
basic_auth_htpasswd¶
--basic_auth_htpasswd=/etc/engflow/htpasswd
(string)
Path to a htpasswd file containing user names and APR1-encrypted passwords. In a cluster with multiple scheduler instances, all of them must use the same password file. The server automatically reloads the file on changes (based on the last-modified time).
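For example, assuming the Apache htpasswd utility is available on the scheduler host, a password file with one user (name illustrative) could be created with:
htpasswd -c -m /etc/engflow/htpasswd alice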
client_auth¶
--client_auth=none
(one of: {deny, none, mtls, gcp_rbe, external})
The mechanism for determining gRPC authentication and permissions. Depending on the value, you also need to pass options to configure the authentication mechanism and permissions granted to each client.
For none, use --principal_based_permissions=*->role to set permissions.
For mtls, see the --tls_trusted_certificate flag for more details and use principal_based_permissions to set per-user permissions.
For gcp_rbe, see the --gcp_rbe_auth_project flag for more details.
For external, please contact us for details.
For github_token, see the --experimental_github_auth_container flag for more details.
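For example, a cluster without client authentication that grants every caller the user role could be configured with:
--client_auth=none --principal_based_permissions=*->user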
enable_bes¶
--enable_bes=true
(boolean; previous name: --experimental_bes)
Enables gRPC endpoints that implement the Build Event Service, allowing schedulers to handle the Build Event Protocol.
experimental_enable_fetch_api¶
--experimental_enable_fetch_api=false
(boolean)
Whether to enable support for the Asset fetch API. The fetch API is implemented by calling curl with appropriate parameters using the remote execution API. Note that only http and https URLs are supported.
In Bazel, enable use of the Asset fetch API by setting --experimental_remote_downloader to the same value as --remote_executor or --remote_cache.
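For example, a Bazel invocation against a hypothetical cluster endpoint might pass:
--remote_executor=grpcs://cluster.example.com --experimental_remote_downloader=grpcs://cluster.example.com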
experimental_fetch_api_docker_image¶
--experimental_fetch_api_docker_image=
(string)
Only used when --experimental_enable_fetch_api=true. If set to a non-empty value, then this is parsed as a canonical docker URL (e.g., docker://alpine/curl@sha256000...), which is in turn used as the container for all fetch calls. The referenced Docker image must be accessible and contain curl at /usr/bin/curl. If the image has an entrypoint, it is ignored.
experimental_fetch_api_max_attempts¶
--experimental_fetch_api_max_attempts=5
(integer)
Only used when --experimental_enable_fetch_api=true. Number of attempts for internal file uploads and action executions. This flag does not apply if curl fails with a non-zero exit code, only if there are gRPC protocol errors.
experimental_force_mnemonic_pool_name¶
--experimental_force_mnemonic_pool_name=[]
(list of strings)
A list of mnemonic=pool-name pairs which are used to override pool names provided by the client. Use this to route actions to specific pools based on mnemonics. See Executor pools.
Note that this feature requires a client that provides action mnemonics; Bazel 5.0.0 and newer support this.
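For example, to route all C++ compile actions to a dedicated pool (pool name illustrative), you could pass:
--experimental_force_mnemonic_pool_name=CppCompile=cpp-pool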
experimental_github_auth_container¶
--experimental_github_auth_container=
(string)
Required when --client_auth=github_token, ignored otherwise. Specifies an existing container on ghcr.io.
Format: organisation_name/container_name:tag, all lower-case.
Example: engflow/hello-world:1.0
The container should be private. Only GitHub Action runners of the organisation with a valid GITHUB_TOKEN shall have access. Other than that the container can be anything; it won't be pulled or run, only have its existence checked.
experimental_mnemonic_based_invocation_affinity¶
--experimental_mnemonic_based_invocation_affinity=[]
(list of strings)
A list of mnemonics for which we reuse executors within the same invocation. This can reduce action setup time for actions with similar input trees but can also increase runtime for actions that use remote persistent workers.
Note that this feature requires a client that provides action mnemonics; Bazel 5.0.0 and newer support this.
experimental_strict_transport_security¶
--experimental_strict_transport_security=0d
(duration)
Ignored unless --strict_http_headers is true. If set to a non-zero duration, sets the Strict-Transport-Security header with the given duration as the max-age on all HTTP responses. When the UI is accessed over an HTTPS connection and this header is returned, all future accesses to the same domain are forced to use HTTPS for at least the given duration. DO NOT SET THIS unless you are certain that you do not want to access this domain using HTTP for the foreseeable future.
experimental_web_login_expiration¶
--experimental_web_login_expiration=23h
(duration)
Only used when --http_auth=google_login or --http_auth=okta_login or --http_auth=oidc_login. The amount of time for which a web login token is valid. This should be set to your company's max login policy if applicable.
extend_replicas_on_cache_hit¶
--extend_replicas_on_cache_hit=true
(boolean)
Whether to extend replica timeouts when there is a cache hit in the action cache. If this is true, then the action cache service only returns an action cache entry if the replica timeouts for all output files could be successfully extended. Otherwise it does not attempt to extend the timeouts. Setting this to false can improve performance at the increased risk of returning errors later when the client attempts to fetch the corresponding files from the CAS. We strongly recommend leaving this enabled when using build-without-the-bytes.
force_pool_name¶
--force_pool_name=null
(string)
If set to a non-empty value, the scheduler ignores the pool name provided in the action and uses this one instead to schedule the action. See Executor pools.
gcp_rbe_auth_project¶
--gcp_rbe_auth_project=null
(string; previous name: --experimental_google_auth_project)
Sets the GCP project to use when looking up permissions for OAuth 2.0-authenticated clients if --client_auth=gcp_rbe. The actual permissions are configured through GCP IAM by assigning the existing Google Cloud 'Remote Build Execution' roles to specific users or service accounts.
If you are using Bazel, you can authenticate as follows: for the first-time login, run gcloud auth application-default login. Afterwards, you can run Bazel with the --google_default_credentials flag. Alternatively, you can download a JSON file with access keys and use Bazel's --google_credentials option to specify the path to that file.
Note that EngFlow does not control the existence or availability of these GCP roles and cannot guarantee that this option continues to work. Furthermore, we cannot report usage of these permissions to GCP, so they may show up as 'over-granted' in the IAM permissions console.
Use with caution.
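For example, a hypothetical Bazel invocation using these credentials might look like:
bazel build --remote_executor=grpcs://cluster.example.com --google_default_credentials //...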
google_client_id¶
--google_client_id=
(string; previous name: --experimental_google_client_id)
Must be set if and only if --enable_bes=true and --http_auth=google_login.
The client ID from the "Client ID for Web application" page in GCP to enable using Google OAuth to authenticate users on the UI. You must have this client ID correctly configured in GCP to complete the authentication workflow. Note that the email address returned from Google will be matched against the --principal_based_permissions flag to determine permission level.
grpc_initial_flow_control_window¶
--grpc_initial_flow_control_window=1mib
(capacity)
The initial flow control window for incoming gRPC calls.
http_auth¶
--http_auth=[deny]
(list of strings)
The mechanism(s) for determining HTTP2 authentication and permissions. Depending on the values, you also need to pass options to configure the authentication mechanism and permissions granted to each client.
Note that the /healthz page never requires authentication.
For none, use --principal_based_permissions=*->role to set permissions.
For basic, use --basic_auth_htpasswd to set the path to the password file with Apache MD5-encoded passwords, and --principal_based_permissions to control per-user permissions. See the Authentication section for examples.
For google_login, the --google_client_id flag must also be set.
For okta_login, the --okta_client_id and --okta_issuer_uri flags must also be set.
For oidc_login, the --oidc_config flags must also be set.
http_public_bind_to_any¶
--http_public_bind_to_any=true
(boolean)
Only used when the HTTP and gRPC ports are split, i.e. --http_public_port has a positive value different from --public_port.
This flag is similar to --public_bind_to_any but affects only the --http_public_port.
http_public_port¶
--http_public_port=-1
(integer)
The public port on which this cluster listens for HTTP connections. When this is set to a positive integer different from --public_port, then HTTP and gRPC ports are split: HTTP is served on this port and gRPC on the --public_port. Otherwise they are both served on the --public_port.
Note that typical Linux installations prevent non-root processes from listening on ports 0-1024.
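For example, to serve gRPC on 8080 and HTTP on a separate port (values illustrative), a scheduler could be started with:
--public_port=8080 --http_public_port=8081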
incompatible_force_mnemonic_pool_name_respects_explicit_pools¶
--incompatible_force_mnemonic_pool_name_respects_explicit_pools=false
(boolean)
If this is true, --experimental_force_mnemonic_pool_name only changes the pool for actions that do not explicitly specify a pool.
insecure¶
--insecure=false
(boolean)
Whether to use unencrypted connections. We strongly recommend providing a TLS certificate and key (self-signed if necessary) and avoiding this flag. This can be temporarily used for testing on a closed network. If this is set, then the settings for --tls_certificate and --tls_key are ignored.
local_cas_existence_cache_expiry¶
--local_cas_existence_cache_expiry=30m
(duration)
The maximum time to cache the existence of CAS entries. This flag is ignored if external storage is enabled; use --cas_existence_cache_max_size to control the cache size in that case.
If external storage is disabled, files are only stored in the distributed CAS, which is limited in size, and only guarantees the presence of files for the duration of --default_replica_timeout. Caching existence for longer than that can result in an increased rate of PRECONDITION_FAILED gRPC errors for Execute calls, but should be otherwise safe.
Setting this flag to 0 disables the cache.
max_batch_size¶
--max_batch_size=10mb
(capacity)
The maximum batch size that clients are allowed to send to the CAS server batchUpdateBlobs call. This is a form of write-combining that might result in improved performance under the right network conditions. Only set this if you have benchmark results indicating that it is a net win. Note that some clients may not combine writes regardless of this server-side setting. If this is larger than the max gRPC message size, it is silently reduced to that value.
max_queue_time¶
--max_queue_time=1h
(duration)
The maximum amount of time an action is allowed to queue before it is aborted.
max_queue_time_in_empty_pool¶
--max_queue_time_in_empty_pool=5m
(duration)
The maximum amount of time an action is allowed to queue before it is aborted if it is assigned to a pool which has never had any executors. This is intentionally shorter than the maximum queue time to detect cases where the client is accidentally misconfigured.
If the worker pool is configured with auto-scaling, and it can scale down to zero workers, then this should be at least as long as the auto-scaling delay plus the time to boot a worker instance. Otherwise a recently restarted scheduler may time out actions prematurely.
max_replicate_concurrency¶
--max_replicate_concurrency=0
(integer)
The maximum number of concurrent replicate calls from a scheduler. A negative or zero value indicates no limit. This may be useful to limit the CAS read/write load.
metadata_replica_count¶
--metadata_replica_count=3
(integer)
The number of replicas to use for scheduler metadata such as the action cache. Must be at least one. Setting this to one can cause metadata loss when a scheduler is restarted, resulting in reduced build performance and build errors.
mtls_expiration¶
--mtls_expiration=90d
(duration)
Only used when --tls_trusted_certificate and --tls_trusted_key are set. Sets the amount of time in days for which the generated client certificates are valid. Set to zero to disable the functionality.
oidc_config¶
--oidc_config=[]
(list of strings)
The path(s) to the OpenId Connect config file(s), each of which is either an absolute path or the id of a secret, prepended by secretstore://.
The contents should be a JSON string with the fields "issuer" (one of "GOOGLE", "KEYCLOAK", "OKTA", "OTHER", if not set defaults to "OTHER"), "client_id", "client_secret" (only required if using authorization code flow), "discovery_uri".
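For illustration only (all values are placeholders), a config file for the authorization code flow might look like:
{"issuer": "OKTA", "client_id": "0oa1example", "client_secret": "example-secret", "discovery_uri": "https://example.okta.com/.well-known/openid-configuration"}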
oidc_config_admin¶
--oidc_config_admin=[]
(list of strings)
The path(s) to the OpenId Connect config file(s) for EngFlow admins. See --oidc_config for details.
okta_client_id¶
--okta_client_id=
(string)
Must be set if and only if --enable_bes=true and --http_auth=okta_login.
The client ID for an app with sign-in method "OIDC - OpenID Connect" and application type "Web Application". The Client ID must be configured to be used with a client secret for client authentication, provided via --okta_client_secret. For grant types, select both "Authorization Code" and "Implicit (hybrid)" and "Allow ID Token with implicit grant type". Note that the email address returned from okta will be matched against the --principal_based_permissions flag to determine permission level.
okta_issuer_uri¶
--okta_issuer_uri=
(string)
Must be set if and only if --enable_bes=true and --http_auth=okta_login.
The Issuer URI of the okta authorization server to use for the --okta_client_id.
principal_based_permissions¶
--principal_based_permissions=[]
(list of strings)
Configures the permissions for each principal. This option provides a generic mechanism for configuring permissions where principals are cryptographically authenticated through some other mechanism, such as TLS client certificates or OAuth 2.0 bearer tokens.
Each value must specify a principal and a role as principal->role. Principals can be specified directly (e.g. alice@example.com, bob), or as all users in a domain (e.g. *@example.com), or everyone * (just the star character). These are the only supported wildcard functions of the * character. Roles must be one of none, admin, user, cache-reader, and cache-writer.
Permissions are evaluated based on most-specific to least-specific rather than in the order specified. Therefore, an exact principal match wins over a domain-based match, and the default setting applies only if no other rule applies.
Note that some authentication mechanisms implicitly refuse a connection if the client principal cannot be determined.
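For example (addresses illustrative, assuming the flag is repeated once per entry), the following grants the admin role to one user, the user role to everyone else in the domain, and no access to anyone else:
--principal_based_permissions=alice@example.com->admin --principal_based_permissions=*@example.com->user --principal_based_permissions=*->none
Because evaluation is most-specific first, alice@example.com receives the admin role even though the domain rule also matches.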
profile_to_event_store¶
--profile_to_event_store=true
(boolean; previous name: --experimental_profile_to_event_store)
Ignored if --enable_bes=false. If set to true, the scheduler collects server-side profiling information, aggregates the data by build id, writes it to the event store, and provides the profile for download as a Chrome json profile in the UI. This can help troubleshoot performance issues in a build.
public_bind_to_any¶
--public_bind_to_any=true
(boolean)
Whether to configure the public port to listen on all local IPs. If set to false, then the scheduler nodes will only listen on the internal IP addresses specified with --private_ip_selector. DO NOT leave this at true for clusters that are connected to the public internet and that do not have authentication configured.
public_port¶
--public_port=8080
(integer)
The public port on which this cluster listens for gRPC and HTTP connections.
By default, this port serves both types of requests. You can use --http_public_port to serve HTTP requests on another port, for example to route them through a proxy.
Note that typical Linux installations prevent non-root processes from listening on ports 0-1024.
strict_http_headers¶
--strict_http_headers=true
(boolean)
If set to true, sets a number of HTTP headers on all HTTP responses to improve the security of the UI, such as X-Frame-Options and Content-Security-Policy.
tls_certificate¶
--tls_certificate=
(string)
The path to the TLS certificate chain (in base-64 encoded X.509 format with OpenSSL BEGIN/END CERTIFICATE guards) to be used by the schedulers to authenticate themselves to clients on the public cluster port(s) (--public_port and, if specified, --http_public_port).
A certificate and key are required to support encrypted connections. If this is a self-signed certificate, then you also have to configure the client with the same certificate. If you want to use unencrypted connections, you have to set --insecure=true.
tls_cipher_suites¶
--tls_cipher_suites=[TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256]
(list of strings)
Configures the set of ciphers that are supported on incoming TLS connections (server-side). The default list follows Mozilla's recommendations.
tls_key¶
--tls_key=
(string)
The file name of the TLS key (in base-64 encoded binary PKCS#8 format with OpenSSL BEGIN/END PRIVATE KEY guards) that matches the certificate given as --tls_certificate.
tls_trusted_certificate¶
--tls_trusted_certificate=
(string)
Required when --client_auth=mtls. The file name or secretstore URL of a certificate that is used by the schedulers to authenticate clients (aka mutual TLS authentication or mTLS).
You can generate client certificates yourself and sign them with the corresponding private key, or you can pass the key via --tls_trusted_key and set --mtls_expiration to allow logged-in users to generate their own certificates via the web UI.
In addition, you have to grant permissions to those authenticated clients using the --principal_based_permissions flag.
If you want to provide more than one trusted certificate, you can pass additional file names or secretstore URLs via the --alternative_tls_trusted_certificate flag.
Bazel supports TLS client authentication as of version 3.1: use Bazel's --tls_client_certificate and --tls_client_key options to enable client authentication.
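For example (paths and endpoint illustrative), a Bazel client presenting a client certificate might be invoked with:
bazel build --remote_executor=grpcs://cluster.example.com --tls_client_certificate=/path/to/client.crt --tls_client_key=/path/to/client.key //...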
tls_trusted_key¶
--tls_trusted_key=
(string)
The file name or secretstore URL of the TLS key that matches the certificate given as --tls_trusted_certificate. If provided and if --mtls_expiration is not zero, then logged-in users can generate their own client certificates via the web UI, which are signed with the key provided here.
The key provided here must match the certificate provided as --tls_trusted_certificate.
Options to configure service discovery¶
cluster_name¶
--cluster_name=default
(string)
Only used when --discovery=gcp or aws. The cluster name used to auto-detect instances belonging to the same cluster. All instances must be tagged as engflow_re_cluster_name=[cluster_name] and scheduler instances must additionally be tagged as engflow_re_scheduler_name=[cluster_name].
common_hazelcast_partition_count¶
--common_hazelcast_partition_count=0
(integer)
Count of Hazelcast partitions in the common cluster. This option is not safe to change when the cluster is running. See https://docs.hazelcast.com/hazelcast/5.1/capacity-planning#partition-count.
discovery¶
--discovery=multicast
(one of: {gcp, aws, k8s, static, multicast})
Select the discovery mechanism to use. This usually matches the platform that the software runs on.
gcp_zones¶
--gcp_zones=
(string)
Only used when --discovery=gcp. A comma-separated list of GCP zones in which to look for instances. If unset, discovery only searches the current zone where this instance runs.
hazelcast_aws_az¶
--hazelcast_aws_az=
(string)
Only used when --discovery=aws. If specified, then --aws_region is ignored.
Selects the AWS availability zone to scan for instances. If unset, the current zone is detected from the EC2 Instance Metadata Service.
hazelcast_aws_region¶
--hazelcast_aws_region=
(string; previous name: --aws_region)
Only used when --discovery=aws, ignored when --hazelcast_aws_az is specified. Selects the AWS region to scan for instances. If unset, the current region is used.
hazelcast_die_on_demotion¶
--hazelcast_die_on_demotion=true
(boolean)
Crash the process if the frontend Hazelcast cluster member within it transitions from a master to a non-master member.
incompatible_use_low_offsets¶
--incompatible_use_low_offsets=false
(boolean)
Hazelcast requires one or two ports in addition to the private ports (and the public ports for schedulers); if this is set, then it uses private_port + 1 (all instances) and private_port + 2 (schedulers). This flag must be set identically across all instances in the same cluster; changing the value is an incompatible change.
k8s_all_pods_service¶
--k8s_all_pods_service=null
(string)
Only used when --discovery=k8s. Name of the Kubernetes NodePort service that connects to all Pods.
k8s_master¶
--k8s_master=null
(string)
Only used when --discovery=k8s. DNS or IP:port of the Kubernetes Master. Leave it empty to use the default; usually you don't need to specify this flag.
If some pods can't discover others and print errors like Failure in executing REST call (...) Caused by: java.net.UnknownHostException: kubernetes.default.svc, then override this flag with https://IP:port where IP and port are that of the Kubernetes Master (see output of kubectl cluster-info).
k8s_namespace¶
--k8s_namespace=engflow-re
(string)
Only used when --discovery=k8s. Name of the Kubernetes namespace. All Kubernetes objects should be in this namespace, and it must match the namespace value in the yaml files.
k8s_scheduler_pods_service¶
--k8s_scheduler_pods_service=null
(string)
Only used when --discovery=k8s. Name of the Kubernetes NodePort service that connects to all scheduler Pods.
static_cas_node¶
--static_cas_node=[]
(list of strings)
Only used when --discovery=static. IP address and port of another CAS node (scheduler or worker), e.g. 1.2.3.4:5678. The port must be that instance's --private_port + 2000. This instance joins that instance's cluster.
You don't have to list all instances' IP and port, but at least one that you list must be online so this one can join. The more instances you list, the less sensitive your cluster will be to machine start order. If you omit the port, nodes may fail to form a cluster. Also see --incompatible_use_low_offsets.
static_scheduler¶
--static_scheduler=[]
(list of strings)
Only used when --discovery=static. IP address and port of another scheduler (for the schedulers-only cluster), e.g. 1.2.3.4:5678. The port must be that instance's --private_port + 1000. This instance joins that instance's cluster.
You don't have to list all instances' IP and port, but at least one that you list must be online so this one can join. The more instances you list, the less sensitive your cluster will be to machine start order. If you omit the port, nodes may fail to form a cluster. Also see --incompatible_use_low_offsets.
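For example (addresses illustrative), if an existing scheduler at 10.0.0.5 and a worker at 10.0.0.6 both use the default --private_port=9321, a new scheduler could point at them with:
--discovery=static --static_scheduler=10.0.0.5:10321 --static_cas_node=10.0.0.6:11321
Here 10321 is --private_port + 1000 and 11321 is --private_port + 2000, assuming --incompatible_use_low_offsets is left at false.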
Options to configure the CAS¶
bytestream_chunk_size¶
--bytestream_chunk_size=1mib
(capacity; previous name: --bytestream_read_chunk_size)
Size of file chunks streamed by the ByteStream Read gRPC API.
cas_path¶
--cas_path=/tmp/base/
(string)
The path under which the local CAS is stored and local execution trees are created. The local CAS and the local execution trees should be on the same file system to support hard-links and atomic file moves.
default_replica_timeout¶
--default_replica_timeout=1h
(duration)
The duration for which replicas are retained. Expired replicated files may be deleted if space is needed for new files. This applies to all CAS writes and existence checks, either initiated by a client, or initiated by a worker to store action outputs. Therefore, this needs to be set conservatively to the longest required duration - at a minimum, it should be set to the longest duration a single build can take. As of 2020-04-09, this is the only way to set replica durations.
disk_size¶
--disk_size=0
(capacity)
The total disk size. If this is set to a non-zero value, then the CAS and replica sizes are computed automatically based on this number. Specifically, we set the total CAS size (--max_cas_size) to 80% of the given number (effectively reserving 20% of the space for the OS), minus the number of workers (--worker_config) times the maximum output tree size (--max_output_size). We set the max replica size to half that number.
If this flag is set to a non-zero value, then the --max_cas_size and --max_replica_size options are ignored. If neither this nor --max_cas_size and --max_replica_size are set, the total disk size is derived from the size of the volume --cas_path is on.
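As an illustrative calculation: with --disk_size=500gb, a --worker_config that provides 8 workers, and --max_output_size=4gb, the computed CAS size would be 0.8 * 500gb - 8 * 4gb = 368gb, and the max replica size would be half of that, 184gb.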
enable_distributed_cas¶
--enable_distributed_cas=true
(boolean)
Whether this instance should participate in the distributed CAS. If this is true, the instance makes some or all local files available to other instances in the cluster. If false, the instance does not make local files available. However, it still uses the local disk to cache files for local use. The main use case for disabling this flag is for satellite clusters where a subset of machines is remote to the majority of the cluster and should not make their files available to the main cluster. Note that these instances can still pull files from the other instances in the main cluster, not just from external storage.
experimental_async_storage_uploads¶
--experimental_async_storage_uploads=false
(boolean)
If false, wait for successful uploads to both the distributed CAS and external storage. If true, do not wait for uploads to external storage to complete.
experimental_force_lru¶
--experimental_force_lru=true
(boolean)
Only used when --external_storage=none.
This flag selects between two policies for storing metadata in the distributed CAS. If disabled, the metadata expires some time after the default replica life time. If enabled, the cluster evicts metadata based on available space rather than time. That means that metadata may be lost before the replica life time expires if there are a lot of files in the cluster, but has the advantage that metadata can stay around for longer if there is space available.
max_cas_size¶
--max_cas_size=0
(capacity)
The maximum total size of the local CAS, including replicas and locally cached files. The local CAS keeps files as long as possible, and only evicts them when this value is exceeded. Therefore, this needs to be smaller than the total available disk space by at least the number of local executors times the maximum output size per action when using hardlinks for inputs, or the combined input and output size when using copies (see --worker_config, --max_input_size, and --max_output_size). This flag is ignored if --disk_size is not set to 0.
max_replica_size¶
--max_replica_size=0
(capacity)
The part of the local CAS that is available to replicas. That is, the total space used by replicas on the local machine may not exceed this value. This must be less than the CAS size minus the number of local executors times the maximum input size per action (see --worker_config and --max_input_size); otherwise the worker can run out of disk space. This flag is ignored if --disk_size is not set to 0.
recover_cas_blobs¶
--recover_cas_blobs=true
(boolean)
Transitional option to roll out a bugfix.
If true, workers will scan the --cas_path for left-behind (or pre-loaded) CAS content.
If false, workers ignore such blobs. Please let EngFlow know if you find the need to disable this flag.
replica_count¶
--replica_count=1
(integer)
The number of replicas for each CAS entry corresponding to a file that has a retention duration (see --default_replica_timeout). This must not exceed the number of nodes that participate in the distributed CAS (typically the same as the worker nodes). The system automatically re-replicates files if an existing node is lost, as long as the file does not exceed its retention duration (measured from the time it was written or existence-checked). As of 2020-04-09, the maximum supported --replica_count is 3.
replica_tracker_heap_size¶
--replica_tracker_heap_size=200mib
(capacity)
Size allocated for replica tracker entries in the worker's JVM heap. The value provided is truncated to the nearest mebibyte.
use_linux_acls¶
--use_linux_acls=false
(boolean)
If this is true, then the worker sets ACLs on the work directory to allow actions to access files in the input tree and itself to access files in the output tree, regardless of ownership. This makes it possible to run actions as another user, e.g., under Docker.
Enabling this flag requires the setfacl tool to be available on the host machine (e.g., on Debian, by installing the acl package) and that the file system supports ACLs (newer versions of Debian and Ubuntu support ACLs by default). Do not enable this flag on macOS or Windows.
verify_cas_blobs_on_startup¶
--verify_cas_blobs_on_startup=none
(one of: {none, blocking})
Only used when --recover_cas_blobs=true.
If set to 'blocking', workers verify all CAS blobs on startup, deleting any entries that are inconsistent with the expected digest.
If set to 'none', workers do not verify the CAS blobs. This can significantly reduce worker startup time, especially if the CAS is large.
workers_handle_fallback_requests¶
--workers_handle_fallback_requests=true
(boolean)
If true, bytestream read requests for digests that do not exist in the CAS will be forwarded to a random CAS node. That CAS node will try to serve the read request from external storage. If successful, the CAS node will replicate the blob locally for future use.
Options to configure backup storage¶
cas_existence_cache_expiry¶
--cas_existence_cache_expiry=0s
(duration; previous name: --experimental_cas_existence_cache_expiry)
Used only when --external_storage is not none. Specifies the maximum time existence cache hits are kept in external storage.
The default of 0 means that entries can be kept indefinitely; this is safe because the external storage GC explicitly flushes the cache when switching to a new generation, and items are only deleted from the old generation.
Note that the existence cache size is set by --cas_existence_cache_max_size; that is the recommended way to limit memory consumption by the cache.
Note the related flag --local_cas_existence_cache_expiry applies only to the existence cache for the distributed CAS.
cas_existence_cache_max_size¶
--cas_existence_cache_max_size=10000000
(integer; previous name: --experimental_cas_existence_cache_max_size)
Used only when --external_storage is not none. Specifies the maximum number of entries in the CAS existence cache. Setting a higher value increases memory use (~100 bytes / entry) but can significantly reduce the number of calls and upload traffic to the storage backend. Setting this value to 0 disables the cache; setting it to -1 means no upper bound.
Note the related flag --cas_existence_cache_expiry to set the expiration time.
experimental_read_timeout¶
--experimental_read_timeout=2m
(duration)
Sets a timeout for proxy calls that acts as a fail-safe if the client reads very slowly, or if it does not propagate cancellation correctly; several versions of Bazel have this bug.
external_storage¶
--external_storage=none
(one of: {none, gcs, s3})
The kind of external storage to use to back up replicas, in addition to storing them on the worker machines. none means no backup, gcs means Google Cloud Storage (GCS), s3 means Amazon S3.
Deprecation: the values gcp and aws (synonyms for gcs and s3) are also supported, but deprecated. They will no longer be supported in version 2.0 and later.
external_storage_gc_window_days¶
--external_storage_gc_window_days=0
(integer)
Number of days to keep unused external blobs. A non-zero value enables a 'generational' garbage collector; a new generation is created every N days, with reads being served from both the current and previous generation and any such files copied to the current generation. Data that is older than one generation is deleted automatically.
external_storage_scheduler_threads¶
--external_storage_scheduler_threads=50
(integer)
Only used when --external_storage is not none. Specifies how many threads to use to serve external storage requests on schedulers. The value is a positive integer.
external_storage_worker_threads¶
--external_storage_worker_threads=50
(integer)
Only used when --external_storage is not none. Specifies how many threads to use to serve external storage requests on workers. The value is a positive integer.
gcs_blobs_root¶
--gcs_blobs_root=blobs
(string)
Only used when --external_storage=gcs. Path in the GCS bucket for blobs.
gcs_bucket¶
--gcs_bucket=null
(string)
Only used when --external_storage=gcs. Name of the GCS bucket.
gcs_credentials¶
--gcs_credentials=null
(string)
Only used when --external_storage=gcs. Path to the JSON file with the GCS Service Account's credentials. Can be empty if GOOGLE_APPLICATION_CREDENTIALS is set to the JSON file's path.
gcs_project_id¶
--gcs_project_id=null
(string)
Only used when --external_storage=gcs. Name of the GCP project ID for GCS use.
s3_blobs_root¶
--s3_blobs_root=blobs
(string)
Only used when --external_storage=s3. Path in the S3 bucket for blobs.
If not empty, then we recommend you specify a relative path (foo/bar) and not an absolute path (/foo/bar). This is because Amazon S3 (and possibly other S3 implementations) treat a leading '/' as part of the first directory segment.
We also suggest not to add a trailing /; this is added automatically.
If the blobs root is non-empty, the final path of a blob is <blobs_root>/<subdir>/<blob>; otherwise it is <subdir>/<blob>.
s3_bucket¶
--s3_bucket=null
(string)
Only used when --external_storage=s3. Name of the S3 bucket.
s3_endpoint¶
--s3_endpoint=null
(string)
Only used when --external_storage=s3. Set this to override the computed S3 endpoint. This allows running against compatible implementations of S3.
s3_region¶
--s3_region=null
(string)
Only used when --external_storage=s3. Name of the S3 bucket's region. Can be empty if AWS_REGION is set to this value.
Options to configure readonly backup storage¶
experimental_readonly_read_timeout¶
--experimental_readonly_read_timeout=2m
(duration)
Sets a timeout for proxy calls that acts as a fail-safe if the client reads very slowly, or if it does not propagate cancellation correctly; several versions of Bazel have this bug.
readonly_external_storage¶
--readonly_external_storage=none
(one of: {none, gcs, s3})
The kind of external storage to use to back up replicas, in addition to storing them on the worker machines. none means no backup, gcs means Google Cloud Storage (GCS), s3 means Amazon S3.
Deprecation: the values gcp and aws (synonyms for gcs and s3) are also supported, but deprecated. They will no longer be supported in version 2.0 and later.
readonly_external_storage_threads¶
--readonly_external_storage_threads=50
(integer)
Only used when --external_storage is not none. Specifies how many threads to use to serve external storage requests on workers. The value is a positive integer.
readonly_gcs_blobs_root¶
--readonly_gcs_blobs_root=blobs
(string)
Only used when --external_storage=gcs. Path in the GCS bucket for blobs.
readonly_gcs_bucket¶
--readonly_gcs_bucket=null
(string)
Only used when --external_storage=gcs. Name of the GCS bucket.
readonly_gcs_credentials¶
--readonly_gcs_credentials=null
(string)
Only used when --external_storage=gcs. Path to the JSON file with the GCS Service Account's credentials. Can be empty if GOOGLE_APPLICATION_CREDENTIALS is set to the JSON file's path.
readonly_gcs_project_id¶
--readonly_gcs_project_id=null
(string)
Only used when --external_storage=gcs. Name of the GCP project ID for GCS use.
readonly_s3_blobs_root¶
--readonly_s3_blobs_root=blobs
(string)
Only used when --external_storage=s3. Path in the S3 bucket for blobs.
If not empty, then we recommend you specify a relative path (foo/bar) and not an absolute path (/foo/bar). This is because Amazon S3 (and possibly other S3 implementations) treat a leading '/' as part of the first directory segment.
We also suggest not to add a trailing /; this is added automatically.
If the blobs root is non-empty, the final path of a blob is <blobs_root>/<subdir>/<blob>; otherwise it is <subdir>/<blob>.
readonly_s3_bucket¶
--readonly_s3_bucket=null
(string)
Only used when --external_storage=s3. Name of the S3 bucket.
readonly_s3_endpoint¶
--readonly_s3_endpoint=null
(string)
Only used when --external_storage=s3. Set this to override the computed S3 endpoint. This allows running against compatible implementations of S3.
readonly_s3_region¶
--readonly_s3_region=null
(string)
Only used when --external_storage=s3. Name of the S3 bucket's region. Can be empty if AWS_REGION is set to this value.
Options to configure the event store service¶
event_blobs_root¶
--event_blobs_root=bes
(string; previous name: --experimental_event_blobs_root)
Relative path within the storage location under which event store blobs should be stored. For disk storage, use --event_disk_path to change the absolute path.
event_bucket¶
--event_bucket=null
(string; previous name: --experimental_event_bucket)
Only used when --event_storage=gcs or --event_storage=s3. Name of the bucket to store BEP events.
event_disk_path¶
--event_disk_path=/tmp/engflow/
(string; previous name: --experimental_event_disk_path)
Absolute path under which event store blobs should be stored if disk storage is enabled.
event_gcp_project_id¶
--event_gcp_project_id=null
(string; previous name: --experimental_event_gcp_project_id)
Only used when --event_storage=gcs. Name of the GCP project ID for GCS use to store BEP events.
event_s3_endpoint¶
--event_s3_endpoint=null
(string; previous name: --experimental_event_s3_endpoint)
Only used when --event_storage=s3. The base URL for the S3 instance if using another service with an S3 compatible API.
event_s3_region¶
--event_s3_region=null
(string; previous name: --experimental_event_s3_region)
Only used when --event_storage=s3. The region in which the S3 bucket is located.
event_storage¶
--event_storage=disk
(one of: {null, in_memory, disk, gcs, s3, azure}; previous name: --experimental_event_storage)
The kind of external storage to use to store BEP events. DO NOT use in_memory in production environments!
Options to configure the execution service¶
action_execution_attempts¶
--action_execution_attempts=3
(integer)
How many times an action should be attempted if one of the retry conditions is true. These are controlled through separate flags, such as --experimental_retry_failure_due_to_signal.
allow_docker¶
--allow_docker=false
(boolean)
Whether to enable dockerized execution. In order to use dockerized execution, the client also needs to send docker image ids, and the worker must have the corresponding docker images available. As of 2020-04-14, dockerized execution is only supported on Linux VMs.
This flag is ignored on macOS workers.
allow_local¶
--allow_local=false
(boolean)
Whether to enable local execution. You must enable one of --allow_local, --allow_sandbox, or --allow_docker to be able to run actions at all. If multiple flags are enabled, then the strategy is selected based on the requested execution platform. In that case, the worker selects the first of docker, sandbox, and local in that order.
allow_sandbox¶
--allow_sandbox=false
(boolean; previous name: --sandbox)
Whether to enable sandboxed execution. If enabled, sandboxed execution is used for actions that do not specify a docker image. Also see --allow_local.
This enables the use of --sandbox_binary_path as a wrapper for each action. The behavior of the upstream linux-sandbox binary is to create a new user namespace and init process. It can optionally create a network namespace to block network access (see --sandbox_allow_network_access) and mount a tmpfs (see --sandbox_tmpfs_dir).
You can additionally control sandboxing features through action platform settings.
debug_execute_requests¶
--debug_execute_requests=false
(boolean)
If this is true, the worker prints the execute request in full detail to the log. This can generate very large amounts of output, so use with caution.
docker_additional_env¶
--docker_additional_env=[]
(list of strings)
A list of additional environment variables that are set in every docker container. Changes to this flag are non-hermetic, i.e., the system returns existing cache entries and does not force a rerun of the affected actions.
docker_additional_mounts¶
--docker_additional_mounts=[]
(list of strings)
A list of additional directories that are mounted into every docker container of the form /path/to/something or /outside_path=/inside_path. All paths must be absolute, and outside paths must exist on the local machine (where this service runs); inside paths may or may not exist.
Changes to this flag are non-hermetic, i.e., the system returns existing cache entries and does not force a rerun of the affected actions.
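For example (paths illustrative, assuming the flag is repeated once per entry), to make a host directory /opt/toolchains visible at the same path, and /var/cache/custom visible as /cache inside each container:
--docker_additional_mounts=/opt/toolchains --docker_additional_mounts=/var/cache/custom=/cache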
docker_allow_any_runtime¶
--docker_allow_any_runtime=true
(boolean)
If false, then requesting a specific runtime will fail the execution unless it is explicitly allowed using --docker_allowed_runtimes.
docker_allow_network_access¶
--docker_allow_network_access=true
(boolean)
If true, then actions can request access to sibling containers and the internet using the dockerNetwork platform setting. Otherwise actions requesting such access fail.
When enabled, action execution containers that are started with dockerNetwork=standard will be connected to a Docker bridge network. The network's name is set in the execution container as the $HOST_NETWORK_NAME environment variable.
When disabled, the value of --docker_default_network_mode is ignored and taken to be off.
docker_allow_requesting_capabilities¶
--docker_allow_requesting_capabilities=true
(boolean)
If false, then requesting capabilities will fail the execution.
docker_allow_reuse¶
--docker_allow_reuse=true
(boolean)
Whether to allow reusing Docker containers. If true, we allow reusing a running Docker container for subsequent actions that specify the same image id and Docker options; otherwise we start a new container for every action. Individual actions or builds can opt-out of container reuse with the dockerReuse platform option. Depending on the underlying machine, Docker startup can take several seconds.
docker_allow_sibling_containers¶
--docker_allow_sibling_containers=true
(boolean)
If true, then actions can request access to docker with the dockerSiblingContainers platform setting. Otherwise actions requesting such access fail.
docker_allowed_runtimes¶
--docker_allowed_runtimes=[]
(list of strings)
Ignored if --docker_allow_any_runtime=true. A list of runtimes that clients are allowed to set. If you want to allow the default runtime, you have to add the empty string to this list.
docker_clean_tmp¶
--docker_clean_tmp=false
(boolean)
Only used when --docker_allow_reuse=true. Whether to clean /tmp after reusable Docker actions.
docker_content_trust¶
--docker_content_trust=false
(boolean)
Whether to enable docker's signature verification. When enabled, docker only allows running signed images.
docker_cpu_limit¶
--docker_cpu_limit=set
(one of: {none, count, set})
Whether and how to limit docker action CPU usage. Use 'none' to apply no per-action limit, 'count' to set the maximum CPU usage in number of cores, and 'set' to restrict the action to a specific set of cores. Both 'count' and 'set' are computed from the --worker_config option; 'count' simply applies the number of cores, whereas 'set' computes non-overlapping CPU masks starting at 0. We recommend using 'set' if possible, and 'count' otherwise. Use 'none' only if CPU limitation does not work for some reason. Note that the 'set' setting assumes that the worker service has full control over the machine - another process assigning the same CPUs on the same machine can lead to conflicts and performance issues.
docker_default_network_mode¶
--docker_default_network_mode=off
(one of: {off, standard, host})
Only used when --allow_docker=true. Ignored and considered to be off if --docker_allow_network_access=false.
Specifies the default network mode for dockerized actions that don't request any particular dockerNetwork platform option.
docker_disallowed_capabilities¶
--docker_disallowed_capabilities=[]
(list of strings; previous name: --docker_blacklisted_capabilities)
A list of capabilities that must not be set in execution requests. A request setting a capability provided here fails execution.
docker_drop_capabilities¶
--docker_drop_capabilities=[]
(list of strings)
A list of docker capabilities that are dropped by default in addition to those that are already dropped by docker.
docker_enable_ipv6¶
--docker_enable_ipv6=false
(boolean)
Whether to enable IPv6 for the Docker network.
docker_enforce_known_capabilities¶
--docker_enforce_known_capabilities=true
(boolean)
If true, then all capabilities that are requested to be added are checked against a list of known capabilities before they are passed to docker. If any requested capability is not known, execution fails.
docker_extra_flags¶
--docker_extra_flags=[]
(list of strings)
Extra flags to pass to docker run.
docker_ipv6_cidr¶
--docker_ipv6_cidr=fd00::/16
(string)
Only used when --docker_enable_ipv6=true. The subnet CIDR range for IPv6 Docker networks. Worker instances use this to generate random IPv6 subnets for each executor; each generated subnet will begin with the given prefix, and have a subnet length given by --docker_ipv6_subnet_length. This can either be a private subnet (starting with fd00), which does not allow any outgoing IPv6 traffic, or it can be public, in which case it should be based on the IPv6 subnet assigned to the underlying machine.
For example, if the machine uses 2001:0db8:3333:4444:5555:6666:7777:8888/64, and this flag is set to 2001:0db8:3333:4444:ff00::/72, and the subnet length is 96, then the worker generates random subnets that look like 2001:0db8:3333:4444:ffXX:XXXX::/96, with each X replaced by a random hexadecimal digit.
Note: the value given here can be identical to the value configured in the Docker daemon's fixed-cidr-v6 configuration option.
docker_ipv6_subnet_length¶
--docker_ipv6_subnet_length=112
(integer)
Only used when --docker_enable_ipv6=true. The subnet CIDR prefix length for IPv6 Docker networks; the generated Docker subnets will have 2^(128-X) addresses. See the documentation of --docker_ipv6_cidr for more details.
docker_max_kernel_memory¶
--docker_max_kernel_memory=0
(capacity)
This is passed to docker to limit the amount of kernel memory available to each action. If unset, then there is no limit applied to docker; memory use is still limited by the available machine memory.
docker_max_memory¶
--docker_max_memory=0
(capacity)
Deprecated. Use --worker_config with a ram setting instead. If both are set, then we pass the maximum of the two values to docker to limit the amount of memory available to each action. If both are unset, then there is no limit applied to docker; memory use is still limited by the available machine memory.
docker_process_limit¶
--docker_process_limit=10000
(integer)
The maximum number of concurrent processes for a single action. This helps prevent runaway processes and fork bombs. Set to -1 for no limit, but beware this allows build actions to fork bomb.
docker_split_exec_run¶
--docker_split_exec_run=true
(boolean)
If true, the worker uses separate docker run and docker exec commands to run each action, which allows reusing Docker containers. If false, the service uses a single docker run command, which disables reuse of Docker containers. Individual actions or builds can opt-out of reuse by setting the dockerReuse platform option to False.
docker_use_process_wrapper¶
--docker_use_process_wrapper=true
(boolean)
Whether to run Docker actions through the process wrapper. This also requires setting --process_wrapper_binary_path. Note that this may fail at runtime if the selected Docker container is not compatible with the process-wrapper binary, which is usually linked against libc and libstdc++ among other system libraries.
experimental_always_retry_missing_worker_failures¶
--experimental_always_retry_missing_worker_failures=false
(boolean)
If enabled, schedulers will always retry if a worker returns UNAVAILABLE.
experimental_docker_force_reuse¶
--experimental_docker_force_reuse=false
(boolean)
Whether to enforce reusing Docker containers. This is ignored if --docker_allow_reuse is false. If both are true, then the service attempts to reuse running Docker containers regardless of the client setting for the dockerReuse platform option.
experimental_docker_use_platform_user¶
--experimental_docker_use_platform_user=false
(boolean)
Setting this flag changes the user / group selection for actions. If this flag is false, actions are run as the same user / group as the worker service. If this flag is true, then actions are run as 'nobody:nogroup' by default, and can optionally run as 'root:root' if the dockerRunAsRoot platform option is set to True. Setting this flag to true additionally requires --use_linux_acls=true; otherwise actions will fail due to file system access restrictions.
experimental_force_module_cache_path_for_mnemonics¶
--experimental_force_module_cache_path_for_mnemonics=[]
(list of strings)
A list of mnemonics for which EngFlow will force a value for -fmodules-cache-path. This is currently only useful for Objective-C actions.
experimental_persistent_worker¶
--experimental_persistent_worker=false
(boolean)
Whether to enable experimental support for remote persistent workers. Persistent workers are a mechanism in Bazel to reduce startup overhead for compilers and other tools and is widely used for Java-based tools. Note that enabling support on the worker is not sufficient to use persistent workers - the client must also annotate the persistent worker inputs.
As of 2020-10-28, this requires a patched Bazel binary.
experimental_persistent_worker_and_docker¶
--experimental_persistent_worker_and_docker=true
(boolean)
Whether to enable experimental support for remote persistent workers, running inside a container. Persistent workers are a mechanism in Bazel to reduce startup overhead for compilers and other tools and is widely used for Java-based tools. Note that enabling support on the worker is not sufficient to use persistent workers - the client must also annotate the persistent worker inputs.
As of 2020-10-28, this requires a patched Bazel binary.
This flag is ignored on macOS workers.
experimental_persistent_worker_expand_param_files¶
--experimental_persistent_worker_expand_param_files=true
(boolean)
Bazel expands params files (passed with '@filename' to the worker), but considers this legacy behavior; the new '-flagfile' and '--flagfile' arguments are never expanded. This flag controls expansion for '@filename' parameters. If disabled, the service does not expand these parameters, which differs from Bazel, and may not be compatible with all persistent worker implementations.
experimental_retry_failure_due_to_signal¶
--experimental_retry_failure_due_to_signal=false
(boolean)
Whether to retry actions that fail due to a system signal (128 < exit code < 255). Use --action_execution_attempts
to control the maximum number of attempts.
experimental_retry_persistent_worker_on_error¶
--experimental_retry_persistent_worker_on_error=null
(string)
If this flag is set to a regex pattern, then persistent worker actions that return a non-zero exit code and an error message matching the given pattern are retried. Retrying causes the current persistent worker to be shut down, and can therefore result in slower actions; use sparingly.
The total number of attempts is controlled via --action_execution_attempts
.
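For example, to retry persistent worker actions whose error output mentions a JVM out-of-memory condition (OutOfMemoryError is a hypothetical pattern; substitute whatever your tools actually emit):
--experimental_retry_persistent_worker_on_error=OutOfMemoryError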
extra_xcode¶
--extra_xcode=[]
(list of strings)
A list of paths to Xcode installations.
ignore_unknown_platform_properties¶
--ignore_unknown_platform_properties=false
(boolean)
Whether to ignore unknown platform properties. If false, then actions that set unknown platform properties return an error. Otherwise such properties are silently ignored. Note that changing this flag does not affect existing entries in the action cache, i.e., the server may return cached entries even if re-executing the action would return an error due to unknown properties. All properties are part of the cache key.
incompatible_docker_prevent_memory_swap¶
--incompatible_docker_prevent_memory_swap=false
(boolean)
If enabled, prevents Docker containers from using swap memory.
incompatible_remove_symlink_execroot_strategy¶
--incompatible_remove_symlink_execroot_strategy=true
(boolean)
Removes support for building the exec root using symlinks.
incompatible_require_canonical_container_image¶
--incompatible_require_canonical_container_image=false
(boolean)
If enabled, the 'container-image' platform option must contain the digest of the container.
keep_exec_directories_for_debugging¶
--keep_exec_directories_for_debugging=false
(boolean; previous name: --debug_actions
)
Whether to keep the execution directories after execution for debugging. You also need terminal access to the worker machines to inspect these directories. DO NOT enable this in a production cluster. This flag is silently ignored when persistent workers are used (either with --experimental_persistent_worker
or experimental_persistent_worker_and_docker
) or when incremental exec roots are enabled (with --experimental_incremental_exec_root
).
max_download_concurrency¶
--max_download_concurrency=200
(integer)
The maximum number of concurrent downloads to a worker before an action starts. A negative or zero value indicates no limit. This may be useful to limit the CAS read load as well as to prevent running out of file descriptors.
max_execution_timeout¶
--max_execution_timeout=15m
(duration)
The maximum timeout for the execution of a single action. Clients typically only set timeouts for a subset of actions such as test actions to avoid cache fragmentation. The timeout set here applies to all execution requests that do not have a timeout set. In addition, it also provides an upper bound for execution requests that do have a timeout set, i.e., requested timeouts larger than this are silently ignored.
max_input_size¶
--max_input_size=4gb
(capacity)
The maximum total size of all inputs to an action. Actions that exceed this limit are aborted during setup.
max_output_size¶
--max_output_size=4gb
(capacity)
The maximum total size of all outputs of an action. Actions that exceed this limit are aborted during or after execution.
max_upload_concurrency¶
--max_upload_concurrency=0
(integer)
The maximum number of concurrent uploads from a worker after an action completes. A negative or zero value indicates no limit. This may be useful to limit the CAS write load.
notification_period¶
--notification_period=1m
(duration)
Configures how often the service provides updates to the client about running actions. Note that this does not apply to queued actions.
operation_retention_time¶
--operation_retention_time=1m
(duration)
Configures the duration for which the worker retains a finished action before deleting it locally. The worker uses these retained entries to answer waitExecution requests in case the client disconnects during execution. A very small value can cause unnecessary action retries and execution load, and a very large value can cause excessive memory use on the worker.
process_wrapper_binary_path¶
--process_wrapper_binary_path=/usr/bin/engflow/process-wrapper
(string)
The path to a process-wrapper binary on the worker. The process-wrapper binary is part of a Bazel installation and provides improved control of action processes.
process_wrapper_cpu_limit¶
--process_wrapper_cpu_limit=none
(one of: {none
, set
})
Whether and how to limit action CPU usage when using the process wrapper. Use 'none' to apply no per-action limit, and 'set' to restrict the action to an automatically computed set of cores. We recommend using 'set' if possible. Use 'none' only if CPU limitation does not work for some reason. Note that the 'set' setting assumes that the worker service has full control over the machine, as it assigns CPUs starting at 0.
sandbox_allow_network_access¶
--sandbox_allow_network_access=true
(boolean)
If true, sandboxed actions can request network access by setting the platform option sandboxNetwork
, e.g., exec_properties = { "sandboxNetwork": "standard" }
. Otherwise, such actions fail.
sandbox_binary_path¶
--sandbox_binary_path=/usr/bin/engflow/linux-sandbox
(string)
The path to a linux-sandbox binary on the worker. The linux-sandbox binary is part of a Bazel installation and uses Linux Kernel APIs to sandbox the execution of an action process.
sandbox_grace_timeout¶
--sandbox_grace_timeout=5s
(duration)
How long to wait before sending SIGKILL after an action times out. When an action times out, we first send it SIGTERM and only send SIGKILL after this grace period. The value may be rounded up to the next larger whole second.
sandbox_tmpfs_dir¶
--sandbox_tmpfs_dir=null
(string)
Sets the location for an empty tmpfs directory inside the sandbox.
sandbox_writable_path¶
--sandbox_writable_path=[]
(list of strings)
Additional absolute paths that are writable within the sandbox.
use_process_wrapper¶
--use_process_wrapper=false
(boolean)
Whether to enable the process wrapper for local actions. The process wrapper provides improved process control, ensuring a more consistent execution environment as well as killing all child processes reliably.
warm_containers¶
--warm_containers=true
(boolean)
Try to pull active cluster Docker containers onto the worker before accepting any actions.
worker_config¶
--worker_config=auto
(string)
Configures the number and properties of local executors. Specify executor properties as a list of key-value pairs separated by commas, such as cpu=1,ram=2gb,pool=c1_m2
.
To specify multiple identical executors, prefix a set of properties with a number and a *
character, such as 4*cpu=2
. To specify multiple different executors, combine them with a +
character, such as 1*cpu=3,ram=1G+2*cpu=1
(one executor with 3 cores and 1 GB of RAM, and two executors with 1 core). The comma operator has precedence over the star operator, which has precedence over the plus operator. Disable local execution by setting this flag to the empty string.
For automatic configuration, specify auto
to create an executor for each available core. This option is useful when the number of cores is not known in advance.
For manual configuration, the only supported keys are cpu
, ram
, and pool
. cpu
specifies the number of cores to reserve, ram
is silently ignored, and pool
specifies the name of the pool for the executor.
ram
must, if specified, also specify a unit (e.g., 10b
for 10 bytes, 5gib
for 5 Gibibytes).
pool
must, if specified, match the following expression: [a-z0-9_]+
.
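For instance (the pool names default and large are illustrative), the following creates four single-core executors in one pool plus one four-core executor in another:
--worker_config=4*cpu=1,pool=default+1*cpu=4,pool=large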
Options to configure the result store service¶
experimental_build_index_db_threads¶
--experimental_build_index_db_threads=10
(integer)
Only used when --experimental_build_index
is enabled. Specifies how many threads to use to query the invocation index database. The value is a positive integer.
experimental_build_index_service_threads¶
--experimental_build_index_service_threads=4
(integer)
Only used when --experimental_build_index
is enabled. Specifies how many threads to use to serve invocation index requests. The value is a positive integer.
experimental_jwt_auth¶
--experimental_jwt_auth=false
(boolean)
Enables JWT authentication for grpc-web calls (Result Store only).
experimental_record_reported_action¶
--experimental_record_reported_action=false
(boolean)
Enables recording reported actions from the BEP stream.
experimental_summarize_invocations¶
--experimental_summarize_invocations=false
(boolean)
Summarizes the remote resource usage of invocations.
Monitoring options¶
cloudwatch_dimensions¶
--cloudwatch_dimensions=null
(string)
Only considered when --enable_cloudwatch=true
and ignored otherwise. Sets common dimensions of reported CloudWatch metrics. The value is a comma-separated list of key-value pairs, e.g. "customer=Acme Inc.,cluster=prod"; the order does not matter.
cloudwatch_export_interval¶
--cloudwatch_export_interval=1m
(duration)
Configures the time between metrics exports to CloudWatch.
cloudwatch_metrics_filter¶
--cloudwatch_metrics_filter=[.*]
(list of strings)
Required when --enable_cloudwatch=true
, ignored otherwise. A list of regexes that filter metric names: a metric is reported to AWS CloudWatch only if it matches any of the regexes. Entries follow Java regex syntax. Matching is partial by default (e.g. "exec" matches every metric whose name contains this string); to match the whole metric name, use ^
and $
. If empty or not specified, then no metrics are reported.
Example: report metrics about AWS S3 use: --cloudwatch_metrics_filter+=storage\.s3/
; report download-related metrics from any storage backend: --cloudwatch_metrics_filter+=storage\..*/download
.
Use --cloudwatch_metrics_filter=regex
to override the default.
cloudwatch_namespace¶
--cloudwatch_namespace=null
(string)
Required when --enable_cloudwatch=true
, ignored otherwise. Sets the namespace of reported metrics.
cloudwatch_region¶
--cloudwatch_region=null
(string)
Required when --enable_cloudwatch=true
, ignored otherwise. Sets the AWS region of reported metrics.
enable_cloudwatch¶
--enable_cloudwatch=false
(boolean)
Enables reporting metrics to AWS CloudWatch.
enable_metrics_log¶
--enable_metrics_log=false
(boolean)
Enables reporting metrics to the log; this can be used for testing or for log-based analytics.
enable_prometheus¶
--enable_prometheus=false
(boolean)
Enables a built-in webserver to export monitoring data to Prometheus (https://prometheus.io/). You may also need to set --prometheus_port
and configure Prometheus to start scraping from all cluster nodes.
enable_stackdriver¶
--enable_stackdriver=false
(boolean)
Enables reporting of monitoring and tracing data to StackDriver (a monitoring system integrated into Google Cloud that also supports AWS). You also need to set --stackdriver_project
and provide application default credentials that allow write access to StackDriver.
enable_zipkin¶
--enable_zipkin=false
(boolean)
Enables reporting of performance traces to Zipkin (https://zipkin.io/). You also need to set --zipkin_endpoint
.
execution_stage_latency_metrics¶
--execution_stage_latency_metrics=true
(boolean)
Whether to export execution latency metrics by stage and pool.
grpc_metrics¶
--grpc_metrics=basic
(one of: {none
, minimal
, basic
, all
})
The gRPC library provides a number of metrics that can be logged for monitoring. This option selects what subset of metrics to log. Unfortunately, logging all metrics can be expensive (e.g., on Google Cloud Operations). With the minimal setting, all completed RPCs are logged, but no latency, byte, or message metrics.
log_metrics_filter¶
--log_metrics_filter=[.*]
(list of strings)
Required when --enable_metrics_log=true
, ignored otherwise. A list of regexes that filter metric names: a metric is logged only if it matches any of the regexes. Entries follow Java regex syntax. Matching is partial by default (e.g. "exec" matches every metric whose name contains this string); to match the whole metric name, use ^
and $
. If empty or not specified, then no metrics are reported.
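For instance (the metric names are illustrative), the first line below matches any metric whose name contains "exec", while the second matches one metric name exactly:
--log_metrics_filter+=exec
--log_metrics_filter+=^storage\.s3/download$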
lsof_report_interval¶
--lsof_report_interval=0s
(duration)
Configures the time between reports of open file handles. Zero to disable.
monitoring_trace_probability¶
--monitoring_trace_probability=0
(float; previous name: --monitoring_sample_probability
)
Sets the probability of recording a performance trace for a given client request to a scheduler. Setting it to 0 disables tracing. Setting it to 1 enables tracing every request. Tracing a large fraction of the traffic is expensive, and should not be used for production clusters. Note that this flag is evaluated once on the scheduler for each incoming RPC call and then passed along on subsequent calls.
prometheus_bind_to_any¶
--prometheus_bind_to_any=false
(boolean; previous name: --monitoring_prometheus_bind_to_any
)
Whether to bind to any local IP. If false, then only bind to the private IP selected with --private_ip_selector
. If your cluster is connected to the public internet, then enabling this flag exposes your monitoring data publicly.
prometheus_port¶
--prometheus_port=8888
(integer; previous name: --monitoring_prometheus_port
)
Selects the local port to start a Prometheus-compatible webserver on.
stackdriver_export_interval¶
--stackdriver_export_interval=1m
(duration)
Configures the time between metrics exports to StackDriver.
stackdriver_optimized_reporting¶
--stackdriver_optimized_reporting=true
(boolean)
Transitional option to enable automatic optimization of the metric export interval for each metric based on observed changes. I.e., metrics are only exported when they change rather than every interval. This can significantly reduce Stackdriver costs during periods of low cluster utilization such as nights and weekends.
stackdriver_project¶
--stackdriver_project=
(string; previous name: --monitoring_stackdriver_project
)
Selects the StackDriver project to send monitoring data to.
zipkin_endpoint¶
--zipkin_endpoint=http://localhost:9411/api/v2/spans
(string; previous name: --monitoring_zipkin_endpoint
)
Configures the Zipkin endpoint to push performance traces to.
Options to configure logging to external services¶
aws_log_group_name¶
--aws_log_group_name=null
(string)
Only used if --remote_logging_service=aws_cloudwatch
. The name of the AWS log group, which must already exist.
gcp_log_autodetect¶
--gcp_log_autodetect=true
(boolean)
Only used if --remote_logging_service=google_cloud_operations
. Whether to automatically detect log labels for this process, like the instance name and availability zone. If you log to GCP from outside of GCP, the automatic detection does not work correctly; in that case, set this flag to false.
gcp_log_project_id¶
--gcp_log_project_id=null
(string)
Only used if --remote_logging_service=google_cloud_operations
. The GCP project id to log to. Instances that run on GCP automatically detect the current project; you can use this flag to override the automatically detected project id, or provide one explicitly if the instance is not running on GCP.
remote_log_level¶
--remote_log_level=info
(one of: {off
, severe
, warning
, info
, verbose
, all
})
The verbosity level of remote logging.
remote_logging_service¶
--remote_logging_service=none
(one of: {none
, google_cloud_operations
, aws_cloudwatch
})
The external service to log to.
Threading options¶
default_thread_pool_size¶
--default_thread_pool_size=0
(integer)
The size for the default
executor.
A value <= 0 indicates using the number of CPU cores of the machine.
disk_thread_pool_size¶
--disk_thread_pool_size=0
(integer)
The size for the disk
executor.
A value <= 0 indicates using the number of CPU cores of the machine.
network_thread_pool_size¶
--network_thread_pool_size=0
(integer)
The size for the network
executor.
A value <= 0 indicates using the number of CPU cores of the machine.
slow_task_threshold¶
--slow_task_threshold=1s
(duration)
Length of execution time before a task is considered slow.
Appendix: flag syntax¶
Duration flags¶
You can specify a duration in milliseconds, seconds, minutes, hours, or days. Use the suffixes
ms
, s
, m
, h
, or d
respectively:
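For instance (the values are illustrative; any duration-typed flag accepts these forms):
--sandbox_grace_timeout=2500ms
--notification_period=30s
--operation_retention_time=2m
--max_execution_timeout=1h
--lsof_report_interval=1d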
Capacity flags¶
You can specify a capacity in Bytes, in KiloBytes, MegaBytes, or GigaBytes,
and in KibiBytes, MebiBytes, or GibiBytes. Use b
or no suffix at
all for Bytes; use the suffixes kb
, mb
, or
gb
for decimal units (1000 multipliers), and kib
,
mib
, or gib
for the binary units (1024 multipliers).
Upper- and lower-case are considered the same.
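For instance (the values are illustrative), 2gb means 2,000,000,000 bytes, 2gib means 2,147,483,648 bytes, and a bare number means bytes:
--max_input_size=2gb
--max_output_size=2gib
--max_input_size=2147483648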
List flags¶
You can specify list flags multiple times. The +=
operator adds another value, and
the =
operator drops all accumulated (or default) values:
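For instance (the paths are illustrative), the first line below replaces the default list with a single entry, and the second appends another:
--sandbox_writable_path=/var/cache/tool
--sandbox_writable_path+=/opt/toolchain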