How do I capture a Bazel profile?
When you are generating a Bazel profile to evaluate what improvements might be
achieved with EngFlow Remote Execution, a clean build gives the best signals.
The easiest way to achieve this is to:
1. Run Bazel for the desired target:
* Bazel 5.x and above:
Additionally, do not run other tasks on the machine to avoid CPU contention.
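The steps above can be sketched as the following commands. This is a minimal sketch rather than an authoritative listing: //your:target is a placeholder for the target you want to measure, and the output path is chosen to match the profile file named in this answer. --profile is a standard Bazel flag; no other flags are assumed.

```shell
# Start from a clean state so that no local action results are reused.
bazel clean

# Build the target and write the profile to /tmp/profile.json.gz.
# //your:target is a placeholder, not a real label.
bazel build --profile=/tmp/profile.json.gz //your:target
```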
The resulting file (/tmp/profile.json.gz) can be inspected in Chrome's profile
When using build wrappers such as Make or Ninja
Bazel overwrites existing profile files. As of Bazel 5.1.0, it is not possible to append to an existing profile.
If your main build tool is Make or Ninja and it calls Bazel multiple times in a
build, then you need to specify a different profile path for each Bazel run by
passing a different --profile flag every time. Otherwise Bazel will keep
overwriting the profile.
One possible solution is to replace the real Bazel binary with a script that
calls the real Bazel and appends
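A minimal sketch of such a wrapper follows. It is an illustration under stated assumptions, not a definitive implementation: REAL_BAZEL, PROFILE_DIR, and the function name bazel_with_profile are hypothetical names chosen for this example, and the default paths are placeholders. Each invocation gets its own directory, so successive Bazel runs cannot overwrite each other's profile.

```shell
#!/usr/bin/env bash
# Sketch of a Bazel wrapper. REAL_BAZEL and PROFILE_DIR are hypothetical
# names chosen for this example; they are not Bazel or EngFlow settings.
REAL_BAZEL="${REAL_BAZEL:-/usr/local/bin/bazel.real}"
PROFILE_DIR="${PROFILE_DIR:-/tmp/bazel-profiles}"

bazel_with_profile() {
  local run_dir
  mkdir -p "${PROFILE_DIR}" || return 1
  # mktemp -d creates a fresh directory per invocation, so no two runs
  # share a profile path.
  run_dir="$(mktemp -d "${PROFILE_DIR}/run-XXXXXX")" || return 1
  # Assumes the caller's arguments contain no "--" separator; anything
  # placed after "--" would not be parsed as a Bazel option.
  "${REAL_BAZEL}" "$@" "--profile=${run_dir}/profile.json.gz"
}
```

To use it as a drop-in replacement, save the script under the name bazel ahead of the real binary on PATH and end the file with a call such as bazel_with_profile "$@".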
Why do I get PERMISSION_DENIED when trying to write to the cache?
The default service configuration allows remote execution but not remote caching. This is more secure.
A user with write access to the cache can write anything, including malicious code and binaries, which can then be returned to other users on cache lookups.
By comparison, remotely executed actions are typically sandboxed on a remote machine, which the user does not have direct control over. Since the cache key is a cryptographic hash of all input files, the command line, and the environment variables, it's significantly harder to inject malicious data into the action cache.
In order to allow remote cache access, you need to adjust the permissions
settings for the service depending on your authentication configuration:
If you are using --client_auth=gcp_rbe, then you need to adjust permissions in
the GCP IAM console.
--principal_based_permissions to configure per-user
If you disable client authentication (
can add the following line to your configuration:
When running on GCP, why can't it pull my image from gcr.io?
While the GCP workers are authenticated with gcloud out of the box, images uploaded to gcr.io are not world-readable by default. You should check that the EngFlow RE role account has access to the image (or give it access if necessary).
See the Google Container Registry documentation for more details: https://cloud.google.com/container-registry/docs/access-control
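As a rough sketch of granting that access, the command below gives an account read access to the storage bucket that backs gcr.io images. PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders you must fill in yourself, and the bucket name assumes the image is hosted on gcr.io (not a regional host such as eu.gcr.io). Treat the linked documentation as the authority on roles and bucket names.

```shell
# Placeholders: replace with your own project ID and the role account's email.
PROJECT_ID="your-project-id"
SERVICE_ACCOUNT_EMAIL="your-role-account@example.iam.gserviceaccount.com"

# Grant read-only access to the bucket that stores gcr.io images.
gsutil iam ch \
  "serviceAccount:${SERVICE_ACCOUNT_EMAIL}:objectViewer" \
  "gs://artifacts.${PROJECT_ID}.appspot.com"
```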
The EngFlow RE role account is typically named:
What if I get "clone": Operation not permitted from the sandbox?
clone(2), and may fail if the current user has insufficient privileges to use
If you run the Remote Execution service on Kubernetes or in Docker containers, or on a host where unprivileged user namespaces are disabled, sandboxed actions may fail with this error:
If you run the service in a container, you can try running it in privileged
You can also try enabling unprivileged user namespaces in the kernel (Debian):
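A minimal sketch of that change, assuming a Debian kernel that exposes the Debian-specific kernel.unprivileged_userns_clone sysctl (other distributions may not have it). The file name under /etc/sysctl.d is a hypothetical choice for this example.

```shell
# Show the current setting; 0 means unprivileged user namespaces are disabled.
sysctl kernel.unprivileged_userns_clone

# Enable for the running kernel (lost on reboot).
sudo sysctl -w kernel.unprivileged_userns_clone=1

# Persist across reboots. The file name 99-userns.conf is an arbitrary example.
echo 'kernel.unprivileged_userns_clone=1' | sudo tee /etc/sysctl.d/99-userns.conf
sudo sysctl --system
```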
Why do actions hang for 5 minutes then fail with RESOURCE_EXHAUSTED: Max queue time exhausted?
The rule was probably requesting an Executor Pool that doesn't exist. To verify
this, set --max_queue_time_in_empty_pool to a short value such as 30s, retry
the build, and check whether the same action fails after exactly that timeout.
How can I force a full rebuild to measure clean build performance?
bazel clean, then build again with
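One way to do this is sketched below, under the assumption that the goal is to re-execute every action instead of reusing remotely cached results. --noremote_accept_cached is a standard Bazel flag that makes Bazel ignore remote cache hits; whether it is the flag this answer originally had in mind is an assumption. //your:target is a placeholder.

```shell
# Drop local outputs and the local action cache.
bazel clean

# Rebuild while ignoring results already stored in the remote cache.
# //your:target is a placeholder, not a real label.
bazel build --noremote_accept_cached //your:target
```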
Why do I get "403 Forbidden" errors from S3?
If you see the error on the client side:
then you need to add S3 permissions to the IAM role policy:
What if my C++ compilation fails with missing dependency declarations errors?
The culprit is that the selected C++ toolchain's
cc_toolchain_config.cxx_builtin_include_directories is missing that
/usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/ to that list.
What if my Java tests fail with SecurityException: Can't read cryptographic policy directory?
Bazel had a bug (https://github.com/bazelbuild/bazel/issues/9189) before release
4.0.0 that caused the security configuration files to be excluded from the
uploaded JDK when using --javabase=@remotejdk15_macos//:jdk, or similar. We
therefore recommend upgrading to Bazel 4.0.0 or later.
How large of a disk should I attach to my EngFlow Virtual Machines?
We recommend giving schedulers 16GB of disk space. This need only be large enough to hold the operating system plus a little extra for logs and other resources.
Workers, however, should be given roughly 50GB per executor. So, for example,
if you have set your worker config to --worker_config=4*cpu=2, you should give
each worker box a disk with at least 200GB (50GB * 4 executors).
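The sizing rule above is simple arithmetic, sketched here as a small shell helper. The function name worker_disk_gb is hypothetical; it takes the executor count directly and does not attempt to parse the --worker_config value.

```shell
# Hypothetical helper: minimum worker disk size in GB.
# $1 = number of executors on the worker
# $2 = GB per executor (defaults to the 50GB guideline above)
worker_disk_gb() {
  local executors="$1"
  local gb_per_executor="${2:-50}"
  echo $(( executors * gb_per_executor ))
}

# Four executors, as in --worker_config=4*cpu=2.
worker_disk_gb 4   # prints 200
```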
Bazel: What if I get Invalid action cache entry errors?
If you see a similar error to this:
then it's due to a Bazel 5.1 bug: https://github.com/bazelbuild/bazel/pull/15151
Downgrading to 5.0 or upgrading beyond 5.1.0 should fix the issue.