EngFlow profile¶
What is an EngFlow profile and what do you need it for?¶
The EngFlow profile is a build artifact that EngFlow automatically produces for builds running remotely. It is the server-side profile of a build: it provides granular insight into the server-side events for each action that ran remotely.
This artifact is complementary to a Bazel profile, which is the client-side profile that shows everything else in a build, like system resource consumption and critical path.
The EngFlow profile is especially useful when:
- measuring resource utilization of an action (memory, CPU, IO, ...)
- measuring how much docker image sizes impact builds
- measuring costs of container bootstrap
- measuring and debugging remote-persistent-workers
- ... and much more!
Format and schema¶
The file format is the same as a Bazel profile, but the schema differs: it uses different events and attributes. Events can be distinguished by category as follows:
worker_time events¶
These events mark the end-to-end execution of an action, grouping together the various phases of the lifecycle of an action running on remote execution. Events of this type have an extensive list of attributes, covering an action's setup as well as its performance.
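Because the file format matches a Bazel profile, the events can be read with any JSON tooling. The following is a minimal sketch, not an official EngFlow tool. It assumes that the file is a Chrome-trace-style JSON document (optionally gzip-compressed) and that each event stores its category in the standard `cat` field. The helper names `load_trace_events` and `events_in_category` are hypothetical.

```python
import gzip
import json
from pathlib import Path


def load_trace_events(path):
    """Load trace events from a profile file, gzip-compressed or plain JSON."""
    raw = Path(path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    data = json.loads(raw)
    # The trace format allows either {"traceEvents": [...]} or a bare list.
    return data["traceEvents"] if isinstance(data, dict) else data


def events_in_category(events, category):
    """Return the events whose "cat" field equals the given category.

    Assumption: the category is stored under "cat", as in a Bazel profile.
    """
    return [event for event in events if event.get("cat") == category]
```

For example, `events_in_category(load_trace_events("engflow_profile.json.gz"), "worker_time")` would return only the worker_time events, where the file name is a placeholder.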
Generic attributes¶
These attributes can be found in every worker_time event:
- action_digest: the unique identifier for an action, computed by concatenating hash and length of an action's proto.
- action_mnemonic
- action_pool: the pool on which said action ran
- input_tree_stats: statistics on IO throughput, size and so on of the input tree that is staged prior to action execution
- output_tree_stats: the counterpart to input_tree_stats for the outputs generated by running an action
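To illustrate how an identifier like action_digest combines a hash and a length, here is a rough sketch. The hash function (SHA-256) and the `/` separator are assumptions made for illustration; the source only states that the hash and length of the action's proto are concatenated. Both function names are hypothetical.

```python
import hashlib


def format_digest(serialized_action: bytes) -> str:
    """Build a "<hash>/<length>" string for a serialized action.

    Assumptions: SHA-256 as the hash function and "/" as the separator.
    """
    content_hash = hashlib.sha256(serialized_action).hexdigest()
    return f"{content_hash}/{len(serialized_action)}"


def split_digest(digest: str):
    """Split a "<hash>/<length>" string back into (hash, length_in_bytes)."""
    content_hash, _, size = digest.rpartition("/")
    return content_hash, int(size)
```

Two actions with identical serialized protos would produce the same digest under this scheme, which is what makes it usable as a unique identifier.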
Process wrapper attributes¶
When running actions on remote execution, we can optionally run an action's command through Bazel's process wrapper. This allows us to capture metrics from getrusage. It isn't enabled by default and is only available on Linux. Reach out to our EngFlow engineers to have it enabled.
- process_wrapper.execution_statistics.resource_usage.utime_sec: user CPU time used, seconds part
- process_wrapper.execution_statistics.resource_usage.utime_usec: user CPU time used, microseconds part
- process_wrapper.execution_statistics.resource_usage.stime_sec: system CPU time used, seconds part
- process_wrapper.execution_statistics.resource_usage.stime_usec: system CPU time used, microseconds part
- process_wrapper.execution_statistics.resource_usage.maxrss: maximum resident set size (in kilobytes)
- process_wrapper.execution_statistics.resource_usage.minflt: page reclaims (soft page faults)
- process_wrapper.execution_statistics.resource_usage.majflt: page faults (hard page faults)
- process_wrapper.execution_statistics.resource_usage.inblock: block input operations
- process_wrapper.execution_statistics.resource_usage.oublock: block output operations
- process_wrapper.execution_statistics.resource_usage.nvcsw: voluntary context switches
- process_wrapper.execution_statistics.resource_usage.nivcsw: involuntary context switches
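The CPU times above are split into a seconds part and a microseconds part, so they need to be recombined before comparing actions. The sketch below shows one way to do that. It assumes the attributes are available as a flat dictionary keyed by the dotted names listed above, with numeric or numeric-string values; the actual layout inside an event may differ. The function names are hypothetical.

```python
PREFIX = "process_wrapper.execution_statistics.resource_usage."


def cpu_seconds(attrs, kind):
    """Combine the seconds and microseconds parts; kind is "utime" or "stime"."""
    seconds = int(attrs.get(PREFIX + kind + "_sec", 0))
    microseconds = int(attrs.get(PREFIX + kind + "_usec", 0))
    return seconds + microseconds / 1_000_000


def summarize_resource_usage(attrs):
    """Return total CPU time in seconds and peak memory in MiB."""
    user = cpu_seconds(attrs, "utime")
    system = cpu_seconds(attrs, "stime")
    max_rss_kb = int(attrs.get(PREFIX + "maxrss", 0))  # reported in kilobytes
    return {
        "user_cpu_s": user,
        "system_cpu_s": system,
        "total_cpu_s": user + system,
        "max_rss_mib": max_rss_kb / 1024,
    }
```

Comparing `total_cpu_s` with the wall-clock duration of the event gives a rough sense of whether an action was CPU-bound or spent most of its time waiting.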
Input Tree Stats attributes¶
DirectoryStats¶
- maxDepth: Maximum depth of the input tree.
- totalDirs: Total number of directories in the input tree.
- totalFiles: Total number of files in the input tree.
- totalMetadataSize: Total size of the input tree data structure (Merkle tree). Measured in bytes.
- totalFileSize: Total size of contents of the input tree as stored on disk, including symlinks. Measured in bytes.
- distinctDigestTotalSize: Total size of contents of the input tree as stored on disk, excluding the symlinks. Measured in bytes.
AggregatedReadStatistics¶
- localCas: Number of files read from the worker's CAS volume where the action is executed.
- distributedCas: Number of files read from other workers' CAS volumes via a distributed map.
- distributedCasSpeed: Total number of input tree bytes read from distributed CAS divided by total duration. Measured in megabytes per second.
- distributedCasLongestDownload: Longest download duration amongst all the input files downloaded from distributed CAS. Unit of measure is specified in the value, usually in seconds.
- externalStorage: Number of files read from external storage, i.e. S3 for AWS or GCS for GCP.
- externalStorageSpeed: Total number of input tree bytes read from external storage divided by total duration. Measured in megabytes per second.
- externalStorageLongestDownload: Longest download duration amongst all the input files downloaded from external storage. Unit of measure is specified in the value, usually in seconds.
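The three file counts above can be turned into shares to see where an action's inputs came from. The following sketch assumes the read statistics are available as a dictionary keyed by the attribute names listed above; the function name `read_source_shares` is hypothetical.

```python
READ_SOURCES = ("localCas", "distributedCas", "externalStorage")


def read_source_shares(read_stats):
    """Return the fraction of input files read from each source.

    Missing counts are treated as zero. If no files were read at all,
    every share is reported as 0.0.
    """
    counts = {source: int(read_stats.get(source, 0)) for source in READ_SOURCES}
    total = sum(counts.values())
    if total == 0:
        return {source: 0.0 for source in READ_SOURCES}
    return {source: count / total for source, count in counts.items()}
```

A high externalStorage share paired with a long externalStorageLongestDownload suggests that input staging, rather than the action itself, dominates the runtime.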
Output Tree Stats attributes¶
- filesUploaded: Number of output files uploaded as action output.
- bytesUploaded: Total size of the files uploaded as action output. Measured in bytes.