Move your tar.xz archives to tar.zst for faster Bazel extraction
While hunting for low-hanging fruit in CI, we noticed an obvious pattern: on a fresh runner, any job that used Bazel had a runtime floor of a few minutes.
The first Bazel invocation started quickly, then sat in analysis with no actions running:
    (19:55:30) Analyzing: 14 targets (5 packages loaded, 0 targets configured)
    [0 / 1] checking cached actions
    (19:55:35) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (19:55:41) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (19:55:46) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (19:55:51) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (19:55:56) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (19:56:01) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (19:56:06) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (19:56:11) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (19:56:17) Analyzing: 14 targets (102 packages loaded, 21 targets configured)
    [1 / 1] no actions running
    (20:00:30) Analyzing: 14 targets (106 packages loaded, 40 targets configured)
    [1 / 1] no actions running
    (20:00:35) Analyzing: 14 targets (513 packages loaded, 18227 targets configured)
    [1 / 1] no actions running

That is a roughly 4m55s gap during cold analysis before the rest of the build graph starts loading.
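The size of that stall can be read off the log mechanically. The following is a minimal sketch, assuming progress lines shaped like the ones in the log; the `longest_stall` helper and its regular expression are illustrative and not part of Bazel. It measures how long the "packages loaded / targets configured" counters stayed unchanged.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "(19:55:35) Analyzing: 14 targets (102 packages loaded, 21 targets configured)"
PROGRESS = re.compile(
    r"\((\d{2}:\d{2}:\d{2})\) Analyzing: .*?"
    r"\((\d+) packages loaded, (\d+) targets configured\)"
)


def longest_stall(log_lines):
    """Return the longest time the loaded/configured counters stayed unchanged.

    Assumes all timestamps fall on the same day (no wrap past midnight).
    """
    samples = []
    for line in log_lines:
        match = PROGRESS.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            counters = (int(match.group(2)), int(match.group(3)))
            samples.append((stamp, counters))

    longest = timedelta(0)
    i = 0
    while i < len(samples):
        # Advance j to the first sample whose counters differ from sample i.
        j = i + 1
        while j < len(samples) and samples[j][1] == samples[i][1]:
            j += 1
        if j < len(samples):
            longest = max(longest, samples[j][0] - samples[i][0])
        i = j
    return longest


# Abbreviated copy of the log: first, stalled, and resumed progress lines.
log = [
    "(19:55:30) Analyzing: 14 targets (5 packages loaded, 0 targets configured)",
    "(19:55:35) Analyzing: 14 targets (102 packages loaded, 21 targets configured)",
    "(19:56:17) Analyzing: 14 targets (102 packages loaded, 21 targets configured)",
    "(20:00:30) Analyzing: 14 targets (106 packages loaded, 40 targets configured)",
    "(20:00:35) Analyzing: 14 targets (513 packages loaded, 18227 targets configured)",
]
print(longest_stall(log))  # 0:04:55
```

The stall runs from 19:55:35, when the counters first read 102 and 21, to 20:00:30, when they next change.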
What is the culprit?
In our Bazel setup, we register LLVM as a global toolchain.
    llvm = use_extension("@toolchains_llvm//toolchain/extensions:llvm.bzl", "llvm")
    llvm.toolchain(
        name = "llvm_toolchain",
        llvm_version = "21.1.8",
    )

    use_repo(llvm, "llvm_toolchain")
    use_repo(llvm, "llvm_toolchain_llvm")

    register_toolchains("@llvm_toolchain//:all")
    register_toolchains("@llvm_toolchain_llvm//:all")

In our case, that 4m55s is the LLVM .tar.xz archive being downloaded from GitHub and extracted.
It took roughly 2.5 minutes to download the archive and 3 minutes to extract it.
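To see how much of the extraction time is pure decompression, one option is to stream the archive through a decompressor outside Bazel and time it. This is a rough sketch using Python's standard `lzma` module; the `time_xz_decompression` helper is hypothetical, and because Bazel uses its own extractor, the absolute numbers will not match what Bazel reports.

```python
import lzma
import time


def time_xz_decompression(path, chunk_size=1 << 20):
    """Stream an .xz file through the decompressor and discard the output.

    Returns (decompressed_bytes, elapsed_seconds). Nothing is written to disk,
    so this measures decompression only, not tar unpacking.
    """
    start = time.perf_counter()
    total = 0
    with lzma.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling `time_xz_decompression("LLVM-21.1.8-Linux-X64.tar.xz")` on a local copy of the archive gives a baseline to compare against other formats on the same machine.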
How can we improve it?
To both improve download speed and give ourselves a fallback host, we set up a mirror using Cloudflare R2.
Let’s focus on the extraction part, since any download speed improvement benefits both archive formats equally.
First of all, we just took the .tar.xz distribution and repackaged it into .tar.zst:
    xz -dc LLVM-21.1.8-Linux-X64.tar.xz | zstd -T0 -3 -o LLVM-21.1.8-Linux-X64.tar.zst

The top-level directory inside the archive remains the same, so the Bazel strip_prefix remains the same:
    LLVM-21.1.8-Linux-X64/
    LLVM-21.1.8-Linux-ARM64/
    LLVM-21.1.8-macOS-ARM64/

The last step is extending the existing llvm.toolchain block with our new checksums, prefixes and URLs:
    llvm.toolchain(
        name = "llvm_toolchain",
        llvm_version = "21.1.8",
        sha256 = {
            "darwin-aarch64": "4b6ac1f93740a70fea6e5cba9675270f979d6a8a2516a2cae761b85e48a6b56f",
            "linux-aarch64": "6d30aec67aea9c70d6350de46f6e8472d6b3ccf4f49670255696fcf492912097",
            "linux-x86_64": "de7b02bbfd45f333b1e99a797392c2d38ce7f5cf1c04295ba24545dbace06b7c",
        },
        strip_prefix = {
            "darwin-aarch64": "LLVM-21.1.8-macOS-ARM64",
            "linux-aarch64": "LLVM-21.1.8-Linux-ARM64",
            "linux-x86_64": "LLVM-21.1.8-Linux-X64",
        },
        urls = {
            "darwin-aarch64": ["https://.../LLVM-21.1.8-macOS-ARM64.tar.zst"],
            "linux-aarch64": ["https://.../LLVM-21.1.8-Linux-ARM64.tar.zst"],
            "linux-x86_64": ["https://.../LLVM-21.1.8-Linux-X64.tar.zst"],
        },
    )

Results
Our next cold Bazel run showed a promising saving, predominantly from the faster extraction:
| Archive source | Download | Extraction | Download + extraction |
|---|---|---|---|
| GitHub .tar.xz | 2m27s | 2m58s | 5m25s |
| R2 mirror .tar.zst | 1m13s | 44s | 1m57s |
| Saving | 1m14s | 2m14s | 3m28s |
The extraction step is just over 4x faster, or roughly a 75% reduction in extraction time.
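The arithmetic behind those figures can be checked directly from the table. This is a small sketch; the `to_seconds` and `compare` helpers are illustrative names, and the durations are the ones reported in the table.

```python
import re


def to_seconds(duration):
    """Convert a duration such as '2m58s' or '44s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not duration or not match:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


def compare(before, after):
    """Return seconds saved, speedup factor, and percentage reduction."""
    b, a = to_seconds(before), to_seconds(after)
    return {
        "saved_s": b - a,
        "speedup": b / a,
        "reduction_pct": 100 * (b - a) / b,
    }


rows = [
    ("download", "2m27s", "1m13s"),
    ("extraction", "2m58s", "44s"),
    ("total", "5m25s", "1m57s"),
]
for step, before, after in rows:
    r = compare(before, after)
    print(f"{step}: saved {r['saved_s']}s, "
          f"{r['speedup']:.2f}x faster, {r['reduction_pct']:.0f}% less time")
# download: saved 74s, 2.01x faster, 50% less time
# extraction: saved 134s, 4.05x faster, 75% less time
# total: saved 208s, 2.78x faster, 64% less time
```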
LLVM will soon provide .tar.zst releases
LLVM has accepted a change to add zstd archives to GitHub releases: llvm/llvm-project#186526.
In the future, you won’t have to go through this dance yourself as it’ll already be packaged up for you!
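Until those releases are available, the sha256 values in the llvm.toolchain block have to be computed from the repackaged archives yourself. The following is a minimal sketch using Python's standard `hashlib`; the `sha256_of` helper is a hypothetical name, and the file is read in chunks so that a large archive is never held in memory at once.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as archive:
        for chunk in iter(lambda: archive.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `sha256_of("LLVM-21.1.8-Linux-X64.tar.zst")` on the repackaged file yields the string to place under the `linux-x86_64` key, and likewise for the other platforms.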