Standing up a Bazel mirror with Cloudflare R2


Whether directly or indirectly via other rules, the http_archive rule is used a lot and often pulls from GitHub.com, which means there is a high chance you’ll see GitHub returning 429s:

java.io.IOException: Error downloading [https://github.com/bats-core/bats-core/archive/v1.10.0.tar.gz] to ...: GET returned 429 Too Many Requests

The http_archive rule takes a list of URLs and tries them one by one:

A list of URLs to a file that will be made available to Bazel. Each entry must be a file, http or https URL. Redirections are followed. Authentication is not supported. URLs are tried in order until one succeeds, so you should list local mirrors first. If all downloads fail, the rule will fail.
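The "tried in order until one succeeds" behaviour can be modelled in a few lines. This is only an illustrative sketch, not Bazel's implementation: `firstSuccessful` and the `attempt` callback are hypothetical names, and the mirror host is the placeholder used elsewhere in this post.

```typescript
// Hypothetical model of "URLs are tried in order until one succeeds".
// `attempt` stands in for a real download and returns undefined on failure.
function firstSuccessful<T>(
  urls: string[],
  attempt: (url: string) => T | undefined,
): { url: string; value: T } {
  for (const url of urls) {
    const value = attempt(url);
    if (value !== undefined) {
      return { url, value };
    }
  }
  // Mirrors "If all downloads fail, the rule will fail."
  throw new Error(`All ${urls.length} downloads failed`);
}

const candidateList = [
  "https://bazel-mirror.example.com/github.com/bats-core/bats-core/archive/v1.10.0.tar.gz",
  "https://github.com/bats-core/bats-core/archive/v1.10.0.tar.gz",
];

// Pretend the mirror is unreachable so the second URL is used.
const picked = firstSuccessful(candidateList, (url) =>
  url.startsWith("https://github.com/") ? "archive bytes" : undefined,
);
console.log(picked.url);
```

Listing the mirror first means the upstream is only contacted when the mirror fails.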

This is great for our direct usages, but we only have two of them; every other usage of this rule is downstream, hidden inside other rules.

A single run of our merge request pipelines resulted in ~63 objects being cached in R2, which is already slightly over GitHub’s limit of 60 requests per hour per IP for unauthenticated requests.

Whilst you can authenticate with either a personal access token or an app token to raise this limit, that does not solve our reliance on GitHub as the sole source of these dependencies.

Luckily, Bazel has a way for you to add mirrors to all of these without resorting to patches, the --downloader_config flag:

Specify a file to configure the remote downloader with. This file consists of lines, each of which starts with a directive (allow, block or rewrite) followed by either a host name (for allow and block) or two patterns, one to match against, and one to use as a substitute URL, with back-references starting from $1. It is possible for multiple rewrite directives for the same URL to be given, and in this case multiple URLs will be returned.

In our configuration, we can do something like this:

rewrite (github.com)/(.*) bazel-mirror.example.com/$1/$2
rewrite (github.com)/(.*) $1/$2
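To see why two rewrite lines for the same pattern yield a mirror URL followed by the original, here is a simplified model of the directive. It is a sketch under assumptions, not Bazel's code: it assumes patterns are matched against the URL with its scheme stripped, that https:// is re-added to the result, and that unmatched URLs pass through unchanged. `parseRewrites` and `candidateUrls` are hypothetical names.

```typescript
// Simplified, hypothetical model of `rewrite` directives; not Bazel's code.
type RewriteRule = { pattern: RegExp; replacement: string };

// Parse lines such as "rewrite (github.com)/(.*) bazel-mirror.example.com/$1/$2".
function parseRewrites(config: string): RewriteRule[] {
  const rules: RewriteRule[] = [];
  for (const line of config.split("\n")) {
    const [directive, pattern, replacement] = line.trim().split(/\s+/);
    if (directive === "rewrite" && pattern && replacement) {
      // Anchor the pattern so it must match the whole host-and-path string.
      rules.push({ pattern: new RegExp(`^${pattern}$`), replacement });
    }
  }
  return rules;
}

// Return one candidate URL per matching rule, in the order the rules appear.
function candidateUrls(url: string, rules: RewriteRule[]): string[] {
  // Assumption: matching happens on the URL without its scheme.
  const withoutScheme = url.replace(/^https?:\/\//, "");
  const rewritten = rules
    .filter((rule) => rule.pattern.test(withoutScheme))
    .map((rule) => "https://" + withoutScheme.replace(rule.pattern, rule.replacement));
  // Assumption: a URL that no rule matches is left untouched.
  return rewritten.length > 0 ? rewritten : [url];
}

const rules = parseRewrites(`
rewrite (github.com)/(.*) bazel-mirror.example.com/$1/$2
rewrite (github.com)/(.*) $1/$2
`);

console.log(
  candidateUrls("https://github.com/bats-core/bats-core/archive/v1.10.0.tar.gz", rules),
);
```

Under these assumptions the first candidate points at the mirror and the second is the original GitHub URL, so GitHub remains the fallback.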

If we break that first host (s/mirror/miror), we can see that Bazel moves on to trying GitHub directly:

$ bazelisk build --downloader_config=tools/bazel_downloader.cfg //foo
(20:12:44) INFO: Invocation ID: 8d597222-52d2-4ac7-a88e-fd9c01f5220f
(20:12:44) INFO: Current date is 2026-02-09
(20:12:44) WARNING: Download from https://bazel-miror.example.com/.../ failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException Unknown host: bazel-miror.example.com
(20:12:55) Analyzing: target //foo (0 packages loaded, 0 targets configured)
Fetching repository @@toolchains_llvm++llvm+llvm_toolchain_llvm; starting 10s
Fetching https://github.com/.../download/llvmorg-17.0.6/clang%2Bllvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04.tar.xz; 354.2 MiB (37.2%) 10s

The Worker is extremely simple, using R2 as a pull-through cache:

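interface Env {
  BUCKET: R2Bucket;
}

export default {
  async fetch(request, env, ctx): Promise<Response> {
    // Map /github.com/org/repo/... on the mirror to https://github.com/org/repo/...
    const pathname = new URL(request.url).pathname.slice(1);
    const target = new URL(`https://${pathname}`).href;

    // Cache hit: serve the object straight from R2.
    const hit = await env.BUCKET.get(target);
    if (hit) {
      const headers = new Headers();
      hit.writeHttpMetadata(headers);
      return new Response(hit.body, {
        headers,
      });
    }

    // Cache miss: fetch from the upstream. Spreading `request` copies none of
    // its properties (they are prototype getters), so pass only what is needed.
    const response = await fetch(target, {
      redirect: "follow",
    });
    // Pass failures (and bodiless responses) through without caching them.
    if (!response.ok || !response.body) {
      return response;
    }

    // Split the body: one branch is stored in R2, the other goes to the client.
    const [one, two] = response.body.tee();
    ctx.waitUntil(
      env.BUCKET.put(target, one, {
        httpMetadata: response.headers,
      }),
    );
    return new Response(two, response);
  },
} satisfies ExportedHandler<Env>;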
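The mapping from an incoming mirror URL to the upstream URL, which also serves as the R2 key, can be looked at in isolation. This is a small sketch of the same two lines from the Worker; `upstreamUrl` is a hypothetical helper name, and the mirror host is the placeholder from the configuration above.

```typescript
// Sketch of the Worker's path-to-upstream mapping, extracted for illustration.
function upstreamUrl(requestUrl: string): string {
  // "/github.com/org/repo/archive/v1.tar.gz" -> "github.com/org/repo/archive/v1.tar.gz"
  const pathname = new URL(requestUrl).pathname.slice(1);
  // Prefix the scheme and let URL normalise the result.
  return new URL(`https://${pathname}`).href;
}

console.log(
  upstreamUrl(
    "https://bazel-mirror.example.com/github.com/bats-core/bats-core/archive/v1.10.0.tar.gz",
  ),
);
```

Because only the pathname is used, any query string on the incoming request is dropped. That is fine for archive URLs like the ones above, but would matter for upstreams that rely on query parameters.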

On a miss, the Worker fetches the file and streams the response into R2 and to the client (the eyeball) simultaneously:

[Diagram: Bazel Mirror Pull-Through Cache]

Otherwise the file is returned directly from R2:

[Diagram: Bazel Mirror Cache]
