Progressive S3 to Cloudflare R2 migration using Workers
R2, Cloudflare's competitor to S3 and other object stores, is now in open beta! In my opinion, there are three things that make R2 stand out from the rest:
1. There are no egress fees. Yep, absolutely none - you are only charged for storage and operations.
2. Global distribution as standard with region control on the roadmap.
3. There's a generous free tier which makes it great for small projects.
| | Free | Paid rates |
|---|---|---|
| Storage | 10 GB / month | $0.015 / GB-month |
| Class A Operations | 1,000,000 requests / month | $4.50 / million requests |
| Class B Operations | 10,000,000 requests / month | $0.36 / million requests |
But just because R2 has free egress doesn't mean that your current object store does - and that throws a spanner into any migration plans. Ideally, you'd transfer assets as and when they're requested so you can handle the migration over time & not fuss over files that aren't accessed frequently.
Progressive migration
The R2 team is already planning automatic migration from S3, as discussed in the announcement blog post:
> To make this easy for you, without requiring you to change any of your tooling, Cloudflare R2 will include automatic migration from other S3-compatible cloud storage services. Migrations are designed to be dead simple. After specifying an existing storage bucket, R2 will serve requests for objects from the existing bucket, egressing the object only once before copying and serving from R2. Our easy-to-use migrator will reduce egress costs from the second you turn it on in the Cloudflare dashboard.
The only issue? That isn't available yet. R2 has only just reached the open beta phase - so that's to be expected - but that doesn't stop us from implementing it on our own.
Using Cloudflare Workers
Workers have a great integration with R2, much like the existing KV and Durable Object bindings, so it's the platform of choice for this.
We're going to assume that you already have a Cloudflare account and have purchased the R2 plan. There's a free tier; it just wants a payment method on file to make sure you're not a bot.
If not, follow the Get Started guide in Cloudflare's documentation. You'll want to follow it up until Step 5, where you'd add code to your Worker - pause there, because we're going to add our own.
Setting up the wrangler.toml
My `wrangler.toml` (the configuration file for your Worker) looks like this:
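Here's a minimal sketch of what that might look like - the Worker name, binding name, bucket names and region variable are placeholder assumptions rather than values from the original repo:

```toml
name = "s3-to-r2-migration"
main = "src/index.ts"
compatibility_date = "2022-05-10"

# Plain-text configuration, exposed on the Worker's `env` parameter.
[vars]
AWS_S3_BUCKET = "bucket.storage.googleapis.com" # hypothetical: the full bucket hostname
AWS_S3_BUCKET_SCHEME = "https"                  # see the note below on HTTP vs HTTPS
AWS_DEFAULT_REGION = "us-east-1"                # hypothetical region used for signing

# Binds the R2 bucket to `env.R2_BUCKET` inside the Worker.
[[r2_buckets]]
binding = "R2_BUCKET"
bucket_name = "migrated-assets"
```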
`AWS_S3_BUCKET_SCHEME` is configurable because some providers (like Google Cloud Storage) allow dots in bucket names, which causes certificate issues since wildcard certificates only cover a single subdomain level. Using HTTP instead of HTTPS mitigates this by bypassing SSL entirely - but try HTTPS if possible.
In addition to the variables under the `[vars]` block, we're going to need to add our `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. Because we don't want those in plain text, we're going to store them as secrets.
To do this, use `wrangler secret put` to add both `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
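Both commands prompt you for the value on stdin, so nothing ends up in your shell history:

```sh
wrangler secret put AWS_ACCESS_KEY_ID
wrangler secret put AWS_SECRET_ACCESS_KEY
```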
The Worker itself
Let's have a look through the Worker's code and break it down into segments to explain what each part does.
We're going to use TypeScript and the Module Worker format - we export a `fetch` handler that's called on each request.
The `env` parameter that contains our bindings - the environment variables, secrets and the R2 bucket - needs to be typed. We also need to add the aws4fetch library, which we'll use to create the signed URLs that we `fetch()` to grab objects from S3.
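That might look like the following - the binding names mirror the `wrangler.toml` sketch above, so treat them as assumptions rather than the original repo's names:

```typescript
import { AwsClient } from "aws4fetch";

// Bindings declared in wrangler.toml; these names are this sketch's
// assumptions, not necessarily those used in the original repo.
export interface Env {
  R2_BUCKET: R2Bucket;
  AWS_S3_BUCKET: string;
  AWS_S3_BUCKET_SCHEME: string;
  AWS_DEFAULT_REGION: string;
  AWS_ACCESS_KEY_ID: string;
  AWS_SECRET_ACCESS_KEY: string;
}
```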
As it stands, we're not offering directory listing, and we're only interested in `GET` requests - this isn't an endpoint for uploading or deleting files - so we'll turn those requests away before continuing.
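A sketch of the handler's shell, with everything that follows slotting in where the comment sits:

```typescript
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    // This isn't an upload or delete endpoint, so turn away anything but GET.
    if (request.method !== "GET") {
      return new Response("Method not allowed", { status: 405 });
    }

    // ...the R2 lookup and S3 fallback from the rest of the post go here.
  },
};
```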
Calling `.get(key)` on an R2 bucket returns `null` if the object doesn't exist, so that's how we check whether the object is already in our R2 bucket. If it isn't, we want to fetch it from S3.
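Continuing inside the handler - how the URL path maps to an object key isn't spelled out above, so treat the key derivation here as this sketch's assumption:

```typescript
// Map a request for /asset.mp4 to the object key "asset.mp4".
const url = new URL(request.url);
const key = url.pathname.slice(1); // strip the leading "/"

// get() resolves to null when the key isn't in the bucket.
const object = await env.R2_BUCKET.get(key);

if (object === null) {
  // Miss: fall through to the S3 fetch in the next snippets.
}
// Hit: serve straight from R2, shown at the end of the post.
```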
We'll first create a new `AwsClient` from the aws4fetch package. Then we need to rewrite the request URL to point to our S3 bucket.
If the original request was for `https://cdn.example.com/asset.mp4` and we're using Google Cloud Storage, the rewritten URL will look a little something like `https://bucket.storage.googleapis.com/asset.mp4`.
We'll then sign that URL with AWS Signature v4 and make the request.
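Still inside the miss branch, a sketch of those steps - assuming `AWS_S3_BUCKET` holds the full bucket hostname and the signing region comes from `[vars]`:

```typescript
// A client that signs requests with AWS Signature v4.
const client = new AwsClient({
  accessKeyId: env.AWS_ACCESS_KEY_ID,
  secretAccessKey: env.AWS_SECRET_ACCESS_KEY,
  region: env.AWS_DEFAULT_REGION,
  service: "s3",
});

// Rewrite https://cdn.example.com/asset.mp4 into
// https://bucket.storage.googleapis.com/asset.mp4.
const s3Url = new URL(request.url);
s3Url.protocol = `${env.AWS_S3_BUCKET_SCHEME}:`;
s3Url.hostname = env.AWS_S3_BUCKET;

// Sign the rewritten URL and fetch the object from S3.
const signedRequest = await client.sign(s3Url.toString(), { method: "GET" });
const s3Response = await fetch(signedRequest);
```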
If we get a `404` back, we'll display a simple 404 page with a short message rather than returning the XML response you'd usually get.
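For instance:

```typescript
// S3 answers misses with an XML error document - swap in a plain 404 instead.
if (s3Response.status === 404) {
  return new Response("Object not found", { status: 404 });
}
```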
The body returned by `fetch()` is a `ReadableStream` and can only be consumed once - which is problematic, since we need to push it to R2 *and* return the asset to the user. `tee()` gives us an array containing two `ReadableStream` objects, so we pass one to R2 and stream the other back to the user in a response.
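In code, that split might look like:

```typescript
// The body can only be consumed once, and there are two consumers here:
// tee() splits it into one stream for R2 and one for the client.
const [r2Stream, clientStream] = s3Response.body!.tee(); // non-null on a successful GET
```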
We pass the original headers from the S3 response into the `httpMetadata` property of the `R2PutOptions` so that any original `Content-Type`, `Content-Language`, etc. are preserved.
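Putting those together - using `ctx.waitUntil` so the copy into R2 doesn't hold up the response is this sketch's choice, not necessarily what the original repo does:

```typescript
// httpMetadata accepts the Headers object directly, so Content-Type,
// Content-Language and friends are stored alongside the object in R2.
ctx.waitUntil(
  env.R2_BUCKET.put(key, r2Stream, {
    httpMetadata: s3Response.headers,
  })
);

// Stream the other copy straight back, reusing the S3 status and headers.
return new Response(clientStream, s3Response);
```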
Before returning the R2 object, we'll add the headers from its `httpMetadata` using the `writeHttpMetadata` method.
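Back in the hit path from earlier, that looks something like this (setting an `etag` header is an extra this sketch adds):

```typescript
// Hit: rebuild the response headers from the metadata stored with the object.
const headers = new Headers();
object.writeHttpMetadata(headers); // restores Content-Type, Content-Language, ...
headers.set("etag", object.httpEtag); // extra: expose R2's etag for caching

return new Response(object.body, { headers });
```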
Give it a try!
The code for this Worker will be available at https://github.com/KianNH/cloudflare-worker-s3-to-r2-migration
I've tested it with files upwards of 350 MB, but if you spot any issues or bugs, please open an issue or pull request!
You can also leverage the Workers Cache API to cache the assets served from R2, saving on R2 operations - that'll be added to this Worker in the future.