Cloudflare's Transparent Decompression - how does it work?

I've seen a few topics and questions recently around compressed files being downloaded as plain text or missing Content-Length headers, so I thought it'd be a good idea to write about why this happens and how the behaviour is controlled.

Cloudflare is, at its core, a Content Delivery Network - but in recent years it has added a wealth of developer-focused products, such as Workers for serverless execution or Stream for video delivery.

Reducing bandwidth to your origin is one of the main goals of a CDN, and caching plays a big role - but so does compression. In Cloudflare's case, if your web server supports it, they will always request the compressed content.

If you're already using gzip we will honor your gzip settings as long as you're passing the details in a header from your web server for the files.
[Reference]

How can they default to pulling compressed content if not all clients are capable of decompressing that content? That's where transparent decompression comes into play!

Accept-Encoding

In the event that a client hasn't indicated support for compressed content via the Accept-Encoding header, the body will be decompressed on-the-fly. This is also the case for Cloudflare R2, their competitor to S3 and other object storage solutions.
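Conceptually, the edge-side logic boils down to something like the sketch below. This is a simplified model with a hypothetical maybeDecompress helper - it is not Cloudflare's actual implementation:

```typescript
// Simplified model of transparent decompression at the edge.
// Assumption: illustrative only - not Cloudflare's real code.
async function maybeDecompress(
  request: Request,
  upstream: Response
): Promise<Response> {
  const accepts = request.headers.get("accept-encoding") ?? "";
  const encoding = upstream.headers.get("content-encoding");

  // The client can handle gzip, or the body isn't gzipped: pass through.
  if (encoding !== "gzip" || accepts.includes("gzip")) return upstream;

  // Otherwise decompress on the fly and drop the headers that no
  // longer describe the body (Content-Encoding, Content-Length).
  const headers = new Headers(upstream.headers);
  headers.delete("content-encoding");
  headers.delete("content-length");
  return new Response(
    upstream.body!.pipeThrough(new DecompressionStream("gzip")),
    { status: upstream.status, headers }
  );
}
```

With no Accept-Encoding on the request, the gzipped body comes back as readable text; with gzip in the header, the response passes through untouched.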

Here's a simple Cloudflare Worker that'll return a gzip compressed response, for our testing:

export default <ExportedHandler>{
  async fetch() {
    // fetch a random uuid
    const data = await fetch("https://httpbin.org/uuid");

    // compress the body on the fly (non-null assertion: this
    // fetch always returns a body)
    const compressed = data.body!.pipeThrough(
      new CompressionStream('gzip')
    );

    // return the compressed response
    return new Response(compressed, {
      headers: {
        'content-encoding': 'gzip'
      },
      // tell Cloudflare not to compress, as we already have
      encodeBody: "manual"
    });
  },
};

Despite the Worker returning a compressed response, a plain curl http://localhost:8787 gets us the plain-text body. This is transparent decompression at work.

➜  ~ curl http://localhost:8787
{
  "uuid": "58834aa8-c529-43c1-bc16-ae5b2b97daaa"
}

Let's indicate that we have support for gzip with the Accept-Encoding header and see what we get back.

➜  ~ curl http://localhost:8787 --header 'accept-encoding: gzip' -o test.gz
➜  ~ file test.gz
test.gz: gzip compressed data, from Unix, original size modulo 2^32 53

There's our data - let's make sure it's not just a garbled mess.

➜  ~ gunzip test.gz && cat test
{
  "uuid": "18b70f66-e84d-4070-8287-d114f4da1293"
}
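We can do the same sanity check without shelling out to the file command: gzip data always begins with the fixed magic bytes 0x1f 0x8b. The isGzip helper below is purely illustrative, not a library function:

```typescript
// gzip streams begin with the magic bytes 0x1f 0x8b
// (followed by the compression method byte, 0x08 for DEFLATE).
function isGzip(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}
```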

Transfer-Encoding

Based on what we said earlier, you might think that since we're requesting the data as-is by indicating gzip support, we should get a Content-Length header on the response - but this isn't always true either.

Cloudflare - and especially Cloudflare Workers - may go down the route of using chunked transfer encoding when returning an HTTP/1.1 response.

For version 1.1 of the HTTP protocol, the chunked transfer mechanism is considered to be always acceptable, even if not listed in the TE request header field
[Reference]
Since the Workers platform always knows whether it has the body size in advance, it automatically chooses whether to use Content-Length vs. Transfer-Encoding: chunked, and it will ignore whatever you specified.
[Reference]

In the event that HTTP/2 is used, Content-Length isn't mandated anyway!
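To see why chunked encoding removes the need for Content-Length entirely, here's a toy sketch of the HTTP/1.1 wire framing. The chunkedEncode helper is purely illustrative - each chunk carries its own hex-encoded byte length, so the sender never needs to know the total size up front:

```typescript
// Toy model of Transfer-Encoding: chunked framing. Each chunk is
// prefixed with its size in hex, and a zero-length chunk ends the body.
function chunkedEncode(parts: string[]): string {
  const chunks = parts.map(
    (p) => `${new TextEncoder().encode(p).length.toString(16)}\r\n${p}\r\n`
  );
  return chunks.join("") + "0\r\n\r\n";
}
```

For example, chunkedEncode(["hello", " world"]) frames the body as two sized chunks followed by the zero-length terminator - no up-front length required.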

Conclusion

There are numerous reasons why Content-Length may be missing from a response - be it on-the-fly decompression, chunked encoding, or it simply not being required by the protocol in use.

Hopefully this clears up some of the possibly unexpected behaviour when using Cloudflare.