Media Processing
Background
snug supports modern media formats, full-text search, and more. It’s multitenant, which means it can serve multiple websites from various content directories.
Media Processing Overview
A “revision” is a set of input files: text sources like markdown (.md) and sass, but also media assets like .jxl and .mp4 files.
Media assets are rarely served in their original (input) format: they are usually “derived”. A derivation has inputs (for example, the hash of the input .jxl file, plus the hash of a “pipeline”, which represents encoding settings, width, etc.) and an output.
Given a DerivationKind and an Input, one knows everything that’s needed to produce the output: maybe it involves loading a JXL, resizing it, and re-encoding it as a PNG.
Derivations have output URLs that depend on their hashes.
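To make the hashing idea concrete, here is a minimal sketch of how a derivation hash could combine the input hash and the pipeline hash. Everything in it is an assumption made for illustration: the Pipeline struct and its fields, the derivation_hash function, and the choice of FNV-1a (picked only because it is small and deterministic). The real DerivationKind and Input types in snug are not shown in this document and may look quite different.

```rust
/// FNV-1a (64-bit). Used here only because it is tiny and deterministic;
/// the hash function snug actually uses is not stated in this document.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical encoding settings; the real pipeline likely has more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Pipeline {
    pub width: u32,
    pub format: &'static str,
    pub quality: u8,
}

impl Pipeline {
    /// Hash of the settings alone, independent of any input file.
    pub fn hash(&self) -> u64 {
        fnv1a(format!("{}:{}:{}", self.width, self.format, self.quality).as_bytes())
    }
}

/// Combine the input hash and the pipeline hash into one derivation hash.
/// Changing either the input bytes or any pipeline setting changes the result.
pub fn derivation_hash(input_bytes: &[u8], pipeline: &Pipeline) -> u64 {
    let input_hash = fnv1a(input_bytes);
    let mut buf = Vec::with_capacity(16);
    buf.extend_from_slice(&input_hash.to_le_bytes());
    buf.extend_from_slice(&pipeline.hash().to_le_bytes());
    fnv1a(&buf)
}
```

Because the hash covers both the input and the settings, a URL built from it can be cached indefinitely: any change to the source image or to the encoding settings yields a different hash, and therefore a different URL.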
Image Example
- We have an article at /content/articles/hello/index.md
- It refers to image blah.jxl
- The input is actually at /content/articles/hello/blah.jxl
That .jxl input will generate multiple derivations, one for each width and output format we want. This happens in load_pak in crates/snug/src/site/revision.rs.
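The "one derivation per width and output format" idea amounts to a cross product. The sketch below shows that shape only; it is not the code in load_pak. The OutputFormat and ImageVariant types, the plan_variants function, and the example widths are all hypothetical.

```rust
/// Output formats mentioned in this document; the real list may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    WebP,
    Avif,
    Jxl,
}

/// One planned derivation for a single input image (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ImageVariant {
    pub width: u32,
    pub format: OutputFormat,
}

/// Build one variant per (width, format) pair, widths in the outer loop.
pub fn plan_variants(widths: &[u32], formats: &[OutputFormat]) -> Vec<ImageVariant> {
    widths
        .iter()
        .flat_map(|&width| {
            formats
                .iter()
                .map(move |&format| ImageVariant { width, format })
        })
        .collect()
}
```

With two widths and three formats, plan_variants returns six variants, so a single blah.jxl input would fan out into six derivations under these assumed settings.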
Let’s say that article is served on tenant fasterthanli.me. The URL for the article will be https://fasterthanli.me/articles/hello.
The markup generated for the image will refer to various output formats, like WebP, AVIF and JXL, through URLs that look something like https://cdn.fasterthanli.me/articles/hello/blah~[hash].avif where [hash] is the derivation hash.
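As an illustration of the URL shape above, this sketch assembles an output URL from a CDN base, the article route, the input file name, a derivation hash, and an output extension. The output_url function and its parameters are hypothetical, and rendering the hash as 16 lowercase hex digits is an assumption; the document only says the URL contains the derivation hash.

```rust
/// Build a URL of the form `{cdn_base}/{route}/{stem}~{hash}.{ext}`.
/// The hex encoding of the hash is an assumption made for this sketch.
pub fn output_url(cdn_base: &str, route: &str, input_name: &str, hash: u64, ext: &str) -> String {
    // Strip the input extension: "blah.jxl" -> "blah".
    // A name without a dot is used as-is.
    let stem = input_name
        .rsplit_once('.')
        .map_or(input_name, |(stem, _)| stem);

    format!(
        "{}/{}/{}~{:016x}.{}",
        cdn_base.trim_end_matches('/'),
        route.trim_matches('/'),
        stem,
        hash,
        ext
    )
}
```

For example, calling output_url with "https://cdn.fasterthanli.me", "/articles/hello", "blah.jxl", the hash 0xabc, and "avif" yields https://cdn.fasterthanli.me/articles/hello/blah~0000000000000abc.avif.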
Derivation inputs are stored in object storage, and so are derivation outputs.
How are Derivations Served?
CDN edge nodes run the crates/snug app, but they never run derivations by themselves. It is costly to transcode images, and CDN edge nodes are usually small virtual machines. They have a copy of the revision pak in memory, so they know which inputs are there, which revision routes exist, etc., but they never do any transcoding.
Instead, they communicate with the mom service (crates/mod-mom) over HTTP.