Inside Base64: Why Encoding Makes Files Bigger
The Hidden Tax on Every Email Attachment
Sometime in the late 1980s, engineers building the early internet ran into a problem that sounds almost quaint today: they needed to send binary files — images, executables, audio clips — across networks designed exclusively for plain text. The solution they devised was elegant, widely adopted, and quietly annoying. It made every file it touched roughly a third larger.
That solution was Base64 encoding, and it is still everywhere. It shows up in email attachments, JSON APIs, CSS stylesheets, JWTs, TLS certificates, and a dozen other places you interact with every day without realizing it. Understanding why it inflates file sizes — and when that inflation is worth tolerating — turns out to be a surprisingly useful piece of knowledge for anyone who moves data around for a living.
Binary Data in a Text World
Computers store everything as binary: sequences of ones and zeros organized into bytes, where each byte holds eight bits and can represent 256 different values (0 through 255). In ASCII, only 95 of those values map to printable characters: letters, digits, punctuation, and the space. Another 33 are control characters, signals that tell terminals to clear the screen, beep, end a transmission, or interpret what follows as a command rather than content. The remaining 128 values, everything above 127, fall outside ASCII altogether.
Early email protocols, rooted in SMTP specifications written when the internet was still called ARPANET, transmitted data as 7-bit ASCII text. Send a byte with a value above 127, or worse, a control character, and you risked corrupting the entire message. The protocol would see a stray character and lose its mind trying to interpret it.
Base64 solves this by translating binary data into a character set so boring that text-based protocols have no reason to misread it: the 64 characters A through Z, a through z, 0 through 9, plus + and /. (Some variants use - and _ instead of those last two, for URLs.) Every byte you receive from a Base64 stream is guaranteed to be one of those characters or the = padding character. Nothing ambiguous. Nothing dangerous.
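To make the two alphabets concrete, here is a minimal sketch using Python's standard base64 module. The two input bytes are chosen purely for illustration, because they happen to exercise the last two slots of the alphabet.

```python
import base64

# Two bytes chosen so that the encoded output uses the last two alphabet slots.
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw)          # standard alphabet: uses + and /
url_safe = base64.urlsafe_b64encode(raw)  # URL-safe alphabet: uses - and _

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='

# Both forms decode back to the same bytes.
assert base64.b64decode(standard) == raw
assert base64.urlsafe_b64decode(url_safe) == raw
```

Only the two symbol characters differ between the variants; the letters and digits are shared.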
The Math Behind the 33% Overhead
Here is where the size penalty comes from, and it is just arithmetic once you see it.
Base64 takes three bytes of input — 24 bits total — and converts them into four characters of output. Each output character represents six bits (because 2^6 = 64, which is exactly how many options are in the character set). Four characters times six bits each equals 24 bits, which matches the 24 bits we started with. The information content is preserved exactly; only the representation changes.
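The regrouping of 24 bits into four 6-bit values can be sketched by hand. The helper name encode_group below is hypothetical, and the sketch deliberately handles only full three-byte groups; the standard library call is used just to check the result.

```python
import base64
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (no padding logic)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Pack the three bytes into one 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Slice that integer into four 6-bit values, most significant first.
    indexes = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)

print(encode_group(b"Man"))  # TWFu
assert encode_group(b"Man") == base64.b64encode(b"Man").decode("ascii")
```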
But those four characters each take up one byte in the output stream. So three input bytes become four output bytes. That ratio, 4 divided by 3, works out to about 1.333, meaning every file encoded in Base64 grows by one third of its original size, rounded up to a whole four-character group (the = characters you see at the end of Base64 strings are padding, used when the input length isn't a multiple of three bytes).
A 1 MB image becomes roughly 1.33 MB when Base64-encoded. A 10 MB PDF attachment becomes roughly 13.3 MB. (In email the figure runs closer to 37%, because MIME wraps encoded output at 76 characters per line and each line break adds two more bytes.) Across millions of emails or API calls per day, that overhead accumulates into something worth thinking carefully about.
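The size arithmetic is easy to check directly. This is a small sketch, assuming padded output with no line breaks; encoded_length is a hypothetical helper name, and the input is just zero-filled bytes of each size.

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output (no line breaks) for n_bytes of input."""
    # Every group of up to three input bytes becomes four output characters.
    return 4 * ((n_bytes + 2) // 3)

for size in (1, 2, 3, 1_000, 1_000_000):
    actual = len(base64.b64encode(bytes(size)))
    assert actual == encoded_length(size)
    print(size, actual, round(actual / size, 3))
```

For tiny inputs the ratio is much worse than 4:3 because of padding; it settles toward 1.333 as the input grows.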
Where the Tradeoff Is Worth It
Despite the cost, Base64 remains ubiquitous because the alternative — transmitting raw binary over text-based channels — is simply not viable in many contexts. The question isn't whether you like the overhead; it's whether the channel you're using demands the encoding.
Email attachments are the canonical use case. MIME, the standard that lets email carry attachments, specifies Base64 as one of its content transfer encodings. Your email client Base64-encodes attachments before handing them to SMTP, and decodes them on the receiving end. Users never see the encoded blob; they just see the attachment open cleanly. The 33% penalty is the cost of interoperability across every email server on earth, and almost nobody argues it isn't worth paying.
Embedding data directly in HTML and CSS is another case where Base64 earns its keep. A small icon embedded as a Base64-encoded data URI inside a stylesheet eliminates an HTTP request entirely. On sites serving hundreds of thousands of pages per day, removing even a handful of small requests per page load can meaningfully improve performance. The image gets bigger, but the total number of round trips shrinks. Web performance engineers weigh this tradeoff constantly. Generally, the rule of thumb is that images under about 10 KB benefit from inlining, while larger images are better served as separate files.
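A data URI is just a MIME type, the marker ;base64, and the encoded payload. The sketch below shows one way to build one; to_data_uri is a hypothetical helper, and the "icon" is only the 8-byte PNG signature standing in for a real image file.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a Base64 data URI for embedding in HTML or CSS."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"

# Hypothetical stand-in for a small icon file's contents:
# just the 8-byte PNG signature, not a complete image.
icon_bytes = b"\x89PNG\r\n\x1a\n"

uri = to_data_uri(icon_bytes, "image/png")
print(uri)
# The same string can be dropped into a stylesheet rule (class name is illustrative).
print(f".icon {{ background-image: url({uri}); }}")
```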
JSON APIs routinely use Base64 when they need to include binary payloads. JSON is a text format; it cannot represent raw binary directly. If you want to send an image thumbnail inside a JSON response, you encode it as a Base64 string and include that string as a value. The overhead is annoying, and large-scale APIs often work around it by sending a URL instead of the image data itself, but for small blobs embedded inline, Base64 is the pragmatic choice.
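As a rough sketch of that round trip, the snippet below embeds bytes in a JSON document and recovers them on the other side. The field names and the payload are illustrative assumptions, not taken from any real API.

```python
import base64
import json

# Hypothetical thumbnail bytes; any binary payload works the same way.
thumbnail = bytes(range(256))

# Sender: encode the bytes as ASCII text so they can live in a JSON string.
# The field names here are illustrative, not part of any real API.
body = json.dumps({
    "filename": "thumb.png",
    "data": base64.b64encode(thumbnail).decode("ascii"),
})

# Receiver: parse the JSON, then decode the string back to bytes.
received = json.loads(body)
restored = base64.b64decode(received["data"])

assert restored == thumbnail
print(len(thumbnail), len(received["data"]))  # 256 344
```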
JWTs (JSON Web Tokens) use a URL-safe variant called Base64url for all three of their sections: header, payload, and signature. Since JWTs are meant to be passed in HTTP headers and URL parameters, which have their own restricted character sets, the encoding serves a dual purpose: it makes arbitrary bytes URL-safe and keeps the token as a single unbroken string.
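JWT segments also drop the trailing = padding, so a decoder has to put it back. Here is a minimal sketch of encoding and decoding one segment; the helper names are hypothetical, and this covers only the encoding step, not signing or verification.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    """Base64url-encode and drop the trailing '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

header = {"alg": "HS256", "typ": "JWT"}
# Compact separators give the conventional whitespace-free JSON form.
segment = b64url_encode(json.dumps(header, separators=(",", ":")).encode("utf-8"))

print(segment)  # eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
assert json.loads(b64url_decode(segment)) == header
```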
Where the Tradeoff Is Not Worth It
The contexts where Base64 earns justified criticism are those where people reach for it out of habit rather than necessity.
- Large file uploads to object storage: If you're accepting image uploads and then Base64-encoding them before storing in S3 or a database, you're paying a 33% storage premium indefinitely. Binary transfer to object storage is well-supported; there's no reason to encode. Some developers do this accidentally because they've seen Base64 in APIs and assumed it was required.
- Internal microservice communication: Services talking to each other over an internal network, using protocols like gRPC that handle binary natively, gain nothing from Base64 encoding their payloads. It just adds CPU cycles for encoding and decoding, plus the bandwidth overhead.
- Database storage of file contents: Storing Base64-encoded content in a text column instead of using a proper binary column (BLOB, BYTEA, etc.) wastes space and makes queries slower. It's a pattern that surfaces often in codebases where the developer wasn't sure how to handle binary data and chose what looked familiar.
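The storage difference in the last case is easy to see in a quick experiment. The sketch below uses an in-memory SQLite database as a stand-in for whatever database you actually run; the table and column names are hypothetical, and the payload is random bytes rather than a real file.

```python
import base64
import os
import sqlite3

payload = os.urandom(30_000)  # stand-in for an uploaded file's bytes

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, raw BLOB, encoded TEXT)")
conn.execute(
    "INSERT INTO files (raw, encoded) VALUES (?, ?)",
    (payload, base64.b64encode(payload).decode("ascii")),
)

# length() counts bytes for a BLOB and characters for TEXT.
raw_len, encoded_len = conn.execute(
    "SELECT length(raw), length(encoded) FROM files"
).fetchone()
stored = conn.execute("SELECT raw FROM files").fetchone()[0]
conn.close()

assert stored == payload  # the BLOB column round-trips bytes unchanged
print(raw_len, encoded_len)  # 30000 40000
```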
The CPU Cost People Forget to Mention
File size isn't the only cost. Base64 encoding and decoding consume CPU time, and at scale that becomes relevant. The operations are fast — modern processors can Base64-encode several hundred megabytes per second — but "fast" and "free" aren't the same thing. A service encoding millions of small images per hour is spending real compute budget on a transformation that might be avoidable.
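If you want a number for your own hardware rather than a general claim, a rough measurement takes a few lines. This is only a sketch: encode_throughput_mb_per_s is a hypothetical helper, the buffer size and repeat count are arbitrary, and the result varies widely with the machine and the Base64 implementation in use.

```python
import base64
import os
import time

def encode_throughput_mb_per_s(size_bytes: int = 8_000_000, repeats: int = 5) -> float:
    """Best-of-N encoding throughput, in megabytes of input per second."""
    data = os.urandom(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        base64.b64encode(data)
        best = min(best, time.perf_counter() - start)
    return size_bytes / best / 1_000_000

print(f"{encode_throughput_mb_per_s():.0f} MB/s")
```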
Browsers, in particular, pay this cost silently. When a CSS file contains dozens of Base64-encoded background images, the browser decodes them all at parse time. On a low-end device or an older mobile phone, that adds measurable time to the page rendering pipeline. Performance-conscious frontend teams audit their stylesheets for bloated data URIs the same way they audit JavaScript bundle sizes.
Alternatives That Have Emerged
The internet has developed several ways to reduce reliance on Base64 where it isn't necessary. HTTP/2 and HTTP/3 multiplexing over a single connection reduces the penalty for multiple small file requests, weakening the argument for inlining images as data URIs. Multipart form data lets HTTP clients upload binary files without encoding them. Protocol Buffers and other binary serialization formats sidestep the JSON-requires-text problem for internal APIs.
None of these render Base64 obsolete — they just narrow the set of situations where it's the best tool. For the problems it was designed to solve, the encoding remains as relevant now as it was in 1987.
Practical Takeaways
- Expect about 33% overhead whenever you Base64-encode a file, plus at most two characters of padding. The ratio is fixed at four output bytes for every three input bytes; there's no way to reduce it without changing the encoding itself.
- Check whether the channel actually needs it. If the transport layer supports binary natively (gRPC, binary WebSocket, direct file upload), encoding is waste.
- For embedded web assets, size matters. Below 10 KB, the eliminated HTTP request usually wins. Above that, the bandwidth cost of the larger asset tends to outweigh the benefit.
- Use the right database column type. Binary data belongs in binary columns. Base64 in a text field is almost never the right answer.
Base64 is one of those technologies that feels like a kludge until you understand the constraint it was solving. It is a kludge — an intentional, carefully designed one that trades storage and bandwidth for compatibility across the full range of text-only protocols that the internet was built on. The overhead is real, and it's worth tracking. But for the problems it was built to solve, there's still nothing cleaner.