This Week in Changelogs: curl
Hey everyone, long time no see!
I started TWiC in 2023, and to be honest, mining diffs manually was exhausting; that's why the project faded away pretty quickly. Today, with a little bit of LLM help and automation, it has become much easier to find hidden gems in modern OSS and bring them to the audience. There's another problem though: sometimes there are too many gems, and I definitely don't want to restart a series of boring longreads.
So today, we're gonna cover recent changes in just one project, (arguably) the most popular library and command-line tool in the world: curl.
Zip bomb protection via delivered-bytes tracking
A zip bomb is a relatively small compressed piece of data that is automatically decompressed on the client side into a giant blob, causing a DoS. One of the ways to protect against it is to set a limit on the incoming data size. Curl has had an option for it, CURLOPT_MAXFILESIZE, since 2003.
There was one small problem: before 8.20.0, the limit applied only to the number of downloaded (compressed) bytes. The good news is that curl now tracks delivered (decompressed) bytes separately, so CURLOPT_MAXFILESIZE doubles as zip bomb protection:
/* check that the 'delta' amount of bytes are okay to deliver to the
   application, or return error if not. */
CURLcode Curl_pgrs_deliver_check(struct Curl_easy *data, size_t delta)
{
  if(data->set.max_filesize &&
     ((curl_off_t)delta > data->set.max_filesize - data->progress.deliver)) {
    failf(data, "Would have exceeded max file size");
    return CURLE_FILESIZE_EXCEEDED;
  }
  return CURLE_OK;
}
Btw, check out the pull request itself: five AI bots and zero people besides the author (Daniel himself) were involved in the code review! And it actually helps a lot, because, well, check out the next one:
API bitmask copy-paste bug

CURLMNWC_CLEAR_DNS and CURLMNWC_CLEAR_CONNS were both defined as (1L << 0) (that is, just 1), so you could never clear only DNS or only connections: it was always both.
The bug was introduced in mid-2025 (PR, diff), and despite 3 people reviewing the code and 2 approvals, it shipped in versions 8.16.0 through 8.19.0.
Luckily, Codex Security found it, and the author of the original changeset came up with an elegant solution, introducing CURLMNWC_CLEAR_ALL as the new (1L << 0) (since that's what every existing caller was unknowingly doing), and reassigning DNS and CONNS to bits 1 and 2:
if(val & CURLMNWC_CLEAR_ALL)
  /* In the beginning, all values available to set were 1 by mistake. We
     converted this to mean "all", thus setting all the bits
     automatically */
  val = CURLMNWC_CLEAR_DNS | CURLMNWC_CLEAR_CONNS;
So yeah, if you can afford complex static analyzers or modern AI reviewers, you'd better use them, because even seasoned lead devs miss small issues like this one.
Missing return
Long story short, there was a helper called return_quote_error(). Its misleading name made it look like the helper itself took care of returning, so the actual return statement before the call was forgotten. Execution fell through past the error, causing segfaults.
This is a classic case of a fix so small that you're 100% sure it's right, so you skip the review and merge broken code.
I personally like that, apart from fixing the problem, they dropped the return_ prefix from the function name to avoid future confusion. Of course, AI checkers can catch such problems too, but here's another story:
PowerPC64 endianness AI-slop
A PR from a user named AutoJanitor, with a robot avatar and shitloads of contributions since the end of January 2026, added __powerpc64__ to the list of architectures allowed to use unaligned memory access in the MD5/MD4 fast paths, with the following description:
Currently the fast path only covers __i386__, __x86_64__, and __vax__. PowerPC64 (both LE and BE) (sic! - @dima) supports efficient unaligned memory access and should use the same optimization.<...>
Verified on IBM POWER8 S824 (ppc64le) — unaligned 32-bit loads work correctly and produce single lwz instructions.
To understand what it means, let's check out the code itself:
/*
 * SET reads 4 input bytes in little-endian byte order and stores them
 * in a properly aligned word in host byte order.
 *
 * The check for little-endian architectures that tolerate unaligned memory
 * accesses is an optimization. Nothing will break if it does not work.
 */
#if defined(__i386__) || defined(__x86_64__) || \
    defined(__vax__) || defined(__powerpc64__)
/*                   NB ^^^^^^^^^^^^^^^^^^^^^^ */
#define MD4_SET(n) (*(const uint32_t *)(const void *)&ptr[(n) * 4])
/* ... */
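So this is an optimization for little-endian architectures only, and, as pointed out later, it also causes undefined behavior (reading a byte array through a cast uint32_t pointer breaks C's alignment and aliasing rules).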
Anyway, the contribution looked rock-solid and got merged. A few days later, a real person (the guy has a website with /~sam/ in the URL, so he knows his shit) showed up in the original PR, a discussion followed, and the commit was reverted.
The thing is that while PowerPC can run in either endianness, the preprocessor check __powerpc64__ doesn't tell you which one. The optimization above works only on little-endian hosts (it casts the byte array directly to uint32_t *, reading the bytes in host order, possibly from an unaligned address), so on big-endian PowerPC64 it produced wrong results or just failed.
Unfortunately, AI bots have once again ruined things slightly for Daniel Stenberg. Apparently, that's the fate of popular projects these days ¯\_(ツ)_/¯
Thank you, folks! I hope you enjoyed this "episode" of This Week in Changelogs! Next time, fewer AI slop problems and more things to actually learn from.
The Pitfalls of Reading User Input in C: a Story About scanf and Stdin
I recently had to write a piece of C code that takes some input from stdin, ignores the newline, discards whatever exceeds the buffer, and does this repeatedly in a loop.
Knowing something about scanf syntax (man), I came up with this:
#include <stdio.h>

void take_input(void)
{
    char buf[20];
    printf("> ");
    scanf("%19[^\n]", buf);
    printf("input=`%s`\n", buf);
}

int main(void)
{
    for (int i = 0; i < 5; i++) {
        take_input();
    }
    return 0;
}
(Note: The original code used an infinite loop, but a simple for is enough to demonstrate the behavior.)
When I ran it, the result was surprising:
$ gcc -o main main.c
$ ./main
> hello world↵
input=`hello world`
> input=`hello world`
> input=`hello world`
> input=`hello world`
> input=`hello world`
$ █
It consumed the string once and printed the same value 5 times. Why?
Building a Toy Database: Concurrency is Hard
Building a Toy Database: Learning by Doing
Ever wondered how databases work under the hood? I decided to find out by building one from scratch. Meet Bazoola: a simple file-based database written in pure Python.
Why Build a Toy Database?
As a developer, I use relational databases every day, but I never truly understood what happens when I INSERT or SELECT. Building a database from scratch taught me more about data structures, file I/O, and system design than any tutorial ever could.
Plus, it's fun to implement something that seems like magic!
The Cartesian Product Trap in Django ORM
I hit a performance wall at work recently after adding a single extra Count() on a Django queryset that looked harmless. Luckily, the problematic code didn't make it into the production environment, but I had to think hard to figure out what went wrong and find a solution.

