The Python version worked. I rewrote it in Rust anyway.

Porting a working chain indexer from Python to Rust, where the easy parts get harder and the vague parts get found. Most of what the rewrite bought wasn't speed; it was bugs that stopped being possible to write.

Same project as before: a friend made the art during the NFT years, and I wrote the backend that answered who owns what. I wrote the first version in Python. It worked. It ran the site through the whole thing.

Then I rewrote it in Rust. Not for a benchmark I can wave around: I never ran a clean one, and I'm not going to invent one. I rewrote it because porting made me say, in types, the things Python had let me leave vague, and a couple of those vague spots were bugs that had been there the whole time.

How big is the number

In Python an integer is arbitrary precision and free. I added token values together and never once thought about width. Rust makes you think about it on line one, and on-chain that question has a specific and annoying answer: token ids and amounts are uint256. No native Postgres integer type holds 256 bits (BIGINT stops at 64), so the indexer stores them as JSON arrays of decimal strings and the application does the math in a type that understands the width.
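To make "a type that understands the width" concrete, here is a minimal std-only sketch of parsing a decimal string into 256 bits and adding without wrapping. The `U256` type and its methods are hypothetical, written for illustration; a real indexer would more likely pull a 256-bit integer from an existing crate than hand-roll one.

```rust
/// Hypothetical 256-bit unsigned integer: four 64-bit limbs, least significant first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct U256(pub [u64; 4]);

impl U256 {
    /// Parses a decimal string. Returns None for an empty string, a non-digit
    /// character, or a value that does not fit in 256 bits.
    pub fn from_dec_str(s: &str) -> Option<U256> {
        if s.is_empty() {
            return None;
        }
        let mut limbs = [0u64; 4];
        for b in s.bytes() {
            if !b.is_ascii_digit() {
                return None;
            }
            // limbs = limbs * 10 + digit, carrying between limbs through a u128
            let mut carry = (b - b'0') as u128;
            for limb in limbs.iter_mut() {
                let wide = (*limb as u128) * 10 + carry;
                *limb = wide as u64; // keep the low 64 bits
                carry = wide >> 64; // pass the rest to the next limb
            }
            if carry != 0 {
                return None; // overflowed the top limb
            }
        }
        Some(U256(limbs))
    }

    /// Adds two values, returning None instead of wrapping on overflow.
    pub fn checked_add(self, other: U256) -> Option<U256> {
        let mut out = [0u64; 4];
        let mut carry = false;
        for i in 0..4 {
            let (sum, c1) = self.0[i].overflowing_add(other.0[i]);
            let (sum, c2) = sum.overflowing_add(carry as u64);
            out[i] = sum;
            carry = c1 || c2;
        }
        if carry {
            None
        } else {
            Some(U256(out))
        }
    }
}
```

Both operations return an Option, so an amount that doesn't fit has to be handled explicitly instead of being silently truncated.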

The balance walk has a subtler trap, and it's the kind Python hides. A balance is unsigned: you can't own negative tokens. But you reconstruct it by replaying transfers, adding on the way in and subtracting on the way out, and nothing guarantees the events arrive in an order that keeps your wallet's running total non-negative. A transfer out can land before the matching transfer in, so the total dips below zero mid-walk before it climbs back. Accumulate that in an unsigned integer and it underflows, which in Rust means a panic in debug builds and a silent wraparound in release. So the scratch type is deliberately signed, and the sign only gets resolved at the end:

src/backend/queries.rs
let mut balances: HashMap<u64, i64> = HashMap::new();
for event in events {
    for (&id, &value) in ids.iter().zip(values.iter()) {
        // signed scratch: an out-before-in can go temporarily negative
        if to == wallet   { *balances.entry(id).or_insert(0) += value; }
        if from == wallet { *balances.entry(id).or_insert(0) -= value; }
    }
}
balances.retain(|_, &mut v| v > 0); // unsigned answer, resolved once at the end
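The same idea as a self-contained sketch, with an unsigned walk next to it to show where the underflow would happen. The `Transfer` struct is a simplified, hypothetical stand-in (one id and one amount per event, string addresses), not the indexer's real Event type.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified transfer: one token id and one amount per event.
pub struct Transfer {
    pub from: &'static str,
    pub to: &'static str,
    pub id: u64,
    pub value: u32,
}

/// Replays transfers in whatever order they arrive, using a signed scratch total,
/// and keeps only the ids with a positive balance at the end.
pub fn balances_for(wallet: &str, events: &[Transfer]) -> HashMap<u64, i64> {
    let mut scratch: HashMap<u64, i64> = HashMap::new();
    for e in events {
        if e.to == wallet {
            *scratch.entry(e.id).or_insert(0) += i64::from(e.value);
        }
        if e.from == wallet {
            *scratch.entry(e.id).or_insert(0) -= i64::from(e.value);
        }
    }
    scratch.retain(|_, v| *v > 0);
    scratch
}

/// The same walk with an unsigned accumulator. checked_sub returns None at the
/// exact point where a plain `-=` would underflow.
pub fn unsigned_walk(wallet: &str, id: u64, events: &[Transfer]) -> Option<u64> {
    let mut total: u64 = 0;
    for e in events.iter().filter(|e| e.id == id) {
        if e.to == wallet {
            total = total.checked_add(u64::from(e.value))?;
        }
        if e.from == wallet {
            total = total.checked_sub(u64::from(e.value))?;
        }
    }
    Some(total)
}
```

Feed both functions a transfer out of 2 followed by a transfer in of 3: the signed walk ends on 1, while the unsigned walk returns None on the first event.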

The collection-wide query carries the other piece of received wisdom: mints come from the zero address and burns go to the dead address, and if you count either as a holder your supply is wrong. They get excluded in the WHERE clause, not patched up afterward.

src/backend/queries.rs
const DEAD_ADDRESS: &str = "0x000000000000000000000000000000000000dEaD";
const ZERO_ADDRESS: &str = "0x0000000000000000000000000000000000000000";
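The exclusion itself happens in SQL, which isn't shown here. As an in-memory illustration of the same rule, the sketch below filters a list of owner addresses before counting them. The function names are hypothetical, and the case-insensitive comparison is an assumption about how addresses are stored: hex addresses are often written in mixed-case checksum form, so an exact string match can miss one.

```rust
use std::collections::HashSet;

const DEAD_ADDRESS: &str = "0x000000000000000000000000000000000000dEaD";
const ZERO_ADDRESS: &str = "0x0000000000000000000000000000000000000000";

/// True when `address` is neither the mint source nor the burn sink.
/// Compared case-insensitively so a lowercased "dead" address is still caught.
pub fn is_countable_holder(address: &str) -> bool {
    !address.eq_ignore_ascii_case(ZERO_ADDRESS) && !address.eq_ignore_ascii_case(DEAD_ADDRESS)
}

/// Counts distinct holders, skipping the zero and dead addresses and treating
/// addresses that differ only in letter case as the same holder.
pub fn count_holders<'a>(owners: impl IntoIterator<Item = &'a str>) -> usize {
    let unique: HashSet<String> = owners
        .into_iter()
        .filter(|a| is_countable_holder(a))
        .map(|a| a.to_ascii_lowercase())
        .collect();
    unique.len()
}
```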

Two programs, one project

The indexer and the API are different jobs. One crawls the chain and writes the database, the other reads the database and answers HTTP. But they share the same Event type and the same database module, and I didn't want two repos drifting apart. Rust does this with one package: a library crate in src/lib.rs and two binaries under src/bin that both depend on it:

afterlife-backend layout
src/
  lib.rs            // pub mod common; pub mod indexer; pub mod backend;
  bin/indexer.rs    // crawls chain -> Postgres
  bin/backend.rs    // warp API over the same tables
  common/           // Event, database, file_loader: shared by both

This is also where I lost an afternoon as a Rust beginner. The module system does not care what you think is obvious. pub mod common in lib.rs wants the code in common.rs or common/mod.rs, the binaries in bin/ reach shared code through the crate name and not a relative path, and the compiler will not let you hand-wave any of it. Once it builds it stays built, which is the trade.
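The path rules are easier to see in code. The sketch below squeezes the layout into one file by writing the modules inline, so each `pub mod x { ... }` stands in for `pub mod x;` plus a file on disk. The `Event` field, the `first_event` function, and the crate name `afterlife_backend` in the final comment are assumptions for illustration.

```rust
// Stand-in for src/lib.rs. On disk, `pub mod common;` would make the compiler
// look for src/common.rs or src/common/mod.rs; here the body is written inline.
pub mod common {
    pub mod event {
        #[derive(Debug, PartialEq)]
        pub struct Event {
            pub block: u64, // hypothetical field
        }
    }
    // Re-export so callers can write `common::Event` instead of `common::event::Event`.
    pub use event::Event;
}

pub mod indexer {
    // Inside the library, sibling modules are reached through `crate::`.
    use crate::common::Event;

    pub fn first_event() -> Event {
        Event { block: 1 }
    }
}

// A file under src/bin/ is a separate crate. It cannot say `crate::indexer` or
// `mod indexer;` to reach this code; it names the library instead, for example:
//     use afterlife_backend::indexer::first_event; // crate name assumed
```

Cargo turns hyphens in a package name into underscores for the library name, which is why a package called afterlife-backend would be imported as afterlife_backend.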

I asked the wrong question for two days

The API serves a lot of small JSON files: per-token metadata, read over and over. So I went looking for the fastest way to read a file in Rust and got stuck on BufReader versus read_to_string for an embarrassingly long time.

It was the wrong question. BufReader helps when you do many small reads from one handle; for slurping a whole small file in one shot it's noise. The cost wasn't the read call, it was reading the same files thousands of times. The fix was a cache, and the only real decision was bounding it so serving metadata couldn't eat all the memory on the box:

src/common/file_loader.rs
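use std::path::Path;
// tokio is assumed here, since warp runs on it
use tokio::fs::File;
use tokio::io::{self, AsyncReadExt, BufReader};

// Reads the whole file at `path` into a String.
pub async fn read_file(path: &Path) -> io::Result<String> {
    let mut buf = String::new();
    BufReader::new(File::open(path).await?).read_to_string(&mut buf).await?;
    Ok(buf)
}
// behind an LruCache with a fixed capacity, shared across handlers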
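What "a fixed capacity" buys is easiest to see in a small model. This is a std-only sketch of a least-recently-used cache, not the LruCache type named in the comment above; the `BoundedCache` name and its methods are hypothetical, and the linear scan in `touch` is only acceptable because this is an illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical bounded cache: at capacity, inserting a new key evicts the
/// least recently used one, so memory use has a ceiling.
pub struct BoundedCache {
    capacity: usize,
    entries: HashMap<String, String>,
    recency: VecDeque<String>, // front = least recently used
}

impl BoundedCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, entries: HashMap::new(), recency: VecDeque::new() }
    }

    /// Moves `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.recency.iter().position(|k| k.as_str() == key) {
            self.recency.remove(pos);
        }
        self.recency.push_back(key.to_string());
    }

    /// Returns a copy of the cached value and marks the key as recently used.
    pub fn get(&mut self, key: &str) -> Option<String> {
        let hit = self.entries.get(key).cloned()?;
        self.touch(key);
        Some(hit)
    }

    /// Inserts or replaces a value, evicting the oldest entry if the cache is full.
    pub fn put(&mut self, key: &str, value: String) {
        if !self.entries.contains_key(key) && self.entries.len() == self.capacity {
            if let Some(oldest) = self.recency.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(key.to_string(), value);
        self.touch(key);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With a capacity of 2, putting a and b, reading a, then putting c evicts b: the read made a the more recent of the two.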

Sharing that cache across warp handlers produced the most Rust error of all Rust errors: future cannot be sent between threads safely. Handler futures can move between worker threads, so everything they hold across an await point has to be Send, and a cache they all share has to be Sync as well. That meant an Arc<Mutex<…>> and a once_cell for the global, not the plain HashMap I'd reached for first. Python would have let me staple a dict to a module and move on, and it also would have let me leak memory until the box fell over.
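A minimal sketch of that shape, using plain threads in place of async handlers so it needs no runtime. Two substitutions are assumptions made for self-containment: std's OnceLock stands in for once_cell, and an unbounded HashMap stands in for the bounded cache. The names `SharedCache`, `cache`, and `fill_from_threads` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

pub type SharedCache = Arc<Mutex<HashMap<String, String>>>;

/// Process-wide cache handle, initialized on first use.
static CACHE: OnceLock<SharedCache> = OnceLock::new();

/// Returns a cloned handle to the one global cache.
pub fn cache() -> SharedCache {
    Arc::clone(CACHE.get_or_init(|| Arc::new(Mutex::new(HashMap::new()))))
}

/// Compiles only for types that may cross threads: the same bound that sits
/// behind "future cannot be sent between threads safely".
pub fn assert_send_sync<T: Send + Sync>() {}

/// Spawns `n` threads that each insert one entry, then reports the cache size.
pub fn fill_from_threads(n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let cache = cache();
            thread::spawn(move || {
                // The lock guard is a temporary, released at the end of this statement.
                cache
                    .lock()
                    .unwrap()
                    .insert(format!("{i}.json"), format!("{{\"id\":{i}}}"));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    cache().lock().unwrap().len()
}
```

One caveat when moving this into async code: the guard from std's Mutex is not Send, so holding it across an await point brings the same error back. Locking, copying the value out, and dropping the guard before awaiting avoids that.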

What the rewrite actually bought

Not a number. The honest return is narrower and duller than a speedup: a malformed batch can't become a database row because the length check lives in the constructor, half a block range can't commit because the write is one transaction, and the cache ceiling is a value I picked instead of a surprise I'd meet in production. The whole thing compiles to a single static binary that I cross-compiled to ARM64 from an amd64 laptop and left running on a small box. That's the pitch.
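The first of those guarantees can be sketched in a few lines. The `TransferBatch` and `BatchError` names are hypothetical; the point is only the pattern of private fields plus a validating constructor, which matters here because zipping two lists of different lengths silently stops at the shorter one.

```rust
pub mod batch {
    /// Hypothetical batch of ids and amounts. The fields are private, so code
    /// outside this module can only obtain one through `new`.
    #[derive(Debug, PartialEq)]
    pub struct TransferBatch {
        ids: Vec<u64>,
        values: Vec<u64>,
    }

    #[derive(Debug, PartialEq)]
    pub enum BatchError {
        LengthMismatch { ids: usize, values: usize },
    }

    impl TransferBatch {
        /// Rejects a batch whose id and value lists differ in length.
        pub fn new(ids: Vec<u64>, values: Vec<u64>) -> Result<Self, BatchError> {
            if ids.len() != values.len() {
                return Err(BatchError::LengthMismatch {
                    ids: ids.len(),
                    values: values.len(),
                });
            }
            Ok(Self { ids, values })
        }

        /// Pairs each id with its value. The constructor has already guaranteed
        /// equal lengths, so zip cannot drop a trailing element here.
        pub fn pairs(&self) -> impl Iterator<Item = (u64, u64)> + '_ {
            self.ids.iter().copied().zip(self.values.iter().copied())
        }
    }
}
```

Anything downstream that accepts a TransferBatch can skip the check, because a mismatched one never gets constructed.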

The code is on GitHub.