A first look at token efficiency

A while ago I saw the article "Which languages are most token efficient". It was largely discredited because it lacked a clear methodology. Unfortunately, I haven't come up with one either, but I thought I'd run a small, anecdotal experiment comparing code generation in Haskell notebooks (with Sabela) and Python notebooks (with Marimo). Notebooks are a good environment for agents: they are modular by design, and you can build convenient agent APIs on top of them. It's also much easier to intervene while the agent is working.

Pandas feels clunky coming from R. What about Haskell?

Some years ago I came across an issue in the Frames repo that mentioned a blog post titled "Why pandas feels clunky when coming from R." The article showed a side-by-side comparison of simple data exploration in R and in Pandas. At the time, the author concluded that Pandas was "clunkier" than R. The author operationalises the definition of clunkiness, but I think it's really more of a you-know-it-when-you-see-it thing. You can feel when an API is pulling you away from your task and making you think more about the tool and its idiosyncrasies.

Grow and mow: interpretable models with boosting, symbolic regression and e-graphs

This post is the convergence of two ideas that have been floating around in my head for about a year. Can we learn messy stochastic models and then use algorithmic and algebraic tools to rein in their complexity so that they become interpretable?

Type-level programming is still programming

I was showing a friend the typed dataframe API. The whole pitch was: look, you derive a schema from your data, and then the compiler catches column name typos, type mismatches, all the stuff that would otherwise blow up at runtime. I had a nice demo ready using the Kaggle credit card fraud dataset (about 284,000 rows, 31 columns).

What Category Theory Teaches Us About DataFrames

Every dataframe library ships with hundreds of operations. pandas alone has over 200 methods on a DataFrame. Is pivot different from melt? Is apply different from map? What about transform, agg, applymap, pipe? Some of these seem like the same operation wearing different hats. Others seem genuinely distinct. Without a framework for telling them apart, you end up memorizing APIs instead of understanding structure.

Learning better decision tree splits - LLMs as Heuristics for Program Synthesis

A lot of tabular modeling gets easier the moment you stumble onto the right derived quantity. Not something mysterious or “deep.” It’s usually something you can name: a ratio that turns two raw columns into a rate; a difference that becomes a margin; a simple count that captures what a bunch of messy fields were hinting at.

Installing Docker on a Chromebook

I couldn’t find any instructions online so I thought I’d post them here for anyone who goes through a similar struggle.

An introduction to program synthesis (Part II) - Automatically generating features for machine learning

Introduction

This post kicks off the second part of a hands-on series about program synthesis. We’ll apply the previously explored technique (an enumerative bottom-up search) to a slightly more realistic problem: automatically generating features for the Iris dataset.

Progress towards Kaggle-style workflows in Haskell

There's been a lot of work in the Haskell ecosystem that has made it easier to write interactive, Kaggle-like scripts. I'd like to showcase the synergy between three such tools: dataframe (my own creation), hasktorch, and IHaskell.

An introduction to program synthesis

Introduction

This post kicks off a hands-on series about program synthesis: the art of teaching machines how to generate code. We'll build a tiny, FlashFill-style synthesiser that learns to turn strings like "Joshua Nkomo" into "J. Nkomo" from input/output pairs. Along the way we'll define a small string-manipulation language, write an interpreter for it, and search the space of programs to find one that solves our toy problem.

Rewriting dataframes for MicroHs

My fondness for alternative Haskells

Benchmarking Haskell dataframes against Python dataframes

I've been working on a dataframe implementation in Haskell for about a year now. While my focus has been on ergonomics, the question of performance has inevitably come up. I haven't invested much in performance yet, but I thought it would be worth taking a snapshot of where things stand to establish a baseline.