What Category Theory Teaches Us About DataFrames

Every dataframe library ships with hundreds of operations. pandas alone has over 200 methods on a DataFrame. Is pivot different from melt? Is apply different from map? What about transform, agg, applymap, pipe? Some of these seem like the same operation wearing different hats. Others seem genuinely distinct. Without a framework for telling them apart, you end up memorizing APIs instead of understanding structure.


An introduction to program synthesis

Introduction

This post kicks off a hands-on series about program synthesis: the art of teaching machines to generate code. We’ll build a small FlashFill-style synthesiser that learns to turn strings like “Joshua Nkomo” into “J. Nkomo” from input/output pairs. Along the way, we’ll define a tiny string-manipulation language, write an interpreter for it, and search the space of programs for one that solves our toy problem.
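To make the three ingredients concrete (a small language, an interpreter, and a search), here is a minimal sketch in Python. It is not the language or the synthesiser the series builds; the atom names (`word`, `initial`, `const`), the helper functions, and the fixed pool of candidate constants are all assumptions made for illustration.

```python
from itertools import product

# A program is a tuple of atoms whose outputs are concatenated.
# Atoms (all hypothetical, chosen for this sketch):
#   ("word", i)    -> the i-th whitespace-separated word of the input
#   ("initial", i) -> the first character of the i-th word
#   ("const", s)   -> the literal string s

def eval_atom(atom, text):
    """Interpret a single atom against the input; None means 'undefined'."""
    kind, arg = atom
    if kind == "const":
        return arg
    words = text.split()
    if arg >= len(words):
        return None
    return words[arg] if kind == "word" else words[arg][0]

def run(program, text):
    """Interpret a whole program by concatenating the outputs of its atoms."""
    parts = [eval_atom(atom, text) for atom in program]
    if any(part is None for part in parts):
        return None
    return "".join(parts)

def synthesize(examples, max_len=3, max_words=2, constants=(". ", " ", ".")):
    """Enumerate programs from shortest to longest and return the first
    one that reproduces every input/output example, or None."""
    atoms = [(kind, i) for kind in ("word", "initial") for i in range(max_words)]
    atoms += [("const", c) for c in constants]
    for length in range(1, max_len + 1):
        for program in product(atoms, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None

examples = [("Joshua Nkomo", "J. Nkomo")]
program = synthesize(examples)
print(program)
print(run(program, "Ada Lovelace"))
```

Brute-force enumeration is only workable here because the language is tiny: seven atoms and programs of at most three atoms give a few hundred candidates. Anything larger needs pruning or a smarter search.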


Benchmarking Haskell dataframes against Python dataframes

I’ve been working on a dataframe implementation in Haskell for about a year now. While my focus has been on ergonomics, the question of performance has inevitably come up. I haven’t made significant performance investments, but I thought it would be worth snapshotting the current performance to establish a baseline.
