Pandas feels clunky coming from R. What about Haskell?

Some years ago I came across an issue in the Frames repo that mentioned a blog post titled “Why pandas feels clunky when coming from R.” The article put simple data exploration code in R side by side with the equivalent Pandas code. At the time, the author concluded that Pandas was “clunkier” than R. The author operationalises the definition of clunkiness, but I think it’s really more of a you-know-it-when-you-see-it thing. You can feel when an API is pulling you away from your task and making you think more about the tool and its idiosyncrasies.


What Category Theory Teaches Us About DataFrames

Every dataframe library ships with hundreds of operations. pandas alone has over 200 methods on a DataFrame. Is pivot different from melt? Is apply different from map? What about transform, agg, applymap, pipe? Some of these seem like the same operation wearing different hats. Others seem genuinely distinct. Without a framework for telling them apart, you end up memorizing APIs instead of understanding structure.


An introduction to program synthesis

Introduction

This post kicks off a hands-on series about program synthesis: the art of teaching machines how to generate code. We’ll build a tiny, FlashFill-style synthesiser that learns to turn strings like “Joshua Nkomo” into “J. Nkomo” from input/output pairs. We’ll see how to define a small string-manipulation language, write an interpreter for it, and search the space of programs to find one that solves our toy problem.
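To make the three steps concrete, here is a minimal Python sketch of the idea, not the code from the series itself. The mini-language (atoms `const`, `word`, `initial`), the fixed `ATOMS` list, and the brute-force enumeration are all assumptions made for illustration; a real FlashFill-style synthesiser uses a richer language and a much smarter search.

```python
from itertools import product

# A hypothetical mini-DSL: a program is a tuple of atoms, and each atom
# produces one piece of the output string from the input string.
def const(s):
    return ("const", s)      # emit the literal string s

def word(i):
    return ("word", i)       # emit the i-th whitespace-separated word

def initial(i):
    return ("initial", i)    # emit the first character of the i-th word

def eval_atom(atom, text):
    """Interpret one atom against the input; None means 'not applicable'."""
    kind, arg = atom
    if kind == "const":
        return arg
    words = text.split()
    if arg >= len(words):
        return None
    return words[arg] if kind == "word" else words[arg][0]

def run(program, text):
    """Interpreter: evaluate every atom and concatenate the pieces."""
    parts = [eval_atom(atom, text) for atom in program]
    if any(part is None for part in parts):
        return None
    return "".join(parts)

# The search space: every sequence of these atoms up to a length bound.
ATOMS = [const(". "), const(" "), const("."),
         word(0), word(1), initial(0), initial(1)]

def synthesise(examples, max_len=4):
    """Enumerate programs shortest-first; return the first one that fits all examples."""
    for length in range(1, max_len + 1):
        for program in product(ATOMS, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None

examples = [("Joshua Nkomo", "J. Nkomo")]
program = synthesise(examples)
print(program)
print(run(program, "Ada Lovelace"))
```

Enumerating shortest-first means the synthesiser prefers the simplest program consistent with the examples, which is why the learned program generalises to a name it has never seen instead of memorising the literal output.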


Benchmarking Haskell dataframes against Python dataframes

I’ve been working on a dataframe implementation in Haskell for about a year now. While my focus has been on ergonomics, the question of performance has inevitably come up. I haven’t made significant performance investments, but I thought it would be worth snapshotting the current performance to establish a baseline.
