Benchmarking Haskell dataframes against Python dataframes
I’ve been working on a dataframe implementation in Haskell for about a year now. While my focus has been on ergonomics, the question of performance has inevitably come up. I haven’t made significant performance investments yet, but I thought it would be worth snapshotting performance to establish a baseline.
A benchmark for the impatient
I’ll start by adapting a small benchmark from the C++ DataFrame implementation. The benchmark generates a dataframe with three columns of random numbers, runs some statistics on those columns, and finally filters elements satisfying a predicate.
Because all these operations are effectively vector/array traversals, this benchmark tests the underlying array implementation more than it tests dataframe-specific operations (such as groupBy, pivot, melt, etc.). But it’s a good starting point, since slow arrays mean slow everything. I used Haskell’s Criterion library to generate the benchmark stats. The Python implementations have the slight overhead of spawning a separate process, so there are some error bars in the performance numbers. The code can be found here.
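To make the shape of the benchmark concrete, here is a minimal sketch of the same operations over plain unboxed vectors. This is not the dataframe library’s actual code, just an approximation of the traversals being measured (the seed, sizes, and predicate are illustrative):

```haskell
{-# LANGUAGE NumericUnderscores #-}
module Main where

import qualified Data.Vector.Unboxed as VU
import System.Random (mkStdGen, randoms)

-- One column of n pseudo-random Doubles in [0, 1), from a fixed seed.
mkColumn :: Int -> Int -> VU.Vector Double
mkColumn seed n = VU.fromListN n (randoms (mkStdGen seed))

mean :: VU.Vector Double -> Double
mean xs = VU.sum xs / fromIntegral (VU.length xs)

variance :: VU.Vector Double -> Double
variance xs = VU.sum (VU.map (\x -> (x - m) * (x - m)) xs)
            / fromIntegral (VU.length xs - 1)
  where m = mean xs

main :: IO ()
main = do
  let n  = 100_000_000
      c1 = mkColumn 42 n
  -- Column statistics, then a filter, mirroring the benchmark's steps.
  print (mean c1)
  print (variance c1)
  print (VU.length (VU.filter (> 0.5) c1))
```

In the real benchmark, each of these steps would be a separate Criterion case rather than a single main.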
I used three values of n: 100M, 150M, and 300M. On my machine, an HP Dragonfly G4 Chromebook (32GB RAM), Polars and Pandas ran out of memory at 300M, so there are only two data points for them.
| Size | Haskell | Polars | Pandas |
|---|---|---|---|
| O(100M) | 6.343s | 6.607s | 9.874s |
| O(150M) | 9.063s | 10.607s | 17.65s |
| O(300M) | 15.260s | N/A | N/A |
The performance is comparable to Polars and 2x faster than Pandas. But, again, these are simple scans. Let’s compare groupBy operations, since those are a little more complicated.
Comparing groupBy
We’ll use the California housing dataset. At a high level, the code will do the following: read a CSV file, group by ocean_proximity, then compute the minimum and maximum median_house_value.
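For a rough picture of the work involved, here is a hand-rolled sketch of the same min/max-per-group aggregation over the raw CSV using a strict Map. This is not the dataframe library’s groupBy, and the column indices are an assumption based on the standard California housing CSV layout:

```haskell
module Main where

import qualified Data.ByteString.Char8 as BS
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Fold one row into a running (min, max) of median_house_value per group.
step :: Map.Map BS.ByteString (Double, Double)
     -> (BS.ByteString, Double)
     -> Map.Map BS.ByteString (Double, Double)
step acc (key, v) =
  Map.insertWith (\_ (lo, hi) -> (min lo v, max hi v)) key (v, v) acc

main :: IO ()
main = do
  rows <- map (BS.split ',') . drop 1 . BS.lines <$> BS.readFile "housing.csv"
  -- Assumes median_house_value is column 8 and ocean_proximity column 9,
  -- as in the standard California housing CSV.
  let parse cols = (cols !! 9, read (BS.unpack (cols !! 8)) :: Double)
  print (foldl' step Map.empty (map parse rows))
```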
| Haskell | Polars | Pandas |
|---|---|---|
| 119ms | 227ms | 376ms |
We see a similar trend to before. Polars’s performance might actually be much closer to Haskell’s, since the Python measurements include the process-spawning overhead. But, again, the results are comparable.
I’ll update these benchmarks as I add more complex operations. And as the focus shifts to performance, I’d be curious to see how each performance knob changes these numbers.
Some interesting future projects for performance:
- Parallelism: this is all single-threaded (see the sketch after this list).
- Vectorization/hardware optimization: exploit SIMD-supporting containers, e.g. Hasktorch tensors, BLAS, etc.
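As a rough illustration of the parallelism point, the column scans above could be chunked and evaluated across cores with Control.Parallel.Strategies. A toy sketch, not a planned design (compile with -threaded, run with +RTS -N):

```haskell
import qualified Data.Vector.Unboxed as VU
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Sum a column by splitting it into chunks and summing the chunks in
-- parallel; a toy version of parallelising the scans measured above.
parSum :: Int -> VU.Vector Double -> Double
parSum chunks xs = sum (parMap rdeepseq VU.sum (slices xs))
  where
    k = max 1 (VU.length xs `div` chunks)
    slices v
      | VU.null v = []
      | otherwise = let (h, t) = VU.splitAt k v in h : slices t
```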