Is the great dataframe showdown finally over? Enter: polars

Every dataframe library that came after pandas promised an expressive API and better performance, yet none has replaced the fluffy bamboo-eater 🐼 Or has one? Enter polars: will its multi-threaded, in-memory query engine be enough to dethrone the king? Come and learn to tame this new arctic beast!

We all love pandas - at least as much as we are aware of its limitations. No, I am not talking about the “setting a view versus a copy” warning - I am talking about performance.

Eager execution plays nice in notebooks, but is a burden in production. Moreover, pandas’ single-threaded nature significantly limits how far it can scale. Improving on pandas seems easy - in fact, there are multiple successful libraries out there. Yet pandas is still ubiquitous. To be fair, it set an incredibly high bar, thanks in part to its expressive, high-level API.

No one ever made a mystery of pandas’ limitations. Even its creator, Wes McKinney, said that “my rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset”, while the docs clearly recommend that when performance is an issue “it’s worth considering not using pandas”.

Well, a new contestant is gaining traction, and it looks like it might end the great dataframe showdown: polars, an arctic beast with a blazingly fast multi-threaded query engine, written in Rust.

polars supports both lazy and eager evaluation, as well as larger-than-memory (streaming) data processing. It leverages Apache Arrow’s columnar format, and offers zero-copy conversion to and from pandas or numpy arrays. It also supports reading from Delta tables. Not only does polars pack quite a punch in terms of performance, it also offers an intuitive and elegant API, addressing some problems with pandas expressions using a familiar and pythonic syntax.

Let’s be clear: polars is here to stay. And maybe you’d better know how to tame it.

📍 Keynote outline

pandas: when’s the time to look out for an alternative?
polars: what’s the use case?
polars and pandas: differences, similarities and benchmarking
polars: lazy evaluation and streaming APIs
polars: working with Delta tables

Sunday, May 28

14:30 - 15:00

Luca Baggi
