Some context: I am completely new to Databricks; I've heard good things, but also some things that worry me.
One thing that worries me is the performance (and eventual cost) of running Spark on smaller (sub-1TB) datasets. However, one requirement from our architects is "Data Lineage", which is why they are pushing for PySpark.
Soon we will have a hackathon where I intend to find a way of keeping Data Lineage while working around Spark if possible, because I've heard that Polars is a much better fit for smaller (sub-200GB) datasets, ultimately saving us money and runtime.
Then there's the required use of Delta Lake, which I'm fairly sure works with Polars; a rough sketch of what I have in mind is below.
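From what I've read, Polars talks to Delta tables through the `deltalake` (delta-rs) package rather than through Spark. This is only a sketch of the read/transform/write round trip I'd like to do; the paths and column name are placeholders I made up, not anything I've verified on Databricks:

```python
import polars as pl

# Read an existing Delta table via delta-rs (pip install polars deltalake).
# The path is a placeholder -- on Databricks it would presumably point at
# a Unity Catalog volume or a cloud storage location.
df = pl.read_delta("/Volumes/main/sales/raw/orders")

# A small Polars transformation, just so there's something to write back.
cleaned = df.filter(pl.col("amount") > 0)

# Write the result back out as a Delta table.
cleaned.write_delta("/Volumes/main/sales/curated/orders_clean", mode="overwrite")
```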
So my question is: is it even possible to run a Python application with Polars on Databricks, enable Data Lineage in one way or another, and store the data in Delta Lake?
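As far as I understand it, Databricks captures lineage automatically (in Unity Catalog) only for Spark operations, so a Polars job going through delta-rs would bypass that entirely. The workaround I'm considering for the hackathon is emitting lineage events manually with the OpenLineage Python client (`pip install openlineage-python`), assuming we can stand up an OpenLineage-compatible backend such as Marquez; the URL, namespaces, job name, and dataset names below are all invented placeholders:

```python
import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

# Client pointed at a hypothetical OpenLineage backend (e.g. Marquez).
client = OpenLineageClient(url="http://marquez.internal:5000")

run = Run(runId=str(uuid.uuid4()))
job = Job(namespace="hackathon", name="orders_clean_polars")

# Declare the input and output Delta tables by hand, since nothing is
# tracking what the Polars job reads and writes automatically.
inputs = [Dataset(namespace="delta", name="main.sales.raw.orders")]
outputs = [Dataset(namespace="delta", name="main.sales.curated.orders_clean")]

producer = "polars-lineage-poc"  # made-up producer id for this experiment

# One START event before the Polars work, one COMPLETE event after it.
client.emit(RunEvent(RunState.START, datetime.now(timezone.utc).isoformat(),
                     run, job, producer, inputs, outputs))
# ... the Polars read/transform/write from the sketch above goes here ...
client.emit(RunEvent(RunState.COMPLETE, datetime.now(timezone.utc).isoformat(),
                     run, job, producer, inputs, outputs))
```

Would something along these lines be a reasonable way to satisfy the lineage requirement, or is there a more Databricks-native option I'm missing?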