06-04-2025 02:34 PM
Per the release notes for Databricks Runtime 16.4 LTS, the environment has Apache Spark 3.5.2 and Delta Lake 3.3.1:
https://docs.databricks.com/aws/en/release-notes/runtime/16.4lts
However, Delta Lake 3.3.1 is built on Spark 3.5.3; the newest version of Delta Lake compatible with Spark 3.5.2 is Delta Lake 3.2.0.
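For reference, here is a minimal check of what a cluster actually reports, run from a Databricks notebook (spark is provided by the notebook context; the clusterUsageTags conf key is a Databricks-internal setting, so treat that part as an assumption):

```python
# Run in a Databricks notebook, where `spark` is already defined.
print(spark.version)  # Apache Spark version string, e.g. "3.5.2"

# Databricks-internal conf key exposing the DBR version tag
# (assumption: present on your cluster), e.g. "16.4.x-scala2.12".
print(spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion"))
```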
Whatever custom modifications Databricks has made to Spark and Delta Lake behind the scenes to make this combination work, they make it impossible to build an equivalent environment for local development and testing. This is not what one expects from an LTS release.
06-05-2025 08:16 AM
We saw the same thing in previous runtime versions, and even a patch release has broken our code. We actually log the Spark version in one pipeline and see different versions popping up from time to time. Apparently the long-term goal is to move to "versionless runtimes", so you won't know what you're running; instead, the execution environment will be monitored for errors and rolled back if errors are detected.
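For what it's worth, a minimal sketch of that kind of version logging, assuming it runs in a Databricks notebook or job where spark is already defined (DATABRICKS_RUNTIME_VERSION is an environment variable Databricks sets on cluster nodes):

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# `spark` is the SparkSession provided by the Databricks job/notebook context.
log.info("Spark version: %s", spark.version)
log.info("Databricks Runtime: %s",
         os.environ.get("DATABRICKS_RUNTIME_VERSION", "unknown"))
```

Comparing these log lines across runs is how the version drift shows up.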
10-02-2025 03:23 PM
@Angus-Dawson: I encountered the same issue and used an override (such as a pip constraints.txt file or a PDM resolution override) to make sure my local development environment matched the runtime.
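As a sketch of that approach: a pip constraints file pins dependency resolution so a local install can't drift from the runtime. The exact pins depend on your runtime; pyspark 3.5.2 with delta-spark 3.2.0 is the compatible pair discussed above, so treat these versions as an example only:

```
# constraints.txt -- keep local pyspark/delta in step with the cluster
pyspark==3.5.2
delta-spark==3.2.0
```

Installing with pip install -r requirements.txt -c constraints.txt then forces the resolver to honor those pins.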
10-03-2025 07:49 AM - edited 10-03-2025 07:50 AM
10-03-2025 01:21 PM
Exactly as @saurabh18cs wrote. Databricks is not equal to Spark + Delta. If you want to run real tests from a local environment, simply use Databricks Connect and install the matching Python version in your virtual environment (venv). If you use, for example, the Databricks VS Code extension, it will tell you the required local Python version automatically.
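A minimal sketch of that setup, assuming authentication is already configured via a Databricks config profile or environment variables:

```python
# pip install databricks-connect  (choose the release matching your cluster's DBR)
from databricks.connect import DatabricksSession

# Picks up workspace host, credentials, and target cluster from your
# Databricks config profile or environment.
spark = DatabricksSession.builder.getOrCreate()

# Code executes on the remote cluster, so tests run against the actual
# runtime rather than a local Spark build.
print(spark.version)
spark.range(5).show()
```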
2 weeks ago
Okay, so then it is meaningless to put Spark and Delta Lake versions in the runtime release notes.