Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Databricks Runtime 16.4 LTS has inconsistent Spark and Delta Lake versions

Angus-Dawson
New Contributor III

Per the release notes for Databricks Runtime 16.4 LTS, the environment has Apache Spark 3.5.2 and Delta Lake 3.3.1:

https://docs.databricks.com/aws/en/release-notes/runtime/16.4lts

However, Delta Lake 3.3.1 is built on Spark 3.5.3; the newest version of Delta Lake compatible with Spark 3.5.2 is Delta Lake 3.2.0.

Whatever custom modifications to Spark and Delta Lake have been made behind the scenes to enable this, they make it impossible to build an equivalent environment for local development and testing. This is not what one expects from an LTS release.

4 REPLIES

Rjdudley
Honored Contributor

We saw the same thing in previous runtime versions, and even a point release broke our code. We actually log the Spark version in one pipeline and see different versions popping up from time to time. Apparently the long-term goal is to move to "versionless runtimes," so you won't know what version you're using, but the execution environment will be monitored for errors and rolled back if errors are detected.

SamAdams
Contributor

@Angus-Dawson I encountered the same and used an override (like a pip constraints.txt file or a PDM resolution override) to make sure my local development environment matched the runtime.
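
For illustration, a minimal constraints.txt sketch that pins the open-source packages to the versions named in the 16.4 LTS release notes (the exact pins are assumptions; verify them against the release notes for your runtime):

    # constraints.txt -- pin OSS packages to the versions the DBR 16.4 LTS
    # release notes list (assumed pins; verify against your runtime)
    pyspark==3.5.2
    delta-spark==3.3.1

Then install with the pins enforced:

    pip install -r requirements.txt -c constraints.txt

Note that, as the original post points out, OSS delta-spark 3.3.1 targets Spark 3.5.3, so this pair may fail to resolve or may behave differently from the runtime; the constraints file at least makes the mismatch explicit and reproducible.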

saurabh18cs
Honored Contributor II

Hi @Angus-Dawson 

  • Use Databricks Connect for local development and testing against a remote Databricks cluster. This ensures your code runs in the actual Databricks environment; Databricks Runtime (DBR) versions often include custom builds and backports of Spark and Delta Lake that differ from the open-source releases. (See the sketch after this list.)
  • Always validate and test on a real Databricks cluster before deploying to production.
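
A minimal Databricks Connect sketch, assuming databricks-connect is installed and a "DEFAULT" profile is configured in ~/.databrickscfg (the profile name is a placeholder):

    # The DataFrame code below executes on the remote cluster, not on a
    # local open-source Spark build, so Spark/Delta behavior matches the DBR.
    from databricks.connect import DatabricksSession

    # Assumed: a "DEFAULT" profile pointing at a workspace and cluster.
    spark = DatabricksSession.builder.profile("DEFAULT").getOrCreate()

    df = spark.range(10)
    print(df.count())  # counted on the remote runtime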

Hubert-Dudek
Esteemed Contributor III

Exactly as @saurabh18cs wrote. Databricks is not equal to Spark + Delta. If you want to run realistic tests from a local environment, simply use Databricks Connect and install the matching version of Python in your virtual environment (venv). If you use, for example, the VS Code extension, it will advise you on the matching local Python version automatically.
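
A minimal shell sketch of that setup (the Python version and the pin pattern below are assumptions; check the 16.4 LTS release notes and PyPI for the exact versions):

    # Create a venv with a Python version matching the runtime
    # (assumed: DBR 16.4 LTS ships Python 3.12; verify in the release notes)
    python3.12 -m venv .venv
    source .venv/bin/activate

    # Install the Databricks Connect release matching the runtime's major.minor
    pip install "databricks-connect==16.4.*"

Databricks Connect releases track DBR versions, so keeping the major.minor pin aligned with your cluster's runtime avoids version mismatches between client and cluster.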