4 weeks ago
Hello,
If I understood correctly, a serverless cluster always comes with the latest runtime version by default.
Now I need to stick to e.g. runtime version 15.4 for a certain job, which gets deployed via asset bundles. How do I specify/configure the job so that the serverless cluster provides runtime 15.4?
Any help is highly appreciated!
#serverless #assetbundles
4 weeks ago
Hi @GeKo,
Serverless versions behave a bit differently than classic runtime versions. With serverless you no longer have control over which cluster runtime is used, as it's continuously updated. You do, however, have control over which API to target and what your base environment looks like by setting the environment version. Take a look at the release notes for serverless versions here.
For notebooks, you need to define this environment directly in the notebook, even when it's scheduled to run in a job. You can see how to do that here.
For non-notebook tasks, here's an example of specifying the environment for a task:
resources:
  jobs:
    example_job:
      name: example job
      tasks:
        - task_key: example_task
          ...
          environment_key: some_environment_key
      environments:
        - environment_key: some_environment_key
          spec:
            client: "2"
4 weeks ago
Hi @GeKo
You're correct that serverless clusters typically default to the latest runtime version, but you can specify a particular runtime version for your jobs.
The exact method depends on your platform, but here are the common approaches:
Databricks (Most Common for Asset Bundles)
In your bundle configuration (databricks.yml or job definition):
For serverless compute specifically:
4 weeks ago
Hi @lingareddy_Alva - The second DAB example you provided is not valid. I believe the LLM you used to generate this code may have hallucinated.
4 weeks ago
Hi @lingareddy_Alva ,
many thanks for answering!
The serverless "solution" you provided is unfortunately just a plain response from ChatGPT/Gemini/etc. I tried that as well, and the AI response is pure nonsense, as also commented by jesseryoung.
4 weeks ago
Many thanks @jesseryoung ,
that's exactly what I was looking for!
4 weeks ago
Hi @jesseryoung ,
one quick follow-up question. I am playing around with the "environment" property in the job config, and I have the following:
resources:
  jobs:
    serverless_testjob:
      name: serverless_testjob
      tasks:
        - task_key: sample_script_15_4
          spark_python_task:
            python_file: /Workspace/Users/gerd/get_version.py
          environment_key: serverless_15_4
        - task_key: sample_script_14_3
          spark_python_task:
            python_file: /Workspace/Users/gerd/get_version.py
          environment_key: serverless_14_3
      queue:
        enabled: true
      environments:
        - environment_key: serverless_14_3
          spec:
            client: "1"
        - environment_key: serverless_15_4
          spec:
            client: "2"
      performance_target: STANDARD
Both tasks execute the same Python script, but with different environments. The script itself is pretty simple and looks like:
import sys
from pyspark.sql import SparkSession
python_version = sys.version
print(f"The current Python version is: {python_version}")
spark = SparkSession.builder.getOrCreate()
spark_version = spark.version
print(f"Apache Spark Version: {spark_version}")
current_version_info = spark.sql("SELECT current_version()").collect()[0][0]
dbr_version = current_version_info['dbr_version']
print(f"Databricks Runtime Version: {dbr_version}")
Now I am wondering why both tasks report "Databricks Runtime Version: 16.4.x-photon-scala2.12"?!
According to the serverless release notes regarding environments (LINK), I expected a different runtime for each environment. The outputs of the two tasks differ only in the Python version used:
output task sample_script_15_4
The current Python version is: 3.11.10 (main, Sep 7 2024, 18:35:41) [GCC 11.4.0]
Apache Spark Version: 3.5.2
Databricks Runtime Version: 16.4.x-photon-scala2.12
output task sample_script_14_3
The current Python version is: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0]
Apache Spark Version: 3.5.2
Databricks Runtime Version: 16.4.x-photon-scala2.12
This combination of versions in the outputs doesn't make sense to me. I expected a different Python version in the different environments, but runtime 16.4 is unexpected and should only be part of environment version "3".
What is going wrong here?
4 weeks ago
With serverless, changing the "version" doesn't actually change the cluster runtime. It just makes sure that your code doesn't break over time while allowing Databricks to keep the cluster runtime up to date with the latest changes. Serverless uses Spark Connect to decouple the cluster runtime dependencies (DBR) from the client dependencies (the serverless environments).
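If it helps to see that split, here's a minimal sketch (my own illustration, not from the docs) you could run as a serverless task. It assumes a Spark Connect session and simply contrasts what the client reports with what the server reports:
# Minimal sketch: contrast client-side vs. server-side versions in a
# serverless (Spark Connect) session. Illustration only, not from the thread.
import sys
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Client side: pinned by the serverless environment version ("client": "1", "2", ...)
print(f"Client Python interpreter: {sys.version.split()[0]}")
print(f"Client pyspark package:    {pyspark.__version__}")

# Server side: the continuously updated serverless runtime, not user-selectable
print(f"Server Spark version:      {spark.version}")
row = spark.sql("SELECT current_version() AS v").collect()[0]
print(f"Server DBR version:        {row['v']['dbr_version']}")
Bumping the environment from client "1" to "2" should only move the client-side values; the DBR string keeps tracking whatever Databricks currently runs on the server, which matches the 16.4 output you saw.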