You are correct: on Databricks Runtime 15.4 with shared clusters (Unity Catalog shared access mode), you will see the [JVM_ATTRIBUTE_NOT_SUPPORTED] error when trying to access sparkContext attributes directly, because those attributes are only available in single-user access mode. In practice this means sc.setJobGroup(), sc.setJobDescription(), and sparkSession.sparkContext.setLocalProperty() are unavailable in this mode. Below are the clarifications and alternatives you asked for.
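For illustration, here is a minimal sketch of what that failure looks like in a notebook on a shared cluster. It assumes `spark` is the notebook's pre-created SparkSession; the exact exception type and message can vary by runtime version.

```python
# Hypothetical illustration: on a shared (Unity Catalog) cluster, touching the
# underlying SparkContext is expected to fail as described above.
try:
    spark.sparkContext.setJobDescription("nightly ingest")  # works only in single-user mode
except Exception as e:
    # On shared clusters this typically surfaces as [JVM_ATTRIBUTE_NOT_SUPPORTED]
    print(f"Cannot set job description on this cluster: {e}")
```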
Naming Jobs, Tasks, and Stages in Databricks 15.4 (Shared Cluster / Unity Catalog)
Job and Task Naming: What Works and What Doesn't
| Method | Availability (Shared Cluster / Unity Catalog) | How to Use / Alternatives |
| --- | --- | --- |
| `sc.setJobGroup()` | ❌ Not supported | Not possible |
| `sc.setJobDescription()` | ❌ Not supported | Not possible |
| `sc.setLocalProperty('spark.job.description', ...)` | ❌ Not supported | Not possible |
| `DataFrame.alias("example_df")` | ✅ Supported | See effect below |
| `DataFrame.writeStream.queryName()` | ✅ Supported | See example below |
How to Add Identifiable Names in Spark UI
Rather than the methods that directly use sparkContext, use these supported options:
1. Naming Structured Streaming Queries in Spark UI
Use .queryName() to set a descriptive name for your streaming query. The name appears among the active streaming queries in the Structured Streaming tab of the Spark UI.
Example:
```python
streaming_df.writeStream \
    .queryName("ParquetToDelta_Stream") \
    .foreachBatch(process_batch) \
    .start() \
    .awaitTermination()
```
This names the streaming query “ParquetToDelta_Stream” in the Structured Streaming tab of the Spark UI.
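As a follow-up, a minimal sketch of how to confirm the name programmatically (assuming the query above has already been started and you are not blocking on awaitTermination in the same cell):

```python
# List the currently active streaming queries; the same names are shown
# in the Structured Streaming tab of the Spark UI.
for q in spark.streams.active:
    print(q.name, q.id)
```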
2. Effect of df.alias("example_df") in Spark UI
- The .alias("example_df") method only sets a logical alias for the DataFrame used in SQL expressions; it does not affect job, stage, or DAG block naming in the Spark UI. It is most helpful for SQL readability and for debugging optimizer plans, not for UI descriptions.
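For illustration, a hypothetical sketch of where an alias does show up: in qualified column references and in the plan printed by explain(), not in Spark UI job or stage names. The table and column names here are made up.

```python
from pyspark.sql import functions as F

orders = spark.read.table("sales.orders").alias("o")        # hypothetical table
customers = spark.read.table("sales.customers").alias("c")  # hypothetical table

# The aliases "o" and "c" let you qualify columns and appear in the plan output,
# but the Spark UI still labels the stages by their physical operators.
joined = orders.join(customers, F.col("o.customer_id") == F.col("c.id"))
joined.explain(True)
```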
3. Block Descriptions in the DAG
- There is currently no direct API for user-defined block descriptions in the DAG on Databricks shared clusters. Block/stage names are inferred from the physical operations (e.g., "Project", "Aggregate", "Filter").
- For more explicit names in the UI, break your code into small, well-named functions. This makes your code easier to correlate with the Spark UI, even though the block labels themselves are not user-customizable on shared clusters (see the sketch below).
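A minimal sketch of that structuring approach (function, table, and column names are hypothetical): each function maps to a contiguous set of plan nodes, which makes it easier to line up the DAG with your code.

```python
from pyspark.sql import DataFrame, functions as F

def filter_recent_events(df: DataFrame) -> DataFrame:
    # Keep only events from the last 7 days (hypothetical schema);
    # appears as a Filter node in the plan/DAG.
    return df.where(F.col("event_date") >= F.date_sub(F.current_date(), 7))

def aggregate_events_per_user(df: DataFrame) -> DataFrame:
    # Count events per user; appears as Aggregate/Exchange nodes in the DAG.
    return df.groupBy("user_id").agg(F.count("*").alias("event_count"))

events = spark.read.table("analytics.events")  # hypothetical table
result = aggregate_events_per_user(filter_recent_events(events))
```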
Best Practice Example for Databricks 15.4 (Shared Cluster Mode)
```python
from pyspark.sql import SparkSession

def stream_parquet_to_delta(s3_path, delta_table_path):
    spark = SparkSession.builder.appName("StreamToDeltaExample").getOrCreate()
    # Note: streaming file sources need an explicit schema via .schema(...)
    # unless schema inference is enabled for the stream.
    streaming_df = spark.readStream.format("parquet").load(s3_path)

    def process_batch(df, batch_id):
        # logic here...
        df.write.format("delta").mode("append").save(delta_table_path)

    # Set a meaningful name for the streaming query in the Spark UI
    streaming_df.writeStream \
        .queryName("ParquetToDelta_Stream") \
        .foreachBatch(process_batch) \
        .start() \
        .awaitTermination()

stream_parquet_to_delta("s3://my-bucket/streaming-data", "/delta-table/streaming_data")
```
- This query will appear under the name “ParquetToDelta_Stream” in the Structured Streaming tab of the Spark UI.
- The code structure helps relate code blocks to the DAG plan, but the block descriptions themselves cannot be modified in the UI on shared clusters.
Summary Table: What Is and Isn't Possible
| Want to Set... | Supported? (Shared Cluster) | How / Alternative |
| --- | --- | --- |
| Job group name | No | Not supported |
| Job description | No | Not supported |
| Stage description | No | Not supported |
| Query name (streaming) | Yes | Use `.queryName()` on `writeStream` |
| Table/view name | Yes | Create temporary views with `createOrReplaceTempView()` (see example below) |
| DataFrame alias | Yes | Use `.alias()`, but it only affects SQL plans |
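To expand on the temp-view row above, a minimal sketch (the table and view names are hypothetical): queries run against a named view carry that readable name in their SQL text, which is visible in the SQL / DataFrame tab of the Spark UI and is often the easiest way to recognize a piece of work on a shared cluster.

```python
# Register a readable temporary view name; queries against it show this name
# in their SQL text in the Spark UI's SQL / DataFrame tab.
df = spark.read.table("sales.orders")          # hypothetical source table
df.createOrReplaceTempView("recent_orders_v")  # hypothetical view name

summary = spark.sql("""
    SELECT customer_id, COUNT(*) AS order_count
    FROM recent_orders_v
    GROUP BY customer_id
""")
summary.show()
```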