<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Naming jobs in the Spark UI in Databricks Runtime 15.4 in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/naming-jobs-in-the-spark-ui-in-databricks-runtime-15-4/m-p/84046#M37116</link>
    <description>&lt;P&gt;I am asking almost the same question as &lt;A title="here" href="https://community.databricks.com/t5/data-engineering/how-to-improve-spark-ui-job-description-for-pyspark/td-p/48959" target="_self"&gt;https://community.databricks.com/t5/data-engineering/how-to-improve-spark-ui-job-description-for-pyspark/td-p/48959&lt;/A&gt;: I would like to know how to improve the readability of the Spark UI by naming jobs. I am using PySpark.&lt;/P&gt;&lt;P&gt;Because I am running Databricks Runtime 15.4, I receive the following message when accessing the sparkContext:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;[JVM_ATTRIBUTE_NOT_SUPPORTED] Directly accessing the underlying Spark driver JVM using the attribute 'sparkContext' is not supported on shared clusters. If you require direct access to these fields, consider using a single-user cluster. For more details on compatibility and limitations, check: &lt;A href="https://docs.databricks.com/compute/access-mode-limitations.html#shared-access-mode-limitations-on-unity-catalog" target="_blank" rel="noopener noreferrer"&gt;https://docs.databricks.com/compute/access-mode-limitations.html#shared-access-mode-limitations-on-unity-catalog&lt;/A&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Accordingly, I do not think that I can use setJobDescription and setName as outlined in that answer. Could you please give an example of naming jobs and tasks, including the Python class that is called? Could you also confirm what the effect of using df.alias("example_df") should be in the Spark UI?&lt;/P&gt;&lt;P&gt;I would consider this question answered with these examples:&lt;BR /&gt;- Set the JobGroup name in the Spark UI, either from the driver or the worker.&lt;BR /&gt;- Set the Job description in the Spark UI, either from the driver or the worker.&lt;BR /&gt;- Set the Stage description in the Spark UI, either from the driver or the worker.&lt;BR /&gt;- Set the descriptions on the visual blocks in the DAG visualization pages.&lt;/P&gt;&lt;P&gt;The example code, before annotations are added, could look like the following:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession

def stream_parquet_to_delta(s3_path, delta_table_path):
  # Initialize Spark session
  spark = SparkSession.builder.appName("StreamToDeltaExample").getOrCreate()

  # Read streaming data from S3 in Parquet format
  streaming_df = spark.readStream.format("parquet").load(s3_path)

  # Define the function to process each micro-batch
  def process_batch(df, batch_id):
     # Write the micro-batch to Delta table
     df.write.format("delta").mode("append").save(delta_table_path)

  # Write streaming data to Delta table using foreachBatch
  streaming_df.writeStream.foreachBatch(process_batch).start().awaitTermination()

# Example usage
stream_parquet_to_delta("s3://my-bucket/streaming-data", "/delta-table/streaming_data")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 23 Aug 2024 09:45:35 GMT</pubDate>
    <dc:creator>sunnyday</dc:creator>
    <dc:date>2024-08-23T09:45:35Z</dc:date>
    <item>
      <title>Naming jobs in the Spark UI in Databricks Runtime 15.4</title>
      <link>https://community.databricks.com/t5/data-engineering/naming-jobs-in-the-spark-ui-in-databricks-runtime-15-4/m-p/84046#M37116</link>
      <description>&lt;P&gt;I am asking almost the same question as &lt;A title="here" href="https://community.databricks.com/t5/data-engineering/how-to-improve-spark-ui-job-description-for-pyspark/td-p/48959" target="_self"&gt;https://community.databricks.com/t5/data-engineering/how-to-improve-spark-ui-job-description-for-pyspark/td-p/48959&lt;/A&gt;: I would like to know how to improve the readability of the Spark UI by naming jobs. I am using PySpark.&lt;/P&gt;&lt;P&gt;Because I am running Databricks Runtime 15.4, I receive the following message when accessing the sparkContext:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;[JVM_ATTRIBUTE_NOT_SUPPORTED] Directly accessing the underlying Spark driver JVM using the attribute 'sparkContext' is not supported on shared clusters. If you require direct access to these fields, consider using a single-user cluster. For more details on compatibility and limitations, check: &lt;A href="https://docs.databricks.com/compute/access-mode-limitations.html#shared-access-mode-limitations-on-unity-catalog" target="_blank" rel="noopener noreferrer"&gt;https://docs.databricks.com/compute/access-mode-limitations.html#shared-access-mode-limitations-on-unity-catalog&lt;/A&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Accordingly, I do not think that I can use setJobDescription and setName as outlined in that answer. Could you please give an example of naming jobs and tasks, including the Python class that is called? Could you also confirm what the effect of using df.alias("example_df") should be in the Spark UI?&lt;/P&gt;&lt;P&gt;I would consider this question answered with these examples:&lt;BR /&gt;- Set the JobGroup name in the Spark UI, either from the driver or the worker.&lt;BR /&gt;- Set the Job description in the Spark UI, either from the driver or the worker.&lt;BR /&gt;- Set the Stage description in the Spark UI, either from the driver or the worker.&lt;BR /&gt;- Set the descriptions on the visual blocks in the DAG visualization pages.&lt;/P&gt;&lt;P&gt;The example code, before annotations are added, could look like the following:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession

def stream_parquet_to_delta(s3_path, delta_table_path):
  # Initialize Spark session
  spark = SparkSession.builder.appName("StreamToDeltaExample").getOrCreate()

  # Read streaming data from S3 in Parquet format
  streaming_df = spark.readStream.format("parquet").load(s3_path)

  # Define the function to process each micro-batch
  def process_batch(df, batch_id):
     # Write the micro-batch to Delta table
     df.write.format("delta").mode("append").save(delta_table_path)

  # Write streaming data to Delta table using foreachBatch
  streaming_df.writeStream.foreachBatch(process_batch).start().awaitTermination()

# Example usage
stream_parquet_to_delta("s3://my-bucket/streaming-data", "/delta-table/streaming_data")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Aug 2024 09:45:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/naming-jobs-in-the-spark-ui-in-databricks-runtime-15-4/m-p/84046#M37116</guid>
      <dc:creator>sunnyday</dc:creator>
      <dc:date>2024-08-23T09:45:35Z</dc:date>
    </item>
    <item>
      <title>Re: Naming jobs in the Spark UI in Databricks Runtime 15.4</title>
      <link>https://community.databricks.com/t5/data-engineering/naming-jobs-in-the-spark-ui-in-databricks-runtime-15-4/m-p/139239#M51123</link>
      <description>&lt;P&gt;You are correct: on Databricks Runtime 15.4 with &lt;STRONG&gt;shared clusters&lt;/STRONG&gt; (or Unity Catalog-enabled clusters), you will see the &lt;CODE&gt;[JVM_ATTRIBUTE_NOT_SUPPORTED]&lt;/CODE&gt; error when trying to directly access &lt;CODE&gt;sparkContext&lt;/CODE&gt; attributes that are only available in single-user cluster modes. This means &lt;CODE&gt;sc.setJobGroup()&lt;/CODE&gt;, &lt;CODE&gt;sc.setJobDescription()&lt;/CODE&gt;, and &lt;CODE&gt;sparkSession.sparkContext.setLocalProperty()&lt;/CODE&gt; are disabled in this mode. Below are your requested clarifications and alternatives.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Job and Task Naming on Databricks 15.4 (Shared Cluster / Unity Catalog): What Works and What Doesn't&lt;/H2&gt;
&lt;DIV class="group relative"&gt;
&lt;DIV class="w-full overflow-x-auto md:max-w-[90vw] border-subtlest ring-subtlest divide-subtlest bg-transparent"&gt;
&lt;TABLE class="border-subtler my-[1em] w-full table-auto border-separate border-spacing-0 border-l border-t"&gt;
&lt;THEAD class="bg-subtler"&gt;
&lt;TR&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Method&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Availability (Shared Cluster/Unity Catalog)&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;How to Use / Alternatives&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;CODE&gt;sc.setJobGroup()&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;span class="lia-unicode-emoji" title=":cross_mark:"&gt;❌&lt;/span&gt; Not Supported&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Not possible&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;CODE&gt;sc.setJobDescription()&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;span class="lia-unicode-emoji" title=":cross_mark:"&gt;❌&lt;/span&gt; Not Supported&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Not possible&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;CODE&gt;sc.setLocalProperty('spark.job.description', ...)&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;span class="lia-unicode-emoji" title=":cross_mark:"&gt;❌&lt;/span&gt; Not Supported&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Not possible&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;CODE&gt;DataFrame.alias("example_df")&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Supported&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;See effect below&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;CODE&gt;DataFrame.writeStream.queryName()&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Supported&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;See example below&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;DIV class="bg-base border-subtler shadow-subtle pointer-coarse:opacity-100 right-xs absolute bottom-0 flex rounded-lg border opacity-0 transition-opacity group-hover:opacity-100 [&amp;amp;&amp;gt;*:not(:first-child)]:border-subtle [&amp;amp;&amp;gt;*:not(:first-child)]:border-l"&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;HR /&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;How to Add Identifiable Names in Spark UI&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Rather than the methods that directly use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;sparkContext&lt;/CODE&gt;, use these supported options:&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;1.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Naming Structured Streaming Queries in Spark UI&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.queryName()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to set a descriptive name for your streaming query. This will appear in the Spark UI under the “active streaming queries.”&lt;/P&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Example&lt;/STRONG&gt;:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded-lg font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[calc(var(--header-height)+var(--size-xs))]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-lg text-xs font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;streaming_df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;writeStream \
    &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;queryName&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"ParquetToDelta_Stream"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; \
    &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;foreachBatch&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;process_batch&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; \
    &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;start&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; \
    &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;awaitTermination&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This will name the job “ParquetToDelta_Stream” in the streaming tab of Spark UI.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;2.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Effect of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;df.alias("example_df")&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in Spark UI&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.alias("example_df")&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;method only sets a logical alias for the DataFrame used in SQL expressions and does&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;not&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;affect job, stage, or DAG block naming in the Spark UI. It is most helpful for SQL readability and debugging optimizer plans, not for UI description.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;3.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Block Descriptions in the DAG&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;There is&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;currently no direct API&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for user-defined block descriptions in the DAG on Databricks when using shared clusters. Block/stage names are inferred from the operations (e.g., "Project", "Aggregate", "Filter").&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;For more explicit names in the UI,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;break your code into small, well-named functions&lt;/STRONG&gt;—thereby making your code easier to correlate with the Spark UI, though the block labels themselves are not user-customizable in shared clusters.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Best Practice Example for Databricks 15.4 (Shared Cluster Mode)&lt;/H2&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded-lg font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[calc(var(--header-height)+var(--size-xs))]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-lg text-xs font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;from&lt;/SPAN&gt; pyspark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;sql &lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; SparkSession

&lt;SPAN class="token token"&gt;def&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;stream_parquet_to_delta&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;s3_path&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; delta_table_path&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    spark &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; SparkSession&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;builder&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;appName&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"StreamToDeltaExample"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;getOrCreate&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    streaming_df &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;readStream&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;format&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"parquet"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;load&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;s3_path&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;

    &lt;SPAN class="token token"&gt;def&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;process_batch&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;df&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; batch_id&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
        &lt;SPAN class="token token"&gt;# logic here...&lt;/SPAN&gt;
        df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;write&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;format&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"delta"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;mode&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"append"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;save&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;delta_table_path&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;

    &lt;SPAN class="token token"&gt;# Set a meaningful name for the streaming query in Spark UI&lt;/SPAN&gt;
    streaming_df&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;writeStream \
      &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;queryName&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"ParquetToDelta_Stream"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; \
      &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;foreachBatch&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;process_batch&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; \
      &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;start&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; \
      &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;awaitTermination&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;

stream_parquet_to_delta&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"s3://my-bucket/streaming-data"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"/delta-table/streaming_data"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This query will have the name “ParquetToDelta_Stream” in the Streaming tab of the Spark UI.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The code structure assists with relating code blocks to the DAG plan, but block descriptions are not directly moddable in the UI under shared clusters.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;HR /&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Summary Table: What Is and Isn't Possible&lt;/H2&gt;
&lt;DIV class="group relative"&gt;
&lt;DIV class="w-full overflow-x-auto md:max-w-[90vw] border-subtlest ring-subtlest divide-subtlest bg-transparent"&gt;
&lt;TABLE class="border-subtler my-[1em] w-full table-auto border-separate border-spacing-0 border-l border-t"&gt;
&lt;THEAD class="bg-subtler"&gt;
&lt;TR&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Want to Set...&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Supported? (Shared Cluster)&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;How/Alternative&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Job Group Name&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;No&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Not supported&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Job Description&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;No&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Not supported&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Stage Description&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;No&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Not supported&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Query Name (Streaming)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Yes&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.queryName()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;on&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;writeStream&lt;/CODE&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Table/View Name&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Yes&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Create temporary views with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;createOrReplaceTempView()&lt;/CODE&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;DataFrame Alias&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Yes&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.alias()&lt;/CODE&gt;, but only affects SQL plans&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Sun, 16 Nov 2025 17:51:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/naming-jobs-in-the-spark-ui-in-databricks-runtime-15-4/m-p/139239#M51123</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-11-16T17:51:47Z</dc:date>
    </item>
  </channel>
</rss>

