Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Naming jobs in the Spark UI in Databricks Runtime 15.4

sunnyday
New Contributor

I am asking almost the same question as: https://community.databricks.com/t5/data-engineering/how-to-improve-spark-ui-job-description-for-pys...

Because I am running Databricks Runtime 15.4, I receive the following message when accessing the sparkContext:

[JVM_ATTRIBUTE_NOT_SUPPORTED] Directly accessing the underlying Spark driver JVM using the attribute 'sparkContext' is not supported on shared clusters. If you require direct access to these fields, consider using a single-user cluster. For more details on compatibility and limitations, check: https://docs.databricks.com/compute/access-mode-limitations.html#shared-access-mode-limitations-on-u...
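For reference, the call that produces this error is along the following lines (a minimal sketch; the description string is only a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# On a shared-access-mode cluster this raises [JVM_ATTRIBUTE_NOT_SUPPORTED],
# because sparkContext exposes the underlying driver JVM.
spark.sparkContext.setJobDescription("example job description")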
 
Accordingly, I do not think that I can use setJobDescription and setName as outlined in that answer. Could you please give an example of naming jobs and tasks, including the Python class that is called?
Could you also confirm what effect using df.alias("example_df") should have in the Spark UI?
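For reference, by that I mean a call along these lines (the DataFrame and alias name are arbitrary placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)                    # placeholder DataFrame
aliased_df = df.alias("example_df")     # does this alias show up anywhere in the Spark UI?
aliased_df.count()                      # trigger a job so there is something to inspect in the UI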
I would consider this question answered with these examples:
- Set the JobGroup name in the Spark UI, either from the driver or the worker.
- Set the Job description in the Spark UI, either from the driver or the worker.
- Set the Stage description in the Spark UI, either from the driver or the worker.
- Set the descriptions on the visual blocks in the DAG visualization pages.
 
The example code, before annotations are added, could look like the following:

 

from pyspark.sql import SparkSession

def stream_parquet_to_delta(s3_path, delta_table_path):
    # Initialize the Spark session
    spark = SparkSession.builder.appName("StreamToDeltaExample").getOrCreate()

    # Read streaming data from S3 in Parquet format
    # (file-based streaming sources need an explicit schema unless schema inference is enabled)
    streaming_df = spark.readStream.format("parquet").load(s3_path)

    # Process each micro-batch by appending it to the Delta table
    def process_batch(df, batch_id):
        df.write.format("delta").mode("append").save(delta_table_path)

    # Write the streaming data to the Delta table using foreachBatch
    streaming_df.writeStream.foreachBatch(process_batch).start().awaitTermination()

# Example usage
stream_parquet_to_delta("s3://my-bucket/streaming-data", "/delta-table/streaming_data")

 
