Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

How to detect if running in a workflow job?

dollyb
Contributor

Hi there,

what's the best way to tell which environment my Spark session is running in? Locally I develop with databricks-connect's DatabricksSession, but that doesn't work when running as a workflow job, which requires SparkSession.getOrCreate(). Right now I'm passing a job parameter that the app reads. Is there another, more robust way to detect whether the app is running on a Databricks cluster or not?

 

1 ACCEPTED SOLUTION

Kaniz_Fatma
Community Manager

Hi @dollyb, when distinguishing between the environments where your Spark session is running, especially when moving from local development to a workflow job, it's essential to have a robust detection mechanism.

Here are some approaches you can consider:

  1. Cluster Context Using Notebook Context:

    • On a Databricks cluster, the notebook context (dbutils.notebook.getContext) exposes details about the current run, including whether it was triggered by a job.

  2. Spark Session Isolation:

  3. Check for Streaming Tab in Spark UI:

  4. Environment-Specific Configuration Parameters:

    • Consider using environment-specific configuration parameters.
    • For example, pass a parameter (as you’re currently doing) that indicates whether the app is running on a Databricks cluster.
    • This approach provides flexibility and lets you adapt to different environments (see the sketch below).

Remember that the choice depends on your specific use case and requirements. By combining these methods, you can create a robust mechanism to detect whether your Spark session is running in a Databricks cluster or elsewhere. 🚀
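To make option 4 concrete, here is a minimal Python sketch (not from the original post; the helpers is_on_databricks and get_spark are invented for illustration). It assumes that the DATABRICKS_RUNTIME_VERSION environment variable, which Databricks sets on cluster nodes, can stand in for an explicit job parameter:

import os

def is_on_databricks() -> bool:
    # Assumption: Databricks clusters set DATABRICKS_RUNTIME_VERSION in the
    # driver's environment; a local databricks-connect session normally does not.
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

def get_spark():
    # Pick the session builder that matches the current environment.
    if is_on_databricks():
        from pyspark.sql import SparkSession
        return SparkSession.builder.getOrCreate()
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.getOrCreate()

spark = get_spark()

Because workflow jobs run on clusters with a Databricks Runtime, the same check should hold there as well, so an explicit job parameter may no longer be needed.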

 


2 REPLIES

dollyb
Contributor

Thanks, dbutils.notebook.getContext does indeed contain information about the job run.
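For reference, a minimal Python sketch of reading that context (in Scala, dbutils.notebook.getContext is available directly; from Python it is reachable through the internal entry_point path shown below, which is an unofficial API and may change between runtime versions). The assumption here is that a job run populates a jobId tag in the context:

import json

def running_as_job(dbutils) -> bool:
    # Serialize the notebook context to JSON and inspect its tags; when the
    # code was started by a workflow job, the tags should include a jobId.
    ctx = json.loads(
        dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
    )
    return "jobId" in ctx.get("tags", {})

This only works on a Databricks cluster, so it pairs naturally with an environment check like the one sketched in the accepted solution.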
