
How to detect if running in a workflow job?

dollyb
New Contributor III

Hi there,

what's the best way to determine which environment my Spark session is running in? Locally I develop with databricks-connect's DatabricksSession, but that doesn't work when running as a workflow job, which requires SparkSession.getOrCreate(). Right now I'm passing a job parameter that the app reads. Is there a more robust way to detect whether the app is running on a Databricks cluster or not?
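For reference, the two entry points in question look roughly like this in Python (a minimal sketch, assuming databricks-connect v2, where DatabricksSession is available):

```python
# Local development via databricks-connect:
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()

# Inside a Databricks workflow job:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
```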

 

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @dollyb, when distinguishing between the environments your Spark session can run in, especially when moving from local development to a workflow job, it's important to have a robust detection mechanism.

Here are some approaches you can consider:

  1. Cluster Context Using Notebook Context:

    • On a Databricks cluster, dbutils.notebook.getContext exposes details about the current cluster and, when running as part of a workflow job, about the job run itself, which lets you tell a job run apart from local execution.

  2. Spark Session Isolation:

  3. Check for Streaming Tab in Spark UI:

  4. Environment-Specific Configuration Parameters:

    • Use environment-specific configuration parameters.
    • For example, pass a parameter (as you’re currently doing) that indicates whether the app is running on a Databricks cluster.
    • This approach is flexible and lets you adapt to different environments.

Remember that the choice depends on your specific use case and requirements. By combining these methods, you can create a robust mechanism to detect whether your Spark session is running in a Databricks cluster or elsewhere. 🚀
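As a rough illustration of combining these ideas, here is a minimal sketch in Python. It assumes that the DATABRICKS_RUNTIME_VERSION environment variable is present on the cluster and absent locally, and that databricks-connect is installed for local development; the function name is made up:

```python
import os


def running_on_databricks() -> bool:
    # Databricks clusters expose the runtime version in the driver's
    # environment; a local databricks-connect session normally does not.
    return "DATABRICKS_RUNTIME_VERSION" in os.environ


def get_spark():
    if running_on_databricks():
        # Workflow job / cluster: use the regular Spark session.
        from pyspark.sql import SparkSession
        return SparkSession.builder.getOrCreate()
    # Local development: go through databricks-connect.
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.getOrCreate()
```

Calling get_spark() then behaves the same way in both environments, without hard-coding a job parameter.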

 


2 REPLIES


dollyb
New Contributor III

Thanks, dbutils.notebook.getContext does indeed contain information about the job run.
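Concretely, from Python that context can be inspected roughly like this (a sketch; dbutils is provided by the runtime, and tag key names such as jobId are an assumption that may vary by runtime version):

```python
import json

# dbutils is injected by the Databricks runtime inside notebooks and jobs.
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
tags = json.loads(ctx.toJson()).get("tags", {})

# When executed as part of a workflow job, the tags carry job/run details
# (the "jobId" key name is an assumption).
is_job_run = "jobId" in tags
```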
