
How to detect if running in a workflow job?

dollyb
New Contributor III

Hi there,

What's the best way to tell which environment my Spark session is running in? Locally I develop with databricks-connect's DatabricksSession, but that doesn't work when running a workflow job, which requires SparkSession.getOrCreate(). Right now I'm passing a job parameter that the app reads, as sketched below. Is there a more robust way to detect whether the app is running on a Databricks cluster or not?
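For illustration, a minimal sketch of that parameter approach, assuming a Scala JAR task; the --on-databricks flag name and the App object are hypothetical, and classpath details differ between Databricks Connect versions:

```scala
import org.apache.spark.sql.SparkSession

object App {
  def main(args: Array[String]): Unit = {
    // The workflow job passes --on-databricks as a task parameter;
    // local runs omit it. (Flag name is illustrative.)
    val onDatabricks = args.contains("--on-databricks")

    val spark =
      if (onDatabricks) SparkSession.builder().getOrCreate()
      else com.databricks.connect.DatabricksSession.builder().getOrCreate()

    // ... application logic using spark ...
  }
}
```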

 

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @dollyb, when distinguishing between the environments your Spark session can run in, especially when moving from local development to a workflow job, it's essential to have robust detection.

Here are some approaches you can consider:

  1. Cluster Context Using Notebook Context: on a Databricks cluster, dbutils.notebook.getContext exposes information about the current execution context, including the job run, which is not available during local development.

  2. Spark Session Isolation:

  3. Check for Streaming Tab in Spark UI:

  4. Environment-Specific Configuration Parameters:

    • Consider using environment-specific configuration parameters.
    • For example, pass a parameter (as you’re currently doing) that indicates whether the app is running on a Databricks cluster or not.
    • This approach provides flexibility and allows you to adapt to different environments; see the sketch after this list for a variant that needs no hand-passed flag.
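As a variant of option 4, here is a minimal sketch (in Scala, matching the question): Databricks Runtime sets the DATABRICKS_RUNTIME_VERSION environment variable on its nodes, so its presence indicates the code is running on a cluster.

```scala
// True on a Databricks cluster: Databricks Runtime sets
// DATABRICKS_RUNTIME_VERSION in the environment; a local development
// machine normally does not.
def onDatabricksCluster: Boolean =
  sys.env.contains("DATABRICKS_RUNTIME_VERSION")
```

With a check like this the job no longer needs an explicit parameter, though keeping the parameter as an override can still be handy for testing.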

Remember that the choice depends on your specific use case and requirements. By combining these methods, you can create a robust mechanism to detect whether your Spark session is running in a Databricks cluster or elsewhere. 🚀

 


2 REPLIES


dollyb
New Contributor III

Thanks, dbutils.notebook.getContext does indeed contain information about the job run.
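For later readers, a hedged sketch of that check for a Scala job; DBUtilsHolder is the usual way to reach dbutils outside a notebook cell, and currentRunId appears to be populated only inside a job run, but verify both against your runtime version:

```scala
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

// currentRunId is defined only when this code executes as part of a
// workflow job run; interactive and local runs should see None.
// (Assumption: field names match your DBR version; inspect
// dbutils.notebook.getContext.toJson if in doubt.)
val isJobRun: Boolean = dbutils.notebook.getContext.currentRunId.isDefined
```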
