
Long run time with %run command

cmilligan
Contributor II

My team has started to see long run times on cells that use the %run command to run another notebook. The notebook that we are calling with %run only contains variable assignments, function definitions, and library imports. In some cases I have seen run times in excess of 10 minutes, which isn't behavior I would expect from a notebook that doesn't actually run anything.

Has anyone else run into this and how have you resolved it?

7 REPLIES

Kaniz
Community Manager

Hi @cmilligan,

Long run times with the %run command can be due to notebook size and complexity, Databricks cluster load, and network latency.

- The %run command executes another notebook immediately, making its functions and variables available in the calling notebook.
- Execution time can increase if there are many or complex operations in the notebook.

To resolve this issue:

- Optimize the notebook: review the notebook and optimize or remove any unnecessary operations.
- Increase cluster resources: consider increasing your Databricks cluster resources or using a separate cluster.
- Use dbutils.notebook.run() instead of %run: it starts a new job to run the notebook, which might be more efficient. However, it does not make the functions and variables of the called notebook available in the calling notebook.

Example of using dbutils.notebook.run():

    dbutils.notebook.run("My Other Notebook", 60)

This runs the notebook "My Other Notebook" and throws an exception if it does not finish within 60 seconds.
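Because dbutils.notebook.run() raises an exception on failure or timeout, a caller may want to wrap it in a small retry loop. The following is a minimal sketch, not a definitive implementation: the helper name `run_with_retry` and its `max_retries` parameter are hypothetical, and `dbutils` is passed in explicitly so the function can also be exercised outside a Databricks notebook.

```python
def run_with_retry(dbutils, notebook_path, timeout_seconds, arguments=None, max_retries=2):
    """Call dbutils.notebook.run, retrying up to max_retries times on any exception.

    Hypothetical helper: the name and the max_retries parameter are
    illustrative assumptions, not part of the Databricks API.
    """
    arguments = arguments or {}
    attempt = 0
    while True:
        try:
            # Returns the string the child notebook passes to dbutils.notebook.exit().
            return dbutils.notebook.run(notebook_path, timeout_seconds, arguments)
        except Exception as exc:
            if attempt >= max_retries:
                raise  # Out of retries: surface the last failure to the caller.
            attempt += 1
            print(f"Attempt {attempt} failed ({exc}); retrying")


# Usage inside a Databricks notebook, where `dbutils` is predefined:
# result = run_with_retry(dbutils, "My Other Notebook", 60)
```

Retrying only helps with transient failures. A notebook that consistently exceeds its timeout will fail the same way on every attempt.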

Thank you for the reply, @Kaniz.

I don't think the issue is with the performance of the notebook that we're calling with %run. The only things in this notebook are reusable Python functions and simple variable assignments (text strings, passwords, static lists, etc.). Most of the time, running this notebook with %run takes a couple of seconds or less. Lately, when there is high demand on the cluster, it can take upwards of 10 minutes, or we just get connection timeout failures. This tends to happen more when the command runs as part of a larger job or while another job is running on the cluster.

@Kaniz, it seems like the issue is with accessing secrets in the secret scope, which is one of the first commands in the notebook. I was testing with a user who doesn't have access to the secret scope. I would expect the command to fail quickly since they don't have access, but it still runs for a long time.
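One way to check whether the secret lookup is where the time goes is to time that call on its own, separately from the rest of the %run notebook. The sketch below assumes nothing beyond the standard library. `time_call` is a hypothetical helper, and the scope and key names in the usage comment are placeholders.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn and return (result, error, elapsed_seconds).

    Exceptions are captured rather than raised, so a slow failure
    (for example a permission error) can be timed as well.
    """
    start = time.perf_counter()
    try:
        result, error = fn(*args, **kwargs), None
    except Exception as exc:
        result, error = None, exc
    elapsed = time.perf_counter() - start
    return result, error, elapsed


# Usage inside a Databricks notebook (scope and key names are placeholders):
# _, err, secs = time_call(dbutils.secrets.get, scope="my-scope", key="my-key")
# print(f"secret lookup took {secs:.1f}s, error={err!r}")
```

If the lookup alone takes minutes, the delay is in secret access rather than in %run itself, which narrows down what to look for in the driver logs.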

Debayan
Esteemed Contributor III

Hi @cmilligan, can you try this with another user? Also, with a different notebook and cluster? Which DBR version is running now?

Hi @Debayan,

I've tried this with multiple users and notebooks. We've also used multiple clusters, one with 10.4 LTS and the other with 13.3 LTS. The issue is still happening.

Debayan
Esteemed Contributor III

Hi, do you see anything suspicious in the log4j section of the cluster driver logs?

@Debayan I'm not really sure. I haven't read the logs before, and looking through them I can't tell whether anything should stand out.
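For anyone unfamiliar with the driver logs, a common first pass is to keep only the warning and error lines and read those. The sketch below assumes the log has been downloaded as a text file and that the level appears as a standalone token (`WARN`, `ERROR`) on each line. That matches the usual log4j layout but is an assumption here. `filter_log_lines` and the file name in the usage comment are hypothetical.

```python
def filter_log_lines(lines, levels=("WARN", "ERROR")):
    """Return the lines that contain one of the given levels as a standalone token.

    Assumes whitespace-separated log lines in which the level is its own
    token; adjust the matching if the actual log layout differs.
    """
    wanted = set(levels)
    return [line.rstrip("\n") for line in lines if wanted & set(line.split())]


# Usage with a locally downloaded driver log (file name is a placeholder):
# with open("driver_log4j.txt") as f:
#     for line in filter_log_lines(f):
#         print(line)
```

Comparing the timestamps of the surviving lines against the time the slow %run cell started can help show whether anything was logged during the delay.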
