02-28-2022 04:05 AM
Hello,
I want to run some notebooks from notebook "A".
Regardless of the contents of the called notebook, it runs for a long time (20 seconds). This overhead is constant and I do not know why it takes so long.
I tried running a simple notebook with one input parameter that only prints it; it takes the same 20 seconds.
I use this method:
notebook_result = dbutils.notebook.run("notebook_name", 60, {"key1": "value1", "key2": "value2"})
The notebooks are in the same folder and run on the same cluster (a really good cluster).
Could someone explain why it takes so long and how I can speed it up?
Best regards,
Łukasz
02-28-2022 08:59 AM
I guess the creation of the Spark session accounts for the 20 seconds.
02-28-2022 09:50 AM
I believe that dbutils.notebook.run creates a new session, so there is a little more overhead. If you do not want to create a new session, you can use
%run <NOTEBOOK PATH>
This will execute the notebook inline with the same session as the parent notebook. Note that this shares the session so if you define variables or functions in the child notebook they will be available in the parent notebook.
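For example, a minimal sketch of the shared session (the ./child path and the variable name are illustrative assumptions, not from this thread):
# In the child notebook (./child):
greeting = "hello from the child notebook"
# In the parent notebook, %run goes in its own cell:
%run ./child
# In a later parent cell, the child's variable is already defined:
print(greeting)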
Also, if you are trying to orchestrate notebooks, you should use the task orchestration available in the Databricks Jobs UI.
03-01-2022 12:41 AM
Hello Ryan,
Thank you for the response.
Now I understand.
However, is there any way to pass inputs to and get outputs from the notebook using this method?
Best regards,
Łukasz
03-04-2022 09:01 AM
I do not believe you can get outputs from dbutils.notebook.exit. But you could potentially drop a file locally with the values and read it in the other notebook, or save them as variables and access those variables.
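A hedged sketch of the file-based approach (the DBFS path and the result keys are made up for illustration): the child writes a small JSON file before exiting, and the parent reads it back after dbutils.notebook.run returns.
import json
# Child notebook: persist results to DBFS via the local FUSE mount, then exit
results = {"rows_processed": 42}
with open("/dbfs/tmp/child_results.json", "w") as f:
    json.dump(results, f)
dbutils.notebook.exit("done")
# Parent notebook: run the child, then read the file it wrote
dbutils.notebook.run("child_notebook", 60)
with open("/dbfs/tmp/child_results.json") as f:
    child_results = json.load(f)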
03-01-2022 12:33 AM
You can also just use Files in Repos and import the needed library/class into your notebook.
If you run 2 notebooks in parallel, it is good to reserve resources for each of them using the scheduler pool option:
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "notebook1")
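A hedged sketch of running two notebooks in parallel, each in its own pool (run_in_pool, the notebook names, and the parameters are illustrative assumptions): the scheduler pool is a thread-local property, so it is set inside each worker thread.
from concurrent.futures import ThreadPoolExecutor
# Hypothetical helper: assign a scheduler pool to this thread, then run the notebook
def run_in_pool(notebook_path, pool_name, params):
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", pool_name)
    return dbutils.notebook.run(notebook_path, 60, params)
with ThreadPoolExecutor(max_workers=2) as executor:
    f1 = executor.submit(run_in_pool, "notebook1", "notebook1", {"key": "a"})
    f2 = executor.submit(run_in_pool, "notebook2", "notebook2", {"key": "b"})
    results = [f1.result(), f2.result()]
Note that this parallelizes the runs but does not remove the per-run startup overhead discussed above.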
03-04-2022 05:48 AM
Hello Hubert,
Thank you for the response.
I am not sure if it works for me.
I run the same notebook several times in a loop, something like this:
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "My_Notebook")
for row in data:
    notebook_results = dbutils.notebook.run("My_Notebook", 60, {"data": row})
And yet the startup time for each notebook run is still several seconds.
Could you tell me what is wrong with this solution?
Best regards,
Łukasz
03-09-2022 01:10 AM
Okay, I was not able to use the same session for both notebooks (parent and child).
So my solution is to use %run ./notebook_name.
I put all the code into functions, and now I can call them.
Example:
# Child notebook
def do_something(param1, param2):
    # some code ...
    return result_value

# Parent notebook
# some code ...
%run ./children_notebook
# some code ...
function_result = do_something(value_1, value_2)
# some code ...
Thanks to everyone for the answers