How do I define & run jobs that execute scripts that are copied inside a custom Databricks container?

Thijs
New Contributor III

Hi all, we are building custom Databricks containers (https://docs.databricks.com/clusters/custom-containers.html). During the container build process we install dependencies and also Python source code scripts. We now want to run some of these scripts as jobs, ideally also providing command-line arguments. However, when creating jobs, there doesn't seem to be a way to reference code that is inside the container. Any ideas?

3 REPLIES 3

Anonymous
Not applicable

@Thijs van den Berg​ :

When creating a job in Databricks, you can reference code that is inside the container by using the dbutils module. Here's an example of how you could reference a Python file called myscript.py that is located in the /opt/myapp directory of the container:

import os
 
# Copy the script from the container's local filesystem to a DBFS mount point
dbutils.fs.cp("file:/opt/myapp/myscript.py", "dbfs:/mnt/my-mount-point/myscript.py")
 
# Run the script via the DBFS FUSE mount, passing command-line arguments
os.system("python /dbfs/mnt/my-mount-point/myscript.py arg1 arg2 arg3")

In this example, we first copy the myscript.py file from the container file system to a DBFS mount point using the dbutils.fs.cp() method. Then we run the Python script with os.system(), passing in any command-line arguments. You can also use the databricks-cli to automate the creation of jobs and the upload of files to DBFS. Here's an example:

databricks fs cp /opt/myapp/myscript.py dbfs:/mnt/my-mount-point/myscript.py
databricks jobs create --json '{
  "name": "My Job",
  "max_retries": 0,
  "existing_cluster_id": "<your-cluster-id>",
  "spark_python_task": {
    "python_file": "dbfs:/mnt/my-mount-point/myscript.py",
    "parameters": ["arg1", "arg2", "arg3"]
  }
}'

This example uses the databricks-cli to copy the myscript.py file to DBFS and then creates a new job from a JSON job specification whose spark_python_task runs the script with command-line arguments.

I hope this helps! Let me know if you have any further questions.

Thijs
New Contributor III

thanks @Suteja Kanuri​ for answering. The question I asked was about scheduling/running job scripts that reside inside the container through the web interface: Workflows > Jobs > Create Job.

What we ended up doing was to package our job scripts into a Python module and pip install that module into the container. That allowed us to create a job of type "Python Wheel", and then use the package name and entry point to point to the job code we stored in our module inside the container.
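For anyone doing the same, here is a minimal sketch of the packaging side. The package name ("myjobs"), module path, and entry point name ("run-my-job") are illustrative placeholders, not our actual names:

```python
# setup.py for the module that gets pip-installed into the container image.
# "myjobs" and "run-my-job" are placeholder names.
from setuptools import setup, find_packages

setup(
    name="myjobs",
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        # The "Python Wheel" job task references this distribution by
        # package name ("myjobs") and entry point ("run-my-job").
        # "run-my-job" must resolve to a zero-argument-callable, e.g.
        # a main() function in myjobs/main.py that reads sys.argv itself.
        "console_scripts": [
            "run-my-job = myjobs.main:main",
        ],
    },
)
```

In the Create Job form, set the task type to "Python Wheel", the package name to myjobs, and the entry point to run-my-job; anything you put in the task's Parameters field arrives in the entry-point function via sys.argv.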

Anonymous
Not applicable

Hi @Thijs van den Berg​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
