How do I define and run jobs that execute scripts copied into a custom Databricks container?
05-12-2023 03:35 AM
Hi all, we are building custom Databricks containers (https://docs.databricks.com/clusters/custom-containers.html). During the container build process we install dependencies and also Python source code scripts. We now want to run some of these scripts as jobs, ideally also providing command line arguments. However, when creating jobs, there doesn't seem to be a way to reference code that is inside the container. Any ideas?
Labels: Container, Custom Docker Image, JOBS
05-13-2023 09:06 AM
@Thijs van den Berg :
When creating a job in Databricks, you can reference code that is inside the container from a notebook by using the dbutils module. Here's an example of how you could reference a Python file called myscript.py that is located in the /opt/myapp directory of the container:
import subprocess
# Copy the script from the container's local file system to DBFS.
dbutils.fs.cp("file:/opt/myapp/myscript.py", "dbfs:/mnt/my-mount-point/myscript.py")
# Run the copy through the /dbfs mount; check=True raises an error if the script fails.
subprocess.run(["python", "/dbfs/mnt/my-mount-point/myscript.py", "arg1", "arg2", "arg3"], check=True)
In this example, we first copy the myscript.py file from the container file system to a DBFS mount point using the dbutils.fs.cp() method. Then we run the Python script with subprocess.run(), passing in the command line arguments as list items.
You can also use the databricks-cli to automate the creation of jobs and the upload of files to DBFS. Here's an example:
databricks fs cp /opt/myapp/myscript.py dbfs:/mnt/my-mount-point/myscript.py
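databricks jobs create --json '{"name": "My Job", "existing_cluster_id": "<cluster-id>", "spark_python_task": {"python_file": "dbfs:/mnt/my-mount-point/myscript.py", "parameters": ["arg1", "arg2", "arg3"]}, "max_retries": 0}'
This example uses the databricks-cli to copy the myscript.py file from the machine where the CLI runs to DBFS and then creates a new job with a Python task that runs the script with command line arguments. The CLI takes the job settings as JSON; <cluster-id> is a placeholder for the ID of a cluster that uses your custom container, and the flat layout shown assumes version 2.0 of the Jobs API (version 2.1 nests the task settings in a "tasks" list).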
I hope this helps! Let me know if you have any further questions.
05-15-2023 02:19 AM
Thanks @Suteja Kanuri for answering. The question I asked was about scheduling and running job scripts that reside inside the container through the web interface: Workflows > Jobs > Create Job.
What we ended up doing was packaging our job scripts into a Python module and pip installing that module into the container. That allowed us to create a job of type "Python Wheel" and then use the package name and entry point to point to the job code stored in our module inside the container.
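For anyone wanting to try the same approach, here is a minimal sketch of what such an entry point could look like. It is not our actual code: the package name myjobs, the function run_job, and the --input-path and --dry-run arguments are all hypothetical. It also assumes the Python Wheel task hands the job's parameters to the entry point as command line arguments, so the function reads them from sys.argv.

```python
"""Hypothetical module myjobs/jobs.py, pip-installed into the container image."""
import argparse
import sys


def parse_args(argv):
    """Parse the parameters configured on the job (argument names are illustrative)."""
    parser = argparse.ArgumentParser(prog="run_job")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--dry-run", action="store_true")
    return parser.parse_args(argv)


def build_message(args):
    """Stand-in for the real job logic: describe what would be processed."""
    mode = "dry run" if args.dry_run else "full run"
    return f"{mode} on {args.input_path}"


def run_job(argv=None):
    """Entry point: callable without arguments, falling back to sys.argv."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    print(build_message(args))


# In the package's setup.py (also hypothetical), the entry point could be
# registered by name, for example as a console script:
#
#     entry_points={"console_scripts": ["run_job = myjobs.jobs:run_job"]}
```

With a layout like this, the job form would take myjobs as the package name and run_job as the entry point, and the parameters field would carry something like ["--input-path", "/data/in"]. Keeping the argument parsing in its own function makes the job easy to test locally without a cluster.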
05-13-2023 06:01 PM
Hi @Thijs van den Berg
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!

