Copy a library into the folder of script ran in workflow job

gehbiszumeis — Thu, 15 May 2025 14:22:48 GMT

I have a Python script that runs as a task in a Databricks workflow job using the Git integration. Originally, the repo contained a git submodule with a library (not supported by Databricks). Therefore I need to copy the library repo (which I have in my workspace folder, for example) into the folder of my run script. From various printouts I learned that the current working directory, where the run script lies, is e.g. /Workspace/Repos/.internal/634b4e4c6d_commits/e6a9c03cac7f1b0732e28df085ff3e93f5cf0674/./runscript.py

This is likely a dynamic folder path that differs in every job task run. How can I get this path at runtime? Would the best solution be an init script that obtains this path and copies the library into it?

Re: Copy a library into the folder of script ran in workflow job

lingareddy_Alva — Thu, 15 May 2025 14:48:27 GMT

Hi @gehbiszumeis

This is a common challenge when working with Databricks jobs and git integration, especially when you need additional libraries that aren't directly supported by Databricks.
You're right that the working directory path is dynamically generated for each job run. Let me help you create a solution to get this path during runtime and copy your library.

There are a few approaches you can take:
1. Using Python's os and inspect modules
The simplest approach is to use Python's built-in modules to determine the directory of your running script:
import os
import inspect

# Get the current script's directory
current_file = inspect.getfile(inspect.currentframe())
current_dir = os.path.dirname(os.path.abspath(current_file))

print(f"Current script directory: {current_dir}")

# Now you can copy your library to this location

2. Using __file__ variable
Even simpler, you can use the __file__ variable which contains the path to the current script:

import os

# Get the current script's directory
current_dir = os.path.dirname(os.path.abspath(__file__))

print(f"Current script directory: {current_dir}")
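
One caveat: __file__ is not defined in every execution context (an interactive interpreter session, for instance). A minimal sketch of a fallback follows. The helper name resolve_script_dir is hypothetical, and falling back to the current working directory is an assumption that only helps if the job starts in the script's folder, as your printouts suggest:

```python
import os

def resolve_script_dir(module_globals):
    """Return the directory of the running script, or the cwd as a fallback.

    `module_globals` is the caller's globals() dict; "__file__" is looked up
    there so that a missing value does not raise NameError.
    """
    file_path = module_globals.get("__file__")
    if file_path:
        return os.path.dirname(os.path.abspath(file_path))
    # Assumption: without __file__, the working directory is the script folder
    return os.getcwd()

current_dir = resolve_script_dir(globals())
print(f"Current script directory: {current_dir}")
```

Passing globals() in explicitly keeps the helper reusable from any module.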

3. Creating an init script
If you want to set up an initialization script that runs before your main script, you can create a file that:
-- Determines the script location
-- Copies the library from your workspace to that location
-- Sets up any necessary environment variables
Here's an example of what such an init script might look like:

# init_script.py
import os
import shutil
import sys

def setup_environment():
    # Get the directory that contains this file
    current_dir = os.path.dirname(os.path.abspath(__file__))
    print(f"Current directory: {current_dir}")

    # Source library location in workspace
    library_source = "/Workspace/path/to/your/library"

    # Destination in the current directory
    library_dest = os.path.join(current_dir, "library_name")

    # Copy the library if it doesn't exist
    if not os.path.exists(library_dest):
        print(f"Copying library from {library_source} to {library_dest}")
        shutil.copytree(library_source, library_dest)

    # To import library_name as a package, its parent directory
    # (not the package folder itself) must be on the Python path
    if current_dir not in sys.path:
        sys.path.append(current_dir)
        print(f"Added {current_dir} to Python path")

if __name__ == "__main__":
    setup_environment()

Then in your main script, you would import and run this init script first:
# main_script.py
import init_script

# Run the setup
init_script.setup_environment()

# Now continue with your main script logic
# You can now import from your library
from library_name import some_module
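
If you would rather not touch the main script at all, a variation on the same idea is a separate launcher that does the setup and then executes the unmodified script. This is only a sketch under assumptions: the function name launch and the file name launcher.py are hypothetical, the job task would have to point at the launcher instead of the run script, and the checkout folder must be writable for the copy to succeed. It uses runpy.run_path from the standard library:

```python
# launcher.py (hypothetical file name)
import os
import runpy
import shutil
import sys

def launch(script_path, library_source, library_name):
    """Copy the library next to the script, then run the script unmodified."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    library_dest = os.path.join(script_dir, library_name)

    # Copy only once; an existing copy is left untouched
    if not os.path.exists(library_dest):
        shutil.copytree(library_source, library_dest)

    # The package's parent directory must be importable
    if script_dir not in sys.path:
        sys.path.insert(0, script_dir)

    # run_name="__main__" makes any `if __name__ == "__main__":` block execute
    return runpy.run_path(script_path, run_name="__main__")

# Example call (paths are placeholders, as in the init script above):
# here = os.path.dirname(os.path.abspath(__file__))
# launch(os.path.join(here, "runscript.py"),
#        "/Workspace/path/to/your/library", "library_name")
```

The run script then stays identical across environments, and only the launcher knows about the copy step.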

Re: Copy a library into the folder of script ran in workflow job

gehbiszumeis — Fri, 16 May 2025 08:06:01 GMT

Thank you @lingareddy_Alva for your reply. Is it possible to have the path identification and copying done in a bash init script? I'd like to keep my run file clean, as it is also supposed to run in other environments.
