<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Copy a library into the folder of script ran in workflow job in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/copy-a-library-into-the-folder-of-script-ran-in-workflow-job/m-p/119424#M45874</link>
    <description>&lt;P&gt;Thank you &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;for your reply. Is it possible to have the path identification and copying done in a bash init script? I'd like to keep my run file clean, as it is also supposed to run in other environments.&lt;/P&gt;</description>
    <pubDate>Fri, 16 May 2025 08:06:01 GMT</pubDate>
    <dc:creator>gehbiszumeis</dc:creator>
    <dc:date>2025-05-16T08:06:01Z</dc:date>
    <item>
      <title>Copy a library into the folder of script ran in workflow job</title>
      <link>https://community.databricks.com/t5/data-engineering/copy-a-library-into-the-folder-of-script-ran-in-workflow-job/m-p/119339#M45844</link>
      <description>&lt;P&gt;I have a Python script that runs as a task in a Databricks workflow job using the Git integration. Originally, the repo contained a Git submodule with a library (submodules are not supported by Databricks). I therefore need to copy the library repo (which I have, for example, in my workspace folder) into the folder of my run script. From various printouts I learned that the run script is located at a path such as&amp;nbsp;&lt;SPAN&gt;/Workspace/Repos/.internal/634b4e4c6d_commits/e6a9c03cac7f1b0732e28df085ff3e93f5cf0674/./runscript.py&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;This is likely a dynamic folder path that differs in every job task run. How can I get this path at runtime? Would the best solution be an init script that obtains this path and copies the library into it?&lt;/P&gt;</description>
      <pubDate>Thu, 15 May 2025 14:22:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/copy-a-library-into-the-folder-of-script-ran-in-workflow-job/m-p/119339#M45844</guid>
      <dc:creator>gehbiszumeis</dc:creator>
      <dc:date>2025-05-15T14:22:48Z</dc:date>
    </item>
    <item>
      <title>Re: Copy a library into the folder of script ran in workflow job</title>
      <link>https://community.databricks.com/t5/data-engineering/copy-a-library-into-the-folder-of-script-ran-in-workflow-job/m-p/119350#M45847</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/164607"&gt;@gehbiszumeis&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is a common challenge when working with Databricks jobs and Git integration, especially when you need additional libraries that aren't directly supported by Databricks.&lt;BR /&gt;You're right that the working directory path is dynamically generated for each job run. Let me help you create a solution to get this path at runtime and copy your library.&lt;/P&gt;&lt;P&gt;There are a few approaches you can take:&lt;BR /&gt;&lt;STRONG&gt;1. Using Python's os and inspect modules&lt;/STRONG&gt;&lt;BR /&gt;One approach is to use Python's standard-library modules to determine the directory of your running script:&lt;BR /&gt;import os&lt;BR /&gt;import inspect&lt;/P&gt;&lt;P&gt;# Get the current script's directory&lt;BR /&gt;current_file = inspect.getfile(inspect.currentframe())&lt;BR /&gt;current_dir = os.path.dirname(os.path.abspath(current_file))&lt;/P&gt;&lt;P&gt;print(f"Current script directory: {current_dir}")&lt;/P&gt;&lt;P&gt;# Now you can copy your library to this location&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. Using the __file__ variable&lt;/STRONG&gt;&lt;BR /&gt;Even simpler, you can use the __file__ variable, which contains the path to the current script (it is defined for scripts and imported modules, but not in notebook cells):&lt;/P&gt;&lt;P&gt;import os&lt;/P&gt;&lt;P&gt;# Get the current script's directory&lt;BR /&gt;current_dir = os.path.dirname(os.path.abspath(__file__))&lt;/P&gt;&lt;P&gt;print(f"Current script directory: {current_dir}")&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3. Creating an init script&lt;/STRONG&gt;&lt;BR /&gt;If you want to set up an initialization script that runs before your main script, you can create a file that:&lt;BR /&gt;-- Determines the script location&lt;BR /&gt;-- Copies the library from your workspace to that location&lt;BR /&gt;-- Makes the copied library importable by extending the Python path&lt;BR /&gt;Here's an example of what such an init script might look like:&lt;/P&gt;&lt;P&gt;# init_script.py&lt;BR /&gt;import os&lt;BR /&gt;import shutil&lt;BR /&gt;import sys&lt;/P&gt;&lt;P&gt;def setup_environment():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Get the directory that contains this file&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;current_dir = os.path.dirname(os.path.abspath(__file__))&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(f"Current directory: {current_dir}")&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Source library location in workspace&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;library_source = "/Workspace/path/to/your/library"&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Destination in the current directory&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;library_dest = os.path.join(current_dir, "library_name")&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Copy the library if it doesn't exist&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if not os.path.exists(library_dest):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(f"Copying library from {library_source} to {library_dest}")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;shutil.copytree(library_source, library_dest)&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Add the directory that contains library_name to the Python path,&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# so that "from library_name import ..." resolves&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if current_dir not in sys.path:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;sys.path.append(current_dir)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(f"Added {current_dir} to Python path")&lt;/P&gt;&lt;P&gt;if __name__ == "__main__":&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;setup_environment()&lt;/P&gt;&lt;P&gt;Then in your main script, you would import and run this init script first:&lt;BR /&gt;# main_script.py&lt;BR /&gt;import init_script&lt;/P&gt;&lt;P&gt;# Run the setup&lt;BR /&gt;init_script.setup_environment()&lt;/P&gt;&lt;P&gt;# Now continue with your main script logic&lt;BR /&gt;# You can now import from your library&lt;BR /&gt;from library_name import some_module&lt;/P&gt;</description>
      <pubDate>Thu, 15 May 2025 14:48:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/copy-a-library-into-the-folder-of-script-ran-in-workflow-job/m-p/119350#M45847</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-05-15T14:48:27Z</dc:date>
    </item>
    <item>
      <title>Re: Copy a library into the folder of script ran in workflow job</title>
      <link>https://community.databricks.com/t5/data-engineering/copy-a-library-into-the-folder-of-script-ran-in-workflow-job/m-p/119424#M45874</link>
      <description>&lt;P&gt;Thank you &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;for your reply. Is it possible to have the path identification and copying done in a bash init script? I'd like to keep my run file clean, as it is also supposed to run in other environments.&lt;/P&gt;</description>
      <pubDate>Fri, 16 May 2025 08:06:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/copy-a-library-into-the-folder-of-script-ran-in-workflow-job/m-p/119424#M45874</guid>
      <dc:creator>gehbiszumeis</dc:creator>
      <dc:date>2025-05-16T08:06:01Z</dc:date>
    </item>
  </channel>
</rss>

