<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Accessing workspace files within cluster init script in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3184#M337</link>
    <description>&lt;P&gt;&lt;A href="https://community.databricks.com/s/feed/0D58Y0000AQjUAoSQN" alt="https://community.databricks.com/s/feed/0D58Y0000AQjUAoSQN" target="_blank"&gt;Here is a similar topic&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;We haven't figured it out yet, but it might be helpful for you.&lt;/P&gt;</description>
    <pubDate>Tue, 13 Jun 2023 13:08:58 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2023-06-13T13:08:58Z</dc:date>
    <item>
      <title>Accessing workspace files within cluster init script</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3183#M336</link>
      <description>&lt;P&gt;Greetings all!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am currently facing an issue accessing workspace files from an init script.&lt;/P&gt;&lt;P&gt;As explained in the documentation, it is possible to place an init script inside workspace files (&lt;A href="https://docs.databricks.com/clusters/init-scripts.html?_gl=1*12nlw4v*_gcl_au*MTQ0MzkxOTgwNC4xNjgyMzU2MjM3*_ga*MTQ5OTM1NTcyNS4xNjgyMzU2MjM3*_ga_PQSEQ3RZQC*MTY4NjY1Mzk5NC4zNy4xLjE2ODY2NTQ0MDEuMjYuMC4w&amp;amp;_ga=2.168239394.2022494050.1686645617-1499355725.1682356237#configure-a-cluster-scoped-init-script-using-the-ui" alt="https://docs.databricks.com/clusters/init-scripts.html?_gl=1*12nlw4v*_gcl_au*MTQ0MzkxOTgwNC4xNjgyMzU2MjM3*_ga*MTQ5OTM1NTcyNS4xNjgyMzU2MjM3*_ga_PQSEQ3RZQC*MTY4NjY1Mzk5NC4zNy4xLjE2ODY2NTQ0MDEuMjYuMC4w&amp;amp;_ga=2.168239394.2022494050.1686645617-1499355725.1682356237#configure-a-cluster-scoped-init-script-using-the-ui" target="_blank"&gt;link&lt;/A&gt;). This works perfectly fine, and the init script is actually executed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, it seems that it is not possible to reference a workspace file from the init script itself. For example, I placed a pyproject.toml file inside my workspace folder (/Workspace/Users/username@email.com/pyproject.toml), and accessing this pyproject.toml from within the init script fails.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I also tried to debug it by listing the root directory ("/") and the /Workspace directory during init script execution. "ls /" shows the /Workspace folder, but "ls /Workspace" throws an error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;ls: cannot open directory '/Workspace': Invalid argument&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm using Azure Databricks with a cluster I created myself, running Databricks Runtime 12.2 LTS ML. The workspace was created on the premium tier, and I'm an admin on this workspace.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/76414162/databricks-how-to-access-workspace-files-in-init-scripts" alt="https://stackoverflow.com/questions/76414162/databricks-how-to-access-workspace-files-in-init-scripts" target="_blank"&gt;As I see, others are also facing the same issue&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Gleb Smolnik&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jun 2023 12:56:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3183#M336</guid>
      <dc:creator>glebex</dc:creator>
      <dc:date>2023-06-13T12:56:16Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing workspace files within cluster init script</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3184#M337</link>
      <description>&lt;P&gt;&lt;A href="https://community.databricks.com/s/feed/0D58Y0000AQjUAoSQN" alt="https://community.databricks.com/s/feed/0D58Y0000AQjUAoSQN" target="_blank"&gt;Here is a similar topic&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;We haven't figured it out yet, but it might be helpful for you.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jun 2023 13:08:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3184#M337</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2023-06-13T13:08:58Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing workspace files within cluster init script</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3185#M338</link>
      <description>&lt;P&gt;@Gleb Smolnik​&amp;nbsp;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The init script runs on the cluster nodes before the notebook execution, and it does not have direct access to workspace files.&lt;/P&gt;&lt;P&gt;The documentation you mentioned refers to placing the init script inside a workspace file, which means you can store the script itself in a file within the Databricks workspace. However, it doesn't grant direct access to other workspace files from within the init script.&lt;/P&gt;&lt;P&gt;To access a workspace file within the init script, you can consider using the Databricks CLI or Databricks API to retrieve the file and then copy or read it on the cluster nodes during the init script execution.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Jun 2023 07:45:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3185#M338</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-14T07:45:52Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing workspace files within cluster init script</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3187#M340</link>
      <description>&lt;P&gt;Hey @Suteja Kanuri,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your answer. I understand your point. However, I can imagine a scenario in which the init script acts as an orchestrator, executing other shell scripts in a desired order. The documentation article I referenced (at least as I interpreted it) allows placing an init script in workspace files, which seems to imply that other workspace files will also be accessible during init script execution (which is not the case).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Anyway, I will try to figure it out using the suggestions you provided. It would obviously be nice to have workspace files mounted on the Databricks cluster before init script execution (I'm not sure whether that is on the feature roadmap, so this is just a suggestion).&lt;/P&gt;</description>
      <pubDate>Wed, 14 Jun 2023 08:31:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3187#M340</guid>
      <dc:creator>glebex</dc:creator>
      <dc:date>2023-06-14T08:31:37Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing workspace files within cluster init script</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3188#M341</link>
      <description>&lt;P&gt;@Gleb Smolnik&amp;nbsp;You might also want to try cloning a GitHub repo in your init script and storing dependencies such as requirements.txt files and other init scripts there. That way you can pull a whole set of init scripts into your cluster dynamically from a versioned source.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;init.sh&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;#!/bin/bash
# Clone the versioned repo that holds shared init scripts and dependency files
git clone &amp;lt;github repo url&amp;gt;my_repo.git
git -C ./my_repo checkout common_cluster_init  # optional: switch to a non-main branch
pip install -r ./my_repo/init/dbricks_clusters/requirements.txt  # install dependencies listed in the repo&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 20 Jun 2023 20:44:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/3188#M341</guid>
      <dc:creator>jacob_hill_prof</dc:creator>
      <dc:date>2023-06-20T20:44:01Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing workspace files within cluster init script</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/37133#M26271</link>
      <description>&lt;P&gt;Link isn't working anymore&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jul 2023 05:56:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/37133#M26271</guid>
      <dc:creator>FRG96</dc:creator>
      <dc:date>2023-07-07T05:56:26Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing workspace files within cluster init script</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/37137#M26272</link>
      <description>&lt;P&gt;Hi&amp;nbsp;@Anonymous&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/72963"&gt;@glebex&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to use the Databricks Workspace export REST API using curl in the init script to download a workspace file locally.&lt;BR /&gt;What's the recommended way to pass the Databricks Instance URL and the API Token to the init script execution context?&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jul 2023 07:28:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/37137#M26272</guid>
      <dc:creator>FRG96</dc:creator>
      <dc:date>2023-07-07T07:28:58Z</dc:date>
    </item>
    <item>
      <title>Re: Accessing workspace files within cluster init script</title>
      <link>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/37269#M26307</link>
      <description>&lt;P&gt;&lt;SPAN&gt;When we used the Databricks CLI, it did not copy the .txt file. As for the other workaround, the Databricks API, it works with DBFS, and we found no API for workspace files.&amp;nbsp;I just wanted to check whether there is another way to access workspace files within a cluster init script.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 10 Jul 2023 06:07:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/accessing-workspace-files-within-cluster-init-script/m-p/37269#M26307</guid>
      <dc:creator>Nitya</dc:creator>
      <dc:date>2023-07-10T06:07:08Z</dc:date>
    </item>
  </channel>
</rss>

