Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to copy notebooks from local to the target folder via asset bundles

BobCat62
New Contributor III

Hi all,

I am able to deploy Databricks assets to the target workspace. Jobs and workflows can also be created successfully.

But I have a special requirement: I need to copy the notebooks to a target folder in the Databricks workspace.

Example:

Locally I have this structure:

databricks/
├─ fixtures/
├─ src/
│ ├─ SubFolder1/
│ │ ├─ Notebook1.ipynb
│ ├─ SubFolder2/
│ │ ├─ Notebook2.ipynb
│ ├─ Notebook3.ipynb
├─ resources/
│ ├─ dab.job.yml.css
├─ .gitignore

On the Databricks side, I would like to have this:

Workspace/
├─ Repos/
├─ Shared/
├─ Users/
├─ SubFolder1/
│ ├─ Notebook1.ipynb
├─ SubFolder2/
│ ├─ Notebook2.ipynb
├─ Notebook3.ipynb

How can I configure the DAB so that the notebooks are copied to the desired folder in the workspace, and not to the root path with the other files?

8 REPLIES 8

ashraf1395
Honored Contributor

Hi @BobCat62, we meet again 😃,


I hope you are doing great.

You can deploy notebooks to your workspace, even ones that sit outside the folder containing databricks.yml (the bundle root path), using the sync paths mapping. By default, though, all these files go to your specified workspace root path.

If you want to send these files to a different target path, you can do it like this:

#databricks.yml

targets:
  test:
    workspace:
      file_path: /Workspace/
    sync:
      paths:
        - .src/subfolder1/*.ipynb
        - .src/subfolder2/*.ipynb
        - ./src/notebook3.ipynb

- I have placed the sync and workspace mappings inside a target mapping; you can put them at the root level as well.
- I listed the notebooks separately, but since they are all in the src directory you can simply use ./src/* instead.
- I have assumed that databricks.yml is in the root directory; if not, adjust the paths relative to it.

Here are the Databricks docs:
https://docs.databricks.com/aws/en/dev-tools/bundles/settings#file_path
https://docs.databricks.com/aws/en/dev-tools/bundles/settings#paths

BobCat62
New Contributor III

Hello @ashraf1395 ,

Nice to hear from you, and thank you for your hints.

Actually, with your idea I got halfway to my goal 😊

You can see the folder structure in my VS Code here:

BobCat62_0-1743023725878.png

and here is part of my `databricks.yml` file:

targets:
  dev:
    # The default target uses 'mode: development' to create a development copy.
    # - Deployed resources get prefixed with '[dev my_user_name]'
    # - Any job schedules and triggers are paused by default.
    mode: production
    #default: true
    workspace:
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
      file_path: /Workspace/
    sync:
      paths:
        - .src/emob1/*.ipynb
      exclude:
        - databricks/fixtures
        - databricks/resources
        - databricks/fixtures
        - databricks/.vscode  
        - databricks/src/.gitignor
        - databricks/src/README.md

If I deploy with this yml, I get this error:

Error: stat .src/emob1/*.ipynb: no such file or directory

 

If I remove this part:

 paths:
        - .src/emob1/*.ipynb
I can deploy successfully, but the result is not what I want:
BobCat62_1-1743024066327.png

I would like to have emob1 and emob2 directly under Workspace, not under Databricks/src.

 

Do you have any idea?

Thanks

 

ashraf1395
Honored Contributor

Hi @BobCat62, I made a small typo: it should be ./src/emob1/*.ipynb, or you can just use ./src/emob1/

If your databricks.yml file is outside the databricks folder, the path will be
./databricks/src/emob1/
With file_path set to /Workspace/, your files will be added under that location. The only drawback is that Databricks keeps the local path relative to the databricks.yml file when it syncs.

So if your databricks.yml file sits outside the databricks folder, i.e. next to it, your workspace structure will look like this:
/Workspace/databricks/src/emob1/
If you want emob1 directly under /Workspace, I don't see any way to change the bundle root location.

ashraf1395_0-1743054766106.png

 

If it is not mandatory to keep the emob1 and emob2 folders inside databricks/src, you can keep them next to your databricks.yml,
which won't be good practice, I guess.

If you can find a way to change the bundle root location, for example with an environment variable (though I haven't found anything like this anywhere), your goal can be achieved. As a quick workaround, you can keep your emob folders next to the databricks.yml file.
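
A minimal sketch of that workaround, assuming databricks.yml is moved so that it sits in the same folder as emob1/ and emob2/. The bundle name is a placeholder, and the resulting workspace paths are inferred from the behaviour described in this thread rather than tested:

```yaml
# databricks.yml, assumed to sit next to emob1/ and emob2/
bundle:
  name: my_bundle  # hypothetical name

targets:
  dev:
    workspace:
      # Bundle state and artifacts default to subfolders of this path
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
      # Synced files are written under this path
      file_path: /Workspace
    sync:
      paths:
        # Plain directory paths relative to databricks.yml (no glob patterns)
        - ./emob1
        - ./emob2
```

Since the synced files keep their path relative to databricks.yml, this should give /Workspace/emob1/ and /Workspace/emob2/. Writing directly under /Workspace also assumes the deploying identity is allowed to create folders there.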

BobCat62
New Contributor III

@ashraf1395 Thank you so much...

You mean that if I put the databricks.yml file inside the databricks folder and relocate my emob folders, I can achieve my goal?

BobCat62_0-1743077178783.png

 

As I understand it, it is not possible to relocate the databricks.yml file because it has to be in the root. Is my understanding correct?

ashraf1395
Honored Contributor

Right, @BobCat62, databricks.yml should be in the root location. If you change where your emob folders sit relative to databricks.yml, for example by moving the emob folders out and keeping them in the root, you can achieve it.

I don't know of any other solution to this.

kmodelew
New Contributor II

What are the permissions on this databricks directory? Can someone delete this directory or any file in it? In the Shared workspace everyone can delete bundle files or the bundle directory, even though in databricks.yml I granted permissions only to admins ('CAN_MANAGE').

permissions:
  - user_groups: admins
    level: CAN_MANAGE

I'm looking for a safe, common place for all bundles. I don't like the idea of keeping bundles in user workspaces (Databricks suggests this approach).

cmathieu
New Contributor III

Can you store it in a service principal's workspace? I read this somewhere in the documentation but can't find it again. 

ashraf1395
Honored Contributor

Hi @kmodelew, you can change workspace.root_path to /Workspace or any other common location you want, so that it can be viewed by everyone and accessed only by those who have the permissions.

 

By default the bundle path is /Workspace/Users/current_user....

You can find more information about this here: https://docs.databricks.com/aws/en/dev-tools/bundles/settings#workspace
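
As a rough sketch, a shared root path combined with a permissions mapping might look like the following. The /Workspace/bundles folder and the admins group are placeholders. Whether other users can still delete files depends on the access settings of the folder you pick, so treat this as a starting point rather than a verified setup:

```yaml
# databricks.yml (fragment)
workspace:
  # Hypothetical common folder instead of the per-user default
  root_path: /Workspace/bundles/${bundle.name}/${bundle.target}

permissions:
  # group_name is the key used for groups; "admins" is an assumed group
  - level: CAN_MANAGE
    group_name: admins
```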