The issue you’re facing is common among Databricks users who try to automate notebook cloning via shell commands or %sh magic, only to hit format loss: running databricks workspace export (or related commands) from %sh typically produces flat .py or .txt files, so the rich structure that .dbc or .ipynb formats preserve (cell boundaries, markdown cells, outputs, etc.) is lost and code cells come out as commented Python rather than a true notebook.
Why This Happens
- %sh commands are limited to file manipulations; they cannot perform workspace-level notebook operations in the native format.
- Exporting with the CLI: by default, the Databricks CLI exports notebooks as Python script files with cells delimited by comment markers (illustrated below), which does not preserve the notebook interface or markdown cells.
- There is no direct workspace API reachable from %sh for copying notebooks in their rich format.
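To make the loss concrete, a default SOURCE-format export turns a notebook into a plain .py file that looks roughly like the sketch below (illustrative only; the exact content depends on your notebook): markdown cells survive only as # MAGIC comments and cell boundaries only as # COMMAND ---------- markers.

```python
# Databricks notebook source
# MAGIC %md
# MAGIC ## My template notebook

# COMMAND ----------

# A code cell from the original notebook; cell outputs are not included at all.
df = spark.read.table("some_table")
display(df)
```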
Known Community Strategies
- Use the Databricks REST API: You can programmatically clone/duplicate notebooks in their original format using the Workspace API rather than file system or %sh commands. This approach preserves notebook cell structure and metadata.
  - Endpoints: /api/2.0/workspace/export and /api/2.0/workspace/import
  - You can drive these from dbutils.notebook.run()-based automation or via external tooling (Python scripts, CI/CD, etc.).
- Automate via the Databricks CLI (v0.18+): Upgrading to a newer CLI (version 0.18+), which supports rich notebook formats, may help.
  - For .dbc or Jupyter (.ipynb) output, you must specify the format explicitly in the export/import command (see the sketch after this list).
- Direct use of dbutils (if allowed): dbutils.fs.cp() only operates on file system paths, not on workspace notebooks, so it cannot copy a notebook with its cell structure.
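As a rough sketch of the CLI route (flags assume the legacy databricks-cli already configured with a token; the newer unified CLI uses slightly different syntax), the JUPYTER format round-trips the notebook as .ipynb and keeps markdown cells and cell boundaries intact:

```python
import subprocess

# Export the template as .ipynb; JUPYTER format preserves cell structure.
subprocess.run(
    [
        "databricks", "workspace", "export",
        "--format", "JUPYTER",
        "/Workspace/Templates/YourTemplateNotebook",
        "template.ipynb",
    ],
    check=True,
)

# Import the file back into the workspace as a new notebook.
subprocess.run(
    [
        "databricks", "workspace", "import",
        "--format", "JUPYTER",
        "--overwrite",
        "template.ipynb",
        "/Workspace/YourTargetFolder/NewNotebook",
    ],
    check=True,
)
```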
Example - Using Databricks REST API
Clone a notebook with cell structure:
- Export Notebook:

```python
import requests

workspace_url = "https://<your-databricks-instance>"
token = "<your-personal-access-token>"
headers = {"Authorization": f"Bearer {token}"}

params = {
    "path": "/Workspace/Templates/YourTemplateNotebook",
    "format": "SOURCE",  # or "DBC" for an archive, "JUPYTER" for .ipynb
}

response = requests.get(
    workspace_url + "/api/2.0/workspace/export",
    headers=headers,
    params=params,
)
response.raise_for_status()

# The exported notebook body is returned base64-encoded.
notebook_content = response.json()["content"]
```
- Import as new notebook:

```python
new_path = "/Workspace/YourTargetFolder/NewNotebook"

data = {
    "content": notebook_content,  # already base64-encoded from the export call
    "path": new_path,
    "format": "SOURCE",    # or "DBC" if you exported an archive
    "language": "PYTHON",  # required when format is "SOURCE"
    "overwrite": True,
}

response = requests.post(
    workspace_url + "/api/2.0/workspace/import",
    headers=headers,
    json=data,
)
response.raise_for_status()
```
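Putting the two calls together, a minimal sketch of a reusable helper might look like the following (clone_notebook and the example paths are hypothetical names for illustration, not an official API); it also shows how a numbered target path like NewNotebook_{i} would be generated:

```python
import requests

def clone_notebook(source_path, target_path, workspace_url, token, fmt="SOURCE"):
    """Copy a workspace notebook to a new path, preserving its cell structure."""
    headers = {"Authorization": f"Bearer {token}"}

    # Export the source notebook; the content comes back base64-encoded.
    export = requests.get(
        workspace_url + "/api/2.0/workspace/export",
        headers=headers,
        params={"path": source_path, "format": fmt},
    )
    export.raise_for_status()

    # Re-import the same content under the new path.
    payload = {
        "content": export.json()["content"],
        "path": target_path,
        "format": fmt,
        "overwrite": True,
    }
    if fmt == "SOURCE":
        payload["language"] = "PYTHON"  # required for SOURCE-format imports

    imported = requests.post(
        workspace_url + "/api/2.0/workspace/import",
        headers=headers,
        json=payload,
    )
    imported.raise_for_status()

# Example: stamp out several numbered copies of the template.
for i in range(3):
    clone_notebook(
        "/Workspace/Templates/YourTemplateNotebook",
        f"/Workspace/YourTargetFolder/NewNotebook_{i}",
        workspace_url="https://<your-databricks-instance>",
        token="<your-personal-access-token>",
    )
```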
Recommendations
- Programmatic cloning: Use the REST API as above from a separate automation script (Python notebook, workflow, or external job) rather than %sh commands.
- Jinja or widgets: Use notebook widgets to parameterize new notebook names/paths, then let your automation handle the copy logic outside the UI (see the sketch after this list).
- Sync with CI/CD: Store templates in git and automate syncing them into the Databricks workspace via the REST API as part of your deployment pipeline.
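As a rough illustration of the widget-driven approach (the widget names are arbitrary, and clone_notebook refers to the hypothetical helper sketched above), a small driver notebook could read its parameters from widgets and hand them to the cloning logic:

```python
# Run inside a Databricks notebook: widgets make the template and target path
# configurable from the UI or from a Jobs/Workflows task without editing code.
dbutils.widgets.text("template_path", "/Workspace/Templates/YourTemplateNotebook")
dbutils.widgets.text("new_notebook_path", "/Workspace/YourTargetFolder/NewNotebook")

template_path = dbutils.widgets.get("template_path")
new_notebook_path = dbutils.widgets.get("new_notebook_path")

# Hand off to the REST-API-based clone logic sketched earlier.
clone_notebook(
    template_path,
    new_notebook_path,
    workspace_url="https://<your-databricks-instance>",
    token=dbutils.secrets.get("my-scope", "databricks-token"),  # hypothetical secret scope/key
)
```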
Summary Table
| Approach | Preserves Cell Type | Automation-friendly | Supported in %sh |
|---|---|---|---|
| %sh CLI/FS commands | No | Yes | Yes |
| Databricks REST API | Yes | Yes | No (external) |
| dbutils.fs.cp | No | Yes | Yes |
| Manual UI Clone | Yes | No | N/A |
This limitation is acknowledged in the Databricks community. For official source-of-truth notebooks and automation, the recommended enterprise approach is the REST API or the Databricks CLI, with explicit attention to the export/import format.