The issue you’re facing is common among Databricks users who try to automate notebook cloning via shell commands or %sh magic, only to hit format loss: running databricks workspace export (or related commands) from %sh typically produces flat .py or .txt files, so the rich structure that .dbc or .ipynb formats preserve (cell boundaries, markdown cells, outputs, etc.) is lost and code cells come out as commented Python rather than a true notebook.
Why This Happens
- %sh commands are limited to file manipulations; they cannot perform workspace-level notebook operations in the native format.
- Exporting with the CLI: by default, the Databricks CLI exports notebooks as Python script files with cells delimited by comment markers (illustrated below), which does not preserve the notebook interface or markdown cells.
- There is no direct workspace API reachable from %sh for copying notebooks in their rich format.
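To make the loss concrete, a default SOURCE-format export turns a notebook into a plain .py file that looks roughly like the sketch below (illustrative only; the exact content depends on your notebook): markdown cells survive only as # MAGIC comments and cell boundaries only as # COMMAND ---------- markers.

```python
# Databricks notebook source
# MAGIC %md
# MAGIC ## My template notebook

# COMMAND ----------

# A code cell from the original notebook; cell outputs are not included at all.
df = spark.read.table("some_table")
display(df)
```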
Known Community Strategies
- Use the Databricks REST API: You can programmatically clone/duplicate notebooks in their original format using the Workspace API rather than file system or %sh commands. This approach preserves notebook cell structure and metadata.
  - Endpoints: /api/2.0/workspace/export and /api/2.0/workspace/import
  - You can drive these from dbutils.notebook.run()-based automation or via external tooling (Python scripts, CI/CD, etc.).
- Automate via the Databricks CLI (v0.18+): Upgrading to a newer CLI (version 0.18+), which supports rich notebook formats, may help.
  - For .dbc or Jupyter (.ipynb) output, you must specify the format explicitly in the export/import command (see the sketch after this list).
- Direct use of dbutils (if allowed): dbutils.fs.cp() only operates on file system paths, not on workspace notebooks, so it cannot copy a notebook with its cell structure.
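As a rough sketch of the CLI route (flags assume the legacy databricks-cli already configured with a token; the newer unified CLI uses slightly different syntax), the JUPYTER format round-trips the notebook as .ipynb and keeps markdown cells and cell boundaries intact:

```python
import subprocess

# Export the template as .ipynb; JUPYTER format preserves cell structure.
subprocess.run(
    [
        "databricks", "workspace", "export",
        "--format", "JUPYTER",
        "/Workspace/Templates/YourTemplateNotebook",
        "template.ipynb",
    ],
    check=True,
)

# Import the file back into the workspace as a new notebook.
subprocess.run(
    [
        "databricks", "workspace", "import",
        "--format", "JUPYTER",
        "--overwrite",
        "template.ipynb",
        "/Workspace/YourTargetFolder/NewNotebook",
    ],
    check=True,
)
```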
Example - Using Databricks REST API
Clone a notebook with cell structure:
- Export Notebook:

```python
import requests

workspace_url = "https://<your-databricks-instance>"
token = "<your-personal-access-token>"
headers = {"Authorization": f"Bearer {token}"}

params = {
    "path": "/Workspace/Templates/YourTemplateNotebook",
    "format": "SOURCE",  # or "DBC" for an archive, "JUPYTER" for .ipynb
}

response = requests.get(
    workspace_url + "/api/2.0/workspace/export",
    headers=headers,
    params=params,
)
response.raise_for_status()

# The exported notebook body is returned base64-encoded.
notebook_content = response.json()["content"]
```
- Import as new notebook:

```python
new_path = "/Workspace/YourTargetFolder/NewNotebook"

data = {
    "content": notebook_content,  # already base64-encoded from the export call
    "path": new_path,
    "format": "SOURCE",    # or "DBC" if you exported an archive
    "language": "PYTHON",  # required when format is "SOURCE"
    "overwrite": True,
}

response = requests.post(
    workspace_url + "/api/2.0/workspace/import",
    headers=headers,
    json=data,
)
response.raise_for_status()
```
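Putting the two calls together, a minimal sketch of a reusable helper might look like the following (clone_notebook and the example paths are hypothetical names for illustration, not an official API); it also shows how a numbered target path like NewNotebook_{i} would be generated:

```python
import requests

def clone_notebook(source_path, target_path, workspace_url, token, fmt="SOURCE"):
    """Copy a workspace notebook to a new path, preserving its cell structure."""
    headers = {"Authorization": f"Bearer {token}"}

    # Export the source notebook; the content comes back base64-encoded.
    export = requests.get(
        workspace_url + "/api/2.0/workspace/export",
        headers=headers,
        params={"path": source_path, "format": fmt},
    )
    export.raise_for_status()

    # Re-import the same content under the new path.
    payload = {
        "content": export.json()["content"],
        "path": target_path,
        "format": fmt,
        "overwrite": True,
    }
    if fmt == "SOURCE":
        payload["language"] = "PYTHON"  # required for SOURCE-format imports

    imported = requests.post(
        workspace_url + "/api/2.0/workspace/import",
        headers=headers,
        json=payload,
    )
    imported.raise_for_status()

# Example: stamp out several numbered copies of the template.
for i in range(3):
    clone_notebook(
        "/Workspace/Templates/YourTemplateNotebook",
        f"/Workspace/YourTargetFolder/NewNotebook_{i}",
        workspace_url="https://<your-databricks-instance>",
        token="<your-personal-access-token>",
    )
```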
Recommendations
- Programmatic cloning: Use the REST API as above from a separate automation script (Python notebook, workflow, or external job) rather than %sh commands.
- Jinja or widgets: Use notebook widgets to parameterize new notebook names/paths, then let your automation handle the copy logic outside the UI (see the sketch after this list).
- Sync with CI/CD: Store templates in git and automate syncing them into the Databricks workspace via the REST API as part of your deployment pipeline.
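As a rough illustration of the widget-driven approach (the widget names are arbitrary, and clone_notebook refers to the hypothetical helper sketched above), a small driver notebook could read its parameters from widgets and hand them to the cloning logic:

```python
# Run inside a Databricks notebook: widgets make the template and target path
# configurable from the UI or from a Jobs/Workflows task without editing code.
dbutils.widgets.text("template_path", "/Workspace/Templates/YourTemplateNotebook")
dbutils.widgets.text("new_notebook_path", "/Workspace/YourTargetFolder/NewNotebook")

template_path = dbutils.widgets.get("template_path")
new_notebook_path = dbutils.widgets.get("new_notebook_path")

# Hand off to the REST-API-based clone logic sketched earlier.
clone_notebook(
    template_path,
    new_notebook_path,
    workspace_url="https://<your-databricks-instance>",
    token=dbutils.secrets.get("my-scope", "databricks-token"),  # hypothetical secret scope/key
)
```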
Summary Table
| Approach | Preserves Cell Type | Automation-friendly | Supported in %sh |
|---|---|---|---|
| %sh CLI/FS commands | No | Yes | Yes |
| Databricks REST API | Yes | Yes | No (external) |
| dbutils.fs.cp | No | Yes | Yes |
| Manual UI Clone | Yes | No | N/A |
This limitation is acknowledged in the Databricks community. For official source-of-truth notebooks and automation, the recommended enterprise approach is the REST API or the Databricks CLI, with explicit attention to the export/import format.