Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Command output disappearing (not sure what the root cause is)

shaunangcx
New Contributor II

I have a workflow which runs every month and creates a new notebook containing the outputs from the main notebook. However, after some time, the outputs in the created notebook disappear. Is there any way I can retain the outputs?

1 ACCEPTED SOLUTION

Anonymous
Not applicable

@Shaun Ang:

There are a few possible reasons why the outputs from the created notebook might be disappearing:

  1. Notebook permissions: It's possible that the user or service account running the workflow does not have permission to write to the destination notebook. Make sure that identity has the necessary permissions to write to the notebook (a sketch for checking this and point 2 through the REST API follows this list).
  2. Notebook deletion: It's possible that the created notebook is being deleted or overwritten by another process or user. Make sure that the workflow is not deleting or overwriting the notebook, and that no other processes or users are deleting or overwriting the notebook.
  3. Notebook results not saved: Databricks stores a notebook's command results together with its saved revisions (much as Jupyter stores outputs inside the .ipynb file). If the created notebook only ever has a pending, unsaved revision, its outputs may not be kept. Check the notebook's revision history to see whether a revision containing the outputs was ever saved.
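
For points 1 and 2, one quick check (run from a Databricks notebook; the workspace host, secret scope/key, and notebook path below are placeholders to replace) is to ask the REST API whether the notebook still exists and who is allowed to edit it:

import requests

# Placeholders - replace the host, <scope>, <key>, and path for your workspace
host = "https://<your-workspace>.cloud.databricks.com"
token = dbutils.secrets.get(scope="<scope>", key="<key>")
headers = {"Authorization": f"Bearer {token}"}
notebook_path = "/path/to/new/notebook"

# Does the notebook still exist? get-status returns its object_id and type,
# or a 404 (RESOURCE_DOES_NOT_EXIST) if something has deleted it.
status = requests.get(
    f"{host}/api/2.0/workspace/get-status",
    headers=headers,
    params={"path": notebook_path},
)
status.raise_for_status()
object_id = status.json()["object_id"]

# Who can access it? The Permissions API returns the notebook's ACL, so you can
# confirm the identity running the workflow has CAN_EDIT or CAN_MANAGE on it.
perms = requests.get(f"{host}/api/2.0/permissions/notebooks/{object_id}", headers=headers)
perms.raise_for_status()
for entry in perms.json().get("access_control_list", []):
    print(entry)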

To retain the outputs, you can try the following:

  1. Save the created notebook in a different location or with a different name to avoid overwriting the original notebook.
  2. Use version control to track changes to the notebook and its outputs.
  3. Modify the workflow to include a step that saves the outputs to a separate file or database, rather than relying solely on the notebook itself (a minimal sketch of this follows below).
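
A minimal sketch of that last idea, writing the run's key results to a Delta table and a DBFS file from the workflow itself (the source table, target table, and path are just example names to adapt):

from pyspark.sql import functions as F

# Collect whatever results matter for this run; "my_source_table" is a hypothetical example
results_df = (
    spark.table("my_source_table")
         .agg(F.count("*").alias("row_count"))
         .withColumn("run_date", F.current_date())
)

# Append to a Delta table so every monthly run keeps its results...
results_df.write.format("delta").mode("append").saveAsTable("monitoring.monthly_run_outputs")

# ...and/or keep a small CSV summary in DBFS alongside it
dbutils.fs.put(
    "/FileStore/monthly_run_outputs/latest_summary.csv",
    results_df.toPandas().to_csv(index=False),
    overwrite=True,
)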

I hope this helps, and please let me know if you have any further questions or concerns.


3 REPLIES

shaunangcx
New Contributor II

To follow up on the discussion: when the new notebook with the command outputs is created, its revision history is empty and it has a pending revision. I have to manually click Save for the outputs to stay. Is there a way to automatically save that revision from the workflow so the outputs are retained?

Anonymous
Not applicable

@Shaun Ang:

Yes, you can use the Databricks Workspace REST API to capture the created notebook's contents programmatically, without any manual intervention.

One approach is to export the notebook through the Workspace API (the HTML export format keeps the rendered command results) and archive that export somewhere durable such as DBFS. Here's a sketch of how to do this; the workspace host, secret scope/key, and paths are placeholders to adapt:

import base64
import requests

# Workspace URL and a personal access token stored in a secret scope.
# Replace the host, <scope>, and <key> with values for your environment.
host = "https://<your-workspace>.cloud.databricks.com"
token = dbutils.secrets.get(scope="<scope>", key="<key>")
headers = {"Authorization": f"Bearer {token}"}

# Path of the notebook the workflow created
notebook_path = "/path/to/new/notebook"

# Export the notebook through the Workspace API.
# The HTML format includes the rendered command results; SOURCE does not.
resp = requests.get(
    f"{host}/api/2.0/workspace/export",
    headers=headers,
    params={"path": notebook_path, "format": "HTML"},
)
resp.raise_for_status()
exported_html = base64.b64decode(resp.json()["content"]).decode("utf-8")

# Archive the snapshot to DBFS so the outputs survive later changes to the notebook
dbutils.fs.put("/FileStore/notebook_snapshots/new_notebook.html", exported_html, overwrite=True)

In this example, we first authenticate with an API token retrieved from a Databricks secret scope. We then call the workspace/export endpoint with the path of the created notebook and the HTML format, which returns the notebook, including its rendered command results, as base64-encoded content. Finally, we decode that content and write it to DBFS with dbutils.fs.put, so each monthly run leaves behind a standalone HTML snapshot of the outputs even if the notebook itself is later changed or its pending revision is lost.

Note that you'll need to replace <scope> and <key> in the dbutils.secrets.get call, as well as the workspace host and the DBFS target path, with the appropriate values for your environment.
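
If you add this as the last cell of the workflow notebook (or as a final task in the job), a quick way to confirm the snapshots are accumulating each month is to list the archive folder:

# List the archived snapshots (the path matches the example above)
display(dbutils.fs.ls("/FileStore/notebook_snapshots/"))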

I hope this helps, and please let me know if you have any further questions or concerns.

