Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Automating Purging of All Notebook Revisions

Sergecom
New Contributor III

Hi everyone,

We work with sensitive data in Databricks, so it's crucial from both security and regulatory perspectives to purge all data saved in notebook revisions.

Currently, there are two manual methods:

  1. Delete all history from each notebook individually.

  2. Permanently purge all revision history via Settings -> Advanced for all notebooks.

Is there any way to automate this process?

I noticed that the API endpoints used for this are not documented:

  • {workspace_url}/dataretention/purgehistorybefore/{purge_before_ms}

  • {workspace_url}/notebook/{notebook_id}/history/clearall

I've tested calling these endpoints; although I receive an HTTP 200 response, the history does not actually get purged.
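For reference, here is a minimal sketch of the kind of call I tried (stdlib only; the token handling and the `cutoff_ms` helper are my own scaffolding, and the endpoint itself is undocumented):

```python
import time
import urllib.request

def cutoff_ms(days_to_keep, now_s=None):
    """Epoch milliseconds for 'purge revisions older than N days'."""
    now_s = time.time() if now_s is None else now_s
    return int((now_s - days_to_keep * 86400) * 1000)

def purge_history_before(workspace_url, token, purge_before_ms):
    """POST to the undocumented purge endpoint; returns the HTTP status code."""
    req = urllib.request.Request(
        f"{workspace_url}/dataretention/purgehistorybefore/{purge_before_ms}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        # Observed behavior: status 200, yet the revision history remains.
        return resp.status
```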

Has anyone managed to automate notebook revision purging successfully?
Any guidance would be greatly appreciated!

1 ACCEPTED SOLUTION


BigRoux
Databricks Employee

Here are some things to consider:

 

Automating the purging of notebook revision history is not currently a supported Databricks feature, and there are several challenges in achieving it:
  1. Available Methods:
    Currently, Databricks provides manual options to purge notebook revision history:
    • Deleting revision history for individual notebooks.
    • Using the "Permanently purge all revision history" option accessible via Settings > Advanced, allowing bulk purging for all notebooks.
  2. Testing Internal API Endpoints:
    The undocumented endpoints you mentioned:
    • {workspace_url}/dataretention/purgehistorybefore/{purge_before_ms}
    • {workspace_url}/notebook/{notebook_id}/history/clearall
    Although your test calls to these endpoints returned HTTP 200, the purge is not actually executed. This may be because the endpoints are experimental or deprecated, which would make them unreliable.
  3. Alternative Automation Strategies:
    • For scripting bulk revision purging, programmatic access through documented Databricks APIs (where available) remains the best option. Where API coverage is missing, one workaround is to use browser automation tools (such as Selenium) to mimic the manual purge steps in the UI, though this approach is fragile and unsupported.
    • Check the workspace audit logs for the timing and outcome of your purge attempts; they may reveal why the purge did not succeed.
  4. Documentation and Guidance:
    The official Databricks documentation describes the manual purge steps but does not document any API for automating them. For updates on this functionality, watch the "Purge workspace storage" page in the Databricks documentation.
Recommendations:
  • Reach out to Databricks Support: Given the undocumented nature of these APIs, consult Databricks Support to understand their status and intended use.
  • Raise a feature request: If automating this process is critical for your use case, engage your Databricks representative to request reliable, supported API endpoints for notebook revision purging.
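If you still want to experiment with the per-notebook endpoint, a rough sketch of the loop might look like this. The `/api/2.0/workspace/list` call is a documented Workspace API; the `history/clearall` endpoint is not, and as observed above it may return 200 without doing anything:

```python
import json
import urllib.parse
import urllib.request

def notebook_history_url(workspace_url, object_id):
    """Build the (undocumented) clear-all-history URL for one notebook."""
    return f"{workspace_url}/notebook/{object_id}/history/clearall"

def list_objects(workspace_url, token, path="/"):
    """Recursively yield workspace objects via the documented Workspace API."""
    url = (f"{workspace_url}/api/2.0/workspace/list?"
           + urllib.parse.urlencode({"path": path}))
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        objects = json.load(resp).get("objects", [])
    for obj in objects:
        if obj["object_type"] == "DIRECTORY":
            yield from list_objects(workspace_url, token, obj["path"])
        else:
            yield obj

def clear_all_revisions(workspace_url, token):
    """Call the undocumented clearall endpoint for every notebook found."""
    for obj in list_objects(workspace_url, token):
        if obj["object_type"] != "NOTEBOOK":
            continue
        req = urllib.request.Request(
            notebook_history_url(workspace_url, obj["object_id"]),
            method="POST",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            print(obj["path"], resp.status)
```

Treat this strictly as an experiment: verify in the UI whether revisions were actually removed, since a 200 response alone is not proof of a successful purge.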
 

