Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is there a way I can tell when a Notebook was last run, so I can identify and delete Notebooks that are no longer being used?

wpenfold
New Contributor II
 
Accepted Solutions

AmanSehgal
Honored Contributor III

Using the Workspace API, you can list all the notebooks for a given user.

The API response tells you whether each object under the path is a folder or a notebook.

If it's a folder, add it to the path and list the notebooks inside that folder.

Put all of that in an Excel sheet or similar, and ask your team members whether they still need each notebook.

GET https://<databricks-host-name>/api/2.0/workspace/list
 
Body:
 
{ "path": "/Users/<username>" }

Refer to the Workspace API documentation for more details.
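The folder-by-folder walk described above could look roughly like the sketch below. It assumes a personal access token sent as a Bearer header and passes the path as a query parameter; `fetch_objects` and `list_notebooks` are names made up for this example, not part of any Databricks library.

```python
import json
import urllib.parse
import urllib.request


def fetch_objects(host, token, path):
    """Call GET /api/2.0/workspace/list for one path and return its objects."""
    query = urllib.parse.urlencode({"path": path})
    request = urllib.request.Request(
        f"https://{host}/api/2.0/workspace/list?{query}",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        # An empty folder comes back without an "objects" key.
        return json.load(response).get("objects", [])


def list_notebooks(list_dir, root):
    """Walk the tree under root and return the sorted paths of all notebooks.

    list_dir is any callable that maps a workspace path to its list of
    objects, so the walk can be tried against fake data before a real call.
    """
    notebooks, pending = [], [root]
    while pending:
        for obj in list_dir(pending.pop()):
            if obj["object_type"] == "DIRECTORY":
                pending.append(obj["path"])  # descend into the folder later
            elif obj["object_type"] == "NOTEBOOK":
                notebooks.append(obj["path"])
    return sorted(notebooks)
```

Something like `list_notebooks(lambda p: fetch_objects(host, token, p), "/Users/<username>")` would then give the list to paste into the spreadsheet, where `host` and `token` are your own values.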

Also, archive them all in a GitHub repo for a period of 'x' months, in case someone needs access to the notebooks later.
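For the archive step, one option is the Workspace API's export endpoint, which returns the notebook source as base64 in a `content` field. This is only a sketch: `export_notebook`, `save_export`, and the archive folder layout are made up for the example, and the Bearer token header is an assumption about how you authenticate.

```python
import base64
import json
import urllib.parse
import urllib.request
from pathlib import Path


def export_notebook(host, token, path):
    """Call GET /api/2.0/workspace/export and return the parsed JSON payload."""
    query = urllib.parse.urlencode({"path": path, "format": "SOURCE"})
    request = urllib.request.Request(
        f"https://{host}/api/2.0/workspace/export?{query}",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def save_export(payload, notebook_path, archive_root):
    """Decode the base64 "content" field and write it under archive_root.

    The workspace path is mirrored below archive_root, so
    /Users/a/nb ends up at <archive_root>/Users/a/nb.
    """
    target = Path(archive_root) / notebook_path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(base64.b64decode(payload["content"]))
    return target
```

If `archive_root` is a local clone of the GitHub repo, committing and pushing that folder afterwards gives you the archive.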

Going forward, add sufficient logging to the notebooks, or some other mechanism to record execution time.

It could be as simple as an INSERT statement in the top cell that adds a row to a table such as default.notebook_run, with the notebook name and a timestamp, every time the notebook runs.
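A rough version of that top cell is sketched below. The table name `default.notebook_run` (with an underscore, since an unquoted hyphen is not valid in a Spark SQL identifier) and the column names `notebook_name` and `run_at` are assumptions for this example, as is the helper `make_run_record`. The write itself is left as a comment because it relies on the `spark` session that is predefined inside a Databricks notebook.

```python
from datetime import datetime, timezone


def make_run_record(notebook_name, run_at=None):
    """Build the (notebook_name, run_at) row to append on every run."""
    if run_at is None:
        run_at = datetime.now(timezone.utc)  # default to the current UTC time
    return (notebook_name, run_at)


# In the notebook's first cell, append the row to the log table.
# The table and column names here are placeholders:
#
#   row = make_run_record("/Users/<username>/my_notebook")
#   (spark.createDataFrame([row], "notebook_name string, run_at timestamp")
#         .write.mode("append").saveAsTable("default.notebook_run"))
```

Once that has been running for a while, taking the latest `run_at` per `notebook_name` from the table shows which notebooks have gone quiet.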


5 REPLIES

Anonymous
Not applicable

You can see when a notebook was last run if it's attached to an active cluster. You can also read old logs to see what happened, but it's a lot of work for almost no gain. There isn't any harm in having old notebooks that aren't run. I have some notebooks in a workspace that I have never run once, and it's not a problem.

wpenfold
New Contributor II

Hi Josephk, I'm new to Databricks, but I've been asked to clean up old notebooks in our environment that were created over the years and are no longer used. Is there an API I can use to find out when a notebook was last run? Or do you have any other suggestions?

Anonymous
Not applicable

I looked around internally and couldn't find anything. Certainly nothing in the docs. Maybe just try deleting things and seeing if people complain?

wpenfold
New Contributor II

Just wondering... I can display 'Recent Activity' for a notebook, which gives me the information I'm looking for. So it is being collected somewhere, but I can't find it in the APIs. Is there anywhere else I could look for that info?
