Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to find the last modified date of a notebook?

Naveen_KumarMad
New Contributor III

I would like to find notebooks that are no longer required or used so that I can review and delete them. If there is a way to find the last modified date of a notebook programmatically, I can build a list of notebooks to review and delete.

I have checked the workspace API functions and could not find how to get the last modified date of a notebook.


13 REPLIES

Tayyab_Vohra
Contributor

Use Python commands to display creation date and modification date

The ls command is an easy way to display basic information. If you want more detailed timestamps, use Python's os and datetime modules, as in the example below.

For example, this sample code uses datetime functions to display the creation date and modified date of all listed files and directories in the /dbfs/ folder. Replace /dbfs/ with the full path to the files you want to display.

%python
 
import os
from datetime import datetime
path = '/dbfs/'
fdpaths = [path+"/"+fd for fd in os.listdir(path)]
print(" file_path " + " create_date " + " modified_date ")
for fdpath in fdpaths:
  statinfo = os.stat(fdpath)
  create_date = datetime.fromtimestamp(statinfo.st_ctime)
  modified_date = datetime.fromtimestamp(statinfo.st_mtime)
  print(fdpath, create_date, modified_date)

Output:

file_path  create_date  modified_date
/dbfs//FileStore 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks-datasets 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks-results 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//dbfs 2020-06-09 21:11:24 2020-06-09 21:11:24
/dbfs//local_disk0 2020-05-20 22:32:05 2020-05-20 22:32:05
/dbfs//ml 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//tmp 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//user 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730

Reference:

https://kb.databricks.com/en_US/python/display-file-timestamp-details

Thanks Tayyab for the code. This works for '/dbfs/', but it does not work for a notebook path. My notebook path looks something like '/dev/abcd/xyz/notebook'. Please let me know if I am missing something.

@Naveen Kumar Madas can you please share the error as a screenshot or similar? You can also print the file path before the loop iteration; once you show the printed path, I can edit the code.

Here is the error message ( '/dev/adhoc/' is the folder path containing notebooks):

FileNotFoundError: [Errno 2] No such file or directory: '/dev/adhoc/'

Here is the code I am using:

%python
 
import os
from datetime import datetime
path = '/dev/adhoc/'
# path = '/dbfs/'
fdpaths = [path+"/"+fd for fd in os.listdir(path)]
print(" file_path " + " create_date " + " modified_date ")
for fdpath in fdpaths:
  statinfo = os.stat(fdpath)
  create_date = datetime.fromtimestamp(statinfo.st_ctime)
  modified_date = datetime.fromtimestamp(statinfo.st_mtime)
  print(fdpath, create_date, modified_date)

1) Have you manually checked that the notebook really exists in that folder? The error suggests there is no notebook in that folder.

2) If the notebook is inside the folder, comment out the loop and print all the file names first (see the short sketch after this list).

3) Once you print that, it will be easier to pinpoint the right path.

From the error, it looks like the notebook is not inside that folder.
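
For step 2, a minimal sketch of that kind of debug print (the paths here are just illustrative):

%python

import os

# Print what the driver's filesystem actually contains, before looping over anything.
# Note: os.listdir reads the cluster driver's local Linux filesystem,
# so /dev here is the Linux device directory, not a Workspace folder.
print(os.listdir('/'))
print(os.listdir('/dev'))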

Please see the attached picture. Under Workspace, you will see the 'dev' folder, which in turn contains 3 sub-folders. My notebook exists in one of the sub-folders. However, os.listdir('/dev/') shows a different list of directories/filenames instead of the 3 sub-folders I see in the picture. It looks like os.listdir is referring to a different 'dev' folder, not the one under Workspace. Thanks.

Lakshay
Databricks Employee

Hi @Naveen Kumar Madas, you can use the "revision history" option in the right-side panel to view the last modified date of a notebook.

youssefmrini
Databricks Employee

You can use the revision history, and if you are using Repos, you can check the different commits.

NateAnth
Databricks Employee
Accepted Solution

Your ask was how to do this programmatically. Here is an example that uses the Workspace 2.0 API's list function to get this information:

❯ curl --netrc --request GET \
  https://workspaceURL/api/2.0/workspace/list \
  --header 'Accept: application/json' \
  --data '{ "path": "/Users/nathan.anthony@databricks.com/", "recursive": true }' 
 
{
  "objects": [
    {
      "object_type": "NOTEBOOK",
      "path": "/Users/nathan.anthony@databricks.com/NotebookA",
      "language": "SQL",
      "created_at": 1670968138861,
      "modified_at": 1671133083375,
      "object_id": 37800453517611
    },
    {
      "object_type": "DIRECTORY",
      "path": "/Users/nathan.anthony@databricks.com/testDirectory",
      "object_id": 250599769035380
    },
    {
      "object_type": "NOTEBOOK",
      "path": "/Users/nathan.anthony@databricks.com/testNotebook",
      "language": "SQL",
      "created_at": 1662656912698,
      "modified_at": 1669064685778,
      "object_id": 250599769035457
    },
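
For reference, a rough Python equivalent of the curl call above, runnable from a notebook or any Python environment, might look like the sketch below. The host, token, and starting path are placeholders to replace with your own values, and rather than relying on the recursive flag, the sketch walks sub-directories itself:

import requests
from datetime import datetime, timezone

# Placeholders: replace with your workspace URL and a personal access token.
HOST = "https://workspaceURL"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def list_notebooks(path):
    """Recursively yield (path, modified_at) for every notebook under `path`,
    using the Workspace 2.0 list API."""
    resp = requests.get(
        f"{HOST}/api/2.0/workspace/list",
        headers=HEADERS,
        params={"path": path},
    )
    resp.raise_for_status()
    for obj in resp.json().get("objects", []):
        if obj["object_type"] == "NOTEBOOK":
            yield obj["path"], obj.get("modified_at")
        elif obj["object_type"] == "DIRECTORY":
            yield from list_notebooks(obj["path"])

for nb_path, modified_ms in list_notebooks("/Users/nathan.anthony@databricks.com/"):
    # modified_at is epoch milliseconds; convert it to a readable UTC timestamp.
    modified = (
        datetime.fromtimestamp(modified_ms / 1000, tz=timezone.utc)
        if modified_ms else None
    )
    print(nb_path, modified)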

NateAnth
Databricks Employee

Additionally, if you want this in CSV, you could try piping the output into a utility such as jq.

Example:

curl ..... | jq -r '.objects[] | [.object_type, .path, .language, .modified_at] | @csv'
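
If you would rather stay in Python than use jq, a minimal sketch of the same CSV conversion could look like this; it assumes you already have the parsed "objects" list from the workspace/list response (the sample record below is copied from the example output above):

import csv
from datetime import datetime, timezone

# Assumption: `objects` is the parsed "objects" list from a
# /api/2.0/workspace/list response, as in the example output above.
objects = [
    {"object_type": "NOTEBOOK",
     "path": "/Users/nathan.anthony@databricks.com/NotebookA",
     "language": "SQL",
     "modified_at": 1671133083375},
]

with open("notebooks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["object_type", "path", "language", "modified_at"])
    for obj in objects:
        if obj.get("object_type") != "NOTEBOOK":
            continue
        # modified_at is epoch milliseconds; convert it to a readable UTC timestamp.
        modified = datetime.fromtimestamp(
            obj["modified_at"] / 1000, tz=timezone.utc
        ).isoformat()
        writer.writerow([obj["object_type"], obj["path"],
                         obj.get("language", ""), modified])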

Thanks @Nathan Anthony. This works.


Amit_352107
New Contributor III

Hi @Naveen Kumar Madas

You can go through the code block below:

%sh

ls -lt /dbfs/
