How to find the last modified date of a notebook?

Naveen_KumarMad
New Contributor III

I would like to find notebooks that are no longer required or used so that I can review and delete them. If there is a way to get the last modified date of a notebook programmatically, I can build a list of notebooks to review and delete.

I have checked the Workspace API functions and could not find a way to get the last modified date of a notebook.

Accepted Solution

NateAnth
Valued Contributor

You asked how to do this programmatically. You can get this information from the Workspace API 2.0 with the list function; each notebook in the response carries created_at and modified_at timestamps in epoch milliseconds. See my example below:

curl --netrc --request GET \
  https://workspaceURL/api/2.0/workspace/list \
  --header 'Accept: application/json' \
  --data '{ "path": "/Users/nathan.anthony@databricks.com/", "recursive": true }' 
 
{
  "objects": [
    {
      "object_type": "NOTEBOOK",
      "path": "/Users/nathan.anthony@databricks.com/NotebookA",
      "language": "SQL",
      "created_at": 1670968138861,
      "modified_at": 1671133083375,
      "object_id": 37800453517611
    },
    {
      "object_type": "DIRECTORY",
      "path": "/Users/nathan.anthony@databricks.com/testDirectory",
      "object_id": 250599769035380
    },
    {
      "object_type": "NOTEBOOK",
      "path": "/Users/nathan.anthony@databricks.com/testNotebook",
      "language": "SQL",
      "created_at": 1662656912698,
      "modified_at": 1669064685778,
      "object_id": 250599769035457
    },
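The modified_at values above are epoch milliseconds, so they need converting before they can be compared with a review date. A minimal sketch of that step, assuming the response has already been parsed into a list of dicts: the objects below are copied from the (truncated) response above, while the helper names ms_to_utc and stale_notebooks and the cutoff date are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

# Objects copied from the truncated response above; object_id omitted for brevity.
objects = [
    {"object_type": "NOTEBOOK",
     "path": "/Users/nathan.anthony@databricks.com/NotebookA",
     "language": "SQL", "created_at": 1670968138861, "modified_at": 1671133083375},
    {"object_type": "DIRECTORY",
     "path": "/Users/nathan.anthony@databricks.com/testDirectory"},
    {"object_type": "NOTEBOOK",
     "path": "/Users/nathan.anthony@databricks.com/testNotebook",
     "language": "SQL", "created_at": 1662656912698, "modified_at": 1669064685778},
]


def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def stale_notebooks(objs, cutoff):
    """Return (path, modified) pairs for notebooks last modified before cutoff, oldest first."""
    rows = [
        (o["path"], ms_to_utc(o["modified_at"]))
        for o in objs
        # Directories carry no modified_at field, so they are skipped here.
        if o.get("object_type") == "NOTEBOOK" and "modified_at" in o
    ]
    return sorted((r for r in rows if r[1] < cutoff), key=lambda r: r[1])


# Hypothetical review cutoff: anything untouched since before this date is a candidate.
cutoff = datetime(2022, 12, 1, tzinfo=timezone.utc)
for path, modified in stale_notebooks(objects, cutoff):
    print(modified.isoformat(), path)
```

The result is only a list of candidates for manual review; nothing here deletes a notebook.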

13 Replies

Tayyab_Vohra
Contributor

Use Python commands to display creation date and modification date

The ls command is an easy way to display basic information. If you want more detailed timestamps, you should use Python API calls.

For example, this sample code uses datetime functions to display the creation date and modified date of all listed files and directories in the /dbfs/ folder. Replace /dbfs/ with the full path to the files you want to display.

%python

import os
from datetime import datetime

path = '/dbfs/'
# Build the full path of every entry directly under `path`.
fdpaths = [path+"/"+fd for fd in os.listdir(path)]
print(" file_path " + " create_date " + " modified_date ")
for fdpath in fdpaths:
  statinfo = os.stat(fdpath)
  # Note: on Linux, st_ctime is the inode change time, not a true creation time.
  create_date = datetime.fromtimestamp(statinfo.st_ctime)
  modified_date = datetime.fromtimestamp(statinfo.st_mtime)
  print(fdpath, create_date, modified_date)

Output:

file_path  create_date  modified_date
/dbfs//FileStore 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks-datasets 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks-results 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//dbfs 2020-06-09 21:11:24 2020-06-09 21:11:24
/dbfs//local_disk0 2020-05-20 22:32:05 2020-05-20 22:32:05
/dbfs//ml 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//tmp 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//user 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730

Reference:

https://kb.databricks.com/en_US/python/display-file-timestamp-details

Thanks, Tayyab, for the code. This works for '/dbfs/', but it does not work for a notebook path. My notebook path looks something like '/dev/abcd/xyz/notebook'. Please let me know if I am missing something.

@Naveen Kumar Madas, can you please share the error, for example as a screenshot? Also print the path before the loop runs and share that output, and then I can adjust the code.

Here is the error message ('/dev/adhoc/' is the folder path containing the notebooks):

FileNotFoundError: [Errno 2] No such file or directory: '/dev/adhoc/'

Here is the code I am using:

%python
 
import os
from datetime import datetime
path = '/dev/adhoc/'
# path = '/dbfs/'
fdpaths = [path+"/"+fd for fd in os.listdir(path)]
print(" file_path " + " create_date " + " modified_date ")
for fdpath in fdpaths:
  statinfo = os.stat(fdpath)
  create_date = datetime.fromtimestamp(statinfo.st_ctime)
  modified_date = datetime.fromtimestamp(statinfo.st_mtime)
  print(fdpath, create_date, modified_date)

1) Have you manually checked that the notebook really exists in that folder? The error indicates that the folder path itself could not be found.

2) If the notebook is inside the folder, please comment out the loop and print all the file names first.

3) Once the names print correctly, the rest of the code should work on them.

From the error, it looks like the notebook is not inside the folder the code is reading.

Please see the attached picture. Under Workspace, you will see a 'dev' folder, which contains 3 subfolders. My notebook is in one of those subfolders. However, os.listdir('/dev/') shows a different list of directories and files instead of the 3 subfolders in the picture. It looks like os.listdir is reading a different 'dev' folder, not the one under Workspace. Thanks.
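That observation matches how os.listdir works: it reads the driver node's local filesystem, where /dev is the Linux device directory, not the Workspace folder tree. A minimal sketch of walking Workspace folders through the REST API instead is shown below. The names http_list and walk_notebooks are hypothetical, the bearer-token header assumes personal access token authentication, and the fake_workspace dict is an offline stand-in with a made-up notebook so that the example runs without a workspace.

```python
import json
import urllib.parse
import urllib.request


def http_list(host, token, path):
    """Call GET /api/2.0/workspace/list for one folder and return its objects."""
    url = f"{host}/api/2.0/workspace/list?" + urllib.parse.urlencode({"path": path})
    # Assumption: the workspace accepts a personal access token as a bearer token.
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("objects", [])


def walk_notebooks(list_fn, path):
    """Yield every NOTEBOOK object under path, descending into DIRECTORY objects."""
    for obj in list_fn(path):
        if obj.get("object_type") == "DIRECTORY":
            yield from walk_notebooks(list_fn, obj["path"])
        elif obj.get("object_type") == "NOTEBOOK":
            yield obj


# Offline stand-in for the API: maps a folder path to the objects it contains.
# The notebook name and timestamp are illustrative only.
fake_workspace = {
    "/dev": [{"object_type": "DIRECTORY", "path": "/dev/adhoc"}],
    "/dev/adhoc": [
        {"object_type": "NOTEBOOK", "path": "/dev/adhoc/notebook",
         "modified_at": 1669064685778},
    ],
}

found = list(walk_notebooks(lambda p: fake_workspace.get(p, []), "/dev"))
print([o["path"] for o in found])
```

Against a real workspace, the lambda would be swapped for something like `lambda p: http_list(host, token, p)`, with host and token supplied by you. Passing the lister in as a function keeps the traversal testable without network access.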

Lakshay
Esteemed Contributor

Hi @Naveen Kumar Madas, you can use the "revision history" option on the right-side panel to view the last modified date for a notebook.

youssefmrini
Honored Contributor III

You can use the revision history, and if you are using Repos, you can check the individual commits.

NateAnth
Valued Contributor

Additionally, if you want this in CSV, you could try piping the output into a utility such as jq.

Example:

curl ..... | jq -r '.objects[] | [.object_type, .path, .language, .modified_at] | @csv'
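If jq is not available, roughly the same CSV can be produced in Python with the standard csv module. This is a sketch under assumptions: objects_to_csv is a hypothetical helper, the sample rows are taken from the API response earlier in the thread, and the modified_at column is converted from epoch milliseconds to an ISO 8601 UTC string for readability, which the jq one-liner does not do.

```python
import csv
import io
from datetime import datetime, timezone


def objects_to_csv(objs):
    """Render workspace objects as CSV text with a readable UTC modified time."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["object_type", "path", "language", "modified_at_utc"])
    for o in objs:
        ms = o.get("modified_at")
        # Directories have no modified_at or language, so those cells stay empty.
        modified = (
            datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()
            if ms is not None else ""
        )
        writer.writerow([o.get("object_type", ""), o.get("path", ""),
                         o.get("language", ""), modified])
    return buf.getvalue()


# Rows taken from the example response earlier in the thread.
sample = [
    {"object_type": "NOTEBOOK",
     "path": "/Users/nathan.anthony@databricks.com/NotebookA",
     "language": "SQL", "modified_at": 1671133083375},
    {"object_type": "DIRECTORY",
     "path": "/Users/nathan.anthony@databricks.com/testDirectory"},
]

print(objects_to_csv(sample))
```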

Thanks @Nathan Anthony​. This works.

Amit_352107
New Contributor III

Hi @Naveen Kumar Madas​ 

You can try the code block below:

%sh

ls -lt /dbfs/
