02-27-2023 02:23 AM
I would like to find notebooks that are no longer required or in use, so that I can review and delete them. If there is a way to find a notebook's last modified date programmatically, I can build a list of notebooks to review and delete.
I have checked the Workspace API functions and could not find how to get the last modified date of a notebook.
02-28-2023 08:24 AM
Since you asked how to do this programmatically, here is an example that uses the Workspace 2.0 API's list function to get this information. See my example below:
curl --netrc --request GET \
  https://workspaceURL/api/2.0/workspace/list \
  --header 'Accept: application/json' \
  --data '{ "path": "/Users/nathan.anthony@databricks.com/", "recursive": true }'
{
  "objects": [
    {
      "object_type": "NOTEBOOK",
      "path": "/Users/nathan.anthony@databricks.com/NotebookA",
      "language": "SQL",
      "created_at": 1670968138861,
      "modified_at": 1671133083375,
      "object_id": 37800453517611
    },
    {
      "object_type": "DIRECTORY",
      "path": "/Users/nathan.anthony@databricks.com/testDirectory",
      "object_id": 250599769035380
    },
    {
      "object_type": "NOTEBOOK",
      "path": "/Users/nathan.anthony@databricks.com/testNotebook",
      "language": "SQL",
      "created_at": 1662656912698,
      "modified_at": 1669064685778,
      "object_id": 250599769035457
    }
  ]
}
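For anyone who wants to make the same call from Python instead of curl, here is a minimal standard-library sketch. `host` and `token` are placeholders (your workspace URL and a personal access token, not values from this thread), and recursion over `DIRECTORY` entries is done client-side rather than relying on the `recursive` flag.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

def list_notebooks(host, token, path="/"):
    """Recursively collect NOTEBOOK objects under `path` via /api/2.0/workspace/list."""
    url = f"{host}/api/2.0/workspace/list?path={urllib.parse.quote(path)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    notebooks = []
    for obj in payload.get("objects", []):
        if obj["object_type"] == "NOTEBOOK":
            notebooks.append(obj)
        elif obj["object_type"] == "DIRECTORY":
            # Descend into sub-folders client-side.
            notebooks.extend(list_notebooks(host, token, obj["path"]))
    return notebooks

def modified_dt(obj):
    """The API reports modified_at in epoch milliseconds; convert to a UTC datetime."""
    return datetime.fromtimestamp(obj["modified_at"] / 1000, tz=timezone.utc)
```

For example, `modified_at: 1671133083375` from the output above converts to 2022-12-15 UTC.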
02-27-2023 03:34 AM
Use Python commands to display creation date and modification date
The ls command is an easy way to display basic information. If you want more detailed timestamps, use Python API calls.
For example, this sample code uses datetime functions to display the creation and modification dates of all files and directories listed in the /dbfs/ folder. Replace /dbfs/ with the full path to the files you want to display.
%python
import os
from datetime import datetime

path = '/dbfs/'
fdpaths = [path + "/" + fd for fd in os.listdir(path)]

print(" file_path " + " create_date " + " modified_date ")
for fdpath in fdpaths:
    statinfo = os.stat(fdpath)
    create_date = datetime.fromtimestamp(statinfo.st_ctime)
    modified_date = datetime.fromtimestamp(statinfo.st_mtime)
    print(fdpath, create_date, modified_date)
Output:
file_path create_date modified_date
/dbfs//FileStore 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks-datasets 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//databricks-results 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//dbfs 2020-06-09 21:11:24 2020-06-09 21:11:24
/dbfs//local_disk0 2020-05-20 22:32:05 2020-05-20 22:32:05
/dbfs//ml 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//tmp 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
/dbfs//user 2021-07-01 12:49:45.264730 2021-07-01 12:49:45.264730
Reference:
https://kb.databricks.com/en_US/python/display-file-timestamp-details
02-27-2023 10:21 PM
Thanks Tayyab for the code. This works for '/dbfs/', but does not work for a notebook path. My notebook path looks something like '/dev/abcd/xyz/notebook'. Please let me know if I am missing something.
02-27-2023 11:27 PM
@Naveen Kumar Madas can you please share a screenshot of the error? You can also print the file path before the loop iteration; show me that path and I can edit the code.
02-28-2023 03:45 AM
Here is the error message ( '/dev/adhoc/' is the folder path containing notebooks):
FileNotFoundError: [Errno 2] No such file or directory: '/dev/adhoc/'
Here is the code I am using:
%python
import os
from datetime import datetime

path = '/dev/adhoc/'
# path = '/dbfs/'
fdpaths = [path + "/" + fd for fd in os.listdir(path)]

print(" file_path " + " create_date " + " modified_date ")
for fdpath in fdpaths:
    statinfo = os.stat(fdpath)
    create_date = datetime.fromtimestamp(statinfo.st_ctime)
    modified_date = datetime.fromtimestamp(statinfo.st_mtime)
    print(fdpath, create_date, modified_date)
02-28-2023 05:08 AM
1) Have you manually checked that the notebook really exists in that folder? The error suggests there is no notebook in the folder.
2) If the notebook is inside the folder, comment out the loop and print all the file names first.
3) Once you print them, they should be easy to access.
03-01-2023 02:40 AM
Please see the attached picture. Under Workspace you will see a 'dev' folder, which in turn contains 3 sub-folders; my notebook exists in one of them. But os.listdir('/dev/') shows a different list of directories/filenames, not the 3 sub-folders shown in the picture. It looks like os.listdir is referring to a different 'dev' folder, not the one under Workspace. Thanks.
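That observation is right: plain Python file APIs such as os.listdir run against the driver's local Linux filesystem, so '/dev' resolves to the OS device directory, not the Workspace folder named 'dev'. A minimal sketch to see this; the '/Workspace/...' path is an assumption that only holds on runtimes that mount the workspace tree onto the driver, otherwise the Workspace API is the way to go.

```python
import os

# '/dev' on the driver is the Linux device directory (null, tty, ...),
# not the Workspace folder named 'dev'.
print(sorted(os.listdir("/dev"))[:5])

# Hypothetical: some runtimes expose workspace files under /Workspace;
# if that mount exists, the notebook tree may be visible there.
ws_path = "/Workspace/dev/adhoc"
if os.path.isdir(ws_path):
    print(os.listdir(ws_path))
else:
    print("No /Workspace mount here; use the Workspace API instead.")
```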
02-27-2023 10:57 AM
Hi @Naveen Kumar Madas, you can use the "Revision history" option in the right-side panel to view a notebook's last modified date.
02-28-2023 03:01 AM
You can use the revision history, and if you are using Repos you can check the individual commits.
02-28-2023 08:30 AM
Additionally, if you want this as CSV, you could try piping the output into a utility such as jq.
Example:
curl ..... | jq -r '.objects[] | [.object_type, .path, .language, .modified_at] | @csv'
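If you'd rather stay in Python, a rough equivalent with the standard csv module, fed here with sample objects copied from the curl output earlier in the thread (no live API call needed):

```python
import csv
import io

# Sample objects copied from the workspace/list output in this thread.
objects = [
    {"object_type": "NOTEBOOK",
     "path": "/Users/nathan.anthony@databricks.com/NotebookA",
     "language": "SQL",
     "created_at": 1670968138861,
     "modified_at": 1671133083375,
     "object_id": 37800453517611},
    {"object_type": "DIRECTORY",
     "path": "/Users/nathan.anthony@databricks.com/testDirectory",
     "object_id": 250599769035380},
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["object_type", "path", "language", "modified_at"])
for obj in objects:
    if obj["object_type"] == "NOTEBOOK":  # directories have no modified_at
        writer.writerow([obj["object_type"], obj["path"],
                         obj.get("language", ""), obj["modified_at"]])
print(buf.getvalue())
```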
03-01-2023 03:10 AM
Thanks @Nathan Anthony. This works.
03-06-2023 10:33 PM
Hi @Naveen Kumar Madas,
Thank you for your question! Please take a moment to review the answer and let me know if it fits your needs. If it does, please help us select the best solution by clicking "Select As Best". Your feedback helps us provide the best possible service. Thank you!
03-23-2023 12:14 AM
Hi @Naveen Kumar Madas,
You can go through the code block below:
%sh
ls -lt /dbfs/