12-22-2021 03:55 AM
In one workspace folder I have 100+ PySpark scripts, and all of them need to be compiled before the main program runs. To compile them we use the %run magic command, e.g. %run ../prod/netSales. Since there are 100+ such files, we ended up writing 100+ %run commands in a single notebook to compile them all.
The question is: is there any way to compile all the files under one folder in the ADB workspace instead of one by one? Is there an iterative way to go through each file and compile it?
12-22-2021 06:58 AM
The problem is that you can only list the files in a workspace via an API call; you can then run each of them using:
dbutils.notebook.run()
This is a script to list files from the workspace (you will probably need to add some filtering):
import requests
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_token = ctx.apiToken().get()
response = requests.post(
    f'https://{host_name}/api/2.0/workspace/list',
    headers={'Authorization': f'Bearer {host_token}'},
    data={'path': '<your-path>'}
).json()
12-22-2021 11:01 PM
When I tried the above code, I got the error below.
{'error_code': 'ENDPOINT_NOT_FOUND', 'message': "No API found for 'POST /workspace/list'"}
12-23-2021 05:10 AM
Which distribution are you using (Community, Azure)? I will update that code, as it is quite old.
12-23-2021 05:19 AM
I didn't quite understand your question.
I am using the 9.1 LTS runtime, the ADB premium service, and a DS3_V2 instance... does that help?
12-23-2021 05:48 AM
So Azure - I just tested it; I had to change the request to GET and data to json. Please adjust the path to list the folder you need:
import requests
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_token = ctx.apiToken().get()
response = requests.get(
    f'https://{host_name}/api/2.0/workspace/list',
    headers={'Authorization': f'Bearer {host_token}',
             'Accept': 'application/json'},
    json={'path': '/'}
)
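If it helps, here is a minimal sketch for pulling just the notebook paths out of that response (assuming the usual response shape, with an 'objects' list whose entries carry 'object_type' and 'path' fields):
# Sketch: extract notebook paths from the workspace/list response.
# Assumes the response looks like {"objects": [{"object_type": "NOTEBOOK", "path": "..."}, ...]}.
objects = response.json().get('objects', [])
notebook_paths = [
    obj['path']
    for obj in objects
    if obj.get('object_type') == 'NOTEBOOK'  # skip directories, libraries, etc.
]
print(notebook_paths)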
12-24-2021 03:00 AM
did it help?
01-21-2022 02:38 AM
Sorry for the delay; this was not completely resolved.
I modified the code you shared, and now I am able to get the list of files under the folder in the workspace.
Is there any way to compile (not run) those files? Can we use the %run magic command in some kind of loop?
02-12-2022 08:20 AM
@Thushar R does the notebook workflow described at https://docs.databricks.com/notebooks/notebook-workflows.html help with this scenario?
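As a rough sketch of that workflow pattern (assuming a notebook_paths list like the one built from the workspace API response earlier in this thread; note that dbutils.notebook.run executes each notebook as a separate ephemeral job rather than importing its definitions into the current session the way %run does):
# Sketch: run each listed notebook in a loop via the notebook-workflow API.
# Unlike %run, dbutils.notebook.run() is a regular function call, so it can be used in a loop.
for path in notebook_paths:
    result = dbutils.notebook.run(path, 600)  # 600-second timeout per notebook, an arbitrary choice
    print(f'{path} finished with result: {result}')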