12-22-2021 03:55 AM
In one workspace folder I have 100+ PySpark scripts, all of which need to be compiled before the main program runs. To compile them we use the %run magic command, e.g. %run ../prod/netSales. Since there are 100+ such files, we ended up writing 100+ %run commands in a notebook.
My question: is there a way to compile all the files under a folder in the ADB workspace instead of one by one? Is there an iterative method to go through each file and compile it?
12-22-2021 06:58 AM
The problem is that you can only list the files in the workspace via an API call; you can then run each of them using:
dbutils.notebook.run()
This is a script to list files from the workspace (you will probably need to add some filtering):

import requests

# Grab the workspace host name and an API token from the notebook context
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_token = ctx.apiToken().get()

response = requests.post(
    f'https://{host_name}/api/2.0/workspace/list',
    headers={'Authorization': f'Bearer {host_token}'},
    data={'path': '<your-path>'}
).json()
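Once the listing works, the returned objects can be looped over and each notebook run in turn. A minimal sketch, assuming the response contains an objects array with path and object_type fields (the standard Workspace API response shape):

# Run every notebook returned by the workspace/list call above.
# Assumes `response` is the parsed JSON from that call.
for obj in response.get('objects', []):
    if obj.get('object_type') == 'NOTEBOOK':
        dbutils.notebook.run(obj['path'], 600)  # 600 s timeout; adjust as needed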
12-22-2021 11:01 PM
When I tried the above code, I got the error below.
{'error_code': 'ENDPOINT_NOT_FOUND', 'message': "No API found for 'POST /workspace/list'"}
12-23-2021 05:10 AM
Which distribution are you using (Community, Azure)? Then I will update that code, as it is quite old.
12-23-2021 05:19 AM
I didn't understand you correctly.
I am using the 9.1 LTS runtime, the ADB premium service, and a DS3_V2 instance... does that help?
12-23-2021 05:48 AM
So Azure. I just tested it; I had to change the request to GET and the data= argument to json=. Adjust the path to list the folder you need:

import requests

# Grab the workspace host name and an API token from the notebook context
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_token = ctx.apiToken().get()

response = requests.get(
    f'https://{host_name}/api/2.0/workspace/list',
    headers={'Authorization': f'Bearer {host_token}',
             'Accept': 'application/json'},
    json={'path': '/'}
)
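To then pull just the notebook paths out of that response, something like this should work. A sketch, assuming the same standard response shape as above:

# Extract notebook paths from the listing; the response shape
# {'objects': [{'path': ..., 'object_type': ...}]} is the standard one.
notebooks = [
    obj['path']
    for obj in response.json().get('objects', [])
    if obj.get('object_type') == 'NOTEBOOK'
]
print(notebooks)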
12-24-2021 03:00 AM
Did it help?
01-21-2022 02:38 AM
Sorry for the delay; this is not completely resolved yet.
I modified the code you shared, and now I am able to get the list of files under the folder in the workspace.
Is there a way to compile (not run) the files? Can we use the %run magic command in a loop or something?
02-12-2022 08:20 AM
@Thushar R does the notebook workflows feature (https://docs.databricks.com/notebooks/notebook-workflows.html) help you in this scenario?
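For reference, the notebook-workflow approach from that page boils down to calling dbutils.notebook.run() from a driver notebook in a loop. A sketch with hypothetical paths; note that, unlike %run, each child notebook runs in its own context, so functions it defines are not imported into the caller:

# Driver-notebook sketch using notebook workflows.
# The paths below are placeholders; build the list from the
# workspace/list response as shown earlier.
notebook_paths = ['/prod/netSales', '/prod/otherScript']

for path in notebook_paths:
    result = dbutils.notebook.run(path, 600)  # second argument is the timeout in seconds
    print(f'{path} finished with result: {result}')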