04-19-2023 09:05 AM
Hi Community,
I have successfully run a job through the API, but I need to be able to pass parameters (configuration) to the DLT workflow via the API.
I have tried passing JSON in this format:
{
  "full_refresh": "true",
  "configuration": [
    {
      "config1": "config1_value",
      "config2": "config2_value"
    }
  ]
}
The API seems happy with the structure of the JSON, but config1 and config2 are not being overridden.
Any help greatly appreciated.
08-24-2023 03:45 AM
hey @labromb ,
have you seen this documentation page?
https://docs.databricks.com/api/workspace/pipelines/update
also for configuration, there is no need for square brackets:
"configuration": {
"property1": "string",
"property2": "string"
},
let me know if this works!
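If it's useful, here's a minimal sketch in Python (requests) of how the two pipeline endpoints could be combined to do what the original question asks: edit the pipeline so its configuration carries the new values, then start a refresh. The token, workspace URL and pipeline_id are placeholders, and I'm assuming the GET response exposes the current pipeline spec under a "spec" key, so treat this as an illustration of the linked docs rather than a tested recipe:
import requests

token = "your_databricks_token"            # placeholder
host = "https://your_databricks_instance"  # placeholder workspace URL
pipeline_id = "your_pipeline_id"           # placeholder
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# 1) Read the current pipeline spec, merge in the new configuration values,
#    and send the whole spec back via the edit endpoint (PUT /pipelines/{id}).
spec = requests.get(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers).json()["spec"]
spec["configuration"] = {**spec.get("configuration", {}),
                         "config1": "config1_value",
                         "config2": "config2_value"}
requests.put(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers, json=spec)

# 2) Kick off a refresh of the pipeline (full_refresh is a boolean here).
requests.post(f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
              headers=headers, json={"full_refresh": True})
As far as I can tell, the start-update call itself doesn't accept a configuration block, which is why the values have to be written to the pipeline definition first.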
04-19-2023 04:50 PM
Hi @Brian Labrom, to pass parameters to your Databricks job via the API, you can supply them as notebook parameters when launching the job.
You'll need to modify the notebook_task configuration in your job to pass these parameters as arguments to the notebook.
First, you can access the parameters in your notebook using dbutils.widgets.get().
Here's an example:
full_refresh = dbutils.widgets.get("full_refresh") == "true"
config1 = dbutils.widgets.get("config1")
config2 = dbutils.widgets.get("config2")
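If the notebook is also run interactively, it can help to declare the widgets with default values first, so the dbutils.widgets.get() calls above don't fail when no parameters are passed in; the defaults below are just made-up examples:
# Declare widgets with defaults; job base_parameters override these at run time.
dbutils.widgets.text("full_refresh", "false")
dbutils.widgets.text("config1", "default_config1")
dbutils.widgets.text("config2", "default_config2")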
Now, when you submit the job through the API, pass the parameters in the notebook_task section like this:
{
  "name": "My Job",
  "new_cluster": {
    "spark_version": "x.x.x-scala2.x",
    "node_type_id": "node_type",
    "num_workers": 1
  },
  "notebook_task": {
    "notebook_path": "/path/to/your/notebook",
    "base_parameters": {
      "full_refresh": "true",
      "config1": "config1_value",
      "config2": "config2_value"
    }
  }
}
Replace /path/to/your/notebook with the path to your notebook, and adjust spark_version, node_type_id, and num_workers to suit your requirements.
If you're using Python or a different language for your API call, make sure to adjust the code accordingly. For example, in Python, you could use the requests library to submit the job like this:
import json
import requests

api_key = "your_databricks_token"
api_url = "https://your_databricks_instance/api/2.0/jobs/runs/submit"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

job_config = {
    "name": "My Job",
    "new_cluster": {
        "spark_version": "x.x.x-scala2.x",
        "node_type_id": "node_type",
        "num_workers": 1
    },
    "notebook_task": {
        "notebook_path": "/path/to/your/notebook",
        "base_parameters": {
            "full_refresh": "true",
            "config1": "config1_value",
            "config2": "config2_value"
        }
    }
}

# Submit the one-time run; the values in base_parameters become the notebook widgets
response = requests.post(api_url, headers=headers, data=json.dumps(job_config))
Remember to replace your_databricks_token, your_databricks_instance, and other placeholders with your actual values.
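To check the submission worked, you could also inspect the response; a small sketch, assuming the standard runs/submit reply that contains a run_id field:
# Raise if the API returned an error, then grab the run ID of the submitted run.
response.raise_for_status()
run_id = response.json()["run_id"]
print(f"Submitted run {run_id}")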
04-28-2023 01:26 AM
Hi @Kaniz Fatma,
Just wondered if there was any update on this. This is quite an important aspect of how we would implement DLT pipelines, so it would be good to know if it can be done, or if it's coming.
Many thanks.
04-20-2023 01:34 AM
Hi @Kaniz Fatma, thanks for the detailed reply. Looks like the response is talking about a job, not a Delta Live Tables pipeline. Apologies if my initial question was not clear enough...
I am using the Delta Live Tables API:
Delta Live Tables API guide - Azure Databricks | Microsoft Learn
I want to refresh a DLT pipeline... I can initiate a refresh, but I need to be able to override the configuration of the DLT pipeline with the values I supply.
I am using Azure Data Factory to call the API, so I just need to know what the JSON format of the request body needs to be so I can override the parameters.
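To restate the shape of the request I'm sending from the ADF Web activity (the same as my first post, minus the configuration block that isn't being picked up):
{
  "full_refresh": "true"
}
That starts the refresh fine; the missing piece is where the configuration overrides (config1, config2) should go in the request body.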
08-23-2023 06:57 AM
@labromb - Please let me know if you found a solution to your problem. I'm trying to do the same thing.