
Hydra configuration and job parameters of DABs

jeremy98
Honored Contributor

Hello Community,

I'm trying to create a job pipeline in Databricks that runs a spark_python_task, which executes a Python script configured with Hydra. The script's configuration file defines parameters, such as id.

How can I pass this parameter at the job level in Databricks so that the task picks it up and Hydra overrides it? And how can I use dbutils.secrets.get from this type of spark_python_task to retrieve the keys I need?

1 REPLY

mark_ott
Databricks Employee

You can pass and override Hydra configuration parameters in a Databricks spark_python_task by specifying job-level parameters, which are handed to the script as command-line arguments that Hydra treats as overrides. For accessing secrets with dbutils.secrets.get, make sure your Python script calls this Databricks utility directly. Here's how to achieve both:

Passing Hydra Parameters in Databricks Job

To override Hydra config values like id from a Databricks job:

  • Use the parameters field in the spark_python_task specification.

  • Hydra interprets positional CLI arguments (e.g., id=foobar) as config overrides; environment variables can also be read via OmegaConf's oc.env resolver (a minimal config sketch follows).
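
A hypothetical conf/config.yaml that such an override targets (the default value here is purely illustrative):

yaml
# conf/config.yaml (hypothetical)
id: "0000"  # default, replaced when the job passes id=1234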

Example Databricks job JSON:

json
{ "tasks": [{ "task_key": "my_hydra_task", "spark_python_task": { "python_file": "dbfs:/path/to/main.py", "parameters": ["id=1234"] } }] }

This sends id=1234 to your script; Hydra will override the value as long as your script accepts it from the CLI (main.py should wrap its entry point with @hydra.main, which handles CLI overrides automatically).
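
Since the title mentions Databricks Asset Bundles, here is a sketch of the equivalent task in a bundle definition; the job name and relative path are hypothetical, and it assumes main.py is synced as part of the bundle:

yaml
# databricks.yml (hypothetical resource definition)
resources:
  jobs:
    hydra_job:
      name: hydra_job
      tasks:
        - task_key: my_hydra_task
          spark_python_task:
            python_file: ../src/main.py
            parameters: ["id=1234"]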

Using dbutils.secrets.get in Your Script

Inside your Python script:

python
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

# In Databricks notebooks, 'dbutils' is available automatically;
# in a spark_python_task script, build it from the SparkSession:
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)
secret_value = dbutils.secrets.get(scope="my_scope", key="my_key")

Access your secret as shown, and use it wherever needed.
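
If you would rather reference secrets from the Hydra config itself, one possible pattern (a sketch, not an official Hydra or Databricks API; the resolver name dbsecret is made up here) is to register a custom OmegaConf resolver that wraps dbutils.secrets.get:

python
from omegaconf import OmegaConf
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

# Register before Hydra composes the config; values like
# ${dbsecret:my_scope,my_key} then resolve to the secret on access.
OmegaConf.register_new_resolver(
    "dbsecret",
    lambda scope, key: dbutils.secrets.get(scope=scope, key=key),
)

A config entry such as api_key: ${dbsecret:my_scope,my_key} would then pull the secret whenever cfg.api_key is accessed.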

Example main.py Structure

python
import hydra
from omegaconf import DictConfig
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig):
    # Build dbutils from the SparkSession (scripts have no implicit 'spark')
    spark = SparkSession.builder.getOrCreate()
    dbutils = DBUtils(spark)
    secret = dbutils.secrets.get(scope="my_scope", key="my_key")
    print(f"Received id: {cfg.id}")
    print(f"Retrieved secret: {secret}")

if __name__ == "__main__":
    main()
  • The script gets id from the config file by default; a value passed at the job level overrides it.

  • dbutils.secrets.get retrieves secrets managed by Databricks.
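
To sanity-check the override behavior outside Databricks, you can run the script locally with, e.g., python main.py id=5678 and confirm the printed id changes; the dbutils.secrets.get call will only succeed on a Databricks cluster.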

Key Points

  • Use the parameters argument in Databricks jobs to override Hydra config variables (a job-parameters sketch follows after this list).

  • Ensure your script's main function uses @hydra.main, so CLI overrides work.

  • Call dbutils.secrets.get directly in the script to read secretsโ€”this works in .py files run by Databricks jobs.
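
If the value should be settable per run rather than hard-coded, Databricks job-level parameters can be forwarded to the task through dynamic value references; a sketch, with the hypothetical parameter name run_id:

json
{
  "parameters": [{ "name": "run_id", "default": "1234" }],
  "tasks": [{
    "task_key": "my_hydra_task",
    "spark_python_task": {
      "python_file": "dbfs:/path/to/main.py",
      "parameters": ["id={{job.parameters.run_id}}"]
    }
  }]
}

Overriding run_id when triggering the run then changes the id=... string Hydra receives.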
