You can override Hydra configuration values in a Databricks spark_python_task by specifying job-level parameters, which reach your script as command-line arguments that Hydra parses as overrides. To read secrets with dbutils.secrets.get, have your Python script call this Databricks utility directly. Here's how to do both:
Passing Hydra Parameters in Databricks Job
To override Hydra config values like id from a Databricks job:
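- Use the parameters field in the spark_python_task specification.
- Hydra interprets CLI arguments (e.g., id=foobar) as overrides. It does not treat environment variables as overrides on its own; to use one, reference it in the config file through OmegaConf's oc.env resolver, for example ${oc.env:MY_VAR}.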
Example Databricks job JSON:
{
  "tasks": [
    {
      "task_key": "my_hydra_task",
      "spark_python_task": {
        "python_file": "dbfs:/path/to/main.py",
        "parameters": ["id=1234"]
      }
    }
  ]
}
This passes id=1234 to your script as a command-line argument. Hydra applies it as an override as long as main.py's entry point is decorated with @hydra.main (lowercase module name), which parses CLI overrides automatically.
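If the job definition is generated from code rather than written by hand, the parameters list can be built from a dictionary of overrides. The following is a minimal sketch using only the standard library; build_hydra_task is a hypothetical helper name, and the dictionary it returns mirrors only the fields shown in the JSON above, not the full Jobs API payload.

```python
import json


def build_hydra_task(task_key, python_file, overrides):
    """Return a task dict whose parameters are Hydra-style key=value strings."""
    # Each override becomes one CLI argument, e.g. {"id": 1234} -> "id=1234".
    parameters = [f"{key}={value}" for key, value in overrides.items()]
    return {
        "task_key": task_key,
        "spark_python_task": {
            "python_file": python_file,
            "parameters": parameters,
        },
    }


job = {
    "tasks": [
        build_hydra_task("my_hydra_task", "dbfs:/path/to/main.py", {"id": 1234})
    ]
}
print(json.dumps(job, indent=2))
```

Keeping the overrides in a dictionary makes it easy to reuse one task template across several parameter sets.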
Using dbutils.secrets.get in Your Script
Inside your Python script:
from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession

# In notebooks, 'dbutils' is predefined; in a job script, build it from the Spark session.
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)
secret_value = dbutils.secrets.get(scope="my_scope", key="my_key")
Access your secret as shown, and use it wherever needed.
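Because pyspark.dbutils is only available on the Databricks runtime, it can help to wrap the lookup so the same script also runs on a local machine. The sketch below is one possible approach, not a Databricks-provided pattern: get_secret is a hypothetical helper, and the environment-variable naming convention (scope and key joined and upper-cased) is an assumption made purely for illustration.

```python
import os


def get_secret(scope, key):
    """Read a secret from Databricks, or from an environment variable elsewhere."""
    try:
        # pyspark.dbutils ships with the Databricks runtime, not open-source PySpark.
        from pyspark.dbutils import DBUtils
        from pyspark.sql import SparkSession
    except ImportError:
        # Assumed local convention: ("my_scope", "my_key") -> MY_SCOPE_MY_KEY.
        return os.environ[f"{scope}_{key}".upper()]
    spark = SparkSession.builder.getOrCreate()
    return DBUtils(spark).secrets.get(scope=scope, key=key)
```

On Databricks this resolves through dbutils.secrets.get exactly as above; elsewhere it raises KeyError if the fallback variable is not set, which surfaces a missing secret early.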
Example main.py Structure
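import hydra
from omegaconf import DictConfig
from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Get the active session explicitly rather than relying on a predefined 'spark' global.
    spark = SparkSession.builder.getOrCreate()
    dbutils = DBUtils(spark)
    secret = dbutils.secrets.get(scope="my_scope", key="my_key")
    print(f"Received id: {cfg.id}")
    # Avoid printing secrets in real jobs; this line is for demonstration only.
    print(f"Retrieved secret: {secret}")


if __name__ == "__main__":
    main()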
- The script reads id from the config file, and a value passed at job level overrides it.
- dbutils.secrets.get retrieves secrets managed by Databricks.
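To make the override behaviour concrete, here is a deliberately simplified, standard-library illustration of how key=value arguments replace config-file defaults. It is not Hydra's actual parser: apply_overrides is a hypothetical function, values are kept as strings, and Hydra's own grammar additionally infers types and supports forms such as +key=value and ~key.

```python
import copy


def apply_overrides(config, overrides):
    """Apply 'dotted.key=value' strings to a nested dict and return a merged copy."""
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Walk (or create) intermediate levels for dotted keys like "db.host".
            node = node.setdefault(name, {})
        node[leaf] = value  # kept as a string in this sketch
    return merged


defaults = {"id": "default_id", "db": {"host": "localhost"}}
with_job_param = apply_overrides(defaults, ["id=1234"])
without_job_param = apply_overrides(defaults, [])
print(with_job_param["id"], without_job_param["id"])
```

With no job-level parameters the config-file value is used unchanged; with "id=1234" in parameters, that value takes precedence.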
Key Points
- Use the parameters field of spark_python_task in Databricks jobs to override Hydra config values.
- Ensure your script's main function is decorated with @hydra.main, so CLI overrides work.
- Build dbutils from the Spark session and call dbutils.secrets.get in the script to read secrets; this works in .py files run by Databricks jobs.