Databricks Community

fermin_vicente · ‎03-28-2022

Hi there, if I set a secret in an env var to be used by a cluster-scoped init script, it remains available to any user who attaches a notebook to the cluster, and it is easily extracted with a print.

There's a hint in the documentation about the secret being "not accessible from a program running in Spark" (I assume that covers commands run in notebooks as well), but I tried several combinations to no avail.

Specifying the secret path with the standard "{{secrets/scope_name/secret_name}}" works, but the secret is accessible from any notebook afterwards.
Substitution with the actual secret value doesn't work in the init script or in a notebook if I use a path without the {{ }} or without the secrets/ part. I tried this because the SPARKPASSWORD documentation could be interpreted that way.
Using an env var named 'SPARKPASSWORD' seems to behave no differently from any other env var name.

I'm sure I'm missing something. Any help would be appreciated, thanks!

Hubert-Dudek · ‎03-28-2022

spark.password {{secrets/scope1/key1}} is a Spark property, so it will be available in all notebooks via spark.conf.get("spark.password") (we add it in the Spark config for the cluster).

SPARKPASSWORD={{secrets/scope1/key1}} is an environment variable (we add it under Environment variables in the cluster config).

The problem is that with a standard account you have access to secrets anyway, all of them. In premium, you could create different scopes and make one of them accessible only to the users who create/start the cluster (environment variable), so people running notebooks will have no access to those secrets.

fermin_vicente · ‎03-29-2022

Thanks. We do have premium and we use scopes, but we want users to not be able to print the secret within the environment variable with a simple Python command

' '.join(os.environ['SPARKPASSWORD'])

pavan_kumar · ‎03-29-2022

@Fermin Vicente Usually, when a user tries to print a value that comes from a secret, it is redacted. Can you please try to print it and check whether you see the actual value?

fermin_vicente · ‎03-29-2022

Hi Pavan,

if you do

print(os.environ['SPARKPASSWORD'])

the output is [REDACTED]

However, if you run the command from my previous reply (and it's just one of many ways to do it), you can see the contents of the secret in full.
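
To illustrate why the join gets past the redaction, here is a minimal sketch. It assumes the notebook output filter simply replaces exact occurrences of the secret value; I don't know how Databricks actually implements it. The redact helper and the sample value are hypothetical stand-ins, not Databricks code.

```python
# Hypothetical stand-in for notebook output redaction.
# Assumption: redaction is an exact-match replacement of the secret value.
SECRET = "s3cr3t-value"  # made-up sample value, not a real secret


def redact(output: str, secret: str = SECRET) -> str:
    """Replace literal occurrences of the secret with [REDACTED]."""
    return output.replace(secret, "[REDACTED]")


# Printing the value directly: the exact string is found and masked.
plain = redact(SECRET)

# Joining the characters with spaces: the exact string no longer appears,
# so an exact-match filter leaves the output untouched.
spaced = redact(" ".join(SECRET))

print(plain)
print(spaced)
```

Under that assumption, any transformation that changes the printed string (spacing, reversing, encoding) would slip through the same way, which is why I'd rather not have the value in the environment at all.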

Is there a way to make the env var unset after running the init script?

Thanks

pavan_kumar · ‎03-29-2022

@Fermin Vicente

Hi Vicente,

You can add the line below to your init script. It removes the environment variable from spark-env.sh, so it is no longer available once the init script has run:

sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh

Example:

If you have set the following in the cluster's Spark environment variables:

TOKEN={{secrets/mlflow_model_reg/ml-token}}

then put this as the last line of your init script to remove the "TOKEN" environment variable from the Spark env:

sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh
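
If you want to check what that sed pattern matches before putting it in an init script, here is a small Python sketch of the same line filter. It runs against a temporary file rather than the real spark-env.sh, and the sample file contents and the drop_lines_starting_with helper are made up for illustration.

```python
import re
import tempfile
from pathlib import Path


def drop_lines_starting_with(path: Path, prefix: str) -> None:
    """Rewrite the file without lines that begin with prefix,
    similar in effect to: sed -i '/^PREFIX/d' file"""
    pattern = re.compile("^" + re.escape(prefix))
    lines = path.read_text().splitlines(keepends=True)
    kept = [line for line in lines if not pattern.match(line)]
    path.write_text("".join(kept))


with tempfile.TemporaryDirectory() as tmp:
    # Made-up stand-in for spark-env.sh; the real file's contents will differ.
    env_file = Path(tmp) / "spark-env.sh"
    env_file.write_text(
        "JAVA_OPTS=-Xmx1g\n"
        "TOKEN=abc123\n"
        "TOKEN_URL=https://example.invalid\n"
    )
    drop_lines_starting_with(env_file, "TOKEN")
    remaining = env_file.read_text()

print(remaining)
```

Note that the prefix match also drops the TOKEN_URL line in this sample. If you have other variables whose names start with TOKEN, a narrower pattern such as '/^TOKEN=/d' may be safer, assuming the line in spark-env.sh really begins with TOKEN= (I have not checked the exact format of that file).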