Can secrets be retrieved only for the scope of an init script?

fermin_vicente
New Contributor III

Hi there, if I put a secret in an env var so that a cluster-scoped init script can use it, the secret stays available to any user who attaches a notebook to the cluster, and it can easily be extracted with a print.

There's a hint in the documentation that the secret is "not accessible from a program running in Spark" (I assume that includes commands run in notebooks), but I've tried several combinations to no avail.

  • Specifying the secret path with the standard "{{secrets/scope_name/secret_name}}" works, but the secret is accessible from any notebook afterwards
  • The secret value is not substituted, in either the init script or a notebook, if I use a path without the {{ }} or without the secrets/ part. I tried this because the SPARKPASSWORD documentation could be read that way
  • An env var named 'SPARKPASSWORD' seems to behave no differently from an env var with any other name

I'm sure I'm missing something. Any help would be appreciated, thanks!
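The bullets above suggest that substitution only happens when the value has the exact form {{secrets/scope_name/secret_name}}. A minimal Python sketch of a check for that shape follows. It assumes that scope and secret names never contain "/", "{" or "}" (an assumption; the real naming rules may differ), and parse_secret_reference is a hypothetical helper, not a Databricks API.

```python
import re

# Hypothetical helper: recognises values of the form {{secrets/<scope>/<secret>}}.
# Assumption: scope and secret names never contain "/", "{" or "}".
SECRET_REF = re.compile(r"\{\{secrets/([^/{}]+)/([^/{}]+)\}\}")


def parse_secret_reference(value: str):
    """Return (scope, secret) if the whole value is a secret reference, else None."""
    match = SECRET_REF.fullmatch(value.strip())
    return match.groups() if match else None


print(parse_secret_reference("{{secrets/scope_name/secret_name}}"))  # ('scope_name', 'secret_name')
print(parse_secret_reference("secrets/scope_name/secret_name"))      # None: no braces
print(parse_secret_reference("{{scope_name/secret_name}}"))          # None: no secrets/ prefix
```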

Accepted solution

pavan_kumar
Contributor

@Fermin Vicente

Hi Vicente,

You can add the following line to your init script. It removes the environment variable's entry from spark-env.sh, so the variable is no longer available once the init script has run:

sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh

Example:

If you have set the following environment variable in the cluster's Spark environment variables:

TOKEN={{secrets/mlflow_model_reg/ml-token}}

add the following line at the very end of your init script to remove the "TOKEN" environment variable from the Spark environment:

sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh
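For readers less familiar with sed, the command above can be mirrored in Python: it rewrites the file without any line that starts with the given text. This is only an illustration, not the init script itself. The file contents are hypothetical, because the thread does not show what spark-env.sh actually contains, and delete_lines_starting_with and demo are hypothetical helpers. The sketch also shows that /^TOKEN/ matches any line beginning with TOKEN, including an unrelated variable such as TOKEN_URL; if the entries really have the form TOKEN=... (an assumption), anchoring on the "=" avoids that.

```python
import os
import tempfile


def delete_lines_starting_with(path: str, prefix: str) -> None:
    """Rough equivalent of: sed -i '/^<prefix>/d' <path> (prefix treated literally)."""
    with open(path) as f:
        kept = [line for line in f if not line.startswith(prefix)]
    with open(path, "w") as f:
        f.writelines(kept)


def demo(contents: str, prefix: str) -> str:
    """Write contents to a temporary file, apply the deletion, return what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        env_file = os.path.join(tmp, "spark-env.sh")
        with open(env_file, "w") as f:
            f.write(contents)
        delete_lines_starting_with(env_file, prefix)
        with open(env_file) as f:
            return f.read()


# Hypothetical file contents, for illustration only.
sample = "TOKEN=abc123\nTOKEN_URL=https://example.com\nOTHER=1\n"

print(demo(sample, "TOKEN"))   # like '/^TOKEN/d': only OTHER=1 is left
print(demo(sample, "TOKEN="))  # like '/^TOKEN=/d': TOKEN_URL=... is kept as well
```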


7 replies

Hubert-Dudek
Esteemed Contributor III

spark.password {{secrets/scope1/key1}} is a Spark property (added under Spark config in the cluster configuration), so it is available in all notebooks via spark.conf.get("spark.password").

SPARKPASSWORD={{secrets/scope1/key1}} is an environment variable (added under Environment variables in the cluster configuration).

The problem is that with a Standard account everyone has access to all secrets anyway. On Premium, you can create separate scopes and make one of them accessible only to the users who create/start the cluster (via the environment variable), so people running notebooks have no access to those secrets.

fermin_vicente
New Contributor III

Thanks. We do have Premium and we use scopes, but we don't want users to be able to print the secret stored in the environment variable with a simple Python command such as:

import os
' '.join(os.environ['SPARKPASSWORD'])

pavan_kumar
Contributor

@Fermin Vicente Usually, when a user tries to print a secret value, it is redacted. Can you please try printing it and check whether you see the actual value?

fermin_vicente
New Contributor III

Hi Pavan,

If you run

print(os.environ['SPARKPASSWORD'])

the output is [REDACTED].

However, if you run the command from my previous reply (and it's just one of many ways to do it), you can see the contents of the secret perfectly well.
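Why the join trick gets around the redaction can be illustrated with a minimal sketch, assuming the redaction works by replacing exact occurrences of the secret value in the cell output. That is an assumption about Databricks internals which the thread does not confirm; the redact function and the secret value below are hypothetical.

```python
def redact(output: str, secret: str) -> str:
    """Hypothetical redactor: hides every exact occurrence of the secret."""
    return output.replace(secret, "[REDACTED]")


# Hypothetical value standing in for os.environ['SPARKPASSWORD'].
secret = "hunter2"

plain = redact(secret, secret)             # like print(os.environ['SPARKPASSWORD'])
spaced = redact(" ".join(secret), secret)  # like ' '.join(os.environ['SPARKPASSWORD'])

print(plain)   # [REDACTED]
print(spaced)  # h u n t e r 2
```

Under this assumption, any transformation that changes the exact string (inserting spaces, reversing, encoding) slips past the matching, so redacting output cannot by itself keep the value secret.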

Is there a way to make the env var unset after running the init script?

Thanks
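As a general point about process environments (not something the thread states about Databricks specifically): removing a variable inside one process, for example with unset in a shell script, only changes that process's own copy of the environment and does not affect other processes. That is presumably why the accepted answer edits spark-env.sh instead of unsetting the variable. A minimal Python sketch, with a child process standing in for the init script and a hypothetical TOKEN value:

```python
import os
import subprocess
import sys

# Hypothetical secret, set in the parent's environment (inherited by children).
os.environ["TOKEN"] = "abc123"

# The child, standing in for an init script, removes TOKEN from its own
# environment and reports whether it can still see it.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; os.environ.pop('TOKEN', None); print('TOKEN' in os.environ)"],
    capture_output=True, text=True, check=True,
)

child_sees_token = child.stdout.strip() == "True"
parent_sees_token = "TOKEN" in os.environ

print(child_sees_token)   # False: the child removed it from its own copy
print(parent_sees_token)  # True: the parent's environment is untouched
```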

fermin_vicente
New Contributor III

Thanks a lot Pavan, that approach works like a charm!

pavan_kumar
Contributor

@Fermin Vicente

Good to know that this approach works well. But please make sure you use it only at the end of your init script.
