03-28-2022 07:55 AM
Hi there, if I set a secret in an env var to be used by a cluster-scoped init script, it remains available to any user who attaches a notebook to the cluster, and it can easily be extracted with a print.
There's a hint in the documentation about the secret being "not accessible from a program running in Spark" (I assume this also covers commands run in notebooks), but I tried several combinations to no avail.
I'm sure I'm missing something. Any help would be appreciated, thanks!
03-28-2022 09:07 AM
spark.password {{secrets/scope1/key1}} is a Spark property, so it will be available in all notebooks via spark.conf.get("spark.password") (we add it in the Spark config for the cluster).
SPARKPASSWORD={{secrets/scope1/key1}} is an environment variable (we add it under Environment variables in the cluster config).
The problem is that with a standard account, users have access to all secrets anyway. With premium, you can create different scopes and make one of them accessible only to the users who create/start the cluster (environment variable), so people running notebooks will have no access to those secrets.
03-29-2022 12:19 AM
Thanks. We do have premium and we use scopes, but we want users to be unable to print the secret held in the environment variable with a simple Python command such as:
' '.join(os.environ['SPARKPASSWORD'])
03-29-2022 12:37 AM
@Fermin Vicente Usually, when a user tries to print a value that comes from a secret, it is redacted. Can you please try to print it and check whether you see the actual value?
03-29-2022 12:43 AM
Hi Pavan,
if you do
print(os.environ['SPARKPASSWORD'])
the output is [REDACTED]
However, if you run the command I put in my previous reply (and it's just one of many ways to do it), you can see the contents of the secret in full.
Is there a way to make the env var unset after running the init script?
Thanks
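The behaviour described in this post is what one would expect if output redaction works by matching the exact secret string. The sketch below is a toy model of that idea, not Databricks's actual implementation: the `redact` function and the `SECRET` value are hypothetical and exist only for illustration.

```python
# Toy model of literal-match output redaction; NOT Databricks's implementation.
SECRET = "s3cr3t-value"  # hypothetical secret value, for illustration only


def redact(output, secrets):
    """Replace every exact occurrence of a known secret with [REDACTED]."""
    for secret in secrets:
        output = output.replace(secret, "[REDACTED]")
    return output


# Printing the value directly: the exact string is found and masked.
direct = redact(SECRET, [SECRET])

# Joining the characters with spaces yields a different string, so an
# exact-match filter finds nothing to mask.
spaced = redact(" ".join(SECRET), [SECRET])

print(direct)  # [REDACTED]
print(spaced)  # s 3 c r 3 t - v a l u e
```

Under this assumption, any transformation of the value (spacing, reversing, encoding) would slip past the filter, which is why removing the variable itself is the more robust option.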
03-29-2022 01:32 AM
@Fermin Vicente
Hi Vicente,
You can add the line below to your init script. It removes the environment variable from spark-env.sh so that it is no longer available after the init script has run:
sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh
Example:
If you have set the following environment variable in the cluster's Spark environment variables:
TOKEN={{secrets/mlflow_model_reg/ml-token}}
then put this as the last line of your init script to remove the "TOKEN" environment variable from the Spark env:
sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh
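To see what the sed line above does to a file, here is a rough Python equivalent run against a temporary file. The sample contents are made up (the real layout of spark-env.sh is an assumption here), and `delete_matching_lines` is a hypothetical helper. It also shows that `^TOKEN` removes every line that starts with TOKEN, so a stricter pattern such as `^TOKEN=` may be preferable if other variables share the prefix; that variant is a suggestion, not something from the reply.

```python
import re
import tempfile
from pathlib import Path

# Made-up stand-in for /databricks/spark/conf/spark-env.sh; the real file's
# contents are an assumption.
SAMPLE = (
    "SPARK_WORKER_MEMORY=4g\n"
    "TOKEN=dummy-secret-value\n"
    "TOKEN_URL=https://example.invalid\n"
    "PYSPARK_PYTHON=/usr/bin/python3\n"
)


def delete_matching_lines(path, pattern):
    """Rewrite the file without lines matching pattern, like sed -i '/pattern/d'."""
    regex = re.compile(pattern)
    lines = path.read_text().splitlines(keepends=True)
    path.write_text("".join(line for line in lines if not regex.search(line)))


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / "spark-env.sh"

    # Pattern from the reply: drops every line beginning with TOKEN.
    env_file.write_text(SAMPLE)
    delete_matching_lines(env_file, r"^TOKEN")
    broad = env_file.read_text()

    # Stricter variant (an assumption): drops only the TOKEN= line.
    env_file.write_text(SAMPLE)
    delete_matching_lines(env_file, r"^TOKEN=")
    strict = env_file.read_text()

print(broad)
print(strict)
```

With the broad pattern both TOKEN and TOKEN_URL disappear; with the stricter one only TOKEN does.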
03-29-2022 04:11 AM
Thanks a lot Pavan, that approach works like a charm!
03-29-2022 04:13 AM
@Fermin Vicente
Good to know that this approach is working well. Please make sure that you use it only at the end of your init script.