03-28-2022 07:55 AM
Hi there, if I set a secret in an env var to be used by a cluster-scoped init script, it remains available to any user who attaches a notebook to the cluster and can easily be extracted with a print.
There's a hint in the documentation about the secret being "not accessible from a program running in Spark" (I assume that refers to commands run in notebooks as well), but I tried several combinations to no avail.
- Specifying the secret path with the standard "{{secrets/scope_name/secret_name}}" syntax works, but the secret is then accessible from any notebook
- Substituting the actual secret value doesn't work in the init script or in a notebook if I use a path without the {{ }} or the secrets/ part. I tried this because the SPARKPASSWORD documentation could be interpreted that way
- Using an env var named 'SPARKPASSWORD' seems to behave no differently from any other env var name
I'm sure I'm missing something. Any help would be appreciated, thanks!
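For reference, this is roughly the setup I'm describing (scope, key, variable, and file names below are just examples):
In the cluster's Environment variables:
MY_SECRET={{secrets/scope_name/secret_name}}
And in the cluster-scoped init script:
#!/bin/bash
# the secret value is substituted into MY_SECRET here and can be used by the script...
echo "password=${MY_SECRET}" > /root/app.cfg
# ...but MY_SECRET also stays visible to any notebook attached to the cluster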
Labels: Cluster, Init, Init script, Init Scripts, Secret, Secrets
03-28-2022 09:07 AM
spark.password {{secrets/scope1/key1}} is a Spark property, and it will then be available in all notebooks via spark.conf.get("spark.password") (we add it in the Spark config for the cluster).
SPARKPASSWORD={{secrets/scope1/key1}} is an environment variable (we add it under Environment variables in the cluster config).
The problem is that with a standard account you have access to all secrets anyway. On Premium, you could create separate scopes and make one of them accessible only to the users who create/start the cluster (and hence its environment variables), so people running notebooks would have no access to those secrets.
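A rough sketch of how that scope restriction could be set up with the legacy Databricks CLI (scope name and principal below are placeholders, not from this thread):
databricks secrets create-scope --scope cluster-only-scope
# grant READ on the scope only to the identity that creates/starts the cluster
databricks secrets put-acl --scope cluster-only-scope --principal cluster-owner@example.com --permission READ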
03-29-2022 12:19 AM
Thanks. We do have Premium and we use scopes, but we don't want users to be able to print the secret stored in the environment variable with a simple Python command:
' '.join(os.environ['SPARKPASSWORD'])
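# joining with spaces means the printed output no longer matches the secret value literally, so the automatic redaction doesn't catch it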
03-29-2022 12:37 AM
@Fermin Vicente usually when a user tries to print a secret value it is redacted. Can you please try to print it and check whether you are seeing the actual value?
03-29-2022 12:43 AM
Hi Pavan,
if you do
print(os.environ['SPARKPASSWORD'])
the output is [REDACTED]
However, if you run the command from my previous reply (and it's just one of many ways to do it), you can see the contents of the secret perfectly.
Is there a way to unset the env var after the init script has run?
Thanks
03-29-2022 01:32 AM
@Fermin Vicente
Hi Vicente,
You can use the line below in your init script; it removes the environment variable from spark-env.sh so that it is no longer available after the init script has run:
sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh
Example:
If you have set the following environment variable in the cluster's environment variables:
TOKEN={{secrets/mlflow_model_reg/ml-token}}
then add the line below at the end of your init script to remove the "TOKEN" environment variable from the Spark env:
sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh
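For illustration, a minimal init-script sketch of that ordering (what you do with the secret, and the /root/ml_token.cfg path, are placeholder assumptions, not part of the answer above):
#!/bin/bash
# TOKEN is injected from the secret via the cluster's Environment variables:
#   TOKEN={{secrets/mlflow_model_reg/ml-token}}
# use the secret while the init script runs, e.g. write it to a root-only config file
echo "token=${TOKEN}" > /root/ml_token.cfg
chmod 600 /root/ml_token.cfg
# last step: strip the TOKEN line from spark-env.sh so it is not exposed to notebooks
sed -i '/^TOKEN/d' /databricks/spark/conf/spark-env.sh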
03-29-2022 04:11 AM
Thanks a lot Pavan, that approach works like a charm!
03-29-2022 04:13 AM
@Fermin Vicente
Good to know that this approach is working well, but please make sure that you use it only at the end of your init script.