11-12-2021 02:31 PM
Unfortunately, it's basically not possible to solve this problem when you're using the secrets API. Because Databricks is a general-purpose compute platform, we could make leaking a secret slightly more difficult, but we couldn't actually prevent it. For example, if we also matched whitespace between characters when redacting output, people could simply obfuscate the value further (such as base64-encoding the secret and then printing that).
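To make that concrete, here is a minimal sketch of why literal-match redaction is easy to get around. It does not use the Databricks redaction code; `redact` is a hypothetical stand-in for an output filter, and `SECRET` is a made-up value standing in for something read from a secret scope.

```python
import base64

# Assumption: a made-up value standing in for a secret read from a secret scope.
SECRET = "s3cr3t-api-key"


def redact(output: str, secret: str = SECRET) -> str:
    """Hypothetical output filter: hide literal occurrences of the secret."""
    return output.replace(secret, "[REDACTED]")


# Printing the secret directly is caught by the filter.
direct = redact(f"key={SECRET}")

# Putting a space between every character defeats a literal match.
spaced = redact(" ".join(SECRET))

# Even a whitespace-aware filter would miss an encoded copy of the value.
encoded = redact(base64.b64encode(SECRET.encode()).decode())
recovered = base64.b64decode(encoded).decode()

print(direct)
print(spaced)
print(encoded, "->", recovered)
```

Any reversible transformation works the same way, so each extra pattern the filter learns only moves the bypass somewhere else.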
Even if we went to great lengths to build a custom class that couldn't be printed, we would still have to allow it to be converted to an unprotected string in order to use the secret, because the functions that actually consume the secret expect a string. For example, to use an API key in a Python request, the requests module needs a plain string so that it can send the key over the wire.
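As a rough illustration of that point, the sketch below wraps a secret in a class that refuses to print itself. `MaskedSecret` and its `reveal` method are hypothetical names, not part of any Databricks API, and the standard library's `urllib.request` stands in for the requests module so the example has no dependencies. No network call is made.

```python
import urllib.request


class MaskedSecret:
    """Hypothetical wrapper that hides its value from print() and repr()."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        return "***"

    __str__ = __repr__

    def reveal(self) -> str:
        # Anything that consumes the secret needs this plain string,
        # and anyone who can call it can also print the result.
        return self._value


# Assumption: a made-up API key and a placeholder URL.
token = MaskedSecret("s3cr3t-api-key")
masked_text = f"token={token}"

# The HTTP client has to be given the plain string to build the header.
request = urllib.request.Request(
    "https://example.invalid/api",
    headers={"Authorization": f"Bearer {token.reveal()}"},
)
header_value = request.get_header("Authorization")

print(masked_text)
print(header_value)
```

The wrapper stops accidental printing, but the header built for the request contains the secret in plain text, which is exactly what the consuming library needs.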
Basically, on a general-purpose compute platform, usability and security pull in opposite directions here: the secrets API can't be usable unless the code that uses a secret can also read it, which means a determined user can discover it (and in practice, it is very difficult to create protections that are not trivial to bypass).
That said, the best path forward is to avoid relying on the secrets API where we can. This can't be applied to every use case, but it does work for specific traffic flows where Databricks can own the entire process (such as accessing a specific ADLS storage account or S3 / GCS bucket), or where we can implement a native integration with roles provided by the cloud provider so that no explicit credentials are required (such as with AWS IAM Instance Profiles). Some of those abilities are already in place, and Databricks is working to deliver more capabilities in this area in the near future.