Administration & Architecture

Advice on "airlocking" a Databricks service

staskh
Contributor

Need advice: I'm building a data analysis service on top of Databricks and need to protect it from unauthorized data leaks, specifically file downloads.
As far as I can tell, I need some sort of remote browser isolation (RBI).

  • Is this the correct technology?

  • Are there any alternatives?

  • What are the best, most reasonably priced vendors?

Thank you in advance!

Stas

ACCEPTED SOLUTION

Ashwin_DSA
Databricks Employee

Hi @staskh,

Got it. You need something that makes bulk leaks harder without fighting screenshots, phones, etc.

On the Catalog Explorer download button: today, if a user has READ VOLUME on a Unity Catalog volume, Catalog Explorer is explicitly designed to let them select files and click Download. There isn't a separate UI switch to hide or disable that control, the way some Jupyter-style file browsers allow.

The practical pattern, if you want to avoid one-click file downloads, is:

  • Don't grant business users access to volumes at all (no READ VOLUME), and
  • Expose data only as tables/views via Unity Catalog, where you can:
    • Limit what they see (row/column security, views).
    • Control/disable result downloads in downstream tools. For example, the SQL editor has an admin control that can disable downloads entirely for the workspace.
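A minimal sketch of the pattern above, assuming hypothetical object names (`main.sales.raw_files`, `main.sales.orders_summary`) and a hypothetical group `business_users`. The helper only builds SQL strings; it does not connect to a workspace. The `REVOKE READ VOLUME ON VOLUME` and `GRANT SELECT ON TABLE` forms follow Unity Catalog privilege syntax as I understand it, so please check them against the current documentation before running anything.

```python
def quote_part(part: str) -> str:
    """Backtick-quote a single identifier, escaping embedded backticks."""
    return "`" + part.replace("`", "``") + "`"


def quote_name(dotted: str) -> str:
    """Quote each part of a catalog.schema.object name."""
    return ".".join(quote_part(p) for p in dotted.split("."))


def lockdown_statements(volume: str, view: str, principal: str) -> list[str]:
    """Build SQL that removes file access and leaves only a curated view.

    All three arguments are illustrative names, not real objects.
    Users also need USE CATALOG / USE SCHEMA on the parents to query the view.
    """
    who = quote_part(principal)
    return [
        f"REVOKE READ VOLUME ON VOLUME {quote_name(volume)} FROM {who}",
        f"GRANT SELECT ON TABLE {quote_name(view)} TO {who}",
    ]


if __name__ == "__main__":
    for stmt in lockdown_statements(
        "main.sales.raw_files", "main.sales.orders_summary", "business_users"
    ):
        print(stmt)
```

Generating the statements from one function keeps the "no volume access, views only" rule in a single reviewable place instead of scattered ad hoc grants.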

On the earlier endpoint/channel controls point, I meant:

  • DLP/CASB: A gateway or endpoint agent that inspects traffic and either blocks or flags patterns like "user just downloaded a 3 GB CSV from Databricks" or "uploaded a large file to a personal SaaS app".
  • Rate-limiting/size limits: Use the built-in limits (e.g., max download sizes in SQL/Genie) plus your own rules (views that aggregate or cap result sizes) so users can't casually pull full-fidelity history in one go.
  • VDI/RDS: Put Databricks behind a virtual desktop (Citrix, VMware Horizon, Azure Virtual Desktop, Amazon WorkSpaces, etc.) and lock down that desktop (no local drives, restricted clipboard/printing). That way, even if the UI offers "Download", the data is landing in a tightly controlled environment, not directly on a personal laptop.
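To make the DLP and rate-limiting idea concrete, here is a rough sketch of a rolling-window download budget of the kind a gateway rule might apply. Everything in it is an assumption for illustration: the class name `DownloadBudget`, the 3 GiB limit, the one-hour window, and the shape of the events. It is not how any specific DLP/CASB product works.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class DownloadBudget:
    """Tracks bytes downloaded per user over a rolling time window."""

    limit_bytes: int
    window_seconds: float
    _events: dict = field(default_factory=lambda: defaultdict(deque))

    def record(self, user: str, timestamp: float, size_bytes: int) -> bool:
        """Record one download; return True if the user is now over budget.

        Assumes timestamps arrive in non-decreasing order for each user.
        """
        events = self._events[user]
        events.append((timestamp, size_bytes))
        # Drop downloads that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while events and events[0][0] <= cutoff:
            events.popleft()
        return sum(size for _, size in events) > self.limit_bytes


if __name__ == "__main__":
    gib = 1024 ** 3
    budget = DownloadBudget(limit_bytes=3 * gib, window_seconds=3600)
    print(budget.record("alice", 0, 2 * gib))    # under the limit
    print(budget.record("alice", 600, 2 * gib))  # 4 GiB within the hour
```

A rolling total catches the "many medium downloads" pattern that a simple per-file size limit would miss.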

In terms of RBI/VDI vendors: Databricks doesn't publish an official recommended vendor list for RBI/VDI. In practice, customers usually standardise on whatever is already blessed by their security/org stack (for example, Citrix / VMware Horizon / AVD / Amazon WorkSpaces on the VDI side, or Zscaler / Netskope / Cloudflare-style secure web gateways on the RBI/DLP side). You can sync with your security architects to validate any specific vendor choices, but we don't mandate a product-specific shortlist.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

5 REPLIES

Ashwin_DSA
Databricks Employee

Hi @staskh - Good question. Airlocking Databricks with RBI can easily become more complex than the actual risk you're trying to manage.

If your end users only need insights, don't give them Databricks access at all. Run workloads via jobs (service principals), keep data in Unity Catalog, and expose only approved outputs via a separate app/BI layer. No workspace access means no download buttons to worry about.

If some users do need interactive access, focus on reducing blast radius, not perfect prevention. You can't fully stop exfiltration (copy/paste, screenshots, photos). Technology can only raise the bar. Policy and monitoring do the rest.

  • Use Unity Catalog with tight table/column/row permissions.
  • Prefer Databricks SQL/dashboards and control export there.
  • Lock down network and egress (Private Link/VNet, firewalls).

RBI can be a last-mile control in very high-security environments. Still, for many deployments, you get more value (and less complexity) from strong permissions, job-driven access patterns, and network controls rather than trying to airlock the Databricks UI itself.

Regards,
Ashwin | Delivery Solution Architect @ Databricks

staskh
Contributor

In my situation, I am required to ensure that the end user has access to all data and capabilities within the Databricks environment while simultaneously preventing the download of large amounts of data.

The objective is to safeguard the core datasets, and screenshots and photos are both permissible and even expected.

I am certain that the "backend" airlock will be preserved by employing a virtual private cloud (VPC) with strictly limited egress points. At present, I am confronted with the task of determining an appropriate technology and solution for the frontend airlock.

 


Sincerely,

Stas

 

Ashwin_DSA
Databricks Employee

Hi @staskh,

Thanks for the extra context. I'd flag that your requirement is a bit self-contradictory:

"end user has access to all data and capabilities within Databricks"
"while preventing download of large amounts of data"

If a user can see all the data and use all capabilities (including notebooks, SQL, APIs), there's no frontend technology, RBI included, that can guarantee they don't exfiltrate large volumes. They can always script incremental reads, copy/paste, or do exactly what you already accept (screenshots/photos).

Given you're already planning a strong backend airlock with VPC + controlled egress, I'd frame the frontend airlock requirement as "Don't try to make exfiltration impossible; make bulk exfiltration harder and more visible."

In practice, that usually means a mix of:

  • Databricks-side controls: strict authZ (Unity Catalog), sensible query/result size limits, and audit/monitoring for unusual volumes.
  • Endpoint/channel controls: DLP, rate-limiting, possibly VDI/RDS for the user session if you need extra assurance.
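As one way to picture "monitoring for unusual volumes", the sketch below flags days on which a user's returned bytes far exceed that user's own history. The record shape `(user, day, bytes_returned)`, the 10x factor, and the three-day minimum history are all assumptions made for illustration; they do not reflect the actual Databricks audit log schema, which you would need to map into this form yourself.

```python
from collections import defaultdict
from statistics import median


def unusual_days(records, factor=10.0, min_history=3):
    """Return (user, day, total_bytes) for days far above the user's baseline.

    records: iterable of (user, day, bytes_returned); `day` must sort
    chronologically (ISO date strings work). Field names are hypothetical.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for user, day, n_bytes in records:
        totals[user][day] += n_bytes

    flagged = []
    for user, per_day in totals.items():
        days = sorted(per_day)
        for i, day in enumerate(days):
            history = [per_day[d] for d in days[:i]]
            if len(history) < min_history:
                continue  # not enough earlier days to form a baseline
            baseline = max(median(history), 1)
            if per_day[day] > factor * baseline:
                flagged.append((user, day, per_day[day]))
    return sorted(flagged)


if __name__ == "__main__":
    sample = [
        ("alice", "2026-03-01", 100), ("alice", "2026-03-02", 120),
        ("alice", "2026-03-03", 100), ("alice", "2026-03-04", 5000),
    ]
    print(unusual_days(sample))
```

Comparing each user against their own median, rather than a global threshold, keeps heavy but legitimate users from drowning out a sudden spike by someone who normally reads very little.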

RBI can add friction to bulk download options in the browser, but because you already allow screenshots/photos, its incremental benefit is limited. I'd start by tightening permissions, limits, and monitoring, and only add RBI/VDI if your security team still feels the residual risk is too high.

Regards,
Ashwin | Delivery Solution Architect @ Databricks

staskh
Contributor

I do not require an "NSA-level airlock." Indeed, a malicious actor could develop a script that projects scrolled data onto the screen and records it with an external device, or more effectively, they could create a series of QR code movies to address error correction. We can't prevent it unless we have a physical airlock (no electronic devices allowed, armed security guard, etc.).

I aspire to offer a practical level of data leak prevention, such as disabling the download of a file from Catalog Explorer with a single click, as illustrated below:

 

[Screenshot: Screenshot 2026-03-08 at 23.06.17.png]

 

While Jupyter-style views allow similar downloads to be disabled, it appears that Catalog Explorer lacks this capability.

Can you please clarify your point on:

  • Endpoint/channel controls: DLP, rate-limiting, possibly VDI/RDS for the user session if you need extra assurance.

Additionally, I would be extremely grateful for any information regarding the RBI/VDI vendors that Databricks recommends.

 

Regards,

Stas

 

 
