Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

How safe are Databricks workspaces when user files are uploaded to them?

Chiran-Gajula
New Contributor

With the growing adoption of machine learning, AI, and data science models from the open market, it has become increasingly challenging to assess whether these models are safe to process, especially given the potential for malicious content. The same concern applies to the various file formats (.zip, .dbc, .py, .bin, and others) that are uploaded into the Databricks workspace.
- Is there currently any mechanism in place within Databricks to track and verify the safety of models available in the environment?
- How can we ensure that uploaded files are being scanned and monitored for potential malicious activity?
- I am in the process of developing a tool aimed at scanning notebooks, models, and related artifacts for security risks.
I would greatly appreciate your insights on how we can better safeguard this system and enhance our security posture.

G.Chiranjeevi

stbjelcevic
Databricks Employee

Hi @Chiran-Gajula,

Thanks for raising this. There are a few complementary controls that can be put in place across models, inference traffic, files, and observability.

Is there currently any mechanism in place within Databricks to track and verify the safety of models available in the environment?

  • Yes, Databricks provides governance and lineage for models via Unity Catalog (access controls, audit trails, cross‑workspace discovery, signature requirements), so you can trace provenance and enforce permissions (a minimal registration sketch follows this list). This, combined with endpoint guardrails in Mosaic AI Gateway (safety filtering/PII detection) and system tables, supports the safety and compliance of model use in production.
  • To ensure the safety of outputs at runtime, enable AI Gateway guardrails, Inference Tables, and Lakehouse Monitoring to detect/track harmful content or PII and measure quality over time (including LLM‑as‑judge metrics).
  • For host integrity, enhanced security monitoring (ESM) adds malware and file‑integrity monitoring on classic compute and logs detections to audit/system tables for review and alerting.
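
To make the first bullet concrete, here is a minimal sketch of registering a model in Unity Catalog with MLflow so it picks up UC access controls, lineage, and the model‑signature requirement. This assumes a UC‑enabled workspace with MLflow available (as on a Databricks ML runtime); the three‑level name `main.ml_models.demo_classifier` is a hypothetical placeholder.

```python
import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Unity Catalog requires a model signature (part of the provenance
# story above), so infer one from sample inputs and outputs.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=500).fit(X, y)
signature = infer_signature(X, model.predict(X))

# Point the MLflow registry at Unity Catalog so the registered model
# inherits UC access controls, lineage capture, and audit trails.
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        signature=signature,
        # Hypothetical catalog.schema.model name
        registered_model_name="main.ml_models.demo_classifier",
    )
```

From there, access is governed with standard UC grants (for example, granting EXECUTE on the model to a group), and usage surfaces in lineage and the audit system tables.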

How can we ensure that uploaded files are being scanned and monitored for potential malicious activity?

  • Databricks does not automatically scan files on upload into object storage (DBFS/UC Volumes).
  • The recommended pattern is to enable cloud‑native “on‑upload” malware scanning (for example, Microsoft Defender for Storage on ADLS or Amazon GuardDuty malware protection for S3) and stage/quarantine files before Auto Loader or downstream ingestion. You can then move “clean” files from a quarantine/staging path into a “safe” landing path watched by Auto Loader (see the sketch after this list).
  • There are other protections offered through enhanced security monitoring (ESM), such as the antivirus and file‑integrity monitoring on classic compute hosts mentioned above.
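
For the staging/quarantine pattern in the second bullet, here is a minimal sketch of the “watch only the clean path” side. It assumes the cloud scanner (Defender for Storage, GuardDuty, etc.) has already verdicted uploads and that a separate mover copies clean files into the landing path; all paths and the target table below are hypothetical, and `spark` is the ambient SparkSession in a Databricks notebook.

```python
# Hypothetical layout: uploads land under .../quarantine, the cloud-native
# scanner verdicts them, and clean files are moved to .../landing.
# Auto Loader watches ONLY the clean landing path, so unscanned files
# never enter the ingestion pipeline.
CLEAN_PATH = "abfss://data@myaccount.dfs.core.windows.net/landing/"  # hypothetical

df = (
    spark.readStream.format("cloudFiles")        # Auto Loader source
    .option("cloudFiles.format", "json")         # adjust to your file type
    .option("cloudFiles.schemaLocation",
            "/Volumes/main/raw/_schemas/events")  # hypothetical schema-tracking path
    .load(CLEAN_PATH)
)

(
    df.writeStream
    .option("checkpointLocation",
            "/Volumes/main/raw/_checkpoints/events")  # hypothetical checkpoint path
    .trigger(availableNow=True)                  # incremental, batch-style run
    .toTable("main.raw.events")                  # hypothetical UC target table
)
```

Because the stream only ever reads from the landing path, a file that never passes the scanner simply never becomes visible to the pipeline, which keeps the quarantine logic out of your Spark code entirely.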