Administration & Architecture

How safe are Databricks workspaces when user files are uploaded to them?

Chiran-Gajula
New Contributor II

With the growing adoption of diverse machine learning, AI, and data science models available in the market, it has become increasingly challenging to assess the safety of processing these models, especially when considering the potential for malicious content. This concern also extends to handling various file formats such as .zip, .dbc, .py, .bin, and others that are uploaded into the Databricks workspace.
- Is there currently any mechanism in place within Databricks to track and verify the safety of models available in the environment?
- How can we ensure that uploaded files are being scanned and monitored for potential malicious activity?
- I am in the process of developing a tool aimed at scanning notebooks, models, and related artifacts for security risks.
I would greatly appreciate your insights on how we can better safeguard this system and enhance our security posture.

G.Chiranjeevi
1 ACCEPTED SOLUTION


stbjelcevic
Databricks Employee

Hi @Chiran-Gajula ,

Thanks for raising this. There are a few complementary controls you can put in place across models, inference traffic, files, and observability.

Is there currently any mechanism in place within Databricks to track and verify the safety of models available in the environment?

  • Yes. Databricks provides governance and lineage for models via Unity Catalog (access controls, audit trails, cross-workspace discovery, signature requirements), so you can trace provenance and enforce permissions. Combined with endpoint guardrails in Mosaic AI Gateway (safety filtering/PII detection) and system tables, this supports the safety and compliance of model use in production (see the audit-query sketch after this list).
  • To verify the safety of outputs at runtime, enable AI Gateway guardrails, Inference Tables, and Lakehouse Monitoring to detect and track harmful content or PII and measure quality over time (including LLM-as-judge metrics).
  • For host integrity, Enhanced Security Monitoring (ESM) adds malware/integrity monitoring on classic compute and logs detections to audit/system tables for review and alerting.
  • (source 1) (source 2)
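
As a concrete starting point for the audit-trail piece, here is a minimal PySpark sketch (for a Databricks notebook) that surfaces recent model-related Unity Catalog audit events from system tables. It assumes system tables are enabled in your workspace; the action_name filter is an illustrative assumption, not an exhaustive list of model events.

    # Minimal sketch: recent Unity Catalog model-related audit events.
    # Assumes system tables are enabled; the ILIKE filter on action_name
    # is an illustrative assumption and may need tuning for your logs.
    recent_model_events = spark.sql("""
        SELECT
            event_time,
            user_identity.email AS actor,
            action_name,
            request_params
        FROM system.access.audit
        WHERE service_name = 'unityCatalog'
          AND action_name ILIKE '%model%'
          AND event_time >= current_timestamp() - INTERVAL 7 DAYS
        ORDER BY event_time DESC
    """)
    display(recent_model_events)

A query like this can feed a scheduled alert or dashboard, which pairs well with the scanning tool you are building.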

How can we ensure that uploaded files are being scanned and monitored for potential malicious activity?

  • Databricks does not automatically scan files on upload into object storage (DBFS/UC Volumes).
  • The recommended pattern is to enable cloud-native "on-upload" malware scanning (for example, Microsoft Defender for Storage on ADLS or Amazon GuardDuty Malware Protection for S3) and stage/quarantine files before Auto Loader or downstream ingestion. You can then move "clean" files from a quarantine/staging path into a "safe" landing path watched by Auto Loader (see the sketch after this list).
  • Additional host-level protections, such as antivirus and file integrity monitoring on classic compute, are available through Enhanced Security Monitoring.
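
To make the quarantine-then-ingest pattern concrete, here is a minimal Auto Loader sketch that watches only the scanned "clean" prefix. All paths, the schema/checkpoint locations, and the target table name are hypothetical placeholders; the malware scan itself is performed by the cloud provider before files are promoted into the clean prefix.

    # Minimal sketch: ingest only files the cloud-side malware scan has
    # promoted into the "clean" landing prefix. Paths and table names
    # below are hypothetical placeholders.
    clean_path = "abfss://landing@<storage-account>.dfs.core.windows.net/clean/"

    df = (
        spark.readStream.format("cloudFiles")            # Auto Loader
             .option("cloudFiles.format", "binaryFile")  # arbitrary uploaded files
             .option("cloudFiles.schemaLocation",
                     "/Volumes/main/security/schemas/uploads")  # hypothetical
             .load(clean_path)                           # watch only the scanned prefix
    )

    (
        df.writeStream
          .option("checkpointLocation",
                  "/Volumes/main/security/checkpoints/uploads")  # hypothetical
          .trigger(availableNow=True)
          .toTable("main.security.ingested_uploads")     # hypothetical UC table
    )

Because Auto Loader only ever sees the clean prefix, any file the scanner flags stays in quarantine and never enters your pipelines.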
