Administration & Architecture

How safe are Databricks workspaces when user files are uploaded to them?

Chiran-Gajula
New Contributor II

With the growing adoption of diverse machine learning, AI, and data science models available in the market, it has become increasingly challenging to assess the safety of processing these models, especially when considering the potential for malicious content. This concern also extends to handling various file formats such as .zip, .dbc, .py, .bin, and others that are uploaded into the Databricks workspace.
- Is there currently any mechanism in place within Databricks to track and verify the safety of models available in the environment?
- How can we ensure that uploaded files are being scanned and monitored for potential malicious activity?
- I am in the process of developing a tool aimed at scanning notebooks, models, and related artifacts for security risks.
I would greatly appreciate your insights on how we can better safeguard this system and enhance our security posture.

G.Chiranjeevi
1 ACCEPTED SOLUTION


stbjelcevic
Databricks Employee

Hi @Chiran-Gajula ,

Thanks for raising this. There are a few complementary controls you can put in place across models, inference traffic, files, and observability.

Is there currently any mechanism in place within Databricks to track and verify the safety of models available in the environment?

  • Yes. Databricks provides governance and lineage for models via Unity Catalog (access controls, audit trails, cross-workspace discovery, signature requirements), so you can trace provenance and enforce permissions. Combined with endpoint guardrails in Mosaic AI Gateway (safety filtering/PII detection) and system tables, this supports the safety and compliance of model use in production (see the audit-query sketch after this list).
  • To verify the safety of outputs at runtime, enable AI Gateway guardrails, Inference Tables, and Lakehouse Monitoring to detect and track harmful content or PII and measure quality over time (including LLM-as-judge metrics).
  • For host integrity, Enhanced Security Monitoring (ESM) adds malware/integrity monitoring on classic compute and logs detections to audit/system tables for review and alerting.
  • (source 1) (source 2)
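
As a concrete starting point for the audit-trail piece, here is a minimal PySpark sketch (for a Databricks notebook) that surfaces recent model-related Unity Catalog audit events from system tables. It assumes system tables are enabled in your workspace; the action_name filter is an illustrative assumption, not an exhaustive list of model events.

    # Minimal sketch: recent Unity Catalog model-related audit events.
    # Assumes system tables are enabled; the ILIKE filter on action_name
    # is an illustrative assumption and may need tuning for your logs.
    recent_model_events = spark.sql("""
        SELECT
            event_time,
            user_identity.email AS actor,
            action_name,
            request_params
        FROM system.access.audit
        WHERE service_name = 'unityCatalog'
          AND action_name ILIKE '%model%'
          AND event_time >= current_timestamp() - INTERVAL 7 DAYS
        ORDER BY event_time DESC
    """)
    display(recent_model_events)

A query like this can feed a scheduled alert or dashboard, which pairs well with the scanning tool you are building.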

How can we ensure that uploaded files are being scanned and monitored for potential malicious activity?

  • Databricks does not automatically scan files on upload into object storage (DBFS/UC Volumes).
  • The recommended pattern is to enable cloud-native "on-upload" malware scanning (for example, Microsoft Defender for Storage on ADLS or Amazon GuardDuty Malware Protection for S3) and stage/quarantine files before Auto Loader or downstream ingestion. You can then move "clean" files from a quarantine/staging path into a "safe" landing path watched by Auto Loader (see the sketch after this list).
  • Additional host-level protections, such as antivirus and file integrity monitoring on classic compute, are available through Enhanced Security Monitoring.
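
To make the quarantine-then-ingest pattern concrete, here is a minimal Auto Loader sketch that watches only the scanned "clean" prefix. All paths, the schema/checkpoint locations, and the target table name are hypothetical placeholders; the malware scan itself is performed by the cloud provider before files are promoted into the clean prefix.

    # Minimal sketch: ingest only files the cloud-side malware scan has
    # promoted into the "clean" landing prefix. Paths and table names
    # below are hypothetical placeholders.
    clean_path = "abfss://landing@<storage-account>.dfs.core.windows.net/clean/"

    df = (
        spark.readStream.format("cloudFiles")            # Auto Loader
             .option("cloudFiles.format", "binaryFile")  # arbitrary uploaded files
             .option("cloudFiles.schemaLocation",
                     "/Volumes/main/security/schemas/uploads")  # hypothetical
             .load(clean_path)                           # watch only the scanned prefix
    )

    (
        df.writeStream
          .option("checkpointLocation",
                  "/Volumes/main/security/checkpoints/uploads")  # hypothetical
          .trigger(availableNow=True)
          .toTable("main.security.ingested_uploads")     # hypothetical UC table
    )

Because Auto Loader only ever sees the clean prefix, any file the scanner flags stays in quarantine and never enters your pipelines.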
