Databricks Community

ccsong · ‎10-16-2024

Context:

We are utilizing an Azure Databricks workspace for data management and model serving within our project, with delegated VNet and subnets configured specifically for this workspace. However, we are consistently observing malicious flow entries in the VNet flow logs. It appears that a background script is continuously running, sending requests to certain URLs and IP addresses. We are currently operating on the runtime version 15.4.x-cpu-ml-scala2.12, with no third-party libraries installed.

The URLs look like this: https://chandramoulisangabathula01.github.io, http://yasse5n.github.io/EDJOSK, and https://solankisuryansh.github.io/CloneNetflix

Here is a screenshot of one of them:

The IPs are listed in the screenshot below:

The requests go out through a Databricks-configured aclRule called "microsoft.databricks-workspaces_useonly_databricks-worker-to-worker-outbound", as shown in the screenshot below:

Alberto_Umana · 3 weeks ago

Hi @ccsong,

Greetings from Databricks!

It looks like this requires a support case for further investigation. Do you have an active support plan?

Could you please submit a request?

Please refer to: https://docs.databricks.com/en/resources/support.html

cuser731 · 3 weeks ago

@ccsong have you found the root cause of the malicious flow entries? We are seeing the same behavior with similar URLs. Is anyone else experiencing this who can explain the malicious flows?

Alberto_Umana · 2 weeks ago

Hello everyone!

We have worked with our security team, Microsoft, and other customers who have seen similar log messages.

This log message is very misleading, as it appears to state that the malicious URI was detected within your network — this would be a major concern were it the case. However, as we’ve learned when working with those other customers, that URI is just an example of a malicious URI that has previously been associated with that IP. But it wasn’t observed within your network.

Apart from checking with Microsoft, you can validate this yourself: the data source for these alerts (flow logs) operates only at layers 3 and 4 and cannot actually contain URIs. We have also seen these alerts on connections blocked at the firewall (which would never be able to request a URI) and on encrypted connections (where the tool would not be able to see the URI).
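To illustrate the layer 3/4 point, here is a minimal sketch that parses a single flow tuple. It assumes the comma-separated tuple layout that Microsoft documents for VNet flow logs (timestamp, source IP, destination IP, source port, destination port, protocol, direction, state, encryption, then packet and byte counters). The sample tuple and the names `FlowTuple` and `parse_flow_tuple` are illustrative only, so check the field order against your own logs.

```python
from typing import NamedTuple


class FlowTuple(NamedTuple):
    """Fields of one flow record (assumed VNet flow log tuple layout)."""
    timestamp_ms: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str    # IANA protocol number as text, e.g. "6" for TCP
    direction: str   # "I" inbound, "O" outbound
    state: str       # flow state code
    encryption: str  # encryption status code


def parse_flow_tuple(raw: str) -> FlowTuple:
    """Split a raw comma-separated flow tuple into named fields."""
    parts = raw.split(",")
    if len(parts) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(parts)}")
    return FlowTuple(
        timestamp_ms=int(parts[0]),
        src_ip=parts[1],
        dst_ip=parts[2],
        src_port=int(parts[3]),
        dst_port=int(parts[4]),
        protocol=parts[5],
        direction=parts[6],
        state=parts[7],
        encryption=parts[8],
    )


# Illustrative tuple using documentation-range addresses, not real traffic.
SAMPLE = "1663146003599,10.0.0.6,192.0.2.180,23956,443,6,O,B,NX,0,0,0,0"
flow = parse_flow_tuple(SAMPLE)
print(flow)
```

Every field is an address, a port, a protocol or state code, or a counter. There is no field in which a hostname or URI path could be recorded, so any URL shown alongside such a record must come from an external enrichment source.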

The IP address in question is for github.io, so all that is actually occurring to trigger this is any connection to github.io. In practice, we have high confidence this is a call to nvidia.github.io that is issued on some Azure Databricks systems based on Nvidia drivers.

In summary: based on conversations with Microsoft and lengthy analysis across multiple customers, this is just a very misleading log message and not an indication of any infection.

Data leakage risk happened when we use the Azure Databricks workspace
