Data leakage risk when using an Azure Databricks workspace

ccsong
New Contributor II

Context:

We are utilizing an Azure Databricks workspace for data management and model serving within our project, with delegated VNet and subnets configured specifically for this workspace. However, we are consistently observing malicious flow entries in the VNet flow logs. It appears that a background script is continuously running, sending requests to certain URLs and IP addresses. We are currently operating on the runtime version 15.4.x-cpu-ml-scala2.12, with no third-party libraries installed.

The URLs look like these: https://chandramoulisangabathula01.github.io, http://yasse5n.github.io/EDJOSK and https://solankisuryansh.github.io/CloneNetflix

Here is a screenshot of one of them:

Screenshot 2024-10-16 at 18.14.45.png

The IPs are listed in the screenshot below:

Screenshot 2024-10-16 at 18.15.19.png

The requests go out through a Databricks-configured aclRule called "microsoft.databricks-workspaces_useonly_databricks-worker-to-worker-outbound", shown in the screenshot below:

Screenshot 2024-10-16 at 18.18.24.png

3 REPLIES

Alberto_Umana
Databricks Employee

Hi @ccsong,

Greetings from Databricks! 

It looks like this needs a support case so that we can investigate further. Do you have an active support plan? 

Could you please submit a request? 

Please refer to: https://docs.databricks.com/en/resources/support.html

cuser731
New Contributor II

@ccsong have you found the root cause of the malicious flow entries? We are seeing similar behavior, with similar URLs. Is anyone else experiencing this who can explain the malicious flows?

Alberto_Umana
Databricks Employee

Hello everyone!

We have worked with our security team, Microsoft, and other customers who have seen similar log messages.

This log message is very misleading, as it appears to state that the malicious URI was detected within your network, which would be a major concern were it the case. However, as we’ve learned when working with those other customers, that URI is just an example of a malicious URI that has previously been associated with that IP. It wasn’t observed within your network.

Besides checking with Microsoft, you can validate this yourself: the data source for these alerts (flow logs) operates only at layers 3/4 and cannot actually contain URIs. We have also seen these alerts on connections blocked at the firewall (which would never be able to request a URI) and on encrypted connections (where the tool wouldn’t be able to see the URI).
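To make the layer 3/4 point concrete, here is a minimal Python sketch that parses one flow record. The comma-separated field order (timestamp, source IP, destination IP, source port, destination port, protocol, direction) is an assumption modeled on the flow tuple layout Azure documents, so please check it against the schema of your own logs. The sample record and the names FlowTuple and parse_flow_tuple are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowTuple:
    """The leading fields of one flow record (assumed layout, see above)."""
    timestamp_ms: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    direction: str


def parse_flow_tuple(raw: str) -> FlowTuple:
    """Split a comma-separated flow tuple and keep the first seven fields."""
    fields = raw.split(",")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    ts, src_ip, dst_ip, src_port, dst_port, protocol, direction = fields[:7]
    return FlowTuple(int(ts), src_ip, dst_ip, int(src_port), int(dst_port),
                     protocol, direction)


# Hypothetical record using documentation-only IP ranges, not real log data.
sample = "1663146003599,10.0.0.6,203.0.113.10,23956,443,6,O"
flow = parse_flow_tuple(sample)
```

Every field is an address, a port, a protocol or a direction flag. There is no hostname or URL path anywhere in the record, so any URI shown next to it has to come from an external enrichment source rather than from the observed traffic.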

The IP address in question belongs to github.io, so any connection to github.io is enough to trigger this alert. In practice, we have high confidence that this is a call to nvidia.github.io, which is issued on some Azure Databricks systems that use NVIDIA drivers.
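If you want to check the shared-IP explanation yourself, a rough Python sketch like the one below resolves a list of hostnames and reports which ones currently map to the flagged address. The function names are hypothetical, the resolver is passed in so it can be swapped out, and DNS answers vary over time and by location, so treat a match as supporting evidence rather than proof.

```python
from __future__ import annotations

import socket
from typing import Callable, Iterable


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def hosts_sharing_ip(
    flagged_ip: str,
    hostnames: Iterable[str],
    resolve: Callable[[str], set[str]] = resolve_ipv4,
) -> list[str]:
    """List the hostnames whose resolved addresses include flagged_ip."""
    return [name for name in hostnames if flagged_ip in resolve(name)]


# Example call (needs network access; replace the placeholder with the IP
# from your own alert):
# hosts_sharing_ip("<flagged IP>", ["nvidia.github.io", "yasse5n.github.io"])
```

If both the benign hostname and the hostname from the alert come back for the same address, the alert cannot tell them apart from the IP alone.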

In summary: based on conversations with Microsoft and lengthy analysis across multiple customers, this is just a very misleading log message and not an indication of any infection.
