<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sample Datasets URL in Azure Databricks / access sample datasets when NPIP and Firewall is enabled in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9806#M470</link>
    <description>&lt;P&gt;It might be useful to update the Azure docs to include the addresses of the sample datasets hosted in AWS, so that they can be accessed from Azure clusters that sit behind a firewall: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#control-plane-ip-addresses" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#control-plane-ip-addresses&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 12 Feb 2023 23:09:35 GMT</pubDate>
    <dc:creator>ajbush</dc:creator>
    <dc:date>2023-02-12T23:09:35Z</dc:date>
    <item>
      <title>Sample Datasets URL in Azure Databricks / access sample datasets when NPIP and Firewall is enabled</title>
      <link>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9802#M466</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have an Azure Databricks instance configured to use VNet injection with secure cluster connectivity. An Azure Firewall controls all ingress and egress traffic, configured as per this article: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#--dbfs-root-blob-storage-ip-address" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#--dbfs-root-blob-storage-ip-address&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can access the Hive metastore, DBFS via the internal storage account, and so on. The cluster is up and running, and I seem to have whitelisted every domain and IP that the article lists for connectivity.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, the one thing I can't get working is the sample datasets mount on DBFS (/databricks-datasets). Every time I try to access the mount, it times out:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Screenshot 2023-02-08 at 4.45.47 PM"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/699iB3DB2C7208E19D66/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2023-02-08 at 4.45.47 PM" alt="Screenshot 2023-02-08 at 4.45.47 PM" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I assume this is because I haven't whitelisted the underlying storage location of this dataset source. Listing the mounts doesn't give me any more detail:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;mountPoint	source	encryptionType
/databricks-datasets	databricks-datasets	
/databricks/mlflow-tracking	databricks/mlflow-tracking	
/databricks-results	databricks-results	
/databricks/mlflow-registry	databricks/mlflow-registry	
/	DatabricksRoot	&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Looking at the exception, it seems to time out on an S3 client, so I assume it's actually reading an S3 bucket in AWS somewhere:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
ExecutionError                            Traceback (most recent call last)
&amp;lt;command-3658692990033083&amp;gt; in &amp;lt;cell line: 1&amp;gt;()
----&amp;gt; 1 dbutils.fs.ls("/databricks-datasets")
&amp;nbsp;
/databricks/python_shell/dbruntime/dbutils.py in f_with_exception_handling(*args, **kwargs)
    360                     exc.__context__ = None
    361                     exc.__cause__ = None
--&amp;gt; 362                     raise exc
    363 
    364             return f_with_exception_handling
&amp;nbsp;
ExecutionError: An error occurred while calling o374.ls.
: java.rmi.RemoteException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out with exception after 12 attempts; nested exception is: 
	java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out with exception after 12 attempts
	at com.databricks.backend.daemon.data.client.DbfsClient.send0(DbfsClient.scala:135)
	at com.databricks.backend.daemon.data.client.DbfsClient.sendIdempotent(DbfsClient.scala:69)
	at com.databricks.backend.daemon.data.client.RemoteDatabricksStsClient.getSessionTokenFor(DbfsClient.scala:311)
	at com.databricks.backend.daemon.data.client.DatabricksSessionCredentialsProvider.startSession(DatabricksSessionCredentialsProvider.scala:56)
	at com.databricks.backend.daemon.data.client.DatabricksSessionCredentialsProvider.getCredentials(DatabricksSessionCredentialsProvider.scala:46)
	at com.databricks.backend.daemon.data.client.DatabricksSessionCredentialsProvider.getCredentials(DatabricksSessionCredentialsProvider.scala:34)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5453)
	at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6428)
	at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6401)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5438)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5400)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5394)
	at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
	at shaded.databricks.org.apache.hadoop.fs.s3a.EnforcingDatabricksS3Client.listObjectsV2(EnforcingDatabricksS3Client.scala:214)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Is there any documentation on where this storage is actually hosted? Can it be reached when an Azure Firewall is configured to filter traffic?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Alex&lt;/P&gt;</description>
      <pubDate>Wed, 08 Feb 2023 03:51:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9802#M466</guid>
      <dc:creator>ajbush</dc:creator>
      <dc:date>2023-02-08T03:51:26Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Datasets URL in Azure Databricks / access sample datasets when NPIP and Firewall is enabled</title>
      <link>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9804#M468</link>
      <description>&lt;P&gt;Hi Debayan,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for the reply and the links.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have configured the workspace and Azure infrastructure as described in the links. All the storage and clusters work, except for the sample datasets.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I did a little digging into the firewall logs and found the following denied requests:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;HTTPS request from 10.1.2.4:58006 to sts.amazonaws.com:443. Action: Deny. No rule matched. Proceeding with default action
&amp;nbsp;
HTTPS request from 10.1.2.4:41590 to databricks-datasets-oregon.s3.amazonaws.com:443. Action: Deny. No rule matched. Proceeding with default action
&amp;nbsp;
HTTPS request from 10.1.2.5:59056 to databricks-datasets-oregon.s3.us-west-2.amazonaws.com:443. Action: Deny. No rule matched. Proceeding with default action&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;So it seems the Azure workspace calls out to AWS (STS and an S3 bucket in us-west-2) to read the sample datasets (I wouldn't want to be the one paying your data egress bill!).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I added the following rule to the firewall, and the mount now works:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;      "Sample Datasets" = {
        action = "Allow"
        target_fqdns = [
          "sts.amazonaws.com",
          "databricks-datasets-oregon.s3.amazonaws.com",
          "databricks-datasets-oregon.s3.us-west-2.amazonaws.com"
        ]
        protocol = {
          type = "Https"
          port = "443"
        }
      }&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Do you know how Azure regions map to AWS regions for the sample datasets? Looking at the docs here (https://docs.databricks.com/resources/supported-regions.html#control-plane-nat-and-storage-bucket-addresses), it seems you host the datasets across multiple regions. Do I have to whitelist all of them, or does the Australia East region always go to Oregon to get the datasets?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Alex&lt;/P&gt;</description>
      <pubDate>Thu, 09 Feb 2023 04:09:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9804#M468</guid>
      <dc:creator>ajbush</dc:creator>
      <dc:date>2023-02-09T04:09:23Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Datasets URL in Azure Databricks / access sample datasets when NPIP and Firewall is enabled</title>
      <link>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9806#M470</link>
      <description>&lt;P&gt;It might be useful to update the Azure docs to include the addresses of the sample datasets hosted in AWS, so that they can be accessed from Azure clusters that sit behind a firewall: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#control-plane-ip-addresses" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#control-plane-ip-addresses&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 12 Feb 2023 23:09:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9806#M470</guid>
      <dc:creator>ajbush</dc:creator>
      <dc:date>2023-02-12T23:09:35Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Datasets URL in Azure Databricks / access sample datasets when NPIP and Firewall is enabled</title>
      <link>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9807#M471</link>
      <description>&lt;P&gt;Hi @Alex Bush,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope everything is going great.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just checking in: were you able to resolve your issue? If so, would you mark an answer as best so that other members can find the solution more quickly? If not, please let us know so we can help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers!&lt;/P&gt;</description>
      <pubDate>Sun, 09 Apr 2023 06:53:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9807#M471</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-09T06:53:11Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Datasets URL in Azure Databricks / access sample datasets when NPIP and Firewall is enabled</title>
      <link>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9803#M467</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Could you please re-verify that you have followed the main steps described here: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject" alt="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject&lt;/A&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Connect Azure Databricks to other Azure services (such as Azure Storage) in a more secure manner using&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview" alt="https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview" target="_blank"&gt;service endpoints&lt;/A&gt;&amp;nbsp;or&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/storage/common/storage-private-endpoints" alt="https://learn.microsoft.com/en-us/azure/storage/common/storage-private-endpoints" target="_blank"&gt;private endpoints&lt;/A&gt;.&lt;/LI&gt;&lt;LI&gt;Connect to&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/on-prem-network" alt="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/on-prem-network" target="_blank"&gt;on-premises data sources&lt;/A&gt;&amp;nbsp;for use with Azure Databricks, taking advantage of&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr" alt="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr" target="_blank"&gt;user-defined routes&lt;/A&gt;.&lt;/LI&gt;&lt;LI&gt;Connect Azure Databricks to a&amp;nbsp;&lt;A 
href="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/on-prem-network#route-via-firewall" alt="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/on-prem-network#route-via-firewall" target="_blank"&gt;network virtual appliance&lt;/A&gt;&amp;nbsp;to inspect all outbound traffic and take actions according to allow and deny rules, by using&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr" alt="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr" target="_blank"&gt;user-defined routes&lt;/A&gt;.&lt;/LI&gt;&lt;LI&gt;Configure Azure Databricks to use&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/on-prem-network#vnet-custom-dns" alt="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/on-prem-network#vnet-custom-dns" target="_blank"&gt;custom DNS&lt;/A&gt;.&lt;/LI&gt;&lt;LI&gt;Configure&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-network/manage-network-security-group" alt="https://learn.microsoft.com/en-us/azure/virtual-network/manage-network-security-group" target="_blank"&gt;network security group (NSG) rules&lt;/A&gt;&amp;nbsp;to specify egress traffic restrictions.&lt;/LI&gt;&lt;LI&gt;Deploy Azure Databricks clusters in your existing VNet.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please also check the UDR-related steps: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr" alt="https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr" 
target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Most importantly, for the metastore, artifact Blob storage, and the related endpoints, you have to allow the domain names listed for your region: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#--metastore-artifact-blob-storage-log-blob-storage-and-event-hub-endpoint-ip-addresses" alt="https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#--metastore-artifact-blob-storage-log-blob-storage-and-event-hub-endpoint-ip-addresses" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions#--metastore-artifact-blob-storage-log-blob-storage-and-event-hub-endpoint-ip-addresses&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please let us know if this helps.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Feb 2023 05:11:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9803#M467</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-02-08T05:11:16Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Datasets URL in Azure Databricks / access sample datasets when NPIP and Firewall is enabled</title>
      <link>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9805#M469</link>
      <description>&lt;P&gt;Hi, you can refer to &lt;A href="https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#subnet-level-network-acls" alt="https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#subnet-level-network-acls" target="_blank"&gt;https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#subnet-level-network-acls&lt;/A&gt;. Let me know if this helps.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Feb 2023 17:32:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sample-datasets-url-in-azure-databricks-access-sample-datasets/m-p/9805#M469</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-02-09T17:32:44Z</dc:date>
    </item>
  </channel>
</rss>

