<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Efficient data retrieval process between Azure Blob storage and Azure databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/efficient-data-retrieval-process-between-azure-blob-storage-and/m-p/24809#M17255</link>
    <description>&lt;P&gt;Check out our Auto Loader capabilities, which can automatically track and process files that have not yet been processed.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/auto-loader" alt="https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/auto-loader" target="_blank"&gt;Autoloader&lt;/A&gt;&lt;/P&gt;&lt;P&gt;There are two options:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;directory listing, which performs essentially the same steps that you listed above, but slightly more efficiently.&lt;/LI&gt;&lt;LI&gt;file notification, which creates managed resources that track files using the Azure Event Grid and Queue Storage services.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The file notification option is more scalable and is likely to better suit your needs.&lt;/P&gt;</description>
    <pubDate>Mon, 21 Jun 2021 19:31:54 GMT</pubDate>
    <dc:creator>Ryan_Chynoweth</dc:creator>
    <dc:date>2021-06-21T19:31:54Z</dc:date>
    <item>
      <title>Efficient data retrieval process between Azure Blob storage and Azure databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/efficient-data-retrieval-process-between-azure-blob-storage-and/m-p/24808#M17254</link>
      <description>&lt;P&gt;I am trying to design a streaming data analytics project using Functions --&amp;gt; Event Hub --&amp;gt; storage --&amp;gt; Azure Data Factory --&amp;gt; Databricks --&amp;gt; SQL Server.&lt;/P&gt;&lt;P&gt;What I am struggling with at the moment is how to optimize "data retrieval" to feed my ETL process on Azure Databricks.&lt;/P&gt;&lt;P&gt;I am going to handle lots of incoming files arriving at different times. I am using a function to create an event for each file as it arrives and send it to Blob storage. The data then goes to Azure Data Factory and from there to Databricks. This whole process takes a considerable amount of time and delays the full pipeline.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Jun 2021 13:26:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/efficient-data-retrieval-process-between-azure-blob-storage-and/m-p/24808#M17254</guid>
      <dc:creator>User16826994223</dc:creator>
      <dc:date>2021-06-14T13:26:52Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient data retrieval process between Azure Blob storage and Azure databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/efficient-data-retrieval-process-between-azure-blob-storage-and/m-p/24809#M17255</link>
      <description>&lt;P&gt;Check out our Auto Loader capabilities, which can automatically track and process files that have not yet been processed.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/auto-loader" alt="https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/auto-loader" target="_blank"&gt;Autoloader&lt;/A&gt;&lt;/P&gt;&lt;P&gt;There are two options:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;directory listing, which performs essentially the same steps that you listed above, but slightly more efficiently.&lt;/LI&gt;&lt;LI&gt;file notification, which creates managed resources that track files using the Azure Event Grid and Queue Storage services.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The file notification option is more scalable and is likely to better suit your needs.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Jun 2021 19:31:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/efficient-data-retrieval-process-between-azure-blob-storage-and/m-p/24809#M17255</guid>
      <dc:creator>Ryan_Chynoweth</dc:creator>
      <dc:date>2021-06-21T19:31:54Z</dc:date>
    </item>
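    <!--
A minimal sketch of how the two Auto Loader options described in the reply above might be selected from PySpark. It assumes a Databricks runtime where the "cloudFiles" streaming source is available. The helper names, the JSON file format, and the schema path are hypothetical and chosen only for illustration. File notification mode on Azure also needs credential options for the queue and the service principal, which are left out here.

```python
from typing import Dict


def auto_loader_options(file_format: str, use_notifications: bool,
                        schema_location: str) -> Dict[str, str]:
    """Build reader options for the Databricks "cloudFiles" (Auto Loader) source."""
    return {
        # Format of the incoming files, e.g. "json", "csv", "parquet".
        "cloudFiles.format": file_format,
        # "false" selects directory listing; "true" selects file notification.
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def build_stream(spark, source_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame that reads new files under source_path.

    `spark` is expected to be the SparkSession of a Databricks cluster.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(source_path)


# Hypothetical values, for illustration only.
options = auto_loader_options("json", use_notifications=True,
                              schema_location="/mnt/checkpoints/events/_schema")
```

On a cluster, calling build_stream(spark, source_path, options) with the storage path of the landing container would give a streaming DataFrame that can be written onward with writeStream. Passing use_notifications=False switches the same reader to directory listing.
    -->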
  </channel>
</rss>

