<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Best practices for working with external locations where many files arrive constantly in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/best-practices-for-working-with-external-locations-where-many/m-p/64055#M6835</link>
    <description>&lt;P&gt;I have an Azure Function that receives files (not volumes) and writes them to cloud storage. Roughly one to five files arrive per second. I want to create a partitioned table in Databricks to work with this data. How should I do this? For example, should I register the container as an external location and create a bundle that creates a table, triggers continuously on the arrival of new files, and loads that data into Databricks? What would such code look like, or is there something else I should do? I need something that runs continuously. (Moving the logic from the Azure Function into Databricks is not an option.) Should I create an external or a managed table?&lt;/P&gt;&lt;P&gt;I also have a similar case with far less data, so partitioning is not required. Should I then create a managed table, an external table, or a view? What are the pros and cons of each in this case?&lt;/P&gt;&lt;P&gt;I would be very happy if someone could provide code, especially code that works in a continuous job in Databricks (through bundles).&lt;/P&gt;</description>
    <pubDate>Tue, 19 Mar 2024 09:34:45 GMT</pubDate>
    <dc:creator>pernilak</dc:creator>
    <dc:date>2024-03-19T09:34:45Z</dc:date>
    <item>
      <title>Best practices for working with external locations where many files arrive constantly</title>
      <link>https://community.databricks.com/t5/get-started-discussions/best-practices-for-working-with-external-locations-where-many/m-p/64055#M6835</link>
      <description>&lt;P&gt;I have an Azure Function that receives files (not volumes) and writes them to cloud storage. Roughly one to five files arrive per second. I want to create a partitioned table in Databricks to work with this data. How should I do this? For example, should I register the container as an external location and create a bundle that creates a table, triggers continuously on the arrival of new files, and loads that data into Databricks? What would such code look like, or is there something else I should do? I need something that runs continuously. (Moving the logic from the Azure Function into Databricks is not an option.) Should I create an external or a managed table?&lt;/P&gt;&lt;P&gt;I also have a similar case with far less data, so partitioning is not required. Should I then create a managed table, an external table, or a view? What are the pros and cons of each in this case?&lt;/P&gt;&lt;P&gt;I would be very happy if someone could provide code, especially code that works in a continuous job in Databricks (through bundles).&lt;/P&gt;</description>
      <pubDate>Tue, 19 Mar 2024 09:34:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/best-practices-for-working-with-external-locations-where-many/m-p/64055#M6835</guid>
      <dc:creator>pernilak</dc:creator>
      <dc:date>2024-03-19T09:34:45Z</dc:date>
    </item>
  </channel>
</rss>

