<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Recommendations for loading table from two different folder paths using Autoloader and DLT in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/recommendations-for-loading-table-from-two-different-folder/m-p/33697#M24637</link>
    <description>&lt;P&gt;Kaniz, thank you for the response. Perhaps this can help; I need to do more reading on ThreadPoolExecutor for Spark. The other "minor" issue I did not mention is that the files in each folder have a few mutually exclusive metadata columns that I either exclude/omit or synthesize with a "withColumn". The scenario I'm trying to accommodate is the D365 Export to Data Lake, which seems like it should be straightforward but is not.&lt;/P&gt;</description>
    <pubDate>Wed, 07 Sep 2022 20:41:29 GMT</pubDate>
    <dc:creator>bblakey</dc:creator>
    <dc:date>2022-09-07T20:41:29Z</dc:date>
    <item>
      <title>Recommendations for loading table from two different folder paths using Autoloader and DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/recommendations-for-loading-table-from-two-different-folder/m-p/33695#M24635</link>
      <description>&lt;P&gt;I have a new (bronze) table that I want to write to: the initial table-load (refresh) CSV file is placed in folder a, and the incremental-change (insert/update/delete) CSV files are placed in folder b. I've written a notebook that can load one OR the other, but not both.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My intention is to load the table initially (folder a), then consume data changes (from folder b) as they arrive and apply_changes to the table loaded from folder a. So: one target table with two source folders.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What is the recommended approach here? What would be a good ingestion pattern for something like this?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Aug 2022 21:19:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/recommendations-for-loading-table-from-two-different-folder/m-p/33695#M24635</guid>
      <dc:creator>bblakey</dc:creator>
      <dc:date>2022-08-23T21:19:03Z</dc:date>
    </item>
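The "one target table, two source folders" question above can be sketched with Delta Live Tables' multi-flow pattern: declare a single streaming target and feed it from two append flows, each reading its own folder with Auto Loader, then apply the change feed with apply_changes. This is a minimal sketch, not necessarily the thread's final answer; the folder paths, table names, key column "id", and sequencing column "_commit_timestamp" are all placeholders.

```python
import dlt
from pyspark.sql.functions import lit

# Hypothetical landing paths; replace with the real folders a and b.
FULL_LOAD_PATH = "/mnt/landing/full"           # folder a: initial refresh
INCREMENTAL_PATH = "/mnt/landing/incremental"  # folder b: change files

# One raw streaming target fed by two Auto Loader flows.
dlt.create_streaming_table("bronze_raw")

@dlt.append_flow(target="bronze_raw")
def initial_load():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.inferColumnTypes", "true")
        .load(FULL_LOAD_PATH)
        # Synthesize a metadata column that exists only in folder b's
        # files, so both flows share one schema (illustrative name).
        .withColumn("change_type", lit(None).cast("string"))
    )

@dlt.append_flow(target="bronze_raw")
def incremental_load():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.inferColumnTypes", "true")
        .load(INCREMENTAL_PATH)
    )

# Apply inserts/updates/deletes from the combined feed to the bronze table.
dlt.create_streaming_table("bronze")
dlt.apply_changes(
    target="bronze",
    source="bronze_raw",
    keys=["id"],                     # placeholder business key
    sequence_by="_commit_timestamp", # placeholder ordering column
)
```

This fragment only runs inside a Databricks DLT pipeline (the dlt module and the implicit spark session are provided by that runtime), so it is a pipeline definition rather than a standalone script.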
    <item>
      <title>Re: Recommendations for loading table from two different folder paths using Autoloader and DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/recommendations-for-loading-table-from-two-different-folder/m-p/33697#M24637</link>
      <description>&lt;P&gt;Kaniz, thank you for the response.  Perhaps this can help, need to do more reading on ThreadPoolExecutor for Spark.  The other "minor" issue I did not mention is that the files in each folder have a few mutually-exclusive metadata columns that I either exclude/omit or synthesize by including with a "withColumn".  The scenario I'm trying to accommodate is the D365 Export to Data Lake which seems like it should be straight-forward but is not really.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Sep 2022 20:41:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/recommendations-for-loading-table-from-two-different-folder/m-p/33697#M24637</guid>
      <dc:creator>bblakey</dc:creator>
      <dc:date>2022-09-07T20:41:29Z</dc:date>
    </item>
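The schema-alignment step the reply describes (mutually exclusive metadata columns, synthesized with "withColumn") can be sketched framework-free: given each source's column list, compute which columns each side must add so both folders can share one target schema. The helper name and the column names below are illustrative only; in Spark the returned names would each be added as withColumn(name, lit(None)).

```python
def columns_to_synthesize(source_cols, other_cols):
    """Return the columns present in the other source but missing here.

    In PySpark each returned name would be added to this source's
    DataFrame as withColumn(name, lit(None)) before both sources
    are written to the shared target table.
    """
    return sorted(set(other_cols) - set(source_cols))

# Illustrative columns: folder a (full refresh) vs folder b (D365 change feed).
full_cols = ["id", "name", "amount"]
incr_cols = ["id", "name", "amount", "SinkModifiedOn", "IsDelete"]

print(columns_to_synthesize(full_cols, incr_cols))  # ['IsDelete', 'SinkModifiedOn']
print(columns_to_synthesize(incr_cols, full_cols))  # []
```

The reply's ThreadPoolExecutor idea addresses a different axis (running the two loads concurrently); the column alignment above is needed either way, since the two folders' files do not share an identical schema.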
  </channel>
</rss>

