<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Best Practice in Reading Delta Files in Storage Bucket Created through Fivetran MDLS in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-practice-in-reading-delta-files-in-storage-bucket-created/m-p/146027#M52601</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/197523"&gt;@wanglee&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;1. While your solution is valid, it might not turn out to be the most cost-efficient. Since the tables are small (&amp;lt; 50k rows), and assuming they will not grow over time, a more cost-effective solution is to define tables (rather than materialized views) on the external location. Materialized views are good for materializing heavy aggregations over large amounts of data; for &amp;lt;50k rows they can be overkill.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;USE CATALOG example_catalog;
USE SCHEMA example_schema_1;

CREATE OR REPLACE TABLE table_1 AS
SELECT * FROM `delta`.`gs://example_bucket/landing/connector_1/table_1`;&lt;/LI-CODE&gt;
&lt;P&gt;2. Databricks won't automatically "scan" a bucket to create a catalog. However, you can automate this with a simple Python notebook: use &lt;CODE&gt;dbutils.fs.ls()&lt;/CODE&gt; to list the folders in your landing zone and dynamically execute a &lt;CODE&gt;CREATE TABLE&lt;/CODE&gt; statement for each table found. This handles the "50+ tables" problem in a few lines of code.&lt;/P&gt;
&lt;P&gt;3. Auto Loader would not be better here, as the data volume is low (and assuming it won't grow). Simply re-read the whole table at every run.&lt;/P&gt;
&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
    <pubDate>Fri, 30 Jan 2026 11:38:47 GMT</pubDate>
    <dc:creator>aleksandra_ch</dc:creator>
    <dc:date>2026-01-30T11:38:47Z</dc:date>
    <item>
      <title>Best Practice in Reading Delta Files in Storage Bucket Created through Fivetran MDLS</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-in-reading-delta-files-in-storage-bucket-created/m-p/145193#M52472</link>
      <description>&lt;P&gt;Hello, Databricks Community!&lt;/P&gt;&lt;P&gt;I'd like to ask for advice on how you approach in reading and maintaining bucket files in Delta file format created through Fivetran Managed Data Lake (MDLS) destination?&lt;/P&gt;&lt;P&gt;I've set-up a Fivetran Destination to a GCP Bucket where it stores the data this way. The tables only contain less than 50k rows.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;example_bucket/landing/
+-- connector_1/
|   +-- table_1/
|   |   +-- _delta_log/
|   |   +-- data/
|   |   +-- metadata/
|   +-- table_2/
|   |   +-- _delta_log/
|   |   +-- data/
|   |   +-- metadata/
+-- connector_2/
|   +-- ...&lt;/LI-CODE&gt;&lt;P&gt;Now, to read them and add them to the pipeline that refreshes daily, I use this SQL syntax.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;USE CATALOG example_catalog;
USE SCHEMA example_schema_1;

CREATE OR REFRESH MATERIALIZED VIEW table_1 AS
SELECT * FROM `delta`.`gs://example_bucket/landing/connector_1/table_1`;&lt;/LI-CODE&gt;&lt;P&gt;I'm also aware that Fivetran has an option in its setup to update the Unity Catalog of a Databricks workspace upon syncing to the bucket. However, I don't think this applies to me, because I want to replicate this across 3 DBx workspaces (dev, staging, prod) via DAB, and it wouldn't be cost-efficient to create multiple Fivetran connectors to different Databricks workspaces. Therefore, I'm inclined to store the data in the bucket first and have all 3 workspaces reference the same bucket.&lt;/P&gt;&lt;P&gt;With all that said, I'd like to ask:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is this a good practice? What improvements could be made to this setup?&lt;/LI&gt;&lt;LI&gt;Is it possible for Databricks to create an external catalog that simply references the whole bucket? Right now I'm creating multiple materialized views and schemas, but it would be time-consuming to set them up individually. What if a connector has 50+ tables?&lt;/LI&gt;&lt;LI&gt;Would Auto Loader be better in this scenario?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;
      <pubDate>Mon, 26 Jan 2026 03:17:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-in-reading-delta-files-in-storage-bucket-created/m-p/145193#M52472</guid>
      <dc:creator>wanglee</dc:creator>
      <dc:date>2026-01-26T03:17:02Z</dc:date>
    </item>
    <item>
      <title>Re: Best Practice in Reading Delta Files in Storage Bucket Created through Fivetran MDLS</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practice-in-reading-delta-files-in-storage-bucket-created/m-p/146027#M52601</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/197523"&gt;@wanglee&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;1. While your solution is valid, it might not turn out to be the most cost-efficient. Since the tables are small (&amp;lt; 50k rows), and assuming they will not grow over time, a more cost-effective solution is to define tables (rather than materialized views) on the external location. Materialized views are good for materializing heavy aggregations over large amounts of data; for &amp;lt;50k rows they can be overkill.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;USE CATALOG example_catalog;
USE SCHEMA example_schema_1;

CREATE OR REPLACE TABLE table_1 AS
SELECT * FROM `delta`.`gs://example_bucket/landing/connector_1/table_1`;&lt;/LI-CODE&gt;
&lt;P&gt;2. Databricks won't automatically "scan" a bucket to create a catalog. However, you can automate this with a simple Python notebook: use &lt;CODE&gt;dbutils.fs.ls()&lt;/CODE&gt; to list the folders in your landing zone and dynamically execute a &lt;CODE&gt;CREATE TABLE&lt;/CODE&gt; statement for each table found. This handles the "50+ tables" problem in a few lines of code.&lt;/P&gt;
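&lt;P&gt;A minimal sketch of such a notebook (the helper name and landing-zone path are illustrative, taken from the folder layout above; &lt;CODE&gt;dbutils&lt;/CODE&gt; and &lt;CODE&gt;spark&lt;/CODE&gt; are available in any Databricks notebook):&lt;/P&gt;

```python
# Sketch: generate one CREATE TABLE statement per table folder in the landing zone.
# Assumed bucket layout: gs://example_bucket/landing/<connector>/<table>/
LANDING = "gs://example_bucket/landing"

def create_table_sql(connector: str, table: str, landing: str = LANDING) -> str:
    """Build a CTAS statement that snapshots one Delta folder into a table."""
    path = f"{landing}/{connector}/{table}"
    return (
        f"CREATE OR REPLACE TABLE {connector}.{table} AS "
        f"SELECT * FROM delta.`{path}`"
    )

# In the notebook itself, discover connectors/tables with dbutils.fs.ls():
#
# spark.sql("USE CATALOG example_catalog")
# for conn in dbutils.fs.ls(LANDING):
#     schema = conn.name.strip("/")
#     spark.sql(f"CREATE SCHEMA IF NOT EXISTS {schema}")
#     for tbl in dbutils.fs.ls(conn.path):
#         spark.sql(create_table_sql(schema, tbl.name.strip("/")))
```

&lt;P&gt;Schedule this notebook in the same daily job and it rebuilds every table it finds, so new Fivetran tables are picked up without any manual DDL.&lt;/P&gt;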
&lt;P&gt;3. Auto Loader would not be better here, as the data volume is low (and assuming it won't grow). Simply re-read the whole table at every run.&lt;/P&gt;
&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
      <pubDate>Fri, 30 Jan 2026 11:38:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practice-in-reading-delta-files-in-storage-bucket-created/m-p/146027#M52601</guid>
      <dc:creator>aleksandra_ch</dc:creator>
      <dc:date>2026-01-30T11:38:47Z</dc:date>
    </item>
  </channel>
</rss>

