Best Practice in Reading Delta Files in Storage Bucket Created through Fivetran MDLS

wanglee — Mon, 26 Jan 2026 03:17:02 GMT

Hello, Databricks Community!

I'd like to ask for advice on how you approach in reading and maintaining bucket files in Delta file format created through Fivetran Managed Data Lake (MDLS) destination?

I've set-up a Fivetran Destination to a GCP Bucket where it stores the data this way. The tables only contain less than 50k rows.

Now, for me to read and add them to the pipeline that refreshes daily, I use this SQL syntax.

USE CATALOG example_catalog; USE CATALOG example_schema_1; CREATE OR REFRESH MATERIALIZED VIEW table_1 AS SELECT * FROM `delta`.`gs://example_bucket/landing/connector_1/table_1`;

I'm also aware that Fivetran has an option in their setup to update the Unity Catalog for a Databricks workspace upon syncing to the bucket. However, I think this wouldn't be applicable to me because I wanted to replicate this to 3 DBx workspaces (dev, staging, prod) via DAB and it wouldn't be cost efficient to create multiple Fivetran connectors to different Databricks workspaces. Therefore, I'm inclined into storing them in the bucket first and all 3 workspaces will reference the same bucket to read the data.

With all that said, I'd like to ask:

Is this a good practice? What improvements from this setup that can be done?
Is it possible for Databricks to create an external catalog that simply references the whole bucket? Right now, I'm creating multiple materialized views and schemas but it'll be time-consuming to set them up individually - What if I have a connector that has 50+ tables?
Would Auto Loader be better in this scenario?

Thanks in advance!

Re: Best Practice in Reading Delta Files in Storage Bucket Created through Fivetran MDLS

aleksandra_ch — Fri, 30 Jan 2026 11:38:47 GMT

Hi @wanglee ,

1. While your solution is valid, it might turn out to be not the most cost-efficient. While the tables are small (< 50k rows) and assuming they will not grow over time, more cost-effective solution is to define tables (and not materialized views) on the external location. Materialized views are good for materializing heavy aggregations on large amounts of data. However, for <50k rows it can be an overkill.

USE CATALOG example_catalog; USE CATALOG example_schema_1; CREATE OR REPLACE TABLE table_1 AS SELECT * FROM `delta`.`gs://example_bucket/landing/connector_1/table_1`;

2. Databricks won't automatically "scan" a bucket to create a catalog. However, you can automate this using a simple Python notebook. You can use dbutils.fs.ls() to list the folders in your landing zone and dynamically execute the CREATE TABLE statement for each table found. This handles the "50+ tables" problem in a few lines of code.

3. Auto loader would not be better here as the data volume is low (and assuming it wouldn't grow). Simply re-read the whole table at every run.

Hope this helps!

topic Re: Best Practice in Reading Delta Files in Storage Bucket Created through Fivetran MDLS in Data Engineering

Best Practice in Reading Delta Files in Storage Bucket Created through Fivetran MDLS

Re: Best Practice in Reading Delta Files in Storage Bucket Created through Fivetran MDLS