
External tables in DLT pipelines

HoussemBL
New Contributor III

Hello community,

I have implemented a DLT pipeline.
In the "Destination" setting of the pipeline, I specified a Unity Catalog catalog with a target schema of type external that refers to an S3 location.
The pipeline works well. However, I noticed that all streaming tables and materialized views it generates are stored in non-readable locations.
Is it possible, in the DLT pipeline code, to specify the S3 path of a table using `@dlt.create_streaming_table`?

3 REPLIES

Alberto_Umana
Databricks Employee

Hello @HoussemBL,

You can use the code example below:

import dlt

@dlt.create_streaming_table(
    name="your_table_name",
    path="s3://your-bucket/your-path/",
    schema="schema-definition"
)
def your_table_function():
    return (
        spark.readStream
            .format("your_format")
            .option("your_option_key", "your_option_value")
            .load("your_source_path")
    )

When using Unity Catalog with DLT pipelines, tables are stored in the storage location specified for the target schema. If the schema has no storage location, tables are stored in the catalog's storage location, and if neither is specified, they are stored in the metastore's root storage location. This could be why your tables end up in non-readable locations: the storage paths are not explicitly defined.
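If the goal is for the published tables to land under a predictable S3 prefix, one option is to give the target schema an explicit managed location and point the pipeline's destination at that schema. A minimal sketch, assuming an external location covering this prefix already exists in Unity Catalog; the catalog, schema, and bucket names below are placeholders:

# One-time setup, run outside the DLT pipeline (placeholders for your names).
# Tables published by the pipeline into this schema are then written under
# the managed location below.
spark.sql("""
    CREATE SCHEMA IF NOT EXISTS your_catalog.your_schema
    MANAGED LOCATION 's3://your-bucket/your-prefix/'
""")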

Hello @Alberto_Umana 

Thanks for your reply.
I have followed your proposal. However, I get the following error when launching the DLT pipeline with Unity Catalog:

java.lang.IllegalArgumentException: Cannot specify an explicit path for a table when using Unity Catalog. Remove the explicit path:...

Sushil_saini
Visitor

This won't work. The best approach is to create a DLT sink that writes to an external Delta table. The pipeline should be just one step: read the source table and append to the sink with an append flow. It works fine; a rough sketch follows.
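A minimal sketch of that approach, assuming the DLT sink API (`dlt.create_sink`) is available in your workspace; the sink name, S3 path, flow name, and source table are placeholders:

import dlt

# Define a sink that writes to an external Delta table at an explicit S3 path
# (placeholder path and sink name).
dlt.create_sink(
    name="external_delta_sink",
    format="delta",
    options={"path": "s3://your-bucket/your-path/"}
)

# Append-only flow: read the source streaming table and write it to the sink.
@dlt.append_flow(name="write_to_external_delta", target="external_delta_sink")
def write_to_external_delta():
    return spark.readStream.table("your_catalog.your_schema.your_source_table")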