
External tables in DLT pipelines

HoussemBL
New Contributor III

Hello community,

I have implemented a DLT pipeline.
In the "Destination" setting of the pipeline, I specified a Unity Catalog catalog with a target schema of type external that refers to an S3 location.
The pipeline works well. However, I noticed that all streaming tables and materialized views it generates are stored in non-readable locations.
Is it possible, in the DLT pipeline code, to specify the S3 path of a table using `@dlt.create_streaming_table`?

3 REPLIES

Alberto_Umana
Databricks Employee

Hello @HoussemBL,

You can use the code example below:

import dlt

@dlt.create_streaming_table(
    name="your_table_name",
    path="s3://your-bucket/your-path/",
    schema="schema-definition"
)
def your_table_function():
    return (
        spark.readStream
            .format("your_format")
            .option("your_option_key", "your_option_value")
            .load("your_source_path")
    )

When using Unity Catalog with DLT pipelines, tables are stored in the storage location specified for the target schema. If the schema has no storage location, tables are stored in the catalog's storage location, and if neither is specified, they are stored in the metastore's root storage location. This could be why your tables end up in non-readable locations: the storage paths are not explicitly defined.
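If the goal is for the published tables to land under a predictable S3 prefix, one option is to give the target schema an explicit managed location and point the pipeline's destination at that schema. A minimal sketch, assuming an external location covering this prefix already exists in Unity Catalog; the catalog, schema, and bucket names below are placeholders:

# One-time setup, run outside the DLT pipeline (placeholders for your names).
# Tables published by the pipeline into this schema are then written under
# the managed location below.
spark.sql("""
    CREATE SCHEMA IF NOT EXISTS your_catalog.your_schema
    MANAGED LOCATION 's3://your-bucket/your-prefix/'
""")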

Hello @Alberto_Umana 

Thanks for your reply.
I have followed your proposal. However, I get the following error when launching the DLT pipeline with Unity Catalog:

java.lang.IllegalArgumentException: Cannot specify an explicit path for a table when using Unity Catalog. Remove the explicit path:...

Sushil_saini
Visitor

This won't work. The best approach is to create a DLT sink that writes to an external Delta table. The pipeline should be just one step: read the source table and append to the sink with an append flow. It works fine; a rough sketch follows.
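A minimal sketch of that approach, assuming the DLT sink API (`dlt.create_sink`) is available in your workspace; the sink name, S3 path, flow name, and source table are placeholders:

import dlt

# Define a sink that writes to an external Delta table at an explicit S3 path
# (placeholder path and sink name).
dlt.create_sink(
    name="external_delta_sink",
    format="delta",
    options={"path": "s3://your-bucket/your-path/"}
)

# Append-only flow: read the source streaming table and write it to the sink.
@dlt.append_flow(name="write_to_external_delta", target="external_delta_sink")
def write_to_external_delta():
    return spark.readStream.table("your_catalog.your_schema.your_source_table")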