
Pipelines using the dlt module with Unity Catalog

rt-slowth
Contributor

[Situation]
I am using AWS DMS to land MySQL CDC data in S3 as Parquet files.
I have implemented a streaming pipeline using the DLT module.
The target destination is Unity Catalog.


[Questions and issues]
- Where are the tables and materialized views specified in Unity Catalog stored: in DBFS or in the metastore?
- Can I delete a Parquet file from S3 once it has been read with readStream?
- Is there a way to save a DataFrame built with join and window operations on a table read with dlt.read from a streaming Delta Live Table as a Table instead of a Materialized View?
- The output of the @dlt.table decorator seems to be created as a Materialized View, but is it possible to change it to a Table?

 

 

 

You can answer them one by one.

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @rt-slowth

  • Where are the tables and materialized views specified in Unity Catalog stored, in DBFS or metastore?

Tables and materialized views registered in Unity Catalog are not stored in DBFS. The metastore stores their metadata, and the table data itself lives in cloud object storage: managed tables are written to the managed storage location configured for the metastore (or for the catalog/schema), while external tables reference a path you control. You can use the CREATE EXTERNAL LOCATION statement to register a reference to a storage path, such as an S3 bucket, together with a storage credential, and then create external tables backed by that location. For more information, see Create External Locations and Create External Tables.
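As a sketch (the bucket, credential, and table names below are illustrative, not from this thread), registering an S3 path and creating an external table backed by it looks like:

```sql
-- Register an S3 path in Unity Catalog (names are placeholders).
CREATE EXTERNAL LOCATION IF NOT EXISTS dms_landing
  URL 's3://my-dms-bucket/cdc'
  WITH (STORAGE CREDENTIAL my_storage_credential);

-- An external table whose data stays at that S3 path;
-- only its metadata lives in the metastore.
CREATE TABLE main.bronze.orders_raw
  USING DELTA
  LOCATION 's3://my-dms-bucket/cdc/orders';
```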

 

  • Can I delete a parquet that has been readStreamed even once in S3?

Yes, with care. Structured Streaming records each processed file in the stream's checkpoint, so deleting source files that have already been ingested will not cause them to be reprocessed or to fail the stream. There is, however, no delete() method on the DataFrame returned by spark.readStream; you either delete the files outside Spark (for example with an S3 lifecycle rule) or let the file source clean up for you via its cleanSource option, which can archive or delete input files after each micro-batch is committed.

 

For more information, see Delete data from Delta Lake tables.
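As a sketch of the cleanSource option on a plain Structured Streaming file source (bucket paths, schema, and table names are placeholders, not from this thread; DLT pipelines manage their sources themselves, so treat this as an illustration of the Spark option rather than DLT code):

```python
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Placeholder schema for the DMS CDC files -- substitute your own columns.
cdc_schema = StructType([
    StructField("op", StringType()),
    StructField("order_id", StringType()),
    StructField("dms_timestamp", TimestampType()),
])

# Streaming read where Spark itself removes input files after the
# micro-batch that consumed them has been committed.
df = (
    spark.readStream
    .format("parquet")
    .schema(cdc_schema)                  # file sources require an explicit schema
    .option("cleanSource", "delete")     # or "archive" together with "sourceArchiveDir"
    .load("s3://my-dms-bucket/cdc/orders/")
)

(
    df.writeStream
    .option("checkpointLocation", "s3://my-bucket/checkpoints/orders/")
    .toTable("main.bronze.orders_raw")
)
```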

 

  • Is there a way to save a DataFrame with Join and Window operations on a table read with dlt.read from a streaming Delta Live Table as a Table instead of a Materialized View?

Yes. With the @dlt.table decorator, whether the output is a streaming table or a materialized view is determined by how the query reads its sources, not by a separate decorator or option.

 

If the function reads its source as a stream (dlt.read_stream(...) or spark.readStream), the result is a streaming table; if it reads with dlt.read(...) (a batch read), the result is a materialized view. Note that some operations, such as certain joins and windows, are restricted on streaming inputs and may force a batch read, and therefore a materialized view.

 

For more information, see Transform data with Delta Live Tables.
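A minimal sketch of the difference (table and column names are illustrative, not from this thread):

```python
import dlt

@dlt.table  # batch reads -> registered as a materialized view in Unity Catalog
def orders_enriched_mv():
    orders = dlt.read("orders_scd1")          # batch read
    users = dlt.read("users_scd1")
    return orders.join(users, "user_id")

@dlt.table  # streaming read -> registered as a streaming table
def orders_enriched_st():
    orders = dlt.read_stream("orders_scd1")   # streaming read
    users = dlt.read("users_scd1")            # static side of a stream-static join
    return orders.join(users, "user_id")
```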

 

  • The output of the @Dlt.table decorator seems to be created as a Materialized View, but is it possible to change it to a Table?

Yes, but not by changing the decorator alone: @dlt.table creates a materialized view when its query performs a batch read, and a streaming table when it performs a streaming read.

 

Rather than switching to the @dlt.view decorator (which produces a view, not a table), change the function body to read its source with dlt.read_stream(...) so that the output is created as a streaming table.

 

For more information, see Transform data with Delta Live Tables.

 

I hope this answers your questions. If you have any other questions about AWS DMS, Delta Live Tables, or Unity Catalog, please feel free to ask me. 😊


3 REPLIES


Here is the Delta Live Tables pipeline I'm creating:

1. Ingest the CDC data from the source using AWS DMS.
2. After importing dlt, create a streaming table with dlt.create_streaming_table (SCD type 1).
3. Read the streaming table with dlt.read and perform operations such as joins.
4. Save the result of step 3 with the @dlt.table decorator.

Even though I specify the @dlt.table decorator in step 4, the output is saved as a materialized view.
Currently, I am using Unity Catalog.
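The steps above can be sketched as the following pipeline source (source and table names are placeholders, not from this thread):

```python
import dlt
from pyspark.sql import functions as F

# Step 2: declare the SCD type 1 target and apply the DMS CDC rows into it.
dlt.create_streaming_table("orders_scd1")

dlt.apply_changes(
    target="orders_scd1",
    source="orders_cdc_raw",          # hypothetical bronze table fed by the DMS Parquet files
    keys=["order_id"],
    sequence_by=F.col("dms_timestamp"),
    stored_as_scd_type=1,
)

# Steps 3-4: dlt.read is a batch read, so this output is registered as a
# materialized view -- matching the behaviour described above.
@dlt.table
def orders_joined():
    return dlt.read("orders_scd1").join(dlt.read("users_scd1"), "user_id")
```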

Kaniz
Community Manager

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
 
