Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Pipelines using dlt modules from the Unity Catalog

rt-slowth
Contributor

[Situation]
I am using AWS DMS to land MySQL CDC data in S3 as Parquet files.
I have implemented a streaming pipeline using the DLT module.
The target destination is Unity Catalog.
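The ingestion side of this setup could be sketched roughly as below. This is a minimal, illustrative sketch only: the bucket path, table name, and comment are assumptions (not from the post), and it uses Auto Loader (`cloudFiles`) to pick up the Parquet files that DMS lands in S3. It runs only inside a Databricks DLT pipeline, where `spark` is provided.

```python
import dlt  # available only inside a Databricks DLT pipeline

@dlt.table(
    name="dms_cdc_raw",  # assumed name for the raw CDC feed
    comment="Raw CDC Parquet files landed in S3 by AWS DMS",
)
def dms_cdc_raw():
    return (
        spark.readStream.format("cloudFiles")      # Auto Loader
        .option("cloudFiles.format", "parquet")
        .load("s3://my-dms-bucket/mysql/orders/")  # assumed DMS landing path
    )
```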


[Questions and issues]
- Where are the tables and materialized views defined in Unity Catalog physically stored: in DBFS or in the metastore?
- Can I safely delete a Parquet file from S3 once it has been read at least once by readStream?
- Is there a way to save a DataFrame built with join and window operations on a table read with dlt.read from a streaming Delta Live Table as a table instead of a materialized view?
- The output of the @dlt.table decorator seems to be created as a materialized view; is it possible to change it to a table?

Feel free to answer them one by one.

1 REPLY

Here is the Delta Live Tables pipeline I am building:

1. Ingest the CDC and source data from AWS DMS.
2. After importing dlt, create a streaming table with dlt.create_streaming_table (applying changes with SCD type 1).
3. Read the streaming table with dlt.read and perform operations such as joins.
4. Save the result of step 3 with the @dlt.table decorator.

Even though I apply the @dlt.table decorator in step 4, the output is being saved as a materialized view.
I am currently using Unity Catalog.
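Steps 2-4 above could be sketched as follows. All table names, the key column, and the sequencing column are illustrative assumptions, not taken from the post. Note the comment on the last function: as I understand DLT's behavior, a @dlt.table whose body uses the batch-style dlt.read is materialized as a materialized view under Unity Catalog, which would explain the observation in step 4.

```python
import dlt  # available only inside a Databricks DLT pipeline
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Step 2: streaming target kept current with SCD type 1 semantics.
dlt.create_streaming_table("orders")
dlt.apply_changes(
    target="orders",
    source="dms_cdc_raw",                 # assumed raw CDC feed name
    keys=["order_id"],                    # assumed primary key
    sequence_by=F.col("dms_timestamp"),   # assumed ordering column from DMS
    stored_as_scd_type=1,
)

# Steps 3-4: join + window on dlt.read output. Because dlt.read is a
# batch read, this object is created as a materialized view, which
# matches the behavior described in the post.
@dlt.table(name="orders_enriched")       # assumed output name
def orders_enriched():
    orders = dlt.read("orders")
    customers = dlt.read("customers")    # assumed second source table
    w = Window.partitionBy("customer_id").orderBy("dms_timestamp")
    return (
        orders.join(customers, "customer_id")
        .withColumn("row_num", F.row_number().over(w))
    )
```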
