Data Engineering
Lakehouse federation bringing data from SQL Server

NathanSundarara
Contributor

Has anyone tried bringing data in with the newly announced Lakehouse Federation and ingesting it using DELTA LIVE TABLES? I'm currently testing with materialized views. I first loaded the full data, and I now load the last 3 days daily and recompute using materialized views. At the moment the materialized view does a full recompute. Since some of the records may already exist in the current materialized view, we use window functions to recompute and keep the last record based on timestamp. I tried DLT with Apply changes, but it throws an error because the data changed, so I'm looking for options.

 


Kaniz
Community Manager

Hi @NathanSundarara, Certainly! Let’s explore how you can work with Delta Live Tables (DLT) in the context of Lakehouse federation and materialized views.

 

Delta Live Tables (DLT):

Materialized Views and Recomputation:

  • Materialized views in DLT are powerful constructs for maintaining derived datasets: they keep a result up to date based on a declarative query.
  • In your case, you load the last 3 days of data daily and recompute the materialized view, but each refresh is falling back to a full recompute (a minimal sketch of this pattern follows below).
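
As a small illustration (dataset names here are hypothetical), a batch-defined @dlt.table like the one below is materialized as a view that DLT refreshes on every pipeline update; when DLT cannot determine that the refresh can be applied incrementally, it recomputes the whole result:

```python
import dlt

# "orders_staging" is a hypothetical upstream dataset in the same pipeline
# that holds the raw daily loads.
@dlt.table(
    name="orders_current",
    comment="Materialized view over the raw loads; refreshed on each pipeline update."
)
def orders_current():
    return dlt.read("orders_staging")
```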

Handling Existing Records:

  • When recomputing materialized views, DLT processes whatever records are required to return accurate results for the current data state.
  • To handle records that already exist in the view, use a window function to rank rows per key and keep only the latest record by timestamp, so duplicates from the overlapping 3-day loads are collapsed (sketched just below).
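
For instance, the keep-the-latest-row pattern can be factored into a small helper like this sketch, where the key column order_id and timestamp column updated_at are assumptions about your schema:

```python
from pyspark.sql import Window
from pyspark.sql.functions import col, row_number

def keep_latest(df, key_col="order_id", ts_col="updated_at"):
    # Rank rows within each key, newest timestamp first, and keep only the top row.
    w = Window.partitionBy(key_col).orderBy(col(ts_col).desc())
    return (
        df.withColumn("_rn", row_number().over(w))
          .filter(col("_rn") == 1)
          .drop("_rn")
    )
```

Applying keep_latest inside the materialized view's query yields one row per key, though the view itself may still be fully recomputed unless DLT can incrementalize the query.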

DLT and Apply Changes:

  • The “Apply changes” operation in DLT is designed for incremental updates, but changes to already-ingested data appear to be causing the errors you see.
  • If “Apply changes” keeps failing, consider the following alternatives:
    • Delta Merge: Use Delta’s built-in MERGE operation to upsert records by key instead of recomputing everything (see the sketch after this list).
    • Change Data Capture (CDC): Capture and process only the changed rows; in DLT, “Apply changes” is the native CDC mechanism, and it needs a reliable sequencing column to order the changes.
    • Custom Logic: Write custom logic to handle incremental updates based on your use case.
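
Here is a minimal sketch of the MERGE alternative, run from a regular notebook or job rather than inside a DLT pipeline. It assumes a Delta target table main.sales.orders_current keyed by order_id with an updated_at timestamp; all of these names are placeholders:

```python
from delta.tables import DeltaTable

def upsert_orders(spark, updates_df):
    # Placeholder target table name; replace with your own.
    target = DeltaTable.forName(spark, "main.sales.orders_current")
    (
        target.alias("t")
        .merge(updates_df.alias("s"), "t.order_id = s.order_id")
        # Only overwrite when the incoming row is newer than the stored one.
        .whenMatchedUpdateAll(condition="s.updated_at > t.updated_at")
        .whenNotMatchedInsertAll()
        .execute()
    )
```

This touches only the keys present in the incoming batch, which is usually cheaper than recomputing the full materialized view.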

Lakehouse Federation:

  • The newly announced Lakehouse Federation lets you query external data sources (such as SQL Server) from Databricks through a foreign catalog in Unity Catalog.
  • You can use Lakehouse Federation to ingest data from supported sources into DLT pipelines (example below).
  • Ensure that your DLT pipeline configuration has access to the foreign catalog that fronts the federated source.
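
As a sketch, a DLT table can read straight from the federated catalog; the catalog, schema, table, and column names below (sqlserver_catalog.dbo.orders, updated_at) are placeholders for your own connection:

```python
import dlt
from pyspark.sql.functions import col, current_date, date_sub

@dlt.table(
    name="orders_staging",
    comment="Last 3 days of orders read over Lakehouse Federation."
)
def orders_staging():
    return (
        # Foreign catalog table registered via Lakehouse Federation (placeholder name).
        spark.read.table("sqlserver_catalog.dbo.orders")
        .filter(col("updated_at") >= date_sub(current_date(), 3))
    )
```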

Iterate and Monitor:

  • As you experiment with different approaches, monitor execution times, resource utilization, and data quality.
  • Adjust your pipeline based on performance and reliability requirements.

Remember that DLT provides a robust abstraction layer, but fine-tuning your pipeline often involves a combination of declarative definitions and custom logic.

Feel free to iterate and adapt your solution based on your specific data requirements.

 
