Friday
I'm trying to create a pipeline containing a view from a federated source. In this case, I'd like to just create materialized views from the federated source and schedule the pipeline for execution. If I define a pipeline with only something like the following:
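(roughly; the catalog, schema, and table names are placeholders)
import dlt

# Only views are defined -- no @dlt.table definitions anywhere in the pipeline
# (catalog, schema, and table names are placeholders)
@dlt.view
def users_view():
    return spark.read.table("some_catalog.public.users")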
I get the following error:
com.databricks.pipelines.execution.core.ExecutionFailedException: [DLT ERROR CODE: NO_TABLES_IN_PIPELINE] No tables are defined by the libraries of this pipeline. This error usually occurs when flows defined without defining the table they target, or when all top-level definitions are views.
Is it not possible to create a pipeline with just views or is there some other way I should be doing this?
Friday
Hi @aliacovella,
The error message you received indicates that no tables are defined by the libraries of the pipeline, which typically occurs when all top-level definitions are views.
In Delta Live Tables, a pipeline must include at least one table definition. Views alone are not sufficient to define a pipeline. This is because views in DLT are meant to be derived from tables, and the pipeline needs at least one table to anchor the transformations.
To resolve this issue, you can define a table in your pipeline alongside your views. Here is an example of how you can modify your pipeline to include a table definition:
import dlt
from pyspark.sql.functions import *
# Define a table
@dlt.table
def users_table():
return spark.read.table("some_catalog.public.users")
# Define a view based on the table
@dlt.view
def users_view():
return dlt.read("users_table")
In this example, users_table is defined as a table, and users_view is a view that references it. This ensures that your pipeline has at least one table definition, which should resolve the error you encountered.
yesterday
Thanks, that makes sense. So if I create a table against a federated data source, when changes occur in the source table, does it automatically handle the change data capture or does it perform a table scan on the federated source to determine changes and update the DLT?
yesterday
From what I can see, it looks like it queries all the records again. I tried looking into the apply_changes API, but that seems to require a streaming table, and it appears that streaming tables are not supported from JDBC sources; in this case the source happens to be a Postgres database. Is the only choice for supporting CDC to stream from something like Kinesis in this case?
yesterday
Hi @aliacovella,
That is correct:
CDC with Delta Live Tables: The apply_changes API is designed to simplify CDC with Delta Live Tables by processing changes from a change data feed (CDF). This API requires a streaming table to function.
Streaming Table Requirement: Since streaming tables are not supported from JDBC sources, you cannot directly use a Postgres database as a streaming source for the apply_changes API.
Alternative Streaming Sources: To implement CDC, you would need to use a streaming source that is supported by Databricks, such as Amazon Kinesis. This would involve setting up a Kinesis stream to capture changes from your Postgres database and then using this stream as the source for your Delta Live Tables pipeline; see the sketch after this list.
Configuration and Metrics: When using Kinesis, you can configure options such as maxFetchDuration and minFetchPeriod to optimize the streaming query's performance. Additionally, Kinesis provides metrics like avgMsBehindLatest, maxMsBehindLatest, and minMsBehindLatest to monitor the streaming query's progress.
yesterday
Informative.
2 hours ago
Thanks, I originally tried the Kinesis route, and it worked well but was hoping there was a simpler solution. I'll go with that then.