
DELTA LIVE TABLE - Parallel processing

JUMAN4422
New Contributor III

How can we process multiple tables in parallel within a Delta Live Table pipeline, passing the table names as parameters?

7 REPLIES

Alberto_Umana
Databricks Employee

To process multiple tables within a Delta Live Table (DLT) pipeline in parallel using table names as parameters, you can leverage the flexibility of the DLT Python API. Here's a step-by-step guide on how to achieve this:

 

1. Define the Tables Dynamically: use the @dlt.table decorator to define your tables. You can create a function that takes a table name as a parameter and dynamically generates the required table.
2. Use the dlt.read or spark.read.table Functions: these let you read from other tables within the same pipeline. Use the LIVE keyword to reference tables defined in the same pipeline (see the short sketch after this list).
3. Parallel Processing: while DLT manages the orchestration of tasks, you can define multiple tables in your pipeline, and DLT will handle their dependencies and execution order. Ensure that your tables are defined in a way that allows DLT to infer the dependencies correctly.
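
A minimal sketch of step 2 (names like source_database.table1 are placeholders): both dlt.read and spark.read.table with the LIVE keyword reference a table defined in the same pipeline, and DLT infers the dependency from that reference.

import dlt

@dlt.table
def bronze():
    # Placeholder source table; replace with your own
    return spark.read.table("source_database.table1")

@dlt.table
def silver():
    # Either form references the in-pipeline "bronze" table; DLT infers the dependency from it
    return dlt.read("bronze")  # equivalently: spark.read.table("LIVE.bronze")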

 

Here's an example of how you can define multiple tables dynamically:

 

import dlt

# Function to create a table definition for a given source table name
def create_table(table_name):
    @dlt.table(name=table_name)
    def table_def():
        return spark.read.table(f"source_database.{table_name}")
    return table_def

# List of table names to process
table_names = ["table1", "table2", "table3"]

# Create tables dynamically
for table_name in table_names:
    create_table(table_name)


If we use a for loop to pass the table names, will they be handled one by one?
If yes, can you suggest another method? I need to process 'n' tables at a time.

JUMAN4422
New Contributor III

Can we run a DLT pipeline multiple times at the same time, with different parameters, using REST API calls with asyncio?

I have created a function to start the pipeline using the REST API.
When calling the function with asyncio, I get a [409 Conflict] error.
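
For context, a pipeline allows only one active update at a time, so concurrent start-update calls against the same pipeline typically return 409 Conflict; concurrency only helps across different pipelines. A minimal sketch of the asyncio call pattern, assuming the standard POST /api/2.0/pipelines/{pipeline_id}/updates endpoint (the host, token, and pipeline IDs below are placeholders):

# Sketch only: start several pipeline updates concurrently with asyncio.
# Assumes the standard start-update endpoint POST /api/2.0/pipelines/{pipeline_id}/updates;
# the host, token, and pipeline IDs are placeholders.
import asyncio
import requests

HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"

def start_update(pipeline_id: str, full_refresh: bool = False) -> dict:
    resp = requests.post(
        f"{HOST}/api/2.0/pipelines/{pipeline_id}/updates",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"full_refresh": full_refresh},
    )
    resp.raise_for_status()  # a 409 here usually means an update is already active on that pipeline
    return resp.json()

async def main(pipeline_ids):
    # requests is blocking, so run each call in a worker thread
    results = await asyncio.gather(
        *[asyncio.to_thread(start_update, pid) for pid in pipeline_ids],
        return_exceptions=True,
    )
    for pid, result in zip(pipeline_ids, results):
        print(pid, result)

# asyncio.run(main(["<pipeline-id-1>", "<pipeline-id-2>"]))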



ChantellevdWalt
New Contributor II

@Alberto_Umana where you're ingesting the list table_names = ["table1", "table2", "table3"], can I replace it with the row values from a DLT view?
When I've tried using a @dlt.view, I run into an error saying I need to iterate within the confines of a DLT structure, and if I use the rows from a @dlt.table I run into a "table not found" error, which I think is a limitation of how DLT sets up the DAG/relationships before actual processing?
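
One workaround, sketched here rather than confirmed in this thread: the list of names has to exist before DLT builds the graph, so read it from a regular catalog table with spark.read.table(...).collect() at definition time instead of from a @dlt.view or @dlt.table. The config_db.table_list_config table and its table_name column below are hypothetical.

# Sketch: drive table creation from a metadata table that lives outside the pipeline.
# "config_db.table_list_config" and its "table_name" column are hypothetical names.
import dlt

table_names = [
    row["table_name"]
    for row in spark.read.table("config_db.table_list_config").select("table_name").collect()
]

def define_table(name: str):
    @dlt.table(name=name)
    def _t():
        return spark.read.table(f"source_database.{name}")
    return _t

for name in table_names:
    define_table(name)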

swatkat
Visitor

@JUMAN4422, if you have found any solution to this, please post it.

JUMAN4422
New Contributor III

 

You can use a for loop; the tables will be processed in parallel based on the cluster size.
Define the DLT logic in a function:

def dlt_logic(table_name):
    ...

then pass your table names in a list to the function:

table_names = ["table1", "table2", "table3"]

for table_name in table_names:
    dlt_logic(table_name)

 

iyashk-DB
Databricks Employee

DLT analyzes your code to build a dependency graph (DAG) and schedules independent flows concurrently up to the available compute; you don't have to orchestrate parallelism yourself if flows don't depend on each other.

Parameterise a list of table names and generate per-table flows (Python)

Use a pipeline configuration parameter (for example, table_list) and read it from your notebook. Then, create DLT tables in a loop using a small function factory so each table gets its own definition, which DLT will parallelize when they're independent.

# Python (DLT)
import dlt
from pyspark.sql.functions import *

# 1) Read list of tables from pipeline parameter "table_list", e.g., "customers,orders,products"
tables = [t.strip() for t in spark.conf.get("table_list").split(",")]

# 2) Use a function factory to avoid late-binding issues in loops
def define_bronze(name: str):
    @dlt.table(name=f"{name}_bronze", comment=f"Bronze ingestion for {name}")
    def _bronze():
        # Example: Auto Loader per-table path; adapt format/path/options to your sources
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("inferSchema", True)
            .load(f"/mnt/data/{name}")  # e.g., one folder per table name
        )
    return _bronze

def define_silver(name: str):
    @dlt.table(name=f"{name}_silver", comment=f"Silver cleansing for {name}")
    def _silver():
        # Example transformation; replace with your logic
        return dlt.read_stream(f"{name}_bronze").select("*")
    return _silver

# 3) Instantiate a bronze+silver flow for each table name
for n in tables:
    define_bronze(n)
    define_silver(n)

Because DLT evaluates decorators lazily, you must create datasets inside separate functions when looping; otherwise, you'll accidentally capture the last loop variable value for all tables.
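
To illustrate that last point with plain Python (general language behavior, not DLT-specific): a closure created in a loop captures the loop variable itself, not its value at that iteration, so without a function factory every table definition would see the final name.

# Plain-Python illustration of the loop/closure pitfall described above
funcs = []
for name in ["table1", "table2", "table3"]:
    funcs.append(lambda: name)           # captures the variable, not its current value
print([f() for f in funcs])              # ['table3', 'table3', 'table3']

# Binding the value through a function parameter (as define_bronze/define_silver do) fixes it
def make_func(n):
    return lambda: n

funcs = [make_func(name) for name in ["table1", "table2", "table3"]]
print([f() for f in funcs])              # ['table1', 'table2', 'table3']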