
DELTA LIVE TABLES - Parallel processing

JUMAN4422
New Contributor II

How can we process multiple tables in parallel within a Delta Live Tables pipeline, passing the table names as parameters?

3 REPLIES

Alberto_Umana
Databricks Employee

To process multiple tables in parallel within a Delta Live Tables (DLT) pipeline, using table names as parameters, you can leverage the flexibility of the DLT Python API. Here's a step-by-step guide on how to achieve this:

 

1. Define the Tables Dynamically: Use the @dlt.table decorator to define your tables. You can create a function that takes a table name as a parameter and dynamically generates the required tables.
2. Use the dlt.read or spark.read.table Functions: These functions allow you to read from other tables within the same pipeline. Use the LIVE keyword to reference tables defined in the same pipeline (see the sketch after this list).
3. Parallel Processing: DLT manages the orchestration of tasks, so you can define multiple tables in your pipeline and DLT will handle their dependencies and execution order, running independent tables in parallel. Ensure that your tables are defined in a way that allows DLT to infer the dependencies correctly.
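
For step 2, here is a minimal sketch showing both read styles (the table names raw_orders, cleaned_orders, and daily_totals are illustrative; raw_orders is assumed to be defined elsewhere in the same pipeline):

import dlt

@dlt.table
def cleaned_orders():
    # dlt.read resolves a table defined in this same pipeline
    return dlt.read("raw_orders").where("order_id IS NOT NULL")

@dlt.table
def daily_totals():
    # spark.read.table with the LIVE keyword does the same job;
    # DLT infers the cleaned_orders -> daily_totals dependency from this read
    return spark.read.table("LIVE.cleaned_orders").groupBy("order_date").count()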

 

Here's an example of how you can define multiple tables dynamically:

 

import dlt

# Factory that registers one DLT table definition for a given source table
def create_table(table_name):
    @dlt.table(name=table_name)
    def table_def():
        return spark.read.table(f"source_database.{table_name}")

# List of table names to process
table_names = ["table1", "table2", "table3"]

# Register the tables. The loop only builds the pipeline graph;
# DLT then runs tables with no mutual dependencies in parallel.
for table_name in table_names:
    create_table(table_name)
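
On the "table names as parameters" part of the question: instead of hard-coding the list, the names can be supplied through the pipeline's configuration (key-value pairs in the pipeline settings) and read with spark.conf.get. A minimal sketch, assuming a hypothetical configuration key my_pipeline.table_names holding a comma-separated list:

# Hypothetical key, set in the DLT pipeline settings under Configuration:
#   my_pipeline.table_names = table1,table2,table3
table_names = spark.conf.get("my_pipeline.table_names").split(",")

# Reuse the create_table factory from the example above
for table_name in table_names:
    create_table(table_name)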

JUMAN4422
New Contributor II

If we use a for loop to pass the table names, will they be handled one by one?
If so, can you suggest other methods? I need to process 'n' tables at a time.

JUMAN4422
New Contributor II

Can we run a DLT pipeline multiple times at the same time, with different parameters, using REST API calls with asyncio?

I have created a function that starts the pipeline using the REST API.
When calling the function with asyncio, I am getting a [409 Conflict] error.
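
For reference, a minimal sketch of that kind of call using httpx, where DATABRICKS_HOST, TOKEN, and the pipeline IDs are placeholders and the endpoint is the standard pipelines update API. A 409 Conflict typically means the targeted pipeline already has an active update; a DLT pipeline only runs one update at a time, so concurrent starts with asyncio only succeed across different pipelines:

import asyncio
import httpx

DATABRICKS_HOST = "https://<workspace-url>"   # placeholder
TOKEN = "<personal-access-token>"             # placeholder

async def start_pipeline(client: httpx.AsyncClient, pipeline_id: str):
    # POST /api/2.0/pipelines/{pipeline_id}/updates starts one pipeline update
    resp = await client.post(
        f"{DATABRICKS_HOST}/api/2.0/pipelines/{pipeline_id}/updates",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"full_refresh": False},
    )
    resp.raise_for_status()  # a second update on the same pipeline raises 409 here
    return resp.json()

async def main():
    pipeline_ids = ["<id-1>", "<id-2>"]  # placeholders: must be distinct pipelines
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(
            *(start_pipeline(client, pid) for pid in pipeline_ids)
        )
    print(results)

asyncio.run(main())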


