Load multiple Delta tables at once from SQL Server

Data4 — Thu, 29 Jun 2023 20:33:03 GMT

What’s the best way to efficiently move multiple SQL Server tables into Delta tables in parallel?

Re: Load multiple Delta tables at once from SQL Server

Tharun-Kumar — Tue, 18 Jul 2023 11:50:39 GMT

To read and write several tables in parallel, you can use Python's ThreadPool. The process has three parts: define the list of tables to read, write a method that reads one table from the JDBC source and saves it in Delta format, and run that method across the list with a ThreadPool.

Prepare the List of Tables for Reading: Before proceeding with parallel read and write operations, you must first assemble a list of tables to be read. The list should be defined as follows:

 iterationList = ["table1", "table2", "table3"]

Next, implement a method named "transformation" that reads one table from the JDBC source and saves it in Delta format. The method can be structured as follows:

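def transformation(table):
    print(f"Thread for: {table} starts")
    # jdbc_url is assumed to be defined earlier and to hold the SQL Server connection string
    df = spark.read.format("jdbc").option("url", jdbc_url).option("dbtable", table).load()
    df.write.format("delta").saveAsTable(table)
    print(f"Thread for: {table} completed")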
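
A JDBC read needs at least a connection URL and a table name. A minimal sketch of how those options might be assembled for SQL Server follows. The helper name build_jdbc_options and every connection value are hypothetical and not from the original post.

```python
def build_jdbc_options(table, host, database, user, password, port=1433):
    """Return the option dict for reading one SQL Server table over JDBC.

    All argument values are placeholders supplied by the caller; this helper
    is hypothetical and not part of the original post.
    """
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Example with made-up connection values
options = build_jdbc_options("dbo.table1", "myserver.example.com", "sales", "reader", "secret")

# On a cluster with an active SparkSession named `spark`, the dict could be used as:
# spark.read.format("jdbc").options(**options).load()
```

In practice the user name and password would more likely come from a secret store than from literals in the notebook.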

To achieve parallel processing, the ThreadPool with an appropriate number of threads can be initialized. In this example, we will create a ThreadPool with 3 threads. The "map" function of the ThreadPool is then employed to execute the "transformation" method in parallel for each table in the "iterationList".

from multiprocessing.pool import ThreadPool

# Initialize ThreadPool with 3 threads 
pool = ThreadPool(3) 

# Execute the transformation method in parallel for each table in iterationList 
pool.map(transformation, iterationList)
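
One caveat with pool.map is that it raises if any call raises, so the outcome for the other tables is lost. A minimal sketch of a wrapper that records a status per table instead is shown here. The names run_all and safe_call are hypothetical, and the stub loader stands in for the Spark-based transformation so that the example runs without a cluster.

```python
from multiprocessing.pool import ThreadPool


def run_all(func, tables, threads=3):
    """Run func(table) for every table in parallel and report per-table status."""

    def safe_call(table):
        # Catch the exception so one failing table does not abort the batch
        try:
            func(table)
            return table, "ok", None
        except Exception as exc:
            return table, "failed", str(exc)

    # The context manager shuts the pool down once map has returned
    with ThreadPool(threads) as pool:
        return pool.map(safe_call, tables)


# Stub standing in for the Spark-based transformation (assumption for illustration)
def fake_transformation(table):
    if table == "table2":
        raise ValueError("simulated load failure")


results = run_all(fake_transformation, ["table1", "table2", "table3"])
for table, status, error in results:
    print(table, status, error)
```

In the notebook, passing transformation and iterationList in place of the stub should give one status row per table, which makes it easier to rerun only the tables that failed.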
