Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Exploring parallelism for multiple tables

suja
New Contributor

I am new to Databricks. The app we need to build reads from Hive tables, goes through bronze, silver, and gold layers, and stores the results in relational DB tables. There are multiple Hive tables with no dependencies between them. What is the best way to achieve parallelism? Should we use threads to process the multiple tables through each layer, run them as separate tasks in a job, or is there another approach? What would be the most efficient implementation? Thanks

1 REPLY

LRALVA
Honored Contributor

Hi @suja 

Use Databricks Workflows (Jobs) with Task Parallelism
Instead of using threads within a single notebook, leverage Databricks Jobs to define multiple tasks, each responsible for a table. Tasks can:
1. Run in parallel
2. Be modular and reusable
3. Be monitored and retried independently
Each task (or task group) would represent processing for one Hive table from Bronze → Silver → Gold.
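As a rough illustration of the one-chain-per-table layout described above, the sketch below builds a multi-task job definition in the shape used by the Jobs API 2.1 (`tasks`, `task_key`, `depends_on`, `notebook_task`). The table names, the notebook directory `/Workspace/etl`, the job name, and the `table_name` parameter are all hypothetical, and compute settings are left out, so treat it as a starting point rather than a complete job spec.

```python
import json

LAYERS = ("bronze", "silver", "gold")


def build_job_payload(tables, notebook_dir="/Workspace/etl"):
    """Build a job definition with one bronze -> silver -> gold chain per table.

    Chains for different tables share no dependencies, so the scheduler is
    free to run them in parallel. Paths and names here are placeholders.
    """
    tasks = []
    for table in tables:
        previous_key = None
        for layer in LAYERS:
            task_key = f"{table}_{layer}"
            task = {
                "task_key": task_key,
                "notebook_task": {
                    # One notebook per layer, parameterized by table name.
                    "notebook_path": f"{notebook_dir}/{layer}",
                    "base_parameters": {"table_name": table},
                },
            }
            # Within a table, each layer waits for the previous layer.
            if previous_key is not None:
                task["depends_on"] = [{"task_key": previous_key}]
            tasks.append(task)
            previous_key = task_key
    return {"name": "hive_to_rdbms_medallion", "tasks": tasks}


if __name__ == "__main__":
    payload = build_job_payload(["customers", "orders"])
    print(json.dumps(payload, indent=2))
```

The write to the relational database could be a fourth task at the end of each chain or part of the gold notebook; which one fits better depends on how you want retries to behave.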

Avoid Using Threads for Spark Workloads
Using Python threads for Spark workloads is not recommended, because:
Spark is already distributed.
Threads don't provide real parallelism for Python code (due to the GIL).
You lose visibility, fault tolerance, and scalability.

Use Databricks Workflows with parallel tasks, each processing one Hive table through Bronze → Silver → Gold and writing to the relational DB. Avoid threading and instead modularize processing via parameterized notebooks or scripts.
Spark jobs scale better via job tasks than via threads.
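To make the "parameterized scripts" idea concrete, here is a minimal sketch of a script that takes the table name as an argument, so the same file can back every task. The `--table` flag, the `bronze`/`silver`/`gold` schema names, and the `plan_steps` and `run` helpers are assumptions for illustration; the transformations are left as a placeholder, and the final write to the relational DB is not shown.

```python
import argparse

LAYERS = ("bronze", "silver", "gold")


def parse_args(argv=None):
    """Parse the table name passed in by the job task."""
    parser = argparse.ArgumentParser(
        description="Process one Hive table through the medallion layers"
    )
    parser.add_argument("--table", required=True,
                        help="Hive table name, e.g. sales_db.orders")
    return parser.parse_args(argv)


def plan_steps(hive_table):
    """Return (source, target) pairs: hive -> bronze -> silver -> gold."""
    short_name = hive_table.split(".")[-1]
    steps = []
    source = hive_table
    for layer in LAYERS:
        target = f"{layer}.{short_name}"
        steps.append((source, target))
        source = target
    return steps


def run(spark, hive_table):
    """Copy the table through each layer using an existing SparkSession."""
    for source, target in plan_steps(hive_table):
        df = spark.table(source)
        # Layer-specific cleaning or aggregation would go here.
        df.write.mode("overwrite").saveAsTable(target)


if __name__ == "__main__":
    # In a real job the arguments would come from the task's parameters;
    # a fixed list is used here so the sketch runs on its own.
    args = parse_args(["--table", "sales_db.orders"])
    for source, target in plan_steps(args.table):
        print(f"{source} -> {target}")
```

Keeping the planning logic separate from the Spark calls means the table-to-layer mapping can be checked without a cluster, while `run` is the only part that needs a live session.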

LR
