Hi @suja
Use Databricks Workflows (Jobs) with Task Parallelism
Instead of using threads within a single notebook, leverage Databricks Jobs to define multiple tasks, each responsible for a table. Tasks can:
1. Run in parallel
2. Be modular and reusable
3. Be monitored and retried independently
Each task (or task group) would represent processing for one Hive table from Bronze → Silver → Gold, for example as sketched below.
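A minimal sketch of such a job using the Python databricks-sdk is shown here; the table names, notebook path, and cluster id are placeholders, and exact SDK field names can differ slightly between versions:

```python
# Sketch: one Databricks job with one task per Hive table, all running in parallel
# (tasks without depends_on run concurrently). Placeholders: notebook path, cluster id, tables.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

tables = ["customers", "orders", "payments"]  # one task per Hive table

w.jobs.create(
    name="bronze_silver_gold_pipeline",
    tasks=[
        jobs.Task(
            task_key=f"process_{t}",
            existing_cluster_id="<cluster-id>",          # or a job cluster
            notebook_task=jobs.NotebookTask(
                notebook_path="/Repos/etl/process_table",  # parameterized notebook
                base_parameters={"table_name": t},
            ),
        )
        for t in tables
    ],
)
```

Each task is then visible, retryable, and schedulable on its own in the Workflows UI.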
Avoid Using Threads for Spark Workloads
Using Python threads for Spark workloads is not recommended, because:
Spark is already distributed.
Threads don't provide real parallelism in Python (due to the GIL).
You lose visibility, fault tolerance, and scalability.
Use Databricks Workflows with parallel tasks, each processing one Hive table through Bronze → Silver → Gold and writing to the relational database. Avoid threading; instead, modularize the processing into parameterized notebooks or scripts, along the lines of the sketch below.
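A rough sketch of such a parameterized notebook follows; the schema names, JDBC connection details, secret scope, and transformation logic are placeholders you would replace with your own:

```python
# Sketch of a parameterized Databricks notebook that processes one table Bronze -> Silver -> Gold.
# The table name arrives as a job/task parameter; spark and dbutils are notebook globals.
table_name = dbutils.widgets.get("table_name")

# Bronze: read the raw Hive table as-is
bronze_df = spark.read.table(f"bronze.{table_name}")

# Silver: basic cleanup (placeholder logic: deduplicate, drop fully-null rows)
silver_df = bronze_df.dropDuplicates().dropna(how="all")
silver_df.write.mode("overwrite").saveAsTable(f"silver.{table_name}")

# Gold: business-level transformation (placeholder: pass-through)
gold_df = spark.read.table(f"silver.{table_name}")
gold_df.write.mode("overwrite").saveAsTable(f"gold.{table_name}")

# Push the Gold table to the relational database over JDBC (placeholder connection)
(gold_df.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<db>")
    .option("dbtable", table_name)
    .option("user", dbutils.secrets.get("etl-scope", "db_user"))
    .option("password", dbutils.secrets.get("etl-scope", "db_password"))
    .mode("overwrite")
    .save())
```

The same notebook is reused by every task; only the `table_name` parameter changes.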
Spark workloads scale better through job tasks than through threads.
LR