Execute Pyspark cells concurrently
04-29-2024 09:36 AM
Hi Team,
Is it feasible to run PySpark cells concurrently in Databricks notebooks? If so, kindly provide instructions on how to accomplish this. We aim to execute the intermediate steps simultaneously.
The scenario involves running several PySpark cells at the same time, depending on a condition.
Regards,
Janga
01-22-2025 07:58 PM
Hi @Phani1,
Unfortunately, there isn't a way to run cells in a notebook simultaneously. Since your use case needs code to execute in parallel, you can instead configure a Databricks Workflow with multiple tasks running concurrently: https://learn.microsoft.com/en-us/azure/databricks/jobs/#what-is-a-task
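As a rough sketch of what such a job definition could look like, the snippet below builds a payload in which several notebook tasks have no dependencies on each other (so they are eligible to run concurrently) and one final task waits for all of them. The field names (`task_key`, `notebook_task`, `depends_on`) are assumed to follow the Jobs API 2.1 format; the helper name, job name, and notebook paths are hypothetical, and compute settings are omitted.

```python
import json


def build_job_payload(job_name, parallel_notebooks, final_notebook):
    """Build a job definition: independent notebook tasks plus one task that waits for them."""
    # Tasks without depends_on do not wait on each other.
    parallel_tasks = [
        {"task_key": f"step_{i}", "notebook_task": {"notebook_path": path}}
        for i, path in enumerate(parallel_notebooks)
    ]
    # The final task lists every parallel task as a dependency.
    final_task = {
        "task_key": "final",
        "depends_on": [{"task_key": t["task_key"]} for t in parallel_tasks],
        "notebook_task": {"notebook_path": final_notebook},
    }
    return {"name": job_name, "tasks": parallel_tasks + [final_task]}


# Hypothetical paths, for illustration only.
payload = build_job_payload(
    "intermediate-steps",
    ["/Workspace/demo/step_a", "/Workspace/demo/step_b"],
    "/Workspace/demo/combine",
)
print(json.dumps(payload, indent=2))
```

The same structure can be created through the Workflows UI by adding tasks and leaving the "Depends on" field empty for the ones that should run side by side.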
Best
01-31-2025 11:08 PM
Databricks also supports executing SQL cells in parallel. While a command is running and your notebook is attached to an interactive cluster, you can run a SQL cell simultaneously with the current command. The SQL cell is executed in a new, parallel session. However, this feature is limited to SQL cells and does not apply to PySpark cells.
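You can also use the dbutils.notebook.run command to run other notebooks from within a notebook. This command can be used to trigger multiple notebooks to run concurrently. Note that each call blocks until the child notebook finishes, so the calls have to be issued from separate threads to overlap. However, this approach is limited by the number of concurrent notebook runs allowed by your Databricks workspace.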
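A minimal sketch of that approach, assuming the intermediate steps have been moved into separate notebooks: a thread pool issues one `dbutils.notebook.run` call per notebook. The helper name `run_notebooks_concurrently`, the notebook paths, and the default timeout and worker count are hypothetical; `dbutils` is only defined inside a Databricks notebook, so the runner is passed in as a parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def run_notebooks_concurrently(run_notebook, paths, timeout_seconds=3600, max_workers=4):
    """Call run_notebook(path, timeout_seconds) for each path on a thread pool.

    Results are returned in the same order as `paths`. If any run raises,
    the exception is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_notebook, path, timeout_seconds) for path in paths]
        return [future.result() for future in futures]


# Inside a Databricks notebook (hypothetical paths), applying a condition first:
# steps = {"./step_a": True, "./step_b": False, "./step_c": True}
# selected = [path for path, enabled in steps.items() if enabled]
# results = run_notebooks_concurrently(dbutils.notebook.run, selected)
```

Each result is whatever the child notebook passes to `dbutils.notebook.exit`. Keeping `max_workers` small helps stay within the workspace's concurrent-run limit.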
