Topics with Label: Parallel processing

Forum Posts

by vanepet • New Contributor II

12-10-2022 7:31:36 AM

18760 Views
5 replies
2 kudos

Is it possible to use multiprocessing or threads to submit multiple queries to a database from Databricks in parallel?

We are trying to improve our overall runtime by running queries in parallel using either multiprocessing or threads. What I am seeing, though, is that when the function that runs this code is run on a separate process, it doesn't return a DataFrame with...

Data Engineering

Latest Reply

BapsDBS
New Contributor II

04-08-2024 9:37:51 PM

2 kudos

Thanks for the links mentioned above. But both of them use raw Python to achieve parallelism. Does this mean Spark (read PySpark) does not exactly have provisions for parallel execution of functions or even notebooks? We used a wrapper notebook with ThreadP...

4 More Replies
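
A minimal sketch of the thread-based approach asked about in this thread. Threads run inside the driver process, so on a cluster they can share the driver's single SparkSession; a function sent to a separate process does not have that session, which may be related to the missing DataFrame described above. `run_query` is a hypothetical stand-in (on Databricks it might wrap `spark.sql` or a JDBC read), used here so the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(sql: str) -> str:
    # Hypothetical stand-in for the real work, e.g. spark.sql(sql) on a cluster.
    return f"result of: {sql}"


def run_queries_in_parallel(queries, max_workers=4):
    # executor.map returns results in the same order as the input queries,
    # even if the underlying calls finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(run_query, queries))


results = run_queries_in_parallel(["SELECT 1", "SELECT 2", "SELECT 3"])
for line in results:
    print(line)
```

Threads suit this case because each query mostly waits on the database or on Spark, so the Python interpreter lock is rarely the bottleneck.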

by kll • New Contributor III

04-19-2023 4:48:03 PM

18380 Views
3 replies
0 kudos

Python multiprocessing and the Databricks architecture: under the hood

I am curious what is going on under the hood when using the `multiprocessing` module to parallelize a function call and apply it to a Pandas DataFrame along the row axis. Specifically, how does it work with the Databricks architecture / compute? My cluster ...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-20-2023 7:23:45 PM

0 kudos

@Keval Shah: When using the multiprocessing module in Python to parallelize a function call and apply it to a Pandas DataFrame along the row axis, the following happens under the hood: The Pool object is created with the specified number of processes...

2 More Replies

by Sandesh87 • New Contributor III

06-13-2023 11:31:15 AM

1141 Views
1 reply
2 kudos

Apply a function across multiple smaller DataFrames created from one big DataFrame in Scala

The DataFrame 'big_df' looks like the below:

| id | index | timestamp |
|:---- |:------:| -----:|
| abc | 1 | 11:00:00 |
| abc | 1 | 11:00:10 |
| abc | 1 | 11:00:20 |
| abc | 1 | 11:00:30 |
| abc | 1 | 11:00:40 |
| abc | 1 | 11:00:50 |
| abc | 2 | 11:01:00 |
| abc | 2 | 11:01:10 |
| abc| ...

Data Engineering

Latest Reply

Anonymous
Not applicable

06-15-2023 11:10:06 PM

2 kudos

Hi @Sandesh Puligundla, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

by Nandini • New Contributor II

12-05-2022 12:19:47 AM

13445 Views
10 replies
7 kudos

PySpark: You cannot use dbutils within a spark job

I am trying to parallelise the execution of file copy in Databricks. Making use of multiple executors is one way. So, this is the piece of code that I wrote in PySpark.

def parallel_copy_execution(src_path: str, target_path: str):
    files_in_path = db...

Data Engineering

Latest Reply

Etyr
Contributor

01-11-2023 2:33:17 AM

7 kudos

If you have a Spark session, you can use Spark's hidden FileSystem:

# Get FileSystem from SparkSession
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
# Get Path class to convert string path to FS path
path = spark._...

9 More Replies
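
As the thread title says, dbutils cannot be used within a Spark job, so the copy cannot simply be pushed to the executors that way. One driver-side alternative, sketched below, is to fan the per-file copies out over a thread pool. This is an illustration under assumptions, not the approach from the replies: `copy_one` and `parallel_copy` are hypothetical names, and `shutil.copy` on a local temporary directory stands in for the real copy call (on Databricks that might be `dbutils.fs.cp` invoked from the driver).

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_one(src: Path, target_dir: Path) -> Path:
    # Local stand-in for a single file copy; returns the path of the new file.
    return Path(shutil.copy(src, target_dir / src.name))


def parallel_copy(src_dir: Path, target_dir: Path, max_workers: int = 8) -> list:
    target_dir.mkdir(parents=True, exist_ok=True)
    # Sort so the result order is deterministic.
    files = sorted(p for p in src_dir.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda p: copy_one(p, target_dir), files))


# Demo on throwaway local files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    for i in range(3):
        (src / f"file_{i}.txt").write_text(f"payload {i}")
    copied = parallel_copy(src, Path(tmp) / "dst")
    copied_names = [p.name for p in copied]
    copied_texts = [p.read_text() for p in copied]
```

File copies are I/O-bound, so threads on the driver can overlap them without involving executors at all.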

by DipakBachhav • New Contributor III

10-03-2022 7:25:48 AM

15136 Views
3 replies
3 kudos

Resolved! Getting error Caused by: com.databricks.NotebookExecutionException: FAILED

I am trying to run the below notebook through Databricks but am getting the below error. I have tried to update the notebook timeout and the retry mechanism, but still no luck.

NotebookData("/Users/mynotebook", 9900, retry=3)
]
res = parallelNot...

Data Engineering

Latest Reply

sujai_sparks
New Contributor III

11-28-2022 10:47:09 AM

3 kudos

Hi @Dipak Bachhav, not sure if you have fixed the issue, but here are a few things you can check: Is the path "/Users/mynotebook" correct? Maybe you are missing the dot in the beginning. Run the notebook using dbutils.notebook.run("/Users/mynotebook") ...

2 More Replies

by rubenteixeira • New Contributor III

01-09-2023 7:16:03 AM

4241 Views
2 replies
0 kudos

Can't parallelize model training with sc.parallelize, even though I can run the same code without parallelizing

I'm training a NeuralProphet for a time series forecasting problem. I'm trying to parallelize my training, but this error is appearing. The folder lightning_logs has a hparams.yaml, but it's empty. Is this related to permissions on the cluster? Thanks i...

Data Engineering

Latest Reply

Debayan
Databricks Employee

01-09-2023 2:07:40 PM

0 kudos

Hi, please let us know if this was checked already:

1 More Reply

by HariharaSam • Contributor

09-09-2022 5:51:59 AM

22201 Views
4 replies
2 kudos

Parallel Processing of Databricks Notebook

I have a scenario where I need to run the same Databricks notebook multiple times in parallel. What is the best approach to do this?

Data Engineering

Latest Reply

Anonymous
Not applicable

09-24-2022 1:08:54 AM

2 kudos

Hi @Hariharan Sambath, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

3 More Replies
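
One common pattern for the question above is a wrapper that launches the same notebook several times from a thread pool, each run with its own arguments. The sketch below assumes that pattern and is not taken from the replies. `NotebookRun`, `run_with_retry`, `run_in_parallel`, and `fake_run` are hypothetical names; `fake_run` stands in for a call such as `dbutils.notebook.run(path, timeout_seconds, arguments)` so the example runs outside Databricks.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class NotebookRun:
    # Hypothetical description of one notebook invocation.
    path: str
    timeout_seconds: int
    arguments: dict = field(default_factory=dict)
    retries: int = 0


def run_with_retry(run_fn, nb: NotebookRun):
    # Try once, plus up to nb.retries further attempts; re-raise the last failure.
    attempts = nb.retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return run_fn(nb.path, nb.timeout_seconds, nb.arguments)
        except Exception:
            if attempt == attempts:
                raise


def run_in_parallel(run_fn, notebooks, max_workers=4):
    # Results come back in the same order as the notebooks list.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda nb: run_with_retry(run_fn, nb), notebooks))


def fake_run(path, timeout_seconds, arguments):
    # Stand-in for the real notebook call; returns a string like a notebook exit value.
    return f"{path} finished with {arguments}"


runs = [NotebookRun("/Users/mynotebook", 9900, {"part": str(i)}, retries=3) for i in range(3)]
outputs = run_in_parallel(fake_run, runs)
```

Passing the runner in as `run_fn` keeps the sketch testable off-cluster; inside a notebook the real call would be supplied instead. `max_workers` caps how many runs compete for the cluster at once.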

by AzureDatabricks • New Contributor III

11-21-2021 11:34:20 PM

7344 Views
5 replies
1 kudos

Parallel processing of JSON files in Databricks PySpark

How can we read files from Azure Blob Storage and process them in parallel in Databricks using PySpark? As of now we are reading all 10 files at a time into a DataFrame and flattening it. Thanks & Regards, Sujata

Data Engineering

Latest Reply

-werners-
Esteemed Contributor III

11-22-2021 1:54:07 AM

1 kudos

spark.read.json("/mnt/dbfs/<ENTER PATH OF JSON DIR HERE>/*.json")
You first have to mount your blob storage to Databricks; I assume that is already done.
https://spark.apache.org/docs/latest/sql-data-sources-json.html

4 More Replies