Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

HariharaSam
by Contributor
  • 18865 Views
  • 4 replies
  • 2 kudos

Parallel Processing of Databricks Notebook

I have a scenario where I need to run the same Databricks notebook multiple times in parallel. What is the best approach to do this?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Hariharan Sambath, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you...
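The thread does not show an accepted answer, so the following is only a sketch of one common pattern, not the thread's solution: from a driver notebook, call `dbutils.notebook.run(path, timeout_seconds, arguments)` once per parameter set inside a bounded thread pool. The helper name `run_notebook_in_parallel` and its parameters are hypothetical; the runner is passed in as a function so the sketch can be exercised outside Databricks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Same shape as dbutils.notebook.run(path, timeout_seconds, arguments)
NotebookRunner = Callable[[str, int, Dict[str, str]], Any]


def run_notebook_in_parallel(
    run_notebook: NotebookRunner,
    path: str,
    argument_sets: List[Dict[str, str]],
    timeout_seconds: int = 3600,
    max_workers: int = 4,
) -> List[Any]:
    """Run one notebook once per argument set, at most max_workers at a time.

    Results come back in the same order as argument_sets. If any run fails,
    its exception is re-raised when that result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_notebook, path, timeout_seconds, args)
            for args in argument_sets
        ]
        return [future.result() for future in futures]
```

Inside a Databricks notebook this might be called as `run_notebook_in_parallel(dbutils.notebook.run, "./child_notebook", [{"region": "eu"}, {"region": "us"}])`, where the path and the `region` widget are hypothetical. Each returned value is whatever the child passes to `dbutils.notebook.exit`. Keeping `max_workers` modest matters because all runs share the same cluster.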

Edmondo
by New Contributor III
  • 6032 Views
  • 7 replies
  • 3 kudos

Resolved! Limiting parallelism when external APIs are invoked (e.g. MLflow)

We are applying a groupBy operation to a pyspark.sql.DataFrame and then training a single model on each group, tracked in MLflow. We see intermittent failures because the MLflow server replies with HTTP 429 (too many requests per second). What are the best practice...

Latest Reply
Edmondo
New Contributor III
  • 3 kudos

For me, it has already been resolved through professional services. The question I do have is how useful this community is if people with the right background aren't here, and if it takes a month to get a non-answer.
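The fix from professional services is not shared in the thread. A generic pattern for HTTP 429 responses, offered here only as a hedged sketch, is to retry each rate-limited call with exponential backoff and random jitter so that many Spark tasks hitting the server at once spread their retries out. The function `call_with_backoff` and its parameters are hypothetical, and so is the `is_retryable` check: how a 429 surfaces as an exception depends on the MLflow client version, so that predicate is left to the caller.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on non-retryable errors, or once the last attempt has failed.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

Inside the per-group training function, each MLflow logging call would be wrapped, e.g. `call_with_backoff(lambda: mlflow.log_metric("rmse", rmse), is_rate_limited)` with a hypothetical `is_rate_limited` predicate. Backoff only smooths bursts; if the server's limit is simply too low for the cluster, the other lever is processing fewer groups concurrently, for example by running the grouped stage with fewer partitions.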

narek_margaryan
by New Contributor II
  • 2756 Views
  • 1 replies
  • 3 kudos

Resolved! Do Spark nodes read data from storage sequentially?

I'm new to Spark and trying to understand how some of its components work. I understand that once the data is loaded into the memory of separate nodes, they process partitions in parallel, within their own memory (RAM). But I'm wondering whether the in...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

@Narek Margaryan, normally the reading is done in parallel because the underlying file system is already distributed (if you use HDFS-based storage or something like it, e.g. a data lake). The number of partitions in the file itself also matters. This l...
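To illustrate why file layout matters, here is a simplified Python model of how Spark's file sources (the `FilePartition` planning in Spark 3.x) turn files into parallel read tasks. It is a sketch under stated assumptions, not Spark's code: files are assumed splittable (no gzip), `default_parallelism` stands in for roughly the total executor cores, the defaults mirror `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB), and `spark.sql.files.minPartitionNum`, bucketing, and adaptive execution are ignored. The function names are hypothetical.

```python
from typing import List

MB = 1024 * 1024


def max_split_bytes(file_sizes: List[int], default_parallelism: int,
                    max_partition_bytes: int = 128 * MB,
                    open_cost_in_bytes: int = 4 * MB) -> int:
    """Target bytes per read task (simplified from Spark's split-size rule)."""
    # Each file is charged an extra "open cost", so many tiny files are not free.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


def plan_read_partitions(file_sizes: List[int], default_parallelism: int,
                         max_partition_bytes: int = 128 * MB,
                         open_cost_in_bytes: int = 4 * MB) -> List[List[int]]:
    """Return read partitions, each a list of chunk sizes read by one task."""
    split = max_split_bytes(file_sizes, default_parallelism,
                            max_partition_bytes, open_cost_in_bytes)
    # 1. Cut each (assumed splittable) file into chunks of at most `split` bytes.
    chunks = [min(split, size - offset)
              for size in file_sizes
              for offset in range(0, size, split)]
    # 2. Pack chunks, largest first, closing a partition when the next chunk won't fit.
    chunks.sort(reverse=True)
    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for chunk in chunks:
        if current and current_size + chunk > split:
            partitions.append(current)
            current, current_size = [], 0
        current.append(chunk)
        current_size += chunk + open_cost_in_bytes
    if current:
        partitions.append(current)
    return partitions


# One 1 GB file on 8 cores is cut into 8 chunks of 128 MB, read by 8 tasks in parallel.
print(len(plan_read_partitions([1024 * MB], default_parallelism=8)))  # 8
```

Under this model, a single large file still yields many parallel read tasks, while 100 files of 1 MB each on 8 cores are packed into 8 tasks of up to 13 files, because the per-file open cost counts against each task's budget.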
