Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by User16835756816 (Valued Contributor)
  • 2939 Views
  • 3 replies
  • 1 kudos

How can I optimize my data pipeline?

Delta Lake provides optimizations that can help you accelerate your data lake operations. Here’s how you can improve query speed by optimizing the layout of data in storage. There are two ways you can optimize your data pipeline: 1) Notebook Optimizat...
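A common storage-layout problem is a table made of many small files. Delta Lake's `OPTIMIZE` command (optionally with `ZORDER BY`) compacts small files into larger ones. The sketch below illustrates only the compaction idea: it groups hypothetical file sizes into bins no larger than an assumed 1024 MB target using first-fit-decreasing bin packing. It is not how Delta Lake implements `OPTIMIZE`, and both the sizes and the target are assumptions.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 1024) -> List[List[int]]:
    """Group small files into bins of at most target_mb (first-fit decreasing).

    Illustrative only: a generic bin-packing sketch, not Delta Lake's actual
    OPTIMIZE implementation. Files already at or above target_mb are left out,
    since rewriting them would not reduce the file count.
    """
    bins: List[List[int]] = []
    bin_totals: List[int] = []
    # Placing the largest files first tends to produce fuller bins.
    for size in sorted((s for s in file_sizes_mb if s < target_mb), reverse=True):
        for i, total in enumerate(bin_totals):
            if total + size <= target_mb:
                bins[i].append(size)
                bin_totals[i] += size
                break
        else:
            # No existing bin has room, so start a new one.
            bins.append([size])
            bin_totals.append(size)
    return bins


if __name__ == "__main__":
    # Hypothetical file sizes (MB) for one partition of a table.
    sizes = [600, 300, 200, 150, 100, 50, 1200]
    print(plan_compaction(sizes))  # [[600, 300, 100], [200, 150, 50]]
```

Here six small files become two larger ones, while the 1200 MB file is left untouched.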

Latest reply by Hubert-Dudek (Esteemed Contributor III)
  • 1 kudos

Some tips from me: look for data skew; some partitions can be huge and others small because of incorrect partitioning. You can use the Spark UI to spot this, but also debug your code a bit (e.g., with getNumPartitions()); SQL in particular can divide data unequally across parti...

2 More Replies
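The skew check suggested in the reply above can be sketched in plain Python once you have one row count per partition. In PySpark, such counts could be gathered with something like `df.rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()`, whose length equals `df.rdd.getNumPartitions()`. The function name `skewed_partitions` and the 5x-the-median threshold are hypothetical choices for illustration, not a Spark API or a recommended value.

```python
from statistics import median
from typing import List


def skewed_partitions(row_counts: List[int], ratio: float = 5.0) -> List[int]:
    """Return indices of partitions whose row count exceeds `ratio` x the median.

    `row_counts` holds one count per partition. The default ratio of 5.0 is an
    arbitrary assumption; tune it to your workload.
    """
    if not row_counts:
        return []
    mid = median(row_counts)
    # Use at least 1 so a zero median (mostly empty partitions) still
    # yields a positive threshold.
    threshold = ratio * max(mid, 1)
    return [i for i, n in enumerate(row_counts) if n > threshold]


if __name__ == "__main__":
    # Hypothetical per-partition row counts with one oversized partition.
    counts = [1000, 1100, 950, 20000, 1050]
    print(skewed_partitions(counts))  # [3]
```

Comparing against the median rather than the mean keeps a single huge partition from inflating the baseline it is measured against.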
by JordanYaker (Contributor)
  • 5235 Views
  • 7 replies
  • 8 kudos

Resolved! Is anyone else experiencing intermittent "Failure starting REPL" errors with PySpark Jobs?

I have a multi-task job that runs a bunch of PySpark notebooks, and about 30-60% of the time my jobs fail with the following error: I haven't seen any consistency with this error. I've had as many as all of the tasks in the job giving this error...

Latest reply by James_Cole (New Contributor III)
  • 8 kudos

Hi. Did you ever get a resolution to this problem other than rolling back to 10.4? I recently moved some workloads over to runtime 11.3 and am experiencing intermittent "repl did not start in 30 seconds." errors. I have increased the repl timeout...

6 More Replies
by swetha (New Contributor III)
  • 5689 Views
  • 2 replies
  • 2 kudos

Databricks job cluster logs

I am using a Databricks job cluster for multi-task jobs. When my job failed or succeeded, I couldn't see any logs. Do I need to add a location in the advanced options (cluster logging) to see the logs for failed/succeeded jobs, or what is it and how does it work...

Latest reply by Anonymous
  • 2 kudos

Hi @swetha kadiyala, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

1 More Reply
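On the job-cluster logging question: one way to keep driver and executor logs beyond the life of a job cluster is cluster log delivery, configured under the cluster's advanced logging options or as `cluster_log_conf` in the `new_cluster` spec of a Jobs API request. The sketch below builds such a spec as a Python dict, assuming a DBFS destination of `dbfs:/cluster-logs` (a hypothetical path). The runtime version, node type, and worker count are placeholders, the helper name is hypothetical, and field names should be checked against the current Databricks API reference.

```python
import json


def job_cluster_with_log_delivery(log_root: str) -> dict:
    """Build a hypothetical `new_cluster` spec with cluster log delivery enabled.

    Only `cluster_log_conf` is the point of this sketch; the other fields are
    placeholders to be replaced with real values for your workspace.
    """
    return {
        "spark_version": "<runtime-version>",  # placeholder
        "node_type_id": "<node-type>",  # placeholder
        "num_workers": 2,  # placeholder
        # Driver and executor logs are delivered under
        # <destination>/<cluster-id>/ for later inspection.
        "cluster_log_conf": {"dbfs": {"destination": log_root}},
    }


if __name__ == "__main__":
    spec = job_cluster_with_log_delivery("dbfs:/cluster-logs")
    print(json.dumps(spec, indent=2))
```

Keeping the log destination as a parameter makes it easy to point different jobs at separate log folders.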