- 2214 Views
- 3 replies
- 2 kudos
KB Feedback Discussion: In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. These...
- 18540 Views
- 17 replies
- 24 kudos
I am using multithreading in this job, which creates 8 parallel jobs. It fails a few times a day and sometimes gets stuck in one of the Python notebook cell processes. The error is: The Python process exited with an unknown exit code. The last 10 KB of the process's...
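For context, here is a minimal sketch of the pattern described above, assuming the 8 parallel jobs are child notebooks launched from a driver notebook with dbutils.notebook.run; the notebook paths, timeout, and parameters are placeholders, not the actual job setup:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical child notebook paths; the real job would use its own.
notebook_paths = [f"/Jobs/child_task_{i}" for i in range(8)]

def run_child(path):
    # dbutils.notebook.run blocks until the child notebook finishes;
    # the second argument is the timeout in seconds.
    return dbutils.notebook.run(path, 3600, {"task": path})

# 8 worker threads -> 8 notebook runs in flight at once on the same cluster.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_child, notebook_paths))

print(results)
```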
Latest Reply
Hey, it seems that the issue is related to the driver undergoing a memory bottleneck, which causes it to crash with an out-of-memory (OOM) condition and get restarted, or to become unresponsive due to frequent full garbage collection. The reason for th...
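As a hedged illustration of that reply, the sketch below shows one common way to relieve driver memory pressure: keep large results on the executors rather than collecting them to the driver. The table name, path, and row limit are illustrative assumptions, not taken from the thread:

```python
# Risky: .collect() materialises the entire result set in driver memory and is a
# common cause of driver OOM and long full-GC pauses.
# rows = spark.table("events").collect()

# Safer: keep the DataFrame distributed, write it out, and only pull a small
# preview back to the driver.
df = spark.table("events")                              # hypothetical table
df.write.mode("overwrite").parquet("/tmp/events_out")   # work stays on the executors
preview = df.limit(1000).collect()                      # only 1,000 rows reach the driver

# The cluster Spark config spark.driver.maxResultSize can also cap how much data a
# single action may return to the driver, so jobs fail fast instead of exhausting memory.
```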
- 2248 Views
- 6 replies
- 5 kudos
KB Feedback Discussion: In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. Thes...
- 18971 Views
- 3 replies
- 2 kudos
Hello everyone, I am trying to determine the appropriate cluster specifications/sizing for my workload: run a PySpark task to transform a batch of input Avro files to Parquet files and create or re-create persistent views on these Parquet files. This t...
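A minimal sketch of this workload, assuming the built-in Avro reader and a metastore view over the written Parquet path; the paths, schema, and view name are placeholders:

```python
input_path = "/mnt/raw/batch/*.avro"        # hypothetical input location
output_path = "/mnt/curated/batch_parquet"  # hypothetical output location

# Transform the Avro batch to Parquet.
df = spark.read.format("avro").load(input_path)
df.write.mode("overwrite").parquet(output_path)

# Re-create a persistent view over the Parquet files so downstream queries
# pick up the new batch (the target schema must already exist).
spark.sql(f"""
  CREATE OR REPLACE VIEW curated.batch_view AS
  SELECT * FROM parquet.`{output_path}`
""")
```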
Latest Reply
If the data is 100MB, then I'd try a single node cluster, which will be the smallest and least expensive. You'll have more than enough memory to store it all. You can automate this and use a jobs cluster.
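For reference, a hedged sketch of what that single-node jobs cluster could look like as a Jobs API new_cluster spec; the runtime version and node type are illustrative assumptions for an Azure workspace:

```python
new_cluster = {
    "spark_version": "13.3.x-scala2.12",   # any current LTS runtime would do
    "node_type_id": "Standard_DS3_v2",     # smallest node type available in your cloud
    "num_workers": 0,                      # single node: the driver does all the work
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```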