- 2214 Views
- 3 replies
- 2 kudos
KB Feedback Discussion: In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. These...
- 18540 Views
- 17 replies
- 24 kudos
I am using multithreading in this job, which creates 8 parallel jobs. It fails a few times a day and sometimes gets stuck in one of the Python notebook cell processes. The error is: The Python process exited with an unknown exit code. The last 10 KB of the process's...
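For context, here is a minimal sketch of the pattern described above, assuming the 8 parallel jobs are child notebooks launched from a driver notebook with dbutils.notebook.run; the notebook paths, timeout, and parameters are placeholders, not the actual job setup:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical child notebook paths; the real job would use its own.
notebook_paths = [f"/Jobs/child_task_{i}" for i in range(8)]

def run_child(path):
    # dbutils.notebook.run blocks until the child notebook finishes;
    # the second argument is the timeout in seconds.
    return dbutils.notebook.run(path, 3600, {"task": path})

# 8 worker threads -> 8 notebook runs in flight at once on the same cluster.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_child, notebook_paths))

print(results)
```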
Latest Reply
Hey, it seems that the issue is related to the driver undergoing a memory bottleneck, which causes it to crash with an out-of-memory (OOM) condition and get restarted, or to become unresponsive due to frequent full garbage collection. The reason for th...
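As a hedged illustration of that reply, the sketch below shows one common way to relieve driver memory pressure: keep large results on the executors rather than collecting them to the driver. The table name, path, and row limit are illustrative assumptions, not taken from the thread:

```python
# Risky: .collect() materialises the entire result set in driver memory and is a
# common cause of driver OOM and long full-GC pauses.
# rows = spark.table("events").collect()

# Safer: keep the DataFrame distributed, write it out, and only pull a small
# preview back to the driver.
df = spark.table("events")                              # hypothetical table
df.write.mode("overwrite").parquet("/tmp/events_out")   # work stays on the executors
preview = df.limit(1000).collect()                      # only 1,000 rows reach the driver

# The cluster Spark config spark.driver.maxResultSize can also cap how much data a
# single action may return to the driver, so jobs fail fast instead of exhausting memory.
```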
- 2248 Views
- 6 replies
- 5 kudos
KB Feedback Discussion: In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting. Thes...
- 18971 Views
- 3 replies
- 2 kudos
Hello everyone, I am trying to determine the appropriate cluster specifications/sizing for my workload: run a PySpark task to transform a batch of input Avro files to Parquet files and create or re-create persistent views on these Parquet files. This t...
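A minimal sketch of this workload, assuming the built-in Avro reader and a metastore view over the written Parquet path; the paths, schema, and view name are placeholders:

```python
input_path = "/mnt/raw/batch/*.avro"        # hypothetical input location
output_path = "/mnt/curated/batch_parquet"  # hypothetical output location

# Transform the Avro batch to Parquet.
df = spark.read.format("avro").load(input_path)
df.write.mode("overwrite").parquet(output_path)

# Re-create a persistent view over the Parquet files so downstream queries
# pick up the new batch (the target schema must already exist).
spark.sql(f"""
  CREATE OR REPLACE VIEW curated.batch_view AS
  SELECT * FROM parquet.`{output_path}`
""")
```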
Latest Reply
If the data is 100MB, then I'd try a single node cluster, which will be the smallest and least expensive. You'll have more than enough memory to store it all. You can automate this and use a jobs cluster.
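For reference, a hedged sketch of what that single-node jobs cluster could look like as a Jobs API new_cluster spec; the runtime version and node type are illustrative assumptions for an Azure workspace:

```python
new_cluster = {
    "spark_version": "13.3.x-scala2.12",   # any current LTS runtime would do
    "node_type_id": "Standard_DS3_v2",     # smallest node type available in your cloud
    "num_workers": 0,                      # single node: the driver does all the work
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```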