This is a generic problem.
The cheap solution is to increase the number of shuffle partitions (in case the load is skewed) or to restart the cluster; a minimal tuning sketch follows below.
The safe solution is to increase the cluster size or the node sizes (SSD, RAM, …).
Eventually, you have to make sure that you ...
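For the shuffle-partition route, here is a minimal tuning sketch, assuming PySpark; the partition count, path, and key column are placeholders, not values from your job:

```python
from pyspark.sql import SparkSession

# Raise shuffle parallelism above the default of 200 before the wide
# transformation that is skewing; tune the number to your data volume.
spark = (
    SparkSession.builder
    .config("spark.sql.shuffle.partitions", "800")
    .getOrCreate()
)

# The setting can also be changed at runtime, right before the heavy stage:
spark.conf.set("spark.sql.shuffle.partitions", "800")

df = spark.read.parquet("/mnt/data/events")  # placeholder path
# Repartitioning on the hot key spreads a skewed load across more tasks.
result = df.repartition(800, "customer_id").groupBy("customer_id").count()
```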
My guess is that the reason this may not work is that the dictionary input does not have unique keys. With this syntax, the column names are the keys, so if you have two or more aggregations for the same column, the later entry overwrites the earlier one and only a single aggregation per column actually gets applied.
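If that is what is happening, the pitfall is easy to reproduce; a small sketch, assuming a toy DataFrame with an `amount` column (the names are made up):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 10.0), (2, 25.0)], ["id", "amount"])

# Dict syntax: the column name is the key, so the second aggregation on
# "amount" overwrites the first one already at the Python-dict level.
df.agg({"amount": "max", "amount": "min"}).show()  # only min(amount) remains

# Explicit column expressions keep every aggregation.
df.agg(
    F.max("amount").alias("max_amount"),
    F.min("amount").alias("min_amount"),
).show()
```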
Your schema is tight, but make sure that converting the data to it does not throw an exception.
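A sketch of what I mean, assuming a JSON source (the path, column names, and types are illustrative): read with the explicit schema in FAILFAST mode so a row that cannot be converted raises immediately instead of silently becoming nulls.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("id", LongType(), nullable=False),
    StructField("name", StringType(), nullable=True),
    StructField("created_at", TimestampType(), nullable=True),
])

df = (
    spark.read
    .schema(schema)
    .option("mode", "FAILFAST")    # fail loudly on rows that do not fit the schema
    .json("/mnt/data/raw_events")  # placeholder path
)
df.printSchema()
```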
Try with Memory Optimized nodes; you may be fine.
My problem was parsing a lot of data from sequence files containing 10K XML files and saving them as a table...
In a similar case, the following fixed the problem:
- Using Memory Optimised Nodes (Compute Optimised had problems)
- Tighter definition of the schema (especially for nested structures in PySpark, where field order may matter; see the sketch after this list)
- Using an S3a mount instead of an S3n mount
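For the schema point, here is a sketch of a tightly defined nested schema, with every field and its position spelled out explicitly (the field names are illustrative, not from my actual job):

```python
from pyspark.sql.types import (
    StructType, StructField, StringType, ArrayType, DoubleType,
)

# Nested struct fields are listed in the exact order they appear in the
# data, since (as noted above) field order can matter for nested structs.
xml_record_schema = StructType([
    StructField("doc_id", StringType(), nullable=False),
    StructField("header", StructType([
        StructField("source", StringType(), nullable=True),
        StructField("timestamp", StringType(), nullable=True),
    ]), nullable=True),
    StructField("values", ArrayType(DoubleType()), nullable=True),
])

# Used like any other schema, e.g.:
# df = spark.createDataFrame(parsed_rows, schema=xml_record_schema)
```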