Databricks

Rnmj · 10-25-2021

I am trying to run Python code that flattens a JSON file into a pipe-separated file. The code works with smaller files, but for a large 2.4 GB file I get the error below:

ConnectException: Connection refused (Connection refused)

Error while obtaining a new communication channel

ConnectException error: This is often caused by an OOM error that causes the connection to the Python REPL to be closed. Check your query's memory usage.

Databricks Runtime version: 9.1 LTS

The cluster has 5 nodes of type Standard_DS4_V2.
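The original code was not posted, so here is a minimal sketch of what flattening JSON into a pipe-separated file can look like, using only the Python standard library. It assumes nested objects become dotted column names (e.g. `user.geo.city`) and arrays are kept as JSON text in a single column; `flatten_record` and the sample record are hypothetical and are not the poster's code.

```python
import csv
import io
import json


def flatten_record(obj, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Lists are not exploded into rows; they are kept as JSON text.
    This is an illustrative choice, not the original poster's logic.
    """
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key prefix.
            items.update(flatten_record(value, new_key, sep))
        elif isinstance(value, list):
            # Keep arrays as a JSON string in a single column.
            items[new_key] = json.dumps(value)
        else:
            items[new_key] = value
    return items


# Hypothetical sample record.
record = json.loads('{"id": 1, "user": {"name": "a", "geo": {"city": "x"}}, "tags": ["p", "q"]}')
flat = flatten_record(record)

# Write a header row and one data row, separated by "|".
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(flat), delimiter="|", lineterminator="\n")
writer.writeheader()
writer.writerow(flat)
print(buf.getvalue())
```

The header row comes out as `id|user.name|user.geo.city|tags`. Note that `csv` quotes the `tags` cell because its JSON text contains double quotes.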

Kaniz · 10-25-2021

Hi @Rnmj! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first. If not, I will get back to you soon. Thanks.

-werners- · 10-25-2021

Can you check this topic?

It might be what you are looking for:

https://community.databricks.com/s/question/0D53f00001Q0Rq9CAF/bufferholder-exceeded-on-json-flatten...

jose_gonzalez · 10-26-2021

Hi @RN mj,

Could you provide more details? How do you read your JSON file? Are you using an autoscaling cluster? What is the full error stack trace?

Rnmj · 10-28-2021

Hi @Jose Gonzalez, @Werner Stinckens, @Kaniz Fatma,

Thanks for your responses. I appreciate it a lot.

The issue was in the code: it was Python/pandas code running on Spark, so only the driver node was being used. I validated this by increasing the driver configuration. The next step is to revisit the code and use RDDs/DataFrames so that the processing runs in parallel.
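The resolution in the reply above is to move from pandas on the driver to Spark DataFrames. For comparison, here is a minimal single-machine sketch, a swapped-in technique not taken from the thread, that shows why memory matters: a typical pandas approach holds the whole file in memory at once, whereas streaming one record at a time keeps memory bounded regardless of file size. It assumes the input is JSON Lines (one object per line) and that the output columns are known up front; `flatten`, `jsonl_to_pipe`, and the file names are hypothetical. If the 2.4 GB file is a single JSON document rather than JSON Lines, parsing line by line will not work and this approach does not apply. On Databricks, the Spark route would be roughly `spark.read.json(path)` (adding `.option("multiLine", True)` for a single multi-line document), selecting the nested fields as top-level columns, then `df.write.option("sep", "|").csv(out_path)`.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys (compact helper for this sketch)."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, name))
        else:
            out[name] = value
    return out


def jsonl_to_pipe(lines, out, fieldnames):
    """Write one pipe-separated row per JSON Lines record, streaming.

    Only the current line and record are held in memory, so peak memory
    does not grow with the input size. Assumes one JSON object per line
    and a known list of output columns.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        delimiter="|",
        restval="",             # missing fields become empty cells
        extrasaction="ignore",  # fields not in `fieldnames` are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        writer.writerow(flatten(json.loads(line)))
        count += 1
    return count


# In-memory stand-ins for real files; on disk you would pass open file
# objects, e.g. open("input.jsonl") and open("output.psv", "w", newline="").
src = io.StringIO('{"id": 1, "user": {"name": "a"}}\n{"id": 2, "extra": true}\n')
dst = io.StringIO()
rows = jsonl_to_pipe(src, dst, ["id", "user.name"])
print(rows)
print(dst.getvalue())
```

Because a file object is iterated line by line, passing a real open file to `jsonl_to_pipe` reads it incrementally rather than loading all 2.4 GB at once. This still runs on one machine, though, so it does not give the parallelism the Spark DataFrame approach provides.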

Kaniz · 10-28-2021

Great, Thanks!

