10-25-2021 05:25 AM
I am trying to run Python code that flattens a JSON file into a pipe-separated file. The code works with smaller files, but for a large 2.4 GB file I get the error below:
ConnectException: Connection refused (Connection refused)
Error while obtaining a new communication channel
ConnectException error: This is often caused by an OOM error that causes the connection to the Python REPL to be closed. Check your query's memory usage.
Databricks version 9.1 LTS
The cluster has 5 nodes of type Standard_DS4_v2
10-25-2021 05:29 AM
Hi @ Rnmj! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.
10-25-2021 06:40 AM
Can you check this topic?
It might be what you are looking for:
10-26-2021 01:33 PM
Hi @RN mj,
Could you provide more details? How do you read your JSON file? Are you using an autoscaling cluster? What is the full error stack trace?
10-28-2021 08:58 PM
Hi @Jose Gonzalez, @Werner Stinckens, @Kaniz Fatma,
Thanks for your responses. I appreciate it a lot.
The issue was in the code: it was plain Python/pandas code running on Spark, so only the driver node was being used. I validated this by increasing the driver configuration. The next step is to revisit the code and use RDDs/DataFrames so that the work is processed in parallel across the cluster.
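For anyone landing here with the same error, below is a rough sketch of what the RDD-based rewrite could look like. It is not the code from this thread. It assumes the input is JSON Lines (one JSON object per line), since a single multi-line JSON document cannot be split across workers this way. The helper names `flatten_record`, `to_pipe_line` and `flatten_with_spark`, the `_` key separator, and the column list are all made up for illustration.

```python
import json


def flatten_record(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep` (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten_record(value, new_key, sep))
        else:
            # Scalars (and lists, which this sketch does not expand) are kept as-is.
            flat[new_key] = value
    return flat


def to_pipe_line(flat, columns):
    """Render one flattened record as a pipe-separated line in a fixed column order."""
    return "|".join("" if flat.get(c) is None else str(flat[c]) for c in columns)


def flatten_with_spark(spark, input_path, output_path, columns):
    """Run the flattening on the executors instead of the driver.

    Assumes `spark` is an existing SparkSession (as in a Databricks notebook)
    and that `input_path` points to JSON Lines data.
    """
    lines = spark.sparkContext.textFile(input_path)
    flattened = (
        lines.filter(lambda line: line.strip())  # skip blank lines
        .map(lambda line: to_pipe_line(flatten_record(json.loads(line)), columns))
    )
    # Writes one part file per partition under output_path.
    flattened.saveAsTextFile(output_path)
```

In a notebook this would be called as something like `flatten_with_spark(spark, "<input path>", "<output path>", ["id", "user_name"])`, with the paths and columns replaced by your own. Because each line is parsed and flattened inside `map`, the work is spread across the worker nodes rather than loading the whole 2.4 GB file into driver memory.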
10-28-2021 10:58 PM
Great, Thanks!