OSError: [Errno 107] Transport endpoint is not connected
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-11-2024 10:23 AM - edited 03-11-2024 10:26 AM
Hi,
I am facing this error when performing write operation in foreach() on a dataframe. The piece of code was working fine for over 3 months but started failing since last week.
To give some context, I have a dataframe extract_df which contains 2 columns xml_full_name and content.
I use the below code to write these xmls to the target folder in ADLS.
extract_df.foreach(write_file)
'write_file' is defined as:
def write_file(row):
with open(row.extract_path, "wb") as f:
f.write(row.content)
The notebook also uses spark.write command to write some parquets which is working fine.
On investigating, I found that this issue could be related to parallelism and as a work around I tried the below:
for row in extract_df.collect():
with open(row.extract_path, "wb") as f:
f.write(row.content)
This works which means the connection is fine but parallelism isn't working. However, this can't be used as a fix though as it will bring down the performance.
Anyone here has faced this issue? If this is related to some configuration, any suggestions on what and where I can check. All inputs are welcome.
Thanks.