Warehousing & Analytics

Removing special characters in data in Databricks

eimis_pacheco
Contributor

Hi dear community,

My company is migrating from MapR to Databricks, and the following piece of code, which worked fine on the old platform, stopped working after the move. I noticed the failure happens only with this specific regular expression; with other expressions there is no error.

The error is "Error while obtaining a new communication channel", and after it appears we cannot continue writing and testing code; something breaks.

I am attaching a screenshot for reference.


import pyspark.sql.functions as pyfunc

# Read the source data and confirm it loads
df = spark.read.parquet("/mnt/gpdipedlstgamrasp50565/stg_db/intermediate/ODX/ODW/STUDY_REPORT/Current/Data/")
df.count()

# Strip surrogate and non-character code points from the summary column
df = df.withColumn('CSR_RESULTS_SUMMARY',
                   pyfunc.regexp_replace(pyfunc.col('CSR_RESULTS_SUMMARY'),
                                         u'([\ud800-\udfff\ufdd0-\ufdef\ufffe-\uffff+])', ''))

df.show()
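For context, this pattern literal is unusual in one way the others likely are not: it embeds unpaired surrogate code points (e.g. \ud800) directly in the Python string. A lone surrogate is a legal Python str character but cannot be UTF-8 encoded, which matters whenever the string has to leave the Python process. The snippet below only illustrates that property; it is an observation about the pattern, not a confirmed diagnosis of the crash.

```python
# Illustration: a lone surrogate code point, as embedded in the regex
# literal above, is rejected by the UTF-8 codec.
lone_surrogate = '\ud800'

try:
    lone_surrogate.encode('utf-8')
    encodable = True
except UnicodeEncodeError:
    encodable = False

assert encodable is False  # surrogates cannot be UTF-8 encoded
```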

Thank you very much in advance.

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Eimis Pacheco​,

This means the driver crashed with an OOM (out-of-memory) exception, so the notebook cannot establish a new communication channel with the driver.

Please try the following options:

  • Increase driver-side memory and retry.
  • Look at the Spark job DAG, which gives you more data-flow information.

For more information, follow this article.
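Separately, since the pattern literal in the question embeds unpaired surrogates in the Python string itself, one possible workaround (an untested sketch, not a confirmed fix) is to keep the Python string ASCII by escaping the \uXXXX sequences and letting the JVM regex engine interpret them. The stray + inside the original character class looks unintentional and is dropped here; the column name is taken from the question.

```python
# Hedged workaround sketch: escape the \uXXXX sequences so the Python
# string contains no actual surrogate characters.
pattern = '[\\ud800-\\udfff\\ufdd0-\\ufdef\\ufffe-\\uffff]'

# The escaped pattern is plain ASCII, so it can cross the Py4J bridge
# as ordinary UTF-8; the JVM regex engine expands the \uXXXX escapes.
assert pattern.isascii()

# Hypothetical usage, mirroring the original call (requires a Spark session):
# import pyspark.sql.functions as pyfunc
# df = df.withColumn('CSR_RESULTS_SUMMARY',
#     pyfunc.regexp_replace(pyfunc.col('CSR_RESULTS_SUMMARY'), pattern, ''))
```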


2 REPLIES


Kaniz
Community Manager

Hi @Eimis Pacheco​, we haven't heard from you since my last response, and I was checking back to see if my suggestions helped. If you found a solution, please share it with the community, as it can be helpful to others. Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
