Removing special characters from data in Databricks

eimis_pacheco
Contributor

Hi dear community,

My company is migrating from MapR to Databricks. The following piece of code worked fine on MapR, but it stopped working once we moved it to Databricks. The failure seems to be specific to this regular expression, because other regular expressions run without any error.

The error is "Error while obtaining a new communication channel". After it occurs, something breaks and we cannot continue writing or testing code.

I am attaching a screenshot for reference.

[Screenshot: Error while obtaining a new communication channel]

import pyspark.sql.functions as pyfunc

df=spark.read.parquet("/mnt/gpdipedlstgamrasp50565/stg_db/intermediate/ODX/ODW/STUDY_REPORT/Current/Data/")

df.count()

df = df.withColumn('CSR_RESULTS_SUMMARY', pyfunc.regexp_replace(pyfunc.col('CSR_RESULTS_SUMMARY'),u'([\ud800-\udfff\ufdd0-\ufdef\ufffe-\uffff+])',''))

df.show()
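
One possible cause, offered as an unconfirmed assumption and not as a diagnosis: in Python 3 the pattern literal above is expanded by Python itself, so the string contains lone surrogate code points (\ud800-\udfff) that cannot be encoded as UTF-8 when the pattern is sent to the JVM. The sketch below uses only the standard library to show that difference, and that a raw string keeps the escapes as plain ASCII for the regex engine to interpret. The names `expanded`, `raw_pattern`, `is_utf8_encodable`, and `sample` are illustrative, and the stray `+` inside the character class is dropped in the raw version on the assumption that it was not meant to match a literal plus sign.

```python
import re

# The pattern as written in the post: Python expands the \uXXXX escapes,
# so the resulting str contains lone surrogate code points.
expanded = '([\ud800-\udfff\ufdd0-\ufdef\ufffe-\uffff+])'

# Raw-string variant: the backslash escapes stay as plain ASCII text and
# are interpreted later by the regex engine instead of by Python.
raw_pattern = r'[\ud800-\udfff\ufdd0-\ufdef\ufffe\uffff]'


def is_utf8_encodable(text):
    """Return True if the string can be encoded as UTF-8."""
    try:
        text.encode('utf-8')
        return True
    except UnicodeEncodeError:
        return False


print(is_utf8_encodable(expanded))     # False: lone surrogates are rejected
print(is_utf8_encodable(raw_pattern))  # True: only ASCII characters

# Python's own re module understands the \uXXXX escapes in the raw pattern.
sample = 'abc' + chr(0xFDD0) + 'def' + chr(0xFFFF)
cleaned = re.sub(raw_pattern, '', sample)
print(cleaned)                         # abcdef
```

If this assumption holds, passing the raw string to `pyfunc.regexp_replace` in place of the `u'...'` literal may be worth trying, since Java's regex engine also accepts `\uXXXX` escapes. This has not been verified on Databricks.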

Thank you very much in advance.
