Data Engineering

How to use Java MaskFormatter in Spark SQL?

Databrickguy
New Contributor II

I created a function based on the Java MaskFormatter class in Databricks/Scala.

But when I call it from Spark SQL, I get this error message:

Error in SQL statement: AnalysisException: Undefined function: formatAccount. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.formataccount.; line 1 pos 32

Here is my function:

import javax.swing.text.MaskFormatter

def formatAccount(account: String, mask: String): String = {
  val formatter = new MaskFormatter(mask.replace("X", "A"))
  formatter.setValueContainsLiteralCharacters(false)
  formatter.valueToString(account)
}

Here is the query that produces the error:

sql("""select java_method(emitToKafka ,formatAccount("1222233334", "X-XXXX-XXXX-X"))""")

However, if I run the code below directly in Scala, it works fine.

formatAccount("1222233334", "X-XXXX-XXXX-X")
res0: String = 1-2222-3333-4

What am I missing?

1 REPLY

Anonymous
Not applicable

@Tim zhang:

The issue is that formatAccount is defined as a plain Scala function, but Spark SQL is looking for a registered SQL function. You need to register the Scala function as a UDF so that it can be called from Spark SQL, which you can do with the spark.udf.register method. Here is an example code snippet:

import javax.swing.text.MaskFormatter
import org.apache.spark.sql.functions.udf

// Wrap the masking logic in a Spark UDF
val formatAccountUDF = udf((account: String, mask: String) => {
  val formatter = new MaskFormatter(mask.replace("X", "A"))
  formatter.setValueContainsLiteralCharacters(false)
  formatter.valueToString(account)
})

// Register the UDF under the name used in SQL queries
spark.udf.register("formatAccount", formatAccountUDF)

After registering the function, you can use it in your Spark SQL queries:

sql("""select formatAccount("1222233334", "X-XXXX-XXXX-X")""")

Hopefully this works for you.
