Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

[INTERNAL_ERROR] Cannot generate code for expression: claimsconifer.default.decrypt_colA(

nikhilkumawat
New Contributor III

A column contains data that is encrypted at rest. I am trying to create a SQL function that decrypts the data only if the user is a member of a particular group. Below is the function:

 

%sql
CREATE OR REPLACE FUNCTION test.default.decrypt_if_valid_user(col_a STRING)
RETURN CASE
    WHEN is_account_group_member('admin') THEN test.default.decrypt_colA(col_a, secret('fernet_key', 'fernet_key_secret'))
    ELSE col_a
  END

 

Here "test.default.decrypt_colA" is already created. When I ran a query to retrieve data, I got decrypted data.
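For context, the definition of decrypt_colA is not shown in the thread; a function like it could plausibly be defined as a Unity Catalog Python UDF along these lines (a hypothetical sketch only — the actual definition may differ):

```sql
-- Hypothetical sketch of the underlying decrypt function (not from the thread):
-- a Unity Catalog Python UDF wrapping Fernet decryption.
CREATE OR REPLACE FUNCTION test.default.decrypt_colA(col_a STRING, key STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
from cryptography.fernet import Fernet
return Fernet(key.encode()).decrypt(col_a.encode()).decode()
$$;
```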

 

%sql
select test.default.decrypt_if_valid_user(col_a) from test.default.sampletbl limit 2

 

With this I am getting decrypted data. 

Now I applied this function directly to the column by altering the table like this:

 

%sql
ALTER TABLE test.default.sampletbl ALTER COLUMN col_a SET MASK test.default.decrypt_if_valid_user

 

Now when I try to query the table, I am getting the below error:

 

%sql
select * from test.default.sampletbl limit 2
org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot generate code for expression: test.default.decrypt_colA (input[11, string, true], secret_value)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:85)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:89)
	at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotGenerateCodeForExpressionError(QueryExecutionErrors.scala:77)
	at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:503)
	at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:502)
	at com.databricks.sql.analyzer.ExternalUDFExpression.doGenCode(ExternalUDFExpression.scala:37)
	at org.apache.spark.sql.catalyst.expressions.Expression.genCodeInternal(Expression.scala:249)
	at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$2(Expression.scala:225)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:225)
	at org.apache.spark.sql.catalyst.expressions.Alias.genCodeInternal(namedExpressions.scala:170)
	at com.databricks.sql.expressions.codegen.EdgeExpressionCodegen$.$anonfun$genCodeWithFallback$2(EdgeExpressionCodegen.scala:269)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.sql.expressions.codegen.EdgeExpressionCodegen$.$anonfun$genCodeWithFallback$1(EdgeExpressionCodegen.scala:269)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.sql.expressions.codegen.EdgeExpressionCodegen$.genCodeWithFallback(EdgeExpressionCodegen.scala:267)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.generateExpression(CodeGenerator.scala:1450)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.$anonfun$generateExpressionsForWholeStageWithCSE$2(CodeGenerator.scala:1531)
	at scala.collection.immutable.List.map(List.scala:297)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.$anonfun$generateExpressionsForWholeStageWithCSE$1(CodeGenerator.scala:1529)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.withSubExprEliminationExprs(CodeGenerator.scala:1183)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.generateExpressionsForWholeStageWithCSE(CodeGenerator.scala:1529)
	at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:76)
	at org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:199)
	at org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:154)
	at org.apache.spark.sql.execution.ColumnarToRowExec.consume(Columnar.scala:78)
	at org.apache.spark.sql.execution.ColumnarToRowExec.doProduce(Columnar.scala:218)
	at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:99)
	at org.apache.spark.sql.execution.SparkPlan$.org$apache$spark$sql$execution$SparkPlan$$withExecuteQueryLogging(SparkPlan.scala:107)

 

 

Any idea how to resolve this issue?

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @nikhilkumawat , 

The error message indicates that the decrypt_if_valid_user function is not properly recognized when used as a masking function in the ALTER TABLE statement.

The masking feature in Databricks is not designed to work with user-defined functions (UDFs) or external functions such as is_account_group_member and secret; it can only be used with functions it supports.

One way to solve this is to convert the UDF decrypt_if_valid_user into a built-in function using CREATE OR REPLACE TEMPORARY TABLE FUNCTION. With this approach, you define the function directly in SQL rather than in Python:

CREATE OR REPLACE TEMPORARY TABLE FUNCTION mask_decrypt(col_a STRING) 
  RETURNS STRING 
  COMMENT "Decrypts the given column if the user is a member of the 'admin' group"
  LANGUAGE SQL 
  AS "
  CASE 
    WHEN is_account_group_member('admin') 
    THEN test.default.decrypt_colA (col_a, secret('fernet_key', 'fernet_key_secret'))
    ELSE col_a
  END
";

Here, CREATE OR REPLACE TEMPORARY TABLE FUNCTION creates a built-in function that can be used with the masking feature. You can then use this function in the ALTER TABLE statement to mask the column:

ALTER TABLE test.default.sampletbl ALTER COLUMN col_a SET MASK mask_decrypt;

This should allow you to use the mask_decrypt function to mask the col_a column of the sampletbl table. Note that only functions supported by the masking feature can be used, so any custom or external functions must first be converted to SQL functions.
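As a quick sanity check after a mask is applied, Unity Catalog exposes mask metadata through information_schema (a sketch; assumes the thread's catalog and table names):

```sql
-- Check which columns of the table carry a mask (names from the thread)
SELECT * FROM test.information_schema.column_masks
WHERE table_name = 'sampletbl';

-- Then query as a non-admin user: col_a should come back still encrypted
SELECT col_a FROM test.default.sampletbl LIMIT 2;
```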

 

Hi @Kaniz_Fatma I tried to create the same function that you described. It is giving this error:

nikhil1991_0-1697522742982.png

 

nikhilkumawat
New Contributor III

Hi @Kaniz_Fatma After removing the "TABLE" keyword from the CREATE OR REPLACE statement, the function got registered as a built-in function. Just to verify, I displayed all the functions and I can see the function decrypt_if_valid_user:

nikhil1991_0-1697541657408.png
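For reference, the same verification can be done in SQL (a sketch using the thread's function name):

```sql
-- List user-defined functions in the current schema
SHOW USER FUNCTIONS;

-- Inspect the registered function's signature and body
DESCRIBE FUNCTION EXTENDED decrypt_if_valid_user;
```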

Now I am trying to alter the table using the command below, and it is giving this error:

%sql
ALTER TABLE test.default.sampletbl ALTER COLUMN col_a SET MASK decrypt_if_valid_user

nikhil1991_1-1697541801866.png

Although I can see this function in the list, the ALTER TABLE statement still does not recognize it.
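One thing that may be worth trying (an assumption, not confirmed in the thread): SET MASK typically resolves persistent, fully qualified Unity Catalog functions rather than temporary ones, so creating the function without TEMPORARY and referencing it by its three-level name could behave differently:

```sql
-- Sketch: reference the mask function by its fully qualified name
-- (assumes it was created as a persistent function in test.default)
ALTER TABLE test.default.sampletbl ALTER COLUMN col_a
  SET MASK test.default.decrypt_if_valid_user;
```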

 

 
