cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Is there something like Oracle's VPD-Feature in Databricks?

quakenbush
Contributor

Since I am porting some code from Oracle to Databricks, I have another specific question.

In Oracle there's something called Virtual Private Database, VPD. It's a simple security feature used to generate a WHERE-clause which the system will add to a user's query.

Basically, you set an "Application Context" (something like a tag) when a connection is made. This context is read in a specific function which then returns a string to be attached as the WHERE clause to the actual query.

I am aware of the current_user()-function in Pyspark/Databricks. However, I can't use it, because it would return the technical username of the BI-tool. That's why the VPD was used in the first place.

Thanks 🙂

1 REPLY 1

Anonymous
Not applicable

@Roger Bieri​ :

In Databricks, you can use the UserDefinedFunction (UDF) feature to create a custom function that will be applied to a DataFrame. You can use this feature to add a WHERE clause to a DataFrame based on the user context. Here's an example of how you can use UDF to apply a WHERE clause to a DataFrame based on the user context:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
 
# Define your custom function to add WHERE clause to DataFrame
def add_where_clause(user_context):
    if user_context == "context_1":
        return "WHERE column_1 > 10"
    elif user_context == "context_2":
        return "WHERE column_2 < 100"
    else:
        return ""
 
# Create a UDF from the custom function
add_where_clause_udf = udf(add_where_clause, StringType())
 
# Apply the UDF to the DataFrame to add WHERE clause based on user context
df = df.select("*", add_where_clause_udf(current_user()).alias("where_clause"))
df_filtered = df.filter(df.where_clause != "")

In this example, add_where_clause is a custom function that takes the user context as an input and returns a WHERE clause string based on the context. The add_where_clause_udf function creates a UDF from the custom function. The df DataFrame is then modified by applying the UDF to create a new column where_clause. Finally, the df_filtered DataFrame is created by filtering the rows where where_clause is not an empty string. You can modify the add_where_clause function to suit your specific use case, and apply the UDF to your DataFrame to add a WHERE clause based on the user context.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.