Is there something like Oracle's VPD-Feature in Databricks?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-30-2023 06:01 AM
Since I am porting some code from Oracle to Databricks, I have another specific question.
In Oracle there's something called Virtual Private Database, VPD. It's a simple security feature used to generate a WHERE-clause which the system will add to a user's query.
Basically, you set an "Application Context" (something like a tag) when a connection is made. This context is read in a specific function which then returns a string to be attached as the WHERE clause to the actual query.
I am aware of the current_user()-function in Pyspark/Databricks. However, I can't use it, because it would return the technical username of the BI-tool. That's why the VPD was used in the first place.
Thanks 🙂
- Labels:
-
Oracle
-
Pyspark
-
Row Level Security
-
VPD
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-10-2023 07:07 AM
@Roger Bieri :
In Databricks, you can use the UserDefinedFunction (UDF) feature to create a custom function that will be applied to a DataFrame. You can use this feature to add a WHERE clause to a DataFrame based on the user context. Here's an example of how you can use UDF to apply a WHERE clause to a DataFrame based on the user context:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
# Define your custom function to add WHERE clause to DataFrame
def add_where_clause(user_context):
if user_context == "context_1":
return "WHERE column_1 > 10"
elif user_context == "context_2":
return "WHERE column_2 < 100"
else:
return ""
# Create a UDF from the custom function
add_where_clause_udf = udf(add_where_clause, StringType())
# Apply the UDF to the DataFrame to add WHERE clause based on user context
df = df.select("*", add_where_clause_udf(current_user()).alias("where_clause"))
df_filtered = df.filter(df.where_clause != "")
In this example, add_where_clause is a custom function that takes the user context as an input and returns a WHERE clause string based on the context. The add_where_clause_udf function creates a UDF from the custom function. The df DataFrame is then modified by applying the UDF to create a new column where_clause. Finally, the df_filtered DataFrame is created by filtering the rows where where_clause is not an empty string. You can modify the add_where_clause function to suit your specific use case, and apply the UDF to your DataFrame to add a WHERE clause based on the user context.