Data Engineering

Databricks Upload local files (Create/Modify table)

VJ3
Contributor

Hello Team,

I believe Databricks recently released a feature to create or modify a table using file upload on a Self-Serve workspace. It accepts files smaller than 2 GB in CSV, TSV, JSON, Avro, Parquet, or text format and uses them to create or overwrite a managed Delta Lake table. (https://learn.microsoft.com/en-us/azure/databricks/ingestion/add-data/upload-data)

I am looking for your guidance on the following:

- How do we ensure that a file uploaded by one user cannot be shared with another user?

- Do we know whether Databricks local file upload abides by the Bell–LaPadula model? Information on the model is here: https://en.wikipedia.org/wiki/Bell%E2%80%93LaPadula_model

- What are the best practices for adhering to least privilege, need to know, and segregation of duties for file upload on a Databricks Self-Serve workspace?

- Can a user overwrite the data (table) uploaded by another user?

- Can we use file upload on a non-secure cluster?

 

Thank you

3 REPLIES

NandiniN
Databricks Employee

Hi @VJ3 ,

 

The documentation states: "Imported files are uploaded to a secure internal location within your account which is garbage collected daily."

I created a new table and tried to check the path from the table details, but I was not able to access the underlying file.

Unity Catalog should help you with table permissions if you do not want other users to overwrite your tables.
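As a rough illustration of the table-permission idea, here is a minimal Python sketch that builds Unity Catalog GRANT and REVOKE statements giving a principal read-only access to an uploaded table. The catalog, schema, table, and group names (`main`, `uploads`, `customers`, `analysts`) and the helper functions are hypothetical and not from this thread. It also assumes the principal already has USE CATALOG and USE SCHEMA on the parent objects.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote one identifier part, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def read_only_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Build SQL that lets `principal` read a table without being able to overwrite it."""
    full_name = ".".join(quote_ident(part) for part in (catalog, schema, table))
    who = quote_ident(principal)
    return [
        # SELECT allows reads only; writes need MODIFY, which is not granted here.
        f"GRANT SELECT ON TABLE {full_name} TO {who}",
        # Remove any MODIFY privilege that was previously granted on this table.
        f"REVOKE MODIFY ON TABLE {full_name} FROM {who}",
    ]


# Hypothetical names, used only for illustration.
statements = read_only_grants("main", "uploads", "customers", "analysts")
for stmt in statements:
    print(stmt)  # in a Databricks notebook you could run: spark.sql(stmt)
```

The helper only produces the SQL text, so it can be reviewed before the table owner or an admin runs it. Privileges inherited from the schema or catalog level are not affected by a table-level REVOKE, so those would need to be checked separately.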

For access control, we follow the model described at the link below. It makes no explicit mention of the Bell–LaPadula model: https://docs.databricks.com/en/data-governance/table-acls/table-acl.html#enable-table-access-control...

On "Can we use file upload on a non-secure cluster?": are you facing any issue? The requirements are:

  • You can upload data to the staging area without connecting to compute resources, but you must select an active compute resource to preview and configure your table.
  • You must have access to a running compute resource and permissions to create tables in a target schema.

Thanks!

VJ3
Contributor

Hello Nandini,

Thank you for the reply, and apologies for the delay. Let's say I uploaded a CSV file containing PII data using the upload feature in the Databricks UI. Will I be able to share that file with another user who should not have access to the PII data elements? Can a user modify a table they do not own? What is required to mask PII data before sharing the CSV file with another user? How do we ensure that a user cannot upload a file to the DBFS root, which is accessible to all users?

Thank you

Vijay

NandiniN
Databricks Employee

On sharing a CSV file containing PII data with another user who should not have access to the PII data elements:

  • You can use Databricks' Unity Catalog to manage and govern access to data. Unity Catalog allows you to define fine-grained access controls at the column level, ensuring that users without the necessary permissions cannot access PII data.
  • You can create views that mask or exclude PII data for users who should not have access to it. This can be done using dynamic view functions, which return either encrypted or masked data based on the user's access level.
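To make the dynamic-view idea above more concrete, here is a minimal Python sketch that generates a `CREATE OR REPLACE VIEW` statement using the Databricks SQL function `is_account_group_member`. The table, view, column, and group names are hypothetical, the helper `masked_view_sql` is not part of any Databricks API, and the PII columns are assumed to be string-typed so that a literal placeholder is a valid replacement.

```python
def masked_view_sql(view: str, source: str, columns: list[str],
                    pii_columns: set[str], group: str) -> str:
    """Build a dynamic view that shows PII columns only to members of `group`."""
    select_items = []
    for col in columns:
        if col in pii_columns:
            # Members of the group see the real value; everyone else sees a placeholder.
            select_items.append(
                f"CASE WHEN is_account_group_member('{group}') "
                f"THEN {col} ELSE 'REDACTED' END AS {col}"
            )
        else:
            # Non-PII columns pass through unchanged.
            select_items.append(col)
    body = ",\n  ".join(select_items)
    return f"CREATE OR REPLACE VIEW {view} AS\nSELECT\n  {body}\nFROM {source}"


# Hypothetical names, used only for illustration.
view_sql = masked_view_sql(
    view="main.uploads.customers_masked",
    source="main.uploads.customers",
    columns=["customer_id", "email", "country"],
    pii_columns={"email"},
    group="pii_readers",
)
print(view_sql)  # in a Databricks notebook you could run: spark.sql(view_sql)
```

With this approach, users without PII access would be granted SELECT on the view only, not on the underlying table, so the unmasked column is never directly readable by them.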

Modifying a table not owned by the user:

  • Users cannot modify tables they do not own unless they have been explicitly granted the necessary permissions. Unity Catalog provides a unified permission model to manage access policies consistently across data and AI assets.

You can enforce access controls and permissions to prevent users from uploading files to the DBFS root. This can be managed through the Databricks workspace settings and Unity Catalog, ensuring that only authorized users have the necessary permissions to upload files to specific locations.

https://www.databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-dupli...
