Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Accessing ADLS Gen 2 Raw Files with UC?

Loki
New Contributor III

We are using a service principal to access data from raw files such as JSON and CSV.

I saw a video suggesting that it could be done via Unity Catalog as well.

Could someone comment on this, please?

1 ACCEPTED SOLUTION

Accepted Solutions

Sallyroque
New Contributor III

@Loki wrote:

We are using a service principal to access data from raw files such as JSON and CSV.

I saw a video suggesting that it could be done via Unity Catalog as well.

Could someone comment on this, please?


Raw files in Azure Data Lake Storage (ADLS) Gen 2 can be accessed using a service principal or through Unity Catalog (UC). The two take different approaches to accessing the data.

Service Principal:
Using a service principal involves creating an Azure Active Directory (AAD) application and assigning it the necessary permissions to access the ADLS Gen 2 storage account. The service principal acts as a service account, allowing programmatic access to the raw files. This method is commonly used for automation and integrating with other services.
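
As a rough illustration of the service principal route, here is a minimal sketch of the Spark settings usually applied for OAuth access to one storage account. The helper names, the storage account, container, and secret scope are hypothetical placeholders, not values from this thread, and the commented lines assume a Databricks notebook where `spark` and `dbutils` are predefined.

```python
def service_principal_spark_conf(storage_account, client_id, client_secret, tenant_id):
    """Build the ABFS OAuth settings for one ADLS Gen 2 storage account.

    Hypothetical helper: it only assembles the configuration keys as a dict.
    """
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_uri(container, storage_account, path=""):
    """Build an abfss:// URI for a path inside a container."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


# In a Databricks notebook (all names below are placeholders):
# secret = dbutils.secrets.get(scope="my-scope", key="sp-client-secret")
# conf = service_principal_spark_conf("mystorageacct", "<client-id>", secret, "<tenant-id>")
# for key, value in conf.items():
#     spark.conf.set(key, value)
# df = spark.read.json(abfss_uri("raw", "mystorageacct", "events/"))
```

Keeping the client secret in a secret scope rather than in the notebook is the usual choice here; the settings have to be applied in every session or cluster that reads the files.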

Unity Catalog (UC):
Unity Catalog is a governance and metadata layer that sits on top of the data in ADLS Gen 2. It lets you organize and manage data assets through a relational, database-like interface: you can create tables, define schemas, and query data with SQL, which abstracts away the details of working with raw files directly. This gives you a more structured and user-friendly way to access and analyze data.

It's worth noting that Unity Catalog operates on top of the raw files; it does not replace or modify them. It provides an additional layer of abstraction and simplifies data discovery and querying.
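
For raw files specifically, Unity Catalog typically governs access through a storage credential plus an external location that covers the path. The sketch below only builds the SQL statements as strings. The helper names, the location name, the credential name, and the group are hypothetical, and it assumes an admin has already created the storage credential.

```python
def create_external_location_sql(name, url, storage_credential):
    """Return a CREATE EXTERNAL LOCATION statement for a cloud storage path."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' WITH (STORAGE CREDENTIAL `{storage_credential}`)"
    )


def grant_read_files_sql(location_name, principal):
    """Return a GRANT statement that lets a principal read files under the location."""
    return f"GRANT READ FILES ON EXTERNAL LOCATION `{location_name}` TO `{principal}`"


# In a Databricks notebook on a Unity Catalog-enabled cluster (placeholder names):
# landing = "abfss://raw@mystorageacct.dfs.core.windows.net/landing"
# spark.sql(create_external_location_sql("raw_landing", landing, "my_storage_credential"))
# spark.sql(grant_read_files_sql("raw_landing", "data_engineers"))
# df = spark.read.option("header", "true").csv(f"{landing}/orders/")
```

Once the grant is in place, users read the path directly and do not need to set any service principal credentials in their own session.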

The choice between using a service principal or Unity Catalog depends on your specific use case and requirements. If you need programmatic access to the raw files, and you are comfortable working with files directly, using a service principal might be more suitable. On the other hand, if you prefer a more structured and user-friendly approach, Unity Catalog can provide a convenient way to interact with the data.

Consider factors such as the complexity of your data, the required level of abstraction, and the skill set of your team when deciding between these approaches.

I hope the information helps you.

4 REPLIES 4

Loki
New Contributor III

Thanks Sally. The use case is a typical ETL project with a medallion architecture.
We want to read the raw files (CSV, JSON, TXT, Parquet), apply some transformations, and move the data to a Delta silver layer.

Could Unity Catalog be used here, since the service principal approach is a little tedious?

Another question: is the mount point functionality being deprecated now?

Anonymous
Not applicable

Hi @Loki 

Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too.

Cheers!

donkyhotes
New Contributor II

@Loki wrote:

We are using a service principal to access data from raw files such as JSON and CSV.

I saw a video suggesting that it could be done via Unity Catalog as well.

Could someone comment on this, please?


That's great! Service principals are a secure and recommended way to authenticate and access resources in Azure. With service principals, you can provide access to raw files in Azure storage or any other data source that supports service principal authentication.

To access data from raw files such as JSON or CSV, you can follow these general steps:

Create a service principal: Register an application in Azure Active Directory (AAD, now Microsoft Entra ID), which creates the service principal for it.

Assign required permissions: Assign the necessary permissions to the service principal to access the storage account or data source containing the raw files. For Azure storage accounts, you can provide appropriate access permissions (e.g., read, write, or list).

Use authentication credentials: Obtain the necessary authentication credentials for the service principal, such as its client ID, client secret, or certificate.

Implement code or scripts: Use programming languages like Python, Java, or PowerShell to write code or scripts that utilize the service principal's credentials and the appropriate SDKs or APIs to access the raw file data.

Connect to the data source: Use the service principal's credentials to authenticate and establish a connection to the data source or Azure storage account.

Access the raw files: Once the connection is established, you can access the raw files using methods provided by the SDKs or APIs. For example, you can read JSON or CSV files, parse their contents, and perform required operations.
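
The steps above can be sketched roughly as follows, outside of Spark, using the Azure SDK for Python. This is a minimal sketch that assumes the `azure-identity` and `azure-storage-file-datalake` packages are installed; the function names and every account, container, and path value are hypothetical placeholders.

```python
import csv
import io
import json


def download_raw_file(account_name, container, path, tenant_id, client_id, client_secret):
    """Download one file from ADLS Gen 2 as bytes, authenticating as a service principal."""
    # Imported lazily so the parsing helper below works without the Azure SDK installed.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(
        tenant_id=tenant_id, client_id=client_id, client_secret=client_secret
    )
    service = DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=credential,
    )
    file_client = service.get_file_system_client(container).get_file_client(path)
    return file_client.download_file().readall()


def parse_raw(data, file_format):
    """Parse downloaded bytes as JSON or as CSV with a header row."""
    text = data.decode("utf-8")
    if file_format == "json":
        return json.loads(text)
    if file_format == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {file_format}")


# Example call (placeholder values, requires network access and valid credentials):
# raw = download_raw_file("mystorageacct", "raw", "landing/orders.csv",
#                         "<tenant-id>", "<client-id>", "<client-secret>")
# rows = parse_raw(raw, "csv")
```

The download and the parsing are kept separate so that the parsing can be checked locally with sample bytes before any credentials are involved.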