07-05-2023 08:44 PM
We are using a service principal to access data in raw files such as JSON and CSV.
I saw a video suggesting that it could be done via Unity Catalog as well.
Could someone comment on this, please?
07-05-2023 09:24 PM - edited 07-05-2023 09:26 PM
@Loki wrote: We are using a service principal to access data in raw files such as JSON and CSV.
I saw a video suggesting that it could be done via Unity Catalog as well.
Could someone comment on this, please?
Raw files in Azure Data Lake Storage (ADLS) Gen 2 can be accessed using a service principal or through Unity Catalog (UC). The two methods take different approaches to accessing the data.
Service Principal:
Using a service principal involves creating an Azure Active Directory (AAD) application and assigning it the necessary permissions to access the ADLS Gen 2 storage account. The service principal acts as a service account, allowing programmatic access to the raw files. This method is commonly used for automation and integrating with other services.
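As a rough sketch of what the service principal route looks like in a notebook, the helper below builds the ABFS OAuth settings that Spark needs for one storage account. The function names, the storage account name, and the secret scope and key names in the usage comment are illustrative assumptions, not values from this thread.

```python
def service_principal_spark_conf(storage_account, client_id, client_secret, tenant_id):
    """Build the ABFS OAuth settings for one ADLS Gen2 storage account.

    All four arguments are placeholders supplied by the caller.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def apply_conf(spark, conf):
    """Apply each setting to the current Spark session."""
    for key, value in conf.items():
        spark.conf.set(key, value)


# Hypothetical usage inside a Databricks notebook, where `spark` and `dbutils`
# are predefined; the scope and key names are assumptions:
# conf = service_principal_spark_conf(
#     "mystorageaccount",
#     dbutils.secrets.get(scope="etl", key="sp-client-id"),
#     dbutils.secrets.get(scope="etl", key="sp-client-secret"),
#     dbutils.secrets.get(scope="etl", key="sp-tenant-id"),
# )
# apply_conf(spark, conf)
# df = spark.read.json("abfss://raw@mystorageaccount.dfs.core.windows.net/landing/")
```

Keeping the client secret in a secret scope rather than in the notebook is the usual choice here, and it is part of why this setup can feel tedious to repeat per workspace or per job.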
Unity Catalog (UC):
Unity Catalog is a metadata layer that sits on top of the raw files in ADLS Gen 2. It provides a way to organize and manage data assets using a relational database-like interface. Unity Catalog allows you to create tables, define schemas, and query data using SQL-like syntax, abstracting away the complexities of dealing with raw files directly. It provides a more structured and user-friendly way to access and analyze data.
It's worth noting that Unity Catalog operates on top of the raw files; it does not replace or modify them. It provides an additional layer of abstraction and simplifies data discovery and querying.
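For raw files specifically, Unity Catalog access is typically granted through an external location that points at the storage path. The sketch below only generates the SQL statements; it assumes a storage credential has already been created by an admin, and every name in it (`raw_landing`, `etl_credential`, `etl_users`, the `abfss://` URL) is hypothetical.

```python
def external_location_sql(name, url, credential, principal):
    """Return the SQL statements that register a storage path in Unity Catalog
    and grant a principal permission to read files under it."""
    return [
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' WITH (STORAGE CREDENTIAL `{credential}`)",
        f"GRANT READ FILES ON EXTERNAL LOCATION `{name}` TO `{principal}`",
    ]


# Hypothetical usage in a notebook on a Unity Catalog enabled cluster,
# where `spark` is predefined:
# for stmt in external_location_sql(
#     "raw_landing",
#     "abfss://raw@mystorageaccount.dfs.core.windows.net/landing",
#     "etl_credential",
#     "etl_users",
# ):
#     spark.sql(stmt)
# df = spark.read.json("abfss://raw@mystorageaccount.dfs.core.windows.net/landing/")
```

Once the location and grant exist, notebooks can read the path directly without setting credentials in the Spark session, which is the main practical difference from the service principal setup.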
The choice between using a service principal or Unity Catalog depends on your specific use case and requirements. If you need programmatic access to the raw files, and you are comfortable working with files directly, using a service principal might be more suitable. On the other hand, if you prefer a more structured and user-friendly approach, Unity Catalog can provide a convenient way to interact with the data.
Consider factors such as the complexity of your data, the required level of abstraction, and the skill set of your team when deciding between these approaches.
I hope the information helps you.
07-06-2023 12:08 AM
Thanks Sally. The use case is a typical ETL project with a medallion architecture.
We want to read the raw files (CSV, JSON, TXT, Parquet), apply some transformations, and move the results to the Delta silver layer.
Could Unity Catalog be used here, since the service principal approach is a little tedious?
Another question: is the mount point functionality being deprecated now?
07-12-2023 02:50 AM
Hi @Loki
Hope you are well. Just wanted to check whether you were able to find an answer to your question. If so, would you like to mark an answer as best? It would be really helpful for the other members too.
Cheers!
08-18-2023 01:55 AM
@Loki wrote: We are using a service principal to access data in raw files such as JSON and CSV.
I saw a video suggesting that it could be done via Unity Catalog as well.
Could someone comment on this, please?
That's great! Service principals are a secure and recommended way to authenticate and access resources in Azure. With service principals, you can provide access to raw files in Azure storage or any other data source that supports service principal authentication.
To access data from raw files such as JSON or CSV, you can follow these general steps:
1. Create a service principal: Register an application in Azure Active Directory (AAD) to create the service principal.
2. Assign required permissions: Grant the service principal access to the storage account or data source containing the raw files. For Azure storage accounts, assign only the access it needs (e.g., read, write, or list).
3. Use authentication credentials: Obtain the authentication credentials for the service principal, such as its client ID together with a client secret or certificate.
4. Implement code or scripts: Use a language such as Python, Java, or PowerShell, together with the appropriate SDKs or APIs, to write code that uses the service principal's credentials to access the raw file data.
5. Connect to the data source: Use the service principal's credentials to authenticate and establish a connection to the data source or Azure storage account.
6. Access the raw files: Once the connection is established, you can access the raw files using methods provided by the SDKs or APIs. For example, you can read JSON or CSV files, parse their contents, and perform the required operations.
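The steps above can be sketched in Python outside of Spark, assuming the `azure-identity` and `azure-storage-file-datalake` packages are installed. The function names, account, container, and path values are placeholders for illustration, and the parsing helper only covers JSON and CSV.

```python
import csv
import io
import json


def download_raw_file(account_name, container, path, tenant_id, client_id, client_secret):
    """Authenticate as the service principal and return the file's bytes."""
    # Imported here so the parsing helper below can be used without the Azure SDK.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(
        tenant_id=tenant_id, client_id=client_id, client_secret=client_secret
    )
    service = DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=credential,
    )
    file_client = service.get_file_system_client(container).get_file_client(path)
    return file_client.download_file().readall()


def parse_raw_file(path, payload):
    """Parse downloaded bytes based on the file extension (JSON or CSV only)."""
    text = payload.decode("utf-8")
    lowered = path.lower()
    if lowered.endswith(".json"):
        return json.loads(text)
    if lowered.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"Unsupported file type: {path}")


# Hypothetical usage; every argument value is a placeholder:
# payload = download_raw_file(
#     "mystorageaccount", "raw", "landing/orders.csv",
#     tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<client-secret>",
# )
# rows = parse_raw_file("landing/orders.csv", payload)
```

This approach suits small files and scripts that run outside a cluster. For an ETL pipeline on Databricks, reading the same paths with Spark is usually the better fit.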