
How to read data from Azure Storage

bchaubey
Contributor II

Hi Team,

May I know how to read Azure Storage data in Databricks through Python?

1 ACCEPTED SOLUTION

Kaniz_Fatma
Community Manager

Hi @Bhagwan Chaubey, once you've uploaded your files to your blob container:

Step 1: Get the credentials Databricks needs to connect to your blob container

In the Azure portal, navigate to All resources, select your blob storage account, and under Settings select Access keys. Once there, copy the key under key1 to a local notepad.

Step 2: Configure Databricks to read the file

To start reading the data, first configure your Spark session to use the credentials for your blob container. This can be done with the spark.conf.set command.

# Storage account name and access key copied from the Azure portal
storage_account_name = 'nameofyourstorageaccount'
storage_account_access_key = 'thekeyfortheblobcontainer'

# Hand the key to Spark so it can authenticate against the blob endpoint
spark.conf.set('fs.azure.account.key.' + storage_account_name + '.blob.core.windows.net', storage_account_access_key)
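
If you'd rather not paste the key into the notebook in plain text, you can keep it in a Databricks secret scope and fetch it at runtime with dbutils.secrets.get (the same utility used in the mount example further down this thread). A minimal sketch; the scope and key names "blob-scope" and "storage-key" are hypothetical placeholders:

# Sketch: same configuration, but the key comes from a secret scope.
# "blob-scope" and "storage-key" are placeholder names - use your own.
storage_account_access_key = dbutils.secrets.get(scope="blob-scope", key="storage-key")
spark.conf.set('fs.azure.account.key.' + storage_account_name + '.blob.core.windows.net', storage_account_access_key)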

Once done, build the file path in the blob container and read the file as a Spark DataFrame.

# Build the wasbs:// URI for the file inside the container
blob_container = 'yourblobcontainername'
filePath = "wasbs://" + blob_container + "@" + storage_account_name + ".blob.core.windows.net/Sales/SalesFile.csv"

# Read the CSV into a Spark DataFrame, inferring the schema and treating the first row as a header
salesDf = spark.read.format("csv").load(filePath, inferSchema=True, header=True)

And congrats, we are done.

You can use the display command to have a sneak peek at your data.
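
For example, with the salesDf DataFrame created above:

# Render the DataFrame as an interactive table in the notebook
display(salesDf)

# Or print the first five rows as plain text
salesDf.show(5)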

Below is a snapshot of my code:

[Screenshot 2022-01-06 at 5.24.47 PM]


18 REPLIES

Kaniz_Fatma
Community Manager

Hi @bchaubey! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, I will get back to you soon. Thanks.

Kaniz_Fatma
Community Manager

Hi @Bhagwan Chaubey, you can access your files through Python using the code below.

# Mount a Blob storage container (or a folder inside one) to DBFS
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})

# Read the CSV data from the mount
df = spark.read.csv("dbfs:/mnt/%s/...." % <name-of-your-mount>)
display(df)
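
As a quick sanity check, you can list the existing mount points and the files under one of them; dbutils.fs.mounts() and dbutils.fs.ls() are standard Databricks utilities:

# List every mount point in the workspace along with its source
display(dbutils.fs.mounts())

# List the files under a specific mount (substitute your own mount name)
display(dbutils.fs.ls("/mnt/<mount-name>"))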

bchaubey
Contributor II

@Kaniz Fatma, how can I find the value of mount_point = "/mnt/<mount-name>"?

Hi @Bhagwan Chaubey,

<mount-name> is the DBFS path at which the Blob storage container, or the folder inside it specified in source, will be mounted.

Have you created any folders inside your blob container? If not, the path to your file will simply be "dbfs:/mnt/dataset.csv".

As you can see in the screenshot below:

If I want to read my country_classification.csv file, the path in my case will be "dbfs:/mnt/country_classification.csv", as I've not created any folder or directory inside my blob.

[Screenshot 2022-01-05 at 6.51.11 PM]

Adding the snap of my code here too:

[Screenshot 2022-01-05 at 6.55.11 PM]

Please do let me know if you have any more doubts.

bchaubey
Contributor II

%python

df = spark.read.csv("dbfs:/mnt/country_classification.csv")
display(df)

May I know how I can find dbfs:/mnt?

Hi @Bhagwan Chaubey, can you please browse this path in Microsoft Azure: "storage_account/containers/directory_in_which_you've_uploaded_your_dataset"? That itself will be your mount point.

bchaubey
Contributor II

Hi @Kaniz Fatma, I am facing an issue while reading the data. Please see the attachment.

Hi @Bhagwan Chaubey, can you please enter the correct scope and key names in the above code?

bchaubey
Contributor II

@Kaniz Fatma, I have added the correct key.

Hi @Bhagwan Chaubey, the scope name or one of the credentials might still be wrong; please recheck all the values. I've also provided another way to solve your query. Please try it and let me know if it works.
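
Before retrying, one quick way to verify the scope and key names is to list them with the standard dbutils.secrets utilities (the secret values themselves stay redacted):

# List the secret scopes visible to this workspace
print(dbutils.secrets.listScopes())

# List the key names inside a given scope (substitute your own scope name)
print(dbutils.secrets.list("<scope-name>"))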

Kaniz_Fatma
Community Manager

Hi @Bhagwan Chaubey, please try the approach in the accepted solution above: configure the Spark session with the storage account key via spark.conf.set and read the file directly from its wasbs:// path.

Hi @Bhagwan Chaubey, does this work for you? Do you have any further doubts? Were you able to execute the above commands and get the desired results? Please do let us know if you need help.

@Kaniz Fatma, I am using your code and there is no error, but the data is still not showing.

Geoff123
New Contributor III

Kaniz,

I kept getting "org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges" with the same code. Do you know why?

Thanks!
