How to read data from Azure Storage

bchaubey
Contributor II

Hi Team,

May I know how to read Azure Storage data in Databricks through Python?

1 ACCEPTED SOLUTION

Kaniz
Community Manager
Hi @Bhagwan Chaubey​, once you've uploaded your files to your blob container:

Step 1: Get the credentials Databricks needs to connect to your blob container

In the Azure portal, navigate to All resources, select your blob storage account, and under Settings select Access keys. Copy the key under key1 to a local notepad.

Step 2: Configure Databricks to read the file

To start reading the data, first configure your Spark session to use the credentials for your blob container. This can be done through the spark.conf.set command.

storage_account_name = 'nameofyourstorageaccount'
storage_account_access_key = 'thekeyfortheblobcontainer'
spark.conf.set('fs.azure.account.key.' + storage_account_name + '.blob.core.windows.net', storage_account_access_key)

Once done, build the file path in the blob container and read the file as a Spark DataFrame.

blob_container = 'yourblobcontainername'
filePath = "wasbs://" + blob_container + "@" + storage_account_name + ".blob.core.windows.net/Sales/SalesFile.csv"
salesDf = spark.read.format("csv").load(filePath, inferSchema = True, header = True)

And congrats, we are done.

You can use the display command to take a sneak peek at your data.
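For example (a minimal sketch, reusing the salesDf DataFrame created above):

# Render the first rows as an interactive table in a notebook cell
display(salesDf)

# Plain-text alternative that also works outside notebooks
salesDf.show(5, truncate = False)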

Below is a snapshot of my code.

[Screenshot of the notebook code]



Kaniz
Community Manager

Hi @bchaubey! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise I will get back to you soon. Thanks.

Kaniz
Community Manager

Hi @Bhagwan Chaubey​, you can access your files using Python through the code below.

# Mount a Blob storage container (or a folder inside a container).
# <conf-key> is typically "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
# and the secret stored under <scope-name>/<key-name> holds the storage account access key.
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
 
# Read the CSV data from the mount
df = spark.read.csv("dbfs:/mnt/%s/...." % <name-of-your-mount>)
display(df)

bchaubey
Contributor II

@Kaniz Fatma​, how can I find the value of mount_point = "/mnt/<mount-name>"?

Kaniz
Community Manager

Hi @Bhagwan Chaubey​ ,

<mount-name> is a DBFS path representing where the Blob storage container (or a folder inside the container, specified in source) will be mounted in DBFS.

Have you created any folders inside your blob container? If not, your file sits directly under the mount, so its path is simply "dbfs:/mnt/dataset.csv".

For example, if I want to read my country_classification.csv file, the path in my case is "dbfs:/mnt/country_classification.csv", as I have not created any folder or directory inside my blob container.

[Screenshots of the mounted file and of the notebook code]

Please do let me know if you have any more doubts.
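If you are not sure what is mounted where, a quick check (a minimal sketch; run it in a Python notebook cell) is to list the active mounts and the files under one of them:

# Print every active DBFS mount point and the source it maps to
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)

# List the files visible under a given mount point
display(dbutils.fs.ls("/mnt/<mount-name>"))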

bchaubey
Contributor II

%python
df = spark.read.csv("dbfs:/mnt/country_classification.csv")
display(df)

May I know how I can find dbfs:/mnt?

Kaniz
Community Manager

Hi @Bhagwan Chaubey​, can you please browse this path in Microsoft Azure: storage_account/containers/directory_in_which_you've_uploaded_your_dataset? That path itself will be your mount point.
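As a sketch of how that portal path lines up with the mount (all names here are hypothetical placeholders):

# Portal path:  mystorageaccount / containers / mycontainer / sales
source = "wasbs://mycontainer@mystorageaccount.blob.core.windows.net/sales"  # what you pass to dbutils.fs.mount
mount_point = "/mnt/sales"                                                   # where it appears in DBFS
file_path = "dbfs:/mnt/sales/SalesFile.csv"                                  # what you pass to spark.read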

bchaubey
Contributor II

Hi @Kaniz Fatma​, I am facing an issue while reading the data. Please see the attachment.

Kaniz
Community Manager

Hi @Bhagwan Chaubey​, can you please enter the correct scope and key names in the above code?
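If you want to verify the names before rerunning (a minimal sketch; "<scope-name>" is a placeholder), you can list what the workspace actually has:

# List all secret scopes visible to this workspace
print(dbutils.secrets.listScopes())

# List the key names inside one scope (the secret values themselves stay redacted)
print(dbutils.secrets.list("<scope-name>"))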

bchaubey
Contributor II

@Kaniz Fatma​, I have added the correct key.

Kaniz
Community Manager

Hi @Bhagwan Chaubey​, there might be a mismatch in the scope name or in the credentials, so you need to recheck all the values. I've also provided another way to solve your query in the accepted solution above. Please try it and let me know if it works.


Kaniz
Community Manager

Hi @Bhagwan Chaubey​, does this work for you? Were you able to execute the above commands and get the desired results? Please do let us know if you need help.

bchaubey
Contributor II

@Kaniz Fatma​, I am using your code and there is no error, but the data is still not showing.

Geoff123
New Contributor III

Kaniz,

I kept getting "org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges" with the same code. Do you know why?

Thanks!
