Data Engineering

Auto Loader error using Unity Catalog

garf
New Contributor III

Hello!
I'm new to Databricks and I'm exploring some of its features.

I've successfully configured a workspace with Unity Catalog, one external storage location (ADLS Gen2), and the associated storage credential. I granted all privileges to all account users and ran 'Test connection' to confirm that everything works.


When I ran the following commands:

input_file_path = f"abfss://<my_container>@<my_storage_account>.dfs.core.windows.net<my_path>"
schema = spark.read.parquet(input_file_path).schema
 
I was able to read my parquet file and obtain the schema.
 
When I then tried the following code to test Auto Loader:
 
df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "parquet") \
    .option("cloudFiles.inferColumnTypes", "true") \
    .option("cloudFiles.schemaLocation", <schema_location_path>) \
    .load(input_file_path)
 
I received the following error:
Failure to initialize configuration for storage account <my_storage_account>.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid...
 
How is it possible that I can read my file with the standard read() function but not with load()?
1 ACCEPTED SOLUTION

Accepted Solutions

garf
New Contributor III

Hello @Mo,

Thank you for the quick feedback, and sorry for the late reply.

The issue was related to the Azure container behind 'schema_location_path'.

I forgot to register that container as an external location, so the script could not access it.

I added an external location for it, and that fixed the problem.

Thank you!
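
For readers hitting the same error, here is a minimal sketch of the fix described above. It assumes a simplified model in which a path is accessible when it equals or sits under the URL of a registered external location. The container names, the storage account `examplestorage`, the location name `autoloader_schemas`, and the credential name `my_storage_credential` are all hypothetical placeholders, not values from this thread.

```python
def covered_by(path: str, location_urls: list[str]) -> bool:
    """Return True if path equals or sits under any external location URL.

    Simplified prefix check for illustration only; it does not consult
    Unity Catalog or evaluate privileges.
    """
    normalized = path.rstrip("/") + "/"
    return any(normalized.startswith(url.rstrip("/") + "/") for url in location_urls)


def external_location_sql(name: str, url: str, credential: str) -> str:
    """Build a CREATE EXTERNAL LOCATION statement for the given URL."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


# Hypothetical paths: the data container is registered, the schema container is not.
registered = ["abfss://data@examplestorage.dfs.core.windows.net/"]
input_file_path = "abfss://data@examplestorage.dfs.core.windows.net/landing/events"
schema_location_path = "abfss://schemas@examplestorage.dfs.core.windows.net/events"

input_ok = covered_by(input_file_path, registered)
schema_ok_before = covered_by(schema_location_path, registered)

# Register the schema container as well (hypothetical names).
schema_container_url = "abfss://schemas@examplestorage.dfs.core.windows.net/"
stmt = external_location_sql("autoloader_schemas", schema_container_url, "my_storage_credential")
registered.append(schema_container_url)
schema_ok_after = covered_by(schema_location_path, registered)

print(stmt)
# In a Databricks notebook with sufficient privileges: spark.sql(stmt)
```

This also suggests why spark.read succeeded while the stream failed: the batch read only touches the input path, whereas Auto Loader additionally reads and writes cloudFiles.schemaLocation, so both paths need to be covered.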


2 REPLIES

Mo
Databricks Employee

hey @garf 

could you please try to create an external volume using your external location and then use the file path in the volume as the input file path?
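
A rough sketch of what this suggestion could look like, assuming an external location already covers the URL. The catalog, schema, and volume names (`main`, `raw`, `landing`) and the storage URL are hypothetical placeholders.

```python
def external_volume_sql(catalog: str, schema: str, volume: str, url: str) -> str:
    """Build a CREATE EXTERNAL VOLUME statement pointing at a storage URL."""
    return (
        f"CREATE EXTERNAL VOLUME IF NOT EXISTS `{catalog}`.`{schema}`.`{volume}` "
        f"LOCATION '{url}'"
    )


def volume_path(catalog: str, schema: str, volume: str, relative_path: str = "") -> str:
    """Return the /Volumes/... path for a file or folder inside a volume."""
    base = f"/Volumes/{catalog}/{schema}/{volume}"
    rel = relative_path.strip("/")
    return f"{base}/{rel}" if rel else base


# Hypothetical names and URL, for illustration only.
volume_stmt = external_volume_sql(
    "main", "raw", "landing",
    "abfss://data@examplestorage.dfs.core.windows.net/landing",
)
input_file_path = volume_path("main", "raw", "landing", "events/")

print(volume_stmt)
print(input_file_path)
# In a Databricks notebook: spark.sql(volume_stmt), then pass input_file_path to .load(...)
```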
