Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

KeyProviderException when trying to create external table on databricks

GS2312
New Contributor II

Hi There,

I have been trying to create an external table on Azure Databricks with the statement below.

df.write \
    .partitionBy("year", "month", "day") \
    .format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat") \
    .option("path", sourcepath) \
    .mode("overwrite") \
    .saveAsTable("test.table_test")

sourcepath above is a path on an ADLS storage account.

The code works and the table is created pointing to the ADLS storage account container, but I get the following exception:

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.KeyProviderException Failure to initialize configuration)

I also tried setting these configs at the cluster level, as suggested in

https://community.databricks.com/s/question/0D58Y00009EV8pCSAT/external-table-issue-format-in-databr...

I configured the cluster-level Spark configs as below:

fs.azure.account.oauth2.client.endpoint.<STORAGE ACCOUNT>.dfs.core.windows.net https://login.microsoftonline.com/<key>/oauth2/token
fs.azure.account.auth.type.<STORAGE ACCOUNT>.dfs.core.windows.net OAuth
fs.azure.account.oauth2.client.id.<STORAGE ACCOUNT>.dfs.core.windows.net ClientId
fs.azure.account.oauth.provider.type.<STORAGE ACCOUNT>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.secret.<STORAGE ACCOUNT>.dfs.core.windows.net {{secrets/scope/Secret_Key_Name}}

But it still does not work and throws the same exception.
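For comparison, the session-scoped equivalent of those settings from a notebook would be along these lines (a sketch only; the storage account, tenant ID, and secret scope/key names are placeholders):

# Sketch: session-scoped OAuth configuration for ABFS (placeholder values),
# equivalent to the cluster-level Spark config above.
storage_account = "<STORAGE ACCOUNT>"  # placeholder
tenant_id = "<key>"                    # placeholder tenant/directory ID
client_secret = dbutils.secrets.get(scope="scope", key="Secret_Key_Name")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", "ClientId")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")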

Can anyone who has encountered and resolved this issue guide me on whether I am missing any configs above, and what could be causing the exception, given that the data is written to ADLS and the table is created pointing to that data?

Thanks

6 REPLIES

etsyal1e2r3
Honored Contributor

You should just set up your external location with the GUI tool in the Data tab (bottom left, where it says External Locations). You'll have to set up a service principal and add a Databricks access connector in your Azure resource group. This is the easiest way, since you don't have to change the Spark config.
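If you'd rather script it than click through the GUI, the SQL equivalent would be roughly this (a sketch; it assumes a Unity Catalog storage credential named my_credential was already created for the access connector, and all names are placeholders):

# Sketch: register the external location in SQL instead of the GUI.
# Assumes a storage credential "my_credential" already exists for the
# Databricks access connector (placeholder names throughout).
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS my_external_location
    URL 'abfss://<container>@<STORAGE ACCOUNT>.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL my_credential)
""")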

GS2312
New Contributor II

Hi,

Thank you for taking the time to answer my query.

I tried setting up an external location under the Data tab in the Databricks workspace, and also added a Databricks access connector to the resource group used by the workspace.

But the exception remains the same.

Even after I specified the key for the storage account in the cluster-level advanced options.

.save() works perfectly without any issues, but .saveAsTable() gives the exception.
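In case it helps anyone reproduce this, the two calls differ only in the metastore step, and registering a table over the already-written files is a possible workaround (a sketch; table and path names are placeholders):

# Works: writes Parquet files to ADLS, no metastore involved.
df.write.partitionBy("year", "month", "day").mode("overwrite").parquet(sourcepath)

# Fails with KeyProviderException: same write, but saveAsTable() also goes
# through the Hive metastore (per the HiveException in the stack trace above).
# df.write.partitionBy("year", "month", "day").option("path", sourcepath) \
#     .mode("overwrite").saveAsTable("test.table_test")

# Possible workaround (sketch): register an external table over the files
# that .save() already wrote, instead of calling .saveAsTable().
spark.sql(f"CREATE TABLE IF NOT EXISTS test.table_test USING PARQUET LOCATION '{sourcepath}'")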

Thank you

etsyal1e2r3
Honored Contributor

Well, the error is talking about the Hive metastore, so that means you're still not connecting to your external location.

Can you create a table on the external location with this (assuming you have a JSON file in a blob folder location)?

# Set variables (placeholder values)
catalog_name = "test_catalog"
schema_name = "test_schema"
catalog_container = "<catalog container string>"
storage_account = "<storage account string>"
blob_path = f"abfss://{catalog_container}@{storage_account}.dfs.core.windows.net/<folder path>"

# Create catalog if it doesn't exist
spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog_name} MANAGED LOCATION 'abfss://{catalog_container}@{storage_account}.dfs.core.windows.net/'")

# Create schema if it doesn't exist
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog_name}.{schema_name}")

# Read files into a DataFrame
df = spark.read.format("json").load(blob_path)
df.display()
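If that read works, the original write should then be able to target the new catalog, something like this (a sketch; the table name is a placeholder):

# Sketch: write the DataFrame as a table under the new catalog and schema
# (hypothetical table name).
df.write.partitionBy("year", "month", "day") \
    .mode("overwrite") \
    .saveAsTable(f"{catalog_name}.{schema_name}.table_test")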

emmafrost1
New Contributor II

Setting up the external location in the Data tab is a must; only then can you proceed.

emmafrost1
New Contributor II

The reason the special characters are converted to '?????' is that the Hive Metastore stores data in a binary format. When you create a view, the Hive Metastore converts the view's data to that binary format, and the conversion strips out any non-ASCII characters.

To preserve the special characters in the view representation, you can use the following workaround:

  1. Create a new column in the view that contains the encoded version of the string attribute.
  2. Use the new column in the where clause of the view.

The following code shows how to do this:

CREATE VIEW my_view AS
SELECT
  *,
  encode(string_attribute, 'utf-8') AS encoded_string_attribute
FROM my_table;
 
SELECT
  *
FROM my_view
WHERE
  encoded_string_attribute != 'シミュレータに接続されていません';

This code creates a new view called my_view that contains the original data from my_table plus a new column called encoded_string_attribute. The encoded_string_attribute column contains the encoded version of the string_attribute column, and the where clause uses it to filter out rows where string_attribute equals 'シミュレータに接続されていません'.

This workaround will preserve the special characters in the view representation. However, it will also make the view slightly larger, because the encoded version of the string_attribute column is slightly larger than the original column.

If you want to avoid the performance penalty of storing the encoded version of the string_attribute column, you can use the following alternative workaround:

  1. Create a new column in the view that contains the escaped version of the string attribute.
  2. Use the new column in the where clause of the view.

The following code shows how to do this:

CREATE VIEW my_view AS
SELECT
  *,
  escape(string_attribute) AS escaped_string_attribute
FROM my_table;
 
SELECT
  *
FROM my_view
WHERE
  escaped_string_attribute != 'シミュレータに接続されていません';

This code creates a new view called my_view that contains the original data from my_table plus a new column called escaped_string_attribute. The escaped_string_attribute column contains the escaped version of the string_attribute column, and the where clause uses it to filter out rows where string_attribute equals 'シミュレータに接続されていません'.

This workaround will preserve the special characters in the view representation without making the view any larger. However, it will make the where clause of the view slightly slower, because the Hive Metastore has to do some extra work to escape the special characters.

Which workaround you choose depends on your specific needs: if you need the view to be as fast as possible, use the first workaround; if you'd rather avoid the extra storage, use the second.

Anonymous
Not applicable

Hi @Gaurishankar Sakhare

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
