09-20-2022 11:26 PM
I am new to Databricks.
I am trying to create an external table in Databricks in the format below:
CREATE EXTERNAL TABLE Salesforce.Account
(
Id string ,
IsDeleted bigint,
Name string ,
Type string ,
RecordTypeId string ,
ParentId string ,
ShippingStreet string ,
ShippingCity string ,
ShippingState string ,
ShippingPostalCode string ,
ShippingCountry string ,
ShippingStateCode string ,
ShippingCountryCode string ,
Phone string ,
Fax string ,
AccountNumber string ,
Sic string ,
Industry string ,
AnnualRevenue float,
NumberOfEmployees float,
Ownership string ,
Description string ,
Rating string ,
CurrencyIsoCode string ,
OwnerId string ,
CreatedDate bigint,
CreatedById string ,
IsPartner bigint,
AccountSource string ,
SicDesc string ,
IsGlobalKeyAccount__c bigint,
Rating__c string ,
AccountNumberAuto__c string ,
AccountStatus__c string ,
BUID__c string ,
CompanyName__c string ,
CreditLimit__c float,
CreditOnHold__c bigint,
CustomerClassification__c string ,
DUNSNumber__c string ,
DepartmentLabel__c string ,
DepartmentName__c string ,
DepartmentType__c string ,
DiscountGroup__c string ,
DoNotAllowBulkEmails__c bigint,
Email__c string ,
EnglishCompanyName__c string ,
Interest__c string ,
Language__c string ,
LastCheckedBy__c string ,
LastCheckedOn__c float,
MarketOrganization__c string ,
OtherPhone__c string ,
PaymentTerms__c string ,
Price_Book__c string ,
RecordType__c string ,
RelatedToGlobalKeyAccount__c bigint,
RequestDeletion__c bigint,
RequestEdit__c bigint,
Segment__c string ,
ShippingCountry__c string ,
Subsegment1__c string ,
Subsegment2__c string ,
TermsOfDelivery__c string ,
Status__c string ,
SynchronizeBillingAddress__c bigint,
Target_Account__c bigint,
TravelZone__c string ,
DynamicsAutoNumber__c string ,
Goal__c string ,
OriginOfData__c string ,
CustomDUNS__c string ,
TAP_Description__c string
)
STORED AS PARQUET
LOCATION 'abfss://Storagename@containername.dfs.core.windows.net/Bronze/Salesforce/Account/*'
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.KeyProviderException Failure to initialize configuration)
com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.KeyProviderException Failure to initialize configuration)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:163)
at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:115)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:153)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:377)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:363)
at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:152)
at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:335)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:102)
at org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.createTable(SessionCatalog.scala:875)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.createTableInternal(ManagedCatalogSessionCatalog.scala:728)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.createTable(ManagedCatalogSessionCatalog.scala:689)
at com.databricks.sql.DatabricksSessionCatalog.createTable(DatabricksSessionCatalog.scala:205)
at org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:186)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:202)
09-21-2022 04:43 AM
Since it's a parquet file, you should be able to just do:
CREATE TABLE name
USING PARQUET
OPTIONS (path 'tablepath')
You don't need to provide the schema. You might be having trouble with STORED AS; I can't say I've seen that keyword used this way before. It might also be a permissions error on the underlying directory. The error message is not helpful here.
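Applied to the original post, that would look something like the sketch below (placeholder names copied from the question; note that an abfss URI takes the form abfss://<container>@<storage-account>.dfs.core.windows.net/..., so the container and storage account in the original path appear to be swapped):
CREATE TABLE Salesforce.Account
USING PARQUET
OPTIONS (path 'abfss://containername@storagename.dfs.core.windows.net/Bronze/Salesforce/Account/');
The schema is inferred from the parquet files themselves, and the path should point at the directory; no trailing /* wildcard is needed.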
09-23-2022 02:56 PM
@Rohit Kulkarni - From the error stack trace, this error specifically means Spark could not find a storage account key in the configuration and fell back to the default key provider, which failed to initialize.
Could you please let us know whether you are setting the Spark configs for ADLS Gen2 at the cluster level or at the notebook level? If you are setting them at the notebook level, please set them at the cluster level instead so they are available to the metastore client.
12-12-2022 08:29 AM
Hi @Shanmugavel Chandrakasu, I have the same issue. Could you please write out the Spark configs to set at the cluster level?
12-13-2022 02:24 PM
Here are the instructions for connecting to a storage account: https://learn.microsoft.com/en-us/azure/databricks/external-data/azure-storage
Setting it at the cluster level: https://docs.databricks.com/clusters/configure.html#spark-configuration
Add this line to the cluster's Spark config (one key and value per line, separated by a space):
spark.hadoop.fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-key>
10-02-2022 11:17 PM
Hi @Rohit Kulkarni
Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
12-16-2022 10:16 AM
Databricks is awesome if you have SQL knowledge. I just came across a problem in my project and Databricks helped me a lot, for example using a low watermark to hold the load success date.
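For readers unfamiliar with that pattern, here is a minimal sketch of the low-watermark idea in SQL (the etl_watermark table and all column and source names are hypothetical, not from this thread):
-- hypothetical control table holding the last successful load date per source
CREATE TABLE IF NOT EXISTS etl_watermark (source_name STRING, last_success_date TIMESTAMP);
-- incremental load: only pick up rows newer than the stored watermark
SELECT *
FROM salesforce_account_raw
WHERE modified_date > (SELECT last_success_date FROM etl_watermark WHERE source_name = 'salesforce_account');
-- after a successful load, advance the watermark
UPDATE etl_watermark SET last_success_date = current_timestamp() WHERE source_name = 'salesforce_account';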