External table format issue in Databricks

RohitKulkarni
Contributor

I am new to Databricks.

I am trying to create an external table in Databricks with the statement below:

CREATE EXTERNAL TABLE Salesforce.Account
(
  Id string,
  IsDeleted bigint,
  Name string,
  Type string,
  RecordTypeId string,
  ParentId string,
  ShippingStreet string,
  ShippingCity string,
  ShippingState string,
  ShippingPostalCode string,
  ShippingCountry string,
  ShippingStateCode string,
  ShippingCountryCode string,
  Phone string,
  Fax string,
  AccountNumber string,
  Sic string,
  Industry string,
  AnnualRevenue float,
  NumberOfEmployees float,
  Ownership string,
  Description string,
  Rating string,
  CurrencyIsoCode string,
  OwnerId string,
  CreatedDate bigint,
  CreatedById string,
  IsPartner bigint,
  AccountSource string,
  SicDesc string,
  IsGlobalKeyAccount__c bigint,
  Rating__c string,
  AccountNumberAuto__c string,
  AccountStatus__c string,
  BUID__c string,
  CompanyName__c string,
  CreditLimit__c float,
  CreditOnHold__c bigint,
  CustomerClassification__c string,
  DUNSNumber__c string,
  DepartmentLabel__c string,
  DepartmentName__c string,
  DepartmentType__c string,
  DiscountGroup__c string,
  DoNotAllowBulkEmails__c bigint,
  Email__c string,
  EnglishCompanyName__c string,
  Interest__c string,
  Language__c string,
  LastCheckedBy__c string,
  LastCheckedOn__c float,
  MarketOrganization__c string,
  OtherPhone__c string,
  PaymentTerms__c string,
  Price_Book__c string,
  RecordType__c string,
  RelatedToGlobalKeyAccount__c bigint,
  RequestDeletion__c bigint,
  RequestEdit__c bigint,
  Segment__c string,
  ShippingCountry__c string,
  Subsegment1__c string,
  Subsegment2__c string,
  TermsOfDelivery__c string,
  Status__c string,
  SynchronizeBillingAddress__c bigint,
  Target_Account__c bigint,
  TravelZone__c string,
  DynamicsAutoNumber__c string,
  Goal__c string,
  OriginOfData__c string,
  CustomDUNS__c string,
  TAP_Description__c string
)
STORED AS PARQUET
LOCATION 'abfss://Storagename@containername.dfs.core.windows.net/Bronze/Salesforce/Account/*'

Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.KeyProviderException Failure to initialize configuration)

com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.KeyProviderException Failure to initialize configuration)
    at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:163)
    at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:115)
    at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:153)
    at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:377)
    at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:363)
    at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:152)
    at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:335)
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:102)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.createTable(SessionCatalog.scala:875)
    at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.createTableInternal(ManagedCatalogSessionCatalog.scala:728)
    at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.createTable(ManagedCatalogSessionCatalog.scala:689)
    at com.databricks.sql.DatabricksSessionCatalog.createTable(DatabricksSessionCatalog.scala:205)
    at org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:186)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:202)

6 REPLIES

Anonymous
Not applicable

Since it's a Parquet file, you should be able to just do

CREATE TABLE name
USING PARQUET
OPTIONS (path "tablepath")

You don't need to provide the schema. You might be having trouble with STORED; I can't say I've seen that keyword before. It might also be a permission error on the underlying directory. The error message is not helpful here.
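
Applied to the asker's table, a minimal notebook sketch of this suggestion might look like the following. The account and container names are placeholders (note the abfss URI format is abfss://&lt;container&gt;@&lt;storage-account&gt;.dfs.core.windows.net/&lt;path&gt;, and the path should point at the directory, without a trailing wildcard):

# Minimal sketch, run from a Databricks notebook where `spark` is predefined.
# The container/account names are placeholders, not verified values.
path = "abfss://<container>@<storage-account>.dfs.core.windows.net/Bronze/Salesforce/Account"

spark.sql(f"""
    CREATE TABLE Salesforce.Account
    USING PARQUET
    OPTIONS (path "{path}")
""")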

shan_chandra
Esteemed Contributor

@Rohit Kulkarni - From the error stack trace, this error specifically means it could not find any storage account keys and fell back to the default storage account keys.

Could you please let us know whether you are setting the Spark configs for ADLS Gen2 at the cluster level or at the notebook level? If you are setting them at the notebook level, kindly set them at the cluster level instead to make them available to the metastore client.

Hi @Shanmugavel Chandrakasu, I have the same issue. Could you please share the Spark configs to set at the cluster level?

Here are the instructions for connecting to a storage account: https://learn.microsoft.com/en-us/azure/databricks/external-data/azure-storage.

And for setting Spark configuration at the cluster level: https://docs.databricks.com/clusters/configure.html#spark-configuration.

  • Example configuration using a storage account key (entered as a key-value pair in the cluster's Spark config):

spark.hadoop.fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-key>

Anonymous
Not applicable

Hi @Rohit Kulkarni

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

AmitA1
Contributor

Databricks is awesome if you have SQL knowledge. I just came across one of the problems in my project and Databricks helped me a lot, for example using a low watermark to hold the load success date.
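
For anyone curious, a rough sketch of that low-watermark pattern; every table and column name below is invented for illustration:

# Hypothetical low-watermark pattern: read the last successful load date
# from a control table and use it to filter the next incremental load.
# All table/column names are invented; runs in a Databricks notebook.
from pyspark.sql import functions as F

last_success = (
    spark.table("etl.load_control")
         .agg(F.max("load_success_date").alias("wm"))
         .first()["wm"]
)

incremental = spark.table("bronze.source_table").where(
    F.col("modified_date") > F.lit(last_success)
)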