Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

data_centers_q2_q3.snappy

William
New Contributor

I am writing to inquire about the following error:

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
  at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$13(DataSource.scala:234)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:234)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:468)
  at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:89)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:235)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3825)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:130)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:273)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:104)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
  at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:223)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3823)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:235)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:104)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)

The error is generated by the following code (copied directly from the Databricks Academy SQL training):

DROP TABLE IF EXISTS dc_data_raw;

CREATE TABLE dc_data_raw
USING parquet
OPTIONS (
  PATH "/FileStore/Tables/data_centers_q2_q3.snappy.parquet"
);

I have done some investigating on my own but have turned up nothing that solves this. I have attached the copy of the Parquet file I downloaded for the course.
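As part of my investigation I also checked whether the uploaded file is even recognizable as Parquet: valid Parquet files begin and end with the 4-byte magic `PAR1`, and a bad upload (for example a CSV renamed to `.parquet`, or a truncated transfer) would fail schema inference exactly like this. Here is a quick local sketch of that check (the path at the bottom is hypothetical, pointing at my downloaded copy):

```python
from pathlib import Path

# Parquet files begin AND end with the 4-byte magic "PAR1". A file missing
# either marker cannot have its schema inferred by Spark.
PARQUET_MAGIC = b"PAR1"

def looks_like_parquet(path: str) -> bool:
    """Return True if `path` carries the Parquet magic bytes at both ends."""
    data = Path(path).read_bytes()
    # 12 bytes is the absolute floor: leading magic + 4-byte footer length
    # + trailing magic.
    return (
        len(data) >= 12
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )

# Hypothetical local path to my downloaded copy of the course file:
# print(looks_like_parquet("data_centers_q2_q3.snappy.parquet"))
```

The file passed this check for me, so I don't think a corrupt upload is the cause, but I am including it in case it helps someone reproduce the issue.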

