10-31-2022 09:46 AM
Hi,
I have data in parquet format in GCS buckets, partitioned by name, e.g. gs://mybucket/name=ABCD/
I am trying to create a table in Databricks as follows:
DROP TABLE IF EXISTS name_test;
CREATE TABLE name_test
USING parquet
LOCATION "gs://mybucket/name=*/";
It loads all the columns except name!
I get the following error when trying to view the schema from the Data console:
An error occurred while fetching table: name_test
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: java.util.NoSuchElementException: key not found: name
It should infer the schema automatically, I suppose.
Can someone let me know where I am going wrong?
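For what it's worth, I believe each partition directory can be read directly with Spark's parquet path syntax; a minimal sanity-check sketch using the example path above:
-- sanity check: read one partition directory directly (example path from above)
SELECT COUNT(*) FROM parquet.`gs://mybucket/name=ABCD/`;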
Thanks
11-01-2022 01:46 AM
Hi @M Baig ,
The error doesn't tell me much, but you could try:
CREATE TABLE name_test
USING parquet
PARTITIONED BY (name STRING)
LOCATION "gs://mybucket/";
11-01-2022 03:11 AM
Hi Pat
Thanks for your reply.
I tried your approach, but it didn't work; I'm getting the following exception:
Error in SQL statement: AnalysisException: Cannot use all columns for partition columns
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.org$apache$spark$sql$execution$datasources$PreprocessTableCreation$$failAnalysis(rules.scala:454)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.normalizePartitionColumns(rules.scala:414)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.org$apache$spark$sql$execution$datasources$PreprocessTableCreation$$normalizeCatalogTable(rules.scala:384)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:323)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:182)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:171)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:167)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:171)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:324)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:169)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:165)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:76)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:75)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:182)
at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:178)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:216)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:301)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:294)
at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:196)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:294)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:222)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:184)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:130)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:184)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:274)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:331)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:273)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:128)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:268)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:265)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:265)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:129)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:126)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:118)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:101)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:803)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:798)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:695)
at com.databricks.backend.daemon.driver.SQLDriverLocal.$anonfun$executeSql$1(SQLDriverLocal.scala:91)
at scala.collection.immutable.List.map(List.scala:297)
at com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:37)
at com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:145)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$12(DriverLocal.scala:631)
at com.databricks.logging.Log4jUsageLoggingShim$.$anonfun$withAttributionContext$1(Log4jUsageLoggingShim.scala:33)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94)
at com.databricks.logging.Log4jUsageLoggingShim$.withAttributionContext(Log4jUsageLoggingShim.scala:31)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:205)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:204)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:59)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:240)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:225)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:59)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:608)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:615)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:607)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:526)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:561)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:431)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:374)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:225)
at java.lang.Thread.run(Thread.java:748)
11-01-2022 03:30 AM
I cannot test it now, but maybe you can try it this way:
CREATE TABLE name_test
USING parquet
LOCATION "gs://mybucket/";
It might discover that the table is partitioned by `name`; I don't remember right now.
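Also, if the table does get created but appears empty, the existing name=... directories may still need to be registered in the metastore; an untested sketch:
-- register the existing partition directories with the metastore (not verified here)
MSCK REPAIR TABLE name_test;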
The issue with my previous statement is that you would have to specify the columns manually:
CREATE TABLE name_test
(
id INT,
other STRING
)
USING parquet
PARTITIONED BY (name STRING)
LOCATION "gs://mybucket/";
11-01-2022 06:20 AM
CREATE TABLE name_test
(
id INT,
other STRING
)
PARTITIONED BY (name STRING);
COPY INTO name_test
FROM "gs://<my_bucket>/"
FILEFORMAT = parquet;
This combination worked, after some trial and error.
Thanks for your help Pat.
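In case it helps anyone else, a quick check that the partition column actually came through (column names as in my statement above):
-- count rows per name partition
SELECT name, COUNT(*) AS row_count
FROM name_test
GROUP BY name;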
03-20-2024 08:24 AM
Does this create an external table?
11-01-2022 03:41 PM
Hi @M Baig , It would mean a lot if you could select the "Best Answer" to help others find the correct answer faster.
This makes that answer appear right after the question, so it's easier to find within a thread.
It also helps us mark the question as Answered and close the thread so we can have more eyes helping others with Unanswered questions.
Can I count on you?