<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: External table from parquet partition in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24692#M17185</link>
    <description>Discussion thread: creating an external table from partitioned Parquet data in GCS.</description>
    <pubDate>Tue, 01 Nov 2022 10:30:36 GMT</pubDate>
    <dc:creator>Pat</dc:creator>
    <dc:date>2022-11-01T10:30:36Z</dc:date>
    <item>
      <title>External table from parquet partition</title>
      <link>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24689#M17182</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have data in Parquet format in GCS buckets, partitioned by name, e.g. gs://mybucket/name=ABCD/&lt;/P&gt;&lt;P&gt;I am trying to create a table in Databricks as follows:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;DROP TABLE IF EXISTS name_test;
CREATE TABLE name_test
USING parquet
LOCATION "gs://mybucket/name=*/";&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;It is loading all the columns except name!&lt;/P&gt;&lt;P&gt;I get the following error when trying to view the schema from the Data console:&lt;/P&gt;&lt;P&gt;&lt;B&gt;An error occurred while fetching table: name_test&lt;/B&gt;&lt;/P&gt;&lt;P&gt;com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: java.util.NoSuchElementException: key not found: name&lt;/P&gt;&lt;P&gt;It should infer the schema automatically, I suppose.&lt;/P&gt;&lt;P&gt;Can someone let me know what I am doing wrong?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2022 16:46:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24689#M17182</guid>
      <dc:creator>MBV3</dc:creator>
      <dc:date>2022-10-31T16:46:03Z</dc:date>
    </item>
    <item>
      <title>Re: External table from parquet partition</title>
      <link>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24690#M17183</link>
      <description>&lt;P&gt;Hi @M Baig,&lt;/P&gt;&lt;P&gt;the error doesn't tell me much, but you could try:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE TABLE name_test
USING parquet
PARTITIONED BY (name STRING)
LOCATION "gs://mybucket/";&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 01 Nov 2022 08:46:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24690#M17183</guid>
      <dc:creator>Pat</dc:creator>
      <dc:date>2022-11-01T08:46:26Z</dc:date>
    </item>
    <item>
      <title>Re: External table from parquet partition</title>
      <link>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24691#M17184</link>
      <description>&lt;P&gt;Hi Pat,&lt;/P&gt;&lt;P&gt;Thanks for your reply.&lt;/P&gt;&lt;P&gt;I tried your approach, but it didn't work; I'm getting the following exception:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Error in SQL statement: AnalysisException: Cannot use all columns for partition columns
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.org$apache$spark$sql$execution$datasources$PreprocessTableCreation$$failAnalysis(rules.scala:454)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.normalizePartitionColumns(rules.scala:414)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.org$apache$spark$sql$execution$datasources$PreprocessTableCreation$$normalizeCatalogTable(rules.scala:384)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:323)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:182)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:171)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:167)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:171)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:324)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:169)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:165)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:76)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:75)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:182)
    at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:178)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:216)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:301)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:294)
    at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:196)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:294)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:222)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:184)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:130)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:184)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:274)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:331)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:273)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:128)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:268)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:265)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:265)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:129)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:126)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:118)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:103)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:101)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:803)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:798)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:695)
    at com.databricks.backend.daemon.driver.SQLDriverLocal.$anonfun$executeSql$1(SQLDriverLocal.scala:91)
    at scala.collection.immutable.List.map(List.scala:297)
    at com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:37)
    at com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:145)
    at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$12(DriverLocal.scala:631)
    at com.databricks.logging.Log4jUsageLoggingShim$.$anonfun$withAttributionContext$1(Log4jUsageLoggingShim.scala:33)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94)
    at com.databricks.logging.Log4jUsageLoggingShim$.withAttributionContext(Log4jUsageLoggingShim.scala:31)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:205)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:204)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:59)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:240)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:225)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:59)
    at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:608)
    at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:615)
    at scala.util.Try$.apply(Try.scala:213)
    at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:607)
    at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:526)
    at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:561)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:431)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:374)
    at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:225)
    at java.lang.Thread.run(Thread.java:748)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 01 Nov 2022 10:11:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24691#M17184</guid>
      <dc:creator>MBV3</dc:creator>
      <dc:date>2022-11-01T10:11:30Z</dc:date>
    </item>
    <item>
      <title>Re: External table from parquet partition</title>
      <link>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24692#M17185</link>
      <description>&lt;P&gt;I cannot test it now, but maybe you can try it this way:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE TABLE name_test
USING parquet
LOCATION "gs://mybucket/";&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;It might discover that the table is partitioned by `name`; I don't remember right now.&lt;/P&gt;&lt;P&gt;The issue with my previous statement is that you would have to specify the columns manually:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE TABLE name_test
(
  id INT,
  other STRING
)
USING parquet
PARTITIONED BY (name STRING)
LOCATION "gs://mybucket/";&lt;/CODE&gt;&lt;/PRE&gt;
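&lt;P&gt;One more untested idea, just a sketch I have not verified against this bucket: after creating the table over the existing name=... layout, the partitions may still need to be registered in the metastore with a repair command:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;-- Untested sketch: discover and register the existing name=... partition
-- directories for the table created above
MSCK REPAIR TABLE name_test;
-- equivalent form:
ALTER TABLE name_test RECOVER PARTITIONS;&lt;/CODE&gt;&lt;/PRE&gt;</description>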
      <pubDate>Tue, 01 Nov 2022 10:30:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24692#M17185</guid>
      <dc:creator>Pat</dc:creator>
      <dc:date>2022-11-01T10:30:36Z</dc:date>
    </item>
    <item>
      <title>Re: External table from parquet partition</title>
      <link>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24693#M17186</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;CREATE TABLE name_test
(
  id INT,
  other STRING
)
PARTITIONED BY (name STRING);

COPY INTO name_test
FROM "gs://&amp;lt;my_bucket&amp;gt;/"
FILEFORMAT = parquet;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This combination worked, after some trial and error.&lt;/P&gt;&lt;P&gt;Thanks for your help, Pat.&lt;/P&gt;
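&lt;P&gt;In case it helps: to check whether the resulting table ends up managed or external, the table details can be inspected, e.g.:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;-- Shows Type (MANAGED or EXTERNAL), Provider, and Location for the table
DESCRIBE TABLE EXTENDED name_test;&lt;/CODE&gt;&lt;/PRE&gt;</description>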
      <pubDate>Tue, 01 Nov 2022 13:20:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/24693#M17186</guid>
      <dc:creator>MBV3</dc:creator>
      <dc:date>2022-11-01T13:20:31Z</dc:date>
    </item>
    <item>
      <title>Re: External table from parquet partition</title>
      <link>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/64195#M32488</link>
      <description>&lt;P&gt;Does this create an external table?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Mar 2024 15:24:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/external-table-from-parquet-partition/m-p/64195#M32488</guid>
      <dc:creator>data_extractor</dc:creator>
      <dc:date>2024-03-20T15:24:16Z</dc:date>
    </item>
  </channel>
</rss>