Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I am getting an exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive."

Anonymous
Not applicable

I have a Parquet DataFrame df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl in Parquet format, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl, and afterwards created a view on this table called db.tbl_v.
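For reference, the setup described above can be sketched roughly as follows (the source path and the value of currentTimestamp are placeholders, not from the original post; table and view names are from the question):

```scala
import org.apache.spark.sql.functions.lit

// Placeholder: in the original post, currentTimestamp is defined elsewhere.
val currentTimestamp = java.time.Instant.now.toString

// Placeholder source path for the Parquet DataFrame.
val df = spark.read.parquet("/path/to/source")

// Append into a Parquet table partitioned by the new "version" column.
df.withColumn("version", lit(currentTimestamp))
  .write
  .format("parquet")
  .mode("append")
  .partitionBy("version")
  .saveAsTable("db.tbl")

// Register any partitions the metastore does not yet know about,
// then create the view over the table.
spark.sql("MSCK REPAIR TABLE db.tbl")
spark.sql("CREATE OR REPLACE VIEW db.tbl_v AS SELECT * FROM db.tbl")
```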

Then I run the code below which gives me the error.

val am = spark.table("db.tbl_v").filter(col("col1")>=0.5).drop("col2")

display(am)

[Screenshot of the error message attached]

What I have tried:

The same code works in a lower environment, and to my knowledge the configurations and settings are the same in the DEV and Prod environments.

I have tried creating a new table. It did not work.

When I run select * from db.tbl_v where col1 >= 0.5, I get the error Error in SQL statement: UndeclaredThrowableException.

When I run select * from db.tbl where col1 >= 0.5, I get the rows.

Thank you for reading my question and appreciate your help.

1 REPLY

Anonymous
Not applicable

@vikashk84

The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with Hive metadata related to partitioning in Databricks. Here are a few steps you can try to resolve the issue:

  1. Check Hive Metastore Configuration: Ensure that the Hive Metastore is properly configured in Databricks. Review the cluster's configuration (for example, the Spark config entries related to the metastore) and verify that it points at the correct Hive Metastore service.
  2. Verify Partitioning: Double-check that the partition column "version" of table "db.tbl" is correctly defined, has the expected data type, and matches the "version" column added to the Parquet DataFrame "df".
  3. Check Table Metadata: Verify that the partitions were actually registered after running the "MSCK REPAIR TABLE" command, for example by listing the partitions recorded in the Hive Metastore.
  4. Check View Definition: Review the definition of "db.tbl_v" and make sure it correctly references the table "db.tbl" and its partition column "version".
  5. Verify Column Names: Confirm that the columns used in the "filter" and "drop" operations ("col1" and "col2") exist with exactly those names in both the table and the view, with no typos or case differences.
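The checks above can be run directly from a notebook. A hypothetical sketch (table and view names are taken from the thread; everything else is illustrative):

```scala
// Step 1: inspect metastore-related settings visible to the session.
spark.conf.getAll
  .filter { case (key, _) => key.contains("metastore") }
  .foreach(println)

// Steps 2 and 3: confirm the partition column's definition and the
// partitions actually registered in the metastore.
spark.sql("DESCRIBE EXTENDED db.tbl").show(100, truncate = false)
spark.sql("SHOW PARTITIONS db.tbl").show(truncate = false)

// Step 4: inspect the view definition.
spark.sql("SHOW CREATE TABLE db.tbl_v").show(truncate = false)

// Step 5: compare the schemas seen through the table and the view.
spark.table("db.tbl").printSchema()
spark.table("db.tbl_v").printSchema()
```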

If you have checked all of the above and the issue persists, it may be necessary to further investigate the specific details of your environment and data to identify the root cause of the exception.
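Since the error occurs specifically while fetching partition metadata "by filter", one additional diagnostic (an assumption on my part, not part of the steps above) is to temporarily disable metastore-side partition pruning and re-run the failing query. If it then succeeds, the metastore's filter pushdown is the likely culprit:

```scala
import org.apache.spark.sql.functions.col

// Assumption: this Spark SQL setting is available on the runtime in use.
// Disabling it makes Spark list all partitions instead of pushing the
// partition filter down to the Hive metastore. Use only as a diagnostic,
// since it can be slow on tables with many partitions.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "false")

spark.table("db.tbl_v").filter(col("col1") >= 0.5).drop("col2").show()
```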
