Data Engineering

I am getting an exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive."

Anonymous

I have a Parquet DataFrame df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl in Parquet format, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl, and then created a view from this table called db.tbl_v.
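For reference, the write-and-repair sequence described above looks roughly like this as a sketch (table, view, and column names as in the question; the definition of currentTimestamp is an assumption, since it is not shown in the post):

```scala
import org.apache.spark.sql.functions.lit

// Assumption: currentTimestamp is a precomputed timestamp value, e.g.:
val currentTimestamp = java.sql.Timestamp.valueOf(java.time.LocalDateTime.now())

// Tag the batch with a "version" column and append it to the table,
// partitioned by that column.
df.withColumn("version", lit(currentTimestamp))
  .write
  .format("parquet")
  .mode("append")
  .partitionBy("version")
  .saveAsTable("db.tbl")

// Register any partition directories the metastore does not yet know about.
spark.sql("MSCK REPAIR TABLE db.tbl")

// Create the view over the partitioned table.
spark.sql("CREATE OR REPLACE VIEW db.tbl_v AS SELECT * FROM db.tbl")
```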

Then I run the code below which gives me the error.

val am = spark.table("db.tbl_v").filter(col("col1")>=0.5).drop("col2")

display(am)


What I have tried:

The same code works in a lower environment. I believe the configurations and settings are the same in the DEV and Prod environments.

I have tried creating a new table. It did not work.

When I run select * from db.tbl_v where col1 >= 0.5, I get an error: Error in SQL statement: UndeclaredThrowableException.

When I run select * from db.tbl where col1 >= 0.5, I get the rows.

Thank you for reading my question; I appreciate your help.

1 REPLY

Anonymous

@vikashk84

The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with Hive metadata related to partitioning in Databricks. Here are a few steps you can try to resolve the issue:

  1. Check Hive Metastore configuration: Ensure the Hive Metastore is properly configured in Databricks. Check the cluster's configuration or runtime environment, and verify that it points at the intended Hive Metastore service. Since the same code works in the lower environment, compare the DEV and Prod cluster settings directly rather than assuming they match.
  2. Verify partitioning: Confirm that the partition column "version" in the table "db.tbl" has the same name and data type as the "version" column in the Parquet DataFrame "df". A type mismatch between the declared partition column and the written data is a common source of metastore filter errors.
  3. Check table metadata: Verify that the table metadata was actually updated after running "MSCK REPAIR TABLE", i.e. that the partitions are registered in the Hive Metastore.
  4. Check the view definition: Review the definition of "db.tbl_v" and confirm that it references "db.tbl" and its partition column "version" as intended.
  5. Verify column names: Check that the columns used in the "filter" and "drop" operations ("col1" and "col2") match the column names in the table and view exactly, with no typos or case differences.
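As a concrete starting point for steps 2-4, the following commands (a sketch to run on the affected cluster; names taken from the question) surface the partition and view metadata so you can compare them against what you expect:

```scala
// Step 3: list the partitions the metastore has registered for the base table.
spark.sql("SHOW PARTITIONS db.tbl").show(truncate = false)

// Step 2: inspect the declared type of the "version" partition column
// and the table's storage details.
spark.sql("DESCRIBE EXTENDED db.tbl").show(100, truncate = false)

// Step 4: inspect the view definition to confirm it references db.tbl
// and its columns as intended.
spark.sql("SHOW CREATE TABLE db.tbl_v").show(truncate = false)
```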

If you have checked all of the above and the issue persists, it may be necessary to further investigate the specific details of your environment and data to identify the root cause of the exception.
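One additional probe, offered as an assumption about the failure mode rather than a confirmed fix: this exception is raised while Spark pushes the partition filter down to the Hive Metastore. Temporarily disabling metastore-side partition pruning can tell you whether that code path is the trigger (at the cost of listing all partitions on every query, so treat it as a diagnostic, not a permanent setting):

```scala
// Diagnostic only: if the query succeeds with pruning disabled, the
// metastore's filter-by-partition path is the likely culprit.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "false")
```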
