Data Engineering

I am getting an exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive."

Anonymous

I have a Parquet DataFrame df. I first add a column using df.withColumn("version", lit(currentTimestamp)) and append it to a table db.tbl in Parquet format, partitioned by the "version" column. I then ran MSCK REPAIR TABLE db.tbl, and then created a view from this table called db.tbl_v.
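For reference, the write-and-repair sequence described above looks roughly like this as a sketch (table, view, and column names as in the question; the definition of currentTimestamp is an assumption, since it is not shown in the post):

```scala
import org.apache.spark.sql.functions.lit

// Assumption: currentTimestamp is a precomputed timestamp value, e.g.:
val currentTimestamp = java.sql.Timestamp.valueOf(java.time.LocalDateTime.now())

// Tag the batch with a "version" column and append it to the table,
// partitioned by that column.
df.withColumn("version", lit(currentTimestamp))
  .write
  .format("parquet")
  .mode("append")
  .partitionBy("version")
  .saveAsTable("db.tbl")

// Register any partition directories the metastore does not yet know about.
spark.sql("MSCK REPAIR TABLE db.tbl")

// Create the view over the partitioned table.
spark.sql("CREATE OR REPLACE VIEW db.tbl_v AS SELECT * FROM db.tbl")
```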

Then I run the code below which gives me the error.

val am = spark.table("db.tbl_v").filter(col("col1")>=0.5).drop("col2")

display(am)


What I have tried:

The same code works in a lower environment. I believe the configurations and settings are the same in the DEV and Prod environments.

I have tried creating a new table. It did not work.

When I run select * from db.tbl_v where col1 >= 0.5, I get an error: Error in SQL statement: UndeclaredThrowableException.

When I run select * from db.tbl where col1 >= 0.5, I get the rows.

Thank you for reading my question; I appreciate your help.

1 REPLY

Anonymous

@vikashk84

The exception "RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive" typically occurs when there is an issue with Hive metadata related to partitioning in Databricks. Here are a few steps you can try to resolve the issue:

  1. Check Hive Metastore configuration: Ensure the Hive Metastore is properly configured in Databricks. Check the cluster's configuration or runtime environment, and verify that it points at the intended Hive Metastore service. Since the same code works in the lower environment, compare the DEV and Prod cluster settings directly rather than assuming they match.
  2. Verify partitioning: Confirm that the partition column "version" in the table "db.tbl" has the same name and data type as the "version" column in the Parquet DataFrame "df". A type mismatch between the declared partition column and the written data is a common source of metastore filter errors.
  3. Check table metadata: Verify that the table metadata was actually updated after running "MSCK REPAIR TABLE", i.e. that the partitions are registered in the Hive Metastore.
  4. Check the view definition: Review the definition of "db.tbl_v" and confirm that it references "db.tbl" and its partition column "version" as intended.
  5. Verify column names: Check that the columns used in the "filter" and "drop" operations ("col1" and "col2") match the column names in the table and view exactly, with no typos or case differences.
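As a concrete starting point for steps 2-4, the following commands (a sketch to run on the affected cluster; names taken from the question) surface the partition and view metadata so you can compare them against what you expect:

```scala
// Step 3: list the partitions the metastore has registered for the base table.
spark.sql("SHOW PARTITIONS db.tbl").show(truncate = false)

// Step 2: inspect the declared type of the "version" partition column
// and the table's storage details.
spark.sql("DESCRIBE EXTENDED db.tbl").show(100, truncate = false)

// Step 4: inspect the view definition to confirm it references db.tbl
// and its columns as intended.
spark.sql("SHOW CREATE TABLE db.tbl_v").show(truncate = false)
```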

If you have checked all of the above and the issue persists, it may be necessary to further investigate the specific details of your environment and data to identify the root cause of the exception.
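One additional probe, offered as an assumption about the failure mode rather than a confirmed fix: this exception is raised while Spark pushes the partition filter down to the Hive Metastore. Temporarily disabling metastore-side partition pruning can tell you whether that code path is the trigger (at the cost of listing all partitions on every query, so treat it as a diagnostic, not a permanent setting):

```scala
// Diagnostic only: if the query succeeds with pruning disabled, the
// metastore's filter-by-partition path is the likely culprit.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "false")
```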
