03-27-2023 03:57 PM
Problem:
When running dbt with persist column docs enabled, we get the following error: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
Background:
There is a closed issue about this on the dbt-spark GitHub repository: https://github.com/dbt-labs/dbt-spark/issues/364
It was also discussed in the #db-databricks-and-spark channel of the dbt Slack community.
We tried the recommended fix, setting spark.databricks.delta.catalog.update.enabled=false, but we still get the error if column docs are enabled. We do not get the error if we set persist column docs to false, but then we cannot use the full documentation functionality in dbt.
Questions:
1) A year ago there was a comment on the dbt Slack: "we're about 2 weeks away from deploying an enhancement to the async thread that does though delta catalog updates to fix this issue so you won't need to disable delta.catalog.update". Is the fix above still recommended? If not, is there a new recommendation?
2) The recommended fix, setting spark.databricks.delta.catalog.update.enabled=false, worked for other posters but not for us, so I am wondering whether we are implementing it correctly. The Terraform docs do not exactly match what was described in the post: posters describe setting the parameter in "sql_config_params", but we are not able to do that with Terraform, so we set it in "data_access_config" instead. Is the implementation below correct? (Links to the docs we looked at are bulleted below.)
resource "databricks_sql_global_config" "cfg" {
  instance_profile_arn = aws_iam_instance_profile.hc.arn
  data_access_config = {
    "spark.databricks.delta.catalog.update.enabled" : "false"
  }
}
resource "databricks_sql_global_config" "cfg" {
  instance_profile_arn = aws_iam_instance_profile.hc.arn
  sql_config_params = {
    data_access_config = {
      "spark.databricks.delta.catalog.update.enabled" : "false"
    }
  }
}
Thanks for your help!
04-02-2023 07:08 AM
@Kesshi Jordan :
04-04-2023 10:26 AM
Hi @Suteja Kanuri, thanks so much for your response! Do you have any suggestions as to what the other configuration issue might be, or any resources to recommend for debugging this? We haven't been able to figure it out yet.
04-04-2023 10:38 AM
I just realized I copied the diff of the two options we tried by mistake. I've corrected it to the current one, which is below.
resource "databricks_sql_global_config" "cfg" {
  instance_profile_arn = aws_iam_instance_profile.hc.arn
  data_access_config = {
    "spark.databricks.delta.catalog.update.enabled" : "false"
  }
}
We also tried setting sql_config_params, which caused an error, so we went back to setting it in data_access_config.
resource "databricks_sql_global_config" "cfg" {
  instance_profile_arn = aws_iam_instance_profile.hc.arn
  sql_config_params = {
    "spark.databricks.delta.catalog.update.enabled" : "false"
  }
}
04-05-2023 09:49 PM
@Kesshi Jordan : Can you please provide more information on the error message you're seeing?
04-11-2023 07:22 AM
I don't think I have any more detail on the error message than what I put in the initial post. This seems to be a well-documented issue between the dbt Slack channel and the dbt-spark GitHub issue I linked in the initial post, and I think our experience matches the other reports pretty well. I think you're right that the issue is something else in the configuration that we haven't been able to figure out yet. I do have an update, though. We were able to add the parameter using the UI, and so far we haven't gotten the error since we made that change, so it seems to be something with the Terraform implementation of the configuration change. We have a meeting with our Databricks contact soon, so I'll update this with a solution if we figure it out. Thanks for your help on this!
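Since the UI change works and the Terraform change does not, one way to narrow this down is to read the workspace's SQL warehouse configuration back after a Terraform apply and check whether the key is actually there. The snippet below is only a minimal sketch of that check, not a confirmed procedure. It assumes the workspace configuration endpoint (GET /api/2.0/sql/config/warehouses) returns data_access_config as a list of key/value pairs. The helper name find_data_access_value and the sample payload, including the ARN, are hypothetical stand-ins for a real response.

```python
import json

# Key discussed in this thread.
KEY = "spark.databricks.delta.catalog.update.enabled"


def find_data_access_value(config, key=KEY):
    """Return the value stored for `key` in data_access_config, or None."""
    # Assumed shape: {"data_access_config": [{"key": ..., "value": ...}, ...]}
    for pair in config.get("data_access_config") or []:
        if pair.get("key") == key:
            return pair.get("value")
    return None


# Hypothetical payload standing in for the real API response.
sample = json.loads("""
{
  "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/example",
  "data_access_config": [
    {"key": "spark.databricks.delta.catalog.update.enabled", "value": "false"}
  ]
}
""")

value = find_data_access_value(sample)
print("setting present:", value is not None, "| value:", value)
```

If the helper returns None for the real response after a Terraform apply but returns "false" after the UI change, that would point at the Terraform side rather than at dbt.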
04-03-2023 11:32 PM
Hi @Kesshi Jordan
Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
04-11-2023 12:20 PM
@Kesshi Jordan Update: we got this to work by setting the configuration in the admin settings, in the Data Access Configuration of the Instance Profile.
04-11-2023 12:19 PM
Also confirming that you do not have any of these limitations:
Some databases limit where and how descriptions can be added to database objects. Those database adapters might not support persist_docs, or might offer only partial support.
Some known issues and limitations of Databricks specifically: