Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

When running a dbt pipeline with column docs persisted, we get the error "at least one column must be specified"

kj1
New Contributor III

Problem:

When running dbt with persist column docs enabled, we get the following error: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
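
For context, column doc persistence is enabled in our dbt_project.yml roughly like this (a minimal sketch; the project name "my_project" is illustrative):

# dbt_project.yml (sketch; "my_project" is an illustrative project name)
models:
  my_project:
    +persist_docs:
      relation: true
      columns: true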

Background:

There is a closed issue on the dbt-spark GitHub, https://github.com/dbt-labs/dbt-spark/issues/364, and this was also discussed in the #db-databricks-and-spark channel of the dbt Slack community.

We tried the recommended fix, setting spark.databricks.delta.catalog.update.enabled=false, but we still get the error when column docs are enabled. We do not get the error if we set persist column docs to false, but then we are not able to use the full documentation functionality in dbt.

Questions:

1) A year ago there was a comment on the dbt Slack: "we're about 2 weeks away from deploying an enhancement to the async thread that does though delta catalog updates to fix this issue so you won't need to disable delta.catalog.update". Is the fix above still recommended? If not, is there a new recommendation?

2) The recommended fix of setting spark.databricks.delta.catalog.update.enabled=false worked for other posters but not for us, so I am wondering if we are implementing it correctly. The Terraform docs do not exactly match what was described in the post: we set this parameter in "data_access_config", but posters describe setting it in "sql_config_params", which we are not able to do with Terraform. Is the implementation below correct? (Links to the docs we looked at are bulleted below.)

resource "databricks_sql_global_config" "cfg" {

 instance_profile_arn = aws_iam_instance_profile.hc.arn

  data_access_config = {

  "spark.databricks.delta.catalog.update.enabled" : "false"

  }

}

resource "databricks_sql_global_config" "cfg" {

instance_profile_arn = aws_iam_instance_profile.hc.arn

sql_config_params = {

data_access_config = {

"spark.databricks.delta.catalog.update.enabled" : "false"

}

}

Thanks for your help!

8 REPLIES

Anonymous
Not applicable

@Kesshi Jordan:

  1. Based on the closed issue on the dbt-spark GitHub and the comment from a year ago in the #db-databricks-and-spark channel of the DBT Slack community, it seems that the enhancement to the async thread that does delta catalog updates has not yet been deployed. Therefore, the recommended fix of setting spark.databricks.delta.catalog.update.enabled=false is still relevant.
  2. The implementation that you have shared seems to be correct. Although the Terraform documentation separates sql_config_params and data_access_config as optional maps, it seems that they can be nested within each other. In your implementation, you have correctly set spark.databricks.delta.catalog.update.enabled to false within data_access_config. Therefore, if this parameter is not taking effect for you, it is possible that there might be some other configuration issue causing the problem.

kj1
New Contributor III

Hi @Suteja Kanuri, thanks so much for your response! Do you have any suggestions for what the other configuration issue might be, or any resources to recommend for debugging this? We haven't been able to figure it out yet.

kj1
New Contributor III

I just realized I copied the diff of the two options we tried by mistake. I've corrected it to the current one, which is below.

resource "databricks_sql_global_config" "cfg" {

 instance_profile_arn = aws_iam_instance_profile.hc.arn

  data_access_config = {

  "spark.databricks.delta.catalog.update.enabled" : "false"

  }

}

We also tried setting sql_config_params, which caused an error, so we went back to setting it in data_access_config:

resource "databricks_sql_global_config" "cfg" {

 instance_profile_arn = aws_iam_instance_profile.hc.arn

  sql_config_params = {

  "spark.databricks.delta.catalog.update.enabled" : "false"

  }

}

Anonymous
Not applicable

@Kesshi Jordan: Can you please provide more information on the error message you're seeing?

kj1
New Contributor III

I don't know that I have any more detail on the error message than I put in the initial post. This seems to be a well-documented issue between the dbt Slack channel and the dbt-spark GitHub issue I linked in the initial post, and I think our experience matches other reports pretty well. I think you're right that the issue is something else in our configuration that we haven't been able to figure out yet.

I do have an update, though: we were able to add the parameter using the UI, and so far we haven't gotten the error since we made that change, so it seems to be something with the Terraform implementation of the configuration change. We have a meeting with our Databricks contact soon, so I'll update this with a solution if we figure it out. Thanks for your help on this!

Anonymous
Not applicable

Hi @Kesshi Jordan,

Hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

Dooley
Valued Contributor II

@Kesshi Jordan Update: we got this to work by setting the configuration in the admin settings' Data Access Configuration for the instance profile.

Dooley
Valued Contributor II

Also confirming that you are not hitting any of these limitations:

From DBT's website:

Some databases limit where and how descriptions can be added to database objects. Those database adapters might not support persist_docs, or might offer only partial support.

Some known issues and limitations of Databricks specifically:

  • Column-level comments require file_format: delta (or another "v2 file format") (see the sketch after this list)
  • Column-level comments aren't supported for models materialized as views (issue)
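
For example, a dbt_project.yml model configuration that satisfies both constraints could look roughly like this (a minimal sketch; the project name "my_project" and folder "marts" are illustrative):

# dbt_project.yml (sketch; "my_project" and "marts" are illustrative names)
models:
  my_project:
    marts:
      +materialized: table    # views won't persist column-level comments
      +file_format: delta     # column comments need delta (or another v2 file format)
      +persist_docs:
        relation: true
        columns: true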
