Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

When running a dbt pipeline with column docs persisted, we get the error "at least one column must be specified"

kj1
New Contributor III

Problem:

When running dbt with persist column docs enabled, we get the following error: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
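
For context, "persist column docs" here refers to dbt's persist_docs config, which we have enabled in dbt_project.yml roughly like this (the project name below is just a placeholder):

models:
  my_project:
    +persist_docs:
      relation: true   # persist model descriptions as table comments
      columns: true    # persist column descriptions as column comments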

Background:

There is a closed issue on the dbt-spark GitHub repo, https://github.com/dbt-labs/dbt-spark/issues/364, and this was also discussed in the #db-databricks-and-spark channel of the dbt Slack community.

We tried the recommended fix, setting spark.databricks.delta.catalog.update.enabled=false, but we still get the error when column docs are enabled. We do not get the error if we set persist column docs to false, but then we lose the full documentation functionality in dbt.

Questions:

1) A year ago there was a comment on the dbt Slack: "we're about 2 weeks away from deploying an enhancement to the async thread that does though delta catalog updates to fix this issue so you won't need to disable delta.catalog.update". Is the fix above still recommended? If not, is there a new recommendation?

2) The recommended fix of setting spark.databricks.delta.catalog.update.enabled=false worked for other posters but not for us, so I am wondering if we are implementing it correctly. The Terraform docs do not exactly match what was described in the post. We set this parameter in "data_access_config", but other posters describe setting it in "sql_config_params", which we are not able to do with Terraform. Is the implementation below correct? (Links to the docs we looked at are bulleted, below.)

resource "databricks_sql_global_config" "cfg" {

 instance_profile_arn = aws_iam_instance_profile.hc.arn

  data_access_config = {

  "spark.databricks.delta.catalog.update.enabled" : "false"

  }

}

resource "databricks_sql_global_config" "cfg" {

instance_profile_arn = aws_iam_instance_profile.hc.arn

sql_config_params = {

data_access_config = {

"spark.databricks.delta.catalog.update.enabled" : "false"

}

}

Thanks for your help!

8 REPLIES

Anonymous
Not applicable

@Kesshi Jordan:

  1. Based on the closed issue on the dbt-spark GitHub and the comment from a year ago in the #db-databricks-and-spark channel of the DBT Slack community, it seems that the enhancement to the async thread that does delta catalog updates has not yet been deployed. Therefore, the recommended fix of setting spark.databricks.delta.catalog.update.enabled=false is still relevant.
  2. The implementation that you have shared seems to be correct. Although the Terraform documentation separates sql_config_params and data_access_config as optional maps, it seems that they can be nested within each other. In your implementation, you have correctly set spark.databricks.delta.catalog.update.enabled to false within data_access_config. Therefore, if this parameter is not taking effect for you, it is possible that there might be some other configuration issue causing the problem.

kj1
New Contributor III

Hi @Suteja Kanuri, thanks so much for your response! Do you have any suggestions on what the other configuration issue might be, or any resources to recommend for debugging this? We haven't been able to figure it out yet.

kj1
New Contributor III

I just realized I mistakenly copied the diff of the two options we tried. I've corrected it to the current version, which is below.

resource "databricks_sql_global_config" "cfg" {

 instance_profile_arn = aws_iam_instance_profile.hc.arn

  data_access_config = {

  "spark.databricks.delta.catalog.update.enabled" : "false"

  }

}

We also tried setting sql_config_params, which caused an error, so we went back to setting it in data_access_config:

resource "databricks_sql_global_config" "cfg" {

 instance_profile_arn = aws_iam_instance_profile.hc.arn

  sql_config_params = {

  "spark.databricks.delta.catalog.update.enabled" : "false"

  }

}

Anonymous
Not applicable

@Kesshi Jordan: Can you please provide more information on the error message you're seeing?

kj1
New Contributor III

I don't know that I have any more detail on the error message than what I put in the initial post. This seems to be a well-documented issue between the dbt Slack channel and the dbt-spark GitHub issue I linked in the initial post, and I think our experience matches other reports pretty well. I think you're right that the issue is something else in our configuration that we haven't been able to figure out yet.

I do have an update, though. We were able to add the parameter using the UI, and so far we haven't gotten the error since we made that change, so it seems to be something with the Terraform implementation of the configuration change. We have a meeting with our Databricks contact soon, so I'll update this with a solution if we figure it out. Thanks for your help on this!

Anonymous
Not applicable

Hi @Kesshi Jordan,

Hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

Dooley
Valued Contributor II

@Kesshi Jordan Update - we got this to work by setting the configuration in the admin settings' Data Access Configuration for the Instance Profile.
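
In case it helps anyone else following along, the Data Access Configuration box in the SQL admin settings takes one key-value pair per line, so the entry for the parameter discussed above should look roughly like this (a sketch; the exact UI layout may differ):

spark.databricks.delta.catalog.update.enabled false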

Dooley
Valued Contributor II

Also, please confirm that you are not hitting any of these limitations:

From dbt's website:

Some databases limit where and how descriptions can be added to database objects. Those database adapters might not support persist_docs, or might offer only partial support.

Some known issues and limitations of Databricks specifically:

  • Column-level comments require file_format: delta (or another "v2 file format") - see the example config after this list
  • Column-level comments aren't supported for models materialized as views (issue)
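
As a rough sketch of a configuration that satisfies those requirements (the project name here is a placeholder, not taken from this thread), the relevant part of dbt_project.yml might look like:

models:
  my_project:               # placeholder project name
    +materialized: table    # views don't support column-level comments on Databricks
    +file_format: delta     # column comments require delta (or another "v2 file format")
    +persist_docs:
      relation: true
      columns: true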
