cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Random error related to dynamic variables in SQL - at least one column must be specified (shallowclone table)

lizou
Contributor II

Environment: Azure

A data transformation template that take s table name as variable. The variable is set in separate notebook %run ../_config

Shallow clone is used to generate a staging table with exact same columns of target table.

A random error occurs when the ETL is running concurrent notebooks at same time.

Error log attached. (May be related to cluster resources?)

Thanks in advance if any insight is provided...

-- Create a temp table for data transformation

drop table if exists ${v.source};

create table if not exists ${v.source}

shallow clone referrals_audit;

truncate table ${v.source}; -- shallow clone also keeps data reference and we should remove it

alter table ${v.source} add columns (Error string);

Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table

image 

1 ACCEPTED SOLUTION

Accepted Solutions

The solution is simple, don't include a shallow clone in a ETL that runs frequently. It is not necessary to change the table schema if there is no need to do so. Use a fixed table schema for a temp table in which data get truncated and reloaed.

View solution in original post

6 REPLIES 6

Hubert-Dudek
Esteemed Contributor III

A random error occurs when the ETL is running concurrent notebooks at same time.

Is both using the same variable value for ${v.source}?

If yes it can be just logic issue as one notebook is dropping temp table used by other notebook.

lizou
Contributor II

@Hubert Dudek​  thanks for replying

The variable is different in each notebook.

The error only happened once so far, and I may need to keep eye on it.

The idea to use a notebook template with dynamics table name, all merge statements

will be dynamically generated using table column list.

Attached is _config

notebook call _config to set variables that are different because variable 'table' will be different in each notebook.

image

lizou
Contributor II

The error happens again. I have removed the shallow clone in each notebook run.

My reasoning is there is an overhead of using shallow clone, and there is no need to

recreate the temp table structure each time as the table schema does not change much.

I have out shallow clone in a separate schema management task.

truncate table is the only line remains and the issue should not occur again.

resolved myself

Anonymous
Not applicable

@lizou​ - That's great! I'm so glad. Would you be happy to share your solution with us and then mark the answer as best? That will help others find the solution more quickly. 🙂

The solution is simple, don't include a shallow clone in a ETL that runs frequently. It is not necessary to change the table schema if there is no need to do so. Use a fixed table schema for a temp table in which data get truncated and reloaed.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group