cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Random error related to dynamic variables in SQL - at least one column must be specified (shallowclone table)

lizou
Contributor II

Environment: Azure

A data transformation template that take s table name as variable. The variable is set in separate notebook %run ../_config

Shallow clone is used to generate a staging table with exact same columns of target table.

A random error occurs when the ETL is running concurrent notebooks at same time.

Error log attached. (May be related to cluster resources?)

Thanks in advance if any insight is provided...

-- Create a temp table for data transformation

drop table if exists ${v.source};

create table if not exists ${v.source}

shallow clone referrals_audit;

truncate table ${v.source}; -- shallow clone also keeps data reference and we should remove it

alter table ${v.source} add columns (Error string);

Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table

image 

1 ACCEPTED SOLUTION

Accepted Solutions

The solution is simple, don't include a shallow clone in a ETL that runs frequently. It is not necessary to change the table schema if there is no need to do so. Use a fixed table schema for a temp table in which data get truncated and reloaed.

View solution in original post

8 REPLIES 8

Kaniz
Community Manager
Community Manager

Hi @ lizou! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.

Hubert-Dudek
Esteemed Contributor III

A random error occurs when the ETL is running concurrent notebooks at same time.

Is both using the same variable value for ${v.source}?

If yes it can be just logic issue as one notebook is dropping temp table used by other notebook.

lizou
Contributor II

@Hubert Dudek​  thanks for replying

The variable is different in each notebook.

The error only happened once so far, and I may need to keep eye on it.

The idea to use a notebook template with dynamics table name, all merge statements

will be dynamically generated using table column list.

Attached is _config

notebook call _config to set variables that are different because variable 'table' will be different in each notebook.

image

lizou
Contributor II

The error happens again. I have removed the shallow clone in each notebook run.

My reasoning is there is an overhead of using shallow clone, and there is no need to

recreate the temp table structure each time as the table schema does not change much.

I have out shallow clone in a separate schema management task.

truncate table is the only line remains and the issue should not occur again.

Kaniz
Community Manager
Community Manager

Hi @lizou​ , Did you solve the problem by yourself or do you still require any help?

lizou
Contributor II

resolved myself

Anonymous
Not applicable

@lizou​ - That's great! I'm so glad. Would you be happy to share your solution with us and then mark the answer as best? That will help others find the solution more quickly. 🙂

The solution is simple, don't include a shallow clone in a ETL that runs frequently. It is not necessary to change the table schema if there is no need to do so. Use a fixed table schema for a temp table in which data get truncated and reloaed.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.