11-13-2021 08:20 AM
Environment: Azure
A data transformation template that takes a table name as a variable. The variable is set in a separate notebook via %run ../_config
A shallow clone is used to generate a staging table with exactly the same columns as the target table.
A random error occurs when the ETL runs in multiple concurrent notebooks at the same time.
Error log attached. (May be related to cluster resources?)
Thanks in advance for any insight...
-- Create a temp table for data transformation
drop table if exists ${v.source};
create table if not exists ${v.source}
shallow clone referrals_audit;
truncate table ${v.source}; -- a shallow clone keeps a reference to the source data, so remove it
alter table ${v.source} add columns (Error string);
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
11-14-2021 08:53 AM
A random error occurs when the ETL is running concurrent notebooks at same time.
Are both notebooks using the same variable value for ${v.source}?
If so, it could simply be a logic issue: one notebook is dropping the temp table that the other notebook is still using.
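To illustrate the race being described here: the template is only safe if each notebook resolves ${v.source} to a distinct name. A sketch (table names are placeholders, not from the original ETL):

```sql
-- Notebook A resolves ${v.source} to its own staging table:
drop table if exists staging_referrals;

-- Notebook B, running concurrently, resolves ${v.source} to a different one:
drop table if exists staging_claims;

-- But if both notebooks resolved ${v.source} to the SAME name, notebook B's
-- drop could race with notebook A's create/truncate/alter, and the alter
-- could then run against a table that no longer has its cloned schema,
-- producing errors like "at least one column must be specified for the table".
```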
11-14-2021 07:42 PM
@Hubert Dudek thanks for replying
The variable is different in each notebook.
The error has only happened once so far, so I may need to keep an eye on it.
The idea is to use a notebook template with a dynamic table name; all merge statements
will be generated dynamically from the table's column list.
Attached is _config
Each notebook calls _config to set its variables; they differ because the variable 'table' is different in each notebook.
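The actual _config attachment is not included in the thread, but a hypothetical sketch of this pattern might look like the following. The `table` variable and the `staging_` prefix are assumptions for illustration, not the poster's real code:

```python
# Hypothetical sketch of a _config notebook; the real attachment was not
# included in the thread. Assumes the calling notebook defines `table`
# before running %run ../_config.
table = "referrals"  # set to a different value in each notebook

# Derive a per-notebook staging table name so concurrent runs never collide.
staging_table = f"staging_{table}"

# In a Databricks notebook, the name could then be exposed to SQL cells,
# e.g. via the Spark conf, so it can be referenced as ${v.source}:
# spark.conf.set("v.source", staging_table)

print(staging_table)
```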
11-28-2021 08:24 PM
The error happened again, so I have removed the shallow clone from each notebook run.
My reasoning is that the shallow clone adds overhead, and there is no need to
recreate the temp table structure on every run since the table schema does not change much.
I have moved the shallow clone into a separate schema management task.
The truncate table statement is the only line that remains, so the issue should not occur again.
02-16-2022 06:04 AM
Resolved it myself.
02-16-2022 08:10 AM
@lizou - That's great! I'm so glad. Would you be happy to share your solution with us and then mark the answer as best? That will help others find the solution more quickly. 🙂
04-17-2022 03:28 PM
The solution is simple: don't include a shallow clone in an ETL that runs frequently. There is no need to recreate the table schema on every run if it isn't changing. Use a fixed schema for the temp table, and simply truncate and reload the data.
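A minimal sketch of the pattern described above, split into a schema task and a per-run task (the column names are illustrative assumptions, not from the original ETL):

```sql
-- Occasional schema management task: create the staging table once with a
-- fixed, explicit schema instead of shallow-cloning it on every run.
create table if not exists ${v.source} (
  referral_id bigint,
  created_at  timestamp,
  Error       string
);

-- Per-run ETL: no drop, no clone, no alter -- just clear and reload.
truncate table ${v.source};
-- insert into ${v.source} select ... ;
```

Because the per-run step no longer drops or recreates the table, concurrent notebooks (each with its own ${v.source}) never race on DDL.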