11-13-2021 08:20 AM
Environment: Azure
A data transformation template that takes a table name as a variable. The variable is set in a separate notebook via %run ../_config
A shallow clone is used to generate a staging table with exactly the same columns as the target table.
A random error occurs when the ETL runs in multiple concurrent notebooks at the same time.
Error log attached. (May be related to cluster resources?)
Thanks in advance for any insight...
-- Create a temp table for data transformation
drop table if exists ${v.source};
create table if not exists ${v.source}
shallow clone referrals_audit;
truncate table ${v.source}; -- a shallow clone keeps a reference to the source data, so remove it
alter table ${v.source} add columns (Error string);
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
11-14-2021 08:53 AM
A random error occurs when the ETL is running concurrent notebooks at same time.
Are both notebooks using the same variable value for ${v.source}?
If so, it could simply be a logic issue: one notebook is dropping the temp table that the other notebook is still using.
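To illustrate the race being described here: the template is only safe if each notebook resolves ${v.source} to a distinct name. A sketch (table names are placeholders, not from the original ETL):

```sql
-- Notebook A resolves ${v.source} to its own staging table:
drop table if exists staging_referrals;

-- Notebook B, running concurrently, resolves ${v.source} to a different one:
drop table if exists staging_claims;

-- But if both notebooks resolved ${v.source} to the SAME name, notebook B's
-- drop could race with notebook A's create/truncate/alter, and the alter
-- could then run against a table that no longer has its cloned schema,
-- producing errors like "at least one column must be specified for the table".
```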
11-14-2021 07:42 PM
@Hubert Dudek thanks for replying
The variable is different in each notebook.
The error has only happened once so far, so I may need to keep an eye on it.
The idea is to use a notebook template with a dynamic table name; all merge statements
will be generated dynamically from the table's column list.
Attached is _config
Each notebook calls _config to set its variables; they differ because the variable 'table' is different in each notebook.
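The actual _config attachment is not included in the thread, but a hypothetical sketch of this pattern might look like the following. The `table` variable and the `staging_` prefix are assumptions for illustration, not the poster's real code:

```python
# Hypothetical sketch of a _config notebook; the real attachment was not
# included in the thread. Assumes the calling notebook defines `table`
# before running %run ../_config.
table = "referrals"  # set to a different value in each notebook

# Derive a per-notebook staging table name so concurrent runs never collide.
staging_table = f"staging_{table}"

# In a Databricks notebook, the name could then be exposed to SQL cells,
# e.g. via the Spark conf, so it can be referenced as ${v.source}:
# spark.conf.set("v.source", staging_table)

print(staging_table)
```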
11-28-2021 08:24 PM
The error happened again, so I have removed the shallow clone from each notebook run.
My reasoning is that the shallow clone adds overhead, and there is no need to
recreate the temp table structure on every run since the table schema does not change much.
I have moved the shallow clone into a separate schema management task.
The truncate table statement is the only line that remains, so the issue should not occur again.
02-16-2022 06:04 AM
Resolved it myself.
02-16-2022 08:10 AM
@lizou - That's great! I'm so glad. Would you be happy to share your solution with us and then mark the answer as best? That will help others find the solution more quickly. 🙂
04-17-2022 03:28 PM
The solution is simple: don't include a shallow clone in an ETL that runs frequently. There is no need to recreate the table schema on every run if it isn't changing. Use a fixed schema for the temp table, and simply truncate and reload the data.
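A minimal sketch of the pattern described above, split into a schema task and a per-run task (the column names are illustrative assumptions, not from the original ETL):

```sql
-- Occasional schema management task: create the staging table once with a
-- fixed, explicit schema instead of shallow-cloning it on every run.
create table if not exists ${v.source} (
  referral_id bigint,
  created_at  timestamp,
  Error       string
);

-- Per-run ETL: no drop, no clone, no alter -- just clear and reload.
truncate table ${v.source};
-- insert into ${v.source} select ... ;
```

Because the per-run step no longer drops or recreates the table, concurrent notebooks (each with its own ${v.source}) never race on DDL.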