Undescriptive error when trying to insert overwrite into a table

cmilligan
Contributor II

I have a query that I'm trying to insert overwrite into a table. In an effort to try and speed up the query I added a range join hint. After adding it I started getting the error below.Screenshot_20230118_104626I can get around this though by creating a temporary view of the same query and caching that temporary view. It will then allow me to insert overwrite into the table.

Ajay-Pandey
Databricks MVP

Could you please check for the datatype of your source and target table.

There might be mismatch between both.

Ajay Kumar Pandey

Both are the same

hi , can u help me in this 

I am using this query to create a csv in a volume named test_volsrr that i created

INSERT OVERWRITE DIRECTORY '/Volumes/DATAMAX_DATABRICKS/staging/test_volsrr'

USING CSV OPTIONS ( 'delimiter' = ',' , 'header' = 'true' )

SELECT * FROM staging.extract1gb

DISTRIBUTE BY COALESCE( 1 );  

i added DISTRIBUTE BY COALESCE(1) so that a single csv gets generated instead of multiple csvs , the size of extract1gb table is 1gb but the csv which is getting created is around 230gb , due to this it is taking more than an hour to execute . Can some pls explain this issue and a solution to generate the csv of optimal size so that execution becomes faster . I dont want to use pyspark .

jose_gonzalez
Databricks Employee
Databricks Employee

Could you share your code and the full error stack trace please? Check the driver logs for the full stack trace.