Hi, can you help me with this?
I am using this query to create a CSV file in a volume named test_volsrr that I created:
INSERT OVERWRITE DIRECTORY '/Volumes/DATAMAX_DATABRICKS/staging/test_volsrr'
USING CSV OPTIONS ('delimiter' = ',', 'header' = 'true')
SELECT * FROM staging.extract1gb
DISTRIBUTE BY COALESCE(1);
I added DISTRIBUTE BY COALESCE(1) so that a single CSV file gets generated instead of multiple files. The extract1gb table is about 1 GB, but the CSV being written is around 230 GB, and because of this the query takes more than an hour to execute. Can someone please explain this issue and suggest how to generate the CSV at the expected size so execution becomes faster? I don't want to use PySpark.
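For context, the single-file behaviour I am after is what I understand Spark SQL's COALESCE partitioning hint to provide. Below is a sketch of the same query using that hint instead of the DISTRIBUTE BY clause; I have not verified that it avoids the size blow-up, so please treat it as an assumption on my part, not a confirmed fix:

INSERT OVERWRITE DIRECTORY '/Volumes/DATAMAX_DATABRICKS/staging/test_volsrr'
USING CSV OPTIONS ('delimiter' = ',', 'header' = 'true')
-- Assumption: the COALESCE(1) hint (Spark 3.0+ partitioning hint syntax)
-- collapses the result to one partition so the writer emits a single CSV file
SELECT /*+ COALESCE(1) */ * FROM staging.extract1gb;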