I am experiencing performance issues when loading a table with 50 million rows into Delta Lake on AWS using Databricks. Although the same process handles other, larger tables without issue, this specific table takes hours and never finishes. Here's the command I am using:
(df.write
   .option('overwriteSchema', 'true')
   .option('mergeSchema', 'true')
   .save(path=sink, format='delta', mode='overwrite'))
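For context, one variant I could try is an explicit repartition before the write to control the number and size of output files; the partition count of 400 below is only an illustrative guess, not a tuned setting, and sink is the same Delta path as above:

# Illustrative sketch only: repartition before writing to control output file count.
# 400 is an arbitrary example value, not something I have validated.
(df.repartition(400)
   .write
   .format('delta')
   .mode('overwrite')
   .option('overwriteSchema', 'true')
   .option('mergeSchema', 'true')
   .save(sink))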
Could you please advise on how to resolve this or optimize the process? Thank you. Best regards, Dener Botta Escaliante Moreira