Hi @cat017,
Here are a few recommendations:
Use Auto Loader with File Notification Mode: Instead of re-reading the entire CSV file each time, you can use Databricks Auto Loader in file notification mode. Auto Loader incrementally processes new data files as they arrive in cloud storage, and in file notification mode it subscribes to storage events (on AWS, via SQS) rather than repeatedly listing the directory, so new files are detected efficiently. Having the third party deliver each batch as a new file, rather than overwriting an existing one, also avoids the "underlying files have been updated" error.
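A minimal sketch of such a stream (the bucket paths, table name, and CSV options below are placeholders for your setup):

```python
# Auto Loader stream in file notification mode (AWS).
# Assumes the pipeline runs on Databricks, where `spark` is provided.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    # File notification mode: new files are discovered via cloud
    # storage events (SQS on AWS) instead of directory listing.
    .option("cloudFiles.useNotifications", "true")
    .option("header", "true")
    .load("s3://your-bucket/landing/csv/")  # placeholder path
)

(
    df.writeStream
    .option("checkpointLocation", "s3://your-bucket/checkpoints/csv_ingest/")  # placeholder
    .trigger(availableNow=True)
    .toTable("bronze.csv_ingest")  # placeholder target table
)
```

Auto Loader will set up and manage the SQS queue and S3 event notifications for you, given the appropriate IAM permissions.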
Implement Change Data Capture (CDC): If the third-party system supports it, consider implementing a Change Data Capture (CDC) mechanism. CDC captures only the changes (inserts, updates, deletes) made to the data, so your DLT pipeline processes just the new or changed rows instead of the entire file. Delta Live Tables provides the APPLY CHANGES API to simplify applying a CDC feed to a target table:
https://docs.databricks.com/en/delta-live-tables/cdc.html
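A sketch of what that could look like in a DLT pipeline (the source view, key, sequencing, and `op` column names are hypothetical and depend on what the third party's feed provides):

```python
import dlt
from pyspark.sql.functions import col

# Target streaming table that APPLY CHANGES will maintain.
dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc_feed",            # placeholder: streaming view of CDC rows
    keys=["customer_id"],                   # placeholder primary key column(s)
    sequence_by=col("event_ts"),            # ordering column to resolve out-of-order events
    apply_as_deletes=col("op") == "DELETE", # assumes the feed carries an 'op' column
    stored_as_scd_type=1,                   # keep only the latest version of each row
)
```

With SCD type 1 the target holds only the current state; switch to `stored_as_scd_type=2` if you need full change history.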
It would also be a good idea to file a case with us so we can better understand your use case and suggest next steps: https://docs.databricks.com/en/resources/support.html