Hey!
It looks like the issue you’re facing might be related to the proxy timeout when downloading large files from DBFS. Since modifying the proxy settings might not be an option, there are a couple of alternative approaches you could consider to mitigate this issue.
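First, instead of downloading the entire file in one request, you can try downloading it in smaller chunks. Each request then stays short enough to finish before the proxy timeout, and a failed chunk can be retried without restarting the whole download. The DBFS REST API supports this: the `/api/2.0/dbfs/read` endpoint takes an `offset` and a `length` (up to 1 MB per call) and returns that slice of the file base64-encoded.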
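Here's a rough sketch of what a chunked download could look like in Python, using only the standard library. It assumes you have a workspace URL and a personal access token; `make_dbfs_reader` and `download_in_chunks` are names I made up for illustration, and you'd probably want to add retries and error handling for real use.

```python
import base64
import json
import urllib.parse
import urllib.request

CHUNK_SIZE = 1024 * 1024  # the DBFS read endpoint returns at most 1 MB per call


def make_dbfs_reader(host, token):
    """Return a function that reads one slice of a DBFS file via the REST API.

    `host` (e.g. "https://<your-workspace>") and `token` are assumptions:
    substitute your own workspace URL and personal access token.
    """
    def read_chunk(path, offset, length):
        query = urllib.parse.urlencode(
            {"path": path, "offset": offset, "length": length}
        )
        request = urllib.request.Request(
            f"{host}/api/2.0/dbfs/read?{query}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            # Response JSON has "bytes_read" and base64-encoded "data".
            return json.load(response)
    return read_chunk


def download_in_chunks(read_chunk, dbfs_path, local_path, chunk_size=CHUNK_SIZE):
    """Copy a DBFS file to local_path one chunk at a time; return total bytes."""
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            result = read_chunk(dbfs_path, offset, chunk_size)
            bytes_read = result.get("bytes_read", 0)
            if bytes_read:
                out.write(base64.b64decode(result["data"]))
                offset += bytes_read
            # A short (or empty) read means we reached the end of the file.
            if bytes_read < chunk_size:
                break
    return offset
```

You'd call it with something like `download_in_chunks(make_dbfs_reader(host, token), "/path/on/dbfs", "local_copy.log")`, where the paths are placeholders. Passing the reader in as a function also makes it easy to wrap with retry logic or swap out for testing.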
Another effective approach is to store logs directly in external cloud storage, such as AWS S3, Azure Blob Storage, or Google Cloud Storage, instead of DBFS. Databricks lets you configure cluster log delivery so that logs are written to a destination you choose automatically (which destinations are available depends on your cloud provider and workspace setup). You can then fetch the logs with your cloud provider's own tools, without going through DBFS or the proxy at all.
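As an example for the S3 case, the cluster spec takes a `cluster_log_conf` block. Below is a small sketch that builds that fragment in Python; the helper name, bucket, prefix, and region are all placeholders I picked for illustration, and the cluster would also need an instance profile with write access to the bucket.

```python
import json


def s3_cluster_log_conf(bucket, prefix, region):
    """Build the cluster_log_conf fragment for delivering cluster logs to S3.

    The bucket, prefix, and region passed in are hypothetical; use your own.
    """
    return {
        "cluster_log_conf": {
            "s3": {
                "destination": f"s3://{bucket}/{prefix}",
                "region": region,
            }
        }
    }


# Merge this fragment into the cluster spec you send to the Clusters API,
# or set the same destination in the cluster's logging options in the UI.
fragment = s3_cluster_log_conf("my-log-bucket", "cluster-logs", "us-west-2")
print(json.dumps(fragment, indent=2))
```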
🙂