Autoloader event vs directory ingestion
For a production workload of around 15k gzip-compressed JSON files per hour, all in a YYYY/MM/DD/HH/id/timestamp.json.gz directory layout, what would be the better approach for ingesting this into a Delta table, in terms of not only the incremental load...
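The choice in the title maps onto Auto Loader's two file-discovery modes: directory listing (the default) and file notification, where new files are picked up from cloud storage events instead of by listing. A minimal sketch, assuming Databricks Auto Loader reading gzip JSON, of how that choice shows up as stream options. The helper name `autoloader_options`, the paths, and the table name are hypothetical. The `cloudFiles.*` keys are documented Auto Loader options.

```python
from typing import Dict


def autoloader_options(use_notifications: bool, schema_location: str,
                       max_files_per_trigger: int = 1000) -> Dict[str, str]:
    """Build a hypothetical Auto Loader option set for gzip-compressed JSON.

    use_notifications=True  -> file notification mode (cloud events/queue)
    use_notifications=False -> directory listing mode (the default)
    """
    return {
        # Spark infers gzip from the .gz extension, so the format is plain JSON.
        "cloudFiles.format": "json",
        # Switches between file notification and directory listing discovery.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # Where Auto Loader persists the inferred schema across runs.
        "cloudFiles.schemaLocation": schema_location,
        # Caps how many new files each micro-batch picks up.
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
    }


# On Databricks only (hypothetical paths and table name), roughly:
# (spark.readStream.format("cloudFiles")
#      .options(**autoloader_options(True, "/tmp/_schemas/events"))
#      .load("s3://bucket/events/")
#      .writeStream.option("checkpointLocation", "/tmp/_chk/events")
#      .trigger(availableNow=True)
#      .toTable("events_bronze"))
```

Notification mode avoids re-listing a deep, ever-growing tree on each trigger, which is why it is generally suggested for high file volumes; the trade-off is that it needs permissions to create the cloud notification and queue resources.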
Latest Reply
@Kaniz Fatma So I've not found a fix for the small-file problem using Auto Loader; it seems to struggle really badly with large directories. I had a cluster running for 8h stuck on the "listing directory" phase with no end, and the cluster seemed completely idle to...
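When the listing phase itself never finishes, one workaround (swapped in here, not something this thread confirms) is to skip recursively listing the whole tree and batch-read one hour's prefix at a time, since the layout already partitions files by hour. A minimal sketch, assuming the YYYY/MM/DD/HH/id/timestamp.json.gz layout from the question; `hourly_prefixes` and the base path are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def hourly_prefixes(base: str, start: datetime, end: datetime) -> List[str]:
    """Return one glob per hour in [start, end) for the layout
    YYYY/MM/DD/HH/id/timestamp.json.gz (hypothetical helper)."""
    prefixes = []
    # Truncate to the top of the hour so partial hours are still covered.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        # Only this hour's directory is listed, not the whole tree.
        prefixes.append(f"{base.rstrip('/')}/{current:%Y/%m/%d/%H}/*/*.json.gz")
        current += timedelta(hours=1)
    return prefixes


prefixes = hourly_prefixes("s3://bucket/events/",
                           datetime(2023, 1, 1, 22), datetime(2023, 1, 2, 1))
# Three globs: 2023/01/01/22, 2023/01/01/23 and 2023/01/02/00
```

Each glob could then be passed to `spark.read.json(...)`, which accepts a list of paths, and appended to the Delta table. The cost of this design is that you track which hours have already been loaded yourself, instead of relying on Auto Loader's checkpoint.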