06-27-2022 10:59 AM
Hi there,
Thanks for getting back to me.
My question is about backfilling, i.e. loading HISTORICAL data incrementally (day by day) with Auto Loader.
I would like to run Auto Loader on data that is partitioned by year/month/day, reading and writing it incrementally so I don't overload CPU and memory.
When I run Auto Loader today with the setup above, the Spark UI shows it trying to load the entire 1 TB S3 bucket into memory rather than reading it day by day (incrementally).
Do I have the backfill configured incorrectly, or am I missing something that would make Auto Loader backfill one day at a time first?
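For context, a rate-limited backfill of this kind usually relies on Auto Loader's `cloudFiles.maxFilesPerTrigger` / `cloudFiles.maxBytesPerTrigger` options combined with an `availableNow` trigger, which drains the backlog in bounded micro-batches instead of one giant read. A minimal sketch of such a setup follows; the paths, limits, and source format are illustrative assumptions, not my actual configuration:

```python
# Hedged sketch: cap how much of the bucket each micro-batch ingests.
# Option names are real Auto Loader (cloudFiles) options; the values,
# paths, and table names below are illustrative assumptions.
autoloader_options = {
    "cloudFiles.format": "parquet",           # assumed source format
    "cloudFiles.maxFilesPerTrigger": "1000",  # max files per micro-batch
    "cloudFiles.maxBytesPerTrigger": "10g",   # soft byte cap per micro-batch
}

def run_backfill(spark, source_path, checkpoint_path, target_table):
    """Illustrative streaming read/write; needs a Databricks Spark session."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options)
        .load(source_path)  # e.g. an s3://bucket/year=*/month=*/day=*/ layout
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        # availableNow processes the existing backlog in rate-limited
        # batches (honoring the caps above) and then stops.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

With these caps in place, each micro-batch should stay within the configured file/byte budget rather than pulling the whole bucket into one batch.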
Thanks,
Avkash