Too many small files from updates
3 weeks ago
Hi,
I am updating some data in a Delta table, and each time I only need to update one row. After every UPDATE statement a new file is created. How do I tackle this issue? It doesn't make sense to run an OPTIMIZE command after every update.
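For context, a minimal sketch of the kind of single-row update being described (table name, key, and column are placeholders, assuming a Databricks notebook where `spark` is predefined). Each such statement commits a new table version and writes new data file(s):

```python
from delta.tables import DeltaTable

# Placeholder table name and predicate; each single-row UPDATE like this
# creates a new Delta table version and new small data file(s).
dt = DeltaTable.forName(spark, "my_catalog.my_schema.my_table")
dt.update(
    condition="id = 42",            # only one row matches
    set={"status": "'processed'"},  # SQL expression for the new value
)
```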
3 weeks ago
Usually this problem is solved with the auto optimize property.
For managed tables this option is enabled by default.
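If it is not already set on your table, a sketch like the following enables it at the table level (the table name is a placeholder; the property names are the ones Databricks documents for optimized writes and auto compaction):

```python
# Sketch: enable auto optimize on an existing Delta table.
# Replace the table name with your own.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```

With auto compaction on, small files produced by frequent updates are coalesced automatically, so you should not need to run OPTIMIZE after every statement.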
3 weeks ago
Depending on your table settings, those may be log files or versions kept for Time Travel. Unless you've mastered partitioning, you really shouldn't worry about the files and should let the system do what it does.
3 weeks ago
But it is creating a performance overhead: every update command is taking more time than the previous one, since it has to filter over more files.
3 weeks ago
Set the following Spark session properties and give it a try:
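(The original list of properties did not come through with the post; the session-level settings usually referenced for this are the optimized-write and auto-compaction flags, roughly as below.)

```python
# Sketch: session-level equivalents of auto optimize. The exact list the
# reply intended is missing, so these are the commonly referenced settings.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")
```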
3 weeks ago
OK, something isn't right. I work with massive datasets and this is not an issue for a single update. If your architecture and Unity Catalog configuration are correct, and there isn't also some weird bug, you should not even be aware of the underlying files. Are you working against the data files directly, or are you querying the tables through Unity Catalog?
3 weeks ago
If you are performing hundreds of update operations on the Delta table, you can opt to run an OPTIMIZE operation after every batch of 100 updates. There should be no significant performance issue for up to 100 such updates.
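A rough sketch of that pattern (table name, key column, batch size, and the `updates` list are placeholders): apply the single-row updates in a loop and compact once per batch rather than after every statement.

```python
from delta.tables import DeltaTable

# Placeholders: your table, key column, and stream of single-row changes.
table_name = "my_catalog.my_schema.my_table"
updates = [(1, "processed"), (2, "failed")]  # (id, new_status) pairs
BATCH_SIZE = 100

dt = DeltaTable.forName(spark, table_name)
for i, (key, new_status) in enumerate(updates, start=1):
    dt.update(
        condition=f"id = {key}",
        set={"status": f"'{new_status}'"},
    )
    # Compact small files once per batch instead of after every update.
    if i % BATCH_SIZE == 0:
        spark.sql(f"OPTIMIZE {table_name}")
```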

