Original answer posted by @Gray Gwizdz
This was a fun question to try and find the answer to! Thank you for that!
I reviewed some of the most recent issues and bugs reported against Delta Lake and found a similar report where a user ran into performance problems with 1,000 columns (https://github.com/delta-io/delta/issues/479). However, there is a pending pull request where they tested with 4,000 columns and saw much better performance (https://github.com/delta-io/delta/pull/584).
I also reviewed internally and saw another approach that I would recommend here. That team was seeing slow write performance on a very wide table. Instead of defining thousands of columns, the architect packed most of the features into a single ArrayType column, which improved write performance significantly. They kept the intermediate state with the feature fields as a list of tuples, List[(key, value)], and the final output in the feature store as Map[key, aggregated_value], as sketched below.
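To make that concrete, here is a minimal sketch of the idea in Spark/Scala. The column names, sample data, and output path are hypothetical, not from the original design; the point is that the wide feature set collapses into two columns (an entity id and a map), which keeps the Delta schema narrow no matter how many feature keys exist.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical sketch: keep features as an array of (key, value) tuples in the
// intermediate state, then aggregate them into a single map column at the end.
val spark = SparkSession.builder()
  .appName("narrow-feature-table-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Intermediate state: features as List[(key, value)] instead of
// thousands of top-level columns.
val intermediate = Seq(
  ("user_1", List(("clicks", 3.0), ("views", 10.0))),
  ("user_2", List(("clicks", 7.0), ("views", 2.0)))
).toDF("entity_id", "features")

// Flatten the tuples, aggregate per key, then rebuild a Map[key, aggregated_value].
val finalOutput = intermediate
  .select($"entity_id", explode($"features").as("feature"))
  .select($"entity_id", $"feature._1".as("key"), $"feature._2".as("value"))
  .groupBy($"entity_id", $"key")
  .agg(sum($"value").as("value"))
  .groupBy($"entity_id")
  .agg(map_from_entries(collect_list(struct($"key", $"value"))).as("features"))

// The resulting Delta table has only two columns regardless of how many
// distinct feature keys there are (the path is a placeholder).
finalOutput.write.format("delta").mode("overwrite").save("/tmp/feature_store")
```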
Perhaps worth mentioning: Delta Lake collects statistics on the first 32 columns of the table by default, so data skipping during query planning for any columns beyond the first 32 will likely not be as effective as for the first 32. https://docs.databricks.com/delta/optimizations/file-mgmt.html#data-skipping
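If you do need statistics on more (or fewer) columns, the linked page describes the delta.dataSkippingNumIndexedCols table property. A small sketch of setting it from Scala, where the table name and the value 64 are placeholders:

```scala
// Hypothetical sketch: change how many leading columns Delta collects
// statistics on for an existing table. Note that collecting stats on more
// columns adds overhead at write time.
spark.sql(
  """ALTER TABLE feature_store
    |SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '64')""".stripMargin)
```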