I need to read many Delta tables stored in Azure object storage (block blobs). There is no single root Delta table; instead, there are many fragmented Delta tables that share a common schema but not a common root path.
Iterating over the paths with a for loop performs very poorly because the list of paths is long and the sequential loads can't be parallelized.
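For context, this is roughly what the current approach looks like (the paths list, container, and storage account names are only illustrative; `spark` is the ambient Databricks `SparkSession`):

```python
# Illustrative list of fragmented Delta table paths (the real list is much longer)
paths = [
    "wasbs://container@account.blob.core.windows.net/tables/t_0001",
    "wasbs://container@account.blob.core.windows.net/tables/t_0002",
    # ... many more
]

# Current approach: read each Delta table sequentially and union the results.
# Each load() resolves its own Delta log, so this scales badly with the number of paths.
result = None
for p in paths:
    df = spark.read.format("delta").load(p)
    result = df if result is None else result.unionByName(df)
```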
The final result should be a union of all the Delta tables read from the list of paths. The issue is that I cannot pass a list of paths directly to spark.read.load, because that raises the exception:
Databricks Delta does not support multiple input paths in the load() API. To build a single DataFrame by loading multiple paths from the same Delta table, please load the root path of the Delta table with the corresponding partition filters. If the multiple paths are from different Delta tables, please use Dataset's union()/unionByName() APIs to combine the DataFrames generated by separate load() API calls.
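For reference, the call that triggers this exception is roughly the following sketch (using the same illustrative `paths` list as above):

```python
# Passing the whole list of paths in one load() call fails for the delta format,
# even though load() accepts a list of paths for other formats.
df = spark.read.format("delta").load(paths)
```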