Parallel read of many delta tables

leobocci
New Contributor

I need to read many Delta tables stored in Azure object storage (block blobs). There is no single root Delta table; instead there are many fragmented Delta tables that share a common schema but not a common path prefix.

Iterating over the paths with a for loop performs very poorly, because the list of paths is long and the loads run one after another instead of in parallel.

The final result should be a union of all Delta tables read from the list of paths. The issue is that I cannot pass a list of paths directly to spark.read.load, because doing so raises the exception:

Databricks Delta does not support multiple input paths in the load() API. To build a single DataFrame by loading multiple paths from the same Delta table, please load the root path of the Delta table with the corresponding partition filters. If the multiple paths are from different Delta tables, please use Dataset's union()/unionByName() APIs to combine the DataFrames generated by separate load() API calls.
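One common workaround (not from the original post, just a sketch) is to issue the per-table load() calls from a driver-side thread pool, then fold the resulting DataFrames together with unionByName. Since load() is lazy, threading mainly overlaps the per-table Delta log/metadata resolution that makes the sequential loop slow. The snippet below shows the pattern using only the standard library, with a stub `read_table` standing in for `spark.read.format("delta").load(path)`; the paths and row contents are purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical paths; in the real job these would be the Delta table
# locations in Azure Blob Storage, e.g. "wasbs://container@account.../t1".
paths = [f"/mnt/delta/table_{i}" for i in range(8)]

def read_table(path):
    # Stand-in for: spark.read.format("delta").load(path)
    # load() is lazy, so calling it from many threads overlaps the
    # per-table metadata work on the driver instead of serializing it.
    return [{"path": path, "value": len(path)}]

# Issue the reads from a thread pool instead of a sequential for loop.
# ThreadPoolExecutor.map preserves the input order of `paths`.
with ThreadPoolExecutor(max_workers=16) as pool:
    frames = list(pool.map(read_table, paths))

# Stand-in for:
#   reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), frames)
combined = reduce(lambda a, b: a + b, frames)
```

In the real PySpark version, `allowMissingColumns=True` on `unionByName` tolerates minor schema drift between the fragmented tables; if every table truly shares an identical schema, plain `unionByName` (or `union`) suffices.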

0 REPLIES 0