07-24-2015 10:17 AM
Hi Mzaradzki -
In Spark 1.5 we will be adding a feature to improve metadata caching for Parquet specifically, so it should greatly improve performance for your use case above.
In the meantime, one option to improve performance in Databricks is to use the dbutils.fs.cacheFiles function to cache your Parquet files on the SSDs attached to the workers in your cluster.
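As a rough sketch of what that could look like from a Python notebook: the helper name cache_parquet_files and the directory path are hypothetical, and I am assuming dbutils.fs.cacheFiles accepts one or more file paths as separate arguments, so please check help(dbutils.fs) on your cluster before relying on it. dbutils is passed in explicitly so the helper can also be exercised outside a notebook.

```python
def cache_parquet_files(dbutils, directory):
    """List `directory` and ask Databricks to cache its Parquet part files.

    `dbutils` is the object Databricks provides in notebooks.
    Returns the list of paths that were submitted for caching.
    """
    # dbutils.fs.ls returns entries that expose a `.path` attribute.
    paths = [
        entry.path
        for entry in dbutils.fs.ls(directory)
        if entry.path.endswith(".parquet")
    ]
    if paths:
        # Assumption: cacheFiles takes the file paths as separate arguments.
        dbutils.fs.cacheFiles(*paths)
    return paths


# In a notebook (the path below is a placeholder, not a real location):
# cache_parquet_files(dbutils, "/mnt/my-data/events")
```

The filter on the .parquet suffix is only there to skip marker files such as _SUCCESS; if your part files are named differently, adjust it accordingly.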
Cheers,
Richard