Optimizing Large Materialized View to expedite query execution
4 weeks ago
Hi All, I have a DLT pipeline set up that reads Parquet files from an S3 bucket and creates a materialized view. The resulting view is quite large: billions of records and a few TB of data. Predictive Optimization is already enabled, and automatic liquid clustering is enabled at the catalog level. However, query execution is still slow, even for simple queries with a single JOIN. Is there anything I am missing? Also, is using readStream and creating Delta tables a better option than materialized views? And should I partition this table by date?
# Read used in the pipeline (the source path is a placeholder):
spark.read.format("parquet").option("ignoreCorruptFiles", "true").load("s3://<bucket>/<prefix>/")
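For reference, the readStream/Delta alternative asked about above could look like the sketch below: a DLT streaming table that ingests the Parquet files incrementally with Auto Loader and applies liquid clustering on a date column. The table name, S3 path, and the `event_ts`/`event_date` columns are assumptions for illustration, not details from the post.

```python
# Hedged sketch: incremental ingestion into a Delta streaming table via
# Auto Loader, clustered by date, as an alternative to recomputing a
# large materialized view. Runs only inside a Databricks DLT pipeline.
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="events_bronze",          # hypothetical table name
    cluster_by=["event_date"],     # liquid clustering key (assumed column)
)
def events_bronze():
    return (
        spark.readStream.format("cloudFiles")        # Auto Loader
        .option("cloudFiles.format", "parquet")      # source file format
        .option("ignoreCorruptFiles", "true")
        .load("s3://<bucket>/<prefix>/")             # placeholder path
        # Derive a date column for clustering from an assumed timestamp column:
        .withColumn("event_date", F.to_date("event_ts"))
    )
```

With liquid clustering on `event_date`, explicit Hive-style date partitioning is generally not needed; clustering on the commonly filtered column serves the same pruning purpose without the small-file risk of over-partitioning.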