I have a big dataset that gets divided into smaller datasets. For some of these smaller datasets I'd like to offer a low-latency API (*** ms) to query them.
Big dataset: ~1 billion entries
Smaller dataset: ~1 million entries
What's the best way to do it?
I thought about the following approach:
Big dataset -> 100s of smaller datasets -> push the relevant smaller datasets (e.g. 5 of 100) to a Postgres DB -> API over the Postgres DB
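To make that concrete, here is a rough sketch of the "push to Postgres" step I have in mind, assuming the big dataset is a Delta table with a dataset_id column; the table name, IDs, and connection details are just placeholders (this would run in a Databricks notebook/job where dbutils is available):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The smaller datasets that need the low-latency API (e.g. 5 of 100)
relevant_ids = ["ds_01", "ds_07", "ds_23", "ds_42", "ds_77"]

# Big dataset stored as a Delta table (placeholder name)
big = spark.read.table("main.analytics.big_dataset")

for ds_id in relevant_ids:
    subset = big.where(big.dataset_id == ds_id)
    # Overwrite one Postgres table per smaller dataset via Spark's JDBC writer
    (subset.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://pg-host:5432/serving")
        .option("dbtable", f"public.dataset_{ds_id}")
        .option("user", "serving_writer")
        .option("password", dbutils.secrets.get("serving", "pg_password"))
        .mode("overwrite")
        .save())
```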
Ideally I want to update the smaller datasets on a custom schedule.
Is there a better way that stays within the Databricks/Delta ecosystem?
I've heard of Delta Live Tables. Would that be a viable option here?