
What are the options to offer a low latency API for small tables derived from big tables?

data_boy_2022
New Contributor III

I have a big dataset which gets divided into smaller datasets. For some of these smaller datasets I'd like to offer a low latency API (*** ms) to query them.

Big dataset: 1B entries

Smaller dataset: 1 million entries

What's the best way to do it?

I thought about the following way:

Big dataset -> 100s of smaller datasets -> push relevant (e.g. 5/100) smaller datasets to Postgres DB -> API over Postgres DB

Ideally I want to update the smaller datasets on a custom schedule.

Is there a better way that stays within the Databricks/Delta ecosystem?

I heard there is a concept of a Delta Live Table. Would that be a viable option?

Accepted Solutions

Tian
New Contributor III

Hi!

For low latency queries, it helps to break this down into two parts: query serving latency and data freshness latency. On the freshness side, DLT can probably stream updates at roughly 1-second intervals, and once they're committed to Delta they're immediately available to readers in DBSQL, with about 1 second of query latency.

If you're looking for ms query serving latency, it is highly recommended to use an operational DB for such use cases. Hope that helps!
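
To make the operational-DB suggestion a bit more concrete, here is a minimal sketch of the "load a small derived dataset, then serve point lookups" pattern. It uses Python's built-in `sqlite3` purely as a stand-in for Postgres so that it runs anywhere. The table name `small_dataset`, the columns `entry_id` and `value`, and the helpers `load_small_dataset` and `lookup` are all hypothetical and not from this thread.

```python
import sqlite3
import time


def load_small_dataset(conn, rows):
    """Replace the serving table's contents with a fresh snapshot.

    Everything runs in one transaction, so readers see either the old
    snapshot or the new one, never a half-loaded table.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS small_dataset ("
            "entry_id INTEGER PRIMARY KEY, value TEXT NOT NULL)"
        )
        conn.execute("DELETE FROM small_dataset")
        conn.executemany(
            "INSERT INTO small_dataset (entry_id, value) VALUES (?, ?)", rows
        )


def lookup(conn, entry_id):
    """Point lookup by primary key; returns None if the key is absent."""
    row = conn.execute(
        "SELECT value FROM small_dataset WHERE entry_id = ?", (entry_id,)
    ).fetchone()
    return row[0] if row else None


# Assumption: in-memory SQLite stands in for the operational DB (e.g. Postgres).
conn = sqlite3.connect(":memory:")

# Assumption: synthetic rows stand in for one exported smaller dataset.
load_small_dataset(conn, ((i, f"value-{i}") for i in range(100_000)))

# Time a single indexed lookup, which is what the API would do per request.
start = time.perf_counter()
result = lookup(conn, 4242)
elapsed_ms = (time.perf_counter() - start) * 1000
print(result, f"{elapsed_ms:.3f} ms")
```

In a real setup the rows would come from the smaller Delta tables (for example via a scheduled job that re-exports them), and the API would sit in front of the operational DB. The key point is that the lookup goes through a primary-key index, which is what makes low serving latency realistic.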


2 REPLIES

Vidula
Honored Contributor

Hi @Jan R

Does @Tian Tan's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?

We'd love to hear from you.

Thanks!
