Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What are the options to offer a low latency API for small tables derived from big tables?

data_boy_2022
New Contributor III

I have a big dataset which gets divided into smaller datasets. For some of these smaller datasets I'd like to offer a low latency API (*** ms) to query them.

Big dataset: 1B entries

Smaller dataset: 1 million entries

What's the best way to do it?

I thought about the following way:

Big dataset -> 100s of smaller datasets -> push relevant (e.g. 5/100) smaller datasets to Postgres DB -> API over Postgres DB

Ideally I want to update the smaller datasets on a custom schedule.
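To make the proposed flow concrete, here is a minimal sketch of the "select the relevant subsets and push them to a serving database" step. It rests on loud assumptions: `sqlite3` from the Python standard library stands in for the Postgres DB, a plain list stands in for the big dataset, and the names `small_dataset`, `segment`, `relevant_segments` and `refresh_serving_table` are hypothetical. Each scheduled refresh replaces the table contents inside one transaction.

```python
import sqlite3

def refresh_serving_table(conn, rows):
    """Replace the serving table's contents with `rows` in a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM small_dataset")
        conn.executemany(
            "INSERT INTO small_dataset (segment, id, value) VALUES (?, ?, ?)",
            rows,
        )

# Hypothetical stand-in for the big dataset: (segment, id, value) tuples.
big_dataset = [("eu", 1, "a"), ("us", 2, "b"), ("eu", 3, "c"), ("apac", 4, "d")]

# Only some of the smaller datasets are pushed to the serving DB.
relevant_segments = {"eu"}

# sqlite3 is used here only as a stand-in for the Postgres DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE small_dataset (segment TEXT, id INTEGER PRIMARY KEY, value TEXT)"
)

# One refresh run; a scheduler would call this on the custom schedule.
subset = [row for row in big_dataset if row[0] in relevant_segments]
refresh_serving_table(conn, subset)

served = conn.execute("SELECT id, value FROM small_dataset ORDER BY id").fetchall()
print(served)
```

In a real setup the subset would be computed in Spark and written to Postgres, but the shape of the step (filter, then replace the serving table in one transaction) would stay the same.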

Is there a better way by staying within the Databricks/Delta ecosystem?

I heard there is a concept of a Delta Live Table. Would that be a viable option?

1 ACCEPTED SOLUTION

Accepted Solutions

Tian
New Contributor III

Hi!

For low-latency queries, it helps to break this down into two parts: query-serving latency and data-freshness latency. With DLT, streams can probably be processed at roughly 1-second intervals, and once the data is committed to Delta, it's immediately available to readers in DBSQL with about 1 second of query latency.

If you're looking for millisecond query-serving latency, an operational DB is highly recommended for such use cases. Hope that helps!
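As a rough illustration of the operational-DB option, the sketch below serves point lookups by primary key, which is the access pattern an API over such a database would typically rely on. It is not a benchmark. The assumptions are: `sqlite3` from the Python standard library stands in for the operational DB, the table `small_dataset` and the function `lookup` are hypothetical names, and 100,000 rows are loaded instead of 1 million to keep the example quick.

```python
import sqlite3
import time

# sqlite3 is only a stand-in for an operational DB such as Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE small_dataset (id INTEGER PRIMARY KEY, value TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO small_dataset (id, value) VALUES (?, ?)",
        ((i, f"value-{i}") for i in range(100_000)),
    )

def lookup(conn, row_id):
    """Point lookup by primary key; returns a dict, or None if the id is absent."""
    cur = conn.execute(
        "SELECT id, value FROM small_dataset WHERE id = ?", (row_id,)
    )
    row = cur.fetchone()
    return None if row is None else {"id": row[0], "value": row[1]}

# Time a single lookup; the measured value depends on the machine.
start = time.perf_counter()
result = lookup(conn, 4242)
elapsed_ms = (time.perf_counter() - start) * 1000
print(result, f"{elapsed_ms:.3f} ms")
```

An API layer would wrap `lookup` in an HTTP handler. The point is that the query hits an index on the key rather than scanning files, which is what makes millisecond-range responses plausible.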


2 REPLIES

Vidula
Honored Contributor

Hi @Jan R,

Does @Tian Tan's response answer your question? If so, would you be willing to mark it as the best answer so that other members can find the solution more quickly?

We'd love to hear from you.

Thanks!
