cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Is databricks capable of housing OLTP and OLAP?

TinasheChinyati
New Contributor III

Hi data experts.

I currently have an OLTP (Azure SQL DB) that keeps data only for the past 14 days. We use Partition switching to achieve that and have an ETL (Azure data factory) process that feeds the Datawarehouse (Azure Synapse Analytics). My requirement is to migrate the OLTP and OLAP into the lakehouse within databricks. I will have a structured streaming from Eventhub to Bronze then have OLTP as silver and OLAP as Gold. Is this something that is possible within databricks. If so, can you suggest ways to achieve this possibly using delta tables or delta live tables. Thank you

6 REPLIES 6

ChrisCkx
New Contributor II

Hi @Retired_mod I have looked at this topic extensively and have even tried to implement it.
I am a champion of databricks at my organization, but I do not think that it currently enables the OLTP scenarios.

The closest I have gotten to it is by using the Statement Execution REST API using a SQL warehouse as the compute. This avoids the 3~5 minute delay on cluster bootup.

But even with that approach, simple SELECT queries against delta tables are still executed in the order of 5~10 seconds where as any decent OLTP (Oracle, MSSQL, mysql) will perform one order of magnitude faster (under 500 ms).

bsanoop
New Contributor II

Hi @Retired_mod 
Is there any update on this recently? As @ChrisCkx mentioned, we have tried implementing SELECT SQL statements and the latencies are similar to his observations. Are there any solutions that allows me to query with few-100 ms of latency?

Gabriele_Giuff
New Contributor II

HI, i'm very interested on this topic, i'd like to understand if actually Databricks is capable to achieve the OLTP performance while retrieving a specific records making some join select with other entities(typical OLTP read workload). if it is not, maybe i should store the same data into a speed layer in order to serve the inquiry requests faster (e.g using a NoSql DB).

Thanks!

Hi @Gabriele_Giuff ,

I would not use it as an OLTP system. There are few reasons

- lakehouse is heavily dependent on 'big data file formats' like parquet, delta lake, orc, iceberg etc. which are typically immutable

- in an oltp system you have to do a lot of small synchrone updates which is cumbersome in a lakehouse (because of the immutability of the files).

- in oltp workload you have highly normalized data model which means a lot of table that need to be joined, so another reason why databricks it's not the best fit for this kind of workload.

- you don't have enforced relational constraint as in traditional rdbms

Databricks really shines in doing large scale ingest, transformation and analysis on broad swaths of data in the GB/TB/PB range and not particularly on point/needle in a haystack type lookups
You can use it as an OLTP, but in my opinion it would not be as performant as engine specifically crafted for OLPT kind of workload.

bsanoop
New Contributor II

@szymon_dybczak 
Thanks for your explanation. 

While I understand the limitations of Databricks as a OLTP read system, is there any solution at all that is read-optimized? Like an OLAP layer which optimizes for both aggregation and reads with low latency.

Ben_dHont
New Contributor II

@ChrisCkx and @bsanoop

Databricks is currently building a OLTP database functionality which is currently in private preview. It is a serverless PostgREST database. Documentation can be found here: [EXTERNAL] Online Tables REST - Private Preview Documentation [v2.0].pdf - Google Drive.

Contact your Databricks account manager to be added to the private preview.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group