12-08-2023 11:29 PM
Hi data experts.
I currently have an OLTP system (Azure SQL DB) that keeps data only for the past 14 days; we use partition switching to achieve that, and an ETL process (Azure Data Factory) feeds the data warehouse (Azure Synapse Analytics). My requirement is to migrate both the OLTP and OLAP workloads into a lakehouse within Databricks. I would run Structured Streaming from Event Hub to Bronze, then have the OLTP layer as Silver and the OLAP layer as Gold. Is this something that is possible within Databricks? If so, can you suggest ways to achieve it, possibly using Delta tables or Delta Live Tables? Thank you
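For context, the Bronze ingestion I have in mind would look roughly like this. A minimal sketch, assuming the Event Hubs namespace exposes its Kafka-compatible endpoint; the namespace, secret scope, checkpoint path, and table names below are placeholders:

```python
# Minimal sketch of the Bronze hop: Structured Streaming from Azure Event Hubs
# (via its Kafka-compatible endpoint) into a Delta table.
# Namespace, secret scope, checkpoint path, and table name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

EH_NAMESPACE = "my-eventhubs-namespace"   # placeholder
EH_NAME = "my-event-hub"                  # placeholder
# Event Hubs Kafka auth uses the literal username "$ConnectionString"
EH_CONN_STR = dbutils.secrets.get("my-scope", "eh-connection-string")  # placeholder

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
    .option("subscribe", EH_NAME)
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        # the "kafkashaded" prefix is specific to the Databricks Runtime
        'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
        f'required username="$ConnectionString" password="{EH_CONN_STR}";',
    )
    .load()
)

(
    raw.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/bronze_events")  # placeholder
    .toTable("lakehouse.bronze.events")                                    # placeholder
)
```

For the Silver/Gold hops, Delta Live Tables would let me declare the same transformations as @dlt.table-decorated functions and have the framework manage checkpoints and dependencies.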
05-17-2024 06:59 AM
Hi @Retired_mod, I have looked at this topic extensively and have even tried to implement it.
I am a champion of Databricks at my organization, but I do not think it currently enables OLTP scenarios.
The closest I have gotten is using the Statement Execution REST API with a SQL warehouse as the compute. This avoids the 3-5 minute delay of cluster boot-up.
But even with that approach, simple SELECT queries against Delta tables still execute on the order of 5-10 seconds, whereas any decent OLTP database (Oracle, MSSQL, MySQL) will perform an order of magnitude faster (under 500 ms).
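For reference, this is roughly how I measured it. A minimal sketch of a point SELECT via the Statement Execution API against a running SQL warehouse; HOST, TOKEN, WAREHOUSE_ID, and the table name are placeholders for my workspace values:

```python
# Minimal sketch of the latency test: a point SELECT via the Databricks
# Statement Execution API against a running SQL warehouse.
import time
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "dapi..."                                            # placeholder PAT
WAREHOUSE_ID = "abcdef1234567890"                            # placeholder

start = time.monotonic()
resp = requests.post(
    f"{HOST}/api/2.0/sql/statements/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": WAREHOUSE_ID,
        "statement": "SELECT * FROM silver.orders WHERE order_id = 42",
        "wait_timeout": "30s",  # block until the result is ready (max 50s)
    },
    timeout=60,
)
elapsed = time.monotonic() - start
resp.raise_for_status()
print(f"state={resp.json()['status']['state']}, latency={elapsed:.2f}s")
```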
09-04-2024 01:35 AM
Hi @Retired_mod
Is there any update on this recently? As @ChrisCkx mentioned, we have tried implementing SELECT statements and the latencies match his observations. Is there any solution that would allow me to query with a latency of a few hundred milliseconds?
09-20-2024 03:51 AM
Hi, I'm very interested in this topic. I'd like to understand whether Databricks can actually achieve OLTP performance when retrieving a specific record and joining it with other entities (a typical OLTP read workload). If it can't, maybe I should store the same data in a speed layer (e.g. a NoSQL DB) in order to serve inquiry requests faster.
Thanks!
09-20-2024 04:15 AM
Hi @Gabriele_Giuff ,
I would not use it as an OLTP system. There are a few reasons:
- the lakehouse is heavily dependent on 'big data' file formats like Parquet, Delta Lake, ORC, and Iceberg, which are typically immutable
- in an OLTP system you have to do a lot of small synchronous updates, which is cumbersome in a lakehouse because of the immutability of the files (see the sketch at the end of this post)
- an OLTP workload has a highly normalized data model, which means a lot of tables that need to be joined; another reason why Databricks isn't the best fit for this kind of workload
- you don't have enforced relational constraints as in a traditional RDBMS
Databricks really shines at large-scale ingest, transformation, and analysis of broad swaths of data in the GB/TB/PB range, not at point, needle-in-a-haystack lookups.
You can use it as an OLTP system, but in my opinion it would not be as performant as an engine specifically crafted for OLTP workloads.
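To make the second bullet concrete, here is a minimal sketch (the table name is a placeholder):

```python
# Minimal sketch of why small synchronous updates are cumbersome on Delta:
# a single-row UPDATE forces a copy-on-write of the Parquet file(s) holding
# that row. Table name is a placeholder.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

orders = DeltaTable.forName(spark, "silver.orders")  # placeholder

# One logical row changes, but Delta rewrites every file containing it.
orders.update(
    condition=F.col("order_id") == 42,
    set={"status": F.lit("SHIPPED")},
)
```

At OLTP frequencies this means constant file rewrites and small-file buildup, which is exactly what row-store engines with in-place page updates are built to avoid.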
09-20-2024 07:02 AM
@szymon_dybczak
Thanks for your explanation.
While I understand the limitations of Databricks as an OLTP read system, is there any solution at all that is read-optimized? Something like an OLAP layer that is optimized for both aggregation and low-latency reads.
10-13-2024 11:42 PM
Databricks is currently building OLTP database functionality, which is in private preview: a serverless PostgREST database. Documentation can be found here: [EXTERNAL] Online Tables REST - Private Preview Documentation [v2.0].pdf - Google Drive.
Contact your Databricks account manager to be added to the private preview.