12-08-2023 11:29 PM
Hi data experts.
I currently have an OLTP system (Azure SQL DB) that keeps data only for the past 14 days; we use partition switching to achieve that, and an ETL process (Azure Data Factory) feeds the data warehouse (Azure Synapse Analytics). My requirement is to migrate both the OLTP and OLAP workloads into a lakehouse within Databricks. I would run Structured Streaming from Event Hub to Bronze, then have the OLTP layer as Silver and the OLAP layer as Gold. Is this something that is possible within Databricks? If so, can you suggest ways to achieve it, possibly using Delta tables or Delta Live Tables? Thank you
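For context, the Bronze ingestion I have in mind would look roughly like this. A minimal sketch, assuming the Event Hubs namespace exposes its Kafka-compatible endpoint; the namespace, secret scope, checkpoint path, and table names below are placeholders:

```python
# Minimal sketch of the Bronze hop: Structured Streaming from Azure Event Hubs
# (via its Kafka-compatible endpoint) into a Delta table.
# Namespace, secret scope, checkpoint path, and table name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

EH_NAMESPACE = "my-eventhubs-namespace"   # placeholder
EH_NAME = "my-event-hub"                  # placeholder
# Event Hubs Kafka auth uses the literal username "$ConnectionString"
EH_CONN_STR = dbutils.secrets.get("my-scope", "eh-connection-string")  # placeholder

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
    .option("subscribe", EH_NAME)
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        # the "kafkashaded" prefix is specific to the Databricks Runtime
        'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
        f'required username="$ConnectionString" password="{EH_CONN_STR}";',
    )
    .load()
)

(
    raw.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/bronze_events")  # placeholder
    .toTable("lakehouse.bronze.events")                                    # placeholder
)
```

For the Silver/Gold hops, Delta Live Tables would let me declare the same transformations as @dlt.table-decorated functions and have the framework manage checkpoints and dependencies.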
05-17-2024 06:59 AM
Hi @Retired_mod, I have looked at this topic extensively and have even tried to implement it.
I am a champion of Databricks at my organization, but I do not think it currently enables OLTP scenarios.
The closest I have gotten is using the Statement Execution REST API with a SQL warehouse as the compute. This avoids the 3-5 minute delay of cluster boot-up.
But even with that approach, simple SELECT queries against Delta tables still execute on the order of 5-10 seconds, whereas any decent OLTP database (Oracle, MSSQL, MySQL) will perform an order of magnitude faster (under 500 ms).
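For reference, this is roughly how I measured it. A minimal sketch of a point SELECT via the Statement Execution API against a running SQL warehouse; HOST, TOKEN, WAREHOUSE_ID, and the table name are placeholders for my workspace values:

```python
# Minimal sketch of the latency test: a point SELECT via the Databricks
# Statement Execution API against a running SQL warehouse.
import time
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "dapi..."                                            # placeholder PAT
WAREHOUSE_ID = "abcdef1234567890"                            # placeholder

start = time.monotonic()
resp = requests.post(
    f"{HOST}/api/2.0/sql/statements/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": WAREHOUSE_ID,
        "statement": "SELECT * FROM silver.orders WHERE order_id = 42",
        "wait_timeout": "30s",  # block until the result is ready (max 50s)
    },
    timeout=60,
)
elapsed = time.monotonic() - start
resp.raise_for_status()
print(f"state={resp.json()['status']['state']}, latency={elapsed:.2f}s")
```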
09-04-2024 01:35 AM
Hi @Retired_mod
Is there any update on this recently? As @ChrisCkx mentioned, we have tried implementing SELECT statements and the latencies match his observations. Is there any solution that would allow me to query with a latency of a few hundred milliseconds?
09-20-2024 03:51 AM
Hi, I'm very interested in this topic. I'd like to understand whether Databricks can actually achieve OLTP performance when retrieving a specific record and joining it with other entities (a typical OLTP read workload). If it can't, maybe I should store the same data in a speed layer (e.g. a NoSQL DB) in order to serve inquiry requests faster.
Thanks!
09-20-2024 04:15 AM
Hi @Gabriele_Giuff ,
I would not use it as an OLTP system. There are a few reasons:
- the lakehouse is heavily dependent on 'big data' file formats like Parquet, Delta Lake, ORC, and Iceberg, which are typically immutable
- in an OLTP system you have to do a lot of small synchronous updates, which is cumbersome in a lakehouse because of the immutability of the files (see the sketch at the end of this post)
- an OLTP workload has a highly normalized data model, which means a lot of tables that need to be joined; another reason why Databricks isn't the best fit for this kind of workload
- you don't have enforced relational constraints as in a traditional RDBMS
Databricks really shines at large-scale ingest, transformation, and analysis of broad swaths of data in the GB/TB/PB range, not at point, needle-in-a-haystack lookups.
You can use it as an OLTP system, but in my opinion it would not be as performant as an engine specifically crafted for OLTP workloads.
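To make the second bullet concrete, here is a minimal sketch (the table name is a placeholder):

```python
# Minimal sketch of why small synchronous updates are cumbersome on Delta:
# a single-row UPDATE forces a copy-on-write of the Parquet file(s) holding
# that row. Table name is a placeholder.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

orders = DeltaTable.forName(spark, "silver.orders")  # placeholder

# One logical row changes, but Delta rewrites every file containing it.
orders.update(
    condition=F.col("order_id") == 42,
    set={"status": F.lit("SHIPPED")},
)
```

At OLTP frequencies this means constant file rewrites and small-file buildup, which is exactly what row-store engines with in-place page updates are built to avoid.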
09-20-2024 07:02 AM
@szymon_dybczak
Thanks for your explanation.
While I understand the limitations of Databricks as an OLTP read system, is there any solution at all that is read-optimized? Something like an OLAP layer that is optimized for both aggregation and low-latency reads.
10-13-2024 11:42 PM
Databricks is currently building OLTP database functionality, which is in private preview: a serverless PostgREST database. Documentation can be found here: [EXTERNAL] Online Tables REST - Private Preview Documentation [v2.0].pdf - Google Drive.
Contact your Databricks account manager to be added to the private preview.