Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Exploring the Use of Databricks as a Transactional Database

amoralca
New Contributor

Hey everyone, I'm currently working on a project where my team is thinking about using Databricks as a transactional database for our backend application. We're familiar with Databricks for analytics and big data processing, but we're not sure if it's the right fit for handling real-time transactional workloads. Has anyone in the community successfully used Databricks for this purpose? Is it a good idea, or would it be better to stick with traditional transactional databases? If you have any experience, success stories, or advice, I'd really appreciate hearing about it. Looking forward to your insights! Best,

3 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @amoralca ,

Databricks is mainly used for big data processing. In my opinion, it's not the best choice for an OLTP database. You spin up all those cluster nodes, but your workload is transactional in nature, so you're wasting all that compute power.

Additionally, the lakehouse depends heavily on 'big data file formats' like Parquet, Delta Lake, ORC, Iceberg, etc. These are typically immutable. In an OLTP system you have to do a lot of small synchronous updates, which is cumbersome in a lakehouse.
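
To make the immutability point concrete, here is a toy sketch in plain Python. It is not Databricks or Delta Lake code: `ImmutableFile` and `update_row` are hypothetical names made up for illustration, and the file sizes are arbitrary. It only models the general copy-on-write idea, where changing one row means writing a new copy of the whole file that holds it. Real table formats have their own optimizations, so treat this as an intuition aid rather than a description of how any particular engine behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmutableFile:
    """Hypothetical stand-in for one immutable data file (e.g. a Parquet file)."""
    rows: tuple  # tuple of (key, value) pairs


def update_row(files, key, new_value):
    """Copy-on-write update: rewrite every file that contains `key`.

    Returns the new list of files and the number of rows that had to be rewritten.
    """
    new_files = []
    rows_rewritten = 0
    for f in files:
        if any(k == key for k, _ in f.rows):
            # The file cannot be edited in place, so build a full replacement.
            rewritten = tuple((k, new_value if k == key else v) for k, v in f.rows)
            new_files.append(ImmutableFile(rewritten))
            rows_rewritten += len(rewritten)
        else:
            # Untouched files are reused as-is.
            new_files.append(f)
    return new_files, rows_rewritten


# Two files of 1,000 rows each (sizes are arbitrary, for illustration only).
files = [
    ImmutableFile(tuple((i, "old") for i in range(0, 1000))),
    ImmutableFile(tuple((i, "old") for i in range(1000, 2000))),
]

files_after, rows_rewritten = update_row(files, key=42, new_value="new")
print(rows_rewritten)  # 1000: one logical row changed, one whole file rewritten
```

In this toy model a single-row update rewrites 1,000 rows, which is the kind of overhead that makes lots of small updates awkward on immutable files.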


But this is an interesting question, and I'd like to hear more voices on this topic.

Edthehead
Contributor II

My 2 cents: the Databricks Lakehouse is like a DWH, similar to an Azure Synapse dedicated pool, and is meant for a certain purpose. With all that power comes a limit on concurrency and on the number of queries that can run in parallel. So it's great if you are loading large amounts of data into it or running analytical queries. But if you are going to have hundreds to thousands of queries and inserts, I do not see it as a good fit. These small queries and single inserts will not be making use of Spark at all. Normal SQL DBs come with comparatively lower storage limits but handle concurrency well for small queries and inserts. Technically, though, you can still use a Databricks lakehouse as an OLTP DB.

movmarcos
New Contributor II

I have a similar situation in my data quality check process. During this stage, I frequently find errors or potential issues that can stop the pipeline. Each of these errors requires manual intervention, which might involve making edits or supplying justifications for the discrepancies. Once all the issues are resolved, the pipeline can resume its operation without any problems.

I was considering two options:

  1. Using Databricks.
  2. Pushing the data to AWS DynamoDB and getting the response back to continue the process.

What are your thoughts on these options?

Context: it is a multi-tenant process with many clients.
