Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Exploring the Use of Databricks as a Transactional Database

amoralca
New Contributor

Hey everyone, I'm currently working on a project where my team is thinking about using Databricks as a transactional database for our backend application. We're familiar with Databricks for analytics and big data processing, but we're not sure if it's the right fit for handling real-time transactional workloads.

Has anyone in the community successfully used Databricks for this purpose? Is it a good idea, or would it be better to stick with traditional transactional databases? If you have any experience, success stories, or advice, I'd really appreciate hearing about it.

Looking forward to your insights!

Best,

4 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @amoralca ,

Databricks is mainly used for big data processing. In my opinion it's not the best choice for an OLTP database. You spin up all those cluster nodes, but your workload is transactional in nature, so you're wasting all that compute power.

Additionally, a lakehouse is heavily dependent on "big data file formats" like Parquet, Delta Lake, ORC, Iceberg, etc. These are typically immutable. In an OLTP system you have to do a lot of small synchronous updates, which is cumbersome in a lakehouse.
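To make that concrete, here's a rough sketch of what a single point update looks like against a Delta table (the catalog, table, and column names are made up for illustration, not from any real setup):

```python
# Hypothetical single-row update against a Delta table on Databricks.
# The table main.sales.orders and its columns are illustrative only.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # provided as `spark` in Databricks notebooks

orders = DeltaTable.forName(spark, "main.sales.orders")

# This rewrites every data file containing a matching row and commits a new
# table version to the transaction log -- cheap occasionally, expensive when an
# application fires thousands of such point updates per second.
orders.update(
    condition="order_id = 42",
    set={"status": "'SHIPPED'"},
)
```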


But this is an interesting question and I'd like to hear more voices on this topic.

Retired_mod
Esteemed Contributor III

Hi @amoralca, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will help close the thread and assist others with similar queries. We appreciate your participation and are here if you need further assistance!

Edthehead
Contributor III

My 2 cents: the Databricks Lakehouse is like a DWH, similar to an Azure Synapse dedicated pool, and meant for a certain purpose. With all that power comes a limitation in concurrency and in the number of queries that can run in parallel. So it's great if you are loading large data into it or performing analytical queries. But if you are going to have hundreds to thousands of queries and inserts, I do not see it as a good fit. Those queries and single inserts will not be using Spark at all. Normal SQL DBs come with comparatively lower storage limits but have good concurrency for small queries and inserts. Technically, though, you can still use a Databricks lakehouse as an OLTP DB.
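Just to illustrate what that would look like from the application side, here's a minimal sketch of a single-row insert going through a Databricks SQL warehouse with the databricks-sql-connector package (the hostname, HTTP path, token, and table are all placeholders, not anything from this thread):

```python
# Sketch only: a single "transactional" insert sent to a Databricks SQL warehouse.
# All connection details and the main.app.app_events table are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123def456",
    access_token="dapi-REDACTED",
) as connection:
    with connection.cursor() as cursor:
        # Every statement is a separate warehouse query. With hundreds or
        # thousands of concurrent app users these queue behind the warehouse's
        # concurrency limit, unlike row-level writes in a traditional OLTP database.
        cursor.execute(
            "INSERT INTO main.app.app_events (event_id, payload) "
            "VALUES ('evt-001', '{\"action\": \"click\"}')"
        )
```

Each round trip also carries query overhead that a conventional database simply doesn't have for a point insert.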

movmarcos
New Contributor II

I have a similar situation in my data quality check process. During this stage, I frequently find errors or potential issues that can stop the pipeline. Each of these errors requires manual intervention, which might involve making edits or supplying justifications for the discrepancies. Once all the issues are resolved, the pipeline can resume its operation without any problems.

I was considering two options:

  1. Using Databricks.
  2. Pushing the data to AWS DynamoDB and getting the response back to continue the process (rough sketch below).

What are your thoughts on these options?

Context: it is a multi-tenant process with many clients.
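For option 2, the rough shape I had in mind is something like this: park each issue in DynamoDB keyed by tenant, and have the pipeline step poll until a reviewer marks it resolved. The table name, key schema, and attribute names below are assumptions for illustration only:

```python
# Rough sketch of option 2: record each data-quality issue in DynamoDB and block
# the pipeline step until a reviewer resolves it. The dq_issues table, its key
# schema (tenant_id + issue_id), and the status/detail attributes are assumptions.
import time
import boto3

dynamodb = boto3.resource("dynamodb")
issues = dynamodb.Table("dq_issues")

def wait_for_resolution(tenant_id: str, issue_id: str, poll_seconds: int = 60) -> dict:
    """Poll the issue until a reviewer sets its status to RESOLVED, then return it."""
    while True:
        item = issues.get_item(
            Key={"tenant_id": tenant_id, "issue_id": issue_id}
        ).get("Item", {})
        if item.get("status") == "RESOLVED":
            return item  # carries the edit or justification supplied by the reviewer
        time.sleep(poll_seconds)

# Register an issue found during the quality check, then pause until it is handled.
issues.put_item(Item={
    "tenant_id": "client-a",
    "issue_id": "2024-06-01-0001",
    "status": "OPEN",
    "detail": "row count mismatch in daily load",
})
resolution = wait_for_resolution("client-a", "2024-06-01-0001")
```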
