Exploring the Use of Databricks as a Transactional Database
08-13-2024 02:21 AM
Hey everyone, I'm currently working on a project where my team is thinking about using Databricks as a transactional database for our backend application. We're familiar with Databricks for analytics and big data processing, but we're not sure if it's the right fit for handling real-time transactional workloads. Has anyone in the community successfully used Databricks for this purpose? Is it a good idea, or would it be better to stick with traditional transactional databases? If you have any experience, success stories, or advice, I'd really appreciate hearing about it. Looking forward to your insights! Best,
08-13-2024 02:38 AM
Hi @amoralca,
Databricks is mainly used for big data processing. In my opinion it's not the best choice for an OLTP database: you spin up all those cluster nodes, but your workload is transactional in nature, so you're wasting most of that compute power.
Additionally, the lakehouse is heavily dependent on big data file formats like Parquet, Delta Lake, ORC, Iceberg, etc. These are essentially immutable. In an OLTP system you have to do a lot of small synchronous updates, which is cumbersome in a lakehouse.
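To make the immutability point concrete, here's a minimal sketch (hypothetical table name, run in a Databricks notebook where `spark` is already defined). A single-row update is executed as a Spark job that rewrites whole Parquet data files and creates a new Delta commit, which is why many small synchronous updates get expensive:
```python
# Minimal sketch (PySpark in a Databricks notebook, where `spark` is the
# predefined SparkSession). The table name `main.app.orders` is hypothetical.

# A single-row, OLTP-style update...
spark.sql("""
    UPDATE main.app.orders
    SET status = 'SHIPPED'
    WHERE order_id = 42
""")

# ...runs as a Spark job: Delta locates every data file containing a matching
# row, rewrites those Parquet files entirely, and commits a new table version.
# One commit (and a set of rewritten files) per update.
spark.sql("DESCRIBE HISTORY main.app.orders LIMIT 5").show(truncate=False)
```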
But this is an interesting question and I'd like to hear more voices on this topic.
08-14-2024 12:52 AM
Hi @amoralca, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community. If the response resolves your issue, kindly mark it as the accepted solution. This will help close the thread and assist others with similar queries. We appreciate your participation and are here if you need further assistance!
08-14-2024 09:02 PM
My 2 cents: the Databricks lakehouse is like a DWH, similar to an Azure Synapse dedicated pool, and is meant for a certain purpose. With all that power comes a limitation in concurrency and in the number of queries that can run in parallel. So it's great if you are loading large volumes of data or running analytical queries. But if you are going to have hundreds to thousands of small queries and single-row inserts, I don't see it as a good fit; those queries and inserts won't really benefit from Spark at all. Normal SQL DBs come with comparatively lower storage limits but offer good concurrency for small queries and inserts. Technically, though, you can still use a Databricks lakehouse as an OLTP DB.
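To illustrate the point about small queries and inserts, here's a rough sketch (hypothetical table, Databricks notebook where `spark` is predefined). Each single-row INSERT is its own Spark job and Delta commit, so per-statement latency is much higher than in a typical OLTP database, and the table fills up with small files that later need compaction:
```python
import time

# `spark` is the SparkSession that Databricks notebooks provide by default.
# The table name is hypothetical.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.app.orders (
        order_id BIGINT, status STRING, amount DOUBLE
    ) USING DELTA
""")

# Simulate OLTP-style traffic: one row per statement.
start = time.time()
for i in range(100):
    spark.sql(f"INSERT INTO main.app.orders VALUES ({i}, 'NEW', 9.99)")
elapsed = time.time() - start

# Each INSERT is a separate Spark job and Delta commit, typically adding one
# new small Parquet file, so the table soon needs OPTIMIZE / auto-compaction.
print(f"{elapsed / 100:.2f} s per single-row insert on average")
```
In a traditional OLTP database the same loop would be simple row-level writes with millisecond latency and no file compaction to worry about.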
12-12-2024 11:23 AM
I have a similar situation in my data quality check process. During this stage, I frequently find errors or potential issues that can stop the pipeline. Each of these errors requires manual intervention, which might involve making edits or supplying justifications for the discrepancies. Once all the issues are resolved, the pipeline can resume its operation without any problems.
I was considering two options:
- Using Databricks.
- Pushing the data to AWS DynamoDB and getting the response back to continue the process (rough sketch below).
What are your thoughts on these options?
Context: it is a multi-tenant process serving many clients.
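For option 2, here's a rough sketch of what I have in mind with boto3 (the table name, key schema, and attributes are placeholders for illustration): failed checks are written to a DynamoDB table keyed by tenant, and the pipeline polls until a reviewer marks the item as resolved.
```python
# Rough sketch of option 2. The table name "dq_exceptions", its key schema
# (partition key "tenant_id", sort key "check_id"), and the attributes are
# assumptions for illustration only.
import time
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("dq_exceptions")

def raise_exception(tenant_id: str, check_id: str, details: str) -> None:
    """Record a failed data-quality check that needs manual review."""
    table.put_item(
        Item={
            "tenant_id": tenant_id,
            "check_id": check_id,
            "details": details,
            "status": "PENDING",
        }
    )

def wait_until_resolved(tenant_id: str, check_id: str, poll_seconds: int = 60) -> dict:
    """Block the pipeline until a reviewer marks the item RESOLVED."""
    while True:
        item = table.get_item(
            Key={"tenant_id": tenant_id, "check_id": check_id}
        ).get("Item", {})
        if item.get("status") == "RESOLVED":
            return item  # carries the reviewer's edits or justification
        time.sleep(poll_seconds)
```
A Databricks-only alternative could be a Delta "quarantine" table that reviewers edit, since these manual interventions are low-volume rather than high-concurrency OLTP traffic.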

