Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Using Databricks as an application database?

646901
New Contributor II

Is Databricks suitable to be used as an application database? I have been asked to build a fairly large CRM-type app, and Databricks will use the data from it for analysis. I am thinking that if I built the application database inside of Databricks, I could simplify the later process of getting the data out of an SQL-type database.

Is Databricks suitable as an application database, or should it only be used for data lake and warehousing?

6 REPLIES

karthik_p
Esteemed Contributor

@Matt User Databricks is mainly used for big data processing, or for cases where you need cleansing/transformations and finally want to consume the data in BI. If you only need a normal application database without any cleansing, then that is your call. We will wait for community members' responses and see if anyone is using Databricks as a normal application database.

Anonymous
Not applicable

Hi @Matt User

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!

1782390347
New Contributor II

Our company adopted Databricks in early 2021, and through naivety Databricks was going to be the "source of truth" for all our data and applications. We could build apps, reports, and analysis all in one place; it seemed like the holy grail of consolidating everything into one place.

Our organization has since moved away from this model, and now all applications are linked to Databricks but hold their own data storage. In hindsight it was not a smart idea to make Databricks the one database to rule them all.

The more we pinned applications to the Databricks data model, the more fragile things became: application coupling slowed down teams, and there was a dilution of what data mattered in our data lake/warehouse.

Handling session and application-specific state was increasingly painful, as was scaling the apps.

We were always compromising the application to make things work the PySpark/Databricks way. Even managing application data inside of Databricks just added extra complexity and clunkiness.

The schemas, ETL, and processing became intertwined to the extent that teams felt like they were fighting a losing battle daily, where one change would break other things.

Yes, the accessibility of data was "easy", but at the cost of most other things.

IMHO (and as written in the Databricks docs), Databricks is a lakehouse/warehouse, and it should be kept that way. You can add applications to Databricks and it will seem fast and easy, until you are compromising and fighting fires daily.

pvignesh92
Honored Contributor

Hi @Matt User. If I understand your question, you want to consider using Delta Lake for your CRM database.

  1. Latency -> How much read and write latency are you expecting from Delta Lake? The way data is organized in databases is much different than in Delta Lake, as the data will be sitting in different blocks in a cloud object store.
  2. Concurrency -> How many read and write transactions are you expecting to happen in Delta Lake? Currently Delta Lake supports snapshot isolation for reads, I believe. How do you want your reads and writes to be isolated?
  3. Indexing -> There is no indexing or primary key concept in Delta Lake. Are you expecting something like this to speed up your searches?

There are many such areas that you need to explore before coming to a conclusion about what to choose for this purpose. Hope this helps.
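To make the indexing point concrete, here is a minimal PySpark sketch (the crm.customers table and its columns are hypothetical). A filtered read on a Delta table is served by scanning Parquet files with min/max data skipping rather than by an index seek, and OPTIMIZE ... ZORDER BY is roughly the closest analogue to an index:

# `spark` is the SparkSession a Databricks notebook already provides.
# Hypothetical CRM table; names are for illustration only.
spark.sql("""
    CREATE TABLE IF NOT EXISTS crm.customers (
        customer_id BIGINT,
        email STRING,
        updated_at TIMESTAMP
    ) USING DELTA
""")

# A "point lookup" like this is a file scan with data skipping,
# not an index seek as in an OLTP database.
spark.table("crm.customers").filter("customer_id = 42").show()

# Closest substitute for an index: co-locate files on the lookup key.
spark.sql("OPTIMIZE crm.customers ZORDER BY (customer_id)")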

646901
New Contributor II

@Vigneshraja Palaniraj

Latency under 50-100 ms to the database would be ideal; once we start adding 2-5 queries in a request, the time really starts to compound and add up.

Concurrency - the number of users will initially be 5-10, but this will expand as we add more functionality and tools, and there may be APIs and other integrations added over time.

Indexing - we will need a way to cache, as indexing is a limitation. Some tables could be hundreds of thousands of rows, with the need to join data.
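As a rough sanity check on that latency budget, something like the following could be run from a notebook to time repeated lookups (the query and table are hypothetical); per-query planning and file-listing overhead is what compounds when a single request issues several queries:

import time

# `spark` is the SparkSession a Databricks notebook already provides.
# Hypothetical query; adjust to the real schema.
QUERY = "SELECT * FROM crm.customers WHERE customer_id = 42"

timings_ms = []
for _ in range(10):
    start = time.perf_counter()
    spark.sql(QUERY).collect()   # force execution; Spark plans lazily
    timings_ms.append((time.perf_counter() - start) * 1000)

print(f"median latency: {sorted(timings_ms)[len(timings_ms) // 2]:.0f} ms")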

pvignesh92
Honored Contributor

Hi @Matt User. Then with these requirements, you might have realized that Delta Lake is not the right choice for an application database. It is not possible to achieve a latency of 100 ms unless all your data is cached. With the introduction of DBSQL with Delta Lake and Photon, you can consider using it as an operational data warehouse, but it can't be a solution to your transactional database requirement. Please let me know if this helps.
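If the operational-warehouse route is taken, one option worth sketching is keeping a hot subset of data in memory with Spark SQL's CACHE TABLE (the table and filter below are hypothetical); this can help read latency for dashboards and lookups, but it does not provide the transactional semantics an application database needs:

# `spark` is the SparkSession a Databricks notebook already provides.
# Hypothetical hot lookup table; names are illustrative only.
spark.sql("""
    CACHE TABLE crm_customers_hot AS
    SELECT customer_id, email
    FROM crm.customers
    WHERE is_active = true
""")

# Subsequent reads hit the in-memory cache instead of cloud object storage.
spark.sql("SELECT * FROM crm_customers_hot WHERE customer_id = 42").show()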