08-31-2023 11:59 AM
Currently, in our company we are using ADF + Databricks for all batch integration. Using ADF, data is first copied to ADLS Gen2 (from different sources such as on-prem servers, FTP file-sharing solutions, etc.), then it is reformatted to CSV and copied into the delta lake, where all transformation and merging of data happens. Data is loaded into Salesforce using a custom Bulk API v2 connector. We plan to remove this additional step of first copying data to ADLS Gen2 and instead do the end-to-end ETL on Databricks. Could you please provide an architecture diagram or pros/cons for this upgrade? Feel free to ask questions.
Since it is all banking data, security might be a concern.
08-31-2023 12:36 PM
Also, another thing: most of the systems we connect to are on-prem servers, so we use a self-hosted integration runtime in ADF. How can this be done in Databricks?
09-01-2023 05:36 AM
Hey Ruby,
Here are a few options for you:
1. Use Databricks Partner Connect (Fivetran, Qlik Replicate, etc.) to send data from on-prem directly into raw/bronze Delta tables - there is additional cost for these tools.
2. Set up a connection between the Databricks workspace and the on-prem network using ExpressRoute - this allows you to create external connections and foreign catalogs directly on your on-prem data sources.
3. Use a custom technique to send data from on-prem data sources to Event Hubs / Kafka streams, then use Spark Structured Streaming / Delta Live Tables to ingest the data into your bronze zone (see the sketch after this list).
4. Replicate your on-prem data sources to the cloud - Azure offers replication options for most database technologies. You can then use option #2 to connect to the databases directly from your Databricks environment.
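For option 3, here is a rough PySpark sketch of what an Event Hubs (Kafka-compatible endpoint) to bronze ingestion could look like with Structured Streaming. It assumes a Databricks notebook (so spark and dbutils are available), and every name - namespace, topic, secret scope, storage path, table - is just a placeholder for illustration:

```python
from pyspark.sql.functions import col, current_timestamp

# Placeholders - replace with your own Event Hubs namespace, topic and secret scope.
eh_namespace = "my-eventhubs-namespace"
topic = "onprem-cdc-feed"
connection_string = dbutils.secrets.get("ingest-scope", "eh-connection-string")

# Read the Event Hubs stream through its Kafka-compatible endpoint.
raw_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", f"{eh_namespace}.servicebus.windows.net:9093")
    .option("subscribe", topic)
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";',
    )
    .load()
)

# Keep the raw payload plus some ingestion metadata in the bronze table.
bronze = raw_stream.select(
    col("key").cast("string"),
    col("value").cast("string").alias("payload"),
    col("topic"),
    col("timestamp").alias("event_ts"),
    current_timestamp().alias("ingested_at"),
)

(
    bronze.writeStream
    .format("delta")
    .option("checkpointLocation", "abfss://bronze@mystorage.dfs.core.windows.net/_checkpoints/onprem_cdc")
    .outputMode("append")
    .toTable("bronze.onprem_cdc_feed")
)
```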
Each option comes with its own cost, security, restorability, and recovery constraints. We have the same architecture as yours: we use Auto Loader to ingest data into bronze tables and also run micro-batch upserts on the stream to update raw tables based on primary keys. For every source table we have two tables, one append-only and another maintained with a merge. This gives us good flexibility for reprocessing, recovery, and running multiple tables in parallel in a loop using ADF orchestration. Adding new data sources and schemas is very easy as it is all templatized. A rough sketch of that pattern follows.
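A minimal sketch of that Auto Loader + merge pattern, again assuming a Databricks notebook; the paths, table names and the account_id key are placeholders, and the "current" table is assumed to exist already:

```python
from delta.tables import DeltaTable

# Placeholders - point these at your own landing zone and checkpoint location.
landing_path = "abfss://landing@mystorage.dfs.core.windows.net/crm/accounts/"
checkpoint = "abfss://bronze@mystorage.dfs.core.windows.net/_checkpoints/accounts"

# Auto Loader incrementally picks up new CSV files from the landing zone.
stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("cloudFiles.schemaLocation", checkpoint + "/schema")
    .load(landing_path)
)

def upsert_batch(batch_df, batch_id):
    # 1) keep the full history in an append-only bronze table
    batch_df.write.mode("append").saveAsTable("bronze.accounts_append")

    # 2) merge the latest records into the "current state" table by primary key
    #    (bronze.accounts_current is assumed to exist already)
    target = DeltaTable.forName(spark, "bronze.accounts_current")
    (
        target.alias("t")
        .merge(batch_df.alias("s"), "t.account_id = s.account_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    stream.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", checkpoint)
    .trigger(availableNow=True)  # run as a micro-batch job, e.g. orchestrated in a loop from ADF
    .start()
)
```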
09-01-2023 10:42 AM
Is there any benefit to doing the extract part in Databricks itself, unlike our current architecture where we first load to ADLS using ADF? I guess it is worth doing everything end to end in Databricks if there is better processing, lower latency, etc.
09-04-2023 02:19 PM
@shrikant_kulkar Is there any benefit to doing the extract part in Databricks itself, unlike our current architecture where we first load to ADLS using ADF? I guess it is worth doing everything end to end in Databricks if there is better processing, lower latency, etc.
09-04-2023 02:21 PM
@-werners- Is there any benefit to doing the extract part in Databricks itself, unlike our current architecture where we first load to ADLS using ADF? I guess it is worth doing everything end to end in Databricks if there is better processing, lower latency, etc.
09-05-2023 12:37 AM
The benefit is you only use a single system instead of two. But that's about it, IMO.
Data Factory is cheaper and works great for ingestion purposes. It has a lot of connectors and is basically serverless.
If Databricks offered such a product (serverless, cheap ingestion with lots of connectors), I wouldn't hesitate to kick ADF out.
But for now we keep using it.