ADF vs Databricks

Phani1 — Tue, 23 Jan 2024 07:49:34 GMT

Hi Team,

I would appreciate your suggestions on when to choose ADF (Azure Data Factory) versus Databricks for orchestration, and on any significant differences between the two.

Regards,

Phanindra

Re: ADF vs Databricks

Phani1 — Wed, 24 Jan 2024 06:59:43 GMT

Thank you for the response. Are there any significant differences between the two orchestration/job scheduling approaches, particularly in how they handle Databricks workflows and schedule components other than dbt objects?

Re: ADF vs Databricks

Michael_Galli — Wed, 24 Jan 2024 07:17:11 GMT

Hi, I work with both, so it depends on the use case.

ADF is easy to set up and good for data integration, e.g. a "Copy data" job to transfer files from storage 1 to storage 2.
ADF data flows (data transformations) can be used up to a point, but when the transformations get more complex, I recommend using Databricks notebooks with PySpark code.
I am not sure how much effort Microsoft will put into ADF data flows, as Fabric has Dataflows Gen2, which are completely different from the data flows in ADF.

So, for easy low-code data ingestion and moderate data transformations I recommend ADF, and for more extensive use cases I recommend Databricks workflows.
You can combine both (an ADF pipeline that runs a Databricks notebook), but then you have multiple Azure services to take care of in terms of version control and change management.
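
To make the combined setup a bit more concrete, here is a rough sketch of what such an ADF pipeline definition could look like, built as a Python dict and dumped to JSON. It chains a Copy activity to a DatabricksNotebook activity. All names here (the pipeline, datasets, linked service, notebook path and the run_date parameter) are made up for illustration, and a real pipeline would need matching datasets and a Databricks linked service defined in ADF, so treat it as a starting point rather than a working deployment.

```python
import json


def build_pipeline(notebook_path: str, run_date: str) -> dict:
    """Build a minimal ADF pipeline definition: copy files, then run a notebook.

    Every resource name below is a hypothetical placeholder.
    """
    copy_activity = {
        "name": "CopyRawFiles",
        "type": "Copy",
        # Dataset names are assumptions; they must exist in the factory.
        "inputs": [{"referenceName": "SourceFiles", "type": "DatasetReference"}],
        "outputs": [{"referenceName": "LandingFiles", "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BinarySource"},
            "sink": {"type": "BinarySink"},
        },
    }
    notebook_activity = {
        "name": "TransformWithNotebook",
        "type": "DatabricksNotebook",
        # Run the notebook only after the copy step has succeeded.
        "dependsOn": [
            {"activity": "CopyRawFiles", "dependencyConditions": ["Succeeded"]}
        ],
        # Linked service name is an assumption.
        "linkedServiceName": {
            "referenceName": "AzureDatabricksLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Passed to the notebook as widget values.
            "baseParameters": {"run_date": run_date},
        },
    }
    return {
        "name": "CopyThenTransform",
        "properties": {"activities": [copy_activity, notebook_activity]},
    }


if __name__ == "__main__":
    pipeline = build_pipeline("/Shared/transform_sales", "2024-01-24")
    print(json.dumps(pipeline, indent=2))
```

Note that the notebook itself still lives in the Databricks workspace (or its Git repo), while this JSON lives in the ADF repo, which is exactly the split in version control mentioned above.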