Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What ETL/ELT tool is used the most within this group?

Doug1
Contributor
 
26 REPLIES

PriyaAnanthram
Contributor III

It depends on the environment, I guess. On Azure I've used Azure Data Factory, Synapse/Databricks.

On AWS, more Snowflake + Matillion.

Interesting, thanks for sharing. Have you ever used Matillion on Azure? I would like to hear about experiences with that, as I hear of more Azure customers going for Matillion as an end-to-end platform.

Matillion does have connectors for Azure, and I am pretty sure it supports Azure well. I think the choice of tool varies from organisation to organisation and depends on the skill set of the team and how much they like to write code (lots of ETL/ELT is low-code).

Matillion does work very well with Azure, without a doubt. Here is a good comparative analysis of the tools: https://www.trustradius.com/compare-products/azure-data-factory-vs-matillion

Of course, if you're cloud agnostic, or if your org has a multi-cloud strategy, then Matillion makes more sense, as you can help your customers with a single end-to-end data transformation platform.

-werners-
Esteemed Contributor III

Azure here: Data Factory + Databricks + Synapse

What about on AWS or GCP? Even on Azure I hear great feedback on Matillion. If you have any experience, please share.

-werners-
Esteemed Contributor III

Easy: for transformations, code first approach is superior to low/no code tools.

So this leaves extract and orchestration. Keep that as cheap as possible, no need for Matillion.

Can't speak for people who do not like code though.

On AWS I'd try Glue.

Code is superior to low/no code tools? That is a debate in itself. Many tech people have a very different view on this, especially when you consider all the time you free up with a low/no code tool, particularly from a CIO $$ perspective 🙂

Data Factory is a good tool, though it only has Azure sources and targets (destinations), and compared to other tools like Matillion it covers limited transformation needs. If you need to work with different cloud warehouse providers then you will need to learn all other tools...

Glue is suitable for any developer looking to make use of their data within their cloud data platform. There, Glue works side by side with Matillion (feedback I have heard from many), because they say Matillion's differentiators are auto-documentation, being cloud agnostic, custom connectors, data ingestion, etc.

Thanks for your opinion @Werner Stinckens, always valuable 🙂

-werners-
Esteemed Contributor III

"Code is superior to low/no code tools? That is a debate in itself."

The debate is mainly driven by low/no code vendors, but that is fine of course. To each his own.

"Data Factory is a good tool, though it only has Azure sources and targets (destinations), and compared to other tools like Matillion it covers limited transformation needs."

That is incorrect. Even though I am not a fan of Data Factory at all, it is able to read/write non-Azure sources.

I agree that the transformations are limited, but that is not the main purpose of Data Factory. Transformations were added afterwards (and running on Spark).

"If you need to work with different cloud warehouse providers then you will need to learn all other tools..."

You are 100% right on this one. One could argue that it is good to learn multiple tools.

I don't wanna diss Matillion though, it is certainly a fine product; and credit given where credit is due, they were cloud based way before the big fellas!!!

Really appreciate your comments...

I thought you had DF as a transformation tool, though? ("Easy: for transformations, code first approach is superior to low/no code tools.")

It's a good discussion to have. There's no right or wrong here, since preference drives what to use, but at the same time we should look for what is best, most efficient, and everyone-ready, stack-ready, and future-ready.

joakon
New Contributor III

It all boils down to the ability to maintain code, the ability to find talent, and costs.

Most approaches have overlapping features. You can't really go wrong with any one.

I personally prefer a code-first approach.

Chris_Shehu
Valued Contributor III

Pretty much the same: Azure Data Factory for orchestration, Databricks for cleaning/modeling/serving data. Basically, Data Factory is the no-code solution when it makes sense, and all of the ingestion processes start in Data Factory to keep a centralized strategy. If it's an API or something that requires cleaning/modifying, Data Factory will launch a notebook to do the lift.

KKo
Contributor III

I have been using Azure: Databricks for compute; ADF for orchestrating Databricks notebooks, plus executing stored procs and some copy activities; and Azure Synapse as the final destination. But I am thinking of using a pipeline within Databricks itself rather than in ADF. Let me know if anybody has any suggestions around this.
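
As a rough illustration of the "pipeline within Databricks" option, here is a minimal sketch of how a chain of notebooks might be described as a multi-task job. The dictionary follows the general shape of Databricks Jobs API 2.1 job settings (`tasks`, `task_key`, `depends_on`, `notebook_task`) as I understand it; the job name, task keys, and notebook paths are all hypothetical, compute settings are left out, and nothing is submitted to a workspace. The helper only works out the order in which the tasks would run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def notebook_task(task_key, notebook_path, depends_on=()):
    """Build one task entry in the general shape of Jobs API 2.1 job settings."""
    task = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
    }
    if depends_on:
        # Each dependency is referenced by the task_key of the upstream task.
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


# Hypothetical three-step pipeline; every name and path here is made up.
# A real job definition would also need compute settings, omitted here.
job_settings = {
    "name": "nightly_elt_sketch",
    "tasks": [
        notebook_task("ingest", "/Pipelines/ingest"),
        notebook_task("transform", "/Pipelines/transform", depends_on=["ingest"]),
        notebook_task("load_to_synapse", "/Pipelines/load_to_synapse",
                      depends_on=["transform"]),
    ],
}


def execution_order(settings):
    """Return task keys in an order that respects every depends_on edge."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in settings["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(job_settings))
```

The point of the sketch is only that notebook dependencies can be declared next to the notebooks themselves; stored procs and copy activities that ADF handles today would still need a home.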

Anonymous
Not applicable

Hi @Douglas Carvalho-Ribeiro

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!
