06-28-2022 09:11 AM
06-28-2022 12:45 PM
If you're using the DataFrame API, it all runs in the JVM, just like SQL queries. The exception is UDFs, which have to transfer data to Python to execute.
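One way to check this yourself is to write the same query both ways and compare the physical plans. The snippet below is only a minimal sketch: it assumes a local PySpark install, and the sample data, the view name `events`, and the function name `compare_api_and_sql` are made up for illustration.

```python
def compare_api_and_sql():
    """Run one aggregation through the DataFrame API and through SQL."""
    # Imported inside the function so the sketch can be loaded without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("api-vs-sql")  # hypothetical app name
        .getOrCreate()
    )

    # Made-up sample data, only for illustration.
    df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "grp"])
    df.createOrReplaceTempView("events")

    # Same logic, once with the DataFrame API and once as a SQL string.
    api_df = df.filter(F.col("id") > 1).groupBy("grp").count()
    sql_df = spark.sql(
        "SELECT grp, count(*) AS `count` FROM events WHERE id > 1 GROUP BY grp"
    )

    # Print both physical plans so they can be compared by eye.
    api_df.explain()
    sql_df.explain()

    api_rows = sorted(tuple(r) for r in api_df.collect())
    sql_rows = sorted(tuple(r) for r in sql_df.collect())
    return api_rows, sql_rows
```

Calling `compare_api_and_sql()` prints the two plans and returns the collected rows from each version. The plans are expected to look essentially the same, since both versions go through the same optimizer, though the exact plan text depends on the Spark version.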
06-28-2022 11:16 AM
In short, no, there's no difference. There does need to be a translation step, as you read somewhere, so it could add a negligible amount of time to the workload. However, performance doesn't degrade enough to matter.
06-28-2022 09:50 PM
Can you provide any resource where I could look into this?
Just wondering, is the Python code converted to SQL in the end?
Or, as the other person mentioned, is it converted to Scala?
06-28-2022 12:47 PM
The Python API has an extra layer at runtime, which uses a local socket to transfer data. So there may be some performance gap due to that extra step, but it should not be large in most scenarios.
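To see where that extra layer shows up, you can compare a Python UDF with the equivalent built-in function. This is a rough sketch assuming a local PySpark install; the sample names and the function name `compare_udf_and_builtin` are hypothetical.

```python
def compare_udf_and_builtin():
    """Upper-case a column with a Python UDF and with a built-in function."""
    # Imported inside the function so the sketch can be loaded without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("udf-vs-builtin")  # hypothetical app name
        .getOrCreate()
    )

    # Made-up sample data, only for illustration.
    df = spark.createDataFrame([("alice",), ("bob",)], ["name"])

    # Python UDF: rows are shipped to a Python worker process and back.
    upper_udf = F.udf(lambda s: s.upper(), StringType())
    udf_df = df.select(upper_udf(F.col("name")).alias("name_upper"))

    # Built-in function: evaluated inside the JVM, no Python round trip.
    builtin_df = df.select(F.upper(F.col("name")).alias("name_upper"))

    # Print both physical plans so they can be compared by eye.
    udf_df.explain()
    builtin_df.explain()

    udf_rows = sorted(r["name_upper"] for r in udf_df.collect())
    builtin_rows = sorted(r["name_upper"] for r in builtin_df.collect())
    return udf_rows, builtin_rows
```

Both versions return the same values. In the Spark versions I have looked at, the UDF plan contains an extra Python evaluation step (shown as something like `BatchEvalPython`), which is where the data transfer happens, while the built-in version has no such step. Treat the exact node name as an assumption and check the `explain()` output on your own cluster.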
06-28-2022 12:52 PM
Performance can vary quite a bit depending on where you are running.
06-28-2022 01:37 PM
Hi @Abdullah Durrani, we haven't heard from you in the last few responses, and I was checking back to see if these suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to others.
06-28-2022 09:47 PM
Hi, just did it. Thanks.
06-28-2022 02:13 PM
There should not be a difference between one and the other. In the end, all code has to be translated to machine language in order to run on a computer. The translation may be harder in some cases than in others, but that can be true for Python in some cases and for SQL in others.
My recommendation is to use each language for the use cases it suits:
SQL as the first option, and when you have to process a lot of data in a structured format.
Python when you have complexity that SQL does not support.
Regards
06-28-2022 02:29 PM
Python is the choice for ML/AI workloads, while SQL would be for data-based MDM modeling. Performance is pretty much similar under certain assumptions. Database optimization is a necessary assumption for performance benchmarking.
06-28-2022 02:35 PM
Relatively similar performance for simple use cases. For higher-end tasks, Python is the better bet.
06-28-2022 02:37 PM
I think it all gets converted to Scala in the end. It shouldn't be different.
06-28-2022 09:49 PM
Can you provide any source?
06-29-2022 10:17 PM
To add to the consideration of UDFs: try HOFs (higher-order functions) first whenever possible, as there is a significant performance benefit, as seen here.
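As a rough illustration of swapping a UDF for a HOF, the sketch below doubles every element of an array column both ways. It assumes a local PySpark install on Spark 3.1 or later (where `pyspark.sql.functions.transform` is available); the sample data and the function name `double_with_hof_and_udf` are made up.

```python
def double_with_hof_and_udf():
    """Double each array element with a higher-order function and with a UDF."""
    # Imported inside the function so the sketch can be loaded without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, LongType

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("hof-vs-udf")  # hypothetical app name
        .getOrCreate()
    )

    # Made-up sample data, only for illustration.
    df = spark.createDataFrame([(1, [1, 2, 3]), (2, [4, 5])], ["id", "values"])

    # HOF: the lambda is turned into a Spark expression and runs in the JVM.
    hof_df = df.select(
        "id", F.transform(F.col("values"), lambda x: x * 2).alias("doubled")
    )

    # UDF: the same logic, but each array is sent to a Python worker.
    double_udf = F.udf(lambda xs: [x * 2 for x in xs], ArrayType(LongType()))
    udf_df = df.select("id", double_udf(F.col("values")).alias("doubled"))

    hof_rows = sorted((r["id"], list(r["doubled"])) for r in hof_df.collect())
    udf_rows = sorted((r["id"], list(r["doubled"])) for r in udf_df.collect())
    return hof_rows, udf_rows
```

Both approaches give the same result here. The difference is that the lambda passed to `transform` only builds a column expression, so no row data leaves the JVM. How large the speedup is will depend on your data and cluster, so it is worth timing on a realistic sample.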