06-28-2022 09:11 AM
06-28-2022 12:45 PM
If you're using the DataFrame API, it all gets run in the JVM, just like SQL queries. The exception is UDFs, which have to transfer data to Python land to execute.
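One way to check this yourself is to print the query plans. Below is a minimal sketch, assuming PySpark is installed; the view name `numbers`, the column `n`, and the helpers `plus_one` and `show_plans` are made up for illustration. The SQL and DataFrame API versions should produce equivalent plans, while the UDF version typically shows an extra Python evaluation step (for example `BatchEvalPython`), though the exact plan text varies by Spark version.

```python
# Sketch only: assumes PySpark is installed. The view "numbers", the column "n",
# and both function names are hypothetical, chosen for illustration.

def plus_one(x):
    """Plain Python logic that a UDF has to run outside the JVM."""
    return None if x is None else x + 1


def show_plans(spark):
    """Print physical plans for a SQL query, its DataFrame API equivalent,
    and a variant that routes the same logic through a Python UDF."""
    # Imported here so plus_one() can be used without Spark being available.
    from pyspark.sql import functions as F
    from pyspark.sql.types import LongType

    spark.range(100).withColumnRenamed("id", "n").createOrReplaceTempView("numbers")

    # 1) Written as SQL.
    sql_df = spark.sql("SELECT n + 1 AS n_plus_one FROM numbers WHERE n % 2 = 0")

    # 2) The same logic written with the DataFrame API.
    api_df = (
        spark.table("numbers")
        .where(F.col("n") % 2 == 0)
        .select((F.col("n") + 1).alias("n_plus_one"))
    )

    # 3) The same result, but computed by a Python UDF.
    plus_one_udf = F.udf(plus_one, LongType())
    udf_df = (
        spark.table("numbers")
        .where(F.col("n") % 2 == 0)
        .select(plus_one_udf(F.col("n")).alias("n_plus_one"))
    )

    for label, df in [("SQL", sql_df), ("DataFrame API", api_df), ("Python UDF", udf_df)]:
        print(f"--- {label} ---")
        df.explain()  # prints the physical plan for this DataFrame
```

To try it, call `show_plans(spark)` in a notebook where `spark` is already defined, or create a session first with `SparkSession.builder.getOrCreate()`.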
06-28-2022 11:16 AM
In short, no, there's no difference. There does need to be a translation step, as you read somewhere, so it could add a negligible amount of time to the workload, but performance doesn't degrade enough to matter.
06-28-2022 09:50 PM
Can you provide any resource where I could look into this?
Just wondering: is Python code converted to SQL in the end?
Or, as the other person mentioned, is it converted to Scala?
06-28-2022 12:47 PM
The Python API has an extra layer at runtime, which uses a local socket to transfer data. So there may be some performance gap due to that transfer, but it should not be large in most scenarios.
06-28-2022 12:52 PM
There can be quite a bit of performance difference, depending on where you are running.
06-28-2022 01:37 PM
Hi @Abdullah Durrani, we haven't heard from you since the last few responses, and I wanted to check back to see whether these suggestions helped. If you have found a solution, please share it with the community, as it can be helpful to others.
06-28-2022 09:47 PM
Hi just did it. Thanks
06-28-2022 02:13 PM
There should not be a difference between one and the other. In the end, all code has to be translated to machine language in order to run on a computer. The translation process may be harder in some cases than in others, but that can be true for Python in some cases and for SQL in others.
My recommendation is to use each language for the use cases it suits.
SQL as a first option, and whenever you have to process a lot of data in a structured format.
Python when you have complexity that SQL does not support.
Regards
06-28-2022 02:29 PM
Python is the choice for ML/AI workloads, while SQL is the choice for data-based MDM modeling. Performance is pretty much similar, under certain assumptions. A well-optimized database is a necessary assumption for any performance benchmarking.
06-28-2022 02:35 PM
Relatively similar performance for simple use cases. For higher-end tasks, Python's the better bet.
06-28-2022 02:37 PM
I think it all gets converted to Scala in the end. It shouldn't be different.
06-28-2022 09:49 PM
Can you provide any source?
06-29-2022 10:17 PM
To add to the point about UDFs: consider using HOFs (Higher Order Functions) first whenever possible, as there is a significant performance benefit, as seen here.
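For anyone unsure what that swap looks like, here is a rough sketch, assuming PySpark 3.1 or later (where `pyspark.sql.functions.transform` is available). The column name `nums`, the sample rows, and the helpers `add_one_to_all` and `compare_hof_and_udf` are made up for illustration. All three variants add 1 to every array element; only the last one has to ship rows to a Python worker.

```python
# Sketch only: assumes PySpark 3.1+. Column names, sample data, and function
# names are hypothetical, chosen for illustration.

def add_one_to_all(nums):
    """Pure-Python version of the array logic, used as the UDF body."""
    return None if nums is None else [n + 1 for n in nums]


def compare_hof_and_udf(spark):
    """Build the same 'add 1 to each element' column three ways."""
    # Imported here so add_one_to_all() can be used without Spark being available.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, LongType

    df = spark.createDataFrame(
        [(1, [1, 2, 3]), (2, [10, 20])],
        "id long, nums array<long>",
    )

    # Higher-order function via the Python API: the lambda builds a Spark
    # expression, so the per-element work is done by Spark itself.
    with_hof = df.select("id", F.transform("nums", lambda x: x + 1).alias("plus_one"))

    # The same higher-order function in SQL syntax.
    with_sql_hof = df.selectExpr("id", "transform(nums, x -> x + 1) AS plus_one")

    # Python UDF: each array is serialized, sent to a Python worker, and sent back.
    add_one_udf = F.udf(add_one_to_all, ArrayType(LongType()))
    with_udf = df.select("id", add_one_udf("nums").alias("plus_one"))

    return with_hof, with_sql_hof, with_udf
```

Calling `.explain()` or `.show()` on each returned DataFrame is a quick way to compare them on your own cluster; how large the gap is will depend on the data and the Spark version.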