Data Engineering

Is there any difference in performance between Python and SQL?

abd
Contributor

I read somewhere that Python code is converted to SQL in the end. Is that true, or is there a difference in performance when working with Scala, Python, or SQL?

1 ACCEPTED SOLUTION

adamq
New Contributor III

If you're using the DataFrame API, it all gets run in the JVM, just like SQL queries. The exception is UDFs, which have to transfer data to Python land to execute.

12 REPLIES

CodeMartian
New Contributor II

In short, no, there's no difference. There does need to be a translation step, as you read, so it could add a negligible amount of time to the workload, but performance doesn't degrade enough to matter.

Can you provide any resource where I could look into it?

Just wondering, is Python code converted to SQL in the end?

Or, as the other person mentioned, is it converted to Scala?

Yuanjian
New Contributor II

The Python API has an extra layer at runtime, which uses a local socket to transfer data. So there might be some performance gap due to the transformation, but it should not be large in most scenarios.

Adamatg
New Contributor III

There can be quite a bit of performance difference, depending on where you are running.

Hi, just did it. Thanks.

JBOCACHICA
New Contributor III

There should not be a difference between one and the other. In the end, all code has to be translated to machine language in order to run on a computer. The translation process may be harder in some cases than in others: harder for Python in some cases, and for SQL in others.

My recommendation is to use each language for the use case it suits.

SQL as a first option, and when you have to process a bunch of data in a structured format.

Python when you have a certain complexity not supported by SQL.

Regards

ABHI_CT
New Contributor III

Python is the choice for ML/AI workloads, while SQL would be for data-based MDM modeling. Performance is pretty much similar under certain assumptions. DB optimization is a must-have assumption for performance benchmarking.

Bhew
New Contributor II

Relatively similar performance for simple use cases. For higher-end tasks, Python is the better bet.

Katieyangcanada
New Contributor II

I think it all gets converted to Scala in the end. Shouldn't be different.

Can you provide any source?

Rheiman
Contributor II

To add to the consideration of UDFs: try using HOFs (higher-order functions) first whenever possible, as there is a significant performance benefit, as seen here.
