02-06-2023 03:31 AM
The usual way to use Apache Sedona inside PySpark is by first registering Sedona types and functions with
SedonaRegistrator.registerAll(spark)
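For context, a minimal sketch of this explicit flow (assuming the Sedona jars and the apache-sedona Python package are already installed on the cluster):

from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator

spark = SparkSession.builder.appName("sedona-example").getOrCreate()
SedonaRegistrator.registerAll(spark)

# After registration, Sedona's ST_ functions resolve in Spark SQL.
spark.sql("SELECT ST_AsText(ST_Point(1.0, 2.0)) AS wkt").show()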
We need these to be auto-registered when the cluster starts (so that we can, for example, run geospatial queries through the Databricks SQL Connector for Python).
From my understanding, auto-registration should be achievable by adding the following cluster configuration, but it doesn't work:
spark.sql.extensions org.apache.sedona.sql.SedonaSqlExtensions
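For local testing, the same setting can be sketched programmatically when building the session (again assuming the Sedona jars are on the classpath):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "org.apache.sedona.sql.SedonaSqlExtensions")
    .getOrCreate()
)

# If the extension loaded, ST_ functions should work without registerAll.
spark.sql("SELECT ST_AsText(ST_Point(1.0, 2.0))").show()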
Am I missing something?
Is my expectation wrong?
04-09-2023 07:35 AM
@Giovanni Allegri:
The configuration you have provided is for registering the Sedona SQL extensions with Spark SQL. However, to register Sedona types and functions with PySpark, you need to use a different configuration.
You can add the following configuration to the Spark cluster configuration to enable automatic registration of Sedona types and functions with PySpark:
spark.extraListeners org.apache.sedona.core.serde.SedonaSQLRegistrator
This will enable automatic registration of Sedona types and functions when a PySpark session is created. Alternatively, you can register Sedona types and functions explicitly in your PySpark code with the SedonaRegistrator.registerAll(spark) method; however, that requires calling it every time you create a new PySpark session, as in the sketch below.
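If you go the explicit route, a small helper keeps that per-session call in one place (a sketch; sedona_session is a hypothetical name, not part of the Sedona API):

from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator

def sedona_session(app_name="sedona-app"):
    # Create (or reuse) a session, then register Sedona types and functions.
    spark = SparkSession.builder.appName(app_name).getOrCreate()
    SedonaRegistrator.registerAll(spark)
    return spark

spark = sedona_session()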
I hope this helps!
10-03-2024 07:39 AM
Hi, after adding the suggested config, I am getting the following error:
Caused by: java.lang.ClassNotFoundException: org.apache.sedona.core.serde.SedonaSQLRegistrator not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@7d1cdebf
What should I do to fix this?
04-12-2023 02:41 AM
Hi @Giovanni Allegri
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation, and let us know if you need any further assistance!