@Ivo Merchiers:
The behavior you are seeing is likely due to differences in the underlying version of Apache Spark between your local installation and Databricks.
split() is one of Spark's built-in SQL functions (pyspark.sql.functions.split in PySpark), and its implementation can differ between Spark versions. You mentioned that you are using PySpark version 3.2.1 locally. To confirm which version is actually installed, you can run the following in your PySpark shell:
import pyspark
print(pyspark.__version__)
You can then check the SQL functions documentation for that Spark version to see how split() is expected to behave.
On Databricks, you can check the version of Spark being used by running the command:
spark.version
If you are seeing different results for split() between your local installation and Databricks, you may need to adjust your code to handle the differences in behavior or use the same version of Spark across both environments.
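As a rough sketch of that version check, you could compare the two version strings programmatically. The helper names below (spark_major_minor, same_spark_line) are hypothetical and not part of PySpark, and the sketch assumes that matching major.minor versions is a good enough signal that both environments share the same split() behavior:

```python
import re
from typing import Tuple


def spark_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.2.1'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_spark_line(local_version: str, cluster_version: str) -> bool:
    """Return True when both versions share the same major.minor release.

    Assumption: patch releases do not change split() behavior.
    """
    return spark_major_minor(local_version) == spark_major_minor(cluster_version)
```

Locally you would pass pyspark.__version__ as the first argument, and as the second argument the string that spark.version returns in your Databricks notebook. If the function returns False, the version gap is the first thing to rule out before changing your code.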