Hi,
In my code I am using the pyspark.testing.assertSchemaEqual() function with the ignoreColumnOrder parameter, which is available since pyspark 4.0.0:
https://spark.apache.org/docs/4.0.0/api/python/reference/api/pyspark.testing.assertSchemaEqual.html
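For context, here is a minimal sketch of what I am doing (the schemas are just examples, not my real ones):

```python
from pyspark.sql.types import IntegerType, StringType, StructField, StructType
from pyspark.testing import assertSchemaEqual

actual = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
])
expected = StructType([
    StructField("name", StringType()),
    StructField("id", IntegerType()),
])

# Same fields, different order: passes with ignoreColumnOrder=True on
# pyspark 4.0.0, but raises a TypeError on 3.5.x, where the parameter
# does not exist.
assertSchemaEqual(actual, expected, ignoreColumnOrder=True)
```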
Locally I am using Databricks Connect. This "kind of" already includes pyspark, but not really: at least it is not the pyspark you install via pip. You can "import pyspark" even though the package is never installed explicitly, and the code runs.
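This is how I checked which pyspark my interpreter actually resolves (assuming a standard Python environment):

```python
import pyspark

# Shows which pyspark the interpreter picks up: with Databricks Connect
# this resolves even though "pyspark" was never installed via pip.
print(pyspark.__version__)
print(pyspark.__file__)
```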
Now I installed a new package (soda-spark-df) which has the "real" pyspark as a dependency; it pulls in pyspark 3.5.6. Since then I am getting an error that ignoreColumnOrder cannot be found, because the parameter does not exist in 3.5.6:
https://spark.apache.org/docs/3.5.6/api/python/reference/api/pyspark.testing.assertSchemaEqual.html
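As a workaround I am considering a guarded call along these lines (just a sketch, assuming that sorting the fields by name is an acceptable substitute for ignoreColumnOrder on older versions):

```python
import inspect

from pyspark.sql.types import StructType
from pyspark.testing import assertSchemaEqual


def assert_schema_equal_any_order(actual: StructType, expected: StructType) -> None:
    """Compare schemas ignoring column order on both pyspark 3.5.x and 4.x."""
    if "ignoreColumnOrder" in inspect.signature(assertSchemaEqual).parameters:
        # pyspark >= 4.0.0: the parameter exists, so use it directly.
        assertSchemaEqual(actual, expected, ignoreColumnOrder=True)
    else:
        # pyspark 3.5.x fallback: normalize the top-level field order
        # before comparing (does not handle nested structs).
        def sort_fields(schema: StructType) -> StructType:
            return StructType(sorted(schema.fields, key=lambda f: f.name))

        assertSchemaEqual(sort_fields(actual), sort_fields(expected))
```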
So far so good. What surprises me is that I can use this parameter on my Databricks Runtime 15.4 cluster even though it reports pyspark 3.5.0 as installed.
My question is now: is the pyspark on Databricks a fork of the open-source pyspark?