Hi there! I appreciate this reply comes 3 years after the question was originally asked, but people might still be coming across it. A few things:
- Koalas was deprecated in Spark 3.2 (runtime 10.4). Instead, the recommendation is to use the pandas API on Spark with `import pyspark.pandas as ps`. You can find a link here to the Spark migration guide, and here for more usage.
- As of writing, Photon works with SQL and equivalent DataFrame API statements. So SQL-ish operations like filter, join, and aggregates will work, but more complex ones for analytics or data science won't.
- In the future, more functionality may be brought out, but keep in mind that UDFs and RDDs are unlikely to ever work with Photon, as they bypass Spark's Catalyst optimizer, which Photon needs in order to work.