I can't say for sure what your exact limitation is, but two things I have come across in EMR bake-offs are:
1) A bottleneck in network throughput. Verify the S3 reads and writes are happening at similar throughput rates on both platforms. This is a cloud setup issue and not a Databricks issue. It could be the node type being used or any network routing in between your Databricks cluster and S3 (there's a quick throughput-timing sketch after this list).
2) Unnecessary auto-scaling in the Databricks clusters. Sometimes Databricks can get a little too proactive in scaling up and then has to back off, which can slow down the end-to-end execution time. For a well-known workload the cluster should be right-sized to maximize usage of each node and eliminate spilling to disk. A right-sized cluster also avoids auto-scaling, which removes node spin-up and spin-down time from the job execution (see the fixed-size cluster sketch after this list).
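If it helps, here is a rough way to sanity-check S3 read and write throughput from a notebook. It's a minimal sketch assuming a Databricks notebook where `spark` is already defined; the bucket and prefixes are placeholders and the data is assumed to be Parquet.

```python
import time

# Placeholder bucket/prefixes -- point these at the data you are actually benchmarking.
SRC_PATH = "s3://your-bucket/benchmark/input/"
DST_PATH = "s3://your-bucket/benchmark/output/"

# Time a full read; count() forces every file to be scanned.
start = time.time()
df = spark.read.parquet(SRC_PATH)
rows = df.count()
print(f"read {rows} rows in {time.time() - start:.1f}s")

# Time writing the same data back out.
start = time.time()
df.write.mode("overwrite").parquet(DST_PATH)
print(f"wrote {rows} rows in {time.time() - start:.1f}s")
```

Run the same check from the EMR side against the same bucket and compare the timings.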
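And here is the general shape of a fixed-size cluster spec (Databricks Clusters API style), shown as a Python dict. The runtime version, node type, and worker count are placeholder values; the point is using `num_workers` instead of an `autoscale` block once you know the workload.

```python
# Placeholder values throughout -- swap in the runtime, node type, and size you actually use.
cluster_spec = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 8,  # fixed size: no "autoscale" block at all
    # "autoscale": {"min_workers": 2, "max_workers": 16},  # <- drop this for a known workload
}
```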
Regarding Photon... it only kicks in for specific workloads. Things like UDFs and RDD APIs won't take advantage of Photon, so those parts of the job fall back to the standard execution engine.
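As a quick illustration, assuming an existing DataFrame `df` with a string column `name` (both hypothetical): the Python UDF path runs outside Photon, while the equivalent built-in function stays in the native engine.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Python UDF version: rows get serialized out to the Python worker,
# so this stage runs outside Photon.
upper_udf = F.udf(lambda s: s.upper() if s is not None else None, StringType())
df_with_udf = df.withColumn("name_upper", upper_udf(F.col("name")))

# Built-in function version: stays in the native engine, so Photon can accelerate it.
df_native = df.withColumn("name_upper", F.upper(F.col("name")))
```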
Definitely take the time to drill into the Spark UI to see if there are any differences in the actual Spark execution. That may uncover other differences, like job configurations, that are impacting your benchmarks.
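One cheap starting point, assuming a notebook where `spark` is the active SparkSession: dump the effective configuration on both clusters and diff the output, paying particular attention to things like shuffle partitions, AQE, and executor sizing.

```python
# Dump the effective Spark configuration so the Databricks and EMR runs can be diffed.
conf = dict(spark.sparkContext.getConf().getAll())
for key in sorted(conf):
    print(f"{key}={conf[key]}")
```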