I have a slight suspicion that createDataFrame uses the columnar Arrow representation for .display(), but that when the data is finally written, Spark's row-based representation kicks in and the data is expensively reserialized: I cannot find the right place in the Documentat...
The answers here are not correct. TL;DR: _After_ the Spark DF is materialized, saveAsTable takes ages: 35 seconds for 1 million rows. saveAsTable() is SLOW, terribly so. Why? It would be nice to get an answer. The workaround is to avoid Spark for Delta - no...
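
For anyone who wants to check where the time actually goes, here is a minimal sketch that times materialization and the write as separate stages. The helper names `time_stages` and `spark_write_stages` and the table name are made up for illustration. The Spark part assumes Spark 3.0 or later, where the built-in `noop` format runs the full write path without storing anything, so comparing it with `saveAsTable` should hint at whether the cost sits in serialization or in the Delta/table commit.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_stages(stages: List[Tuple[str, Callable[[], object]]]) -> Dict[str, float]:
    """Run each named stage in order and return its wall-clock duration in seconds."""
    timings: Dict[str, float] = {}
    for name, action in stages:
        start = time.perf_counter()
        action()  # the stage does its work here; its return value is ignored
        timings[name] = time.perf_counter() - start
    return timings


def spark_write_stages(df, table_name: str) -> List[Tuple[str, Callable[[], object]]]:
    """Build timing stages for an existing Spark DataFrame `df` (hypothetical helper).

    Nothing runs until the returned callables are invoked by time_stages.
    """
    return [
        # Force the DataFrame to be computed and cached before any write is timed.
        ("materialize", lambda: df.cache().count()),
        # Run the write path but discard the output (Spark 3.0+ "noop" format).
        ("noop_write", lambda: df.write.format("noop").mode("overwrite").save()),
        # The write being complained about; table_name is a placeholder.
        ("save_as_table", lambda: df.write.mode("overwrite").saveAsTable(table_name)),
    ]
```

Usage would be something like `time_stages(spark_write_stages(df, "tmp_timing_test"))` in a notebook where `df` already exists. If `noop_write` is fast and `save_as_table` is slow, the reserialization theory looks less likely than overhead in the table write itself, though this is only a rough measurement and not a profile.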