User16817872376
Databricks Employee

Hi @Dominic Robinson, my colleague tells me that the CSV source should support UTF-16LE and UTF-16BE, but not plain UTF-16. It may help to look at the test suite for the CSV source, which has simple examples of what is and isn't possible. It sounds like you are saying your file should be covered by UTF-16LE. If so, you may want to verify that there isn't a discrepancy caused by creating the file on Windows. If I recall correctly, Windows formats text files slightly differently than Unix/Mac does.
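
To check whether the file itself is the problem, you could inspect its raw bytes before handing it to Spark. The sketch below uses only the Python standard library. The helper name `inspect_utf16` is made up for illustration, and I am assuming the two likely differences are a leading byte order mark (BOM) and CRLF line endings, which I have not confirmed for your file.

```python
import codecs

def inspect_utf16(raw: bytes, default: str = "utf-16-le") -> dict:
    """Report whether UTF-16 bytes start with a BOM and contain CRLF line endings.

    Hypothetical helper. When no BOM is present, the bytes are decoded with
    `default`, which is an assumption about how the file was written.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        bom, encoding, body = "LE", "utf-16-le", raw[2:]
    elif raw.startswith(codecs.BOM_UTF16_BE):
        bom, encoding, body = "BE", "utf-16-be", raw[2:]
    else:
        bom, encoding, body = None, default, raw
    text = body.decode(encoding)
    return {"bom": bom, "crlf": "\r\n" in text}

# Sample data with CRLF line endings, as Windows tools commonly write them.
sample = "id,name\r\n1,Zoe\r\n"

# Python's plain "utf-16" codec prepends a BOM; "utf-16-le" does not.
with_bom = inspect_utf16(sample.encode("utf-16"))
without_bom = inspect_utf16(sample.encode("utf-16-le"))
```

For a real file, pass in the first chunk read with `open(path, "rb")`, keeping the chunk an even number of bytes so a UTF-16 code unit is not split. If a BOM shows up, the file is closer to plain UTF-16 than to bare UTF-16LE, which may explain the mismatch.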

Side note: you should no longer use "com.databricks.spark.csv". Spark has had a built-in CSV data source since Spark 2.0, and the Databricks package is no longer updated.
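
With the built-in source, the read might look roughly like the PySpark sketch below. The function name `read_utf16le_csv` and the `header` setting are placeholders of mine, and `spark` is assumed to be an existing SparkSession. The `encoding` option is the one the built-in CSV reader uses to pick the charset.

```python
def read_utf16le_csv(spark, path):
    """Read a UTF-16LE CSV file with Spark's built-in CSV source.

    Hypothetical helper: `spark` is an existing SparkSession, and the
    `header` option assumes the first line of the file holds column names.
    """
    return (
        spark.read.format("csv")           # built-in source, not com.databricks.spark.csv
        .option("header", "true")          # assumption about the file layout
        .option("encoding", "UTF-16LE")    # explicit byte order rather than plain UTF-16
        .load(path)
    )
```

Calling `read_utf16le_csv(spark, "/path/to/file.csv")` returns a DataFrame. Swap in "UTF-16BE" if the file turns out to be big-endian.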