How to create a csv using a Scala notebook that has " in some columns?

tarente — Sat, 18 Sep 2021 18:09:15 GMT

In a project we use Azure Databricks to create csv files to be loaded in ThoughtSpot.

Below is a sample of the code I use to write the file:

// Write a single tab-separated file, replacing any existing output
val fileRepartition = 1
val fileFormat = "csv"
val fileSaveMode = "overwrite"
val fileOptions = Map(
  "header" -> "true",
  "overwriteSchema" -> "true",
  "delimiter" -> "\t"
)

// dfFinal and filePath are defined earlier in the notebook
dfFinal
  .repartition(fileRepartition)
  .write
  .format(fileFormat)
  .mode(fileSaveMode)
  .options(fileOptions)
  .save(filePath)

The csv file uses a tab as the column separator, and some of the columns may have " in their values. When that happens, the whole value of that column is enclosed in " in the csv file and the embedded " is escaped with a backslash. E.g.:

ProductId	ProductCode	ProductDesc
1234	BD Plastipak	"BD Plastipak 1/4\" Syringes"

Is it possible to change the parameters to write the file as described below?

ProductId	ProductCode	ProductDesc
1234	BD Plastipak	BD Plastipak 1/4" Syringes

I have a workaround that uses sed in a subsequent step to update the csv, but it would be much easier if I could get the file in the correct format when saving it from the notebook.

Thanks in advance,

Tiago R.

Re: How to create a csv using a Scala notebook that has " in some columns?

shan_chandra — Sat, 18 Sep 2021 20:22:19 GMT

Could you please try adding escape as an option when writing the csv?

Please refer to the additional options available when writing a CSV, listed under the CSV-specific options for writing CSV files.
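
As a rough illustration of how such a writer option could be passed, here is a minimal sketch based on the code in the question. It assumes that escapeQuotes (one of the CSV write options in Spark's documentation) is the setting that controls whether values containing " are enclosed in quotes. tsvOptions and writeTsv are hypothetical names used only for this example, and the result should be checked against your Spark version and against what ThoughtSpot accepts.

```scala
import org.apache.spark.sql.DataFrame

// Options from the question, plus escapeQuotes.
// Assumption: escapeQuotes=false stops the writer from enclosing a value in
// quotes just because it contains the quote character.
val tsvOptions: Map[String, String] = Map(
  "header"       -> "true",
  "delimiter"    -> "\t",
  "escapeQuotes" -> "false"
)

// Hypothetical helper: writes df as a single tab-separated file under path,
// overwriting any existing output.
def writeTsv(df: DataFrame, path: String): Unit =
  df.repartition(1)
    .write
    .format("csv")
    .mode("overwrite")
    .options(tsvOptions)
    .save(path)

// Usage with the names from the question:
// writeTsv(dfFinal, filePath)
```

Values that contain the delimiter itself or a line break would most likely still be quoted, so this only helps when " is the only special character in the data.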

Re: How to create a csv using a Scala notebook that has " in some columns?

tarente — Tue, 21 Sep 2021 08:03:14 GMT

Hi Shan,

Thanks for the link.

I now know more options for creating different csv files.

I have not yet fully solved the problem, but what remains is related to the destination application (ThoughtSpot) not being able to load the data in the csv file correctly.

Regards,

Tiago R.