Pyspark - how to save the schema of a csv file in a delta table's column

tarente
New Contributor III

How to save the schema of a csv file in a delta table's column?

In a previous project implemented in Databricks using Scala notebooks, we stored the schema of csv files as a "json string" in a SQL Server table.

When we needed to read or write the csv and the source dataframe das 0 rows, or the source csv does not exist, we use the schema stored in the SQL Server to either create an empty dataframe or empty csv file.

Now, I would like to implement something similar in Databricks but using Python notebook and store the schema of csv files in a delta table.

Any suggestions?

Thanks in advance,

Tiago.

Hubert-Dudek
Databricks MVP

After you read csv to dataframe spark.read.csv ... there are 3 ways

DataFrame.Schema

DataFrame.printSchema() - it is StructType

and 3rd tricky way is DDL string

DataFrame._jdf.schema().toDDL()

Usually DDL as it is simple string is easiest to save somewhere and than reuse. Just insert to some delta table schema and then select when needed.


My blog: https://databrickster.medium.com/

tarente
New Contributor III

Hi Hubert,

Thanks for you answer, but I was not able to make it work.

Let me ask the question in a different way.

I have a csv file with the following basic estruture:

  • ProductId - integer.
  • ProductDesc - string.
  • ProductCost - decimal.

In PySpark I would like to store the file schema in:

  1. In a variable to be used in the spark.read.schema(schema).options(**fileOptions).schema(schema).load(...).
  2. Be able to store the file schema in a delta table's column.

What kind of transformations do I need to do to the variable in 1. to be able to stored in 2., and vice-versa?

Thanks in advance,

Tiago R.

tarente
New Contributor III

Hi Kaniz,

Thanks for your answer, although it did not answer my questions.

RKNutalapati
Valued Contributor

Hi @Tiago Rente​ , Hope below code would help.

image

View solution in original post

tarente
New Contributor III

Hi,

Thanks for you code, I will test it.

Regards,

Tiago.

Anonymous
Not applicable

@Tiago Rente​ - How did the test go?

tarente
New Contributor III

Hi Piper,

Unfortunately, I was not able to test it before I changed to a new employer, so I can no longer test it. However, I think it would work.

Regards,

Tiago R.

Anonymous
Not applicable

@tarente - Thanks for letting us know. 🙂