Hubert-Dudek
Databricks MVP

Read the Parquet file in Python and retrieve the auto-generated schema as a DDL string:

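# `spark` is the SparkSession that a Databricks notebook already provides
parquetFile = spark.read.parquet("people.parquet")
parquetFile.createOrReplaceTempView("parquetFile")
# Serialize the schema to JSON (parquetFile.schema.json() yields the same schema without the view)
schema_json = spark.sql("SELECT * FROM parquetFile").schema.json()
# Convert the JSON schema to a DDL string through the JVM-side DataType class
ddl = spark.sparkContext._jvm.org.apache.spark.sql.types.DataType.fromJson(schema_json).toDDL()
print(ddl)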


My blog: https://databrickster.medium.com/