I'm attempting to create DLT tables from a source table that includes a "data" column that is a JSON string. I'm doing something like this:
from pyspark.sql.types import StructType, StructField, IntegerType, LongType, ArrayType

sales_schema = StructType([
    StructField("customer_id", IntegerType(), True),
    StructField("order_numbers", ArrayType(LongType()), True),
    StructField("data", StructType([
        StructField("value", IntegerType())
    ]))
])
import dlt
from pyspark.sql.functions import from_json, col

@dlt.table(schema=sales_schema)
def sales():
    df = (
        spark.readStream.table("table_name")
        .withColumn("parsed_data", from_json(col("data"), sales_schema))
        .select("parsed_data.*")
    )
    return df
This works to flatten the data, but for some reason the "data" column in the resulting DLT sales table becomes a struct where "value" is no longer an IntegerType but a StringType.
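For context, here is a minimal, self-contained sketch of the from_json step in isolation (outside DLT); the sample row and the smaller sub-schema are just placeholders to illustrate the shape of the input, not my real table:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample row: "data" holds the raw JSON string, like in the source table.
sample = spark.createDataFrame([(1, '{"value": 42}')], ["customer_id", "data"])

# Sub-schema covering only the nested JSON payload.
data_schema = StructType([StructField("value", IntegerType())])

parsed = sample.withColumn("parsed_data", from_json(col("data"), data_schema))
parsed.select("parsed_data.*").printSchema()  # check whether "value" stays an integer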
Is there a way to keep the original value type?