I have done the following, i.e. created a Delta table into which I plan to load the .json.gz files from the Azure Blob container: df = spark.read.option("multiline", "true").json(f"{container_location}/*.json.gz") DeltaTable.create(spark) \ .addCol...
I'm not sure I'm doing this correctly, but I'm having some issues with the column names when I try to load into a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load into a table: df = spark.read.opti...
@filipniziol Yes, that's the issue I'm facing. I'm confused about this: do I have to rename the child columns? I just want to keep the Spark DataFrame as it is, without flattening, and if I do this it only updates the parent column names: de...
Hi @filipniziol, thanks for replying. I realized it might actually be a different nested (child) column of event_properties that has a period in its name. (I'm showing it below.) This is most likely the issue, but how can I keep the nested values as nested ...
Hi @szymon_dybczak, thanks for responding. The error says the problem is in the column name. I think it has to do with the space that's in the 'child' column. However, I've been asked not to flatten the PySpark DataFrame and just ingest the 'raw' data, ie...
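For anyone hitting the same problem: one way to rename only the nested (child) field names while keeping the structs nested is to clean the schema rather than the data. The sketch below is a minimal, untested-on-Databricks illustration and not a confirmed fix for this thread. It works on the plain dict form of a schema (the shape that `df.schema.jsonValue()` returns), so it runs without Spark. The helper names `clean_name` and `clean_type`, the sample field names `page.url` and `screen size`, and the choice of `_` as the replacement are all assumptions made for the example. The character set is the one Delta rejects by default (space and `,;{}()\n\t=`), plus the period mentioned above.

```python
import re

# Characters Delta rejects in column names by default, plus "." (assumption:
# the period is included here only because it was reported as a problem above).
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=.]")


def clean_name(name):
    """Replace every invalid character in a field name with an underscore."""
    return INVALID_CHARS.sub("_", name)


def clean_type(dtype):
    """Recursively rename fields in a schema dict; nesting is left untouched."""
    if isinstance(dtype, str):
        # Primitive types such as "string" or "long" have no field names.
        return dtype
    kind = dtype.get("type")
    if kind == "struct":
        return {
            **dtype,
            "fields": [
                {**f, "name": clean_name(f["name"]), "type": clean_type(f["type"])}
                for f in dtype["fields"]
            ],
        }
    if kind == "array":
        return {**dtype, "elementType": clean_type(dtype["elementType"])}
    if kind == "map":
        return {
            **dtype,
            "keyType": clean_type(dtype["keyType"]),
            "valueType": clean_type(dtype["valueType"]),
        }
    return dtype


# Hypothetical schema: a struct column whose children contain "." and " ".
sample = {
    "type": "struct",
    "fields": [
        {
            "name": "event_properties",
            "nullable": True,
            "metadata": {},
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "page.url", "type": "string", "nullable": True, "metadata": {}},
                    {"name": "screen size", "type": "long", "nullable": True, "metadata": {}},
                ],
            },
        }
    ],
}

cleaned = clean_type(sample)
nested_names = [f["name"] for f in cleaned["fields"][0]["type"]["fields"]]

# Possible PySpark usage (assumption: relies on struct-to-struct casts
# matching fields by position, so only the names change):
#   from pyspark.sql import functions as F
#   from pyspark.sql.types import StructType
#   new_schema = StructType.fromJson(clean_type(df.schema.jsonValue()))
#   df_clean = df.select([
#       F.col(f"`{old.name}`").cast(new.dataType).alias(new.name)
#       for old, new in zip(df.schema.fields, new_schema.fields)
#   ])
```

The data stays nested; only the names change. Note that two different field names can collapse to the same cleaned name (for example `a.b` and `a b`), so it is worth checking for duplicates before writing to the table.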