It appears to me that there's a deceptive bug when using the Databricks `display` function to view struct data. For whatever reason, runs of multiple spaces are collapsed to a single space:
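```python
from pyspark.sql.functions import struct

# `spark` and `display` are provided by the Databricks notebook environment
df = spark.createDataFrame([
    ("this has two spaces", "this has three spaces"),
    ("this has one space", "this has nospace"),
], ["sc", "osc"])

# Pack both string columns into a single struct column
df = df.select(struct(df.columns).alias("scstruct"))
display(df)
```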
You'll see in the result that the values containing runs of 2, 3, and 4 spaces are all shown with single spaces.
I came across this while trying to diagnose a regex: because of this bug, I wasn't aware of what the data values actually were.
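Since `display` can't be trusted here, one way to check what a regex will actually see is to inspect the raw strings in plain Python. This is only a sketch: `reveal_spaces` and `space_runs` are hypothetical helper names of my own, not Databricks or PySpark APIs. In a notebook the strings would presumably come from something like `df.select("scstruct.*").collect()`, but that step is assumed here and the sample strings are made up so the sketch runs without Spark.

```python
import re
from typing import List


# Hypothetical helper (not a Databricks/PySpark API): make every space visible
# by replacing it with a middle dot, so collapsed rendering can't hide runs.
def reveal_spaces(value: str) -> str:
    return value.replace(" ", "\u00b7")


# Hypothetical helper: lengths of each run of one or more consecutive spaces,
# i.e. what a pattern like r" +" would match in the raw value.
def space_runs(value: str) -> List[int]:
    return [len(m.group()) for m in re.finditer(r" +", value)]


# Assumed source in a notebook: rows = df.select("scstruct.*").collect()
# Plain made-up strings are used here so the sketch runs without Spark.
samples = ["a  b", "a   b c", "nospace"]
for s in samples:
    # repr() also preserves the exact whitespace in the printed output
    print(repr(s), reveal_spaces(s), space_runs(s))
```

Printing `repr(value)` alone is often enough; the run-length list just makes it obvious whether a value really contains 2, 3, or 4 spaces before you start tuning the regex.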