<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91232#M38117</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122107"&gt;@KristiLogos&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;now I understand the problem. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;You have a json and then you want to save this to delta table as struct, right?&lt;BR /&gt;The allowed characters for delta table column names are:&lt;/P&gt;&lt;P&gt;Letters: A-Z, a-z&lt;BR /&gt;Digits: 0-9&lt;BR /&gt;Underscores: _&lt;BR /&gt;&lt;BR /&gt;The key here is that &lt;U&gt;your structs must also adhere to this naming convention&lt;/U&gt;&lt;BR /&gt;Before saving structs to table you need to sanitize the column names like that:&lt;/P&gt;&lt;P&gt;App Brnd -&amp;gt; App_Brnd&lt;BR /&gt;Active.Elapsed Minutes -&amp;gt;&amp;nbsp;Active_Elapsed_Minutes&lt;/P&gt;&lt;P&gt;Hope it helps&lt;/P&gt;</description>
    <pubDate>Fri, 20 Sep 2024 18:16:48 GMT</pubDate>
    <dc:creator>filipniziol</dc:creator>
    <dc:date>2024-09-20T18:16:48Z</dc:date>
    <item>
      <title>Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in schema</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91219#M38110</link>
      <description>&lt;P&gt;I'm not sure I'm working this correctly but I'm having some issues with the column names when I try to load to a table in our databricks catalog. I have multiple .json.gz files in our blob container that I want to load to a table:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;df = spark.read.option("multiline", "true").json(f"{LOC}/*.json.gz")&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;df.printSchema()&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;The schema looks something like this, for example user_properties has nested values App Brnd and Archit&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;STRONG&gt;|-- user_id: string (nullable = true) &lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;|-- user_properties: struct (nullable = true)&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;| |-- App Brnd: string (nullable = true)&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;STRONG&gt;| |-- Archit: string (nullable = true)&lt;/STRONG&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;SPAN&gt;when I try to load the df to our table for the first time:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;df.write.mode("overwrite").saveAsTable("test.events")&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I see this error:&lt;BR /&gt;Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. Please use other characters and try again.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 20 Sep 2024 16:24:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91219#M38110</guid>
      <dc:creator>KristiLogos</dc:creator>
      <dc:date>2024-09-20T16:24:01Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91222#M38113</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122107"&gt;@KristiLogos&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Check if your JSON doesn't have characters contained in error message in it's key values.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2024 16:38:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91222#M38113</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-20T16:38:48Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91223#M38114</link>
      <description>&lt;P&gt;hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;- thanks for responding. The error is saying its in the column name. I think it has to do with the space that's in the 'child' column. However, I've been asked to not flatten the pyspark dataframe and just ingest the 'raw' data, ie. leave the nested columns nested and not flatten. I feel like that might be the issue because if I convert the pyspark to a pandas dataframe I'm able to load this without issues to my table. Is there a way to not 'flatten' and just load to pyspark dataframe?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2024 16:58:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91223#M38114</guid>
      <dc:creator>KristiLogos</dc:creator>
      <dc:date>2024-09-20T16:58:20Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91230#M38115</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122107"&gt;@KristiLogos&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Could you show sample json?&lt;BR /&gt;&lt;BR /&gt;I have run the tests, do you have a space in the&amp;nbsp;App Brnd column?&lt;BR /&gt;Is App Brnd wrapped in double-quote?&lt;BR /&gt;Below the check&lt;BR /&gt;1. The first one are keys without double-quotes, space between App Brnd -&amp;gt; returns corrupt record&lt;/P&gt;&lt;P&gt;2. The second one are keys with double-quotes, no spaces -&amp;gt; returns valid json&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="filipniziol_0-1726853294524.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11387i4480124D8BDAEF07/image-size/medium?v=v2&amp;amp;px=400" role="button" title="filipniziol_0-1726853294524.png" alt="filipniziol_0-1726853294524.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2024 17:32:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91230#M38115</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2024-09-20T17:32:11Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91231#M38116</link>
      <description>&lt;P&gt;hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117376"&gt;@filipniziol&lt;/a&gt;&amp;nbsp;- thanks for replying. I realized it might actually be a different nested (child) column that has a period for the event_properties. (I'm showing it below) This is most likely the issue, but how can i keep the nested values as nested and not split out for the event_properties?&amp;nbsp; I've believe what I could do is convert to a pandas dataframe becuase I'm able to load this as such but then i have to convert my Spark df to pandas df then convert back to spark df to load to a databricks table seems like alot.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; |-- event_properties: struct (nullable = true) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; | |-- Active.Elapsed Minutes: double (nullable = true)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; | |-- Active.Frequency: long (nullable = true)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; | |-- Active.Maximum: double (nullable = true)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; | |-- Active.Minimum: double (nullable = true)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;This is what&amp;nbsp;&lt;STRONG&gt;event_properties column&lt;/STRONG&gt; looks like in a pandas dataframe which is what I want it to look like for spark&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;df_pd= df.toPandas()&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;event_properties&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;(None, 2, None, None, 1)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 20 Sep 2024 17:58:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91231#M38116</guid>
      <dc:creator>KristiLogos</dc:creator>
      <dc:date>2024-09-20T17:58:18Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91232#M38117</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122107"&gt;@KristiLogos&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;now I understand the problem. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;You have a json and then you want to save this to delta table as struct, right?&lt;BR /&gt;The allowed characters for delta table column names are:&lt;/P&gt;&lt;P&gt;Letters: A-Z, a-z&lt;BR /&gt;Digits: 0-9&lt;BR /&gt;Underscores: _&lt;BR /&gt;&lt;BR /&gt;The key here is that &lt;U&gt;your structs must also adhere to this naming convention&lt;/U&gt;&lt;BR /&gt;Before saving structs to table you need to sanitize the column names like that:&lt;/P&gt;&lt;P&gt;App Brnd -&amp;gt; App_Brnd&lt;BR /&gt;Active.Elapsed Minutes -&amp;gt;&amp;nbsp;Active_Elapsed_Minutes&lt;/P&gt;&lt;P&gt;Hope it helps&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2024 18:16:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91232#M38117</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2024-09-20T18:16:48Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91234#M38118</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117376"&gt;@filipniziol&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes thats the issue I'm facing, Im so confused about this because, do i have to rename the child columns? I just want to keep the spark dataframe as it is without flattening, . and if i do this it only updates the parent column names:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;clean_column_names&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;df&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;for&lt;/SPAN&gt;&lt;SPAN&gt; col &lt;/SPAN&gt;&lt;SPAN&gt;in&lt;/SPAN&gt;&lt;SPAN&gt; df.columns:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; new_col_name &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; col.&lt;/SPAN&gt;&lt;SPAN&gt;replace&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;'$'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;''&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;SPAN&gt;replace&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;'.'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'_'&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;SPAN&gt;replace&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;' '&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'_'&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;SPAN&gt;replace&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;';'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'_'&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;SPAN&gt;replace&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;','&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'_'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; df.&lt;/SPAN&gt;&lt;SPAN&gt;withColumnRenamed&lt;/SPAN&gt;&lt;SPAN&gt;(col, new_col_name)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;return&lt;/SPAN&gt;&lt;SPAN&gt; df&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;df_cleaned &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;clean_column_names&lt;/SPAN&gt;&lt;SPAN&gt;(df)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;# Check the cleaned column names&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df_cleaned.&lt;/SPAN&gt;&lt;SPAN&gt;printSchema&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df_cleaned.columns&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 20 Sep 2024 18:29:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91234#M38118</guid>
      <dc:creator>KristiLogos</dc:creator>
      <dc:date>2024-09-20T18:29:11Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91235#M38119</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/122107"&gt;@KristiLogos&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Ok, now I understand the problem better. Set the Delta table property delta.columnMapping.mode to name to enable column mapping mode. Then try again.&lt;/P&gt;&lt;P&gt;&lt;A href="https://kb.databricks.com/delta/allow-spaces-and-special-characters-in-nested-column-names-with-delta-tables" target="_blank" rel="noopener"&gt;https://kb.databricks.com/delta/allow-spaces-and-special-characters-in-nested-column-names-with-delta-tables&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2024 18:35:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91235#M38119</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-20T18:35:06Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91252#M38124</link>
      <description>&lt;P&gt;amazing thanks! that worked flawlessly&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2024 23:17:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91252#M38124</guid>
      <dc:creator>KristiLogos</dc:creator>
      <dc:date>2024-09-20T23:17:25Z</dc:date>
    </item>
    <item>
      <title>Re: Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n'  in sch</title>
      <link>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91259#M38125</link>
      <description>&lt;P&gt;I'm happy that it worked for you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 21 Sep 2024 05:52:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-parent-columns-and-not-unnest-using-pyspark-found-invalid/m-p/91259#M38125</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-21T05:52:37Z</dc:date>
    </item>
  </channel>
</rss>

