<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Importing MongoDB with field names containing spaces in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27971#M19809</link>
    <description>&lt;P&gt;Solution: It turns out the issue is not reading the schema in, but the fact that I am writing to Delta tables, which do not currently support spaces in column names. So I need to transform the field names before writing. I've been following a pattern of reading in the raw data, which has spaces in the field names, and transforming it after the fact. Since this is a highly nested structure (MongoDB), renaming columns individually will be difficult. Any thoughts on best practice? Should I just start transforming the raw data immediately?&lt;/P&gt;</description>
    <pubDate>Wed, 16 Feb 2022 05:29:42 GMT</pubDate>
    <dc:creator>Mr__E</dc:creator>
    <dc:date>2022-02-16T05:29:42Z</dc:date>
    <item>
      <title>Importing MongoDB with field names containing spaces</title>
      <link>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27970#M19808</link>
      <description>&lt;P&gt;I am currently using a Python notebook with a defined schema to import fairly unstructured documents from MongoDB. Some of these documents have spaces in their field names. I define the schema for the MongoDB PySpark connector like the following:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;StructField("My Field Name", StringType())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Unfortunately, this gives me the error "Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema." I would be happy to rename the column, but I have to be able to import it from MongoDB first. Is there a way to do this with the schema? Or am I forced to write a UDF to convert a JSON string with the bad field name into normalized columns?&lt;/P&gt;</description>
      <pubDate>Tue, 15 Feb 2022 23:49:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27970#M19808</guid>
      <dc:creator>Mr__E</dc:creator>
      <dc:date>2022-02-15T23:49:45Z</dc:date>
    </item>
    <item>
      <title>Re: Importing MongoDB with field names containing spaces</title>
      <link>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27971#M19809</link>
      <description>&lt;P&gt;Solution: It turns out the issue is not reading the schema in, but the fact that I am writing to Delta tables, which do not currently support spaces in column names. So I need to transform the field names before writing. I've been following a pattern of reading in the raw data, which has spaces in the field names, and transforming it after the fact. Since this is a highly nested structure (MongoDB), renaming columns individually will be difficult. Any thoughts on best practice? Should I just start transforming the raw data immediately?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Feb 2022 05:29:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27971#M19809</guid>
      <dc:creator>Mr__E</dc:creator>
      <dc:date>2022-02-16T05:29:42Z</dc:date>
    </item>
    <item>
      <title>Re: Importing MongoDB with field names containing spaces</title>
      <link>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27972#M19810</link>
      <description>&lt;P&gt;If the structure does not change all the time, you could rename the columns in a more automated way, as described &lt;A href="https://stackoverflow.com/questions/57318519/remove-spaces-from-all-column-names-in-pyspark" alt="https://stackoverflow.com/questions/57318519/remove-spaces-from-all-column-names-in-pyspark" target="_blank"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;But this example does not handle nested columns.&lt;/P&gt;&lt;P&gt;You could also try to create a schema without spaces and pass that when you read the data.&lt;/P&gt;&lt;P&gt;This can be done manually or programmatically (although this can be a challenge for deeply nested structures).&lt;/P&gt;&lt;P&gt;The second method seems better in my opinion, since the schema method returns a nested list/array/...&lt;/P&gt;&lt;P&gt;Python and Scala offer plenty of ways to parse collections. It also helps that the StructField type has an attribute called 'name'.&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/51329926/renaming-columns-recursively-in-a-nested-structure-in-spark" alt="https://stackoverflow.com/questions/51329926/renaming-columns-recursively-in-a-nested-structure-in-spark" target="_blank"&gt;Example&lt;/A&gt;&lt;/P&gt;</description>
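<!--
The programmatic approach suggested in this reply (building a schema without spaces, including nested fields) could be sketched as follows. This is a minimal sketch under stated assumptions, not the method from the linked answers: it works on the JSON dict form of a Spark schema (the shape returned by StructType.jsonValue()), and the names sanitize_name, sanitize_type, and the example schema are hypothetical. The set of invalid characters is taken from the error message quoted in the first post.

```python
import re

# Characters rejected in column names, copied from the error message in the thread.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_name(name):
    """Replace every invalid character in a field name with an underscore."""
    return INVALID_CHARS.sub("_", name)


def sanitize_type(data_type):
    """Recursively rename struct fields in a schema given in Spark's JSON dict form."""
    if not isinstance(data_type, dict):
        # Primitive types are plain strings such as "string" or "long".
        return data_type
    kind = data_type.get("type")
    if kind == "struct":
        return {
            "type": "struct",
            "fields": [
                {
                    **field,
                    "name": sanitize_name(field["name"]),
                    "type": sanitize_type(field["type"]),
                }
                for field in data_type["fields"]
            ],
        }
    if kind == "array":
        return {**data_type, "elementType": sanitize_type(data_type["elementType"])}
    if kind == "map":
        return {**data_type, "valueType": sanitize_type(data_type["valueType"])}
    return data_type


# Hypothetical schema with spaces at the top level, in a nested struct, and in an array.
example = {
    "type": "struct",
    "fields": [
        {"name": "My Field Name", "type": "string", "nullable": True, "metadata": {}},
        {
            "name": "Nested Doc",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "Inner Key", "type": "long", "nullable": True, "metadata": {}}
                ],
            },
            "nullable": True,
            "metadata": {},
        },
        {
            "name": "tags",
            "type": {
                "type": "array",
                "elementType": {
                    "type": "struct",
                    "fields": [
                        {"name": "tag name", "type": "string", "nullable": True, "metadata": {}}
                    ],
                },
                "containsNull": True,
            },
            "nullable": True,
            "metadata": {},
        },
    ],
}

clean = sanitize_type(example)
print([field["name"] for field in clean["fields"]])
```

In PySpark, the input dict could come from df.schema.jsonValue(), and the result could be turned back into a schema with StructType.fromJson(clean). How the renamed schema is then applied (for example, by casting each column to its renamed type before writing to Delta) is left open here and would need checking against the connector version in use.
-->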
      <pubDate>Wed, 16 Feb 2022 07:01:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27972#M19810</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-02-16T07:01:16Z</dc:date>
    </item>
    <item>
      <title>Re: Importing MongoDB with field names containing spaces</title>
      <link>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27973#M19811</link>
      <description>&lt;P&gt;Thanks! I used this pattern of adding underscores to simplify raw dumping.&lt;/P&gt;</description>
      <pubDate>Sat, 02 Apr 2022 10:38:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/importing-mongodb-with-field-names-containing-spaces/m-p/27973#M19811</guid>
      <dc:creator>Mr__E</dc:creator>
      <dc:date>2022-04-02T10:38:08Z</dc:date>
    </item>
  </channel>
</rss>

