<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Empty xml tag in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/empty-xml-tag/m-p/60415#M31663</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;Hi!&amp;nbsp;I apologise for the late reply!&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":grinning_face_with_sweat:"&gt;😅&lt;/span&gt; My cluster runs Databricks Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12), and I'm using Python. Files that contain no empty tags read perfectly. By an empty tag I mean a self-closing tag: one that has no separate opening and closing tag but a single "merged" tag, like "ItemId" in my first post. As soon as the reader encounters such a tag, reading stops. If the document consists only of "normal" tags with an opening and a closing tag, reading works fine!&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The feature is marvellous and makes it very easy to work with files in Databricks, but empty tags don't work for me&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":grinning_face_with_sweat:"&gt;😅&lt;/span&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.read.format('xml').options(rowTag='Item').load(test_file_location)
df.display()&lt;/LI-CODE&gt;&lt;P&gt;Example for normal XML&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;  &amp;lt;people&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age born="1990-02-24"&amp;gt;25&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age born="1985-01-01"&amp;gt;30&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age born="1980-01-01"&amp;gt;30&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
  &amp;lt;/people&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;Example of empty tag&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;  &amp;lt;people&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age_t born="1990-02-24"/&amp;gt;
      &amp;lt;age born="1990-02-24"&amp;gt;25&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age_t born="1985-01-01"/&amp;gt;
      &amp;lt;age born="1985-01-01"&amp;gt;30&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age_t born="1980-01-01"/&amp;gt;
      &amp;lt;age born="1980-01-01"&amp;gt;30&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
  &amp;lt;/people&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Feb 2024 14:05:26 GMT</pubDate>
    <dc:creator>YevheniiY</dc:creator>
    <dc:date>2024-02-16T14:05:26Z</dc:date>
    <item>
      <title>Empty xml tag</title>
      <link>https://community.databricks.com/t5/data-engineering/empty-xml-tag/m-p/59602#M31458</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;ItemMaintenance&amp;gt;
	&amp;lt;Batch&amp;gt;
		&amp;lt;BathInfo&amp;gt;info&amp;lt;/BathInfo&amp;gt;
		&amp;lt;Item attr1="tekst" attr2="Tekst2"&amp;gt;
			&amp;lt;ItemId type="Type" id="id"/&amp;gt;
			&amp;lt;Dates&amp;gt;
				&amp;lt;Start&amp;gt;2023-11-09&amp;lt;/Start&amp;gt;
				&amp;lt;End&amp;gt;2024-01-02&amp;lt;/End&amp;gt;
			&amp;lt;/Dates&amp;gt;
			&amp;lt;MoreData&amp;gt;
			More data
			&amp;lt;/MoreData&amp;gt;
		&amp;lt;/Item&amp;gt;
	&amp;lt;/Batch&amp;gt;
&amp;lt;/ItemMaintenance&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hello, I'm having a problem reading an XML file with the new XML (Auto) Loader feature. With a structure like the one above and rowTag set to Item, the file is not read correctly. The DataFrame contains only the Item attributes and the first child element, ItemId. Reading stops there, even though the Item tag has not been closed yet and a lot of data follows. No error is raised, and nothing indicates that anything went wrong. I tried different read options, removed the attributes, and so on. I followed the native XML documentation: &lt;A href="https://docs.databricks.com/en/_extras/documents/native-xml-private-preview.pdf" target="_self"&gt;https://docs.databricks.com/en/_extras/documents/native-xml-private-preview.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 14:03:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/empty-xml-tag/m-p/59602#M31458</guid>
      <dc:creator>YevheniiY</dc:creator>
      <dc:date>2024-02-07T14:03:03Z</dc:date>
    </item>
    <item>
      <title>Re: Empty xml tag</title>
      <link>https://community.databricks.com/t5/data-engineering/empty-xml-tag/m-p/60415#M31663</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;Hi!&amp;nbsp;I apologise for the late reply!&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":grinning_face_with_sweat:"&gt;😅&lt;/span&gt; My cluster runs Databricks Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12), and I'm using Python. Files that contain no empty tags read perfectly. By an empty tag I mean a self-closing tag: one that has no separate opening and closing tag but a single "merged" tag, like "ItemId" in my first post. As soon as the reader encounters such a tag, reading stops. If the document consists only of "normal" tags with an opening and a closing tag, reading works fine!&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The feature is marvellous and makes it very easy to work with files in Databricks, but empty tags don't work for me&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":grinning_face_with_sweat:"&gt;😅&lt;/span&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.read.format('xml').options(rowTag='Item').load(test_file_location)
df.display()&lt;/LI-CODE&gt;&lt;P&gt;Example for normal XML&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;  &amp;lt;people&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age born="1990-02-24"&amp;gt;25&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age born="1985-01-01"&amp;gt;30&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age born="1980-01-01"&amp;gt;30&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
  &amp;lt;/people&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;Example of empty tag&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;  &amp;lt;people&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age_t born="1990-02-24"/&amp;gt;
      &amp;lt;age born="1990-02-24"&amp;gt;25&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age_t born="1985-01-01"/&amp;gt;
      &amp;lt;age born="1985-01-01"&amp;gt;30&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
    &amp;lt;person&amp;gt;
      &amp;lt;age_t born="1980-01-01"/&amp;gt;
      &amp;lt;age born="1980-01-01"&amp;gt;30&amp;lt;/age&amp;gt;
    &amp;lt;/person&amp;gt;
  &amp;lt;/people&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
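A possible workaround, sketched under assumptions: the reply above reports that documents made up only of explicit opening and closing tags read fine, so the file could be rewritten before loading so that every self-closing element becomes an explicit open/close pair. The sketch uses Python's standard xml.etree.ElementTree module. The helper name expand_empty_elements is hypothetical, the sample is rebuilt from the empty-tag example above, and the approach has not been verified against the Spark XML reader on 14.3 LTS. It parses the whole document into memory, so it only suits small files.

```python
import xml.etree.ElementTree as ET


def expand_empty_elements(xml_text: str) -> str:
    """Re-serialize XML so that empty elements use explicit open/close tags.

    Hypothetical helper: it relies on the short_empty_elements flag of
    ElementTree.tostring, which controls how content-free elements are written.
    """
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


# Rebuild the "empty tag" sample from the post programmatically.
people = ET.Element("people")
for born, age in [("1990-02-24", "25"), ("1985-01-01", "30"), ("1980-01-01", "30")]:
    person = ET.SubElement(people, "person")
    ET.SubElement(person, "age_t", born=born)          # no text, no children
    ET.SubElement(person, "age", born=born).text = age

# Default serialization writes age_t as a self-closing tag.
compact = ET.tostring(people, encoding="unicode")

# After expansion, age_t is written as an open/close pair with no content.
expanded = expand_empty_elements(compact)
print(expanded)
```

The expanded text could then be written to a new file and read with spark.read.format('xml') as before. Both forms describe the same XML content, so the attributes on age_t are preserved.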
      <pubDate>Fri, 16 Feb 2024 14:05:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/empty-xml-tag/m-p/60415#M31663</guid>
      <dc:creator>YevheniiY</dc:creator>
      <dc:date>2024-02-16T14:05:26Z</dc:date>
    </item>
  </channel>
</rss>

