<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta file partitions in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8038#M3766</link>
    <description>&lt;P&gt;Hi Vignesh,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks. The return type was a string; I converted it to a tuple and it is working now.&lt;/P&gt;</description>
    <pubDate>Thu, 09 Mar 2023 11:05:36 GMT</pubDate>
    <dc:creator>thushar</dc:creator>
    <dc:date>2023-03-09T11:05:36Z</dc:date>
    <item>
      <title>Delta file partitions</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8036#M3764</link>
      <description>&lt;P&gt;We have a function that writes files with partitions. The partition columns come from metadata that we maintain, retrieved through getPartitionColumns. For one table, two columns are listed as partition columns: 'Team' and 'Speciality'.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;At run time, the partition columns are not substituted properly in the DataFrame's write method, and we get the following error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;AnalysisException: Partition column `"Team","Speciality"` not found in schema&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Both columns are present in the DataFrame. Any idea how to resolve this?&lt;/P&gt;&lt;P&gt;It seems the value &lt;B&gt;`"Team","Speciality"`&lt;/B&gt; is treated as a single column name rather than as two separate columns.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;def dfWrite(df, targetPath, tableName):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;partitionColumn = getPartitionColumns(tableName)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# "Team", "Speciality"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df.write.option("header", True) \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.partitionBy(partitionColumn) \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.mode("overwrite") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.csv(targetPath)&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 07:57:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8036#M3764</guid>
      <dc:creator>thushar</dc:creator>
      <dc:date>2023-03-09T07:57:42Z</dc:date>
    </item>
    <item>
      <title>Re: Delta file partitions</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8037#M3765</link>
      <description>&lt;P&gt;Hi Thushar,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You have not mentioned the return type of the getPartitionColumns method. It needs to return the partition columns as a collection, for example the list ['Team', 'Speciality'].&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The write below should then work, with the collection unpacked into partitionBy so that each column is passed as a separate argument:&lt;/P&gt;&lt;P&gt;df.write.option("header", True) \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.partitionBy(*partitionColumn) \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.mode("overwrite") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.csv(targetPath)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please give it a try.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 09:54:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8037#M3765</guid>
      <dc:creator>pvignesh92</dc:creator>
      <dc:date>2023-03-09T09:54:57Z</dc:date>
    </item>
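    <!--
    Editorial sketch of the fix discussed above, under stated assumptions: getPartitionColumns
    is assumed to return a single string such as '"Team","Speciality"', which matches the error
    message in the question. The helper names parse_partition_columns and df_write are
    hypothetical and do not come from the thread. The idea is to split the string into separate
    column names first, then unpack them into partitionBy.

    ```python
    def parse_partition_columns(raw):
        """Split a string like '"Team","Speciality"' into ['Team', 'Speciality'].

        Assumption: column names are comma separated and optionally quoted.
        """
        return [name.strip().strip("\"'") for name in raw.split(",") if name.strip()]


    def df_write(df, target_path, partition_columns):
        """Write df as partitioned CSV; df is expected to be a PySpark DataFrame."""
        # Unpack the list so that each column name is passed as its own argument.
        (
            df.write.option("header", True)
            .partitionBy(*partition_columns)
            .mode("overwrite")
            .csv(target_path)
        )


    columns = parse_partition_columns('"Team","Speciality"')
    print(columns)
    ```

    With a real DataFrame this might be called as df_write(df, targetPath, columns).
    -->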
    <item>
      <title>Re: Delta file partitions</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8038#M3766</link>
      <description>&lt;P&gt;Hi Vignesh,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks. The return type was a string; I converted it to a tuple and it is working now.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 11:05:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8038#M3766</guid>
      <dc:creator>thushar</dc:creator>
      <dc:date>2023-03-09T11:05:36Z</dc:date>
    </item>
    <item>
      <title>Re: Delta file partitions</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8039#M3767</link>
      <description>&lt;P&gt;Hi Thushar,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please upvote and mark this as the answer so that the thread can be closed.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 11:08:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8039#M3767</guid>
      <dc:creator>pvignesh92</dc:creator>
      <dc:date>2023-03-09T11:08:16Z</dc:date>
    </item>
    <item>
      <title>Re: Delta file partitions</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8040#M3768</link>
      <description>&lt;P&gt;Hi @Thushar R&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope everything is going great.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just wanted to check in on whether you were able to resolve your issue. If so, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers!&lt;/P&gt;</description>
      <pubDate>Sat, 01 Apr 2023 00:52:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-partitions/m-p/8040#M3768</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-01T00:52:51Z</dc:date>
    </item>
  </channel>
</rss>

