<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic What config do we use to set row groups for Delta tables on Databricks? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-config-do-we-use-to-set-row-groups-fro-delta-tables-on-data/m-p/65974#M32978</link>
    <description>&lt;P&gt;I have tried several ways to set the row group size for Delta tables in a Databricks notebook, but none of them work, whereas I am able to set it properly using Spark.&lt;BR /&gt;I tried:&lt;/P&gt;&lt;P&gt;1. Setting the block size in the Hadoop configuration:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;val blockSize = 1024 * 1024 * 60&lt;/DIV&gt;&lt;DIV&gt;spark.sparkContext.hadoopConfiguration.setInt("dfs.blocksize", blockSize)&lt;/DIV&gt;&lt;DIV&gt;spark.sparkContext.hadoopConfiguration.setInt("parquet.block.size", blockSize)&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;2. Passing the block size as a write option:&lt;/P&gt;&lt;DIV&gt;df.repartition(1).write.option("parquet.block.size", blockSize).format("delta").mode("overwrite").save("&amp;lt;path&amp;gt;")&lt;/DIV&gt;&lt;P&gt;The same configs work fine when writing plain Parquet.&lt;/P&gt;&lt;P&gt;df size = 600 MB&lt;BR /&gt;block size = 60 MB&lt;/P&gt;&lt;P&gt;NumRowGroups should be 10 (600 MB / 60 MB).&lt;/P&gt;</description>
    <pubDate>Wed, 10 Apr 2024 06:49:25 GMT</pubDate>
    <dc:creator>dlaxminaresh</dc:creator>
    <dc:date>2024-04-10T06:49:25Z</dc:date>
    <item>
      <title>What config do we use to set row groups for Delta tables on Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-config-do-we-use-to-set-row-groups-fro-delta-tables-on-data/m-p/65974#M32978</link>
      <description>&lt;P&gt;I have tried several ways to set the row group size for Delta tables in a Databricks notebook, but none of them work, whereas I am able to set it properly using Spark.&lt;BR /&gt;I tried:&lt;/P&gt;&lt;P&gt;1. Setting the block size in the Hadoop configuration:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;val blockSize = 1024 * 1024 * 60&lt;/DIV&gt;&lt;DIV&gt;spark.sparkContext.hadoopConfiguration.setInt("dfs.blocksize", blockSize)&lt;/DIV&gt;&lt;DIV&gt;spark.sparkContext.hadoopConfiguration.setInt("parquet.block.size", blockSize)&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;2. Passing the block size as a write option:&lt;/P&gt;&lt;DIV&gt;df.repartition(1).write.option("parquet.block.size", blockSize).format("delta").mode("overwrite").save("&amp;lt;path&amp;gt;")&lt;/DIV&gt;&lt;P&gt;The same configs work fine when writing plain Parquet.&lt;/P&gt;&lt;P&gt;df size = 600 MB&lt;BR /&gt;block size = 60 MB&lt;/P&gt;&lt;P&gt;NumRowGroups should be 10 (600 MB / 60 MB).&lt;/P&gt;</description>
      <pubDate>Wed, 10 Apr 2024 06:49:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-config-do-we-use-to-set-row-groups-fro-delta-tables-on-data/m-p/65974#M32978</guid>
      <dc:creator>dlaxminaresh</dc:creator>
      <dc:date>2024-04-10T06:49:25Z</dc:date>
    </item>
  </channel>
</rss>

