<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Change spark configs in Serverless compute clusters in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105627#M42211</link>
    <description>&lt;P&gt;Is there anything I can do to increase the memory? Or do you know of a way I could keep it from running out of memory? Here is the code block:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from datetime import datetime, timezone

dt = datetime.strptime(input_date, "%Y/%m/%d")
buffer_sec = 6

timestamp_start_ms = int((dt.replace(tzinfo=timezone.utc).timestamp() - buffer_sec) * 1000)
timestamp_end_ms = int((timestamp_start_ms + (24 * 3600 * 1000)) + buffer_sec * 2 * 1000)
interpolated_filtered = f"SELECT * FROM `catalog`.default.events \
WHERE timestamp &amp;gt;= {timestamp_start_ms} AND timestamp &amp;lt;= {timestamp_end_ms} ORDER BY timestamp ASC"
interpolated_df = spark.sql(interpolated_filtered).toPandas()&lt;/LI-CODE&gt;</description>
    <pubDate>Tue, 14 Jan 2025 17:39:30 GMT</pubDate>
    <dc:creator>ls</dc:creator>
    <dc:date>2025-01-14T17:39:30Z</dc:date>
    <item>
      <title>Change spark configs in Serverless compute clusters</title>
      <link>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105512#M42163</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Howdy!&lt;BR /&gt;I wanted to know how I can change some spark configs on Serverless compute. I have a base.yml file and tried placing:&amp;nbsp;&lt;BR /&gt;&lt;SPAN class=""&gt;spark_conf:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN class=""&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;- spark.driver.maxResultSize:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"16g"&lt;/SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;but I still get this error:&lt;BR /&gt;[&lt;/SPAN&gt;&lt;A class="" href="https://docs.databricks.com/error-messages/error-classes.html#config_not_available" target="_blank" rel="noopener noreferrer"&gt;CONFIG_NOT_AVAILABLE&lt;/A&gt;&lt;SPAN&gt;]&lt;/SPAN&gt;&lt;SPAN&gt; Configuration spark.driver.maxResultSize is not available. SQLSTATE: 42K0I&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;and trying to change the config within the notebook is not allowed either.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jan 2025 22:39:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105512#M42163</guid>
      <dc:creator>ls</dc:creator>
      <dc:date>2025-01-13T22:39:32Z</dc:date>
    </item>
    <item>
      <title>Re: Change spark configs in Serverless compute clusters</title>
      <link>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105513#M42164</link>
      <description>&lt;P&gt;Spark configs are limited in Serverless; these are the supported configs you can set:&amp;nbsp;&lt;A href="https://docs.databricks.com/en/release-notes/serverless/index.html#supported-spark-configuration-parameters" target="_blank"&gt;https://docs.databricks.com/en/release-notes/serverless/index.html#supported-spark-configuration-parameters&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
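A minimal, hypothetical sketch of the behaviour described above. The supported key names here are assumed examples from the linked docs page, and the mock only imitates what `spark.conf.set()` does on serverless compute; it is not the Databricks API:

```python
# Mock of spark.conf.set() semantics on serverless compute (assumption:
# only keys on the documented supported list can be set; anything else
# fails with CONFIG_NOT_AVAILABLE, as in the error from the original post).
SUPPORTED = {
    "spark.sql.session.timeZone",    # assumed example from the docs page
    "spark.sql.shuffle.partitions",  # assumed example from the docs page
}

def set_conf(conf: dict, key: str, value: str) -> None:
    """Set a config, rejecting unsupported keys like serverless does."""
    if key not in SUPPORTED:
        raise ValueError(
            f"[CONFIG_NOT_AVAILABLE] Configuration {key} is not available. SQLSTATE: 42K0I"
        )
    conf[key] = value

conf = {}
set_conf(conf, "spark.sql.session.timeZone", "UTC")  # accepted
try:
    set_conf(conf, "spark.driver.maxResultSize", "16g")  # rejected, as reported above
except ValueError as err:
    print(err)
```

In a real serverless notebook the equivalent call is `spark.conf.set(key, value)`; since `spark.driver.maxResultSize` is not on the supported list, the bundle-level `spark_conf` entry fails for the same reason.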
      <pubDate>Mon, 13 Jan 2025 22:47:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105513#M42164</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-13T22:47:00Z</dc:date>
    </item>
    <item>
      <title>Re: Change spark configs in Serverless compute clusters</title>
      <link>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105627#M42211</link>
      <description>&lt;P&gt;Is there anything I can do to increase the memory? Or do you know of a way I could keep it from running out of memory? Here is the code block:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from datetime import datetime, timezone

dt = datetime.strptime(input_date, "%Y/%m/%d")
buffer_sec = 6

timestamp_start_ms = int((dt.replace(tzinfo=timezone.utc).timestamp() - buffer_sec) * 1000)
timestamp_end_ms = int((timestamp_start_ms + (24 * 3600 * 1000)) + buffer_sec * 2 * 1000)
interpolated_filtered = f"SELECT * FROM `catalog`.default.events \
WHERE timestamp &amp;gt;= {timestamp_start_ms} AND timestamp &amp;lt;= {timestamp_end_ms} ORDER BY timestamp ASC"
interpolated_df = spark.sql(interpolated_filtered).toPandas()&lt;/LI-CODE&gt;</description>
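With its two imports and an assumed example value for `input_date` (which comes from elsewhere in the original job), the timestamp arithmetic above runs standalone, which makes it easy to sanity-check the window bounds:

```python
from datetime import datetime, timezone

input_date = "2025/01/14"  # assumed example value
buffer_sec = 6

dt = datetime.strptime(input_date, "%Y/%m/%d")
# start of the day in epoch milliseconds, padded back by buffer_sec
timestamp_start_ms = int((dt.replace(tzinfo=timezone.utc).timestamp() - buffer_sec) * 1000)
# one full day later, padded forward so the window covers 24 h plus 2 * buffer_sec
timestamp_end_ms = int(timestamp_start_ms + (24 * 3600 * 1000) + buffer_sec * 2 * 1000)

print(timestamp_start_ms, timestamp_end_ms)
```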
      <pubDate>Tue, 14 Jan 2025 17:39:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105627#M42211</guid>
      <dc:creator>ls</dc:creator>
      <dc:date>2025-01-14T17:39:30Z</dc:date>
    </item>
    <item>
      <title>Re: Change spark configs in Serverless compute clusters</title>
      <link>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105631#M42212</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;To address the memory issue in your Serverless compute environment, you can consider the following strategies:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Optimize the Query&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Filter Early&lt;/STRONG&gt;: Ensure that you are filtering the data as early as possible in your query to reduce the amount of data being processed. For example, if you can add more specific conditions to your &lt;CODE&gt;WHERE&lt;/CODE&gt; clause, it will help in reducing the data size.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Limit Columns&lt;/STRONG&gt;: Select only the necessary columns instead of using &lt;CODE&gt;SELECT *&lt;/CODE&gt;. This reduces the amount of data being transferred and processed.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Use Spark DataFrame Operations&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;Instead of converting the entire result to a Pandas DataFrame using &lt;CODE&gt;toPandas()&lt;/CODE&gt;, try to perform as many operations as possible using Spark DataFrame operations. Spark DataFrames are distributed and can handle larger datasets more efficiently than Pandas DataFrames.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Use Delta Tables&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;If you are working with large datasets, consider using Delta tables. Delta tables provide optimized storage and query performance, which can help in managing memory usage more efficiently.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
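The first two points can be sketched in plain Python by rebuilding the query from the original post with an explicit column list instead of `SELECT *`. The column names and timestamp bounds here are hypothetical placeholders; substitute the ones the downstream code actually needs:

```python
# Narrowed version of the original query: explicit columns, same window.
columns = ["timestamp", "value"]    # hypothetical; list only what you need
timestamp_start_ms = 1736812794000  # example bounds
timestamp_end_ms = 1736899206000

query = (
    f"SELECT {', '.join(columns)} "
    f"FROM `catalog`.default.events "
    f"WHERE timestamp >= {timestamp_start_ms} AND timestamp <= {timestamp_end_ms} "
    f"ORDER BY timestamp ASC"
)
print(query)
```

On the Spark side, keeping the result as a Spark DataFrame (`spark.sql(query)`) and aggregating, filtering, or sampling before any `toPandas()` call avoids collecting the whole day of events onto the driver, which is what exhausts memory.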
      <pubDate>Tue, 14 Jan 2025 17:46:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/change-spark-configs-in-serverless-compute-clusters/m-p/105631#M42212</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-14T17:46:03Z</dc:date>
    </item>
  </channel>
</rss>

