<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databricks connect, set spark config in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-connect-set-spark-config/m-p/107281#M42756</link>
    <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using Databricks Connect to run computations on a Databricks cluster. I need to set some Spark configurations, namely&amp;nbsp;&lt;SPAN&gt;spark.files.ignoreCorruptFiles. In my experience, setting a Spark configuration for the current session in Databricks Connect has no effect. I also cannot configure the cluster itself, as it is a shared cluster.&lt;/SPAN&gt;&amp;nbsp;Is there any solution?&lt;/P&gt;</description>
    <pubDate>Mon, 27 Jan 2025 18:39:12 GMT</pubDate>
    <dc:creator>mrkure</dc:creator>
    <dc:date>2025-01-27T18:39:12Z</dc:date>
    <item>
      <title>Databricks connect, set spark config</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-set-spark-config/m-p/107281#M42756</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using Databricks Connect to run computations on a Databricks cluster. I need to set some Spark configurations, namely&amp;nbsp;&lt;SPAN&gt;spark.files.ignoreCorruptFiles. In my experience, setting a Spark configuration for the current session in Databricks Connect has no effect. I also cannot configure the cluster itself, as it is a shared cluster.&lt;/SPAN&gt;&amp;nbsp;Is there any solution?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jan 2025 18:39:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-set-spark-config/m-p/107281#M42756</guid>
      <dc:creator>mrkure</dc:creator>
      <dc:date>2025-01-27T18:39:12Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks connect, set spark config</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-set-spark-config/m-p/107291#M42760</link>
      <description>&lt;P&gt;Have you tried setting it up in your code as:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder \
    .appName("YourAppName") \
    .config("spark.files.ignoreCorruptFiles", "true") \
    .getOrCreate()

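# Hedged addition, not part of the original suggestion: on an already running
# shared cluster, builder .config() generally cannot change core settings such as
# spark.files.ignoreCorruptFiles, which apply to RDD-based reads (e.g. sc.textFile).
# DataFrame file sources (Parquet, JSON, ORC, ...) read the runtime SQL setting
# spark.sql.files.ignoreCorruptFiles instead, and that one can be set on the live
# session (assumption: the cluster's access mode does not restrict it).
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")
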
# Your Spark code here&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 27 Jan 2025 20:57:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-set-spark-config/m-p/107291#M42760</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-27T20:57:46Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks connect, set spark config</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-set-spark-config/m-p/107433#M42802</link>
      <description>&lt;P&gt;Yes, I did. This time I tried it both in Databricks Connect and in a Databricks notebook, and the behaviour is the same. One small note: I have set the setting to false, because I want the code to fail if any file cannot be loaded.&lt;/P&gt;&lt;P&gt;The following code prints false for the check and fails with an error, as expected.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;print(spark.conf.get("spark.sql.files.ignoreCorruptFiles"))
paths = ["path_to_corrupted_file"]
# spark.read is a DataFrameReader, not a function; load() uses the default format (spark.sql.sources.default)
df = spark.read.load(paths)&lt;/LI-CODE&gt;&lt;P&gt;However, the following code also prints false for the check, yet the DataFrame is created successfully with only one file loaded. I expected it to fail with an error as well, but it seems some fault tolerance is still in effect.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;print(spark.conf.get("spark.sql.files.ignoreCorruptFiles"))
paths = ["path_to_corrupted_file", "path_to_normal_file"]
df = spark.read.load(paths)&lt;/LI-CODE&gt;&lt;P&gt;Perhaps I am misunderstanding how the setting behaves, since I expected this case to fail with an error too.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Jan 2025 16:21:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-set-spark-config/m-p/107433#M42802</guid>
      <dc:creator>mrkure</dc:creator>
      <dc:date>2025-01-28T16:21:49Z</dc:date>
    </item>
  </channel>
</rss>

