<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Parallel jobs with individual contexts in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/parallel-jobs-with-individual-contexts/m-p/60072#M6507</link>
    <description>&lt;P&gt;Thanks for the response! Before hitting the Databricks API, we initialize the Spark session using the AWS region the bucket is in, and then read from it with sparkSession.read().text(s3Path), where the S3 path is a dynamic variable pointing at the exact directory within the bucket we want to read from. Regarding the article you provided: we use an instance profile to access the buckets, and it has permission to read buckets in all regions. Wouldn't that theoretically be enough, since the read targets the correct bucket most of the time? (And when a job fails, rerunning it on its own succeeds.) It seems there is a conflict between Spark contexts when multiple jobs run simultaneously on the same cluster.&lt;/P&gt;</description>
    <pubDate>Tue, 13 Feb 2024 17:25:39 GMT</pubDate>
    <dc:creator>cesarc</dc:creator>
    <dc:date>2024-02-13T17:25:39Z</dc:date>
    <item>
      <title>Parallel jobs with individual contexts</title>
      <link>https://community.databricks.com/t5/get-started-discussions/parallel-jobs-with-individual-contexts/m-p/59959#M6504</link>
      <description>&lt;P&gt;I was wondering if someone could help us with an implementation question. Our current program spins up 5 jobs through the Databricks API on the same Databricks cluster, but each one needs its own Spark context (specifically, each one connects to a different AWS region). The jobs run in parallel, but some fail because they cannot find the bucket. I'm fairly sure the failing jobs are picking up the Spark context that another job initialized on the driver, instead of the Spark context we configured for that specific job. Rerunning a failed job finds the bucket and passes.&lt;/P&gt;&lt;P&gt;Any ideas on how we can force a job to use a new Spark context (instead of getOrCreate()), use a different cluster configuration, etc.? Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 12 Feb 2024 17:08:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/parallel-jobs-with-individual-contexts/m-p/59959#M6504</guid>
      <dc:creator>cesarc</dc:creator>
      <dc:date>2024-02-12T17:08:39Z</dc:date>
    </item>
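The question above asks how to keep parallel jobs from sharing one driver-side Spark context. A minimal sketch of one common workaround, assuming a Spark/Databricks environment (names and settings here are illustrative, not from the thread): on a shared cluster there is a single JVM-wide SparkContext, so SparkSession.builder.getOrCreate() hands every job the same session, while SparkSession.newSession() gives each job its own configuration namespace over that shared context.

```python
# Hypothetical sketch (assumes a Spark/Databricks runtime; names are
# illustrative). getOrCreate() returns the one shared session per JVM,
# so parallel jobs that mutate its config can race with each other.
from pyspark.sql import SparkSession

base = SparkSession.builder.getOrCreate()

# newSession() shares the underlying SparkContext but gets its own SQLConf
# and temporary-view registry, so per-job settings no longer collide.
job_session = base.newSession()
job_session.conf.set("spark.hadoop.fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
```

Note this isolates session-level configuration only; settings that live on the shared SparkContext itself still apply cluster-wide.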
    <item>
      <title>Re: Parallel jobs with individual contexts</title>
      <link>https://community.databricks.com/t5/get-started-discussions/parallel-jobs-with-individual-contexts/m-p/59992#M6505</link>
      <description>&lt;P&gt;&lt;SPAN&gt;You can set up per-bucket configurations with different credentials, endpoints, and so on:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/connect/storage/amazon-s3.html#per-bucket-configuration" target="_self"&gt;https://docs.databricks.com/en/connect/storage/amazon-s3.html#per-bucket-configuration&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Feb 2024 00:58:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/parallel-jobs-with-individual-contexts/m-p/59992#M6505</guid>
      <dc:creator>feiyun0112</dc:creator>
      <dc:date>2024-02-13T00:58:12Z</dc:date>
    </item>
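The per-bucket configuration suggested in the reply above can be sketched as follows. This is a hedged illustration assuming the Hadoop s3a connector; the bucket names and regions are placeholders, not values from the thread.

```python
# Hypothetical sketch of per-bucket S3A settings (bucket names are placeholders).
# Per-bucket keys override the global fs.s3a.* values, so one cluster can read
# buckets in several regions without swapping global credentials or endpoints.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hconf = spark.sparkContext._jsc.hadoopConfiguration()

# Route each bucket to the endpoint of its own region.
hconf.set("fs.s3a.bucket.logs-us-east-1.endpoint", "s3.us-east-1.amazonaws.com")
hconf.set("fs.s3a.bucket.logs-eu-west-1.endpoint", "s3.eu-west-1.amazonaws.com")

df = spark.read.text("s3a://logs-us-east-1/some/prefix/")
```

Because these settings are keyed by bucket name rather than set globally, they avoid the race the original poster describes, where one job's region setting clobbers another's.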
    <item>
      <title>Re: Parallel jobs with individual contexts</title>
      <link>https://community.databricks.com/t5/get-started-discussions/parallel-jobs-with-individual-contexts/m-p/60072#M6507</link>
      <description>&lt;P&gt;Thanks for the response! Before hitting the Databricks API, we initialize the Spark session using the AWS region the bucket is in, and then read from it with sparkSession.read().text(s3Path), where the S3 path is a dynamic variable pointing at the exact directory within the bucket we want to read from. Regarding the article you provided: we use an instance profile to access the buckets, and it has permission to read buckets in all regions. Wouldn't that theoretically be enough, since the read targets the correct bucket most of the time? (And when a job fails, rerunning it on its own succeeds.) It seems there is a conflict between Spark contexts when multiple jobs run simultaneously on the same cluster.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Feb 2024 17:25:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/parallel-jobs-with-individual-contexts/m-p/60072#M6507</guid>
      <dc:creator>cesarc</dc:creator>
      <dc:date>2024-02-13T17:25:39Z</dc:date>
    </item>
  </channel>
</rss>

