<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Run driver on spot instance in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/run-driver-on-spot-instance/m-p/53736#M29869</link>
    <description>&lt;P&gt;Thanks for your answer &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;! Good overview, and I understand that "driver on-demand and the rest on spot" is good general advice. But I am still considering using spot instances for both, and I am left with two concrete questions:&lt;/P&gt;&lt;P&gt;1: Can we end up in a corrupt state if the driver is reclaimed? There are many other scenarios in which a driver can crash, shut down, etc., so I assume Spark is written to handle this without eating our data; is this correct? (I understand that software can have bugs; my question is whether Spark is **intended** to handle a driver failure without corrupting data, not whether you can guarantee that it will actually work in all cases.)&lt;/P&gt;&lt;P&gt;2: If we use Databricks Workflows with retries on the job, and the driver gets reclaimed, will the job be retried? And does that count towards the max retries?&lt;/P&gt;</description>
    <pubDate>Fri, 24 Nov 2023 09:08:28 GMT</pubDate>
    <dc:creator>Erik</dc:creator>
    <dc:date>2023-11-24T09:08:28Z</dc:date>
    <item>
      <title>Run driver on spot instance</title>
      <link>https://community.databricks.com/t5/data-engineering/run-driver-on-spot-instance/m-p/52881#M29646</link>
      <description>&lt;P&gt;The traditional advice seems to be to run the driver on-demand and, optionally, the workers on spot. This is indeed what happens if one chooses to run with spot instances in Databricks. But I am interested in what happens if the driver gets evicted: can we end up with corrupt data?&lt;/P&gt;&lt;P&gt;We have some batch jobs which run as structured streaming every night. They seem like prime candidates to run 100% on spot with retries, but I want to understand why this is not a more common pattern first.&lt;/P&gt;</description>
      <pubDate>Sat, 18 Nov 2023 15:28:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-driver-on-spot-instance/m-p/52881#M29646</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2023-11-18T15:28:32Z</dc:date>
    </item>
    <item>
      <title>Re: Run driver on spot instance</title>
      <link>https://community.databricks.com/t5/data-engineering/run-driver-on-spot-instance/m-p/53736#M29869</link>
      <description>&lt;P&gt;Thanks for your answer &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;! Good overview, and I understand that "driver on-demand and the rest on spot" is good general advice. But I am still considering using spot instances for both, and I am left with two concrete questions:&lt;/P&gt;&lt;P&gt;1: Can we end up in a corrupt state if the driver is reclaimed? There are many other scenarios in which a driver can crash, shut down, etc., so I assume Spark is written to handle this without eating our data; is this correct? (I understand that software can have bugs; my question is whether Spark is **intended** to handle a driver failure without corrupting data, not whether you can guarantee that it will actually work in all cases.)&lt;/P&gt;&lt;P&gt;2: If we use Databricks Workflows with retries on the job, and the driver gets reclaimed, will the job be retried? And does that count towards the max retries?&lt;/P&gt;</description>
      <pubDate>Fri, 24 Nov 2023 09:08:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-driver-on-spot-instance/m-p/53736#M29869</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2023-11-24T09:08:28Z</dc:date>
    </item>
  </channel>
</rss>