<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is there any way of determining last stage of SparkSQL Application Execution? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14083#M8634</link>
    <description>&lt;P&gt;@Krishna Kashiv&amp;nbsp;&lt;/P&gt;&lt;P&gt;Maybe &lt;B&gt;ExecutorPlugin.java&lt;/B&gt; can help. It has all the methods you might require. Let me know if it works or not.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You need to implement the interface &lt;B&gt;org.apache.spark.api.plugin.SparkPlugin&lt;/B&gt;&lt;/P&gt;&lt;P&gt;and expose it as spark.plugins = com.abc.ImplementationClass&lt;/P&gt;</description>
    <pubDate>Wed, 13 Oct 2021 12:16:09 GMT</pubDate>
    <dc:creator>User16763506586</dc:creator>
    <dc:date>2021-10-13T12:16:09Z</dc:date>
    <item>
      <title>Is there any way of determining last stage of SparkSQL Application Execution?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14079#M8630</link>
      <description>&lt;P&gt;I have created custom UDFs that generate logs. These logs can be flushed by calling another API exposed by an internal layer. However, I want to call this API just after the execution of the UDF ends. Is there any way of determining that the execution of a particular UDF has finished, so I can invoke the API to flush the logs and clean up?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For example, when we extend Hive's GenericUDF class for a Hive UDF, there is a close function in the UDF's lifecycle that is called after the execution of the UDF.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is a similar approach possible in SparkSQL UDFs?&lt;/P&gt;
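&lt;P&gt;&lt;/P&gt;&lt;P&gt;For reference, the Hive hook I mean looks roughly like this (a simplified sketch; the class name and logging calls are placeholders):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class AuditLoggingUDF extends GenericUDF {
  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector =
    PrimitiveObjectInspectorFactory.javaStringObjectInspector

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    // ... compute the result and emit an audit log entry per row ...
    "ok"
  }

  override def getDisplayString(children: Array[String]): String = "audit_logging_udf"

  // Lifecycle hook: called once after the UDF has processed its last row.
  override def close(): Unit = {
    // flush the buffered audit logs here
  }
}&lt;/CODE&gt;&lt;/PRE&gt;</description>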
      <pubDate>Mon, 04 Oct 2021 12:49:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14079#M8630</guid>
      <dc:creator>krishnakash</dc:creator>
      <dc:date>2021-10-04T12:49:21Z</dc:date>
    </item>
    <item>
      <title>Re: Is there any way of determining last stage of SparkSQL Application Execution?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14081#M8632</link>
      <description>&lt;P&gt;TBH, I don't think a callback feature is supported for UDFs yet. But we can work around this by implementing &lt;B&gt;SparkListenerInterface&lt;/B&gt; or extending &lt;B&gt;SparkFirehoseListener&lt;/B&gt;. These have several methods; one that might help us is &lt;B&gt;onStageCompleted&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The interface definition can be found &lt;A href="https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L296" alt="https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L296" target="_blank"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Once you implement the interface, you can register it with Spark using sparkContext.addSparkListener:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;/**
   * :: DeveloperApi ::
   * Register a listener to receive up-calls from events that happen during execution.
   */
  @DeveloperApi
  def addSparkListener(listener: SparkListenerInterface): Unit = {
    listenerBus.addToSharedQueue(listener)
  }&lt;/CODE&gt;&lt;/PRE&gt;
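&lt;P&gt;&lt;/P&gt;&lt;P&gt;For illustration, a minimal sketch of such a listener (the class name and the flush call are placeholders):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Hypothetical listener that reacts each time a stage finishes.
class StageCompletionListener extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    // e.g. flush your audit logs here
    println(s"Stage ${stageCompleted.stageInfo.stageId} completed")
  }
}

// Registration:
// spark.sparkContext.addSparkListener(new StageCompletionListener)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>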
      <pubDate>Wed, 06 Oct 2021 08:54:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14081#M8632</guid>
      <dc:creator>User16763506586</dc:creator>
      <dc:date>2021-10-06T08:54:13Z</dc:date>
    </item>
    <item>
      <title>Re: Is there any way of determining last stage of SparkSQL Application Execution?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14082#M8633</link>
      <description>&lt;P&gt;We tried adding the SparkListener and added loggers for every method of the&amp;nbsp;&lt;B&gt;SparkListenerInterface&lt;/B&gt;. However, we observed that the logs appear in the driver logs, which means the driver node executes the callback methods.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is it possible to invoke these callback methods on the executor nodes, since the audit logs to be flushed are generated on the executor nodes?&lt;/P&gt;</description>
      <pubDate>Fri, 08 Oct 2021 05:40:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14082#M8633</guid>
      <dc:creator>krishnakash</dc:creator>
      <dc:date>2021-10-08T05:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: Is there any way of determining last stage of SparkSQL Application Execution?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14083#M8634</link>
      <description>&lt;P&gt;@Krishna Kashiv&amp;nbsp;&lt;/P&gt;&lt;P&gt;Maybe &lt;B&gt;ExecutorPlugin.java&lt;/B&gt; can help. It has all the methods you might require. Let me know if it works or not.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You need to implement the interface &lt;B&gt;org.apache.spark.api.plugin.SparkPlugin&lt;/B&gt;&lt;/P&gt;&lt;P&gt;and expose it as spark.plugins = com.abc.ImplementationClass&lt;/P&gt;
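&lt;P&gt;&lt;/P&gt;&lt;P&gt;A minimal sketch of what that could look like (class and package names are placeholders; the flushing logic is assumed to live in your internal layer):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;package com.abc

import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Register with: spark.plugins = com.abc.ImplementationClass
class ImplementationClass extends SparkPlugin {

  // No driver-side component needed; returning null is allowed.
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: java.util.Map[String, String]): Unit = {
      // Runs once when each executor starts.
    }

    override def shutdown(): Unit = {
      // Runs on each executor when it shuts down -- flush the audit logs here.
    }
  }
}&lt;/CODE&gt;&lt;/PRE&gt;</description>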
      <pubDate>Wed, 13 Oct 2021 12:16:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14083#M8634</guid>
      <dc:creator>User16763506586</dc:creator>
      <dc:date>2021-10-13T12:16:09Z</dc:date>
    </item>
    <item>
      <title>Re: Is there any way of determining last stage of SparkSQL Application Execution?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14084#M8635</link>
      <description>&lt;P&gt;Hi, how do I properly configure the JAR containing the Spark plugin class in Databricks?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;During DBR 7.3 cluster creation, I tried setting the spark.plugins, spark.driver.extraClassPath and spark.executor.extraClassPath Spark configs, but the cluster fails to start and the driver logs contain:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;21/10/28 17:20:34 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException:com.example.PtyExecSparkPlugin not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@47a4eee2
	at com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader.loadClass(ClassLoaders.scala:115)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:226)
	at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:3006)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:3004)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:160)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:146)
	at org.apache.spark.SparkContext.&amp;lt;init&amp;gt;(SparkContext.scala:591)
	at com.databricks.backend.daemon.driver.DatabricksILoop$.$anonfun$initializeSharedDriverContext$1(DatabricksILoop.scala:347)
	at com.databricks.backend.daemon.driver.ClassLoaders$.withContextClassLoader(ClassLoaders.scala:29)
	at com.databricks.backend.daemon.driver.DatabricksILoop$.initializeSharedDriverContext(DatabricksILoop.scala:347)
	at com.databricks.backend.daemon.driver.DatabricksILoop$.getOrCreateSharedDriverContext(DatabricksILoop.scala:277)
	at com.databricks.backend.daemon.driver.DriverCorral.com$databricks$backend$daemon$driver$DriverCorral$$driverContext(DriverCorral.scala:179)
	at com.databricks.backend.daemon.driver.DriverCorral.&amp;lt;init&amp;gt;(DriverCorral.scala:216)
	at com.databricks.backend.daemon.driver.DriverDaemon.&amp;lt;init&amp;gt;(DriverDaemon.scala:39)
	at com.databricks.backend.daemon.driver.DriverDaemon$.create(DriverDaemon.scala:211)
	at com.databricks.backend.daemon.driver.DriverDaemon$.wrappedMain(DriverDaemon.scala:216)
	at com.databricks.DatabricksMain.$anonfun$main$1(DatabricksMain.scala:106)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.DatabricksMain.$anonfun$withStartupProfilingData$1(DatabricksMain.scala:321)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$4(UsageLogging.scala:431)
	at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:239)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:234)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:231)
	at com.databricks.DatabricksMain.withAttributionContext(DatabricksMain.scala:74)
	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276)
	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:269)
	at com.databricks.DatabricksMain.withAttributionTags(DatabricksMain.scala:74)
	at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:412)
	at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:338)
	at com.databricks.DatabricksMain.recordOperation(DatabricksMain.scala:74)
	at com.databricks.DatabricksMain.withStartupProfilingData(DatabricksMain.scala:321)
	at com.databricks.DatabricksMain.main(DatabricksMain.scala:105)
	at com.databricks.backend.daemon.driver.DriverDaemon.main(DriverDaemon.scala)
Caused by: java.lang.ClassNotFoundException: com.example.PtyExecSparkPlugin
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader.loadClass(ClassLoaders.scala:112)
	... 43 more&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I have also tried installing the JAR as a cluster library, but the class is still not loading.&lt;/P&gt;
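&lt;P&gt;&lt;/P&gt;&lt;P&gt;For reference, the Spark config I set during cluster creation looked roughly like this (the JAR path is a placeholder for where the JAR was uploaded):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.plugins com.example.PtyExecSparkPlugin
spark.driver.extraClassPath /dbfs/FileStore/jars/pty-exec-plugin.jar
spark.executor.extraClassPath /dbfs/FileStore/jars/pty-exec-plugin.jar&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>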
      <pubDate>Fri, 29 Oct 2021 03:50:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-any-way-of-determining-last-stage-of-sparksql/m-p/14084#M8635</guid>
      <dc:creator>FRG96</dc:creator>
      <dc:date>2021-10-29T03:50:33Z</dc:date>
    </item>
  </channel>
</rss>

