How to provide custom class extending SparkPlugin/ExecutorPlugin in Databricks 7.3?

krishnakash
New Contributor II

How do I properly configure the jar containing the class, and the Spark plugin itself, in Databricks?

During DBR 7.3 cluster creation, I tried setting the spark.plugins, spark.driver.extraClassPath, and spark.executor.extraClassPath Spark configs, after copying the required jar into the /tmp folder.
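For reference, the cluster Spark config I set looked roughly like this (the exact jar filename here is illustrative):

spark.plugins com.example.spark.PtyExecSparkPlugin
spark.driver.extraClassPath /tmp/pty-spark-plugin.jar
spark.executor.extraClassPath /tmp/pty-spark-plugin.jar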

package com.example.spark
 
import org.apache.spark.api.plugin.ExecutorPlugin
import org.slf4j.{Logger, LoggerFactory}
 
class PtyExecSparkPlugin extends ExecutorPlugin {
 
  // Spark's internal Utils.logName is private[spark], so log against the class directly
  private val logger: Logger = LoggerFactory.getLogger(this.getClass)
 
  override def shutdown(): Unit = {
    // custom code statements
  }
}
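Worth noting: on Spark 3.x (which DBR 7.3 runs), spark.plugins expects the fully qualified name of a class implementing org.apache.spark.api.plugin.SparkPlugin, which in turn hands Spark the driver- and executor-side components. A minimal sketch wrapping the class above (the wrapper name is illustrative):

package com.example.spark
 
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, SparkPlugin}
 
// Entry point named in spark.plugins; Spark instantiates this on the driver
// and each executor and asks it for the side-specific plugin instances.
class PtySparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = null // null is allowed when no driver-side component is needed
  override def executorPlugin(): ExecutorPlugin = new PtyExecSparkPlugin()
}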

However, cluster creation fails with: com.example.spark.PtyExecSparkPlugin not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@622d7e4

Full log4j logs:

21/11/01 13:33:01 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException: com.protegrity.spark.PtyExecSparkPlugin not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@622d7e4
	at com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader.loadClass(ClassLoaders.scala:115)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:226)
	at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:3006)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:3004)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:160)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:146)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:591)
	at com.databricks.backend.daemon.driver.DatabricksILoop$.$anonfun$initializeSharedDriverContext$1(DatabricksILoop.scala:347)
	at com.databricks.backend.daemon.driver.ClassLoaders$.withContextClassLoader(ClassLoaders.scala:29)
	at com.databricks.backend.daemon.driver.DatabricksILoop$.initializeSharedDriverContext(DatabricksILoop.scala:347)
	at com.databricks.backend.daemon.driver.DatabricksILoop$.getOrCreateSharedDriverContext(DatabricksILoop.scala:277)
	at com.databricks.backend.daemon.driver.DriverCorral.com$databricks$backend$daemon$driver$DriverCorral$$driverContext(DriverCorral.scala:179)
	at com.databricks.backend.daemon.driver.DriverCorral.<init>(DriverCorral.scala:216)
	at com.databricks.backend.daemon.driver.DriverDaemon.<init>(DriverDaemon.scala:39)
	at com.databricks.backend.daemon.driver.DriverDaemon$.create(DriverDaemon.scala:211)
	at com.databricks.backend.daemon.driver.DriverDaemon$.wrappedMain(DriverDaemon.scala:216)
	at com.databricks.DatabricksMain.$anonfun$main$1(DatabricksMain.scala:106)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.DatabricksMain.$anonfun$withStartupProfilingData$1(DatabricksMain.scala:321)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$4(UsageLogging.scala:431)
	at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:239)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:234)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:231)
	at com.databricks.DatabricksMain.withAttributionContext(DatabricksMain.scala:74)
	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276)
	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:269)
	at com.databricks.DatabricksMain.withAttributionTags(DatabricksMain.scala:74)
	at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:412)
	at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:338)
	at com.databricks.DatabricksMain.recordOperation(DatabricksMain.scala:74)
	at com.databricks.DatabricksMain.withStartupProfilingData(DatabricksMain.scala:321)
	at com.databricks.DatabricksMain.main(DatabricksMain.scala:105)
	at com.databricks.backend.daemon.driver.DriverDaemon.main(DriverDaemon.scala)
Caused by: java.lang.ClassNotFoundException: com.protegrity.spark.PtyExecSparkPlugin
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader.loadClass(ClassLoaders.scala:112)
	... 43 more
21/11/01 13:33:02 INFO AbstractConnector: Stopped Spark@b6bccb4{HTTP/1.1,[http/1.1]}{10.88.234.70:40001}
21/11/01 13:33:02 INFO SparkUI: Stopped Spark web UI at http://10.88.234.70:40001
21/11/01 13:33:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/11/01 13:33:02 INFO MemoryStore: MemoryStore cleared
21/11/01 13:33:02 INFO BlockManager: BlockManager stopped
21/11/01 13:33:02 INFO BlockManagerMaster: BlockManagerMaster stopped
21/11/01 13:33:02 WARN MetricsSystem: Stopping a MetricsSystem that is not running
21/11/01 13:33:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/11/01 13:33:02 INFO SparkContext: Successfully stopped SparkContext

ACCEPTED SOLUTION

Hi @Krishna Kashiv, you might consider adding this as an init script (https://docs.microsoft.com/en-us/azure/databricks/clusters/init-scripts).

Init scripts give you an opportunity to add jars to the cluster before Spark even starts, which is probably what the Spark plugin framework expects.

  • Upload your jar to dbfs, somewhere like dbfs:/databricks/plugins.
  • Create and upload a bash script like the one below to the same place.
  • Create / Edit a cluster with the init script specified.
#!/bin/bash
 
STAGE_DIR="/dbfs/databricks/plugins"
 
echo "BEGIN: Upload Spark Plugins"
# Copy the staged plugin jars to where the driver daemon builds its classpath
cp -f "$STAGE_DIR"/*.jar /mnt/driver-daemon/jars || { echo "Error copying Spark Plugin library file"; exit 1; }
echo "END: Upload Spark Plugin JARs"
 
echo "BEGIN: Modify Spark config settings"
# Register the plugin class (replace with your plugin's fully qualified class name)
cat << 'EOF' > /databricks/driver/conf/spark-plugin-driver-defaults.conf
[driver] {
   "spark.plugins" = "com.example.CustomExecSparkPlugin"
}
EOF
echo "END: Modify Spark config settings"

I believe copying the jar to /mnt/driver-daemon/jars will make Spark aware of the jar before Spark fully initializes (https://docs.microsoft.com/en-us/azure/databricks/clusters/configure#--spark-configuration).

I'm less certain it will make it to the executors, though.
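One untested idea for the executor side: since cluster-scoped init scripts run on every node and /dbfs is FUSE-mounted on workers, you could try also pointing spark.executor.extraClassPath at the staged jars in the generated conf. This is purely an assumption, not something verified:

[driver] {
   "spark.plugins" = "com.example.CustomExecSparkPlugin"
   # assumption: the /dbfs mount is available when executors launch
   "spark.executor.extraClassPath" = "/dbfs/databricks/plugins/*"
}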


2 REPLIES

Anonymous
Not applicable

Hello @Krishna Kashiv - I don't know if we've met yet. My name is Piper and I'm a community moderator here. Thank you for your new question. It looks thorough!

Let's give it a while to see what our members have to say. Otherwise, we will circle back to this.

