Databricks Community

Karthick · ‎10-25-2022

After installing the latest pyrasterframes (v0.10.1) on Azure databricks 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12), I can create a spark session, read the raster data and print the schema. However, when I try to perform any actions on the dataframe, it throws "NoClassDefFoundError" error.

Should I configure anything additionally apart from what has been described in this notebook?

Sample code:

from pyrasterframes import rf_ipython
from pyrasterframes.utils import create_rf_spark_session
from pyspark.sql.functions import lit
from pyrasterframes.rasterfunctions import *
import pyrasterframes.rf_ipython
from IPython.display import display
 
spark = create_rf_spark_session()
 
df = spark.read.raster('https://aoigeospatial.blob.core.windows.net/public/samples/sample_4326.tif')
 
df.display()

Stack trace:

Py4JJavaError: An error occurred while calling o364._dfToHTML.
: java.lang.NoClassDefFoundError: Could not initialize class org.locationtech.rasterframes.ref.RFRasterSource$
	at org.locationtech.rasterframes.expressions.transformers.URIToRasterSource$.apply(URIToRasterSource.scala:60)
	at org.locationtech.rasterframes.datasource.raster.RasterSourceRelation.$anonfun$buildScan$6(RasterSourceRelation.scala:114)
	at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:827)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:826)
	at org.locationtech.rasterframes.datasource.raster.RasterSourceRelation.buildScan(RasterSourceRelation.scala:113)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:441)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:69)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:69)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:100)
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:75)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$4(QueryPlanner.scala:85)
	at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
	at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:162)
	at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:160)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1429)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:82)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:100)
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:75)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$4(QueryPlanner.scala:85)
	at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
	at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:162)
	at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:160)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1429)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:82)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:100)
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:75)
	at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:493)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:129)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:250)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:180)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:180)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:129)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:122)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:141)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:141)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:136)
	at com.databricks.sql.transaction.tahoe.metering.DeltaMetering$.reportUsage(ScanReport.scala:151)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:217)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:299)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:130)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:249)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3823)
	at org.apache.spark.sql.Dataset.collect(Dataset.scala:3008)
	at org.locationtech.rasterframes.util.DataFrameRenderers$DFWithPrettyPrint.toHTML(DataFrameRenderers.scala:100)
	at org.locationtech.rasterframes.py.PyRFContext._dfToHTML(PyRFContext.scala:264)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:251)
	at java.lang.Thread.run(Thread.java:748)

Anonymous · ‎11-27-2022

Hi @Karthick Narendran

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question first. Or else bricksters will get back to you soon.

Thanks.

Noopur_Nigam · ‎11-30-2022

Hi @Karthick Narendran Have you installed the RasterFrames Assembly JAR as described in the above document on the cluster? If not, please try to install it so that it's installed on the cluster.

Please check the below doc to understand how to install libraries on the cluster:

https://docs.databricks.com/libraries/index.html#pypi-package

Nobusuke_Hanaga · ‎01-19-2023

Hi @Karthick Narendran

I got the same error as you.

I tried every branch in the locationtech repository but failed. Then luckily I found a rasterframe branch for databricks here.

https://github.com/mjohns-databricks/rasterframes/tree/0.10.2-databricks

I cloned this repository and built the code with sbt and got Assembly JARs and pyrasterframes Whl. I registered these into the databricks library and it worked fine. For reference, here is the successfully working library file I built, which worked fine with Databricks 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12) only. It did not work with higher versions.

Databricks Community

java.lang.NoClassDefFoundError: Could not initialize class org.locationtech.rasterframes.ref.RFRasterSource$

Connect with Databricks Users in Your Area

Databricks Learning Festival (Virtual): 15 January - 31 January 2025

Milestone: DatabricksTV Reaches 100 Videos!

Announcing the new Meta Llama 3.3 model on Databricks

Databricks Community Champion - December 2024 - Sujesh Menon

Dotmatics and Databricks Partner to Advance Scientific Intelligence in Life Sciences