10-25-2022 05:39 AM
After installing the latest pyrasterframes (v0.10.1) on Azure Databricks 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12), I can create a Spark session, read the raster data, and print the schema. However, any action on the DataFrame fails with a NoClassDefFoundError.
Is there anything I should configure beyond what is described in this notebook?
Sample code:
from pyrasterframes import rf_ipython  # enables HTML rendering of raster DataFrames
from pyrasterframes.utils import create_rf_spark_session
from pyrasterframes.rasterfunctions import *

spark = create_rf_spark_session()
df = spark.read.raster('https://aoigeospatial.blob.core.windows.net/public/samples/sample_4326.tif')
df.printSchema()  # works
df.display()      # fails with NoClassDefFoundError
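For diagnosis, a minimal sketch that tries to load the failing class (the name is copied from the stack trace below) directly on the driver JVM, assuming the same spark session as above:

jvm = spark.sparkContext._jvm  # py4j gateway to the driver JVM
try:
    # Class.forName triggers static initialization, which is what fails here
    jvm.java.lang.Class.forName("org.locationtech.rasterframes.ref.RFRasterSource$")
    print("RFRasterSource loaded and initialized")
except Exception as err:  # py4j surfaces the JVM error as a Python exception
    print("RFRasterSource failed to load/initialize:", err)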
Stack trace:
Py4JJavaError: An error occurred while calling o364._dfToHTML.
: java.lang.NoClassDefFoundError: Could not initialize class org.locationtech.rasterframes.ref.RFRasterSource$
at org.locationtech.rasterframes.expressions.transformers.URIToRasterSource$.apply(URIToRasterSource.scala:60)
at org.locationtech.rasterframes.datasource.raster.RasterSourceRelation.$anonfun$buildScan$6(RasterSourceRelation.scala:114)
at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:827)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:826)
at org.locationtech.rasterframes.datasource.raster.RasterSourceRelation.buildScan(RasterSourceRelation.scala:113)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:441)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:69)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:69)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:100)
at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$4(QueryPlanner.scala:85)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:162)
at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:160)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1429)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:82)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:100)
at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$4(QueryPlanner.scala:85)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:162)
at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:160)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1429)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:82)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:100)
at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:75)
at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:493)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:129)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:250)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:180)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:180)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:129)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:122)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:141)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:141)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:136)
at com.databricks.sql.transaction.tahoe.metering.DeltaMetering$.reportUsage(ScanReport.scala:151)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:217)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:299)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:130)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:249)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3823)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:3008)
at org.locationtech.rasterframes.util.DataFrameRenderers$DFWithPrettyPrint.toHTML(DataFrameRenderers.scala:100)
at org.locationtech.rasterframes.py.PyRFContext._dfToHTML(PyRFContext.scala:264)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
11-27-2022 08:11 PM
Hi @Karthick Narendran
Great to meet you, and thanks for your question!
Let's see if your peers in the community have an answer first; otherwise, someone from the Databricks team will get back to you soon.
Thanks.
11-30-2022 07:18 AM
Hi @Karthick Narendran Have you installed the RasterFrames assembly JAR on the cluster, as described in the document above? If not, please install it as a cluster library.
This doc explains how to install libraries on a cluster:
https://docs.databricks.com/libraries/index.html#pypi-package
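If you prefer to script the installation, here is a minimal sketch using the Libraries API; the host, token, cluster ID, and DBFS paths are placeholders, and the JAR and wheel must be uploaded to DBFS first:

import requests

host = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
token = "<personal-access-token>"  # placeholder
payload = {
    "cluster_id": "<cluster-id>",  # placeholder
    "libraries": [
        {"jar": "dbfs:/FileStore/jars/pyrasterframes-assembly.jar"},  # placeholder path
        {"whl": "dbfs:/FileStore/jars/pyrasterframes.whl"},  # placeholder path
    ],
}
resp = requests.post(host + "/api/2.0/libraries/install",
                     headers={"Authorization": "Bearer " + token},
                     json=payload)
resp.raise_for_status()  # the API returns an empty 200 response on success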
01-19-2023 08:09 PM
Hi @Karthick Narendran
I got the same error as you.
I tried every branch in the locationtech repository without success. Then, luckily, I found a RasterFrames branch built for Databricks here:
https://github.com/mjohns-databricks/rasterframes/tree/0.10.2-databricks
I cloned this repository and built the code with sbt, which produced the assembly JAR and the pyrasterframes wheel. I installed both as Databricks cluster libraries, and it worked fine. For reference, the library files I built worked with Databricks 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12) only; they did not work with newer runtimes.
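To verify the rebuilt libraries end to end, a minimal sketch that forces an action (the sample URL is the one from the original question; proj_raster is the default column name produced by spark.read.raster):

from pyrasterframes.utils import create_rf_spark_session
from pyrasterframes.rasterfunctions import rf_dimensions

spark = create_rf_spark_session()
df = spark.read.raster('https://aoigeospatial.blob.core.windows.net/public/samples/sample_4326.tif')
# printSchema succeeded even with the broken classpath, so force an action:
df.select(rf_dimensions(df.proj_raster)).show(1)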