Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Lesson 6.1 of Data Engineering. Error when reading stream - java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)

LearnerShahid
New Contributor II

The function below executes fine:

def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
    # Incrementally ingest files with Auto Loader (cloudFiles) and
    # continuously write them to a table.
    query = (spark.readStream
             .format("cloudFiles")
             .option("cloudFiles.format", source_format)
             .option("cloudFiles.schemaLocation", checkpoint_directory)
             .load(data_source)
             .writeStream
             .option("checkpointLocation", checkpoint_directory)
             .option("mergeSchema", "true")
             .table(table_name))
    return query

I receive an error when calling the function. Please let me know where the problem is. I have verified that the source data exists.
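For reference, this is the call that triggers the error, reconstructed from the traceback pasted further down (the DA.paths values come from the lesson's setup script):

query = autoload_to_table(data_source=f"{DA.paths.working_dir}/tracker",
                          source_format="json",
                          table_name="target_table",
                          checkpoint_directory=f"{DA.paths.checkpoints}/target_table")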

ACCEPTED SOLUTION

Anonymous
Not applicable

Auto Loader is not supported on Community Edition.


6 REPLIES

Prabakar
Databricks Employee

Hi @Shahid Akhter, could you please copy the full error stack and paste it here.

java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-1919808831305181> in <module>
----> 1 query = autoload_to_table(data_source = f"{DA.paths.working_dir}/tracker",
      2                           source_format = "json",
      3                           table_name = "target_table",
      4                           checkpoint_directory = f"{DA.paths.checkpoints}/target_table")

<command-1919808831305179> in autoload_to_table(data_source, source_format, table_name, checkpoint_directory)
      1 def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
----> 2   query = (spark.readStream
      3            .format("cloudFiles")
      4            .option("cloudFiles.format", source_format)
      5            .option("cloudFiles.schemaLocation", checkpoint_directory)

/databricks/spark/python/pyspark/sql/streaming.py in load(self, path, format, schema, **options)
    450             raise ValueError("If the path is provided for stream, it needs to be a " +
    451                              "non-empty string. List of paths are not supported.")
--> 452         return self._df(self._jreader.load(path))
    453         else:
    454             return self._df(self._jreader.load())

Cedric
Databricks Employee

Hi @LearnerShahid,

I've tested this in my own environment and can reproduce it on DBR 7.3. Could you please try DBR 10.4+ and see if that solves the issue?
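To confirm from a notebook exactly which runtime a cluster is on, a quick check (spark.version is standard PySpark; the clusterUsageTags conf key is a common Databricks cluster tag, but treat it as an assumption for your environment):

# Spark version, e.g. "3.2.1"
print(spark.version)
# Databricks runtime image, e.g. "10.4.x-scala2.12" (key assumed)
print(spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion"))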

Hi Cedric/Team,

I am using Databricks Community Edition - 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).

Can you please confirm whether you have tried this on Community Edition? According to the Community Edition cluster environment details, it runs on DBFSV1 (screenshot attached), not the newer DBFSV2. If this is an issue caused by DBFSV1 versus DBFSV2, kindly let me know how to fix it, as it is blocking the subsequent lessons that use Spark Streaming.

FYI, I have tried executing the code multiple times on different clusters and it still shows the same error.

Anonymous
Not applicable

Auto Loader is not supported on Community Edition.
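A common fallback on clusters without Auto Loader is Spark's built-in streaming file source; a minimal sketch, assuming the input is JSON, that the plain file source is available on your cluster, and that you declare the schema by hand (file streams do not infer schemas). The field names below are hypothetical placeholders:

from pyspark.sql.types import StructType, StructField, StringType

# Hypothetical schema -- replace with the actual fields of your JSON files.
schema = StructType([
    StructField("device_id", StringType(), True),
    StructField("status", StringType(), True),
])

query = (spark.readStream
         .format("json")          # built-in file source instead of cloudFiles
         .schema(schema)          # required: no schema inference for file streams
         .load(data_source)
         .writeStream
         .option("checkpointLocation", checkpoint_directory)
         .table(table_name))

The trade-off is that the built-in source lists the input directory on every trigger rather than discovering files incrementally, so it scales less well than cloudFiles, but it avoids the Auto Loader dependency entirely.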

Thank you for sharing this. I will mark this as the best response of this thread.
