Lesson 6.1 of Data Engineering. Error when reading stream - java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)

LearnerShahid
New Contributor II

The function below is defined without errors:

def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
    query = (spark.readStream
             .format("cloudFiles")
             .option("cloudFiles.format", source_format)
             .option("cloudFiles.schemaLocation", checkpoint_directory)
             .load(data_source)
             .writeStream
             .option("checkpointLocation", checkpoint_directory)
             .option("mergeSchema", "true")
             .table(table_name))
    return query

However, I receive an error when calling the function. Please let me know where the problem is. I have verified that the source data exists.
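For reference, the failing call (taken from the traceback posted below) looks like this:

query = autoload_to_table(data_source = f"{DA.paths.working_dir}/tracker",
                          source_format = "json",
                          table_name = "target_table",
                          checkpoint_directory = f"{DA.paths.checkpoints}/target_table")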

1 ACCEPTED SOLUTION

Anonymous
Not applicable

Auto Loader is not supported on Community Edition.


6 REPLIES

Prabakar
Esteemed Contributor III

Hi @Shahid Akhter, could you please copy the full error stack and paste it here?

java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-1919808831305181> in <module>
----> 1 query = autoload_to_table(data_source = f"{DA.paths.working_dir}/tracker",
      2                           source_format = "json",
      3                           table_name = "target_table",
      4                           checkpoint_directory = f"{DA.paths.checkpoints}/target_table")

<command-1919808831305179> in autoload_to_table(data_source, source_format, table_name, checkpoint_directory)
      1 def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
----> 2     query = (spark.readStream
      3              .format("cloudFiles")
      4              .option("cloudFiles.format", source_format)
      5              .option("cloudFiles.schemaLocation", checkpoint_directory)

/databricks/spark/python/pyspark/sql/streaming.py in load(self, path, format, schema, **options)
    450             raise ValueError("If the path is provided for stream, it needs to be a " +
    451                              "non-empty string. List of paths are not supported.")
--> 452             return self._df(self._jreader.load(path))
    453         else:
    454             return self._df(self._jreader.load())

Hi @LearnerShahid,

I've tested this in my own environment and I am able to reproduce the issue on DBR 7.3. Could you please try DBR 10.4+ and see if that solves the issue?
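If it helps, a quick way to confirm which runtime a notebook is attached to is the DATABRICKS_RUNTIME_VERSION environment variable that Databricks sets on cluster nodes, alongside the Spark version itself:

import os

# DATABRICKS_RUNTIME_VERSION is set by Databricks on cluster nodes;
# spark.version reports the underlying Apache Spark version.
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))
print(spark.version)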

Hi Cedric/Team,

I am using Databricks Community Edition, 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).

Can you please confirm whether you have tried this on Community Edition? According to the cluster environment details, Community Edition runs on DBFS V1 (screenshot attached), not the newer DBFS V2. If the error is caused by the difference between DBFS V1 and DBFS V2, kindly let me know how to fix it, as it is blocking the subsequent lessons that use Spark Streaming.

FYI, I have tried executing the code multiple times on different clusters, and it still shows the same error.
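An illustrative check (not from this thread) for inspecting what the cluster environment reports about DBFS, similar to the attached screenshot, is to list the matching environment variables:

import os

# List environment variables that mention DBFS, to confirm which DBFS
# client version the cluster environment reports.
for key in sorted(os.environ):
    if "DBFS" in key.upper():
        print(key, "=", os.environ[key])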

Anonymous
Not applicable

Auto Loader is not supported on Community Edition.
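As a possible workaround (a minimal sketch, not from this thread): on clusters where Auto Loader ("cloudFiles") is unavailable, a plain file streaming source can be used instead. Unlike Auto Loader, the file source requires an explicit schema up front; the field names below are placeholder assumptions for the tracker JSON data.

from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Placeholder schema for the tracker JSON files -- adjust to the actual data.
schema = StructType([
    StructField("device_id", StringType(), True),
    StructField("time", TimestampType(), True),
])

def stream_to_table(data_source, source_format, table_name, checkpoint_directory):
    # Plain file streaming source: no "cloudFiles", so no Auto Loader
    # dependency, but the schema must be supplied explicitly.
    query = (spark.readStream
             .format(source_format)          # e.g. "json"
             .schema(schema)
             .load(data_source)
             .writeStream
             .option("checkpointLocation", checkpoint_directory)
             .table(table_name))
    return query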

Thank you for sharing this. I will mark this as the best response of this thread.
