09-02-2022 01:31 AM
The function definition below executes without errors:
def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
    query = (spark.readStream
                  .format("cloudFiles")
                  .option("cloudFiles.format", source_format)
                  .option("cloudFiles.schemaLocation", checkpoint_directory)
                  .load(data_source)
                  .writeStream
                  .option("checkpointLocation", checkpoint_directory)
                  .option("mergeSchema", "true")
                  .table(table_name))
    return query
However, I receive an error when calling the function. Please let me know where the problem is.
09-02-2022 01:33 AM
Hi @Shahid Akhter, could you please copy the full error stack trace and paste it here?
09-02-2022 01:52 AM
java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<command-1919808831305181> in <module>
----> 1 query = autoload_to_table(data_source = f"{DA.paths.working_dir}/tracker",
2 source_format = "json",
3 table_name = "target_table",
4 checkpoint_directory = f"{DA.paths.checkpoints}/target_table")
<command-1919808831305179> in autoload_to_table(data_source, source_format, table_name, checkpoint_directory)
1 def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
----> 2 query = (spark.readStream
3 .format("cloudFiles")
4 .option("cloudFiles.format", source_format)
5 .option("cloudFiles.schemaLocation", checkpoint_directory)
/databricks/spark/python/pyspark/sql/streaming.py in load(self, path, format, schema, **options)
450 raise ValueError("If the path is provided for stream, it needs to be a " +
451 "non-empty string. List of paths are not supported.")
--> 452 return self._df(self._jreader.load(path))
453 else:
454 return self._df(self._jreader.load())
09-02-2022 04:45 AM
Hi @LearnerShahid,
I tested this in my own environment and can reproduce the error when using DBR 7.3. Could you please try DBR 10.4+ and see if that solves the issue?
09-02-2022 08:19 AM
Hi Cedric/Team,
I am using Databricks Community Edition - 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
Can you please confirm whether you tried this on Community Edition? According to the Community Edition cluster environment details, it runs on DBFSV1 (screenshot attached) and not the newer DBFSV2. If the error is caused by the difference between DBFSV1 and DBFSV2, kindly let me know how to fix it, as it is blocking the subsequent lessons that use Spark Streaming.
FYI, I have tried executing the code multiple times on newly created clusters, and it still shows the same error.
09-05-2022 04:48 AM
Auto Loader is not supported on Community Edition.
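If you need to continue with the streaming lessons on Community Edition, one possible workaround is to replace the "cloudFiles" source with Spark's built-in file stream source. The sketch below is only an illustration and I have not verified it on Community Edition. The function name stream_files_to_table and the schema_ddl parameter are hypothetical, and the Spark session is passed in explicitly. Unlike Auto Loader, the built-in file source does not infer or evolve the schema by default, so you have to supply one yourself. The sketch also uses toTable, the open-source PySpark (3.1+) method, where the original function uses .table.

```python
def stream_files_to_table(spark, data_source, source_format, schema_ddl,
                          table_name, checkpoint_directory):
    """Hypothetical fallback that avoids the Auto Loader ("cloudFiles") source.

    schema_ddl is an assumption of this sketch: a DDL string such as
    "id INT, name STRING" that describes the incoming files.
    """
    query = (spark.readStream
                  .format(source_format)   # e.g. "json" instead of "cloudFiles"
                  .schema(schema_ddl)      # file streams need an explicit schema
                  .load(data_source)
                  .writeStream
                  .option("checkpointLocation", checkpoint_directory)
                  .toTable(table_name))    # starts the stream and returns the query
    return query
```

You would call it the same way as autoload_to_table, with the additional schema_ddl argument matching the columns in your source files.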
09-09-2022 04:21 PM
Thank you for sharing this. I will mark this as the best response in this thread.