Lesson 6.1 of Data Engineering. Error when reading stream - java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)

LearnerShahid
New Contributor II

The function below is defined without errors:

def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
    query = (spark.readStream
             .format("cloudFiles")
             .option("cloudFiles.format", source_format)
             .option("cloudFiles.schemaLocation", checkpoint_directory)
             .load(data_source)
             .writeStream
             .option("checkpointLocation", checkpoint_directory)
             .option("mergeSchema", "true")
             .table(table_name))
    return query

However, I receive an error when calling the function. Please let me know where the problem is. I have verified that the source data exists.
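For reference, the failing call (taken from the traceback posted below) looks like this:

query = autoload_to_table(data_source = f"{DA.paths.working_dir}/tracker",
                          source_format = "json",
                          table_name = "target_table",
                          checkpoint_directory = f"{DA.paths.checkpoints}/target_table")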

1 ACCEPTED SOLUTION

Anonymous
Not applicable

Auto Loader is not supported on Community Edition.


6 REPLIES

Prabakar
Esteemed Contributor III

Hi @Shahid Akhter, could you please copy the full error stack and paste it here?

java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-1919808831305181> in <module>
----> 1 query = autoload_to_table(data_source = f"{DA.paths.working_dir}/tracker",
      2                           source_format = "json",
      3                           table_name = "target_table",
      4                           checkpoint_directory = f"{DA.paths.checkpoints}/target_table")

<command-1919808831305179> in autoload_to_table(data_source, source_format, table_name, checkpoint_directory)
      1 def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
----> 2     query = (spark.readStream
      3              .format("cloudFiles")
      4              .option("cloudFiles.format", source_format)
      5              .option("cloudFiles.schemaLocation", checkpoint_directory)

/databricks/spark/python/pyspark/sql/streaming.py in load(self, path, format, schema, **options)
    450             raise ValueError("If the path is provided for stream, it needs to be a " +
    451                              "non-empty string. List of paths are not supported.")
--> 452             return self._df(self._jreader.load(path))
    453         else:
    454             return self._df(self._jreader.load())

Hi @LearnerShahid,

I've tested this in my own environment and I am able to reproduce the issue on DBR 7.3. Could you please try DBR 10.4+ and see if that solves the issue?
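If it helps, a quick way to confirm which runtime a notebook is attached to is the DATABRICKS_RUNTIME_VERSION environment variable that Databricks sets on cluster nodes, alongside the Spark version itself:

import os

# DATABRICKS_RUNTIME_VERSION is set by Databricks on cluster nodes;
# spark.version reports the underlying Apache Spark version.
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))
print(spark.version)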

Hi Cedric/Team,

I am using Databricks Community Edition, 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).

Can you please confirm whether you have tried this on Community Edition? According to the cluster environment details, Community Edition runs on DBFS V1 (screenshot attached), not the newer DBFS V2. If the error is caused by the difference between DBFS V1 and DBFS V2, kindly let me know how to fix it, as it is blocking the subsequent lessons that use Spark Streaming.

FYI, I have tried executing the code multiple times on different clusters, and it still shows the same error.
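An illustrative check (not from this thread) for inspecting what the cluster environment reports about DBFS, similar to the attached screenshot, is to list the matching environment variables:

import os

# List environment variables that mention DBFS, to confirm which DBFS
# client version the cluster environment reports.
for key in sorted(os.environ):
    if "DBFS" in key.upper():
        print(key, "=", os.environ[key])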

Anonymous
Not applicable

Auto Loader is not supported on Community Edition.
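As a possible workaround (a minimal sketch, not from this thread): on clusters where Auto Loader ("cloudFiles") is unavailable, a plain file streaming source can be used instead. Unlike Auto Loader, the file source requires an explicit schema up front; the field names below are placeholder assumptions for the tracker JSON data.

from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Placeholder schema for the tracker JSON files -- adjust to the actual data.
schema = StructType([
    StructField("device_id", StringType(), True),
    StructField("time", TimestampType(), True),
])

def stream_to_table(data_source, source_format, table_name, checkpoint_directory):
    # Plain file streaming source: no "cloudFiles", so no Auto Loader
    # dependency, but the schema must be supplied explicitly.
    query = (spark.readStream
             .format(source_format)          # e.g. "json"
             .schema(schema)
             .load(data_source)
             .writeStream
             .option("checkpointLocation", checkpoint_directory)
             .table(table_name))
    return query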

Thank you for sharing this. I will mark this as the best response of this thread.
