02-10-2023 01:58 AM
Hi,
I am using an (Azure) Databricks compute cluster from a Jupyter notebook via the Databricks Connect Python package. Our cluster is on Databricks Runtime 10.4 LTS, so accordingly I am using databricks-connect==10.4.18.
In my notebook I am able to correctly load spark as well as the delta library.
The trouble starts when I try to read one of our tables as a DeltaTable.
When I try to read a DeltaTable from our storage, it complains that the forPath method does not exist, even though my notebook finds the method with type hinting.
I am getting the following error message:
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
Cell In[18], line 1
----> 1 DeltaTable.forPath(spark, "abfss://dev@*****.dfs.core.windows.net/example")
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\delta\tables.py:364, in DeltaTable.forPath(cls, sparkSession, path, hadoopConf)
361 jvm: "JVMView" = sparkSession._sc._jvm # type: ignore[attr-defined]
362 jsparkSession: "JavaObject" = sparkSession._jsparkSession # type: ignore[attr-defined]
--> 364 jdt = jvm.io.delta.tables.DeltaTable.forPath(jsparkSession, path, hadoopConf)
365 return DeltaTable(sparkSession, jdt)
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\py4j\java_gateway.py:1304, in JavaMember.__call__(self, *args)
1298 command = proto.CALL_COMMAND_NAME +\
1299 self.command_header +\
1300 args_command +\
1301 proto.END_COMMAND_PART
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1307 for temp_arg in temp_args:
1308 temp_arg._detach()
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\pyspark\sql\utils.py:117, in capture_sql_exception.<locals>.deco(*a, **kw)
115 def deco(*a, **kw):
116 try:
--> 117 return f(*a, **kw)
118 except py4j.protocol.Py4JJavaError as e:
119 converted = convert_exception(e.java_exception)
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\py4j\protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
--> 330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
333 else:
334 raise Py4JError(
335 "An error occurred while calling {0}{1}{2}".
336 format(target_id, ".", name))
Py4JError: An error occurred while calling z:io.delta.tables.DeltaTable.forPath. Trace:
py4j.Py4JException: Method forPath([class org.apache.spark.sql.SparkSession, class java.lang.String, class java.util.HashMap]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:341)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:362)
at py4j.Gateway.invoke(Gateway.java:289)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:748)
The following code example is to clarify what works, and what doesn't:
import pyspark
from delta.tables import DeltaTable
# Works, which shows that it is a valid Delta table:
df_example = spark.sql("SELECT * FROM dev.example")
# Works, but not what I want: it returns a normal Spark DataFrame, and I want a DeltaTable object:
df_example = spark.read.format("delta").load("abfss://dev@*****.dfs.core.windows.net/example")
# Raises the error above, even though my notebook recognizes forPath for type hints:
delta_example = DeltaTable.forPath(spark, "abfss://dev@*****.dfs.core.windows.net/example")
# Also raises an error:
delta_example = DeltaTable.forName(spark, "dev.example")
Does anybody know what the problem is?
02-10-2023 02:11 AM
Hello @Maarten van Raaij ,
According to the documentation (https://docs.databricks.com/dev-tools/databricks-connect.html#limitations), this is a limitation of Databricks Connect: the native Python API for Delta table operations, such as DeltaTable.forPath, is not supported. Unfortunately, you have to work with spark.sql or DataFrames instead.
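To illustrate the spark.sql route, here is a minimal sketch of how a couple of DeltaTable operations could be expressed as SQL against a path-based table. The helper names (delta_path_ref, history_sql, delete_sql) are hypothetical and only build SQL strings; the sketch assumes the cluster accepts the delta.`<path>` syntax and that `spark` is the session created by Databricks Connect.

```python
def delta_path_ref(path: str) -> str:
    """Return a SQL reference to a path-based Delta table: delta.`<path>`."""
    if "`" in path:
        # A backtick would break out of the quoted identifier.
        raise ValueError("path must not contain backticks")
    return f"delta.`{path}`"


def history_sql(path: str, limit: int = 10) -> str:
    """SQL stand-in for DeltaTable.forPath(spark, path).history(limit)."""
    return f"DESCRIBE HISTORY {delta_path_ref(path)} LIMIT {int(limit)}"


def delete_sql(path: str, condition: str) -> str:
    """SQL stand-in for DeltaTable.forPath(spark, path).delete(condition)."""
    return f"DELETE FROM {delta_path_ref(path)} WHERE {condition}"


# Usage with the (masked) path from the question; `spark` is assumed to exist:
# path = "abfss://dev@*****.dfs.core.windows.net/example"
# spark.sql(history_sql(path, limit=5)).show()
# spark.sql(delete_sql(path, "id = 1"))
```

For a table registered in the metastore, the table name can be used directly, for example spark.sql("DESCRIBE HISTORY dev.example").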
02-10-2023 02:13 AM
Hello @Murthy Ramalingam ,
That explains it! Thank you for the quick answer.
Would it work if I am using a notebook from within the Databricks environment, or does that also require me to use databricks connect?
02-10-2023 02:18 AM
Hello @Maarten van Raaij ,
If you are using a Databricks notebook, you do not need Databricks Connect.
DeltaTable.forPath also works smoothly there. That said, your Databricks notebook must be able to access the path in Azure Blob Storage.
02-10-2023 02:20 AM
Perfect! Thanks for the help. 👍
02-12-2023 10:52 PM
Hi @Maarten van Raaij
Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!