Databricks Community

maartenvr · ‎02-10-2023

Hi,

I am using an (Azure) Databricks Compute cluster in a Jupyter notebook using the Databricks connect Python package. Our cluster is on Databrick runtime 10.4 LST and coherently I am using the databricks-connect==10.4.18.

In my notebook I am able to correctly load spark as well as the delta library.

The trouble starts when I try to read one or our tables as a DeltaTable.

When I try to read a DeltaTable from our storage, it complains that the forPath method does not exist, even though my notebook finds the method with type hinting.

I am getting the following error message:

---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 DeltaTable.forPath(spark, "abfss://dev@*****.dfs.core.windows.net/example")
 
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\delta\tables.py:364, in DeltaTable.forPath(cls, sparkSession, path, hadoopConf)
    361 jvm: "JVMView" = sparkSession._sc._jvm  # type: ignore[attr-defined]
    362 jsparkSession: "JavaObject" = sparkSession._jsparkSession  # type: ignore[attr-defined]
--> 364 jdt = jvm.io.delta.tables.DeltaTable.forPath(jsparkSession, path, hadoopConf)
    365 return DeltaTable(sparkSession, jdt)
 
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\py4j\java_gateway.py:1304, in JavaMember.__call__(self, *args)
   1298 command = proto.CALL_COMMAND_NAME +\
   1299     self.command_header +\
   1300     args_command +\
   1301     proto.END_COMMAND_PART
   1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
   1305     answer, self.gateway_client, self.target_id, self.name)
   1307 for temp_arg in temp_args:
   1308     temp_arg._detach()
 
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\pyspark\sql\utils.py:117, in capture_sql_exception..deco(*a, **kw)
    115 def deco(*a, **kw):
    116     try:
--> 117         return f(*a, **kw)
    118     except py4j.protocol.Py4JJavaError as e:
    119         converted = convert_exception(e.java_exception)
 
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\py4j\protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
    326         raise Py4JJavaError(
    327             "An error occurred while calling {0}{1}{2}.\n".
    328             format(target_id, ".", name), value)
    329     else:
--> 330         raise Py4JError(
    331             "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332             format(target_id, ".", name, value))
    333 else:
    334     raise Py4JError(
    335         "An error occurred while calling {0}{1}{2}".
    336         format(target_id, ".", name))
 
Py4JError: An error occurred while calling z:io.delta.tables.DeltaTable.forPath. Trace:
py4j.Py4JException: Method forPath([class org.apache.spark.sql.SparkSession, class java.lang.String, class java.util.HashMap]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:341)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:362)
	at py4j.Gateway.invoke(Gateway.java:289)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:748)

The following code example is to clarify what works, and what doesn't:

import pyspark
from delta.tables import *
 
# Works, to show that I it's a valid Delta table
df_example = spark.sql("SELECT * FROM dev.example") 
 
 # Works, but not what I want: returns a normal Spark DF, I want a Delta table object:
df_example = spark.read.format("delta").load("abfss://dev@*****.dfs.core.windows.net/example") 
 
# Gives an error, even though my notebook recognizes the forPath command for type hints:
delta_example = DeltaTable.forPath(spark, "abfss://dev@*****.dfs.core.windows.net/example")
 
# Also throw an error:
delta_example = DeltaTable.forName(spark, "dev.example")

Does anybody know what the problem is?

Murthy1 · ‎02-10-2023

Hello @Maarten van Raaij ,

According to the documentation (https://docs.databricks.com/dev-tools/databricks-connect.html#limitations) - This is a limitation of databricks connect. Unfortunately, you have to work with spark.sql or dataframes

View solution in original post

Murthy1 · ‎02-10-2023

Hello @Maarten van Raaij ,

According to the documentation (https://docs.databricks.com/dev-tools/databricks-connect.html#limitations) - This is a limitation of databricks connect. Unfortunately, you have to work with spark.sql or dataframes

maartenvr · ‎02-10-2023

Hello @Murthy Ramalingam ,

That explains! Thank you for the quick answer.

Would it work if I am using a notebook from within the Databricks environment, or does that also require me to use databricks connect?

Murthy1 · ‎02-10-2023

Hello @Maarten van Raaij ,

If you are using a databricks notebook - you do not require databricks connect.

Also the functioning of DeltaTable.forPath is super smooth. Having said that, your databricks notebook should connect to the path in Azure blob storage.

maartenvr · ‎02-10-2023

Perfect! Thanks for the help. 👍

Anonymous · ‎02-12-2023

Hi @Maarten van Raaij

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help.

We'd love to hear from you.

Thanks!