
Errors on Python API for DeltaTables on Databricks Spark 10.4 LTS

maartenvr
New Contributor III

Hi,

I am using an (Azure) Databricks compute cluster from a Jupyter notebook via the Databricks Connect Python package. Our cluster is on Databricks Runtime 10.4 LTS, and accordingly I am using databricks-connect==10.4.18.

In my notebook I am able to load Spark as well as the Delta library correctly.
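
For reference, this is roughly my local setup (a minimal sketch; it assumes databricks-connect configure has already been run against the cluster):

# Local client setup for databricks-connect 10.4. Assumes
# `databricks-connect configure` has been run with valid workspace,
# token, and cluster details.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# With databricks-connect (pre-13), getOrCreate() returns a SparkSession
# that forwards execution to the remote Databricks cluster.
spark = SparkSession.builder.getOrCreate()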

The trouble starts when I try to read one of our tables as a DeltaTable.

When I try to read a DeltaTable from our storage, it complains that the forPath method does not exist, even though my notebook finds the method with type hinting.

I am getting the following error message:

---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 DeltaTable.forPath(spark, "abfss://dev@*****.dfs.core.windows.net/example")
 
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\delta\tables.py:364, in DeltaTable.forPath(cls, sparkSession, path, hadoopConf)
    361 jvm: "JVMView" = sparkSession._sc._jvm  # type: ignore[attr-defined]
    362 jsparkSession: "JavaObject" = sparkSession._jsparkSession  # type: ignore[attr-defined]
--> 364 jdt = jvm.io.delta.tables.DeltaTable.forPath(jsparkSession, path, hadoopConf)
    365 return DeltaTable(sparkSession, jdt)
 
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\py4j\java_gateway.py:1304, in JavaMember.__call__(self, *args)
   1298 command = proto.CALL_COMMAND_NAME +\
   1299     self.command_header +\
   1300     args_command +\
   1301     proto.END_COMMAND_PART
   1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
   1305     answer, self.gateway_client, self.target_id, self.name)
   1307 for temp_arg in temp_args:
   1308     temp_arg._detach()
 
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\pyspark\sql\utils.py:117, in capture_sql_exception.<locals>.deco(*a, **kw)
    115 def deco(*a, **kw):
    116     try:
--> 117         return f(*a, **kw)
    118     except py4j.protocol.Py4JJavaError as e:
    119         converted = convert_exception(e.java_exception)
 
File ~\Documents\Trainings\delta_lake\.venv\lib\site-packages\py4j\protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
    326         raise Py4JJavaError(
    327             "An error occurred while calling {0}{1}{2}.\n".
    328             format(target_id, ".", name), value)
    329     else:
--> 330         raise Py4JError(
    331             "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332             format(target_id, ".", name, value))
    333 else:
    334     raise Py4JError(
    335         "An error occurred while calling {0}{1}{2}".
    336         format(target_id, ".", name))
 
Py4JError: An error occurred while calling z:io.delta.tables.DeltaTable.forPath. Trace:
py4j.Py4JException: Method forPath([class org.apache.spark.sql.SparkSession, class java.lang.String, class java.util.HashMap]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:341)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:362)
	at py4j.Gateway.invoke(Gateway.java:289)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:748)

The following code example clarifies what works and what doesn't:

import pyspark
from delta.tables import *

# Works; shows that it is a valid Delta table:
df_example = spark.sql("SELECT * FROM dev.example")

# Works, but not what I want: returns a plain Spark DataFrame, while I want a DeltaTable object:
df_example = spark.read.format("delta").load("abfss://dev@*****.dfs.core.windows.net/example")

# Raises the error above, even though my notebook recognizes the forPath method for type hints:
delta_example = DeltaTable.forPath(spark, "abfss://dev@*****.dfs.core.windows.net/example")

# Also throws an error:
delta_example = DeltaTable.forName(spark, "dev.example")

Does anybody know what the problem is?

1 ACCEPTED SOLUTION

Murthy1
Contributor II

Hello @Maarten van Raaij​ ,

According to the documentation (https://docs.databricks.com/dev-tools/databricks-connect.html#limitations), this is a limitation of Databricks Connect. Unfortunately, you have to work with spark.sql or DataFrames instead.
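
Your trace also hints at why: the locally installed delta-spark client calls a three-argument forPath (session, path, Hadoop configuration) that the cluster side does not expose through Databricks Connect. As a workaround, the usual DeltaTable operations can be expressed in SQL and run through spark.sql. A minimal sketch (the column names and the updates view below are placeholders, not from your table):

# Workaround sketch: drive Delta operations through SQL via spark.sql
# instead of the DeltaTable Python API. Column names are placeholders.

# Inspect the table's commit history (instead of DeltaTable.history()):
spark.sql("DESCRIBE HISTORY dev.example").show()

# Conditional update (instead of DeltaTable.update()):
spark.sql("UPDATE dev.example SET value = 0 WHERE value < 0")

# Upsert (instead of DeltaTable.merge()); assumes a temp view `updates`
# with a matching schema and an `id` key column:
spark.sql("""
    MERGE INTO dev.example AS t
    USING updates AS u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")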


5 REPLIES


maartenvr
New Contributor III

Hello @Murthy Ramalingam​ ,

That explains it! Thank you for the quick answer.

Would it work if I used a notebook from within the Databricks environment, or does that also require Databricks Connect?

Murthy1
Contributor II

Hello @Maarten van Raaij​ ,

If you are using a Databricks notebook, you do not require Databricks Connect.

DeltaTable.forPath also works smoothly there. That said, your Databricks notebook does need access to the path in Azure Blob Storage.
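
For example, inside a Databricks notebook on DBR 10.4 the following sketch works directly, since spark is predefined and the Delta API runs on the cluster itself (the path is the placeholder from earlier in this thread):

# Inside a Databricks notebook: no Databricks Connect involved, so the
# DeltaTable Python API is fully available.
from delta.tables import DeltaTable

# Load the table as a DeltaTable object (placeholder path from the thread):
delta_example = DeltaTable.forPath(spark, "abfss://dev@*****.dfs.core.windows.net/example")

# DeltaTable-specific operations then work as expected, e.g.:
delta_example.toDF().show()        # view the current snapshot as a DataFrame
delta_example.history().show()     # inspect the table's commit history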

maartenvr
New Contributor III

Perfect! Thanks for the help. 👍

Anonymous
Not applicable

Hi @Maarten van Raaij​ 

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
