08-09-2023 08:04 AM - edited 08-09-2023 08:04 AM
Hi,
I'm migrating my workspaces to Unity Catalog and the application to use three-level notation (catalog.database.table).
See: Tutorial: Delta Lake | Databricks on AWS
I'm getting the following exception when trying to use DeltaTable.forName(String name) or the builder's tableName(String name) with three-level notation such as catalog.database.table:
org.apache.spark.sql.catalyst.parser.ParseException :
[PARSE_SYNTAX_ERROR] Syntax error at or near '.'.(line 1, pos 18)
== SQL ==
spark_catalog.Gold.FactSyPerson
------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:144)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseTableIdentifier(ParseDriver.scala:54)
at io.delta.sql.parser.DeltaSqlParser.parseTableIdentifier(DeltaSqlParser.scala:138)
at io.delta.tables.DeltaTable$.forName(DeltaTable.scala:783)
at io.delta.tables.DeltaTable$.forName(DeltaTable.scala:770)
It looks like three-level notation isn't supported here yet. Could you please help? Is there a workaround?
Thank you.
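One workaround, since Spark's own table resolution does handle three-level names: read through spark.read.table, or set the catalog/schema context first so DeltaTable.forName only sees a bare table name. A sketch (the helper below is illustrative, not part of Delta's API, and assumes no quoted identifiers containing dots):

```python
def split_three_level(name: str):
    """Split 'catalog.schema.table' into its three parts.
    Naive sketch: does not handle backtick-quoted identifiers."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return tuple(parts)

catalog, schema, table = split_three_level("mycatalog.myschema.mytable")

# On a live session (names here are placeholders):
#   df = spark.read.table("mycatalog.myschema.mytable")  # resolver supports 3 levels
# or narrow the context so forName gets a single-part name:
#   spark.catalog.setCurrentCatalog(catalog)
#   spark.catalog.setCurrentDatabase(schema)
#   dt = DeltaTable.forName(spark, table)
```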
08-09-2023 12:50 PM
Hey @Ludo, I am able to access the table using the three-level namespace.
Make sure:
1. you are using a Unity Catalog-enabled cluster
2. you are on the latest DBR
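A quick way to sanity-check point 1 is to look at the session's current catalog: the legacy built-ins indicate you are not going through Unity Catalog. A sketch (the helper and the set of legacy names are my assumptions, not an official API):

```python
# Legacy catalogs that exist even without Unity Catalog (assumption).
LEGACY_CATALOGS = {"spark_catalog", "hive_metastore"}

def looks_like_unity_catalog(current_catalog: str) -> bool:
    """True if the current catalog is not one of the legacy built-ins."""
    return current_catalog not in LEGACY_CATALOGS

# On a cluster:
#   looks_like_unity_catalog(spark.catalog.currentCatalog())
```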
10-09-2024 08:08 AM - edited 10-09-2024 08:23 AM
Hi guys, I also faced the same problem, but with the PySpark API inside unit tests. The problem mainly happens when building a standalone Spark session with the Delta extension enabled; in PySpark this still does not work:
pyspark="3.5.3"
delta-spark="3.2.1"
For example, I have a test and fixture like this:
import pytest
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType

@pytest.mark.usefixtures("spark_session")
def test_join_operation_with_catalog(self, spark_session: SparkSession):
    source_schema = StructType([
        StructField("id", StringType(), True),
        StructField("derived_column", StringType(), True),
        StructField("filter_column", StringType(), True)
    ])
    spark_session.sql("CREATE SCHEMA source")
    spark_session.sql("DROP TABLE IF EXISTS spark_catalog.source.source_table_join")
    spark_session.catalog.setCurrentCatalog("spark_catalog")
    spark_session.catalog.setCurrentDatabase("source")
    DeltaTable.createOrReplace(spark_session).tableName("source_table_join").addColumns(
        source_schema).execute()
    try:
        print(DeltaTable.forName(spark_session, "source.source_table_join").toDF().collect())
        print('SUCCESS')
    except Exception as err:
        print("FAILURE")
        print(err)
import os
import shutil
import sys

from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark_session():
    shutil.rmtree("spark-warehouse", ignore_errors=True)
    os.environ['PYSPARK_PYTHON'] = sys.executable
    os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable
    builder = (SparkSession.builder
               .master("local[*]")
               .config("spark.jars.packages", "io.delta:delta-core_2.12:2.3.0")
               .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
               .config("spark.databricks.delta.schema.autoMerge.enabled", "true")
               .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
               .appName("test")
               )
    yield configure_spark_with_delta_pip(builder).getOrCreate()
    shutil.rmtree("spark-warehouse", ignore_errors=True)
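One thing worth checking in the fixture above: the explicit spark.jars.packages pin names io.delta:delta-core_2.12:2.3.0, but with delta-spark==3.2.1 the matching Maven artifact is delta-spark_2.12 (the artifact was renamed from delta-core in Delta Lake 3.0), and configure_spark_with_delta_pip sets spark.jars.packages itself anyway, so the explicit line can likely be dropped. A sketch of deriving the matching coordinate (the helper and the default Scala version are my assumptions):

```python
DELTA_VERSION = "3.2.1"   # from the post above
SCALA_BINARY = "2.12"     # assumption: default PySpark build

def delta_maven_coordinate(version: str, scala: str = SCALA_BINARY) -> str:
    """Return the Maven coordinate matching a delta-spark PyPI version.
    Delta 3.x publishes delta-spark_*; Delta 2.x published delta-core_*."""
    artifact = "delta-spark" if int(version.split(".")[0]) >= 3 else "delta-core"
    return f"io.delta:{artifact}_{scala}:{version}"
```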
At the end it throws a parse exception like:
[PARSE_SYNTAX_ERROR] Syntax error at or near '.'.(line 1, pos 20)
== SQL ==
spark_catalog.source.source_table_join
--------------------^^^
and
DeltaTable.createOrReplace also does not work if I use a fully qualified name (catalog.schema.table).
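Since spark.sql() does accept three-level names, one workaround for the create path is to generate a CREATE OR REPLACE TABLE statement instead of using the DeltaTable builder. A sketch (the helper is illustrative; the column list mirrors the test above but the mapping format is my assumption):

```python
def create_table_ddl(name: str, columns: dict) -> str:
    """Build a CREATE OR REPLACE TABLE statement for a Delta table.
    `columns` maps column name -> SQL type; no quoting of identifiers
    is attempted (naive sketch)."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns.items())
    return f"CREATE OR REPLACE TABLE {name} ({cols}) USING DELTA"

ddl = create_table_ddl(
    "spark_catalog.source.source_table_join",
    {"id": "STRING", "derived_column": "STRING", "filter_column": "STRING"},
)
# spark_session.sql(ddl)  # SQL accepts the three-level name here
```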
11-20-2024 05:39 AM
Same here:
pyspark="3.5.3"
delta-spark="3.2.1"
spark.sql("SHOW CATALOGS").show()
spark.sql("SHOW TABLES IN mycatalog.myschema").show()
spark.sql("DESCRIBE mycatalog.myschema.mytable").show()
The code above works with the three-level notation (catalog.database.table).
When using the three-level notation with the following code, the same error occurs:
from delta import DeltaTable
dt = DeltaTable.forName(spark, "mycatalog.myschema.mytable")
Traceback (most recent call last):
File "/workspace/test.py", line 13, in <module>
dt = DeltaTable.forName(spark, "mycatalog.myschema.mytable")
File "/venv/lib/python3.10/site-packages/delta/tables.py", line 419, in forName
jdt = jvm.io.delta.tables.DeltaTable.forName(jsparkSession, tableOrViewName)
File "/venv/lib/python3.10/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
File "/venv/lib/python3.10/site-packages/pyspark/errors/exceptions/captured.py", line 185, in deco
raise converted from None
pyspark.errors.exceptions.captured.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near '.'.(line 1, pos 11)
== SQL ==
mycatalog.myschema.mytable
------------------^^^
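Since DESCRIBE works with the three-level name, another workaround is to look up the table's storage location and fall back to DeltaTable.forPath, which bypasses forName's identifier parser. A sketch (assumes the session can read the table's location; the quoting helper is illustrative and naive):

```python
def quote_parts(name: str) -> str:
    """Backtick-quote each part of a dotted name for use in SQL.
    Naive sketch: assumes the parts contain no backticks or dots."""
    return ".".join(f"`{part}`" for part in name.split("."))

# On a live session (placeholder names):
#   row = spark.sql(f"DESCRIBE DETAIL {quote_parts('mycatalog.myschema.mytable')}").head()
#   dt = DeltaTable.forPath(spark, row["location"])
```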
08-10-2023 06:16 AM
Thank you for the quick feedback @saipujari_spark
Indeed, it's working great within a notebook on Databricks Runtime 13.2, which most likely has custom behavior for Unity Catalog.
It's not working in my Scala application running locally with direct use of the delta-core 2.4.0 libraries (no Databricks Runtime locally). I guess there is a missing piece within the delta-core libraries.
I hope it will be updated soon, if my understanding is correct.
Nevertheless, the Spark libraries are ready thanks to this PR => [SPARK-39235] Make Catalog API be compatible with 3-layer-namespace - ASF JIRA (apache.org)