cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

not able to create table from pyspark sql using databricks unity catalog open apis

dixonantony
New Contributor II

I was trying to access databricks and do DDL/DML operations using databricks unity catalog open apis. The create schema and select tables are working, but create table is not working due to below issues, could you please help?

I was using pyspark sql for the same as mentioned in 

https://community.databricks.com/t5/technical-blog/integrating-apache-spark-with-databricks-unity-ca...

spark.sql(f"create table datatest.dischema.demoTab1(id int, name VARCHAR(10), age int)").show()

Traceback (most recent call last):
File "/home/spark/shared/user-libs/spark/di-databricks.py", line 89, in <module>
main()
File "/home/spark/shared/user-libs/spark/di-databricks.py", line 77, in main
view_data(spark)
File "/home/spark/shared/user-libs/spark/di-databricks.py", line 70, in view_data
spark.sql(f"create table datatest.dischema.demoTab1(id int, name VARCHAR(10), age int)").show()
File "/opt/ibm/spark/python/pyspark/sql/session.py", line 1440, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
File "/opt/ibm/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/opt/ibm/spark/python/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
File "/opt/ibm/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o132.sql.
: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:208)
at io.unitycatalog.spark.UCProxy.createTable(UCSingleCatalog.scala:299)
at org.apache.spark.sql.connector.catalog.DelegatingCatalogExtension.createTable(DelegatingCatalogExtension.java:102)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.createCatalogTable(DeltaCatalog.scala:256)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.$anonfun$createTable$1(DeltaCatalog.scala:289)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:140)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:138)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.recordFrameProfile(DeltaCatalog.scala:57)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.createTable(DeltaCatalog.scala:278)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.createTable(DeltaCatalog.scala:269)
at io.unitycatalog.spark.UCSingleCatalog.createTable(UCSingleCatalog.scala:101)
at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:44)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:218)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:95)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:662)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:572)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:839)

(python) bash-5.1$

3 REPLIES 3

Alberto_Umana
Databricks Employee
Databricks Employee

Hello @dixonantony 

Can you try running this command?

spark.sql("create table datatest.dischema.demoTab1(id int, name VARCHAR(10), age int)")

Ensure that you have the necessary permissions to create tables in Unity Catalog. You need the CREATE TABLE privilege on the schema where you want to create the table, as well as the USE CATALOG and USE SCHEMA privileges on the parent catalog and schema, respectively

I think the "provider" assertion is failing, how to pass provider properties 

dixonantony_0-1732604389130.png

 

Also I have give all permissions for that user and I am trying it from my spark engine.

 

.config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4,org.apache.hadoop:hadoop-common:3.3.4,io.delta:delta-spark_2.12:3.2.1,io.unitycatalog:unitycatalog-spark_2.12:0.2.0") \
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.sql.catalog.spark_catalog", "io.unitycatalog.spark.UCSingleCatalog") \
.config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
.config("spark.sql.catalog.datatest", "io.unitycatalog.spark.UCSingleCatalog") \
.config("spark.sql.adaptive.enabled", "true") \
.config("spark.sql.constraintPropagation.enabled", "false") \
.config("spark.sql.catalog.datatest.uri", "https://adb-34xxx827.7.azuredatabricks.net/api/2.1/unity-catalog") \
.config("spark.sql.catalog.datatest.token", "dapi0xx4b6d85cdf8") \
.config("spark.sql.defaultCatalog", "datatest") \
.config("fs.azure.account.key.dataxxcdixon.dfs.core.windows.net", "DuHZxxLGnyOSxxxxezyzoC+AStvF46+w==") \
.config("spark.hadoop.fs.azure.account.auth.type.dataxxdxixon.dfs.core.windows.net", "OAuth") \
.config("spark.hadoop.fs.azure.account.oauth.provider.type.databxxdixon.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider") \
.config("spark.hadoop.fs.azure.account.oauth2.client.id.datxxcdixon.dfs.core.windows.net", "xxx-xx-xx-xx-xx") \
.config("spark.hadoop.fs.azure.account.oauth2.client.secret.daxdixon.dfs.core.windows.net", "dapi0xxxxx85cdf8") \
.config("spark.hadoop.fs.azure.account.oauth2.client.endpoint.daxxdixon.dfs.core.windows.net", "https://login.microsoftonline.com/fxxxx0/oauth2/token") \
.getOrCreate()

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group