Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

INTERNAL_ERROR occurred while converting Iceberg format table to Delta format using Spark

hskimskydd
Databricks Partner

I used Apache Spark to write an Iceberg table to Amazon S3.
I then ran the command below to convert the Iceberg table to Delta, and the following exception occurred:

```python
spark.sql('convert to delta iceberg.`s3a://BUCKET/path/to/table_name/`')
```

```text
pyspark.errors.exceptions.connect.SparkException: [INTERNAL_ERROR] Eagerly executed command failed. You encountered a bug in Spark or the Spark plugins you use. Please report this bug to the corresponding community or vendor and provide the full stack trace. SQLSTATE: XX000
```

I'm posting here because the error message says to report the bug to the corresponding community or vendor.

```text

>>> result = spark.sql('convert to delta iceberg.`s3a://BUCKET/path/to/iceberg_table/`')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark/spark-4.1.1-bin-hadoop3-connect/python/pyspark/sql/connect/session.py", line 828, in sql
    data, properties, ei = self.client.execute_command(cmd.command(self._client))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/spark/spark-4.1.1-bin-hadoop3-connect/python/pyspark/sql/connect/client/core.py", line 1195, in execute_command
    data, _, metrics, observed_metrics, properties = self._execute_and_fetch(
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/spark/spark-4.1.1-bin-hadoop3-connect/python/pyspark/sql/connect/client/core.py", line 1697, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/spark/spark-4.1.1-bin-hadoop3-connect/python/pyspark/sql/connect/client/core.py", line 1674, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "/usr/local/spark/spark-4.1.1-bin-hadoop3-connect/python/pyspark/sql/connect/client/core.py", line 1982, in _handle_error
    self._handle_rpc_error(error)
  File "/usr/local/spark/spark-4.1.1-bin-hadoop3-connect/python/pyspark/sql/connect/client/core.py", line 2066, in _handle_rpc_error
    raise convert_exception(
pyspark.errors.exceptions.connect.SparkException: [INTERNAL_ERROR] Eagerly executed command failed. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. SQLSTATE: XX000
 
JVM stacktrace:
org.apache.spark.SparkException
at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:706)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:719)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$eagerlyExecute$1(QueryExecution.scala:184)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$3.applyOrElse(QueryExecution.scala:201)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$3.applyOrElse(QueryExecution.scala:194)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:491)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:107)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:491)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:360)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:356)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:467)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:194)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyCommandExecuted$1(QueryExecution.scala:155)
at scala.util.Try$.apply(Try.scala:217)
at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1392)
at org.apache.spark.util.Utils$.getTryWithCallerStacktrace(Utils.scala:1453)
at org.apache.spark.util.LazyTry.get(LazyTry.scala:58)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:160)
at org.apache.spark.sql.classic.Dataset.<init>(Dataset.scala:276)
at org.apache.spark.sql.classic.Dataset$.$anonfun$ofRows$5(Dataset.scala:139)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.Dataset$.ofRows(Dataset.scala:135)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:584)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:561)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.executeSQL(SparkConnectPlanner.scala:3148)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleSqlCommand(SparkConnectPlanner.scala:2996)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:2830)
at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.handlePlan(SparkConnectPlanExecution.scala:96)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:225)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:197)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:396)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:396)
at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:185)
at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:102)
at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:395)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:197)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:126)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:334)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.catalog.CatalogPlugin.name()" because "this.delegate" is null
at org.apache.spark.sql.connector.catalog.DelegatingCatalogExtension.name(DelegatingCatalogExtension.java:50)
at org.apache.spark.sql.catalyst.analysis.RelationResolution.toCacheKey(RelationResolution.scala:310)
at org.apache.spark.sql.catalyst.analysis.RelationResolution.$anonfun$resolveRelation$2(RelationResolution.scala:119)
at scala.Option.orElse(Option.scala:477)
at org.apache.spark.sql.catalyst.analysis.RelationResolution.resolveRelation(RelationResolution.scala:117)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:1323)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$12.applyOrElse(Analyzer.scala:1244)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$12.applyOrElse(Analyzer.scala:1205)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:139)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:107)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:139)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:416)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:135)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:131)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1205)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1167)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:248)
at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:183)
at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:179)
at scala.collection.immutable.List.foldLeft(List.scala:79)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:245)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:237)
at scala.collection.immutable.List.foreach(List.scala:323)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:237)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:343)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:339)
at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:224)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:339)
at org.apache.spark.sql.delta.commands.DeltaCommand.resolveIdentifier(DeltaCommand.scala:187)
at org.apache.spark.sql.delta.commands.DeltaCommand.resolveIdentifier$(DeltaCommand.scala:186)
at org.apache.spark.sql.delta.commands.ConvertToDeltaCommandBase.resolveIdentifier(ConvertToDeltaCommand.scala:70)
at org.apache.spark.sql.delta.commands.DeltaCommand.isCatalogTable(DeltaCommand.scala:199)
at org.apache.spark.sql.delta.commands.DeltaCommand.isCatalogTable$(DeltaCommand.scala:197)
at org.apache.spark.sql.delta.commands.ConvertToDeltaCommandBase.isCatalogTable(ConvertToDeltaCommand.scala:199)
at org.apache.spark.sql.delta.commands.ConvertToDeltaCommandBase.resolveConvertTarget(ConvertToDeltaCommand.scala:125)
at org.apache.spark.sql.delta.commands.ConvertToDeltaCommandBase.run(ConvertToDeltaCommand.scala:97)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:79)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:77)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:88)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$2(QueryExecution.scala:185)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:177)
at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:285)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:139)
at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:185)
at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:102)
at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:139)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:308)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:138)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:92)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:250)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:185)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:717)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$eagerlyExecute$1(QueryExecution.scala:184)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$3.applyOrElse(QueryExecution.scala:201)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$3.applyOrElse(QueryExecution.scala:194)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:491)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:107)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:491)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:360)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:356)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:37)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:467)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:194)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyCommandExecuted$1(QueryExecution.scala:155)
at scala.util.Try$.apply(Try.scala:217)
at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1392)
at org.apache.spark.util.LazyTry.tryT$lzycompute(LazyTry.scala:46)
at org.apache.spark.util.LazyTry.tryT(LazyTry.scala:46)
at org.apache.spark.util.LazyTry.get(LazyTry.scala:58)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:160)
at org.apache.spark.sql.classic.Dataset.<init>(Dataset.scala:276)
at org.apache.spark.sql.classic.Dataset$.$anonfun$ofRows$5(Dataset.scala:139)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.Dataset$.ofRows(Dataset.scala:135)
at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:584)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:561)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.executeSQL(SparkConnectPlanner.scala:3148)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleSqlCommand(SparkConnectPlanner.scala:2996)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:2830)
at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.handlePlan(SparkConnectPlanExecution.scala:96)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:225)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:197)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:396)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:396)
at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:185)
at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:102)
at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:395)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:197)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:126)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:334)
>>> 

```

2 REPLIES

mderela
Contributor

Quick question before diving in: are you running this through Spark Connect? The binary name spark-4.1.1-bin-hadoop3-connect suggests yes, and the stack trace confirms it. The NullPointerException on DelegatingCatalogExtension.name() because "this.delegate" is null points to the Iceberg catalog plugin not initializing on the server side.
Which version of the delta-iceberg jar are you including? This matters because iceberg-spark-runtime has to be compiled against a specific Spark version, and this exact pattern has come up before: the Delta 2.4.0 docs note that "delta-iceberg is currently not available… since iceberg-spark-runtime does not support Spark 3.4 yet", and the Delta 4.0.1 release notes explicitly state that "hudi and iceberg are currently not compatible with Spark 4.1, as support depends on upcoming releases." (github.com/delta-io/delta/releases)
Does downgrading to Spark 3.5 with Delta 3.3.x change anything?
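
To make that suggestion concrete, here's a sketch of launching PySpark with a version set along those lines. The exact artifact versions below are illustrative assumptions, not a verified pairing; check the Delta and Iceberg compatibility notes for the currently recommended combination:

```shell
# Sketch only: Spark 3.5.x paired with Delta 3.3.x and a matching
# Iceberg Spark runtime. The version numbers are assumptions.
pyspark \
  --packages io.delta:delta-spark_2.12:3.3.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
```

The two --conf lines are what enable Delta's SQL commands (including CONVERT TO DELTA) in the session.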

Louis_Frolio
Databricks Employee

Hi @hskimskydd 

Thanks for sharing the full stack trace — that helps a lot.

What you're seeing looks like a bug at first glance, but there are a few things working together under the surface.

The NullPointerException coming out of DelegatingCatalogExtension.name() (this.delegate being null) is a strong signal that the Iceberg catalog plugin isn't initializing correctly. That's usually not a simple config issue — it points toward a mismatch between Spark, the Iceberg runtime, and the Delta-Iceberg integration layer.

Now layer in your environment. You're on Spark 4.1.1 using Spark Connect. That combination isn't fully supported for Delta-Iceberg interoperability right now. Iceberg's Spark runtime hasn't caught up with Spark 4.1 yet, and Delta's Iceberg integration depends on that alignment. When those pieces don't line up, you get exactly what you're seeing — internal errors instead of a clean "not supported" message.

On top of that, the specific pattern you're using (CONVERT TO DELTA against a path-based Iceberg table) is already outside the most stable workflows. Iceberg integrations tend to behave more predictably when they're catalog-driven, and conversion commands are especially sensitive to catalog resolution and metadata alignment. Combine that with an unsupported runtime and things break in non-obvious ways.
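
For reference, a catalog-driven setup registers the Iceberg catalog explicitly in the session config, so the table resolves through a fully initialized catalog rather than a bare path. A minimal sketch, where the catalog name, warehouse location, and the assumption of a Hadoop-type catalog are all placeholders to adjust for your environment:

```python
# Sketch of a session with an explicitly registered Iceberg catalog.
# The catalog name "iceberg" and the S3 warehouse path are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-to-delta")
    # Register a named Iceberg catalog backed by a Hadoop warehouse on S3.
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hadoop")
    .config("spark.sql.catalog.iceberg.warehouse", "s3a://BUCKET/path/to/")
    # Delta's extension and catalog are required for CONVERT TO DELTA.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)
```

This doesn't by itself fix the version mismatch described above, but it removes path-only catalog resolution as a variable.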

So where does that leave you? A few options worth considering:

  1. Move the conversion to a version set where Spark, Delta, and Iceberg are known to work together (Spark 3.5.x with matching runtimes, for example). That alone resolves a lot of these issues.

  2. If you'd rather stay where you are, skip CONVERT TO DELTA and just rewrite the data:

```python
df = spark.read.format("iceberg").load("s3a://…")
df.write.format("delta").save("s3a://…/delta-table")
```

More explicit, but it gives you full control and sidesteps the fragile integration layer.

  3. If Spark 4.1 is a hard requirement, do the conversion in a supported environment and then point your Spark 4.1 workload at the resulting Delta table.

Bottom line: this isn't a random Spark failure. It's what happens when an unsupported runtime combination meets a conversion command that's already a bit off the happy path. Line those pieces up and the problem usually goes away.

Cheers, Louis