Data Governance

Error while inserting data into Unity Catalog from AWS EMR (Spark) for a UniForm-enabled table

adityapa
New Contributor II

Hi Everyone,

I am trying to write data to a Delta table created in Unity Catalog with an external location. I am using AWS EMR; my table definition and Spark properties are below.

#### Spark Shell

```
spark-shell \
--conf "spark.sql.defaultCatalog=<catalog_name>" \
--conf "spark.sql.catalog.<catalog_name>.warehouse=<catalog_name>" \
--conf spark.databricks.unityCatalog.enabled=true \
--conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf "spark.sql.catalog.<catalog_name>=io.unitycatalog.spark.UCSingleCatalog" \
--conf "spark.sql.catalog.<catalog_name>.type=rest" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf "spark.sql.catalog.<catalog_name>.uri=https://${URI}/api/2.1/unity-catalog" \
--packages "org.apache.hadoop:hadoop-aws:3.4.1,org.apache.hadoop:hadoop-common:3.4.1,io.delta:delta-spark_2.12:3.2.1,io.unitycatalog:unitycatalog-spark_2.12:0.2.1,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.1,io.delta:delta-iceberg_2.12:3.3.2" \
--conf "spark.sql.catalog.<catalog_name>.credential=token" \
--conf "spark.sql.catalog.<catalog_name>.token=${DATABRICKS_TOKEN}" \
--conf "spark.hadoop.fs.s3a.endpoint=s3.us-west-1.amazonaws.com" \
--conf "spark.hadoop.fs.s3a.endpoint.region=us-west-1" \
--conf "spark.hadoop.fs.s3a.region=us-west-1" \
--conf "spark.databricks.delta.uniform.iceberg.sync.convert.enabled=true" \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
--conf "spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"
```
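
For reference, the same settings can also be applied programmatically (for example when submitting a job with spark-submit instead of using spark-shell). This is only an abridged sketch: the conf keys are copied from the command above, `<catalog_name>` stays a placeholder, `URI` and `DATABRICKS_TOKEN` are read from the environment, and the jars from the `--packages` list still need to be provided on the classpath.

```
// Sketch only: the spark-shell configuration above expressed via the SparkSession builder.
// <catalog_name> is a placeholder; URI and DATABRICKS_TOKEN come from the environment.
import org.apache.spark.sql.SparkSession

val uri   = sys.env("URI")
val token = sys.env("DATABRICKS_TOKEN")

val spark = SparkSession.builder()
  .appName("uc-uniform-writer")
  .config("spark.sql.defaultCatalog", "<catalog_name>")
  .config("spark.sql.catalog.<catalog_name>", "io.unitycatalog.spark.UCSingleCatalog")
  .config("spark.sql.catalog.<catalog_name>.type", "rest")
  .config("spark.sql.catalog.<catalog_name>.uri", s"https://$uri/api/2.1/unity-catalog")
  .config("spark.sql.catalog.<catalog_name>.credential", "token")
  .config("spark.sql.catalog.<catalog_name>.token", token)
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .config("spark.sql.extensions",
    "io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .config("spark.hadoop.fs.s3a.endpoint.region", "us-west-1")
  .config("spark.databricks.delta.uniform.iceberg.sync.convert.enabled", "true")
  .getOrCreate()
```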

#### Table Configs

```
CREATE EXTERNAL TABLE poc_prod_adm.v1.table8 (a STRING, b STRING, c BIGINT, d BIGINT)
USING DELTA
PARTITIONED BY (a, b)
LOCATION 's3://<bucket>/<subfolder>/<catalog_name>/<schema_name>/table8'
TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg',
'delta.minReaderVersion' = 2,
'delta.minWriterVersion' = 5
);
```

-----

While inserting data from AWS EMR (Spark), I am getting the following error:

```

scala> spark.sql("""INSERT INTO <catalog_name>.<schema_name>.table8 (
| a,
| b,
| c,
| d
| )
| VALUES (
| 'a',
| 'b',
| 20250820,
| 20250915,
| );""");
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
25/08/20 13:13:02 WARN SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
25/08/20 13:13:17 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
25/08/20 13:13:18 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
25/08/20 13:13:18 ERROR IcebergConverter: Error when converting to Iceberg metadata
org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: [SCHEMA_NOT_FOUND] The schema `<schema_name>` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS.
```


#### Notes :
1. Our requirement is to write data from Spark (through the Delta interface) and read it through both the Delta and Iceberg interfaces with tools like Spark, DuckDB, Trino, etc.
2. We are using a UniForm table for this requirement, so those table properties are crucial. Strictly, only `'delta.universalFormat.enabledFormats' = 'iceberg'` is required; the other properties are added to support it (they either need to be enabled or are defaults).
3. The Spark config `spark.databricks.delta.uniform.iceberg.sync.convert.enabled=true` is set as per the details mentioned in: https://github.com/delta-io/delta/blob/v3.3.2/spark/src/main/scala/org/apache/spark/sql/delta/source... (a quick check of this setting is sketched below).
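
For completeness, a minimal sketch of those checks: the session flag from note 3 and the UniForm-related table properties from the DDL above. `SHOW TBLPROPERTIES` is standard Spark SQL; `<catalog_name>`/`<schema_name>` are the same placeholders as in the INSERT statement.

```
// Sketch only: confirm the UniForm sync flag is set in this session...
println(spark.conf.get("spark.databricks.delta.uniform.iceberg.sync.convert.enabled", "<not set>"))

// ...and inspect the UniForm-related properties recorded on the table.
spark.sql("SHOW TBLPROPERTIES <catalog_name>.<schema_name>.table8").show(100, false)
```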


Any help is appreciated.

2 REPLIES

SP_6721
Contributor III

Hi @adityapa ,

Can you first confirm that EMR can actually see your catalog and schema? Try running:

spark.sql("SHOW CATALOGS").show(false)
spark.sql("SHOW SCHEMAS IN <catalog_name>").show(false)

adityapa
New Contributor II

Hi @SP_6721 ,

I am able to read data from Spark (since it reads the Delta logs) and can view the schema/catalog details from Trino on EMR.

I am also able to write data to the Delta files in S3 through UC. However, the Iceberg metadata/manifest files are not getting updated, which causes the issue mentioned above.
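
One way to see whether any Iceberg metadata is being produced at all (a sketch, assuming the standard Iceberg layout where metadata files land under a `metadata/` directory beneath the LOCATION from the CREATE TABLE statement):

```
// Sketch only: list the Iceberg metadata directory under the table's external location.
// The path is a placeholder built from the LOCATION clause in the DDL above.
import org.apache.hadoop.fs.Path

val metadataDir = new Path("s3a://<bucket>/<subfolder>/<catalog_name>/<schema_name>/table8/metadata")
val fs = metadataDir.getFileSystem(spark.sparkContext.hadoopConfiguration)

if (fs.exists(metadataDir)) {
  // Print each metadata/manifest file with its last modification time.
  fs.listStatus(metadataDir).foreach { s =>
    println(s"${s.getPath.getName}  ${new java.util.Date(s.getModificationTime)}")
  }
} else {
  println(s"No Iceberg metadata directory at $metadataDir")
}
```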
