Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Unable to create Iceberg tables pointing to data in S3 and run queries against the tables.

JohnsonBDSouza
New Contributor II

I need to set up Iceberg tables in a Databricks environment, with the data residing in an S3 bucket, and then read those tables by running SQL queries.

The Databricks environment has access to S3. This is done by:

  1. mapping an Instance Profile to the compute cluster, and
  2. using an AWS access key and secret key to connect via Spark code.
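For illustration, the key-based half of that setup can be sketched as below. The Hadoop configuration keys are the standard S3A ones; the helper function and the pattern of applying them via spark.conf.set are assumptions for the sketch, not from the original post, and on Databricks an instance profile usually makes key-based access unnecessary.

```python
# Hypothetical sketch: building the Hadoop S3A credential configuration
# for key-based access. The function name is illustrative.
def s3a_credential_conf(access_key: str, secret_key: str) -> dict:
    """Return the Hadoop configuration entries for key-based S3A access."""
    return {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }

# On a live cluster these would typically be applied to the session, e.g.:
# for k, v in s3a_credential_conf(ak, sk).items():
#     spark.conf.set(f"spark.hadoop.{k}", v)
```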

Note: Unity Catalog has been enabled in our environment.

Access to S3 from the Databricks environment was tested by copying data from S3 into DBFS. This operation was successful.

We tried to create Iceberg tables by running SQL commands from the SQL Editor, and from a Databricks notebook by running Python code that calls spark.sql().

However, we were unsuccessful in setting up the Iceberg tables.
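For context, here is a hedged sketch of the kind of CREATE statement that was being attempted; the table name, schema, and bucket path are placeholders, not from the original post.

```python
# Hypothetical reconstruction of the attempted DDL. The helper and all
# names (demo.events, s3://my-bucket/...) are illustrative placeholders.
def create_iceberg_ddl(table: str, s3_path: str) -> str:
    """Build a CREATE TABLE ... USING iceberg statement for an S3 location."""
    return (
        f"CREATE TABLE {table} (id BIGINT, data STRING) "
        f"USING iceberg LOCATION '{s3_path}'"
    )

# On a cluster this would be submitted as:
# spark.sql(create_iceberg_ddl("demo.events", "s3://my-bucket/iceberg/events"))
```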

When PySpark code was run to create an Iceberg table by providing the S3 location, access key, and secret key, we encountered the error "Data source format iceberg is not supported in Unity Catalog". See the screenshot below.

[Screenshot: JohnsonBDSouza_0-1705982713662.png]

When the code was run against the Hive metastore:

[Screenshot: JohnsonBDSouza_1-1705982713665.png]

I got a Java exception: "Iceberg is not a valid Spark SQL data source".

We also tried the iceberg and apache-iceberg Python packages, but that did not work either.

We tried many suggestions from various tech forums, including Dremio and Community.databricks.com, but in vain.

References used:

https://www.dremio.com/blog/getting-started-with-apache-iceberg-in-databricks/

https://community.databricks.com/t5/data-engineering/reading-iceberg-table-present-in-s3-from-databr...

Cluster configurations:

[Screenshot: JohnsonBDSouza_2-1705982713667.jpeg]

[Screenshot: JohnsonBDSouza_3-1705982713676.png]

What support do I need from the Databricks community?

  1. Detailed, specific steps to create an Iceberg table pointing to data in S3, via SQL or PySpark code.
  2. The list of libraries to attach to the compute resource, plus the Spark variables and environment variables to set.
  3. The configuration required on the SQL compute resource.
  4. The list of Python libraries required and the location of their repository.
1 REPLY

shan_chandra
Databricks Employee

@JohnsonBDSouza - could you please let me know if you have had a chance to review the UniForm feature, which allows you to create Iceberg tables from the Delta format?

Based on what I could understand from the above, you can create a Delta table using the example below:

-- Create a Delta table with UniForm enabled, so that Iceberg clients
-- can read the Iceberg metadata generated alongside the Delta table.
CREATE TABLE T
TBLPROPERTIES(
  'delta.columnMapping.mode' = 'name',
  'delta.universalFormat.enabledFormats' = 'iceberg')
AS
  SELECT * FROM source_table;

Please refer to the documentation for the prerequisites, configs, and limitations associated with using UniForm: https://docs.databricks.com/en/delta/uniform.html
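Once such a table exists, one way to confirm UniForm is enabled is to inspect its table properties (e.g. via SHOW TBLPROPERTIES) and look for the iceberg entry. The helper below is a hedged sketch for that check; the function name and the idea of collecting the properties into a dict are assumptions, not from the docs.

```python
# Hypothetical helper: given a table-properties mapping (as could be built
# from the rows returned by SHOW TBLPROPERTIES), report whether UniForm
# Iceberg metadata generation is enabled.
def uniform_iceberg_enabled(tblproperties: dict) -> bool:
    formats = tblproperties.get("delta.universalFormat.enabledFormats", "")
    return "iceberg" in formats.split(",")

# On a cluster the mapping could be built roughly as:
# props = {r["key"]: r["value"]
#          for r in spark.sql("SHOW TBLPROPERTIES T").collect()}
# uniform_iceberg_enabled(props)
```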
