Data Engineering

Are you able to create an Iceberg table natively in Databricks?

petergriffin1
New Contributor II

I've been trying to create an Iceberg table natively in Databricks on a cluster running Runtime 16.4. I also have the Iceberg JAR for Spark 3.5.2.

Using a simple command such as:

%sql
CREATE OR REPLACE TABLE catalog1.default.iceberg(
    a INT
)
USING iceberg;

is running into the error: "Failed to find the data source: iceberg. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version".

My question: can we build these Iceberg tables natively in Databricks (assuming the private preview is turned on and the JAR is loaded), or do we have to use an external client to build them and then push them to Databricks somehow? Or are only specific formats (Parquet, etc.) allowed?

1 ACCEPTED SOLUTION

BigRoux
Databricks Employee
Databricks supports creating and working with Apache Iceberg tables natively under specific conditions. Managed Iceberg tables in Unity Catalog can be created directly on Databricks Runtime 16.4 LTS or newer. The setup requires enabling the Managed Iceberg private preview and meeting its requirements, such as having Unity Catalog enabled and the appropriate schema access permissions. Once configured, you can create these tables with ordinary SQL: CREATE OR REPLACE TABLE <catalog>.<schema>.<table> ... USING iceberg.
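For reference, here is a minimal sketch of what that looks like in a Databricks notebook once the preview is enabled. The catalog/schema/table names are placeholders, not anything from your workspace:

%python
# Minimal sketch: create a Managed Iceberg table in Unity Catalog.
# Assumes DBR 16.4 LTS+ with the Managed Iceberg private preview enabled;
# `main.default` is a placeholder catalog/schema.
spark.sql("""
    CREATE OR REPLACE TABLE main.default.iceberg_demo (
        a INT
    )
    USING iceberg
""")

spark.sql("INSERT INTO main.default.iceberg_demo VALUES (1), (2)")
display(spark.sql("SELECT * FROM main.default.iceberg_demo"))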
 
However, for external (foreign) Iceberg tables where metadata is managed outside Databricks (e.g., in Glue or Snowflake catalogs), Databricks only allows read access, and Iceberg tables written by third-party tools likewise remain read-only in Databricks. Managed Iceberg tables, by contrast, can be written both from Databricks and from external Iceberg clients such as Spark, Flink, and Trino via the Unity Catalog Iceberg REST Catalog API.
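To illustrate that interoperability point, here is a hedged sketch of an external (non-Databricks) PySpark session pointed at Unity Catalog's Iceberg REST Catalog endpoint. The endpoint path, token-based auth, and Iceberg package version below are assumptions based on the standard OSS Iceberg REST catalog properties; verify them against the Databricks docs for the preview:

from pyspark.sql import SparkSession

# Sketch only: an external Spark 3.5 client configured against Unity
# Catalog's Iceberg REST Catalog API. <workspace-url>, <token>, and
# <catalog-name> are placeholders; the endpoint path is an assumption.
spark = (
    SparkSession.builder
    .appName("uc-iceberg-rest-client")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    .config("spark.sql.catalog.uc.uri",
            "https://<workspace-url>/api/2.1/unity-catalog/iceberg")
    .config("spark.sql.catalog.uc.token", "<token>")
    .config("spark.sql.catalog.uc.warehouse", "<catalog-name>")
    .getOrCreate()
)

spark.sql("SELECT * FROM uc.default.iceberg_demo").show()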
 
On clusters running Apache Spark 3.5.2, a Spark-compatible Iceberg JAR must be loaded and the session must be configured, including setting the Iceberg SQL extensions and defining an Iceberg catalog; without that configuration, errors like the one you hit (Failed to find the data source: iceberg) arise, as in the sketch below. For the managed path, follow Databricks' guidelines and the preview-specific configuration.
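As a point of comparison, the error in the original post is what plain Apache Spark throws when the Iceberg runtime and session configuration are missing. On OSS Spark 3.5.2 the usual minimum looks roughly like this (a sketch of the standard Iceberg quickstart configuration, not a Databricks-specific setting; on a Databricks cluster the preview's documented configuration is the supported path):

from pyspark.sql import SparkSession

# Standard OSS Iceberg setup for Spark 3.5: the runtime JAR on the
# classpath plus the session extensions and a catalog definition.
# Without these, `USING iceberg` fails with "Failed to find the data
# source: iceberg". Paths and names are placeholders.
spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS local.db.t (a INT) USING iceberg")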
 
Hope this helps, Lou.


3 REPLIES


Hey Lou,

That helps quite a bit and clears up the confusion on my end. Quick question: is the Managed Iceberg private preview enabled by Databricks (the account rep)? I'm assuming the answer is yes; just wanted to make sure.

Thanks!

Hey petergriffin1, I don't know the exact process, but your best bet is what you suggested above. Start with your AE; they may direct you to your SA, who will take it from there. Hope this helps. Lou.
