<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading an Iceberg table with AWS Glue Data Catalog as metastore in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101531#M40710</link>
    <description>&lt;P&gt;Thanks for your quick response. I have added `iceberg-spark-runtime-3.5_2.12-1.7.0.jar` from iceberg.apache.org as a library on my cluster (runtime: "16.0 ML (includes Apache Spark 3.5.0, Scala 2.12)") and have the following Spark config:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.glue.catalog-impl org.apache.iceberg.aws.glue.GlueCatalog
spark.databricks.hive.metastore.glueCatalog.enabled true
spark.sql.catalog.glue.type glue
spark.sql.catalog.glue.warehouse s3://&amp;lt;my-bucket&amp;gt;/&amp;lt;my-prefix&amp;gt;/
spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.0
spark.master local[*, 4]
spark.sql.catalog.glue.io-impl org.apache.iceberg.aws.s3.S3FileIO
spark.databricks.cluster.profile singleNode
spark.sql.catalog.glue org.apache.iceberg.spark.SparkCatalog&lt;/LI-CODE&gt;&lt;P&gt;When I start the cluster and run `SHOW TABLES IN glue`, it doesn't find a catalog called `glue`. Are there additional steps needed to make the new `glue` catalog available? I have also tried applying the above `spark.sql.catalog...` settings using the existing catalog name `hive_metastore`, but that does not work either. More guidance on reading Iceberg tables with the AWS Glue Data Catalog as the metastore would be appreciated. The page you linked does not mention Iceberg at all. Thanks!&lt;/P&gt;</description>
    <pubDate>Mon, 09 Dec 2024 22:31:17 GMT</pubDate>
    <dc:creator>ideal_knee</dc:creator>
    <dc:date>2024-12-09T22:31:17Z</dc:date>
    <item>
      <title>Reading an Iceberg table with AWS Glue Data Catalog as metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101142#M40557</link>
      <description>&lt;P&gt;I have created an Iceberg table using AWS Glue; however, whenever I try to read it from a Databricks cluster, I get `java.lang.InstantiationException`. I have tried every combination of Spark configs for my Databricks compute cluster that I can think of, based on Databricks, Dremio, AWS, and Iceberg documentation. Most recently, I tried:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.databricks.hive.metastore.glueCatalog.enabled true
spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.0
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions&lt;/LI-CODE&gt;&lt;P&gt;I have also tried including various `spark.sql.catalog.hive_metastore...` configs as described in &lt;A href="https://iceberg.apache.org/docs/latest/spark-configuration/#catalog-configuration" target="_self"&gt;the Iceberg docs&lt;/A&gt;, with the same result. Any guidance on the minimal Spark configs necessary (or other suggestions) to read an Iceberg table with the AWS Glue Data Catalog as the metastore would be greatly appreciated. Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2024 22:22:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101142#M40557</guid>
      <dc:creator>ideal_knee</dc:creator>
      <dc:date>2024-12-05T22:22:54Z</dc:date>
    </item>
    <item>
      <title>Re: Reading an Iceberg table with AWS Glue Data Catalog as metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101261#M40602</link>
      <description>&lt;P&gt;To read an Iceberg table using the AWS Glue Data Catalog as the metastore on a Databricks cluster, you need to configure Spark with the appropriate settings and use an Iceberg runtime that is compatible with your cluster. Here is an example setup:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Enable the AWS Glue Data Catalog as the metastore by setting &lt;CODE&gt;spark.databricks.hive.metastore.glueCatalog.enabled&lt;/CODE&gt; to &lt;CODE&gt;true&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Include the Iceberg runtime JAR that matches your Spark and Scala versions. For Spark 3.5 with Scala 2.12, use &lt;CODE&gt;org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.0&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Add Iceberg extensions to Spark with &lt;CODE&gt;spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Set the following catalog configurations:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;CODE&gt;spark.sql.catalog.glue=org.apache.iceberg.spark.SparkCatalog&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;spark.sql.catalog.glue.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;spark.sql.catalog.glue.warehouse=s3://&amp;lt;your-warehouse-path&amp;gt;&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;spark.sql.catalog.glue.io-impl=org.apache.iceberg.aws.s3.S3FileIO&lt;/CODE&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
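&lt;P&gt;The steps above can be sketched as a small Python helper. It assembles the settings and renders them as one "key value" pair per line, which is the format used in the cluster's Spark config box. This is a hypothetical sketch, not a Databricks or Iceberg API: the function names, the catalog name &lt;CODE&gt;glue&lt;/CODE&gt;, and the warehouse path are placeholders. The helper also flags a catalog that sets both &lt;CODE&gt;.type&lt;/CODE&gt; and &lt;CODE&gt;.catalog-impl&lt;/CODE&gt;. As far as we understand, Iceberg treats these two options as alternatives and refuses to build a catalog when both are set. Treat that check as an assumption and verify it against your Iceberg version.&lt;/P&gt;

```python
# Hypothetical helper (not a Databricks or Iceberg API) that assembles the
# settings from the steps above for an Iceberg catalog backed by AWS Glue.
ICEBERG_RUNTIME = "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.0"


def iceberg_glue_conf(catalog, warehouse):
    """Return the Spark settings for an Iceberg catalog named `catalog`."""
    prefix = "spark.sql.catalog." + catalog
    return {
        "spark.databricks.hive.metastore.glueCatalog.enabled": "true",
        "spark.jars.packages": ICEBERG_RUNTIME,
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        prefix + ".catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        prefix + ".warehouse": warehouse,
        prefix + ".io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def check_catalog_conf(conf, catalog):
    """Return a list of likely problems with one catalog's settings."""
    prefix = "spark.sql.catalog." + catalog
    problems = []
    if prefix not in conf:
        problems.append("missing " + prefix)
    # Assumption: Iceberg rejects a catalog that sets both of these options.
    if prefix + ".type" in conf and prefix + ".catalog-impl" in conf:
        problems.append("set either " + prefix + ".type or .catalog-impl, not both")
    return problems


def to_cluster_spark_config(conf):
    """Render settings as one 'key value' pair per line, sorted by key."""
    return "\n".join(key + " " + value for key, value in sorted(conf.items()))


if __name__ == "__main__":
    conf = iceberg_glue_conf("glue", "s3://your-bucket/your-prefix/")
    print(check_catalog_conf(conf, "glue"))  # prints [] for this configuration
    print(to_cluster_spark_config(conf))
```

&lt;P&gt;For example, if you add &lt;CODE&gt;spark.sql.catalog.glue.type glue&lt;/CODE&gt; to this configuration, the check reports the conflict before you restart the cluster.&lt;/P&gt;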
&lt;P&gt;Once the cluster is configured, you can test it by running queries such as &lt;CODE&gt;SHOW TABLES IN glue.&amp;lt;database_name&amp;gt;&lt;/CODE&gt; or &lt;CODE&gt;SELECT * FROM glue.&amp;lt;database_name&amp;gt;.&amp;lt;table_name&amp;gt;&lt;/CODE&gt; to validate connectivity.&lt;/P&gt;
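&lt;P&gt;These validation queries can be run as a short notebook check that stops at the first failing step. This helps distinguish two cases: a catalog that is not registered at all, and a table whose metadata resolves but whose data cannot be read. This is a hypothetical sketch. &lt;CODE&gt;smoke_test&lt;/CODE&gt; is an illustrative name, and &lt;CODE&gt;spark&lt;/CODE&gt; is assumed to be the notebook's SparkSession. Any object whose &lt;CODE&gt;.sql()&lt;/CODE&gt; returns something with &lt;CODE&gt;.collect()&lt;/CODE&gt; will also work.&lt;/P&gt;

```python
def smoke_test(spark, catalog, database, table):
    """Run the validation queries in order and stop at the first failure.

    `spark` is assumed to be the notebook's SparkSession (or any object whose
    .sql() returns something with a .collect() method).
    """
    name = f"{catalog}.{database}.{table}"
    checks = [
        ("list namespaces", f"SHOW NAMESPACES IN {catalog}"),
        ("list tables", f"SHOW TABLES IN {catalog}.{database}"),
        # Metadata only: can succeed even when reading data files fails.
        ("describe table", f"DESCRIBE TABLE {name}"),
        # Forces Spark to actually scan the table.
        ("read rows", f"SELECT * FROM {name} LIMIT 10"),
    ]
    results = []
    for step, query in checks:
        try:
            rows = spark.sql(query).collect()
        except Exception as exc:  # PySpark surfaces JVM errors as Python exceptions
            results.append((step, "FAILED: " + type(exc).__name__))
            break
        results.append((step, f"ok, {len(rows)} row(s)"))
    return results


# Usage in a Databricks notebook (placeholder names):
# for step, status in smoke_test(spark, "glue", "my_database", "my_table"):
#     print(step, status)
```

&lt;P&gt;If only the "read rows" step fails, the problem more likely lies in the data-reading path (JARs on the classpath, S3 access) than in catalog registration.&lt;/P&gt;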
&lt;P&gt;Make sure your Databricks cluster has the necessary AWS credentials and permissions for Glue and S3. This guidance is based on the &lt;A href="https://docs.databricks.com/ja/archive/external-metastores/aws-glue-metastore.html" target="_self"&gt;Databricks Documentation on AWS Glue and Iceberg&lt;/A&gt;, with specific references to Spark configurations for Iceberg compatibility. Let us know if you need additional help!&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2024 16:20:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101261#M40602</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-06T16:20:55Z</dc:date>
    </item>
    <item>
      <title>Re: Reading an Iceberg table with AWS Glue Data Catalog as metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101531#M40710</link>
      <description>&lt;P&gt;Thanks for your quick response. I have added `iceberg-spark-runtime-3.5_2.12-1.7.0.jar` from iceberg.apache.org as a library on my cluster (runtime: "16.0 ML (includes Apache Spark 3.5.0, Scala 2.12)") and have the following Spark config:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.glue.catalog-impl org.apache.iceberg.aws.glue.GlueCatalog
spark.databricks.hive.metastore.glueCatalog.enabled true
spark.sql.catalog.glue.type glue
spark.sql.catalog.glue.warehouse s3://&amp;lt;my-bucket&amp;gt;/&amp;lt;my-prefix&amp;gt;/
spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.0
spark.master local[*, 4]
spark.sql.catalog.glue.io-impl org.apache.iceberg.aws.s3.S3FileIO
spark.databricks.cluster.profile singleNode
spark.sql.catalog.glue org.apache.iceberg.spark.SparkCatalog&lt;/LI-CODE&gt;&lt;P&gt;When I start the cluster and run `SHOW TABLES IN glue`, it doesn't find a catalog called `glue`. Are there additional steps needed to make the new `glue` catalog available? I have also tried applying the above `spark.sql.catalog...` settings using the existing catalog name `hive_metastore`, but that does not work either. More guidance on reading Iceberg tables with the AWS Glue Data Catalog as the metastore would be appreciated. The page you linked does not mention Iceberg at all. Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2024 22:31:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101531#M40710</guid>
      <dc:creator>ideal_knee</dc:creator>
      <dc:date>2024-12-09T22:31:17Z</dc:date>
    </item>
    <item>
      <title>Re: Reading an Iceberg table with AWS Glue Data Catalog as metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101563#M40725</link>
      <description>&lt;P&gt;Apologies about the link; the legacy URL should be &lt;A href="https://docs.databricks.com/ja/archive/external-metastores/aws-glue-metastore.html#use-aws-glue-data-catalog-as-a-metastore-legacy" target="_blank"&gt;https://docs.databricks.com/ja/archive/external-metastores/aws-glue-metastore.html#use-aws-glue-data-catalog-as-a-metastore-legacy&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 08:54:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101563#M40725</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-10T08:54:29Z</dc:date>
    </item>
    <item>
      <title>Re: Reading an Iceberg table with AWS Glue Data Catalog as metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101663#M40765</link>
      <description>&lt;P&gt;Thank you, though that link also does not mention Iceberg at all.&lt;/P&gt;&lt;P&gt;I can see the Iceberg table in Databricks in the `hive_metastore` catalog and view its schema via `DESCRIBE`. However, when I try to actually read the data, I get `java.lang.InstantiationException`. I can read other Parquet tables from the `hive_metastore` catalog, which uses the AWS Glue Data Catalog as the metastore, but I cannot read the Iceberg table. When I run `SHOW CATALOGS`, I see 4 catalogs (hive_metastore, main, samples, and system). No catalog named `glue` appears, even with the Spark config I shared previously.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2024 21:19:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101663#M40765</guid>
      <dc:creator>ideal_knee</dc:creator>
      <dc:date>2024-12-10T21:19:06Z</dc:date>
    </item>
    <item>
      <title>Re: Reading an Iceberg table with AWS Glue Data Catalog as metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101723#M40788</link>
      <description>&lt;P&gt;The details are better explained in the document you were initially using: &lt;A href="https://iceberg.apache.org/docs/latest/spark-configuration/#replacing-the-session-catalog" target="_blank"&gt;https://iceberg.apache.org/docs/latest/spark-configuration/#replacing-the-session-catalog&lt;/A&gt;. I shared the previous URL because I understood the issue to be that the "glue" catalog, which should have been created for Iceberg table support, was not being listed. The second part of the problem, actually reading data from the Iceberg table, is more about having the right JAR file(s) present on the classpath.&lt;/P&gt;
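&lt;P&gt;The "replacing the session catalog" approach in those docs registers Iceberg's &lt;CODE&gt;SparkSessionCatalog&lt;/CODE&gt; as &lt;CODE&gt;spark_catalog&lt;/CODE&gt;. This adds Iceberg support to Spark's built-in catalog and delegates non-Iceberg tables to it. The sketch below is minimal and assumes the Glue options used for the named &lt;CODE&gt;glue&lt;/CODE&gt; catalog carry over unchanged. The helper name is hypothetical, and this thread does not confirm that a given Databricks runtime honors a replaced session catalog.&lt;/P&gt;

```python
# Hypothetical helper: move a named Iceberg catalog's settings onto Spark's
# session catalog, following the "replacing the session catalog" pattern.
SESSION_PREFIX = "spark.sql.catalog.spark_catalog"


def as_session_catalog(conf, catalog):
    """Return a copy of `conf` with `catalog`'s settings moved to spark_catalog."""
    prefix = "spark.sql.catalog." + catalog
    result = {}
    for key, value in conf.items():
        if key == prefix:
            # SparkSessionCatalog wraps the built-in catalog instead of replacing it.
            result[SESSION_PREFIX] = "org.apache.iceberg.spark.SparkSessionCatalog"
        elif key.startswith(prefix + "."):
            # e.g. spark.sql.catalog.glue.warehouse -> spark.sql.catalog.spark_catalog.warehouse
            result[SESSION_PREFIX + key[len(prefix):]] = value
        else:
            # Unrelated settings (extensions, packages, other catalogs) are kept as-is.
            result[key] = value
    return result


if __name__ == "__main__":
    named = {
        "spark.sql.catalog.glue": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.glue.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        "spark.sql.catalog.glue.warehouse": "s3://your-bucket/your-prefix/",
        "spark.sql.catalog.glue.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }
    for key, value in sorted(as_session_catalog(named, "glue").items()):
        print(key, value)
```

&lt;P&gt;The helper renames keys by prefix rather than rebuilding the config, so settings that belong to other catalogs, such as a hypothetical &lt;CODE&gt;glue2&lt;/CODE&gt;, are left untouched.&lt;/P&gt;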
&lt;P&gt;At this point, I believe it would be best to review your setup live through a support ticket. If that is not possible, please let us know and we will continue here; I will check whether I can reproduce a similar setup and share the steps.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2024 09:50:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/101723#M40788</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-11T09:50:47Z</dc:date>
    </item>
    <item>
      <title>Re: Reading an Iceberg table with AWS Glue Data Catalog as metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/106944#M42653</link>
      <description>&lt;P&gt;In case someone happens upon this in the future, I ended up using Unity Catalog with &lt;A href="https://docs.databricks.com/en/data-governance/unity-catalog/hms-federation/hms-federation-glue.html" target="_self"&gt;Hive metastore federation for Glue&lt;/A&gt;. Iceberg support is currently "&lt;A href="https://www.databricks.com/blog/announcing-public-preview-hive-metastore-and-aws-glue-federation-unity-catalog" target="_self"&gt;coming soon in Public Preview&lt;/A&gt;."&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2025 18:31:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-an-iceberg-table-with-aws-glue-data-catalog-as-metastore/m-p/106944#M42653</guid>
      <dc:creator>ideal_knee</dc:creator>
      <dc:date>2025-01-24T18:31:48Z</dc:date>
    </item>
  </channel>
</rss>

