<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Read external iceberg table in a spark dataframe within databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14155#M8699</link>
    <description>&lt;P&gt;I am trying to read an external Iceberg table from an S3 location using the following command:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df_source = (spark.read.format("iceberg")
   .load(source_s3_path)
   .drop(*source_drop_columns)
   .filter(f"{date_column}&amp;lt;='{date_filter}'")
    )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;But I get the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Py4JJavaError: An error occurred while calling o632.load.
: java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:529)
	at scala.None$.get(Option.scala:527)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:136)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:323)
	at scala.Option.flatMap(Option.scala:271)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:321)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I change the format to parquet in the code above, it returns all history records, which is what I would like to avoid by reading the table in its original format.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have installed the Iceberg library&lt;B&gt; iceberg-spark-runtime-3.3_2.12&lt;/B&gt; on my cluster and added the following parameters to the advanced config:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type hadoop
spark.sql.catalog.spark_catalog.warehouse /&amp;lt;folder for iceberg data&amp;gt;/&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But I cannot make it work, so I am not sure whether those steps are required (I got them from an article by Dremio) or whether other configuration is needed. Please let me know if this can be done.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 29 Dec 2022 19:27:17 GMT</pubDate>
    <dc:creator>lrodcon</dc:creator>
    <dc:date>2022-12-29T19:27:17Z</dc:date>
    <item>
      <title>Read external iceberg table in a spark dataframe within databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14155#M8699</link>
      <description>&lt;P&gt;I am trying to read an external Iceberg table from an S3 location using the following command:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df_source = (spark.read.format("iceberg")
   .load(source_s3_path)
   .drop(*source_drop_columns)
   .filter(f"{date_column}&amp;lt;='{date_filter}'")
    )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;But I get the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Py4JJavaError: An error occurred while calling o632.load.
: java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:529)
	at scala.None$.get(Option.scala:527)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:136)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:323)
	at scala.Option.flatMap(Option.scala:271)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:321)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I change the format to parquet in the code above, it returns all history records, which is what I would like to avoid by reading the table in its original format.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have installed the Iceberg library&lt;B&gt; iceberg-spark-runtime-3.3_2.12&lt;/B&gt; on my cluster and added the following parameters to the advanced config:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type hadoop
spark.sql.catalog.spark_catalog.warehouse /&amp;lt;folder for iceberg data&amp;gt;/&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But I cannot make it work, so I am not sure whether those steps are required (I got them from an article by Dremio) or whether other configuration is needed. Please let me know if this can be done.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Dec 2022 19:27:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14155#M8699</guid>
      <dc:creator>lrodcon</dc:creator>
      <dc:date>2022-12-29T19:27:17Z</dc:date>
    </item>
    <item>
      <title>Re: Read external iceberg table in a spark dataframe within databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14156#M8700</link>
      <description>&lt;P&gt;I followed the same guide you linked, and it worked just fine when I was using SQL instead of Python. Have you tried using SQL?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Something the Dremio article doesn't discuss is Databricks' SQL implementation of MERGE, which is only compatible with Delta files, not Iceberg. If you need MERGE, I don't know whether this has been solved yet.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Dec 2022 21:34:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14156#M8700</guid>
      <dc:creator>Jfoxyyc</dc:creator>
      <dc:date>2022-12-29T21:34:56Z</dc:date>
    </item>
    <item>
      <title>Re: Read external iceberg table in a spark dataframe within databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14157#M8701</link>
      <description>&lt;P&gt;Thanks for your answer. I have tried SQL as well, and it did not work for me: it does not detect iceberg as a valid format. I might have missed something in the steps, so I will give it another try.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Dec 2022 07:23:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14157#M8701</guid>
      <dc:creator>lrodcon</dc:creator>
      <dc:date>2022-12-30T07:23:40Z</dc:date>
    </item>
    <item>
      <title>Re: Read external iceberg table in a spark dataframe within databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14158#M8702</link>
      <description>&lt;P&gt;Nothing, I followed the exact steps from the article: &lt;A href="https://www.dremio.com/subsurface/getting-started-with-apache-iceberg-in-databricks/" target="_blank"&gt;https://www.dremio.com/subsurface/getting-started-with-apache-iceberg-in-databricks/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have even used the same runtime version and the same library to see if the problem was related to versioning, but I keep getting an error even in SQL. If I try the code from the article:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%sql
CREATE TABLE default.test_table_1 (id bigint, data string)
USING ICEBERG;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I get the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SyntaxError: invalid syntax
  File "&amp;lt;command-388374108764913&amp;gt;", line 2
    CREATE TABLE default.test_table_1 (id bigint, data string)
           ^
SyntaxError: invalid syntax&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Not sure what I am doing wrong.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Dec 2022 07:45:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14158#M8702</guid>
      <dc:creator>lrodcon</dc:creator>
      <dc:date>2022-12-30T07:45:56Z</dc:date>
    </item>
    <item>
      <title>Re: Read external iceberg table in a spark dataframe within databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14159#M8703</link>
      <description>&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/SPARK-41344" alt="https://issues.apache.org/jira/browse/SPARK-41344" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-41344&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 10 Jun 2023 18:00:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/14159#M8703</guid>
      <dc:creator>dynofu</dc:creator>
      <dc:date>2023-06-10T18:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: Read external iceberg table in a spark dataframe within databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/89505#M37833</link>
      <description>&lt;P&gt;It's an article published a couple of years back, and it doesn't work anymore. I am looking for alternatives and will keep you posted.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2024 17:08:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/89505#M37833</guid>
      <dc:creator>sk_geekloid</dc:creator>
      <dc:date>2024-09-11T17:08:55Z</dc:date>
    </item>
    <item>
      <title>Re: Read external iceberg table in a spark dataframe within databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/110432#M43572</link>
      <description>&lt;P&gt;Did you get any resolution for the issue?&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 00:39:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-external-iceberg-table-in-a-spark-dataframe-within/m-p/110432#M43572</guid>
      <dc:creator>chandu402240</dc:creator>
      <dc:date>2025-02-18T00:39:39Z</dc:date>
    </item>
  </channel>
</rss>

