3 weeks ago - last edited 3 weeks ago
df = spark.sql("select * from bqms_table;");
df.show();
Labels: Spark
Accepted Solutions
3 weeks ago
The error you're encountering is related to a compatibility issue between Databricks' GCS implementation and Apache Iceberg when trying to read Iceberg tables from Google Cloud Storage. The specific error is:
```
java.lang.UnsupportedOperationException: Byte-buffer read unsupported by com.databricks.common.filesystem.LokiGCSInputStream
```
This indicates that the Databricks GCS file system implementation (`LokiGCSInputStream`) doesn't support the byte-buffer read operations that Iceberg requires when reading Parquet files.
Potential Solutions
1. Use a Different FileIO Implementation
Configure Iceberg to use its native `GCSFileIO` implementation, which talks to Google Cloud Storage directly and bypasses the Hadoop FileSystem layer (and therefore `LokiGCSInputStream`). Try setting the following configuration:
```python
spark.conf.set("spark.sql.catalog.your_catalog_name.io-impl", "org.apache.iceberg.gcp.gcs.GCSFileIO")
```
2. Update Catalog Configuration
Ensure your catalog is properly configured with the correct GCS credentials and implementation:
```python
# Configure the Iceberg catalog
spark.conf.set("spark.sql.catalog.your_catalog_name", "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set("spark.sql.catalog.your_catalog_name.type", "hadoop")
spark.conf.set("spark.sql.catalog.your_catalog_name.warehouse", "gs://your-bucket/path")
spark.conf.set("spark.sql.catalog.your_catalog_name.io-impl", "org.apache.iceberg.gcp.gcs.GCSFileIO")
```
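With the catalog registered, you can then read the table through that catalog rather than the default one (the catalog, namespace, and table names below are placeholders):
```python
# Query the Iceberg table through the explicitly configured catalog
# ("your_catalog_name", "your_namespace", and "your_table" are placeholders)
df = spark.sql("SELECT * FROM your_catalog_name.your_namespace.your_table")
df.show()
```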
3. Check Iceberg Version Compatibility
The issue might be related to compatibility between Iceberg 1.5.1 and Databricks Runtime 16.3. Try using a different Iceberg version that's known to work with Databricks, such as 1.4.2:
```python
# Include in your Spark configuration
spark.conf.set("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.2,org.apache.iceberg:iceberg-gcp-bundle:1.4.2")
```
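Note that `spark.jars.packages` generally has to be supplied before the Spark session starts (for example via the cluster's Spark config or library settings); calling `spark.conf.set()` on an already-running session usually has no effect. A minimal sketch of setting it at session creation, assuming you control session startup (the app name is just an example):
```python
from pyspark.sql import SparkSession

# Attach the Iceberg Spark runtime and GCP bundle before the session is created
# (on Databricks this is more commonly done via cluster libraries or Spark config)
spark = (
    SparkSession.builder
    .appName("iceberg-gcs-read")  # example name
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.2,"
        "org.apache.iceberg:iceberg-gcp-bundle:1.4.2",
    )
    .getOrCreate()
)
```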
4. Use Absolute Paths
Iceberg requires absolute paths to locate metadata files and data files. Make sure you're using the full GCS path:
```python
# Load the table by its full GCS path instead of a catalog table name
df = spark.read.format("iceberg").load("gs://your-bucket/path/to/table")
```
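If you want to keep using SQL, one option is to load by path and register a temporary view first (the view name is just an example):
```python
# Load the Iceberg table by its absolute GCS path, then expose it to Spark SQL
df = spark.read.format("iceberg").load("gs://your-bucket/path/to/table")
df.createOrReplaceTempView("my_iceberg_table")  # example view name
spark.sql("SELECT * FROM my_iceberg_table").show()
```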
5. Consider Using Unity Catalog
If possible, consider using Databricks Unity Catalog with UniForm (universal format) enabled, which generates Iceberg metadata alongside Delta and provides better integration:
```sql
CREATE TABLE T (c1 INT) TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```
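If the table already exists as a Delta table, the same properties can typically be applied with `ALTER TABLE` (the three-level name below is a placeholder; depending on your runtime, Databricks' UniForm documentation may also require an additional upgrade/REORG step for existing data):
```python
# Enable Iceberg UniForm on an existing Delta table (table name is a placeholder)
spark.sql("""
    ALTER TABLE your_catalog.your_schema.your_table SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```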
This is a known issue with Iceberg and file system implementations that don't support byte-buffer reads. The failure occurs while reading Parquet file footers, which Iceberg uses to build its metadata model.
2 weeks ago
A similar issue exists for Azure as well: https://github.com/apache/iceberg/issues/10808#issuecomment-2263673628
Can this be fixed at the Databricks level?
2 weeks ago
I tried the given solutions, but the issue still persists. I'd appreciate it if Databricks could resolve this soon for better integration between GCP and Databricks.

