How to read Delta Lake table with Spaces/Special Characters in Column Names in Dremio
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
yesterday
Hello,
I am currently writing a Delta Lake table from Databricks to Unity Catalog using PySpark 3.5.0 (15.4 LTS Databricks runtime). We want the EXTERNAL Delta Lake tables to be readable from both UC and Dremio. Our Dremio build version is 25.0.6.
The source is an SFTP server where we read CSV files from and the column names have spaces and sometimes special characters (. or ()) in them. To handle this ingestion in Databricks as well as to make it work in Dremio, I’ve added the following settings:
.option(“mergeSchema”, “true”)
.option(“delta.enableDeletionVectors”, “false”)
.option(“delta.minReaderVersion”, “2”)
.option(“delta.minWriterVersion”, “5”)
.option(“delta.columnMapping.mode”, “name”) \
I’ve mentioned the readerVersion to 2 because it’s mentioned here - Delta Lake | Dremio Documentation that “Only Delta Lake tables with minReaderVersion 1 or 2 can be read. Column Mapping is supported with minReaderVersion 2”. I had to disable enableDeletionVectors as it’s setting the readerVersion to 3 and writerVersion to 7 automatically if left enabled. More on that here - How does Databricks manage Delta Lake feature compatibility? | Databricks Documentation
So right now it works fine and I have the data in S3 and the UC Delta table with the column names exactly like in the CSV files. So I checked the 00000.json file in the delta_log folder for this table and it has:
{“protocol”:{“minReaderVersion”:2,“minWriterVersion”:5}}
So far so good, but when I go to the table in Dremio and try to format it, I see things like “col-63f13242-9896-4ab3-bb22-e0c4a34689ff” in the column names instead of names. So when I format it like that and try to run a select * query on that table, I’m getting errors like:
IOException: Attempted to open range reader for invalid range. Requested Range: [4…60). Valid Ranges:
If I go to the Details tab, I can see the correct Column Names but not able to query it or see the IDs instead of the actual column names when formatting it with Delta Lake format.
How can I fix this?
Thank you,
Naga
- Labels:
-
Delta Lake

