Databricks Community

naga93 · 3 weeks ago

Hello,

I am currently writing a Delta Lake table from Databricks to Unity Catalog using PySpark 3.5.0 (15.4 LTS Databricks runtime). We want the EXTERNAL Delta Lake tables to be readable from both UC and Dremio. Our Dremio build version is 25.0.6.

The source is an SFTP server from which we read CSV files. The column names contain spaces and sometimes special characters (. or ()). To handle this ingestion in Databricks and to make the table work in Dremio, I’ve added the following settings:
.option("mergeSchema", "true") \
.option("delta.enableDeletionVectors", "false") \
.option("delta.minReaderVersion", "2") \
.option("delta.minWriterVersion", "5") \
.option("delta.columnMapping.mode", "name") \

I set minReaderVersion to 2 because the Dremio documentation (Delta Lake | Dremio Documentation) says: “Only Delta Lake tables with minReaderVersion 1 or 2 can be read. Column Mapping is supported with minReaderVersion 2”. I had to disable enableDeletionVectors because, if left enabled, it automatically raises the readerVersion to 3 and the writerVersion to 7. More on that here: How does Databricks manage Delta Lake feature compatibility? | Databricks Documentation

Right now the write works fine: the data is in S3, and the UC Delta table has column names exactly as they appear in the CSV files. I checked the 00000.json file in the _delta_log folder for this table, and it contains:
{"protocol":{"minReaderVersion":2,"minWriterVersion":5}}

So far so good, but when I go to the table in Dremio and try to format it, I see identifiers like “col-63f13242-9896-4ab3-bb22-e0c4a34689ff” in place of the actual column names. If I format it anyway and run a select * query on that table, I get errors like:
IOException: Attempted to open range reader for invalid range. Requested Range: [4…60). Valid Ranges:

In the Details tab I can see the correct column names, but once the table is formatted as Delta Lake I either see the IDs instead of the actual column names or cannot query it at all.

How can I fix this?

Thank you,
Naga

Brahmareddy · a week ago

Hi naga93,

How are you doing today? As per my understanding, you’ve done a great job navigating the tricky parts of the Delta + Unity Catalog + Dremio integration. You're right to set minReaderVersion to 2 and to disable deletion vectors to keep the table Dremio-compatible. The issue you're seeing, where Dremio shows column IDs like col-xxxx instead of the actual names, is related to column mapping mode. You're correctly setting it to "name" in Databricks, but as far as I can tell Dremio doesn’t yet fully support Delta tables that use column mapping, even when the reader version is compliant. The Delta protocol allows it, but Dremio often seems to expect no column mapping (i.e., the default mode) in order to resolve and display proper column names.
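If it helps to see where those col-xxxx names come from: with column mapping enabled, the Delta log records a physical name for each column in the field metadata of the schema (the delta.columnMapping.physicalName key), and a reader that ignores that mapping ends up showing the physical names. Here is a minimal Python sketch for inspecting a commit file. The function name, the logical column name and the sample commit contents are made up for illustration (only the col-... ID is taken from your post), so treat it as a way to poke at your own _delta_log file rather than anything official:

```python
import json

def physical_names(commit_text):
    """Return {logical_name: physical_name} from the metaData action of a Delta commit file.

    A commit file is newline-delimited JSON, one action per line.
    """
    for line in commit_text.splitlines():
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        meta = action.get("metaData")
        if meta is None:
            continue
        # schemaString is itself a JSON document describing the table schema.
        schema = json.loads(meta["schemaString"])
        return {
            field["name"]: field.get("metadata", {}).get(
                "delta.columnMapping.physicalName", field["name"]
            )
            for field in schema["fields"]
        }
    return {}

# Hypothetical sample commit: the logical column name is invented for this example.
sample_schema = {
    "type": "struct",
    "fields": [
        {
            "name": "Order Date (UTC)",
            "type": "string",
            "nullable": True,
            "metadata": {
                "delta.columnMapping.id": 1,
                "delta.columnMapping.physicalName": "col-63f13242-9896-4ab3-bb22-e0c4a34689ff",
            },
        }
    ],
}
sample_commit = "\n".join([
    json.dumps({"protocol": {"minReaderVersion": 2, "minWriterVersion": 5}}),
    json.dumps({"metaData": {
        "schemaString": json.dumps(sample_schema),
        "configuration": {"delta.columnMapping.mode": "name"},
    }}),
])

print(physical_names(sample_commit))
```

If the physical names in your own log match what Dremio displays, that would support the idea that Dremio is reading the Parquet column names directly instead of applying the mapping.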

Right now, the safest workaround (if you want full compatibility with Dremio) is to avoid column mapping altogether, which means dropping the .option("delta.columnMapping.mode", "name"). However, if your column names include special characters or spaces, that becomes tricky without name mapping. One middle-ground option is to clean or normalize the column names during ingestion, replacing spaces and special characters with underscores. That way you avoid name-based column mapping and the table still works cleanly in both Databricks and Dremio.
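A rough sketch of what such a normalization could look like in plain Python is below. The function names and the exact rules (lowercasing, collapsing runs of special characters into one underscore, numbering duplicates) are my own assumptions, so adjust them to whatever naming convention you prefer:

```python
import re

def normalize_column_name(name):
    """Replace anything that is not a letter, digit or underscore with '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())
    # Collapse repeated underscores and trim them from both ends.
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned.lower() or "col"

def normalize_columns(names):
    """Normalize a list of column names, adding numeric suffixes to avoid duplicates."""
    used = set()
    result = []
    for name in names:
        base = normalize_column_name(name)
        candidate = base
        suffix = 2
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result

print(normalize_columns(["Order Date (UTC)", "Unit.Price", "unit price"]))
```

On a PySpark DataFrame you could then apply it with something like df.toDF(*normalize_columns(df.columns)) before writing the table, and leave out the column mapping option.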

Long-term, Dremio may improve support for Delta Lake features like column mapping, but for now compatibility is best when you keep things simple: no column mapping, deletion vectors off, and readerVersion at 1 or 2. Let me know if you want help writing a quick normalization function for column names or reprocessing without column mapping!
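To make that checklist concrete, here is a small sketch that flags the three conditions above given a table's protocol and properties. The function name and the way the inputs are passed in are assumptions of mine, and the rules only encode the checklist from this reply, not an official Dremio compatibility matrix:

```python
def dremio_compat_issues(protocol, table_properties):
    """Return a list of human-readable issues based on the checklist above."""
    issues = []
    if protocol.get("minReaderVersion", 1) > 2:
        issues.append("minReaderVersion is above 2")
    if table_properties.get("delta.columnMapping.mode", "none") != "none":
        issues.append("column mapping is enabled")
    if str(table_properties.get("delta.enableDeletionVectors", "false")).lower() == "true":
        issues.append("deletion vectors are enabled")
    return issues

# Protocol values taken from the commit file quoted in the question.
protocol = {"minReaderVersion": 2, "minWriterVersion": 5}
properties = {
    "delta.columnMapping.mode": "name",
    "delta.enableDeletionVectors": "false",
}
print(dremio_compat_issues(protocol, properties))
```

With the settings from your post, the only item it flags is column mapping, which matches what you are seeing.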

Regards,

Brahma