cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to read Delta Lake table with Spaces/Special Characters in Column Names in Dremio

naga93
New Contributor

Hello,

I am currently writing a Delta Lake table from Databricks to Unity Catalog using PySpark 3.5.0 (15.4 LTS Databricks runtime). We want the EXTERNAL Delta Lake tables to be readable from both UC and Dremio. Our Dremio build version is 25.0.6.

The source is an SFTP server where we read CSV files from and the column names have spaces and sometimes special characters (. or ()) in them. To handle this ingestion in Databricks as well as to make it work in Dremio, I’ve added the following settings:
.option(“mergeSchema”, “true”)
.option(“delta.enableDeletionVectors”, “false”)
.option(“delta.minReaderVersion”, “2”)
.option(“delta.minWriterVersion”, “5”)
.option(“delta.columnMapping.mode”, “name”) \

I’ve mentioned the readerVersion to 2 because it’s mentioned here - Delta Lake | Dremio Documentation that “Only Delta Lake tables with minReaderVersion 1 or 2 can be read. Column Mapping is supported with minReaderVersion 2”. I had to disable enableDeletionVectors as it’s setting the readerVersion to 3 and writerVersion to 7 automatically if left enabled. More on that here - How does Databricks manage Delta Lake feature compatibility? | Databricks Documentation

So right now it works fine and I have the data in S3 and the UC Delta table with the column names exactly like in the CSV files. So I checked the 00000.json file in the delta_log folder for this table and it has:
{“protocol”:{“minReaderVersion”:2,“minWriterVersion”:5}}

So far so good, but when I go to the table in Dremio and try to format it, I see things like “col-63f13242-9896-4ab3-bb22-e0c4a34689ff” in the column names instead of names. So when I format it like that and try to run a select * query on that table, I’m getting errors like:
IOException: Attempted to open range reader for invalid range. Requested Range: [4…60). Valid Ranges: 

If I go to the Details tab, I can see the correct Column Names but not able to query it or see the IDs instead of the actual column names when formatting it with Delta Lake format.

How can I fix this?

Thank you,
Naga

0 REPLIES 0

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now