Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Setting up my first DLT Pipeline with 3rd party JSON data

thains
New Contributor III

I'm getting an error when I try to create a DLT Pipeline from a bunch of third-party app-usage data we have. Here's the error message:

Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
Please upgrade your Delta table to reader version 2 and writer version 5 and change the column mapping mode to 'name' mapping. You can use the following command:

ALTER TABLE <table_name> SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5')

So, I added the properties to my table definition, and I'm still getting the error. What am I doing wrong? Here's the table definition:

CREATE STREAMING LIVE TABLE clevertap_analytics_bronze
 COMMENT "App usage data from CleverTap"
 TBLPROPERTIES ("myCustomPipeline.quality" = "bronze",
  "delta.columnMapping.mode" = "name",
  "delta.minReaderVersion" = "2",
  "delta.minWriterVersion" = "5")
AS
SELECT
 *
FROM
 cloud_files(
  -- REPLACE THE BELOW LINE WITH THE EXACT S3 LOCATION WHERE YOUR DATA LIVES
  "s3://clevertap-analytics/",
  "json",
  -- CHANGE THE FOLLOWING TO "false" IF THE CSV FILE(s) DO NOT INCLUDE A HEADER
  map(
   "header", "true",
   "cloudFiles.inferColumnTypes", "true",
   "cloudFiles.schemaEvolutionMode", "rescue",
   "rescuedDataColumn", "rescue_col"
  )
 );

6 REPLIES

Debayan
Databricks Employee

Hi, Could you please confirm if you have also upgraded the Delta table as mentioned?

thains
New Contributor III

I added that version to my table definition, yes. Did I do it right? My table definition is in the OP.

jose_gonzalez
Databricks Employee

You might need to do a full refresh if these changes do not work.

thains
New Contributor III

That did not help, sadly. However, I think I've identified the actual issue... See my comment from Feb 3rd.

thains
New Contributor III

It appears the problem is that the JSON files have keys with spaces in the names, like this:

"CT App Version":"3.5.6.6"

I've checked and that is a valid JSON key, even if it's unusual. Unfortunately, these files are generated by a third party, so I don't have much control over the content.
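For what it's worth, plain Python's json module accepts those spaced keys without complaint, and the rename suggested in the linked threads is just a key-by-key substitution. A minimal sketch outside any pipeline ("CT App Version" is from the real files; the second key is invented for illustration):

```python
import json

# One record with spaced keys, similar to what CleverTap emits.
raw = '{"CT App Version": "3.5.6.6", "Session Length": 42}'

record = json.loads(raw)  # parses fine: spaces in keys are legal JSON

# The same transform the PySpark snippet applies per column:
# replace spaces with underscores so Delta accepts the names.
cleaned = {key.replace(" ", "_"): value for key, value in record.items()}

print(cleaned)  # {'CT_App_Version': '3.5.6.6', 'Session_Length': 42}
```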

It looks like there might be a solution if I use Python for the Auto Loader, as I think I need to do something like this:

df = dlt.read_stream("vw_raw")
df.select(*[col(c).alias(c.replace(" ", "_")) for c in df.columns])

(from https://community.databricks.com/s/question/0D58Y000092eaqcSAA/ingest-a-csv-file-with-spaces-in-colu...)

However, I am a DB guy, not a Python guy. Is there something equivalent available for the SQL version of the loader?

thains
New Contributor III

I found this other forum thread that looks potentially useful, but I can't figure out either how to translate it to SQL to handle JSON, or how to get the pipeline I'm working with to interpret the Python. When I switch to Python, it complains about the very line it inserts telling it that the script is Python!

https://community.databricks.com/s/question/0D58Y000092eaqcSAA/ingest-a-csv-file-with-spaces-in-colu...

Still looking for ideas!
