DLT Autoloader schemaHints from JSON file instead of inline list?

Rohit_hk — Wed, 03 Dec 2025 09:40:20 GMT

I’m using a DLT pipeline to ingest realtime data from Parquet files in S3 into Delta tables using Auto Loader. The pipeline is written in SQL notebooks.

Problem:
Sometimes decimal columns in the Parquet files get inferred as INT, which breaks my downstream logic. To control this I’m using schemaHints, and it works if I pass the column definitions inline.

Working example:

select *
from stream cloud_files(
's3://my-bucket/path',
'parquet',
map('cloudFiles.schemaHints', 'id INT, sal DECIMAL(10,2)')
);

However, I don’t want to hardcode the schema in the SQL. I tried to keep the schema in a JSON file and pass the path instead, something like:

select *
from stream cloud_files(
's3://my-bucket/path',
'parquet',
map('cloudFiles.schemaHints', 'dbfs:/schemas/my_table_schema.json')
);

This does NOT work – Auto Loader treats the value as a literal “id INT, sal DECIMAL…” style string, not as a path.

Questions:

Is this the expected behaviour (schemaHints only accepts an inline string and not a file path)?
Is there any supported way in DLT SQL to:
- load a schema definition from a JSON file, and
- feed it into cloudFiles / schemaHints (or another option) without hardcoding the full “col dtype, col2 dtype…” string in the SQL?

Goal:
I want a single JSON schema file per source, and have multiple DLT SQL pipelines reuse it, while still preventing decimal columns from being inferred as INT.

Any suggestions or patterns (e.g., using Python to read the JSON and set pipeline configuration, schema evolution tricks, or alternative options in Auto Loader) would be really helpful.

Re: DLT Autoloader schemaHints from JSON file instead of inline list?

K_Anudeep — Wed, 03 Dec 2025 14:07:40 GMT

Hello @Rohit_hk ,

Below are the answers to your questions:

Is this the expected behaviour (schemaHints only accepts an inline string and not a file path)?

Ans:

Yes. cloudFiles.schemaHints is defined as a plain String option, so you can only pass a DDL string, not a path. There is no support for interpreting its value as a path to a JSON file. Doc: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options

Is there any supported way in DLT SQL to:
- load a schema definition from a JSON file, and
- feed it into cloudFiles / schemaHints (or another option) without hardcoding the full “col dtype, col2 dtype…” string in the SQL?

Ans:

Not directly. In a pure SQL DLT (Lakeflow Declarative Pipelines) notebook:

You cannot read an arbitrary JSON file from DBFS/S3 and inject its contents into cloud_files or schemaHints at runtime.
The options map in cloud_files/read_files must be literals or parameter expansions, not the result of reading a file.

However, DLT / Lakeflow pipelines support parameters whose values are injected into your SQL as strings. Doc: https://docs.databricks.com/aws/en/ldp/parameters

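You can:

Put the schema hints DDL string in your pipeline configuration (UI/JSON) as a key-value pair.

Reference that configuration key in your SQL notebook as the value of cloudFiles.schemaHints.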
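A minimal sketch of what these two steps could look like. Python is used here only to assemble and print the two fragments. The key name `mypipeline.schema_hints` is hypothetical, and it is an assumption that the `${...}` reference is expanded inside the `cloud_files` options map, so check the parameters doc linked above before relying on it.

```python
import json

# Hypothetical configuration key; pick any name that fits your pipeline.
HINTS_KEY = "mypipeline.schema_hints"

# The DDL string that would otherwise be hardcoded in the SQL notebook.
hints_ddl = "id INT, sal DECIMAL(10,2)"

# Fragment for the pipeline settings JSON: "configuration" holds
# string key-value pairs.
pipeline_settings_fragment = {"configuration": {HINTS_KEY: hints_ddl}}

# SQL notebook text that references the key instead of the literal DDL.
# Assumption: the ${...} reference is substituted inside the options map.
sql_text = (
    "select *\n"
    "from stream cloud_files(\n"
    "'s3://my-bucket/path',\n"
    "'parquet',\n"
    "map('cloudFiles.schemaHints', '${" + HINTS_KEY + "}')\n"
    ");"
)

print(json.dumps(pipeline_settings_fragment, indent=2))
print(sql_text)
```

With this layout the DDL string lives in the pipeline settings rather than in the notebook, so several pipelines can carry the same value without editing the SQL.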

Re: DLT Autoloader schemaHints from JSON file instead of inline list?

Hubert-Dudek — Wed, 03 Dec 2025 14:41:19 GMT

- DLT automatically uses cloudFiles.schemaLocation, so the schema is stored automatically, and in many cases it will be stable, but it does not

- Keep using cloudFiles.schemaHints, but load the JSON into a variable and pass that variable (I guess you will need some format conversion from JSON to SQL DDL, but that can be done with a simple Python script)
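The JSON-to-DDL conversion mentioned above could be sketched roughly as follows. The JSON layout (a `columns` list of `name`/`type` pairs) and the function name `schema_json_to_hints` are assumptions made for illustration, since the thread does not show the actual schema file.

```python
import json


def schema_json_to_hints(schema_json: str) -> str:
    """Turn a JSON schema document into a schemaHints-style DDL string.

    Assumed (hypothetical) layout:
    {"columns": [{"name": "id", "type": "INT"}, ...]}
    """
    schema = json.loads(schema_json)
    # One "name TYPE" pair per column, joined with commas.
    return ", ".join(f"{col['name']} {col['type']}" for col in schema["columns"])


# Inline example document; in practice the text would come from the schema file.
example = (
    '{"columns": ['
    '{"name": "id", "type": "INT"}, '
    '{"name": "sal", "type": "DECIMAL(10,2)"}'
    "]}"
)

hints = schema_json_to_hints(example)
print(hints)  # id INT, sal DECIMAL(10,2)
```

The resulting string has the same shape as the inline value in the working example, so it can be used wherever the hardcoded hints string was passed before.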