<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic DLT Autoloader schemaHints from JSON file instead of inline list? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-autoloader-schemahints-from-json-file-instead-of-inline-list/m-p/140986#M51597</link>
    <description>&lt;P class=""&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/107959"&gt;@Witold&lt;/a&gt;,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/25346"&gt;@Hubert-Dudek&lt;/a&gt;,&lt;/P&gt;&lt;P class=""&gt;I’m using a DLT pipeline to ingest realtime data from Parquet files in S3 into Delta tables using Auto Loader. The pipeline is written in SQL notebooks.&lt;/P&gt;&lt;P class=""&gt;Problem:&lt;BR /&gt;Sometimes decimal columns in the Parquet files get inferred as INT, which breaks my downstream logic. To control this I’m using schemaHints, and it works if I pass the column definitions inline.&lt;/P&gt;&lt;P class=""&gt;Working example:&lt;/P&gt;&lt;P class=""&gt;select *&lt;BR /&gt;from stream cloud_files(&lt;BR /&gt;'s3://my-bucket/path',&lt;BR /&gt;'parquet',&lt;BR /&gt;map('cloudFiles.schemaHints', 'id INT, sal DECIMAL(10,2)')&lt;BR /&gt;);&lt;/P&gt;&lt;P class=""&gt;However, I don’t want to hardcode the schema in the SQL. I tried to keep the schema in a JSON file and pass the path instead, something like:&lt;/P&gt;&lt;P class=""&gt;select *&lt;BR /&gt;from stream cloud_files(&lt;BR /&gt;'s3://my-bucket/path',&lt;BR /&gt;'parquet',&lt;BR /&gt;map('cloudFiles.schemaHints', 'dbfs:/schemas/my_table_schema.json')&lt;BR /&gt;);&lt;/P&gt;&lt;P class=""&gt;This does NOT work – Auto Loader treats the value as a literal “id INT, sal DECIMAL…” style string, not as a path.&lt;/P&gt;&lt;P class=""&gt;Questions:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;Is this the expected behaviour (schemaHints only accepts an inline string and not a file path)?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Is there any supported way in DLT SQL to:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;load a schema definition from a JSON file, and&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;feed it into cloudFiles / schemaHints (or another option) without hardcoding the full “col dtype, col2 dtype…” string in the SQL?&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Goal:&lt;BR /&gt;I want a single JSON schema file per source, and have multiple DLT SQL pipelines reuse it, while still preventing decimal columns from being inferred as INT.&lt;/P&gt;&lt;P class=""&gt;Any suggestions or patterns (e.g., using Python to read the JSON and set pipeline configuration, schema evolution tricks, or alternative options in Auto Loader) would be really helpful.&lt;/P&gt;</description>
    <pubDate>Wed, 03 Dec 2025 09:40:20 GMT</pubDate>
    <dc:creator>Rohit_hk</dc:creator>
    <dc:date>2025-12-03T09:40:20Z</dc:date>
    <item>
      <title>DLT Autoloader schemaHints from JSON file instead of inline list?</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-autoloader-schemahints-from-json-file-instead-of-inline-list/m-p/140986#M51597</link>
      <description>&lt;P class=""&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/107959"&gt;@Witold&lt;/a&gt;,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/25346"&gt;@Hubert-Dudek&lt;/a&gt;,&lt;/P&gt;&lt;P class=""&gt;I’m using a DLT pipeline to ingest realtime data from Parquet files in S3 into Delta tables using Auto Loader. The pipeline is written in SQL notebooks.&lt;/P&gt;&lt;P class=""&gt;Problem:&lt;BR /&gt;Sometimes decimal columns in the Parquet files get inferred as INT, which breaks my downstream logic. To control this I’m using schemaHints, and it works if I pass the column definitions inline.&lt;/P&gt;&lt;P class=""&gt;Working example:&lt;/P&gt;&lt;P class=""&gt;select *&lt;BR /&gt;from stream cloud_files(&lt;BR /&gt;'s3://my-bucket/path',&lt;BR /&gt;'parquet',&lt;BR /&gt;map('cloudFiles.schemaHints', 'id INT, sal DECIMAL(10,2)')&lt;BR /&gt;);&lt;/P&gt;&lt;P class=""&gt;However, I don’t want to hardcode the schema in the SQL. I tried to keep the schema in a JSON file and pass the path instead, something like:&lt;/P&gt;&lt;P class=""&gt;select *&lt;BR /&gt;from stream cloud_files(&lt;BR /&gt;'s3://my-bucket/path',&lt;BR /&gt;'parquet',&lt;BR /&gt;map('cloudFiles.schemaHints', 'dbfs:/schemas/my_table_schema.json')&lt;BR /&gt;);&lt;/P&gt;&lt;P class=""&gt;This does NOT work – Auto Loader treats the value as a literal “id INT, sal DECIMAL…” style string, not as a path.&lt;/P&gt;&lt;P class=""&gt;Questions:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;Is this the expected behaviour (schemaHints only accepts an inline string and not a file path)?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Is there any supported way in DLT SQL to:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;load a schema definition from a JSON file, and&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;feed it into cloudFiles / schemaHints (or another option) without hardcoding the full “col dtype, col2 dtype…” string in the SQL?&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Goal:&lt;BR /&gt;I want a single JSON schema file per source, and have multiple DLT SQL pipelines reuse it, while still preventing decimal columns from being inferred as INT.&lt;/P&gt;&lt;P class=""&gt;Any suggestions or patterns (e.g., using Python to read the JSON and set pipeline configuration, schema evolution tricks, or alternative options in Auto Loader) would be really helpful.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 09:40:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-autoloader-schemahints-from-json-file-instead-of-inline-list/m-p/140986#M51597</guid>
      <dc:creator>Rohit_hk</dc:creator>
      <dc:date>2025-12-03T09:40:20Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Autoloader schemaHints from JSON file instead of inline list?</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-autoloader-schemahints-from-json-file-instead-of-inline-list/m-p/141022#M51605</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/199609"&gt;@Rohit_hk&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Below are the answers to your questions:&lt;/P&gt;
&lt;UL class=""&gt;
&lt;LI&gt;
&lt;P class=""&gt;Is this the expected behaviour (schemaHints only accepts an inline string and not a file path)?&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Ans:&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Yes, &lt;STRONG&gt;cloudFiles.schemaHints&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;is defined as a plain String option, and we can only pass&amp;nbsp; DDL strings, not paths. There is no support for interpreting its value as a path to a JSON file .&lt;STRONG&gt;Doc&lt;/STRONG&gt;&amp;nbsp;:&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL class=""&gt;
&lt;LI&gt;
&lt;P class=""&gt;Is there any supported way in DLT SQL to:&lt;/P&gt;
&lt;UL class=""&gt;
&lt;LI&gt;
&lt;P class=""&gt;load a schema definition from a JSON file, and&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class=""&gt;feed it into cloudFiles / schemaHints (or another option) without hardcoding the full “col dtype, col2 dtype…” string in the SQL?&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Ans:&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Not directly. In a pure SQL DLT (Lakeflow Declarative Pipelines) notebook:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;You cannot read an arbitrary JSON file from DBFS/S3 and inject its contents into cloud_files or schemaHints at runtime.&lt;/LI&gt;
&lt;LI&gt;The options map in cloud_files/read_files must be literals or parameter expansions, not the result of reading a file.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2062" data-end="2219"&gt;However, DLT / Lakeflow pipelines support &lt;STRONG data-start="2095" data-end="2109"&gt;parameters&lt;/STRONG&gt; whose values are injected into your SQL as strings &lt;STRONG&gt;Doc&lt;/STRONG&gt;:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ldp/parameters" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/ldp/parameters&lt;/A&gt;&lt;/P&gt;
&lt;P data-start="2221" data-end="2229"&gt;You can:&lt;/P&gt;
&lt;OL data-start="2231" data-end="2285"&gt;
&lt;LI data-start="2231" data-end="2285"&gt;
&lt;P data-start="2234" data-end="2285"&gt;Put this in your&amp;nbsp;&lt;EM data-start="2250" data-end="2274"&gt;pipeline configuration&lt;/EM&gt; (UI/JSON):&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary"&gt;
&lt;DIV class="overflow-y-auto p-4" dir="ltr"&gt;&lt;CODE class="whitespace-pre! language-json"&gt;&lt;SPAN&gt;&lt;SPAN class="hljs-attr"&gt;"configuration"&lt;/SPAN&gt;&lt;SPAN class="hljs-punctuation"&gt;:&lt;/SPAN&gt; &lt;SPAN class="hljs-punctuation"&gt;{&lt;/SPAN&gt;
  &lt;SPAN class="hljs-attr"&gt;"myschema.my_table_hints"&lt;/SPAN&gt;&lt;SPAN class="hljs-punctuation"&gt;:&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;"id INT, sal DECIMAL(10,2), other_col STRING"&lt;/SPAN&gt;
&lt;SPAN class="hljs-punctuation"&gt;}&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;OL start="2" data-start="2396" data-end="2433"&gt;
&lt;LI data-start="2396" data-end="2433"&gt;
&lt;P data-start="2399" data-end="2433"&gt;Reference it in your SQL notebook:&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary"&gt;
&lt;DIV class="sticky top-9"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="overflow-y-auto p-4" dir="ltr"&gt;&lt;LI-CODE lang="markup"&gt;CREATE OR REFRESH STREAMING LIVE TABLE bronze_my_table AS
SELECT *
FROM STREAM cloud_files(
  's3://my-bucket/path',
  'parquet',
  map(
    'cloudFiles.schemaHints', '${myschema.my_table_hints}',
    'cloudFiles.schemaLocation', 'dbfs:/schemas/autoloader/my_table'
  )
);&lt;/LI-CODE&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 14:07:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-autoloader-schemahints-from-json-file-instead-of-inline-list/m-p/141022#M51605</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-12-03T14:07:40Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Autoloader schemaHints from JSON file instead of inline list?</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-autoloader-schemahints-from-json-file-instead-of-inline-list/m-p/141025#M51607</link>
      <description>&lt;P&gt;- dlt use automatically&amp;nbsp;&lt;CODE&gt;cloudFiles.schemaLocation&lt;/CODE&gt;&amp;nbsp;So the schema is stored automatically, and in many cases, it will be stable, but it does not&lt;/P&gt;
&lt;P&gt;- keep using&amp;nbsp;cloudFiles.schemaHints, but just load JSON to a variable and pass that variable (I guess you will need some format conversion from JSON to SQL ddl, but it can be achieved with a simple python script)&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 14:41:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-autoloader-schemahints-from-json-file-instead-of-inline-list/m-p/141025#M51607</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2025-12-03T14:41:19Z</dc:date>
    </item>
  </channel>
</rss>

