<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Setting up my first DLT Pipeline with 3rd party JSON data in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14392#M8886</link>
    <description>&lt;P&gt;Hi, could you please confirm whether you have also upgraded the Delta table as mentioned?&lt;/P&gt;</description>
    <pubDate>Mon, 02 Jan 2023 18:54:43 GMT</pubDate>
    <dc:creator>Debayan</dc:creator>
    <dc:date>2023-01-02T18:54:43Z</dc:date>
    <item>
      <title>Setting up my first DLT Pipeline with 3rd party JSON data</title>
      <link>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14391#M8885</link>
      <description>&lt;P&gt;I'm getting an error when I try to create a DLT Pipeline from a bunch of third-party app-usage data we have. Here's the error message:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. &lt;/P&gt;&lt;P&gt;Please upgrade your Delta table to reader version 2 and writer version 5&lt;/P&gt;&lt;P&gt; and change the column mapping mode to 'name' mapping. You can use the following command:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; ALTER TABLE &amp;lt;table_name&amp;gt; SET TBLPROPERTIES (&lt;/P&gt;&lt;P&gt;   'delta.columnMapping.mode' = 'name',&lt;/P&gt;&lt;P&gt;   'delta.minReaderVersion' = '2',&lt;/P&gt;&lt;P&gt;   'delta.minWriterVersion' = '5')&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So, I added the properties to my table definition, and I'm still getting the error. What am I doing wrong? Here's the table definition:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;CREATE STREAMING LIVE TABLE clevertap_analytics_bronze&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;COMMENT "App usage data from CleverTap"&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;TBLPROPERTIES ("myCustomPipeline.quality" = "bronze",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;"delta.columnMapping.mode" = "name",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;"delta.minReaderVersion" = "2",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;"delta.minWriterVersion" = "5"&lt;/P&gt;&lt;P&gt;)&amp;nbsp;&lt;/P&gt;&lt;P&gt;AS&lt;/P&gt;&lt;P&gt;SELECT&lt;/P&gt;&lt;P&gt;&amp;nbsp;*&lt;/P&gt;&lt;P&gt;FROM&lt;/P&gt;&lt;P&gt;&amp;nbsp;cloud_files(&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;-- REPLACE THE BELOW LINE WITH THE EXACT S3 LOCATION WHERE YOUR DATA LIVES&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;"s3://clevertap-analytics/",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;"json",&lt;/P&gt;&lt;P&gt;&amp;nbsp;-- CHANGE THE FOLLOWING TO "false" IF THE CSV FILE(s) DO NOT INCLUDE A HEADER&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;map(&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;"header", "true",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;"cloudFiles.inferColumnTypes", "true",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;"cloudFiles.schemaEvolutionMode", "rescue",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;"rescuedDataColumn", "rescue_col"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;)&lt;/P&gt;&lt;P&gt;&amp;nbsp;);&lt;/P&gt;</description>
      <pubDate>Tue, 27 Dec 2022 16:36:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14391#M8885</guid>
      <dc:creator>thains</dc:creator>
      <dc:date>2022-12-27T16:36:49Z</dc:date>
    </item>
    <item>
      <title>Re: Setting up my first DLT Pipeline with 3rd party JSON data</title>
      <link>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14393#M8887</link>
      <description>&lt;P&gt;I added that version to my table definition, yes. Did I do it right? My table definition is in the OP.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jan 2023 15:08:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14393#M8887</guid>
      <dc:creator>thains</dc:creator>
      <dc:date>2023-01-03T15:08:54Z</dc:date>
    </item>
    <item>
      <title>Re: Setting up my first DLT Pipeline with 3rd party JSON data</title>
      <link>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14394#M8888</link>
      <description>&lt;P&gt;You might need to do a full refresh if these changes do not work.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2023 00:13:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14394#M8888</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2023-01-31T00:13:55Z</dc:date>
    </item>
    <item>
      <title>Re: Setting up my first DLT Pipeline with 3rd party JSON data</title>
      <link>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14395#M8889</link>
      <description>&lt;P&gt;It appears the problem is that the JSON files have keys with spaces in the names, like this:&lt;/P&gt;&lt;P&gt;"CT App Version":"3.5.6.6"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I've checked, and that is supposedly a valid JSON key, even though it's not standard. Unfortunately, these files are generated by a third party, so I don't have a lot of control over the content. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It looks like there might be a solution if I use Python for the Auto Loader, as I think I need to do something like this:&lt;/P&gt;&lt;P&gt;select([col(c).alias(c.replace(" ", "_")) for c in dlt.readStream("vw_raw").columns])&lt;/P&gt;&lt;P&gt;(from &lt;A href="https://community.databricks.com/s/question/0D58Y000092eaqcSAA/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live-into-a-streaming-table?t=1675275633543" target="test_blank"&gt;https://community.databricks.com/s/question/0D58Y000092eaqcSAA/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live-into-a-streaming-table?t=1675275633543&lt;/A&gt;) &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, I am a DB guy, not a Python guy. Is there something equivalent available for the SQL version of the loader? &lt;/P&gt;</description>
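      <!--
The renaming idea in the post above can be sketched in plain Python, independent of Spark.
This is only an illustration under stated assumptions: the helper names
sanitize_column_name and build_rename_map are hypothetical, and the character set is
copied from the error message quoted earlier in this thread. The post's own one-liner
only replaces spaces; this sketch covers the other rejected characters as well.

```python
import re

# Characters listed in the error message quoted in this thread: ' ,;{}()\n\t='
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name, replacement="_"):
    """Replace every rejected character with `replacement` (hypothetical helper)."""
    return INVALID_CHARS.sub(replacement, name)


def build_rename_map(columns):
    """Map each original column name to its sanitized form.

    Raises ValueError if two different originals would collapse to the
    same sanitized name, since that would silently merge columns.
    """
    mapping = {}
    seen = {}  # sanitized name -> original name that produced it
    for original in columns:
        clean = sanitize_column_name(original)
        if clean in seen:
            raise ValueError(
                f"{original!r} and {seen[clean]!r} both map to {clean!r}"
            )
        seen[clean] = original
        mapping[original] = clean
    return mapping
```

In a Python DLT notebook, a mapping like this could presumably feed the
col(c).alias(...) pattern quoted in the post; that wiring is an assumption and is not shown here.
      -->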
      <pubDate>Fri, 03 Feb 2023 14:20:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14395#M8889</guid>
      <dc:creator>thains</dc:creator>
      <dc:date>2023-02-03T14:20:59Z</dc:date>
    </item>
    <item>
      <title>Re: Setting up my first DLT Pipeline with 3rd party JSON data</title>
      <link>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14396#M8890</link>
      <description>&lt;P&gt;That did not help, sadly. However, I think I've identified the actual issue... See my comment from Feb 3rd.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2023 22:25:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14396#M8890</guid>
      <dc:creator>thains</dc:creator>
      <dc:date>2023-02-07T22:25:55Z</dc:date>
    </item>
    <item>
      <title>Re: Setting up my first DLT Pipeline with 3rd party JSON data</title>
      <link>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14397#M8891</link>
      <description>&lt;P&gt;I found this other forum thread that looks potentially useful, but I can’t figure out how to translate it to SQL to handle JSON, or how to get the pipeline I’m working with to interpret the Python. When I switch to Python, it complains about the line it inserts telling it that the script is Python!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/s/question/0D58Y000092eaqcSAA/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live-into-a-streaming-table" target="test_blank"&gt;https://community.databricks.com/s/question/0D58Y000092eaqcSAA/ingest-a-csv-file-with-spaces-in-column-names-using-delta-live-into-a-streaming-table&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Still looking for ideas!&lt;/P&gt;</description>
      <pubDate>Mon, 13 Feb 2023 21:14:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14397#M8891</guid>
      <dc:creator>thains</dc:creator>
      <dc:date>2023-02-13T21:14:53Z</dc:date>
    </item>
    <item>
      <title>Re: Setting up my first DLT Pipeline with 3rd party JSON data</title>
      <link>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14392#M8886</link>
      <description>&lt;P&gt;Hi, could you please confirm whether you have also upgraded the Delta table as mentioned?&lt;/P&gt;</description>
      <pubDate>Mon, 02 Jan 2023 18:54:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/setting-up-my-first-dlt-pipeline-with-3rd-party-json-data/m-p/14392#M8886</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-01-02T18:54:43Z</dc:date>
    </item>
  </channel>
</rss>

