<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Move large SQL data into Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/move-large-sql-data-into-databricks/m-p/128922#M48374</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176067"&gt;@zychoo&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I would consider a near-real-time pipeline into Databricks Bronze, something like:&lt;/P&gt;&lt;P&gt;- A log-based CDC tool (Qlik / Debezium / HVR) captures changes from SQL Server.&lt;BR /&gt;- The tool serializes sql_variant values to JSON, or to a string plus type metadata.&lt;BR /&gt;- It writes to S3/Blob storage in a Delta-friendly format (JSON or Parquet).&lt;BR /&gt;- Databricks Auto Loader streams the files into Bronze tables.&lt;BR /&gt;- The Silver layer casts values back using the type metadata if needed.&lt;/P&gt;&lt;P&gt;This avoids full snapshots and fragile CDC merge logic in ADF.&lt;/P&gt;</description>
    <pubDate>Tue, 19 Aug 2025 23:32:22 GMT</pubDate>
    <dc:creator>WiliamRosa</dc:creator>
    <dc:date>2025-08-19T23:32:22Z</dc:date>
    <item>
      <title>Move large SQL data into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/move-large-sql-data-into-databricks/m-p/125960#M47589</link>
      <description>&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large on-prem SQL database (~&lt;STRONG&gt;15TB&lt;/STRONG&gt;) that heavily uses the &lt;STRONG&gt;sql_variant&lt;/STRONG&gt; datatype. I would like to move it into a Databricks Bronze layer and keep it synchronized as close to 'live' as possible.&amp;nbsp;&lt;/P&gt;&lt;P&gt;What could be the solution?&amp;nbsp;&lt;BR /&gt;It seems like a very basic scenario for Databricks, but somehow I couldn't find any example or explanation.&lt;/P&gt;&lt;P&gt;I tried two approaches; neither worked:&lt;/P&gt;&lt;P&gt;SQL CDC -&amp;gt; ADF Pipeline -&amp;gt; Blob Storage -&amp;gt; Databricks&lt;BR /&gt;- it seems unnecessarily complex and fragile&lt;BR /&gt;- I couldn't create a Databricks DLT pipeline that would be initialized from a table snapshot and kept updated by CDC exports&lt;/P&gt;&lt;P&gt;Lakeflow Connect&lt;BR /&gt;- does not support sql_variant&lt;BR /&gt;- changing the SQL schema (to eliminate/replace/convert sql_variant) is not an option for many reasons (size, performance, downtime)&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jul 2025 08:25:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/move-large-sql-data-into-databricks/m-p/125960#M47589</guid>
      <dc:creator>zychoo</dc:creator>
      <dc:date>2025-07-22T08:25:23Z</dc:date>
    </item>
    <item>
      <title>Re: Move large SQL data into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/move-large-sql-data-into-databricks/m-p/128922#M48374</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176067"&gt;@zychoo&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I would consider a near-real-time pipeline into Databricks Bronze, something like:&lt;/P&gt;&lt;P&gt;- A log-based CDC tool (Qlik / Debezium / HVR) captures changes from SQL Server.&lt;BR /&gt;- The tool serializes sql_variant values to JSON, or to a string plus type metadata.&lt;BR /&gt;- It writes to S3/Blob storage in a Delta-friendly format (JSON or Parquet).&lt;BR /&gt;- Databricks Auto Loader streams the files into Bronze tables.&lt;BR /&gt;- The Silver layer casts values back using the type metadata if needed.&lt;/P&gt;&lt;P&gt;This avoids full snapshots and fragile CDC merge logic in ADF.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Aug 2025 23:32:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/move-large-sql-data-into-databricks/m-p/128922#M48374</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-08-19T23:32:22Z</dc:date>
    </item>
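The "string plus type metadata" round trip described in the reply can be sketched in plain Python. This is a minimal illustration, not Databricks code: all names (`encode_variant`, `decode_variant`, the type tags) are hypothetical; in practice the CDC tool would emit the tagged value into Bronze, and a Spark job in the Silver layer would perform the cast back.

```python
import json
from datetime import date, datetime
from decimal import Decimal

# Hypothetical type tags for a few sql_variant base types. A real CDC tool
# (Qlik / Debezium / HVR) would emit its own type-metadata column instead.
_ENCODERS = {
    int: ("int", str),
    float: ("float", repr),
    Decimal: ("decimal", str),
    str: ("nvarchar", str),
    date: ("date", date.isoformat),
    datetime: ("datetime2", datetime.isoformat),
}

_DECODERS = {
    "int": int,
    "float": float,
    "decimal": Decimal,
    "nvarchar": str,
    "date": date.fromisoformat,
    "datetime2": datetime.fromisoformat,
}

def encode_variant(value):
    """Serialize a sql_variant-like value to JSON carrying its type tag (Bronze side)."""
    tag, to_str = _ENCODERS[type(value)]
    return json.dumps({"t": tag, "v": to_str(value)})

def decode_variant(payload):
    """Restore the native value from the tagged JSON payload (Silver-side cast)."""
    rec = json.loads(payload)
    return _DECODERS[rec["t"]](rec["v"])
```

In a real pipeline the Silver layer would translate each type tag into the corresponding Spark `CAST` expression instead of Python constructors, but the invariant is the same: the tag written at capture time is what makes a lossless cast-back possible.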
  </channel>
</rss>

