<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Synchronize SQLServer tables to Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103296#M41396</link>
    <description>&lt;P&gt;First off, I'd like to thank the community here for all their detailed timely responses, even during the holidays. You have a great group here.&amp;nbsp; I will look into these avenues and may post some follow-up questions.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 27 Dec 2024 12:40:47 GMT</pubDate>
    <dc:creator>peritus</dc:creator>
    <dc:date>2024-12-27T12:40:47Z</dc:date>
    <item>
      <title>Synchronize SQLServer tables to Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103185#M41356</link>
      <description>&lt;P&gt;I'm new to Databricks and I'm looking to get data from an external database into Databricks and keep it synchronized when changes occur in the source tables. It seems I may be able to use some form of change data capture and Delta Live Tables. I've looked at some documentation and some online tutorials related to CDC and Databricks, but haven't found anything that really shows an example of setting it all up, so I'm looking for someone to point me in the right direction here. My question is basically: how can I configure an external database for synchronization and maintain a replica of that data within Databricks?&lt;/P&gt;</description>
      <pubDate>Wed, 25 Dec 2024 20:20:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103185#M41356</guid>
      <dc:creator>peritus</dc:creator>
      <dc:date>2024-12-25T20:20:37Z</dc:date>
    </item>
    <item>
      <title>Re: Synchronize SQLServer tables to Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103186#M41357</link>
      <description>&lt;P&gt;You can refer to&amp;nbsp;&lt;A href="https://docs.databricks.com/en/connect/index.html" target="_blank"&gt;https://docs.databricks.com/en/connect/index.html&lt;/A&gt;&amp;nbsp;and specifically to a connection by using JDBC:&amp;nbsp;&lt;A href="https://docs.databricks.com/en/connect/external-systems/jdbc.html" target="_blank"&gt;https://docs.databricks.com/en/connect/external-systems/jdbc.html&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Dec 2024 20:24:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103186#M41357</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-25T20:24:15Z</dc:date>
    </item>
    <item>
      <title>Re: Synchronize SQLServer tables to Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103188#M41359</link>
      <description>&lt;H3&gt;Step 1: &lt;STRONG&gt;Set Up a JDBC Connection&lt;/STRONG&gt;&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Install the JDBC Driver&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Ensure you have the JDBC driver for the external database (e.g., MySQL, Postgres, SQL Server).&lt;/LI&gt;&lt;LI&gt;Install the driver on the Databricks cluster using the %pip install command or by uploading the driver file.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Create a JDBC Connection String&lt;/STRONG&gt;&lt;BR /&gt;Use the syntax for your database. For example:&lt;/P&gt;&lt;PRE&gt;jdbc:mysql://&amp;lt;hostname&amp;gt;:&amp;lt;port&amp;gt;/&amp;lt;database&amp;gt;?user=&amp;lt;username&amp;gt;&amp;amp;password=&amp;lt;password&amp;gt;&lt;/PRE&gt;&lt;P&gt;Replace &amp;lt;hostname&amp;gt;, &amp;lt;port&amp;gt;, &amp;lt;database&amp;gt;, &amp;lt;username&amp;gt;, and &amp;lt;password&amp;gt; with the appropriate values.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Read Data from the External Database&lt;/STRONG&gt;&lt;BR /&gt;Use the following Databricks command to read data:&lt;/P&gt;&lt;PRE&gt;jdbc_url = "jdbc:mysql://&amp;lt;hostname&amp;gt;:&amp;lt;port&amp;gt;/&amp;lt;database&amp;gt;"
properties = {
    "user": "&amp;lt;username&amp;gt;",
    "password": "&amp;lt;password&amp;gt;",
    "driver": "com.mysql.cj.jdbc.Driver",
}
df = spark.read.jdbc(url=jdbc_url, table="source_table", properties=properties)
df.display()&lt;/PRE&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;HR /&gt;&lt;H3&gt;Step 2: &lt;STRONG&gt;Enable Change Data Capture (CDC)&lt;/STRONG&gt;&lt;/H3&gt;&lt;P&gt;Most databases have built-in CDC capabilities. Here are some examples:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;MySQL&lt;/STRONG&gt;: Use the binlog with tools like Debezium.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;SQL Server&lt;/STRONG&gt;: Enable CDC for the tables with:&lt;PRE&gt;EXEC sys.sp_cdc_enable_table
    @source_schema = N'schema_name',
    @source_name = N'table_name',
    @role_name = NULL;&lt;/PRE&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Postgres&lt;/STRONG&gt;: Use logical replication slots or tools like Debezium.&lt;/LI&gt;&lt;/UL&gt;&lt;HR /&gt;&lt;H3&gt;Step 3: &lt;STRONG&gt;Stream Data into Databricks&lt;/STRONG&gt;&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Set Up a Streaming Framework&lt;/STRONG&gt;&lt;BR /&gt;Use a tool like Apache Kafka, AWS DMS, or a Databricks partner integration to stream data into Databricks.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Configure Delta Live Tables (DLT)&lt;/STRONG&gt;&lt;BR /&gt;Create a Delta Live Tables pipeline for CDC using the Databricks SQL syntax. Example:&lt;/P&gt;&lt;PRE&gt;CREATE OR REFRESH STREAMING LIVE TABLE cdc_table AS
SELECT * FROM (
  SELECT *, _change_type
  FROM cloud_files("/path/to/cdc/files", "json", map("mergeSchema", "true"))
)
WHERE _change_type IN ('insert', 'update', 'delete');&lt;/PRE&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Merge Updates into a Delta Table&lt;/STRONG&gt;&lt;BR /&gt;Use the MERGE INTO command to handle inserts, updates, and deletes:&lt;/P&gt;&lt;PRE&gt;MERGE INTO target_table t
USING cdc_table c
ON t.id = c.id
WHEN MATCHED AND c._change_type = 'update' THEN UPDATE SET *
WHEN MATCHED AND c._change_type = 'delete' THEN DELETE
WHEN NOT MATCHED THEN INSERT *;&lt;/PRE&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;HR /&gt;&lt;H3&gt;Step 4: &lt;STRONG&gt;Schedule Synchronization&lt;/STRONG&gt;&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;Use Databricks Workflows to schedule your DLT pipelines to run periodically and keep the target data synchronized.&lt;/LI&gt;&lt;/UL&gt;&lt;HR /&gt;&lt;H3&gt;Step 5: &lt;STRONG&gt;Monitor and Optimize&lt;/STRONG&gt;&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;Monitor the pipeline using the Databricks Jobs and DLT dashboards for errors and performance issues.&lt;/LI&gt;&lt;LI&gt;Optimize the process by using partitioning, Z-order indexing, and caching.&lt;/LI&gt;&lt;/UL&gt;&lt;HR /&gt;&lt;H3&gt;Helpful Resources&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;A href="https://docs.databricks.com/en/connect/external-systems/jdbc.html" target="_new" rel="noopener"&gt;Databricks JDBC Documentation&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Delta Live Tables Documentation&lt;/LI&gt;&lt;LI&gt;Change Data Capture (CDC) on Databricks&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This approach allows you to maintain a replica of your external database in Databricks with minimal latency and high accuracy.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 02:32:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103188#M41359</guid>
      <dc:creator>fmadeiro</dc:creator>
      <dc:date>2024-12-26T02:32:46Z</dc:date>
    </item>
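    <!-- The MERGE INTO semantics in the reply above can be mimicked in plain Python (no Spark needed) to show how each _change_type maps to an action; apply_cdc, the target dict, and the sample rows are illustrative only, not part of any Databricks API:

    ```python
    def apply_cdc(target, changes):
        """Apply CDC rows to a target dict keyed by "id".

        Mirrors the MERGE INTO branches in the reply above:
        matched and 'update'  -> replace the row
        matched and 'delete'  -> remove the row
        not matched           -> insert the row
        """
        for row in changes:
            key = row["id"]
            # Strip the CDC metadata column before writing to the target.
            payload = {k: v for k, v in row.items() if k != "_change_type"}
            if key in target:
                if row["_change_type"] == "update":
                    target[key] = payload
                elif row["_change_type"] == "delete":
                    del target[key]
            else:
                # WHEN NOT MATCHED THEN INSERT * (no change-type condition)
                target[key] = payload
        return target
    ```

    Running a toy change feed (update id 1, insert id 2, then delete id 1) leaves only id 2 in the target, which is exactly what the MERGE would do on a Delta table. -->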
    <item>
      <title>Re: Synchronize SQLServer tables to Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103223#M41363</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/137755"&gt;@peritus&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I would suggest using &lt;A href="https://docs.databricks.com/en/query-federation/index.html" target="_self"&gt;Lakehouse Federation&lt;/A&gt;&amp;nbsp;and create a DLT pipeline to read tables as &lt;A href="https://docs.databricks.com/en/views/materialized.html" target="_self"&gt;Materialised Views&lt;/A&gt;. If you trigger a refresh of that pipeline at a scheduled interval, you should have the SQL server data replicated in Databricks.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 09:37:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103223#M41363</guid>
      <dc:creator>RiyazAliM</dc:creator>
      <dc:date>2024-12-26T09:37:29Z</dc:date>
    </item>
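    <!-- The Lakehouse Federation approach above can be sketched in DLT SQL; the foreign catalog sqlserver_cat and the table name customers are placeholders for illustration, not names from the thread:

    ```sql
    /* Assumes a Lakehouse Federation connection already exposes the SQL Server
       database as the foreign catalog sqlserver_cat (names are placeholders).
       Each scheduled refresh of the DLT pipeline re-reads the federated table
       and updates the materialized copy in Databricks. */
    CREATE OR REFRESH MATERIALIZED VIEW customers_replica AS
    SELECT * FROM sqlserver_cat.dbo.customers;
    ```

    Note this is snapshot-based replication: each refresh recomputes from the source, so it is simpler than CDC but better suited to small or moderately sized tables. -->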
    <item>
      <title>Re: Synchronize SQLServer tables to Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103296#M41396</link>
      <description>&lt;P&gt;First off, I'd like to thank the community here for all their detailed timely responses, even during the holidays. You have a great group here.&amp;nbsp; I will look into these avenues and may post some follow-up questions.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2024 12:40:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103296#M41396</guid>
      <dc:creator>peritus</dc:creator>
      <dc:date>2024-12-27T12:40:47Z</dc:date>
    </item>
    <item>
      <title>Re: Synchronize SQLServer tables to Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103301#M41398</link>
      <description>&lt;P&gt;To synchronize data from an external database into Databricks with change data capture (CDC), you can use &lt;STRONG&gt;Delta Live Tables (DLT)&lt;/STRONG&gt;. Start by configuring a JDBC connection to your source database and use a CDC tool (like Debezium or database-native CDC mechanisms) to capture changes. In Databricks, create a DLT pipeline that reads the captured changes (e.g., from a Kafka topic or directly from change tables) and writes them to a Delta table. This ensures your Databricks Delta table stays in sync with the source. Check Databricks' documentation on DLT and CDC for setup steps and examples.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Dec 2024 13:48:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/synchronize-sqlserver-tables-to-databricks/m-p/103301#M41398</guid>
      <dc:creator>john533</dc:creator>
      <dc:date>2024-12-27T13:48:06Z</dc:date>
    </item>
  </channel>
</rss>

