<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to migrate the data from Postgres to Databricks? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-migrate-the-data-from-postgres-to-databricks/m-p/103232#M41369</link>
    <description>&lt;P&gt;&lt;STRONG&gt;Hello Community,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I have a question about migrating data from PostgreSQL to Databricks. My PostgreSQL database receives new data every hour, and I want to synchronize these hourly inserts with the bronze layer in my Databricks catalog.&lt;/P&gt;&lt;P&gt;Currently, I run a scheduled workflow that uses JDBC to sync the data from PostgreSQL to Databricks. However, each hourly batch contains around 10 million records, which makes this process challenging. Is there a simpler or more efficient way to achieve this synchronization?&lt;/P&gt;</description>
    <pubDate>Thu, 26 Dec 2024 12:41:26 GMT</pubDate>
    <dc:creator>jeremy98</dc:creator>
    <dc:date>2024-12-26T12:41:26Z</dc:date>
    <item>
      <title>How to migrate the data from Postgres to Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-migrate-the-data-from-postgres-to-databricks/m-p/103232#M41369</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Hello Community,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I have a question about migrating data from PostgreSQL to Databricks. My PostgreSQL database receives new data every hour, and I want to synchronize these hourly inserts with the bronze layer in my Databricks catalog.&lt;/P&gt;&lt;P&gt;Currently, I run a scheduled workflow that uses JDBC to sync the data from PostgreSQL to Databricks. However, each hourly batch contains around 10 million records, which makes this process challenging. Is there a simpler or more efficient way to achieve this synchronization?&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 12:41:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-migrate-the-data-from-postgres-to-databricks/m-p/103232#M41369</guid>
      <dc:creator>jeremy98</dc:creator>
      <dc:date>2024-12-26T12:41:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to migrate the data from Postgres to Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-migrate-the-data-from-postgres-to-databricks/m-p/103239#M41371</link>
      <description>&lt;P&gt;You can either keep using your current JDBC connection or switch to the Databricks connector designed for PostgreSQL:&amp;nbsp;&lt;A href="https://docs.databricks.com/en/connect/external-systems/postgresql.html#query-postgresql-with-databricks" target="_blank"&gt;https://docs.databricks.com/en/connect/external-systems/postgresql.html#query-postgresql-with-databricks&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 14:30:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-migrate-the-data-from-postgres-to-databricks/m-p/103239#M41371</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-26T14:30:50Z</dc:date>
    </item>
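    <!--
    Editor's sketch for the JDBC approach discussed above. It is not part of any reply
    and is not an official Databricks recipe. It only builds the options for a
    partitioned, incremental Spark JDBC read, so that each hourly run pulls one time
    window and splits it across parallel connections. The table name public.events,
    the columns created_at and id, the host name, and the bounds are all hypothetical
    placeholders; credentials are omitted on purpose and would normally come from a
    secret store.

```python
from datetime import datetime, timedelta


def build_jdbc_options(jdbc_url, table, watermark_column, window_start,
                       window_end, partition_column, lower_bound,
                       upper_bound, num_partitions=16, fetch_size=10000):
    """Return Spark JDBC reader options for one incremental time window."""
    # Only rows with window_start < watermark <= window_end are selected,
    # so consecutive hourly runs do not overlap.
    subquery = (
        f"(SELECT * FROM {table} "
        f"WHERE {watermark_column} > '{window_start.isoformat()}' "
        f"AND {watermark_column} <= '{window_end.isoformat()}') AS src"
    )
    return {
        "url": jdbc_url,
        "dbtable": subquery,
        # Spark splits the read into numPartitions range queries on this column.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        # Rows fetched per round trip by the JDBC driver.
        "fetchsize": str(fetch_size),
    }


# Hypothetical values, for illustration only.
window_end = datetime(2024, 12, 26, 13, 0, 0)
options = build_jdbc_options(
    jdbc_url="jdbc:postgresql://db.example.internal:5432/appdb",
    table="public.events",
    watermark_column="created_at",
    window_start=window_end - timedelta(hours=1),
    window_end=window_end,
    partition_column="id",
    lower_bound=1,
    upper_bound=10_000_000,
)
```

    In a Databricks notebook the dictionary could be passed to
    spark.read.format("jdbc").options(**options).load() and the result appended to a
    bronze table. Whether 16 partitions is a sensible value depends on how many
    concurrent connections the PostgreSQL server can tolerate.
    -->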
    <item>
      <title>Re: How to migrate the data from Postgres to Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-migrate-the-data-from-postgres-to-databricks/m-p/103255#M41384</link>
      <description>&lt;P class=""&gt;Hello Walter,&lt;BR /&gt;Thank you for your help - you're amazing. I wanted to explain my current challenge in more detail:&lt;BR /&gt;We have a platform that stores data in PostgreSQL, with a pipeline ingesting millions of rows every hour. We're trying to migrate this data to Databricks, but we're encountering concurrency issues. In PostgreSQL the pipeline can write to and update the same table, but in Databricks we get METADATA_CHANGED exceptions. Is there a way to sync this data directly? Our goal is to store the data in blob storage rather than in PostgreSQL.&lt;/P&gt;&lt;P class=""&gt;Looking forward to your guidance.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 19:25:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-migrate-the-data-from-postgres-to-databricks/m-p/103255#M41384</guid>
      <dc:creator>jeremy98</dc:creator>
      <dc:date>2024-12-26T19:25:40Z</dc:date>
    </item>
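    <!--
    Editor's sketch for the METADATA_CHANGED errors described above. The thread does
    not say what causes them, so this is only one possible mitigation: retrying the
    failed write with exponential backoff. The helper names run_with_retry and
    is_metadata_conflict are hypothetical, and matching on the error text is an
    assumption based solely on the exception name quoted in the post. The writer
    below is a stand-in that simulates two conflicts before succeeding.

```python
import time


def run_with_retry(write_batch, is_retryable, max_attempts=5,
                   base_delay_s=1.0, sleep=time.sleep):
    """Call write_batch(), retrying retryable failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_batch()
        except Exception as exc:
            # Give up on the last attempt or on errors we do not recognise.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Delays grow as 1x, 2x, 4x ... of base_delay_s.
            sleep(base_delay_s * 2 ** (attempt - 1))


def is_metadata_conflict(exc):
    # Assumption: the conflict can be recognised by the text quoted in the thread.
    return "METADATA_CHANGED" in str(exc)


# Stand-in writer: fails twice with a simulated conflict, then succeeds.
calls = {"count": 0}
delays = []


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("METADATA_CHANGED: simulated conflict")
    return "committed"


# delays.append replaces time.sleep so the demo finishes instantly.
result = run_with_retry(flaky_write, is_metadata_conflict, sleep=delays.append)
```

    In a real job write_batch would wrap the append or merge into the bronze table.
    Retrying does not remove the underlying conflict, so if several jobs keep
    changing the same table at once, scheduling them so they do not overlap may be
    the more durable option.
    -->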
  </channel>
</rss>

