<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What are the prerequisites for connecting Confluent Kafka with Databricks? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-are-the-prerequisites-for-connecting-confluent-kafka-with/m-p/144910#M52411</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149095"&gt;@shan-databricks&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Connecting Confluent Kafka with Databricks gives you a "data in motion" to "data at rest" architecture.&lt;BR /&gt;Below are the prerequisites, connection options, and a recommendation for your deliverable.&lt;/P&gt;&lt;P&gt;1. Prerequisites&lt;BR /&gt;Before establishing a connection, ensure the following are in place.&lt;/P&gt;&lt;P&gt;Confluent Cloud/Platform Details:&lt;BR /&gt;Bootstrap Server: The host:port address of your Kafka brokers (e.g., pkc-xxxx.us-east-1.aws.confluent.cloud:9092).&lt;BR /&gt;API Keys: A cluster-level API Key and Secret for authentication (on Confluent Cloud these serve as the SASL/PLAIN username and password over SASL_SSL).&lt;BR /&gt;Schema Registry (Optional): If using Avro/Protobuf, you need the Schema Registry URL and its own API Key/Secret.&lt;/P&gt;&lt;P&gt;Databricks Workspace:&lt;BR /&gt;Network Connectivity: Ensure your Databricks cluster has egress access to the Confluent brokers. For production, private networking (for example VNet Injection on Azure, or Private Link) is recommended to avoid routing traffic over the public internet.&lt;BR /&gt;Libraries: The spark-sql-kafka connector is typically built into Databricks Runtime, so it usually needs no separate install. Install confluent-kafka only if you need Python-based schema handling.&lt;BR /&gt;Secrets Management: Store your API Secrets in Databricks Secret Scopes rather than hardcoding them in notebooks.&lt;/P&gt;&lt;P&gt;2. Connection Options&lt;BR /&gt;There are three primary ways to bridge these platforms, each suited to different use cases.&lt;/P&gt;&lt;P&gt;Option A: Spark Structured Streaming (Native Integration)&lt;BR /&gt;This is the most common "Pull" method, where Databricks acts as the Kafka consumer.&lt;BR /&gt;Pros:&lt;BR /&gt;Granular Control: Complete control over transformations (PySpark/SQL) within Databricks.&lt;BR /&gt;Exactly-Once Semantics: Built-in fault tolerance through Spark checkpointing, with end-to-end exactly-once delivery when writing to an idempotent sink such as Delta Lake.&lt;BR /&gt;Unified Batch/Streaming: Use largely the same code for real-time streams and historical batch processing.&lt;BR /&gt;Cons:&lt;BR /&gt;Compute Costs: Requires a running Databricks cluster (always-on or job cluster).&lt;BR /&gt;Management: You are responsible for maintaining the Spark code and the scaling logic.&lt;/P&gt;&lt;P&gt;Option B: Confluent Delta Lake Sink Connector&lt;BR /&gt;A "Push" method where a Confluent-managed connector writes to your cloud storage (S3/ADLS).&lt;BR /&gt;Pros:&lt;BR /&gt;No-Code Ingestion: Managed by Confluent; no Spark code is required for the initial landing.&lt;BR /&gt;Offloads Compute: Ingestion is handled by the connector rather than by a Spark streaming job that you run.&lt;BR /&gt;Simplicity: Best for simple "mirroring" of Kafka topics to Delta tables.&lt;BR /&gt;Cons:&lt;BR /&gt;Latency: Data usually lands in object storage before Databricks picks it up, which adds metadata discovery overhead.&lt;BR /&gt;Limited Transformation: Only supports basic Single Message Transforms (SMTs).&lt;/P&gt;&lt;P&gt;Option C: Confluent Tableflow (The New "Zero-Copy" Way)&lt;BR /&gt;A newer Confluent feature that automatically materializes Kafka topics as Delta Lake or Iceberg tables.&lt;BR /&gt;Pros:&lt;BR /&gt;Lowest Overhead: Data is stored once but is accessible from both platforms.&lt;BR /&gt;Performance: Eliminates the need for custom ETL jobs or connectors.&lt;BR /&gt;Cons:&lt;BR /&gt;Maturity: Newer feature, available only in specific regions and on specific cloud providers.&lt;/P&gt;&lt;P&gt;My Recommendation: For a robust enterprise deliverable, start with Structured Streaming. It builds on your existing Databricks/Spark skill set and provides the most flexibility for future business requirements.&lt;/P&gt;</description>
    <pubDate>Thu, 22 Jan 2026 17:10:16 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2026-01-22T17:10:16Z</dc:date>
    <item>
      <title>What are the prerequisites for connecting Confluent Kafka with Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-are-the-prerequisites-for-connecting-confluent-kafka-with/m-p/144740#M52382</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Please provide the prerequisites for connecting Confluent Kafka with Databricks, the different connection options, their respective advantages and disadvantages, and the best option for the deliverable.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Shanmugam&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jan 2026 12:41:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-are-the-prerequisites-for-connecting-confluent-kafka-with/m-p/144740#M52382</guid>
      <dc:creator>shan-databricks</dc:creator>
      <dc:date>2026-01-21T12:41:29Z</dc:date>
    </item>
    <item>
      <title>Re: What are the prerequisites for connecting Confluent Kafka with Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-are-the-prerequisites-for-connecting-confluent-kafka-with/m-p/144910#M52411</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149095"&gt;@shan-databricks&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Connecting Confluent Kafka with Databricks gives you a "data in motion" to "data at rest" architecture.&lt;BR /&gt;Below are the prerequisites, connection options, and a recommendation for your deliverable.&lt;/P&gt;&lt;P&gt;1. Prerequisites&lt;BR /&gt;Before establishing a connection, ensure the following are in place.&lt;/P&gt;&lt;P&gt;Confluent Cloud/Platform Details:&lt;BR /&gt;Bootstrap Server: The host:port address of your Kafka brokers (e.g., pkc-xxxx.us-east-1.aws.confluent.cloud:9092).&lt;BR /&gt;API Keys: A cluster-level API Key and Secret for authentication (on Confluent Cloud these serve as the SASL/PLAIN username and password over SASL_SSL).&lt;BR /&gt;Schema Registry (Optional): If using Avro/Protobuf, you need the Schema Registry URL and its own API Key/Secret.&lt;/P&gt;&lt;P&gt;Databricks Workspace:&lt;BR /&gt;Network Connectivity: Ensure your Databricks cluster has egress access to the Confluent brokers. For production, private networking (for example VNet Injection on Azure, or Private Link) is recommended to avoid routing traffic over the public internet.&lt;BR /&gt;Libraries: The spark-sql-kafka connector is typically built into Databricks Runtime, so it usually needs no separate install. Install confluent-kafka only if you need Python-based schema handling.&lt;BR /&gt;Secrets Management: Store your API Secrets in Databricks Secret Scopes rather than hardcoding them in notebooks.&lt;/P&gt;&lt;P&gt;2. Connection Options&lt;BR /&gt;There are three primary ways to bridge these platforms, each suited to different use cases.&lt;/P&gt;&lt;P&gt;Option A: Spark Structured Streaming (Native Integration)&lt;BR /&gt;This is the most common "Pull" method, where Databricks acts as the Kafka consumer.&lt;BR /&gt;Pros:&lt;BR /&gt;Granular Control: Complete control over transformations (PySpark/SQL) within Databricks.&lt;BR /&gt;Exactly-Once Semantics: Built-in fault tolerance through Spark checkpointing, with end-to-end exactly-once delivery when writing to an idempotent sink such as Delta Lake.&lt;BR /&gt;Unified Batch/Streaming: Use largely the same code for real-time streams and historical batch processing.&lt;BR /&gt;Cons:&lt;BR /&gt;Compute Costs: Requires a running Databricks cluster (always-on or job cluster).&lt;BR /&gt;Management: You are responsible for maintaining the Spark code and the scaling logic.&lt;/P&gt;&lt;P&gt;Option B: Confluent Delta Lake Sink Connector&lt;BR /&gt;A "Push" method where a Confluent-managed connector writes to your cloud storage (S3/ADLS).&lt;BR /&gt;Pros:&lt;BR /&gt;No-Code Ingestion: Managed by Confluent; no Spark code is required for the initial landing.&lt;BR /&gt;Offloads Compute: Ingestion is handled by the connector rather than by a Spark streaming job that you run.&lt;BR /&gt;Simplicity: Best for simple "mirroring" of Kafka topics to Delta tables.&lt;BR /&gt;Cons:&lt;BR /&gt;Latency: Data usually lands in object storage before Databricks picks it up, which adds metadata discovery overhead.&lt;BR /&gt;Limited Transformation: Only supports basic Single Message Transforms (SMTs).&lt;/P&gt;&lt;P&gt;Option C: Confluent Tableflow (The New "Zero-Copy" Way)&lt;BR /&gt;A newer Confluent feature that automatically materializes Kafka topics as Delta Lake or Iceberg tables.&lt;BR /&gt;Pros:&lt;BR /&gt;Lowest Overhead: Data is stored once but is accessible from both platforms.&lt;BR /&gt;Performance: Eliminates the need for custom ETL jobs or connectors.&lt;BR /&gt;Cons:&lt;BR /&gt;Maturity: Newer feature, available only in specific regions and on specific cloud providers.&lt;/P&gt;&lt;P&gt;My Recommendation: For a robust enterprise deliverable, start with Structured Streaming. It builds on your existing Databricks/Spark skill set and provides the most flexibility for future business requirements.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jan 2026 17:10:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-are-the-prerequisites-for-connecting-confluent-kafka-with/m-p/144910#M52411</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2026-01-22T17:10:16Z</dc:date>
    </item>
    <item>
      <title>Re: What are the prerequisites for connecting Confluent Kafka with Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-are-the-prerequisites-for-connecting-confluent-kafka-with/m-p/145333#M52484</link>
      <description>&lt;DIV&gt;Thank you for your response. I will try the integration and options and will reach out if I need further assistance.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Shanmugam&lt;/DIV&gt;</description>
      <pubDate>Tue, 27 Jan 2026 04:28:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-are-the-prerequisites-for-connecting-confluent-kafka-with/m-p/145333#M52484</guid>
      <dc:creator>shan-databricks</dc:creator>
      <dc:date>2026-01-27T04:28:35Z</dc:date>
    </item>
  </channel>
</rss>

