Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am trying to connect to my Kafka from Spark but am getting an error. Kafka version: 2.4.1, Spark version: 3.3.0. I am using a Jupyter notebook to execute the PySpark code below:
```
from pyspark.sql.functions import *
from pyspark.sql.types import *
# import libr...
```
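The error text in the post is cut off, but for comparison, here is a minimal, hedged sketch of a working Structured Streaming read from Kafka on Spark 3.3.0, assuming the default Scala 2.12 build and placeholder broker/topic names:

```
# Minimal sketch; "localhost:9092" and "my_topic" are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-read-sketch")
         # Outside Databricks, the Kafka connector must be added explicitly
         # and must match the Spark/Scala version.
         .config("spark.jars.packages",
                 "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
         .getOrCreate())

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder
      .option("subscribe", "my_topic")                      # placeholder
      .option("startingOffsets", "earliest")
      .load())

# Kafka delivers key/value as binary; cast before inspecting.
query = (df.select(col("key").cast("string"), col("value").cast("string"))
         .writeStream
         .format("console")
         .start())
```

A frequent cause of failures in Jupyter specifically is a missing or mismatched spark-sql-kafka package, which the config line above addresses.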
Below is the error we received when trying to read the stream:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore /dbfs/FileStore/Certs/client.keystore.jks
Caused by: java.nio.file.NoSuchFileException: /dbfs...
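The NoSuchFileException suggests the keystore path is not visible where the Kafka client runs. A hedged sketch of checking the path and wiring the SSL options (the truststore path and the passwords are placeholders, not values from this thread):

```
# Verify the keystore exists under /dbfs, then pass the SSL options through.
import os
assert os.path.exists("/dbfs/FileStore/Certs/client.keystore.jks"), "keystore missing"

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9093")      # placeholder
      .option("subscribe", "my_topic")                       # placeholder
      .option("kafka.security.protocol", "SSL")
      .option("kafka.ssl.keystore.location",
              "/dbfs/FileStore/Certs/client.keystore.jks")
      .option("kafka.ssl.keystore.password", "REDACTED")     # placeholder
      .option("kafka.ssl.truststore.location",
              "/dbfs/FileStore/Certs/client.truststore.jks") # placeholder
      .option("kafka.ssl.truststore.password", "REDACTED")   # placeholder
      .load())
```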
OK, scrub that: the problem in my case was that I was using the 14.0 Databricks Runtime, which appears to have a bug relating to abfss paths here. Switching back to the 13.3 LTS release resolved it for me. So if you're in the same boat finding abfss...
I am trying to read data from Kafka, which is installed on my local system. I am using Databricks Community Edition with a cluster version of 12.2. However, I am unable to read data from Kafka. My use case is to read data from Kafka installed on my l...
I have an Avro schema for my Kafka topic, and that schema defines defaults. I would like to exclude the defaulted columns on the Databricks side and just let them default to an empty array. Sample Avro below; I am trying not to provide the UserFields because I can't...
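One hedged approach, assuming the OSS pyspark.sql.avro.functions API: serialize with a writer schema that omits the defaulted field, so consumers reading with the full schema fall back to the default. The schema and column names below are hypothetical:

```
# Sketch: write without the defaulted UserFields column; readers that use the
# full schema will fill in its default. Schema and names are hypothetical.
from pyspark.sql.avro.functions import to_avro
from pyspark.sql.functions import struct

writer_schema = """
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id",   "type": "string"},
    {"name": "name", "type": "string"}
  ]
}
"""

out = df.select(to_avro(struct("id", "name"), writer_schema).alias("value"))

(out.write
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("topic", "my_topic")                       # placeholder
    .save())
```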
I was trying to find information about configuring consumer groups for a Kafka stream in Databricks, so that I can parallelize the stream and load it into Databricks tables. Does Databricks handle this internally? If we can configure th...
Hi, we have a few examples of stream processing with Kafka (https://docs.databricks.com/structured-streaming/kafka.html), but there is no dedicated public document on Kafka consumer group creation. You can refer to https://kafka.apache.org/documentation...
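For reference, Structured Streaming tracks offsets in its own checkpoint and by default generates a unique consumer group id per query; parallelism comes from the topic's partition count rather than from the group. A hedged sketch of pinning a group id (supported since Spark 3.0; the broker, topic, and group names are placeholders):

```
# Sketch: pin the consumer group id; Spark still manages offsets itself via
# the checkpoint, so this mainly helps with broker-side ACLs and monitoring.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
      .option("subscribe", "my_topic")                   # placeholder
      .option("kafka.group.id", "my-databricks-group")   # placeholder
      .load())
```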
When using Delta tables with DBR jobs, or even with DLT pipelines, upserts (especially updates, matched on key and timestamp) are taking much longer than expected to update the table data (~2 minutes even for a 1-record poll). Inserts are lightni...
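For context, a hedged sketch of the kind of upsert being described (the table, columns, and updates_df are all hypothetical names). Slow updates are usually dominated by rewriting data files, so the tighter the merge predicate can prune files, the better:

```
# Sketch of an upsert on key + timestamp; all names here are hypothetical.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "events")

(target.alias("t")
 .merge(updates_df.alias("s"),
        "t.key = s.key AND t.ts = s.ts")  # add partition filters here to prune
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```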
Hi @Surya Agarwal, hope everything is going great. Just wanted to check in on whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql.SQLContext.readStream() is not whitelisted on class class org.apache.spark.sql.SQLContext
I already disabled ACLs for the cluster using "...
Hi @Ravi Teja, just a friendly follow-up: do you still need help? If you do, please share more details, such as the DBR version and whether it is a Standard or High Concurrency cluster.
I am trying to write a DataFrame to a Kafka topic with an Avro schema for the key and value, using a schema registry URL. The to_avro function is not writing to the topic and is throwing an exception with code 40403. Is there an alternate way to do thi...
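In the Confluent Schema Registry REST API, 404xx error codes generally indicate a missing subject, version, or schema, so the subject name is the first thing to check. As a fallback, a hedged sketch of fetching the schema yourself and serializing with the OSS to_avro (the registry URL, subject, and broker are placeholders):

```
# Sketch: fetch the value schema from the registry, then serialize with the
# OSS to_avro. URL, subject, and topic names are placeholders.
import requests
from pyspark.sql.avro.functions import to_avro
from pyspark.sql.functions import struct

registry = "https://my-registry:8081"  # placeholder
resp = requests.get(f"{registry}/subjects/my_topic-value/versions/latest")
value_schema = resp.json()["schema"]   # Avro schema JSON as a string

out = df.select(
    to_avro(struct(*[df[c] for c in df.columns]), value_schema).alias("value"))

(out.write
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("topic", "my_topic")                       # placeholder
    .save())
```

One caveat: the OSS to_avro does not produce Confluent's wire format (magic byte plus schema id), so consumers that use Confluent deserializers will not read these records as-is.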
Hello, I am a newbie in this field and am trying to access a Confluent Kafka stream in Azure Databricks, based on a beginner's video by Databricks. I have a free trial Databricks cluster right now. When I run the notebook below, it errors out on line 5 o...
For testing, create it without a secret scope. It is unsafe, but you can paste secrets as strings in the notebook while testing. Here is the code I used for loading data from Confluent:
```
inputDF = (spark
  .readStream
  .format("kafka")
  .option("kafka.b...
```
I'm working on configuring Kafka installed on my machine (laptop), and I want to connect it to my Databricks account hosted on AWS. Secondly, I have CSV files that I want to use for real-time processing from Kafka to Databri...
For CSV, you just need to readStream in the notebook and append the output to CSV using the foreachBatch method. Your Kafka on the PC needs to have a public address, or you need to set up an AWS VPN and connect from your laptop so that you are in the same VPC as Databricks.
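A hedged sketch of that foreachBatch pattern; the broker, topic, and paths are placeholders:

```
# Append each micro-batch to CSV files; all names and paths are placeholders.
def write_batch_to_csv(batch_df, batch_id):
    (batch_df.selectExpr("CAST(value AS STRING) AS value")
     .write.mode("append")
     .csv("/mnt/output/kafka_csv"))  # placeholder path

(spark.readStream
 .format("kafka")
 .option("kafka.bootstrap.servers", "public-host:9092")  # placeholder
 .option("subscribe", "my_topic")                        # placeholder
 .load()
 .writeStream
 .foreachBatch(write_batch_to_csv)
 .option("checkpointLocation", "/mnt/chk/kafka_csv")     # placeholder
 .start())
```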
This is the code that I am using to read from Kafka:
```
inputDF = (spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", host)
  .option("kafka.ssl.endpoint.identification.algorithm", "https")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("ka...
```
I have a large stream of data read from Confluent Kafka, 500+ million rows. When I initialize the stream, I cannot control the batch sizes that are read. I've tried setting options on the readStream, such as maxBytesPerTrigger, maxOffsetsPerTrigger, fetc...
Hi @Adam Rink, just checking for further info on your question: how are you deducing that the batch sizes are larger than what you are providing as maxOffsetsPerTrigger?
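For reference, maxOffsetsPerTrigger caps the total number of offsets consumed per micro-batch, split across partitions, and the actual batch size is visible in each progress report. A hedged sketch (broker and topic are placeholders):

```
# Sketch: cap each micro-batch at ~100k offsets; names are placeholders.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
      .option("subscribe", "my_topic")                   # placeholder
      .option("maxOffsetsPerTrigger", "100000")          # total per batch
      .load())

q = df.writeStream.format("noop").start()  # throughput-test sink
# q.lastProgress["numInputRows"] shows how many rows the last batch actually read.
```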
I am receiving an SSL handshake error even though the truststore I created is based on the server certificate, and the fingerprint in the certificate matches the truststore fingerprint.
kafkashaded.org.apache.kafka.common.errors.SslAuthenticationExcept...
Hi @Jayanth Goulla, worth a try: https://stackoverflow.com/questions/54903381/kafka-failed-authentication-due-to-ssl-handshake-failed
Did you follow https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/kafka?
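A common culprit in that Stack Overflow thread is hostname verification: the handshake fails when the broker certificate does not match the advertised host name. A hedged, debugging-only sketch of blanking the identification algorithm (the truststore path and password are placeholders):

```
# Debugging only: an empty identification algorithm skips hostname
# verification. All paths and passwords are placeholders.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9093")                  # placeholder
      .option("subscribe", "my_topic")                                   # placeholder
      .option("kafka.security.protocol", "SSL")
      .option("kafka.ssl.truststore.location", "/dbfs/certs/trust.jks")  # placeholder
      .option("kafka.ssl.truststore.password", "REDACTED")               # placeholder
      .option("kafka.ssl.endpoint.identification.algorithm", "")         # debug only
      .load())
```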
Hello, I'm trying to use Databricks on Azure with a Spark Structured Streaming job and am having a very mysterious issue. I boiled the job down to its basics for testing: reading from a Kafka topic and writing to the console in a foreachBatch. On local, ever...
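For reference, a hedged sketch of the boiled-down test described above (the broker, topic, and checkpoint path are placeholders):

```
# Minimal repro sketch: read from Kafka, print each micro-batch.
def show_batch(batch_df, batch_id):
    print(f"batch {batch_id}: {batch_df.count()} rows")
    batch_df.show(truncate=False)

(spark.readStream
 .format("kafka")
 .option("kafka.bootstrap.servers", "broker:9092")       # placeholder
 .option("subscribe", "my_topic")                        # placeholder
 .option("startingOffsets", "earliest")
 .load()
 .selectExpr("CAST(value AS STRING) AS value")
 .writeStream
 .foreachBatch(show_batch)
 .option("checkpointLocation", "/tmp/chk/console_test")  # placeholder
 .start())
```

On Databricks, print output from inside foreachBatch may land in the driver log rather than the notebook cell, which is worth checking before concluding the stream produced nothing.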