Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lnights
by New Contributor II
  • 4736 Views
  • 5 replies
  • 2 kudos

High cost of storage when using structured streaming

Hi there, I read data from Azure Event Hub and, after manipulating the data, I write the dataframe back to Event Hub (I use this connector for that): #read data df = (spark.readStream .format("eventhubs") .options(**ehConf) ...
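In sketch form, the read-and-write-back pattern described here looks like the following (assuming an ehConf dict built for the azure-event-hubs-spark connector; the separate write conf and checkpoint path are illustrative):

    # Read the stream from Event Hubs (azure-event-hubs-spark connector)
    df = (spark.readStream
          .format("eventhubs")
          .options(**ehConf)
          .load())

    # ... transformations ...

    # Write back to Event Hubs; the payload must sit in a "body" column
    (df.selectExpr("CAST(body AS STRING) AS body")
       .writeStream
       .format("eventhubs")
       .options(**ehWriteConf)  # second conf pointing at the target hub (assumed)
       .option("checkpointLocation", "/mnt/checkpoints/eh-roundtrip")  # illustrative
       .start())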

Latest Reply
PetePP
New Contributor II
  • 2 kudos

I had the same problem when starting with Databricks. As outlined above, it is the shuffle partitions setting that results in a number of files equal to the number of partitions. Thus, you are writing a low data volume but get taxed on the amount of write (a...
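A minimal sketch of the fix described above, assuming the default shuffle setting is the culprit (the value 8 is illustrative; tune it to your volume):

    # With the default of 200 shuffle partitions, each micro-batch writes
    # ~200 files regardless of volume; every file write is a billed storage
    # transaction, hence the cost. Lower the setting to match your volume.
    spark.conf.set("spark.sql.shuffle.partitions", "8")  # illustrative value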

4 More Replies
blackcoffeeAR
by Contributor
  • 12329 Views
  • 10 replies
  • 5 kudos

How to use/access a Scala library installed from a JAR file in a Python notebook?

I'm using the Azure Event Hubs Connector https://github.com/Azure/azure-event-hubs-spark to connect to an Event Hub. When I install this library from Maven, everything works; I can access lib classes using the JVM: connection_string = "<connection_string>" s...
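For reference, the JVM-gateway pattern the post alludes to looks roughly like this, per the connector's documented PySpark usage (a sketch; it assumes the library is installed as a cluster library from Maven so its classes are on the driver classpath):

    connection_string = "<connection_string>"

    # The connector is a Scala library; from Python you reach its classes
    # through the JVM gateway. This only resolves when the JAR is on the
    # driver classpath (e.g. installed as a cluster library).
    ehConf = {
        "eventhubs.connectionString":
            sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
    }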

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @blackcoffeeAR, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

9 More Replies
guru1
by New Contributor II
  • 3901 Views
  • 2 replies
  • 0 kudos

Resolved! Facing the issue mentioned in the body when connecting Event Hub with Databricks; followed an earlier discussion on this but found no solution

ERROR: Query termination received for [id=37bada03-131b-4fbb-8992-a427263fef2c, runId=cf3d7c18-780e-43ae-aed0-9daf2939b823], with exception: java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit at java.util.Base64$Decoder...

Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 0 kudos

The issue could be due to a mismatch between the Event Hub JAR and the dependencies added, or not all of the required dependencies may have been added. Suggestions: use the azure_eventhubs_spark_2_12_.jar Event Hub Spark JAR along with the following dependencies...

1 More Replies
Gilg
by Contributor II
  • 5129 Views
  • 4 replies
  • 5 kudos

Avro Deserialization from Event Hub capture and Autoloader

Hi all, I am getting data from Event Hub capture in Avro format and using Auto Loader to process it. I got to the point where I can read the Avro by casting the Body into a string. Now I want to deserialize the Body column so it will be in table forma...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

If you still want to go with the above approach and don't want to provide the schema manually, you can fetch a tiny batch with 1 record and build the schema into a variable using the .schema option. Once done, you can add a new Body column by providin...
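A sketch of that approach, assuming the capture files land under a hypothetical capture_path, the Body holds JSON, and stream_df is the Auto Loader stream:

    from pyspark.sql.functions import col, from_json

    # 1) Fetch a tiny batch and infer the payload schema once
    sample = (spark.read.format("avro").load(capture_path)  # capture_path assumed
              .select(col("Body").cast("string").alias("json"))
              .limit(1))
    payload_schema = spark.read.json(sample.rdd.map(lambda r: r.json)).schema

    # 2) Reuse that schema to unpack Body in the streaming dataframe
    parsed = (stream_df  # stream_df assumed
              .withColumn("Body", from_json(col("Body").cast("string"), payload_schema))
              .select("Body.*"))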

3 More Replies
VN11111
by New Contributor III
  • 9014 Views
  • 5 replies
  • 6 kudos

Resolved! ERROR: Some streams terminated before this command could finish!

I have a Databricks notebook which reads a stream from Azure Event Hub. My code does the following: 1. Configure the path for Event Hubs. 2. Read the stream: df_read_stream = (spark.readStream .format("eventhubs") .options(**conf)...

Latest Reply
guru1
New Contributor II
  • 6 kudos

I am also facing the same issue, using cluster 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12) and library com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.21. Please help me with the same. conf = {} conf["eventhubs.connectionString"] = "Endpoint=sb://xxxx.ser...

4 More Replies
Rahul_Tiwary
by New Contributor II
  • 6022 Views
  • 1 reply
  • 4 kudos

Getting error "java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException" while writing streaming data to Event Hub; it works fine when writing to another Databricks table

import org.apache.spark.sql._
import scala.collection.JavaConverters._
import com.microsoft.azure.eventhubs._
import java.util.concurrent._
import scala.collection.immutable._
import org.apache.spark.eventhubs._
import scala.concurrent.Future
import scala.c...

Latest Reply
Gepap
New Contributor II
  • 4 kudos

The dataframe to write needs to have the following schema:

Column          | Type
----------------|------------------
body (required) | string or binary
partitionId (*optional) | string
partitionKey...
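A sketch of shaping a dataframe to that contract before writing (ehConf and the checkpoint path are assumed):

    from pyspark.sql.functions import to_json, struct

    # Pack every column into the single required "body" column as JSON
    out = df.select(to_json(struct(*df.columns)).alias("body"))

    (out.writeStream
        .format("eventhubs")
        .options(**ehConf)
        .option("checkpointLocation", "/mnt/checkpoints/eh-out")  # illustrative
        .start())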

databricksuser2
by New Contributor II
  • 1294 Views
  • 1 reply
  • 2 kudos

Structured streaming job sees throughput being capped after running normally for a few days

The job (written in PySpark) uses Azure Event Hub as the source and a Databricks Delta table as the sink; it is hosted in Azure Databricks. The transformation part is simple: the message body is converted from bytes to a JSON string, and the JSON string is then a...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @Databricks User10293847, you can try using auto-inflate and let the throughput units (TUs) increase automatically. The feature then scales automatically to the maximum limit of TUs you need, depending on the increase in your traffic. You can check the below doc: htt...

Aran_Oribu
by New Contributor II
  • 4282 Views
  • 5 replies
  • 2 kudos

Resolved! Create and update a CSV/JSON file in ADLS Gen2 with Event Hub in Databricks streaming

Hello, this is my first post here and I am a total beginner with Databricks and Spark. Working on an IoT cloud project with Azure, I'm looking to set up continuous stream processing of data. A current architecture already exists thanks to Stream Ana...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

So the Event Hub creates files (JSON/CSV) on ADLS. You can read those files into Databricks with the spark.read.csv/json method. If you want to read many files in one go, you can use wildcards, e.g. spark.read.json("/mnt/datalake/bronze/directory/*/*...
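In code, the wildcard read looks like this (path illustrative; adjust the depth to your capture directory layout):

    # Wildcards cover many files in one read
    df = spark.read.json("/mnt/datalake/bronze/directory/*/*/*.json")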

4 More Replies
RengarLee
by Contributor
  • 3915 Views
  • 5 replies
  • 0 kudos

Resolved! How to improve Spark Streaming writer Input Rate and Processing rate?

Hi! I have many questions about Spark Streaming and Event Hubs. Can you help me? Q1: How do I improve the Spark Streaming writer input rate and processing rate? I connect to Azure Event Hubs using Spark Streaming (Azure Databricks), but I found that if I use display, this ...

Latest Reply
RengarLee
Contributor
  • 0 kudos

My problem was that setMaxEventsPerTrigger did not match numInputRows.
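For context, a sketch of how that cap is set in PySpark (key name as commonly documented for the connector; verify against your connector version, and the value is illustrative). The cap is an upper bound, so numInputRows can legitimately land below it:

    # maxEventsPerTrigger is an upper bound across all partitions per
    # micro-batch; numInputRows comes in below it when the hub holds
    # less backlog than the cap.
    ehConf["maxEventsPerTrigger"] = 100000  # illustrative cap

    df = (spark.readStream
          .format("eventhubs")
          .options(**ehConf)
          .load())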

4 More Replies
Jreco
by Contributor
  • 4905 Views
  • 6 replies
  • 4 kudos

Resolved! Messages from Event Hub stop flowing after a time

Hi team, I'm trying to build a real-time solution using Databricks and Event Hubs. Something weird happens a while after the process starts. At the beginning, the messages flow through the process as expected at this rate: please note that the last ...

Latest Reply
Jreco
Contributor
  • 4 kudos

Thanks for your answer @Hubert Dudek, it is already specified. What do you mean by this? That is the weird part: the data flows fine, but at some point it is as if the job stops reading or something like that, and if I restart the ...

5 More Replies
Jreco
by Contributor
  • 12797 Views
  • 13 replies
  • 3 kudos

Event Hub streaming: improve processing rate

Hi all, I'm working with Event Hubs and Databricks to process and enrich data in real time. Doing a "simple" test, I'm getting some weird values (input rate vs processing rate) and I think I'm losing data. As you can see, there is a peak with 5k record...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Jhonatan Reyes, how many Event Hubs partitions are you reading from? Your micro-batch takes a few milliseconds to complete, which I think is a good time, but I would like to understand better what you are trying to improve here. Also, in this case ...
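One way to pin those numbers down is the standard Structured Streaming progress API (a sketch; query is the handle returned by .start()):

    # Inspect the latest micro-batch to compare input vs processing rate
    p = query.lastProgress
    print(p["numInputRows"], p["inputRowsPerSecond"], p["processedRowsPerSecond"])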

12 More Replies
User16868770416
by Contributor
  • 4227 Views
  • 1 replies
  • 0 kudos

What is the best way to decode protobuf using pyspark?

I am using Spark Structured Streaming to read protobuf-encoded messages from the Event Hub. We use a lot of Delta tables, but there isn't a simple way to integrate this. We are currently using K-SQL to transform into Avro on the fly and then use Dat...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Will Block, I think a related question was asked in the past; I believe it was this one. I found this library; I hope it helps.
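The library linked above isn't preserved in this archive. As an aside, Spark 3.4+ ships native protobuf support that covers the same need; a sketch, with the message name and descriptor path being illustrative:

    from pyspark.sql.protobuf.functions import from_protobuf

    # Decode the binary body against a compiled .desc descriptor file
    decoded = df.select(
        from_protobuf("body", "MyMessage",
                      descFilePath="/dbfs/schemas/my_message.desc").alias("msg")
    ).select("msg.*")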
