Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by explorer, New Contributor III
  • 3374 Views
  • 4 replies
  • 1 kudos

Resolved! Deleting records manually in a Databricks streaming table

Hi Team, let me know if there are any ways I can delete records manually from a Databricks streaming table without corrupting the table and data. Can we delete a few records (based on some condition) manually in a Databricks streaming table (having checkpoi...

Latest Reply by JunYang (New Contributor III)
  • 1 kudos

  If you use the applyChanges method in DLT for Change Data Capture (CDC), you can delete records manually without affecting the consistency of the table, as applyChanges respects manual deletions. You must configure your DLT pipeline to respect manu...
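For reference, a minimal sketch of what an apply_changes-based DLT pipeline can look like with the current DLT Python API. The source view name cdc_feed and the id/ts/operation columns are assumptions for illustration, not the poster's actual schema:

```python
import dlt
from pyspark.sql.functions import expr

# Target streaming table maintained by apply_changes (CDC).
dlt.create_streaming_table("target_table")

dlt.apply_changes(
    target="target_table",
    source="cdc_feed",       # assumed name of the CDC source view
    keys=["id"],             # assumed primary key column
    sequence_by="ts",        # assumed ordering column for out-of-order events
    apply_as_deletes=expr("operation = 'DELETE'"),  # rows flagged as deletes are removed
)
```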

3 More Replies
by sparkstreaming, New Contributor III
  • 4949 Views
  • 5 replies
  • 4 kudos

Resolved! Missing rows while processing records using foreachBatch in Spark Structured Streaming from Azure Event Hub

I am new to real-time scenarios and I need to create a Spark Structured Streaming job in Databricks. I am trying to apply some rule-based validations from backend configurations on each incoming JSON message. I need to do the following actions on th...
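A common shape for this kind of job is a foreachBatch validation sink. Below is a minimal sketch, assuming the stream has already been parsed from Event Hub JSON into a DataFrame stream_df with an amount column; the rule, table names, and checkpoint path are all hypothetical:

```python
from pyspark.sql.functions import col

def validate_and_write(batch_df, batch_id):
    # Inside foreachBatch the micro-batch is a plain DataFrame, so any
    # rule-based logic (joins against config tables, filters, etc.) works.
    valid = batch_df.filter(col("amount") >= 0)   # example rule (hypothetical)
    invalid = batch_df.filter(col("amount") < 0)
    valid.write.format("delta").mode("append").saveAsTable("valid_events")
    invalid.write.format("delta").mode("append").saveAsTable("invalid_events")

query = (
    stream_df.writeStream                 # stream_df: parsed Event Hub stream (assumed)
    .foreachBatch(validate_and_write)
    .option("checkpointLocation", "/mnt/checkpoints/validation")  # assumed path
    .start()
)
```

Note that writes inside foreachBatch should be idempotent: Spark may re-run a batch after a failure, and restarting with a stale checkpoint is a common cause of apparently missing rows.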

Latest Reply by Rishi045 (New Contributor III)
  • 4 kudos

Were you able to find any solution? If yes, could you please share it?

4 More Replies
by AzureDatabricks, New Contributor III
  • 7875 Views
  • 7 replies
  • 2 kudos

Resolved! Can we store 300 million records, and what is the preferred compute type and config?

How can we persist 300 million records? What is the best option to persist the data: the Databricks Hive metastore, Azure storage, or a Delta table? What are the limitations of Databricks Delta tables in terms of data volume? We have a use case where testers should be...

Latest Reply by -werners- (Esteemed Contributor III)
  • 2 kudos

You can certainly store 300 million records without any problem. The best option kinda depends on the use case. If you want to do a lot of online querying on the table, I suggest using Delta Lake, which is optimized (using Z-order, bloom filters, par...
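A minimal sketch of the layout techniques the reply mentions, using hypothetical table and column names:

```python
# Create a partitioned Delta table, then cluster its files with Z-ordering so
# selective queries can skip most data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        id BIGINT, user_id BIGINT, event_date DATE, payload STRING
    ) USING DELTA
    PARTITIONED BY (event_date)
""")

# Co-locate rows with similar user_id values in the same files (data skipping).
spark.sql("OPTIMIZE events ZORDER BY (user_id)")
```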

6 More Replies
by AzureDatabricks, New Contributor III
  • 4251 Views
  • 8 replies
  • 4 kudos

Resolved! Need to see all the records in a Delta table. Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded

truncate=False is not working on the Delta table: df_delta.show(df_delta.count(), False). Compute size: Single Node, Standard_F4S (8 GB memory, 4 cores). How much data can we persist in a Delta table as Parquet files, and how fast can we retrieve it?
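For context: show(df.count(), False) pulls every row through the driver, which is what exhausts an 8 GB single node. A minimal sketch of bounded alternatives, with a hypothetical table path and columns:

```python
df_delta = spark.read.format("delta").load("/mnt/delta/my_table")  # assumed path

# Inspect a bounded number of rows instead of all of them.
df_delta.show(100, truncate=False)

# If testers need ad hoc access to everything, expose a view and let them
# query slices with SQL rather than printing the full table at once.
df_delta.createOrReplaceTempView("my_table_v")
spark.sql(
    "SELECT * FROM my_table_v WHERE event_date = '2022-01-01'"  # assumed column
).show(truncate=False)
```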

Latest Reply by AzureDatabricks (New Contributor III)
  • 4 kudos

thank you !!!

7 More Replies
by Jreco, Contributor
  • 7772 Views
  • 14 replies
  • 3 kudos

Event Hub streaming: improve processing rate

Hi all, I'm working with Event Hubs and Databricks to process and enrich data in real time. Doing a "simple" test, I'm getting some weird values (input rate vs. processing rate) and I think I'm losing data. As you can see, there is a peak with 5k record...

[attached screenshots: input rate vs. processing rate graphs]
Latest Reply by jose_gonzalez (Moderator)
  • 3 kudos

Hi @Jhonatan Reyes, how many Event Hubs partitions are you reading from? Your micro-batch takes a few milliseconds to complete, which I think is a good time, but I would like to understand better what you are trying to improve here. Also, in this case ...
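For readers following along: with the azure-event-hubs-spark connector, the hub's partition count and the per-trigger event cap are the usual knobs for balancing input rate against processing rate. A minimal sketch, assuming a Databricks notebook where spark and sc are predefined; the connection string and numbers are placeholders:

```python
# Placeholder connection string; never hard-code real credentials.
conn = "Endpoint=sb://<namespace>.servicebus.windows.net/;EntityPath=<hub>;..."

eh_conf = {
    # The connector expects the connection string to be encrypted.
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn),
    # Cap events read per micro-batch so input spikes are smoothed out
    # (option name per the connector docs; verify against your version).
    "maxEventsPerTrigger": 5000,
}

raw = (
    spark.readStream
    .format("eventhubs")
    .options(**eh_conf)
    .load()
)
# By default each Event Hubs partition maps to one Spark partition, so the
# hub's partition count bounds read parallelism.
```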

13 More Replies