Data Engineering

Forum Posts

by APol • New Contributor II

09-08-2022 8:16:22 AM

3226 Views
2 replies
2 kudos

Read/Write concurrency issue

Hi. I assume this could be a concurrency issue (a read thread from Databricks and a write thread from another system). From the start: I read 12-16 CSV files (approximately 250 MB each) into a DataFrame. df = spark.read.option("header", "False").opti...

Latest Reply

FerArribas
Contributor

01-02-2023 2:02:49 PM

2 kudos

Hi @Anastasiia Polianska, I agree, it looks like a concurrency issue. Very possibly it is caused by an erroneous ETag in the HTTP call to the Azure Storage API (https://azure.microsoft.com/de-de/blog/managing-concurrency-in...
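
To illustrate the ETag idea mentioned in this reply, here is a minimal sketch of optimistic concurrency with a conditional read. It uses a toy in-memory store, not the Azure SDK: `InMemoryBlobStore`, `ETagMismatch`, `read_range` and the file name are all hypothetical names chosen for the example. The only behavior it assumes is that each write produces a new ETag and that a read carrying a stale ETag is rejected.

```python
class ETagMismatch(Exception):
    """The object changed between two requests of the same logical read."""


class InMemoryBlobStore:
    """Toy stand-in for a blob service (hypothetical, not the Azure SDK)."""

    def __init__(self):
        self._blobs = {}   # name -> (etag, data)
        self._version = 0

    def put(self, name, data):
        # Every write produces a new ETag.
        self._version += 1
        etag = f"v{self._version}"
        self._blobs[name] = (etag, data)
        return etag

    def etag(self, name):
        return self._blobs[name][0]

    def read_range(self, name, start, end, if_match):
        # Mimics a ranged read that is only served if the ETag still matches.
        etag, data = self._blobs[name]
        if etag != if_match:
            raise ETagMismatch(f"{name}: expected {if_match}, found {etag}")
        return data[start:end]


store = InMemoryBlobStore()
first_etag = store.put("part-0001.csv", b"1,2,3\n4,5,6\n")

# The reader fetches the first chunk and remembers the ETag it started with.
head = store.read_range("part-0001.csv", 0, 6, if_match=first_etag)

# Another system overwrites the file while the reader is halfway through.
store.put("part-0001.csv", b"7,8,9\n")

try:
    store.read_range("part-0001.csv", 6, 12, if_match=first_etag)
    conflict_detected = False
except ETagMismatch:
    conflict_detected = True
```

In this sketch the second chunk read fails instead of silently mixing bytes from two versions of the file, which is the kind of error a reader could see if a writer replaces a CSV mid-read.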

1 More Replies

by Saikrishna2 • New Contributor III

11-22-2022 8:03:53 AM

6714 Views
7 replies
11 kudos

Databricks SQL is allowing 10 queries only?

• Power BI is a publisher that uses AD group authentication to publish result sets. Since the publisher's credentials are maintained, the same user can access the Databricks database.
• A number of users are retrieving the data from Power BI or i...

Latest Reply

VaibB
Contributor

12-02-2022 12:26:24 PM

11 kudos

I believe 10 is a limit as of now. See if you can increase the concurrency limit from the source.

6 More Replies

by Phani1 • Valued Contributor II

09-28-2022 9:45:35 AM

1973 Views
2 replies
5 kudos

Delta table Concurrent Updates for Non-partitioned tables

When we implemented concurrent updates on a table that does not have a partition column, we ran into ConcurrentAppendException (we ensured that the WHERE condition is different for each concurrent update statement). So do we need to go by partition approach ...
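
One workaround sometimes used for this kind of conflict, which this thread does not confirm, is to retry the failed update with a backoff. The sketch below is a minimal version of that idea. `ConcurrentAppendException` here is a local stand-in class so the example runs without Spark; `run_with_retry` and `flaky_update` are hypothetical helpers, and the delay values are arbitrary assumptions.

```python
import random
import time


class ConcurrentAppendException(Exception):
    """Local stand-in for the Delta conflict error, so the sketch runs without Spark."""


def run_with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a write conflict, wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConcurrentAppendException:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Exponential backoff with jitter so concurrent writers spread out.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            sleep(delay)


# Simulated update that conflicts twice before it commits.
attempts = {"count": 0}


def flaky_update():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConcurrentAppendException("files were added by a concurrent update")
    return "committed"


result = run_with_retry(flaky_update, sleep=lambda seconds: None)
```

Retrying only hides the conflict. It does not remove it, so it is best suited to updates that are safe to re-run.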

Latest Reply

Hubert-Dudek
Esteemed Contributor III

09-28-2022 12:14:21 PM

5 kudos

Please check that both streaming queries don't use the same checkpoint. An auto-increment ID can also cause problems, as it is kept in the schema. Schema evolution can also cause problems.

1 More Replies

by KrishZ • Contributor

09-14-2022 11:35:08 PM

2470 Views
4 replies
0 kudos

How to prevent SQL queries in 2 notebooks from reading the same row from a table?

I have an SQL query to select and update rows in a table. I do this in batches of 300 rows (select 300, update the selected 300, select a new 300, update the newly selected rows, and so on). I run this query in 2 different notebooks concurrently to spe...
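
One possible way to keep two notebooks from selecting the same rows is to give each notebook a disjoint slice of the key space. The sketch below shows the idea in plain Python, assuming each row has an integer key (called `id` here, a hypothetical column name). In SQL the equivalent filter would be something like `WHERE MOD(id, 2) = 0` in one notebook and `WHERE MOD(id, 2) = 1` in the other. `batches_for_worker` is a made-up helper for illustration.

```python
from itertools import islice


def batches_for_worker(row_ids, worker_index, num_workers, batch_size=300):
    """Yield batches containing only the ids that belong to this worker."""
    # A row belongs to exactly one worker: the one matching id % num_workers.
    mine = (row_id for row_id in row_ids if row_id % num_workers == worker_index)
    while True:
        batch = list(islice(mine, batch_size))
        if not batch:
            return
        yield batch


row_ids = range(1, 1001)

# Each notebook passes its own worker_index and never sees the other's rows.
notebook_a = list(batches_for_worker(row_ids, worker_index=0, num_workers=2))
notebook_b = list(batches_for_worker(row_ids, worker_index=1, num_workers=2))
```

Because the split is decided by the key alone, the notebooks need no coordination at run time. The trade-off is that the work is only balanced if the keys are spread evenly.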

Latest Reply

Anonymous
Not applicable

09-28-2022 12:34:20 AM

0 kudos

Hi @Krishna Zanwar, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

3 More Replies

by jwilliam • Contributor

08-30-2022 9:25:48 PM

3764 Views
3 replies
4 kudos

Resolved! What is the maximum number of concurrent streaming jobs for a cluster?

What is the maximum number of concurrent streaming jobs for a cluster? How can I choose the right number of concurrent streaming jobs for different cluster configurations? Should I use multiple clusters for different jobs or combine them into one big cluster to hand...

Latest Reply

Prabakar
Databricks Employee

09-01-2022 5:22:05 AM

4 kudos

Hi @John William, it would be better to use a different cluster for each streaming job.

2 More Replies

by MiguelKulisic • New Contributor II

01-21-2022 1:52:10 PM

8380 Views
2 replies
4 kudos

Resolved! ProtocolChangedException on concurrent blind appends to delta table

Hello, I am developing an application that runs multiple processes that write their results to a common delta table as blind appends. According to the docs I've read online: https://docs.databricks.com/delta/concurrency-control.html#protocolchangedex...

Latest Reply

-werners-
Esteemed Contributor III

01-25-2022 6:36:33 AM

4 kudos

I think you are right: mergeSchema will change the schema of the table, but if both of you write to that same table with a different schema, which one will it be? Can you check whether both of you actually write the same schema, or remove mergeSchema?
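
The check suggested in this reply can be sketched as a small comparison of the two writers' schemas. The example below is plain Python and assumes each schema is a list of (column name, type name) pairs, which is the shape that `df.dtypes` returns in PySpark. `schema_diff` and the column names are hypothetical.

```python
def schema_diff(left, right):
    """Return {column: (left_type, right_type)} for every column that differs."""
    left_types, right_types = dict(left), dict(right)
    all_columns = sorted(set(left_types) | set(right_types))
    return {
        name: (left_types.get(name), right_types.get(name))
        for name in all_columns
        if left_types.get(name) != right_types.get(name)
    }


# Schemas of the two writers (made-up columns for illustration).
writer_a = [("id", "bigint"), ("result", "string")]
writer_b = [("id", "bigint"), ("result", "string"), ("score", "double")]

same = schema_diff(writer_a, writer_a)
different = schema_diff(writer_a, writer_b)
```

An empty result means both writers append with the same schema. Any entry in the result names a column that only one writer has, or that the writers type differently.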

1 More Replies

by User16790091296 • Contributor II

06-04-2021 11:42:27 AM

2272 Views
1 reply
0 kudos

Why doesn't a high concurrency cluster support Scala?

Latest Reply

sean_owen
Databricks Employee

06-17-2021 4:45:25 PM

0 kudos

Broadly, it's because high-concurrency clusters have to have much more control of user workloads in order to enforce resource sharing constraints. Scala is the lowest-level language you can access in Databricks, as you execute directly in the JVM, and...

Databricks Community